K8S | Cluster Resource Monitoring with heapster
Why install heapster
Check the dashboard status
$ sudo kubectl get pods --all-namespaces | grep dashboard
kubernetes-dashboard dashboard-metrics-scraper-6b4884c9d5-mvnb9 1/1 Running 0 40m
kubernetes-dashboard kubernetes-dashboard-d7f7f565d-zhpsc 1/1 Running 0 40m
View the dashboard logs
$ sudo kubectl logs -f -n kubernetes-dashboard kubernetes-dashboard-d7f7f565d-zhpsc
The logs contain entries like these:
No metric client provided. Skipping metrics.
2020/08/03 09:02:22 [2020-08-03T09:02:22Z] Outcoming response to 192.168.84.241:40086 with 200 status code
2020/08/03 09:02:23 Metric client health check failed: the server is currently unable to handle the request (get services dashboard-metrics-scraper). Retrying in 30 seconds.
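To spot this condition without reading the whole log, the two messages above can be matched with grep. This is only a minimal sketch: the function name `has_metric_errors` is made up for illustration, and the patterns are copied from the log lines shown above.

```shell
#!/bin/bash
# has_metric_errors: hypothetical helper, not part of kubectl or the dashboard.
# Reads log text on stdin and exits 0 if either metric-related message appears.
has_metric_errors() {
  grep -Eq 'No metric client provided|Metric client health check failed'
}
```

It could be used as `sudo kubectl logs -n kubernetes-dashboard kubernetes-dashboard-d7f7f565d-zhpsc | has_metric_errors && echo "metrics unavailable"`, with your own pod name substituted.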
Download the heapster code
Download the latest code directly from GitHub.
git clone https://github.com/kubernetes/heapster.git
The latest version at the time of writing is 1.5.4. The heapster/deploy/kube-config/influxdb directory contains several YAML files:
-rw-r--r--. 1 root root 2276 Oct 5 07:59 grafana.yaml
-rw-r--r--. 1 root root 1100 Oct 5 07:59 heapster.yaml
-rw-r--r--. 1 root root 960 Oct 5 07:59 influxdb.yaml
Next, check which images these files use:
$ grep 'image:' *
grafana.yaml: image: k8s.gcr.io/heapster-grafana-amd64:v5.0.4
heapster.yaml: image: k8s.gcr.io/heapster-amd64:v1.5.4
influxdb.yaml: image: k8s.gcr.io/heapster-influxdb-amd64:v1.5.2
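If only the bare image references are needed, for example to feed them into a pull script, the grep output can be reduced to one reference per line. A minimal sketch; `list_images` is an illustrative name, and it assumes the image reference is the last field on each `image:` line.

```shell
#!/bin/bash
# list_images: hypothetical helper. Prints each distinct image reference found
# on "image:" lines of the given manifest files, one per line, sorted.
list_images() {
  grep -h 'image:' "$@" | awk '{ print $NF }' | sort -u
}
```

For the directory above this would be called as `list_images heapster/deploy/kube-config/influxdb/*.yaml`.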
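Pull the images
Because of network restrictions in mainland China, images hosted on k8s.gcr.io cannot be pulled directly. Searching the Alibaba Cloud container registry (https://cr.console.aliyun.com/cn-hangzhou/instances/images) turns up the following mirrored images: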
registry.cn-hangzhou.aliyuncs.com/mirror_googlecontainers/heapster-grafana-amd64:v5.0.4
registry.cn-hangzhou.aliyuncs.com/mirror_googlecontainers/heapster-amd64:v1.5.4
registry.cn-hangzhou.aliyuncs.com/mirror_googlecontainers/heapster-influxdb-amd64:v1.5.2
Create the script heapster.sh
#!/bin/bash
images=(heapster-amd64:v1.5.4 heapster-influxdb-amd64:v1.5.2 heapster-grafana-amd64:v5.0.4)
# Pull each image from the mirror, retag it as k8s.gcr.io, then drop the mirror tag
for imageName in "${images[@]}"; do
  sudo docker pull "registry.cn-hangzhou.aliyuncs.com/mirror_googlecontainers/$imageName"
  sudo docker tag "registry.cn-hangzhou.aliyuncs.com/mirror_googlecontainers/$imageName" "k8s.gcr.io/$imageName"
  sudo docker rmi "registry.cn-hangzhou.aliyuncs.com/mirror_googlecontainers/$imageName"
done
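Before pulling anything, it can help to print the commands the script would run. The following is a dry-run sketch, not a replacement for heapster.sh: `MIRROR`, `TARGET` and `print_pull_plan` are illustrative names, and the registry paths are the ones listed above.

```shell
#!/bin/bash
MIRROR=registry.cn-hangzhou.aliyuncs.com/mirror_googlecontainers
TARGET=k8s.gcr.io

# print_pull_plan: hypothetical helper. For each image name given, prints the
# pull/tag/rmi commands that heapster.sh would run, without executing them.
print_pull_plan() {
  local image
  for image in "$@"; do
    echo "docker pull $MIRROR/$image"
    echo "docker tag $MIRROR/$image $TARGET/$image"
    echo "docker rmi $MIRROR/$image"
  done
}
```

Calling `print_pull_plan heapster-amd64:v1.5.4 heapster-influxdb-amd64:v1.5.2 heapster-grafana-amd64:v5.0.4` lists nine commands that can be reviewed before running the real script.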
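Install
Check these files and confirm that the image versions they reference match the images you pulled above. Once they match, run the following command to complete the setup: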
$ sudo kubectl create -f heapster/deploy/kube-config/influxdb/
After this, open the k8s dashboard again and you should see real-time monitoring data for the system and for each pod, displayed as charts.
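If the charts do not appear, one thing to check is whether the new pods actually reached the Running state. A minimal sketch, assuming the pods are named after the heapster, monitoring-grafana and monitoring-influxdb Deployments in kube-system; `pods_not_running` is an illustrative name.

```shell
#!/bin/bash
# pods_not_running: hypothetical helper. Reads the output of
# `kubectl get pods -n kube-system` on stdin (columns: NAME READY STATUS ...)
# and prints the names of heapster/grafana/influxdb pods whose STATUS is not Running.
pods_not_running() {
  awk 'NR > 1 && $1 ~ /^(heapster|monitoring-grafana|monitoring-influxdb)/ && $3 != "Running" { print $1 }'
}
```

Usage would be `sudo kubectl get pods -n kube-system | pods_not_running`; empty output means none of the matching pods is in a non-Running state.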
Errors you may encounter
Reference: https://cloud.tencent.com/developer/article/1394657
Error 1:
error: error validating "test_pod_svc.yaml": error validating data: ValidationError(Deployment.spec): missing required field "selector" in io.k8s.api.apps.v1.DeploymentSpec; if you choose to ignore these errors, turn validation off with --validate=false
Edit heapster/deploy/kube-config/influxdb/grafana.yaml so that the Deployment uses apps/v1 and declares a selector:
apiVersion: apps/v1
kind: Deployment
metadata:
  name: monitoring-grafana
  namespace: kube-system
spec:
  replicas: 1
  selector:
    matchLabels:
      k8s-app: grafana
Edit heapster/deploy/kube-config/influxdb/influxdb.yaml in the same way:
apiVersion: apps/v1
kind: Deployment
metadata:
  name: monitoring-influxdb
  namespace: kube-system
spec:
  replicas: 1
  selector:
    matchLabels:
      k8s-app: influxdb
Edit heapster/deploy/kube-config/influxdb/heapster.yaml in the same way:
apiVersion: apps/v1
kind: Deployment
metadata:
  name: heapster
  namespace: kube-system
spec:
  replicas: 1
  selector:
    matchLabels:
      k8s-app: heapster
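If the Deployment objects in your checkout still declare extensions/v1beta1 (the version named in the "no matches for kind" message under Error 2), the apiVersion line can be rewritten with sed instead of by hand. This is a blunt sketch that is only appropriate when every extensions/v1beta1 object in the file is a Deployment; `use_apps_v1` is an illustrative name, and the selector block still has to be added manually.

```shell
#!/bin/bash
# use_apps_v1: hypothetical helper. Rewrites "apiVersion: extensions/v1beta1"
# to "apiVersion: apps/v1" in each given file, keeping the original as FILE.bak.
use_apps_v1() {
  local f
  for f in "$@"; do
    cp "$f" "$f.bak"
    sed 's#^apiVersion: extensions/v1beta1$#apiVersion: apps/v1#' "$f.bak" > "$f"
  done
}
```

For example: `use_apps_v1 heapster/deploy/kube-config/influxdb/*.yaml`.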
Error 2:
Error from server (AlreadyExists): error when creating "heapster/deploy/kube-config/influxdb/grafana.yaml": deployments.apps "monitoring-grafana" already exists
Error from server (AlreadyExists): error when creating "heapster/deploy/kube-config/influxdb/grafana.yaml": services "monitoring-grafana" already exists
Error from server (AlreadyExists): error when creating "heapster/deploy/kube-config/influxdb/heapster.yaml": serviceaccounts "heapster" already exists
Error from server (AlreadyExists): error when creating "heapster/deploy/kube-config/influxdb/heapster.yaml": services "heapster" already exists
Error from server (AlreadyExists): error when creating "heapster/deploy/kube-config/influxdb/influxdb.yaml": services "monitoring-influxdb" already exists
unable to recognize "heapster/deploy/kube-config/influxdb/heapster.yaml": no matches for kind "Deployment" in version "extensions/v1beta1"
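The final "no matches for kind" line means heapster.yaml still declares its Deployment as extensions/v1beta1; change it to apps/v1 as described under Error 1. For the AlreadyExists errors, delete the existing resources and then run the create command again: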
sudo kubectl delete -f heapster/deploy/kube-config/influxdb/
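The delete and the subsequent create can be wrapped in one function. A sketch under assumptions: `KUBECTL`, `MANIFESTS` and `recreate_heapster` are illustrative names, and `--ignore-not-found` is added so the delete does not fail on resources that were never created.

```shell
#!/bin/bash
# KUBECTL can be overridden (for example KUBECTL=echo) to preview the commands.
KUBECTL="${KUBECTL:-sudo kubectl}"
MANIFESTS=heapster/deploy/kube-config/influxdb/

# recreate_heapster: hypothetical helper. Deletes whatever the manifests
# created earlier, then creates everything again from the same directory.
recreate_heapster() {
  $KUBECTL delete -f "$MANIFESTS" --ignore-not-found
  $KUBECTL create -f "$MANIFESTS"
}
```

Setting `KUBECTL=echo` before calling `recreate_heapster` prints the two kubectl argument lists instead of running them.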
Resolving errors
Reference: https://my.oschina.net/u/4335103/blog/4280880/print
Check the dashboard status
$ sudo kubectl get pods --all-namespaces | grep dashboard
kubernetes-dashboard dashboard-metrics-scraper-6b4884c9d5-mvnb9 1/1 Running 0 40m
kubernetes-dashboard kubernetes-dashboard-d7f7f565d-zhpsc 1/1 Running 0 40m
View the dashboard logs
$ sudo kubectl logs -f -n kubernetes-dashboard kubernetes-dashboard-d7f7f565d-zhpsc
2020/04/08 01:54:31 Non-critical error occurred during resource retrieval: events is forbidden: User "system:serviceaccount:kubernetes-dashboard:kubernetes-dashboard" cannot list resource "events" in API group "" in the namespace "default"
2020/04/08 01:54:31 [2020-04-08T01:54:31Z] Outcoming response to 192.168.122.21:7788 with 200 status code
2020/04/08 01:54:31 Getting list of all replication controllers in the cluster
2020/04/08 01:54:31 Non-critical error occurred during resource retrieval: replicationcontrollers is forbidden: User "system:serviceaccount:kubernetes-dashboard:kubernetes-dashboard" cannot list resource "replicationcontrollers" in API group "" in the namespace "default"
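To see at a glance which resource types the dashboard service account is being denied, the "cannot list resource" messages can be summarized. A minimal sketch; `forbidden_resources` is an illustrative name and the pattern is taken from the log lines above.

```shell
#!/bin/bash
# forbidden_resources: hypothetical helper. Reads dashboard log text on stdin
# and prints each distinct resource named in a 'cannot list resource "..."'
# message, one per line, sorted.
forbidden_resources() {
  grep -o 'cannot list resource "[^"]*"' | cut -d'"' -f2 | sort -u
}
```

It could be used as `sudo kubectl logs -n kubernetes-dashboard kubernetes-dashboard-d7f7f565d-zhpsc | forbidden_resources`, with your own pod name substituted.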
Solution
Write the following to heapster-rbac.yaml (for example with vim heapster-rbac.yaml):
kind: ClusterRole
apiVersion: rbac.authorization.k8s.io/v1
metadata:
  labels:
    k8s-app: kubernetes-dashboard
  name: kubernetes-dashboard
rules:
  # Allow Metrics Scraper to get metrics from the Metrics server
  - apiGroups: ["", "apps", "batch", "extensions", "metrics.k8s.io"]
    resources: ["*"]
    verbs: ["get", "list", "watch"]
Run the create command:
$ sudo kubectl create -f heapster-rbac.yaml
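Whether the new permissions took effect can be checked with `kubectl auth can-i`, impersonating the service account named in the log messages. A sketch under assumptions: `KUBECTL`, `DASHBOARD_SA` and `can_dashboard_list` are illustrative names, and the check only succeeds if a ClusterRoleBinding already ties this ClusterRole to the service account, which the file above does not create.

```shell
#!/bin/bash
# KUBECTL can be overridden (for example KUBECTL=echo) to preview the command.
KUBECTL="${KUBECTL:-sudo kubectl}"
DASHBOARD_SA=system:serviceaccount:kubernetes-dashboard:kubernetes-dashboard

# can_dashboard_list: hypothetical helper. Asks the API server whether the
# dashboard service account may list the given resource in the default namespace.
can_dashboard_list() {
  $KUBECTL auth can-i list "$1" --as="$DASHBOARD_SA" -n default
}
```

For example, `can_dashboard_list events` and `can_dashboard_list replicationcontrollers` cover the two resources from the log excerpt.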
--End--
- Original author: 留白
- Original link: https://zfunnily.github.io/2020/08/heapster/
- Last updated: 2024-04-16 01:01:05
- Notice: when reposting, please credit the original author and include the link