Why install heapster

Check the dashboard status

$ sudo kubectl get pods --all-namespaces | grep dashboard
kubernetes-dashboard   dashboard-metrics-scraper-6b4884c9d5-mvnb9   1/1     Running   0          40m
kubernetes-dashboard   kubernetes-dashboard-d7f7f565d-zhpsc         1/1     Running   0          40m
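
The same check can be scripted. The sketch below is not part of the original setup: not_running is a hypothetical helper name, and it assumes the column layout that kubectl get pods --all-namespaces prints (namespace, name, ready, status, ...).

```shell
#!/bin/bash
# not_running: read `kubectl get pods --all-namespaces` output on stdin and
# print "namespace/name status" for every dashboard pod whose STATUS column
# (4th field) is not "Running". Hypothetical helper, for illustration only.
not_running() {
    awk '/dashboard/ && $4 != "Running" { print $1 "/" $2, $4 }'
}
```

Piping the command above into it (sudo kubectl get pods --all-namespaces | not_running) should print nothing when both dashboard pods are healthy.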

View the dashboard logs

$ sudo kubectl logs -f -n kubernetes-dashboard kubernetes-dashboard-d7f7f565d-zhpsc

The log contains messages like these, which show that the dashboard has no working metrics source:

No metric client provided. Skipping metrics.
2020/08/03 09:02:22 [2020-08-03T09:02:22Z] Outcoming response to 192.168.84.241:40086 with 200 status code
2020/08/03 09:02:23 Metric client health check failed: the server is currently unable to handle the request (get services dashboard-metrics-scraper). Retrying in 30 seconds.
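
To spot these messages without reading the whole log, a small filter can help. This is a minimal sketch: metrics_broken is a hypothetical helper, and the two patterns are copied from the log lines above, so other dashboard versions may word them differently.

```shell
#!/bin/bash
# metrics_broken: read dashboard log lines on stdin and succeed (exit 0) if
# any line reports a missing or unhealthy metric client.
# Hypothetical helper; the patterns come from the log excerpt in this post.
metrics_broken() {
    grep -q -e 'No metric client provided' -e 'Metric client health check failed'
}
```

For example, sudo kubectl logs -n kubernetes-dashboard kubernetes-dashboard-d7f7f565d-zhpsc | metrics_broken && echo "metrics are not being collected". Leaving out -f lets the command finish instead of following the log.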

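Download the heapster source

Clone the latest code directly from GitHub.

$ git clone https://github.com/kubernetes/heapster.git

The latest version at the time of writing is 1.5.4. The heapster/deploy/kube-config/influxdb directory contains several YAML files: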

-rw-r--r--. 1 root root 2276 10月  5 07:59 grafana.yaml
-rw-r--r--. 1 root root 1100 10月  5 07:59 heapster.yaml
-rw-r--r--. 1 root root  960 10月  5 07:59 influxdb.yaml

Next, check which images they use:

$ grep 'image:' *
grafana.yaml:        image: k8s.gcr.io/heapster-grafana-amd64:v5.0.4
heapster.yaml:        image: k8s.gcr.io/heapster-amd64:v1.5.4
influxdb.yaml:        image: k8s.gcr.io/heapster-influxdb-amd64:v1.5.2
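
If only the bare image references are needed (for instance to feed a pull script), the file name prefix can be stripped. A minimal sketch, where list_images is a hypothetical helper name:

```shell
#!/bin/bash
# list_images DIR: print each distinct image reference found in DIR/*.yaml.
# Hypothetical helper, for illustration only.
list_images() {
    local dir="$1"
    # -h drops the "file.yaml:" prefix; sed removes everything up to "image:".
    grep -h 'image:' "$dir"/*.yaml | sed 's/^.*image:[[:space:]]*//' | sort -u
}
```

Called as list_images heapster/deploy/kube-config/influxdb, it prints one reference per line.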

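Pull the images

Because of network restrictions in mainland China, images cannot be pulled directly from k8s.gcr.io. A search of the Alibaba Cloud container registry (https://cr.console.aliyun.com/cn-hangzhou/instances/images) turned up the following mirrored images: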

registry.cn-hangzhou.aliyuncs.com/mirror_googlecontainers/heapster-grafana-amd64:v5.0.4
registry.cn-hangzhou.aliyuncs.com/mirror_googlecontainers/heapster-amd64:v1.5.4
registry.cn-hangzhou.aliyuncs.com/mirror_googlecontainers/heapster-influxdb-amd64:v1.5.2

Create the script heapster.sh

#!/bin/bash
# Pull each image from the Aliyun mirror, retag it under k8s.gcr.io so the
# manifests find it locally, then remove the mirror tag.
mirror=registry.cn-hangzhou.aliyuncs.com/mirror_googlecontainers
images=(heapster-amd64:v1.5.4 heapster-influxdb-amd64:v1.5.2 heapster-grafana-amd64:v5.0.4)
for imageName in "${images[@]}"; do
        sudo docker pull "$mirror/$imageName"
        sudo docker tag "$mirror/$imageName" "k8s.gcr.io/$imageName"
        sudo docker rmi "$mirror/$imageName"
done
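
Instead of hard-coding the image list, the mirror name can be derived from the k8s.gcr.io reference found in the manifests. The sketch below assumes every needed image exists under the same mirror namespace with an identical name and tag, which held for the three images above but is not guaranteed in general; MIRROR and to_mirror are hypothetical names.

```shell
#!/bin/bash
# Mirror namespace taken from the image names listed in this post.
MIRROR=registry.cn-hangzhou.aliyuncs.com/mirror_googlecontainers

# to_mirror REF: rewrite a k8s.gcr.io/NAME:TAG reference to MIRROR/NAME:TAG.
# Hypothetical helper; it only rewrites the string and pulls nothing.
to_mirror() {
    printf '%s/%s\n' "$MIRROR" "${1#k8s.gcr.io/}"
}
```

For example, sudo docker pull "$(to_mirror k8s.gcr.io/heapster-amd64:v1.5.4)" pulls the mirrored copy of the heapster image.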

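Install

Check the YAML files and confirm that the image versions they reference match the versions pulled above. Once they match, run the following command to deploy everything:

$ sudo kubectl create -f heapster/deploy/kube-config/influxdb/

After that, reopening the Kubernetes dashboard should show real-time monitoring data for the system and for each pod, rendered as charts.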
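
If the charts do not appear, a first check is whether the three Deployments have finished rolling out. A minimal sketch, assuming the Deployment names and the kube-system namespace used in the heapster manifests; wait_for_heapster and the KUBECTL override are hypothetical, and the 120-second timeout is an arbitrary choice.

```shell
#!/bin/bash
# wait_for_heapster: block until each Deployment reports a finished rollout,
# or return non-zero as soon as one of them fails or times out.
# KUBECTL is a hypothetical override; the default matches this post.
wait_for_heapster() {
    local kubectl_cmd="${KUBECTL:-sudo kubectl}"
    local name
    for name in heapster monitoring-grafana monitoring-influxdb; do
        # $kubectl_cmd is left unquoted on purpose so "sudo kubectl" splits.
        $kubectl_cmd rollout status "deployment/$name" -n kube-system --timeout=120s || return 1
    done
}
```

Running wait_for_heapster right after the create command gives a clear pass or fail before switching to the browser.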

Errors you may encounter

Reference: https://cloud.tencent.com/developer/article/1394657
Error 1:

error: error validating "test_pod_svc.yaml": error validating data: ValidationError(Deployment.spec): missing required field "selector" in io.k8s.api.apps.v1.DeploymentSpec; if you choose to ignore these errors, turn validation off with --validate=false

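Edit heapster/deploy/kube-config/influxdb/grafana.yaml: set apiVersion to apps/v1 and add a selector under spec that matches the labels in the pod template, so that the top of the Deployment reads: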

apiVersion: apps/v1
kind: Deployment
metadata:
  name: monitoring-grafana
  namespace: kube-system
spec:
  replicas: 1
  selector:
    matchLabels:
      k8s-app: grafana

Edit heapster/deploy/kube-config/influxdb/influxdb.yaml the same way (apiVersion apps/v1 plus a selector):

apiVersion: apps/v1
kind: Deployment
metadata:
  name: monitoring-influxdb
  namespace: kube-system
spec:
  replicas: 1
  selector:
    matchLabels:
      k8s-app: influxdb

Edit heapster/deploy/kube-config/influxdb/heapster.yaml the same way (apiVersion apps/v1 plus a selector):

apiVersion: apps/v1
kind: Deployment
metadata:
  name: heapster
  namespace: kube-system
spec:
  replicas: 1
  selector:
    matchLabels:
      k8s-app: heapster
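
To find which manifests still need this edit, one option is to search for the old API version. A minimal sketch; outdated_manifests is a hypothetical helper, and it only detects the apiVersion line, not a missing selector.

```shell
#!/bin/bash
# outdated_manifests DIR: print the path of every YAML file in DIR that
# still declares apiVersion extensions/v1beta1. Exits non-zero if none do.
# Hypothetical helper, for illustration only.
outdated_manifests() {
    grep -l '^apiVersion: extensions/v1beta1' "$1"/*.yaml
}
```

Running outdated_manifests heapster/deploy/kube-config/influxdb before the create command lists the files left to edit; an empty result means the apiVersion change is done everywhere.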

Error 2:

Error from server (AlreadyExists): error when creating "heapster/deploy/kube-config/influxdb/grafana.yaml": deployments.apps "monitoring-grafana" already exists
Error from server (AlreadyExists): error when creating "heapster/deploy/kube-config/influxdb/grafana.yaml": services "monitoring-grafana" already exists
Error from server (AlreadyExists): error when creating "heapster/deploy/kube-config/influxdb/heapster.yaml": serviceaccounts "heapster" already exists
Error from server (AlreadyExists): error when creating "heapster/deploy/kube-config/influxdb/heapster.yaml": services "heapster" already exists
Error from server (AlreadyExists): error when creating "heapster/deploy/kube-config/influxdb/influxdb.yaml": services "monitoring-influxdb" already exists
unable to recognize "heapster/deploy/kube-config/influxdb/heapster.yaml": no matches for kind "Deployment" in version "extensions/v1beta1"

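The AlreadyExists errors come from resources left behind by an earlier, partly failed create; the final line is the apiVersion problem from Error 1. Delete the existing resources, then run the create command again:

$ sudo kubectl delete -f heapster/deploy/kube-config/influxdb/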
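
The delete and the retry can be wrapped together. A minimal sketch, assuming it is run from the directory that contains the heapster checkout; recreate_heapster and the KUBECTL override are hypothetical names.

```shell
#!/bin/bash
# recreate_heapster: remove whatever a previous attempt created, then
# create everything again from the same manifests.
# KUBECTL is a hypothetical override; the default matches this post.
recreate_heapster() {
    local kubectl_cmd="${KUBECTL:-sudo kubectl}"
    local dir=heapster/deploy/kube-config/influxdb/
    # --ignore-not-found keeps delete from failing on resources that were
    # never created in the first place.
    $kubectl_cmd delete -f "$dir" --ignore-not-found || return 1
    $kubectl_cmd create -f "$dir"
}
```

Stopping when the delete fails avoids a second round of AlreadyExists errors from the create step.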

Resolving dashboard permission errors

Reference: https://my.oschina.net/u/4335103/blog/4280880/print
Check the dashboard status

$ sudo kubectl get pods --all-namespaces | grep dashboard
kubernetes-dashboard   dashboard-metrics-scraper-6b4884c9d5-mvnb9   1/1     Running   0          40m
kubernetes-dashboard   kubernetes-dashboard-d7f7f565d-zhpsc         1/1     Running   0          40m

View the dashboard logs

$ sudo kubectl logs -f -n kubernetes-dashboard kubernetes-dashboard-d7f7f565d-zhpsc
2020/04/08 01:54:31 Non-critical error occurred during resource retrieval: events is forbidden: User "system:serviceaccount:kubernetes-dashboard:kubernetes-dashboard" cannot list resource "events" in API group "" in the namespace "default"
2020/04/08 01:54:31 [2020-04-08T01:54:31Z] Outcoming response to 192.168.122.21:7788 with 200 status code
2020/04/08 01:54:31 Getting list of all replication controllers in the cluster
2020/04/08 01:54:31 Non-critical error occurred during resource retrieval: replicationcontrollers is forbidden: User "system:serviceaccount:kubernetes-dashboard:kubernetes-dashboard" cannot list resource "replicationcontrollers" in API group "" in the namespace "default"
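
A long log can be condensed into the list of operations that were refused. A minimal sketch, assuming the "cannot VERB resource "NAME"" wording shown above; forbidden_resources is a hypothetical helper.

```shell
#!/bin/bash
# forbidden_resources: read dashboard log lines on stdin and print one
# "verb resource" pair for every distinct operation that was forbidden.
# Hypothetical helper; it relies on the message wording shown in this post.
forbidden_resources() {
    grep -o 'cannot [a-z]* resource "[^"]*"' \
        | sed 's/^cannot \([a-z]*\) resource "\(.*\)"$/\1 \2/' \
        | sort -u
}
```

Piping the kubectl logs output (without -f) into forbidden_resources shows which resources the ClusterRole has to cover.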

Solution
Write the following to heapster-rbac.yaml (for example with vim heapster-rbac.yaml):

kind: ClusterRole
apiVersion: rbac.authorization.k8s.io/v1
metadata:
  labels:
    k8s-app: kubernetes-dashboard
  name: kubernetes-dashboard
rules:
  # Grant read-only access (get, list, watch) to every resource in the listed API groups, including metrics.k8s.io
  - apiGroups: ["","apps","batch","extensions", "metrics.k8s.io"]
    resources: ["*"]
    verbs: ["get", "list", "watch"]

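Create the ClusterRole (if one with this name already exists, kubectl create reports AlreadyExists, and kubectl apply -f heapster-rbac.yaml can be used instead):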

$ sudo kubectl create -f heapster-rbac.yaml
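
Whether the new permissions took effect can be checked with kubectl auth can-i, impersonating the dashboard service account named in the log. A minimal sketch: it assumes a ClusterRoleBinding already ties the kubernetes-dashboard ClusterRole to that service account (a ClusterRole grants nothing by itself), and dashboard_can_list plus the KUBECTL override are hypothetical names.

```shell
#!/bin/bash
# dashboard_can_list RESOURCE: ask the API server whether the dashboard
# service account may list RESOURCE in the default namespace.
# KUBECTL is a hypothetical override; the default matches this post.
dashboard_can_list() {
    local kubectl_cmd="${KUBECTL:-sudo kubectl}"
    $kubectl_cmd auth can-i list "$1" --namespace=default \
        --as=system:serviceaccount:kubernetes-dashboard:kubernetes-dashboard
}
```

dashboard_can_list events and dashboard_can_list replicationcontrollers cover the two resources that appeared in the forbidden messages.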

-- End --