sidebar_label: Kubernetes
title: Deploy a TDengine Cluster on Kubernetes

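Configure the ConfigMap

Create taoscfg.yaml for TDengine. The settings in this file are passed to the TDengine image as environment variables; updating this configuration restarts all TDengine pods.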

---
apiVersion: v1
kind: ConfigMap
metadata:
  name: taoscfg
  labels:
    app: tdengine
data:
  CLUSTER: "1"
  TAOS_KEEP: "3650"
  TAOS_DEBUG_FLAG: "135"

Configure the Service

Create a Service configuration file, taosd-service.yaml. The service name, metadata.name ("taosd" here), is used in the next step. Add all the ports that TDengine uses:

---
apiVersion: v1
kind: Service
metadata:
  name: "taosd"
  labels:
    app: "tdengine"
spec:
  ports:
    - name: tcp6030
      protocol: "TCP"
      port: 6030
    - name: tcp6035
      protocol: "TCP"
      port: 6035
    - name: tcp6041
      protocol: "TCP"
      port: 6041
    - name: udp6030
      protocol: "UDP"
      port: 6030
    - name: udp6031
      protocol: "UDP"
      port: 6031
    - name: udp6032
      protocol: "UDP"
      port: 6032
    - name: udp6033
      protocol: "UDP"
      port: 6033
    - name: udp6034
      protocol: "UDP"
      port: 6034
    - name: udp6035
      protocol: "UDP"
      port: 6035
    - name: udp6036
      protocol: "UDP"
      port: 6036
    - name: udp6037
      protocol: "UDP"
      port: 6037
    - name: udp6038
      protocol: "UDP"
      port: 6038
    - name: udp6039
      protocol: "UDP"
      port: 6039
    - name: udp6040
      protocol: "UDP"
      port: 6040
  selector:
    app: "tdengine"

Stateful Service: StatefulSet

Following the Kubernetes guidance on workload types, we use a StatefulSet as the workload type for TDengine. Create the file tdengine.yaml:

---
apiVersion: apps/v1
kind: StatefulSet
metadata:
  name: "tdengine"
  labels:
    app: "tdengine"
spec:
  serviceName: "taosd"
  replicas: 2
  updateStrategy:
    type: RollingUpdate
  selector:
    matchLabels:
      app: "tdengine"
  template:
    metadata:
      name: "tdengine"
      labels:
        app: "tdengine"
    spec:
      containers:
        - name: "tdengine"
          image: "zitsen/taosd:develop"
          imagePullPolicy: "Always"
          envFrom:
            - configMapRef:
                name: taoscfg
          ports:
            - name: tcp6030
              protocol: "TCP"
              containerPort: 6030
            - name: tcp6035
              protocol: "TCP"
              containerPort: 6035
            - name: tcp6041
              protocol: "TCP"
              containerPort: 6041
            - name: udp6030
              protocol: "UDP"
              containerPort: 6030
            - name: udp6031
              protocol: "UDP"
              containerPort: 6031
            - name: udp6032
              protocol: "UDP"
              containerPort: 6032
            - name: udp6033
              protocol: "UDP"
              containerPort: 6033
            - name: udp6034
              protocol: "UDP"
              containerPort: 6034
            - name: udp6035
              protocol: "UDP"
              containerPort: 6035
            - name: udp6036
              protocol: "UDP"
              containerPort: 6036
            - name: udp6037
              protocol: "UDP"
              containerPort: 6037
            - name: udp6038
              protocol: "UDP"
              containerPort: 6038
            - name: udp6039
              protocol: "UDP"
              containerPort: 6039
            - name: udp6040
              protocol: "UDP"
              containerPort: 6040
          env:
            # POD_NAME for FQDN config
            - name: POD_NAME
              valueFrom:
                fieldRef:
                  fieldPath: metadata.name
            # SERVICE_NAME and STS_NAMESPACE are used to resolve the FQDN
            - name: SERVICE_NAME
              value: "taosd"
            - name: STS_NAME
              value: "tdengine"
            - name: STS_NAMESPACE
              valueFrom:
                fieldRef:
                  fieldPath: metadata.namespace
            # TZ sets the timezone; we recommend always setting it.
            - name: TZ
              value: "Asia/Shanghai"
            # Variables with the TAOS_ prefix are written to taos.cfg: the prefix is stripped and the name is converted to camelCase.
            - name: TAOS_SERVER_PORT
              value: "6030"
            # Must set if you want a cluster.
            - name: TAOS_FIRST_EP
              value: "$(STS_NAME)-0.$(SERVICE_NAME).$(STS_NAMESPACE).svc.cluster.local:$(TAOS_SERVER_PORT)"
            # TAOS_FQDN should always be set in a k8s environment.
            - name: TAOS_FQDN
              value: "$(POD_NAME).$(SERVICE_NAME).$(STS_NAMESPACE).svc.cluster.local"
          volumeMounts:
            - name: taosdata
              mountPath: /var/lib/taos
          readinessProbe:
            exec:
              command:
                - taos
                - -s
                - "show mnodes"
            initialDelaySeconds: 5
            timeoutSeconds: 5000
          livenessProbe:
            tcpSocket:
              port: 6030
            initialDelaySeconds: 15
            periodSeconds: 20
  volumeClaimTemplates:
    - metadata:
        name: taosdata
      spec:
        accessModes:
          - "ReadWriteOnce"
        storageClassName: "csi-rbd-sc"
        resources:
          requests:
            storage: "10Gi"
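
The comment above TAOS_SERVER_PORT in tdengine.yaml says that TAOS_-prefixed variables end up in taos.cfg with the prefix stripped and the name camelCased. The sketch below illustrates that naming rule only. The image's actual entrypoint script is not shown in this document, and the function name taos_env_to_cfg_key is made up for this example.

```shell
#!/bin/sh
# Hypothetical helper: shows the naming rule, not the image's real entrypoint.
taos_env_to_cfg_key() {
  # Strip the TAOS_ prefix, lower-case the rest, then upper-case the first
  # letter of every underscore-separated word after the first one.
  printf '%s\n' "${1#TAOS_}" | tr '[:upper:]' '[:lower:]' | awk -F_ '{
    key = $1
    for (i = 2; i <= NF; i++) key = key toupper(substr($i, 1, 1)) substr($i, 2)
    print key
  }'
}

taos_env_to_cfg_key TAOS_SERVER_PORT   # prints serverPort
taos_env_to_cfg_key TAOS_FIRST_EP      # prints firstEp
```

Under this rule, TAOS_KEEP and TAOS_DEBUG_FLAG from taoscfg.yaml would map to keep and debugFlag.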

Start the Cluster

Add the three files above to the Kubernetes cluster:

kubectl apply -f taoscfg.yaml
kubectl apply -f taosd-service.yaml
kubectl apply -f tdengine.yaml

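The configuration above creates a two-node TDengine cluster. The dnodes are configured automatically; use the show dnodes command to list the current nodes in the cluster: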

kubectl exec -i -t tdengine-0 -- taos -s "show dnodes"
kubectl exec -i -t tdengine-1 -- taos -s "show dnodes"

The output is as follows:

Welcome to the TDengine shell from Linux, Client Version:2.1.1.0
Copyright (c) 2020 by TAOS Data, Inc. All rights reserved.

taos> show dnodes
   id   |           end_point            | vnodes | cores  |   status   | role  |       create_time       |      offline reason      |
======================================================================================================================================
      1 | tdengine-0.taosd.default.sv... |      1 |     40 | ready      | any   | 2021-06-01 17:13:24.181 |                          |
      2 | tdengine-1.taosd.default.sv... |      0 |     40 | ready      | any   | 2021-06-01 17:14:09.257 |                          |
Query OK, 2 row(s) in set (0.000997s)

Scale Out the Cluster

A TDengine cluster supports automatic scale-out:

kubectl scale statefulsets tdengine --replicas=4

The --replicas=4 parameter in the command above scales the TDengine cluster out to 4 nodes. After running it, first check the pod status:

kubectl get pods -l app=tdengine

The output is as follows:

NAME         READY   STATUS    RESTARTS   AGE
tdengine-0   1/1     Running   0          161m
tdengine-1   1/1     Running   0          161m
tdengine-2   1/1     Running   0          32m
tdengine-3   1/1     Running   0          32m

At this point the pod status is still Running. The dnode status in the TDengine cluster becomes visible only after the pods are ready:

kubectl exec -i -t tdengine-0 -- taos -s "show dnodes"

The dnode list of the four-node TDengine cluster after scale-out:

Welcome to the TDengine shell from Linux, Client Version:2.1.1.0
Copyright (c) 2020 by TAOS Data, Inc. All rights reserved.

taos> show dnodes
   id   |           end_point            | vnodes | cores  |   status   | role  |       create_time       |      offline reason      |
======================================================================================================================================
      1 | tdengine-0.taosd.default.sv... |      0 |     40 | ready      | any   | 2021-06-01 11:58:12.915 |                          |
      2 | tdengine-1.taosd.default.sv... |      0 |     40 | ready      | any   | 2021-06-01 11:58:33.127 |                          |
      3 | tdengine-2.taosd.default.sv... |      0 |     40 | ready      | any   | 2021-06-01 14:07:27.078 |                          |
      4 | tdengine-3.taosd.default.sv... |      1 |     40 | ready      | any   | 2021-06-01 14:07:48.362 |                          |
Query OK, 4 row(s) in set (0.001293s)
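
Instead of polling kubectl get pods by hand, the wait for readiness described above could be scripted. This is a minimal sketch, assuming the StatefulSet is named tdengine and uses the RollingUpdate strategy as in tdengine.yaml. The function name and the KUBECTL override are illustrative and not part of TDengine or Kubernetes.

```shell
#!/bin/sh
# KUBECTL can be overridden (for example with "echo") for a dry run.
KUBECTL="${KUBECTL:-kubectl}"

# Hypothetical helper: block until the StatefulSet reports all replicas ready,
# or fail once the timeout (default 300s) expires.
wait_for_tdengine_rollout() {
  "$KUBECTL" rollout status statefulset/tdengine --timeout="${1:-300s}"
}
```

Calling wait_for_tdengine_rollout 600s after kubectl scale and before show dnodes gives the new pods time to become ready.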

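Scale In the Cluster

TDengine scale-in is not automated. Here we scale a three-node cluster in to two nodes.

First, confirm that the three-node TDengine cluster is working properly by checking the dnode status in the TDengine CLI: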

taos> show dnodes
   id   |           end_point            | vnodes | cores  |   status   | role  |       create_time       |      offline reason      |
======================================================================================================================================
      1 | tdengine-0.taosd.default.sv... |      1 |     40 | ready      | any   | 2021-06-01 16:27:24.852 |                          |
      2 | tdengine-1.taosd.default.sv... |      0 |     40 | ready      | any   | 2021-06-01 16:27:53.339 |                          |
      3 | tdengine-2.taosd.default.sv... |      0 |     40 | ready      | any   | 2021-06-01 16:28:49.787 |                          |
Query OK, 3 row(s) in set (0.001101s)

To scale in safely, first remove the node from the dnode list, that is, from the cluster:

kubectl exec -i -t tdengine-0 -- taos -s "drop dnode 'tdengine-2.taosd.default.svc.cluster.local:6030'"
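
The endpoint passed to drop dnode follows the same pattern as TAOS_FQDN and TAOS_SERVER_PORT in tdengine.yaml. As a small sketch, the string can be assembled from the pod ordinal; the function name dnode_endpoint is hypothetical, and the defaults simply mirror the values used in this document.

```shell
#!/bin/sh
# Hypothetical helper: build "<sts>-<ordinal>.<service>.<namespace>.svc.cluster.local:<port>".
dnode_endpoint() {
  ordinal="$1"
  sts="${2:-tdengine}"        # StatefulSet name (STS_NAME)
  service="${3:-taosd}"       # Service name (SERVICE_NAME)
  namespace="${4:-default}"   # pod namespace (STS_NAMESPACE)
  port="${5:-6030}"           # TAOS_SERVER_PORT
  printf '%s-%s.%s.%s.svc.cluster.local:%s\n' \
    "$sts" "$ordinal" "$service" "$namespace" "$port"
}

dnode_endpoint 2   # prints tdengine-2.taosd.default.svc.cluster.local:6030
```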

After confirming with the show dnodes command that the removal succeeded, remove the corresponding pod:

kubectl scale statefulsets tdengine --replicas=2

The last pod is deleted. Use kubectl get pods -l app=tdengine to check the cluster state:

NAME         READY   STATUS    RESTARTS   AGE
tdengine-0   1/1     Running   0          3h40m
tdengine-1   1/1     Running   0          3h40m

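After the pod is deleted, its PVC must be deleted manually; otherwise the next scale-out reuses the old data and the node cannot join the cluster properly.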

kubectl delete pvc taosdata-tdengine-2

The cluster is now in a safe state and can be scaled out again when needed:

kubectl scale statefulsets tdengine --replicas=3


The show dnodes output is as follows:

taos> show dnodes
   id   |           end_point            | vnodes | cores  |   status   | role  |       create_time       |      offline reason      |
======================================================================================================================================
      1 | tdengine-0.taosd.default.sv... |      1 |     40 | ready      | any   | 2021-06-01 16:27:24.852 |                          |
      2 | tdengine-1.taosd.default.sv... |      0 |     40 | ready      | any   | 2021-06-01 16:27:53.339 |                          |
      4 | tdengine-2.taosd.default.sv... |      0 |     40 | ready      | any   | 2021-06-01 16:40:49.177 |                          |
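
The manual scale-in steps above follow a fixed order: drop the dnodes, verify, scale the StatefulSet, then delete the leftover PVCs. The sketch below only prints the commands for a given node count so they can be reviewed and run one at a time. The function name print_scale_in_plan is made up, and the names tdengine, taosd, default, and taosdata are taken from the files in this document.

```shell
#!/bin/sh
# Hypothetical helper: print (not run) the commands for scaling in from
# $1 nodes to $2 nodes, removing the highest ordinals first.
print_scale_in_plan() {
  from="$1"; to="$2"
  i=$((from - 1))
  while [ "$i" -ge "$to" ]; do
    echo "kubectl exec -i -t tdengine-0 -- taos -s \"drop dnode 'tdengine-$i.taosd.default.svc.cluster.local:6030'\""
    i=$((i - 1))
  done
  # Verification stays a manual step: check the output before continuing.
  echo "kubectl exec -i -t tdengine-0 -- taos -s \"show dnodes\""
  echo "kubectl scale statefulsets tdengine --replicas=$to"
  # PVCs are named <volumeClaimTemplate>-<statefulset>-<ordinal>.
  i=$((from - 1))
  while [ "$i" -ge "$to" ]; do
    echo "kubectl delete pvc taosdata-tdengine-$i"
    i=$((i - 1))
  done
}

print_scale_in_plan 3 2
```

For the three-to-two example this prints the same drop dnode, scale, and delete pvc commands used above, with the show dnodes check in between.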


Delete the Cluster

To remove the TDengine cluster completely, clean up the statefulset, svc, configmap, and pvc separately.

kubectl delete statefulset -l app=tdengine
kubectl delete svc -l app=tdengine
kubectl delete pvc -l app=tdengine
kubectl delete configmap taoscfg

Common Errors

Error 1

After scaling out to four nodes and then scaling in to two nodes, the deleted pods enter the offline state:

Welcome to the TDengine shell from Linux, Client Version:2.1.1.0
Copyright (c) 2020 by TAOS Data, Inc. All rights reserved.

taos> show dnodes
   id   |           end_point            | vnodes | cores  |   status   | role  |       create_time       |      offline reason      |
======================================================================================================================================
      1 | tdengine-0.taosd.default.sv... |      0 |     40 | ready      | any   | 2021-06-01 11:58:12.915 |                          |
      2 | tdengine-1.taosd.default.sv... |      0 |     40 | ready      | any   | 2021-06-01 11:58:33.127 |                          |
      3 | tdengine-2.taosd.default.sv... |      0 |     40 | offline    | any   | 2021-06-01 14:07:27.078 | status msg timeout       |
      4 | tdengine-3.taosd.default.sv... |      1 |     40 | offline    | any   | 2021-06-01 14:07:48.362 | status msg timeout       |
Query OK, 4 row(s) in set (0.001236s)


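In this state drop dnode does not behave as expected, and after the next cluster restart none of the dnodes can start: they cannot leave the dropping state.

Error 2

A TDengine cluster keeps the replica parameter. If the number of nodes after scale-in is smaller than this value, the cluster becomes unusable:

Create a database with the replica parameter set to 2 and insert some data: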

kubectl exec -i -t tdengine-0 -- \
  taos -s \
  "create database if not exists test replica 2;
   use test;
   create table if not exists t1(ts timestamp, n int);
   insert into t1 values(now, 1)(now+1s, 2);"


Scale in to a single node:

kubectl scale statefulsets tdengine --replicas=1

All database operations in the taos shell then fail.

taos> show dnodes;
   id   |           end_point            | vnodes | cores  |   status   | role  |       create_time       |      offline reason      |
======================================================================================================================================
      1 | tdengine-0.taosd.default.sv... |      2 |     40 | ready      | any   | 2021-06-01 15:55:52.562 |                          |
      2 | tdengine-1.taosd.default.sv... |      1 |     40 | offline    | any   | 2021-06-01 15:56:07.212 | status msg timeout       |
Query OK, 2 row(s) in set (0.000845s)

taos> show dnodes;
   id   |           end_point            | vnodes | cores  |   status   | role  |       create_time       |      offline reason      |
======================================================================================================================================
      1 | tdengine-0.taosd.default.sv... |      2 |     40 | ready      | any   | 2021-06-01 15:55:52.562 |                          |
      2 | tdengine-1.taosd.default.sv... |      1 |     40 | offline    | any   | 2021-06-01 15:56:07.212 | status msg timeout       |
Query OK, 2 row(s) in set (0.000837s)

taos> use test;
Database changed.

taos> insert into t1 values(now, 3);

DB error: Unable to resolve FQDN (0.013874s)
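
The failure above could be avoided with a guard that runs before kubectl scale. This is a minimal sketch: the function name can_scale_in_to is hypothetical, and the caller has to supply the largest replica value in use (2 in the example above), since how to read it depends on the TDengine version.

```shell
#!/bin/sh
# Hypothetical guard: succeed only if the target node count is at least the
# largest replica value used by any database (supplied by the caller).
can_scale_in_to() {
  target="$1"; max_replica="$2"
  if [ "$target" -lt "$max_replica" ]; then
    echo "refusing: $target node(s) < replica $max_replica" >&2
    return 1
  fi
}

# With replica 2, scaling in to one node is rejected.
can_scale_in_to 1 2 || echo "keep at least 2 nodes"
```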