14 KiB
sidebar_label | title |
---|---|
Kubernetes | 在 Kubernetes 上部署 TDengine 集群 |
配置 ConfigMap
为 TDengine 创建 taoscfg.yaml
,此文件中的配置将作为环境变量传入 TDengine 镜像,更新此配置将导致所有 TDengine POD 重启。
---
apiVersion: v1
kind: ConfigMap
metadata:
name: taoscfg
labels:
app: tdengine
data:
CLUSTER: "1"
TAOS_KEEP: "3650"
TAOS_DEBUG_FLAG: "135"
配置服务
创建一个 service 配置文件:taosd-service.yaml
,服务名称 metadata.name
(此处为 "taosd") 将在下一步中使用到。添加 TDengine 所用到的所有端口:
---
apiVersion: v1
kind: Service
metadata:
name: "taosd"
labels:
app: "tdengine"
spec:
ports:
- name: tcp6030
protocol: "TCP"
port: 6030
- name: tcp6035
protocol: "TCP"
port: 6035
- name: tcp6041
protocol: "TCP"
port: 6041
- name: udp6030
protocol: "UDP"
port: 6030
- name: udp6031
protocol: "UDP"
port: 6031
- name: udp6032
protocol: "UDP"
port: 6032
- name: udp6033
protocol: "UDP"
port: 6033
- name: udp6034
protocol: "UDP"
port: 6034
- name: udp6035
protocol: "UDP"
port: 6035
- name: udp6036
protocol: "UDP"
port: 6036
- name: udp6037
protocol: "UDP"
port: 6037
- name: udp6038
protocol: "UDP"
port: 6038
- name: udp6039
protocol: "UDP"
port: 6039
- name: udp6040
protocol: "UDP"
port: 6040
selector:
app: "tdengine"
有状态服务 StatefulSet
根据 Kubernetes 对各类部署的说明,我们将使用 StatefulSet 作为 TDengine 的服务类型,创建文件 tdengine.yaml
:
---
apiVersion: apps/v1
kind: StatefulSet
metadata:
name: "tdengine"
labels:
app: "tdengine"
spec:
serviceName: "taosd"
replicas: 2
updateStrategy:
type: RollingUpdate
selector:
matchLabels:
app: "tdengine"
template:
metadata:
name: "tdengine"
labels:
app: "tdengine"
spec:
containers:
- name: "tdengine"
image: "zitsen/taosd:develop"
imagePullPolicy: "Always"
envFrom:
- configMapRef:
name: taoscfg
ports:
- name: tcp6030
protocol: "TCP"
containerPort: 6030
- name: tcp6035
protocol: "TCP"
containerPort: 6035
- name: tcp6041
protocol: "TCP"
containerPort: 6041
- name: udp6030
protocol: "UDP"
containerPort: 6030
- name: udp6031
protocol: "UDP"
containerPort: 6031
- name: udp6032
protocol: "UDP"
containerPort: 6032
- name: udp6033
protocol: "UDP"
containerPort: 6033
- name: udp6034
protocol: "UDP"
containerPort: 6034
- name: udp6035
protocol: "UDP"
containerPort: 6035
- name: udp6036
protocol: "UDP"
containerPort: 6036
- name: udp6037
protocol: "UDP"
containerPort: 6037
- name: udp6038
protocol: "UDP"
containerPort: 6038
- name: udp6039
protocol: "UDP"
containerPort: 6039
- name: udp6040
protocol: "UDP"
containerPort: 6040
env:
# POD_NAME for FQDN config
- name: POD_NAME
valueFrom:
fieldRef:
fieldPath: metadata.name
# SERVICE_NAME and NAMESPACE for fqdn resolve
- name: SERVICE_NAME
value: "taosd"
- name: STS_NAME
value: "tdengine"
- name: STS_NAMESPACE
valueFrom:
fieldRef:
fieldPath: metadata.namespace
# TZ for timezone settings, we recommend to always set it.
- name: TZ
value: "Asia/Shanghai"
# TAOS_ prefix will configured in taos.cfg, strip prefix and camelCase.
- name: TAOS_SERVER_PORT
value: "6030"
# Must set if you want a cluster.
- name: TAOS_FIRST_EP
value: "$(STS_NAME)-0.$(SERVICE_NAME).$(STS_NAMESPACE).svc.cluster.local:$(TAOS_SERVER_PORT)"
# TAOS_FQND should always be setted in k8s env.
- name: TAOS_FQDN
value: "$(POD_NAME).$(SERVICE_NAME).$(STS_NAMESPACE).svc.cluster.local"
volumeMounts:
- name: taosdata
mountPath: /var/lib/taos
readinessProbe:
exec:
command:
- taos
- -s
- "show mnodes"
initialDelaySeconds: 5
timeoutSeconds: 5000
livenessProbe:
tcpSocket:
port: 6030
initialDelaySeconds: 15
periodSeconds: 20
volumeClaimTemplates:
- metadata:
name: taosdata
spec:
accessModes:
- "ReadWriteOnce"
storageClassName: "csi-rbd-sc"
resources:
requests:
storage: "10Gi"
启动集群
将前述三个文件添加到 Kubernetes 集群中:
kubectl apply -f taoscfg.yaml
kubectl apply -f taosd-service.yaml
kubectl apply -f tdengine.yaml
上面的配置将生成一个两节点的 TDengine 集群,dnode 是自动配置的,可以使用 show dnodes
命令查看当前集群的节点:
kubectl exec -i -t tdengine-0 -- taos -s "show dnodes"
kubectl exec -i -t tdengine-1 -- taos -s "show dnodes"
输出如下:
Welcome to the TDengine shell from Linux, Client Version:2.1.1.0
Copyright (c) 2020 by TAOS Data, Inc. All rights reserved.
taos> show dnodes
id | end_point | vnodes | cores | status | role | create_time | offline reason |
======================================================================================================================================
1 | tdengine-0.taosd.default.sv... | 1 | 40 | ready | any | 2021-06-01 17:13:24.181 | |
2 | tdengine-1.taosd.default.sv... | 0 | 40 | ready | any | 2021-06-01 17:14:09.257 | |
Query OK, 2 row(s) in set (0.000997s)
集群扩容
TDengine 集群支持自动扩容:
kubectl scale statefulsets tdengine --replicas=4
上面命令行中参数 --replica=4
表示要将 TDengine 集群扩容到 4 个节点,执行后首先检查 POD 的状态:
kubectl get pods -l app=tdengine
输出如下:
NAME READY STATUS RESTARTS AGE
tdengine-0 1/1 Running 0 161m
tdengine-1 1/1 Running 0 161m
tdengine-2 1/1 Running 0 32m
tdengine-3 1/1 Running 0 32m
此时 POD 的状态仍然是 Running,TDengine 集群中的 dnode 状态要等 POD 状态为 ready
之后才能看到:
kubectl exec -i -t tdengine-0 -- taos -s "show dnodes"
扩容后的四节点 TDengine 集群的 dnode 列表:
Welcome to the TDengine shell from Linux, Client Version:2.1.1.0
Copyright (c) 2020 by TAOS Data, Inc. All rights reserved.
taos> show dnodes
id | end_point | vnodes | cores | status | role | create_time | offline reason |
======================================================================================================================================
1 | tdengine-0.taosd.default.sv... | 0 | 40 | ready | any | 2021-06-01 11:58:12.915 | |
2 | tdengine-1.taosd.default.sv... | 0 | 40 | ready | any | 2021-06-01 11:58:33.127 | |
3 | tdengine-2.taosd.default.sv... | 0 | 40 | ready | any | 2021-06-01 14:07:27.078 | |
4 | tdengine-3.taosd.default.sv... | 1 | 40 | ready | any | 2021-06-01 14:07:48.362 | |
Query OK, 4 row(s) in set (0.001293s)
集群缩容
TDengine 的缩容并没有自动化,我们尝试将一个三节点集群缩容到两节点。
首先,确认一个三节点 TDengine 集群正常工作,在 TDengine CLI 中查看 dnode 的状态:
taos> show dnodes
id | end_point | vnodes | cores | status | role | create_time | offline reason |
======================================================================================================================================
1 | tdengine-0.taosd.default.sv... | 1 | 40 | ready | any | 2021-06-01 16:27:24.852 | |
2 | tdengine-1.taosd.default.sv... | 0 | 40 | ready | any | 2021-06-01 16:27:53.339 | |
3 | tdengine-2.taosd.default.sv... | 0 | 40 | ready | any | 2021-06-01 16:28:49.787 | |
Query OK, 3 row(s) in set (0.001101s)
想要安全的缩容,首先需要将节点从 dnode 列表中移除,也即从集群中移除:
kubectl exec -i -t tdengine-0 -- taos -s "drop dnode 'tdengine-2.taosd.default.svc.cluster.local:6030'"
通过 show dondes
命令确认移除成功后,移除相应的 POD:
kubectl scale statefulsets tdengine --replicas=2
最后一个 POD 会被删除,使用 kubectl get pods -l app=tdengine
查看集群状态:
NAME READY STATUS RESTARTS AGE
tdengine-0 1/1 Running 0 3h40m
tdengine-1 1/1 Running 0 3h40m
POD 删除后,需要手动删除 PVC,否则下次扩容时会继续使用以前的数据导致无法正常加入集群。
kubectl delete pvc taosdata-tdengine-2
此时的集群状态是安全的,需要时还可以再次进行扩容:
kubectl scale statefulsets tdengine --replicas=3
show dnodes
输出如下:
taos> show dnodes
id | end_point | vnodes | cores | status | role | create_time | offline reason |
======================================================================================================================================
1 | tdengine-0.taosd.default.sv... | 1 | 40 | ready | any | 2021-06-01 16:27:24.852 | |
2 | tdengine-1.taosd.default.sv... | 0 | 40 | ready | any | 2021-06-01 16:27:53.339 | |
4 | tdengine-2.taosd.default.sv... | 0 | 40 | ready | any | 2021-06-01 16:40:49.177 | |
删除集群
完整移除 TDengine 集群,需要分别清理 statefulset、svc、configmap、pvc。
kubectl delete statefulset -l app=tdengine
kubectl delete svc -l app=tdengine
kubectl delete pvc -l app=tdengine
kubectl delete configmap taoscfg
常见错误
错误一
扩容到四节点之后缩容到两节点,删除的 POD 会进入 offline 状态:
Welcome to the TDengine shell from Linux, Client Version:2.1.1.0
Copyright (c) 2020 by TAOS Data, Inc. All rights reserved.
taos> show dnodes
id | end_point | vnodes | cores | status | role | create_time | offline reason |
======================================================================================================================================
1 | tdengine-0.taosd.default.sv... | 0 | 40 | ready | any | 2021-06-01 11:58:12.915 | |
2 | tdengine-1.taosd.default.sv... | 0 | 40 | ready | any | 2021-06-01 11:58:33.127 | |
3 | tdengine-2.taosd.default.sv... | 0 | 40 | offline | any | 2021-06-01 14:07:27.078 | status msg timeout |
4 | tdengine-3.taosd.default.sv... | 1 | 40 | offline | any | 2021-06-01 14:07:48.362 | status msg timeout |
Query OK, 4 row(s) in set (0.001236s)
但 drop dnode
的行为按不会按照预期进行,且下次集群重启后,所有的 dnode 节点将无法启动 dropping 状态无法退出。
错误二
TDengine 集群会持有 replica 参数,如果缩容后的节点数小于这个值,集群将无法使用:
创建一个库使用 replica 参数为 2,插入部分数据:
kubectl exec -i -t tdengine-0 -- \
taos -s \
"create database if not exists test replica 2;
use test;
create table if not exists t1(ts timestamp, n int);
insert into t1 values(now, 1)(now+1s, 2);"
缩容到单节点:
kubectl scale statefulsets tdengine --replicas=1
在 taos shell 中的所有数据库操作将无法成功。
taos> show dnodes;
id | end_point | vnodes | cores | status | role | create_time | offline reason |
======================================================================================================================================
1 | tdengine-0.taosd.default.sv... | 2 | 40 | ready | any | 2021-06-01 15:55:52.562 | |
2 | tdengine-1.taosd.default.sv... | 1 | 40 | offline | any | 2021-06-01 15:56:07.212 | status msg timeout |
Query OK, 2 row(s) in set (0.000845s)
taos> show dnodes;
id | end_point | vnodes | cores | status | role | create_time | offline reason |
======================================================================================================================================
1 | tdengine-0.taosd.default.sv... | 2 | 40 | ready | any | 2021-06-01 15:55:52.562 | |
2 | tdengine-1.taosd.default.sv... | 1 | 40 | offline | any | 2021-06-01 15:56:07.212 | status msg timeout |
Query OK, 2 row(s) in set (0.000837s)
taos> use test;
Database changed.
taos> insert into t1 values(now, 3);
DB error: Unable to resolve FQDN (0.013874s)