453 lines
14 KiB
Markdown
453 lines
14 KiB
Markdown
---
|
||
sidebar_label: Kubernetes
|
||
title: 在 Kubernetes 上部署 TDengine 集群
|
||
---
|
||
|
||
## 配置 ConfigMap
|
||
|
||
为 TDengine 创建 `taoscfg.yaml`,此文件中的配置将作为环境变量传入 TDengine 镜像,更新此配置将导致所有 TDengine POD 重启。
|
||
|
||
```yaml
|
||
---
|
||
apiVersion: v1
|
||
kind: ConfigMap
|
||
metadata:
|
||
name: taoscfg
|
||
labels:
|
||
app: tdengine
|
||
data:
|
||
CLUSTER: "1"
|
||
TAOS_KEEP: "3650"
|
||
TAOS_DEBUG_FLAG: "135"
|
||
```
|
||
|
||
## 配置服务
|
||
|
||
创建一个 service 配置文件:`taosd-service.yaml`,服务名称 `metadata.name` (此处为 "taosd") 将在下一步中使用到。添加 TDengine 所用到的所有端口:
|
||
|
||
```yaml
|
||
---
|
||
apiVersion: v1
|
||
kind: Service
|
||
metadata:
|
||
name: "taosd"
|
||
labels:
|
||
app: "tdengine"
|
||
spec:
|
||
ports:
|
||
- name: tcp6030
|
||
protocol: "TCP"
|
||
port: 6030
|
||
- name: tcp6035
|
||
protocol: "TCP"
|
||
port: 6035
|
||
- name: tcp6041
|
||
protocol: "TCP"
|
||
port: 6041
|
||
- name: udp6030
|
||
protocol: "UDP"
|
||
port: 6030
|
||
- name: udp6031
|
||
protocol: "UDP"
|
||
port: 6031
|
||
- name: udp6032
|
||
protocol: "UDP"
|
||
port: 6032
|
||
- name: udp6033
|
||
protocol: "UDP"
|
||
port: 6033
|
||
- name: udp6034
|
||
protocol: "UDP"
|
||
port: 6034
|
||
- name: udp6035
|
||
protocol: "UDP"
|
||
port: 6035
|
||
- name: udp6036
|
||
protocol: "UDP"
|
||
port: 6036
|
||
- name: udp6037
|
||
protocol: "UDP"
|
||
port: 6037
|
||
- name: udp6038
|
||
protocol: "UDP"
|
||
port: 6038
|
||
- name: udp6039
|
||
protocol: "UDP"
|
||
port: 6039
|
||
- name: udp6040
|
||
protocol: "UDP"
|
||
port: 6040
|
||
selector:
|
||
app: "tdengine"
|
||
```
|
||
|
||
## 有状态服务 StatefulSet
|
||
|
||
根据 Kubernetes 对各类部署的说明,我们将使用 StatefulSet 作为 TDengine 的服务类型,创建文件 `tdengine.yaml`:
|
||
|
||
```yaml
|
||
---
|
||
apiVersion: apps/v1
|
||
kind: StatefulSet
|
||
metadata:
|
||
name: "tdengine"
|
||
labels:
|
||
app: "tdengine"
|
||
spec:
|
||
serviceName: "taosd"
|
||
replicas: 2
|
||
updateStrategy:
|
||
type: RollingUpdate
|
||
selector:
|
||
matchLabels:
|
||
app: "tdengine"
|
||
template:
|
||
metadata:
|
||
name: "tdengine"
|
||
labels:
|
||
app: "tdengine"
|
||
spec:
|
||
containers:
|
||
- name: "tdengine"
|
||
image: "zitsen/taosd:develop"
|
||
imagePullPolicy: "Always"
|
||
envFrom:
|
||
- configMapRef:
|
||
name: taoscfg
|
||
ports:
|
||
- name: tcp6030
|
||
protocol: "TCP"
|
||
containerPort: 6030
|
||
- name: tcp6035
|
||
protocol: "TCP"
|
||
containerPort: 6035
|
||
- name: tcp6041
|
||
protocol: "TCP"
|
||
containerPort: 6041
|
||
- name: udp6030
|
||
protocol: "UDP"
|
||
containerPort: 6030
|
||
- name: udp6031
|
||
protocol: "UDP"
|
||
containerPort: 6031
|
||
- name: udp6032
|
||
protocol: "UDP"
|
||
containerPort: 6032
|
||
- name: udp6033
|
||
protocol: "UDP"
|
||
containerPort: 6033
|
||
- name: udp6034
|
||
protocol: "UDP"
|
||
containerPort: 6034
|
||
- name: udp6035
|
||
protocol: "UDP"
|
||
containerPort: 6035
|
||
- name: udp6036
|
||
protocol: "UDP"
|
||
containerPort: 6036
|
||
- name: udp6037
|
||
protocol: "UDP"
|
||
containerPort: 6037
|
||
- name: udp6038
|
||
protocol: "UDP"
|
||
containerPort: 6038
|
||
- name: udp6039
|
||
protocol: "UDP"
|
||
containerPort: 6039
|
||
- name: udp6040
|
||
protocol: "UDP"
|
||
containerPort: 6040
|
||
env:
|
||
# POD_NAME for FQDN config
|
||
- name: POD_NAME
|
||
valueFrom:
|
||
fieldRef:
|
||
fieldPath: metadata.name
|
||
# SERVICE_NAME and NAMESPACE for fqdn resolve
|
||
- name: SERVICE_NAME
|
||
value: "taosd"
|
||
- name: STS_NAME
|
||
value: "tdengine"
|
||
- name: STS_NAMESPACE
|
||
valueFrom:
|
||
fieldRef:
|
||
fieldPath: metadata.namespace
|
||
# TZ for timezone settings, we recommend to always set it.
|
||
- name: TZ
|
||
value: "Asia/Shanghai"
|
||
# TAOS_ prefix will configured in taos.cfg, strip prefix and camelCase.
|
||
- name: TAOS_SERVER_PORT
|
||
value: "6030"
|
||
# Must set if you want a cluster.
|
||
- name: TAOS_FIRST_EP
|
||
value: "$(STS_NAME)-0.$(SERVICE_NAME).$(STS_NAMESPACE).svc.cluster.local:$(TAOS_SERVER_PORT)"
|
||
# TAOS_FQND should always be setted in k8s env.
|
||
- name: TAOS_FQDN
|
||
value: "$(POD_NAME).$(SERVICE_NAME).$(STS_NAMESPACE).svc.cluster.local"
|
||
volumeMounts:
|
||
- name: taosdata
|
||
mountPath: /var/lib/taos
|
||
readinessProbe:
|
||
exec:
|
||
command:
|
||
- taos
|
||
- -s
|
||
- "show mnodes"
|
||
initialDelaySeconds: 5
|
||
timeoutSeconds: 5000
|
||
livenessProbe:
|
||
tcpSocket:
|
||
port: 6030
|
||
initialDelaySeconds: 15
|
||
periodSeconds: 20
|
||
volumeClaimTemplates:
|
||
- metadata:
|
||
name: taosdata
|
||
spec:
|
||
accessModes:
|
||
- "ReadWriteOnce"
|
||
storageClassName: "csi-rbd-sc"
|
||
resources:
|
||
requests:
|
||
storage: "10Gi"
|
||
```
|
||
|
||
## 启动集群
|
||
|
||
将前述三个文件添加到 Kubernetes 集群中:
|
||
|
||
```bash
|
||
kubectl apply -f taoscfg.yaml
|
||
kubectl apply -f taosd-service.yaml
|
||
kubectl apply -f tdengine.yaml
|
||
|
||
```
|
||
|
||
上面的配置将生成一个两节点的 TDengine 集群,dnode 是自动配置的,可以使用 `show dnodes` 命令查看当前集群的节点:
|
||
|
||
```bash
|
||
kubectl exec -i -t tdengine-0 -- taos -s "show dnodes"
|
||
kubectl exec -i -t tdengine-1 -- taos -s "show dnodes"
|
||
|
||
```
|
||
|
||
输出如下:
|
||
|
||
```
|
||
Welcome to the TDengine shell from Linux, Client Version:2.1.1.0
|
||
Copyright (c) 2020 by TAOS Data, Inc. All rights reserved.
|
||
|
||
taos> show dnodes
|
||
id | end_point | vnodes | cores | status | role | create_time | offline reason |
|
||
======================================================================================================================================
|
||
1 | tdengine-0.taosd.default.sv... | 1 | 40 | ready | any | 2021-06-01 17:13:24.181 | |
|
||
2 | tdengine-1.taosd.default.sv... | 0 | 40 | ready | any | 2021-06-01 17:14:09.257 | |
|
||
Query OK, 2 row(s) in set (0.000997s)
|
||
|
||
```
|
||
|
||
## 集群扩容
|
||
|
||
TDengine 集群支持自动扩容:
|
||
|
||
```bash
|
||
kubectl scale statefulsets tdengine --replicas=4
|
||
|
||
```
|
||
|
||
上面命令行中参数 `--replica=4` 表示要将 TDengine 集群扩容到 4 个节点,执行后首先检查 POD 的状态:
|
||
|
||
```bash
|
||
kubectl get pods -l app=tdengine
|
||
|
||
```
|
||
|
||
输出如下:
|
||
|
||
```
|
||
NAME READY STATUS RESTARTS AGE
|
||
tdengine-0 1/1 Running 0 161m
|
||
tdengine-1 1/1 Running 0 161m
|
||
tdengine-2 1/1 Running 0 32m
|
||
tdengine-3 1/1 Running 0 32m
|
||
|
||
```
|
||
|
||
此时 POD 的状态仍然是 Running,TDengine 集群中的 dnode 状态要等 POD 状态为 `ready` 之后才能看到:
|
||
|
||
```bash
|
||
kubectl exec -i -t tdengine-0 -- taos -s "show dnodes"
|
||
|
||
```
|
||
|
||
扩容后的四节点 TDengine 集群的 dnode 列表:
|
||
|
||
```
|
||
Welcome to the TDengine shell from Linux, Client Version:2.1.1.0
|
||
Copyright (c) 2020 by TAOS Data, Inc. All rights reserved.
|
||
|
||
taos> show dnodes
|
||
id | end_point | vnodes | cores | status | role | create_time | offline reason |
|
||
======================================================================================================================================
|
||
1 | tdengine-0.taosd.default.sv... | 0 | 40 | ready | any | 2021-06-01 11:58:12.915 | |
|
||
2 | tdengine-1.taosd.default.sv... | 0 | 40 | ready | any | 2021-06-01 11:58:33.127 | |
|
||
3 | tdengine-2.taosd.default.sv... | 0 | 40 | ready | any | 2021-06-01 14:07:27.078 | |
|
||
4 | tdengine-3.taosd.default.sv... | 1 | 40 | ready | any | 2021-06-01 14:07:48.362 | |
|
||
Query OK, 4 row(s) in set (0.001293s)
|
||
|
||
```
|
||
|
||
## 集群缩容
|
||
|
||
TDengine 的缩容并没有自动化,我们尝试将一个三节点集群缩容到两节点。
|
||
|
||
首先,确认一个三节点 TDengine 集群正常工作,在 TDengine CLI 中查看 dnode 的状态:
|
||
|
||
```bash
|
||
taos> show dnodes
|
||
id | end_point | vnodes | cores | status | role | create_time | offline reason |
|
||
======================================================================================================================================
|
||
1 | tdengine-0.taosd.default.sv... | 1 | 40 | ready | any | 2021-06-01 16:27:24.852 | |
|
||
2 | tdengine-1.taosd.default.sv... | 0 | 40 | ready | any | 2021-06-01 16:27:53.339 | |
|
||
3 | tdengine-2.taosd.default.sv... | 0 | 40 | ready | any | 2021-06-01 16:28:49.787 | |
|
||
Query OK, 3 row(s) in set (0.001101s)
|
||
|
||
```
|
||
|
||
想要安全的缩容,首先需要将节点从 dnode 列表中移除,也即从集群中移除:
|
||
|
||
```bash
|
||
kubectl exec -i -t tdengine-0 -- taos -s "drop dnode 'tdengine-2.taosd.default.svc.cluster.local:6030'"
|
||
|
||
```
|
||
|
||
通过 `show dondes` 命令确认移除成功后,移除相应的 POD:
|
||
|
||
```bash
|
||
kubectl scale statefulsets tdengine --replicas=2
|
||
|
||
```
|
||
|
||
最后一个 POD 会被删除,使用 `kubectl get pods -l app=tdengine` 查看集群状态:
|
||
|
||
```
|
||
NAME READY STATUS RESTARTS AGE
|
||
tdengine-0 1/1 Running 0 3h40m
|
||
tdengine-1 1/1 Running 0 3h40m
|
||
|
||
```
|
||
|
||
POD 删除后,需要手动删除 PVC,否则下次扩容时会继续使用以前的数据导致无法正常加入集群。
|
||
|
||
```bash
|
||
kubectl delete pvc taosdata-tdengine-2
|
||
|
||
```
|
||
|
||
此时的集群状态是安全的,需要时还可以再次进行扩容:
|
||
|
||
```bash
|
||
kubectl scale statefulsets tdengine --replicas=3
|
||
|
||
|
||
```
|
||
|
||
`show dnodes` 输出如下:
|
||
|
||
```
|
||
taos> show dnodes
|
||
id | end_point | vnodes | cores | status | role | create_time | offline reason |
|
||
======================================================================================================================================
|
||
1 | tdengine-0.taosd.default.sv... | 1 | 40 | ready | any | 2021-06-01 16:27:24.852 | |
|
||
2 | tdengine-1.taosd.default.sv... | 0 | 40 | ready | any | 2021-06-01 16:27:53.339 | |
|
||
4 | tdengine-2.taosd.default.sv... | 0 | 40 | ready | any | 2021-06-01 16:40:49.177 | |
|
||
|
||
|
||
```
|
||
|
||
## 删除集群
|
||
|
||
完整移除 TDengine 集群,需要分别清理 statefulset、svc、configmap、pvc。
|
||
|
||
```bash
|
||
kubectl delete statefulset -l app=tdengine
|
||
kubectl delete svc -l app=tdengine
|
||
kubectl delete pvc -l app=tdengine
|
||
kubectl delete configmap taoscfg
|
||
|
||
```
|
||
|
||
## 常见错误
|
||
|
||
### 错误一
|
||
|
||
扩容到四节点之后缩容到两节点,删除的 POD 会进入 offline 状态:
|
||
|
||
```
|
||
Welcome to the TDengine shell from Linux, Client Version:2.1.1.0
|
||
Copyright (c) 2020 by TAOS Data, Inc. All rights reserved.
|
||
|
||
taos> show dnodes
|
||
id | end_point | vnodes | cores | status | role | create_time | offline reason |
|
||
======================================================================================================================================
|
||
1 | tdengine-0.taosd.default.sv... | 0 | 40 | ready | any | 2021-06-01 11:58:12.915 | |
|
||
2 | tdengine-1.taosd.default.sv... | 0 | 40 | ready | any | 2021-06-01 11:58:33.127 | |
|
||
3 | tdengine-2.taosd.default.sv... | 0 | 40 | offline | any | 2021-06-01 14:07:27.078 | status msg timeout |
|
||
4 | tdengine-3.taosd.default.sv... | 1 | 40 | offline | any | 2021-06-01 14:07:48.362 | status msg timeout |
|
||
Query OK, 4 row(s) in set (0.001236s)
|
||
|
||
|
||
```
|
||
|
||
但 `drop dnode` 的行为按不会按照预期进行,且下次集群重启后,所有的 dnode 节点将无法启动 dropping 状态无法退出。
|
||
|
||
### 错误二
|
||
|
||
TDengine 集群会持有 replica 参数,如果缩容后的节点数小于这个值,集群将无法使用:
|
||
|
||
创建一个库使用 replica 参数为 2,插入部分数据:
|
||
|
||
```bash
|
||
kubectl exec -i -t tdengine-0 -- \
|
||
taos -s \
|
||
"create database if not exists test replica 2;
|
||
use test;
|
||
create table if not exists t1(ts timestamp, n int);
|
||
insert into t1 values(now, 1)(now+1s, 2);"
|
||
|
||
|
||
```
|
||
|
||
缩容到单节点:
|
||
|
||
```bash
|
||
kubectl scale statefulsets tdengine --replicas=1
|
||
|
||
```
|
||
|
||
在 taos shell 中的所有数据库操作将无法成功。
|
||
|
||
```
|
||
taos> show dnodes;
|
||
id | end_point | vnodes | cores | status | role | create_time | offline reason |
|
||
======================================================================================================================================
|
||
1 | tdengine-0.taosd.default.sv... | 2 | 40 | ready | any | 2021-06-01 15:55:52.562 | |
|
||
2 | tdengine-1.taosd.default.sv... | 1 | 40 | offline | any | 2021-06-01 15:56:07.212 | status msg timeout |
|
||
Query OK, 2 row(s) in set (0.000845s)
|
||
|
||
taos> show dnodes;
|
||
id | end_point | vnodes | cores | status | role | create_time | offline reason |
|
||
======================================================================================================================================
|
||
1 | tdengine-0.taosd.default.sv... | 2 | 40 | ready | any | 2021-06-01 15:55:52.562 | |
|
||
2 | tdengine-1.taosd.default.sv... | 1 | 40 | offline | any | 2021-06-01 15:56:07.212 | status msg timeout |
|
||
Query OK, 2 row(s) in set (0.000837s)
|
||
|
||
taos> use test;
|
||
Database changed.
|
||
|
||
taos> insert into t1 values(now, 3);
|
||
|
||
DB error: Unable to resolve FQDN (0.013874s)
|
||
|
||
```
|