homework-jianmu/docs/zh/10-deployment/03-k8s.md

453 lines
14 KiB
Markdown
Raw Blame History

This file contains ambiguous Unicode characters

This file contains Unicode characters that might be confused with other characters. If you think that this is intentional, you can safely ignore this warning. Use the Escape button to reveal them.

---
sidebar_label: Kubernetes
title: 在 Kubernetes 上部署 TDengine 集群
---
## 配置 ConfigMap
为 TDengine 创建 `taoscfg.yaml`,此文件中的配置将作为环境变量传入 TDengine 镜像,更新此配置将导致所有 TDengine POD 重启。
```yaml
---
apiVersion: v1
kind: ConfigMap
metadata:
name: taoscfg
labels:
app: tdengine
data:
CLUSTER: "1"
TAOS_KEEP: "3650"
TAOS_DEBUG_FLAG: "135"
```
## 配置服务
创建一个 service 配置文件:`taosd-service.yaml`,服务名称 `metadata.name` (此处为 "taosd") 将在下一步中使用到。添加 TDengine 所用到的所有端口:
```yaml
---
apiVersion: v1
kind: Service
metadata:
name: "taosd"
labels:
app: "tdengine"
spec:
ports:
- name: tcp6030
protocol: "TCP"
port: 6030
- name: tcp6035
protocol: "TCP"
port: 6035
- name: tcp6041
protocol: "TCP"
port: 6041
- name: udp6030
protocol: "UDP"
port: 6030
- name: udp6031
protocol: "UDP"
port: 6031
- name: udp6032
protocol: "UDP"
port: 6032
- name: udp6033
protocol: "UDP"
port: 6033
- name: udp6034
protocol: "UDP"
port: 6034
- name: udp6035
protocol: "UDP"
port: 6035
- name: udp6036
protocol: "UDP"
port: 6036
- name: udp6037
protocol: "UDP"
port: 6037
- name: udp6038
protocol: "UDP"
port: 6038
- name: udp6039
protocol: "UDP"
port: 6039
- name: udp6040
protocol: "UDP"
port: 6040
selector:
app: "tdengine"
```
## 有状态服务 StatefulSet
根据 Kubernetes 对各类部署的说明,我们将使用 StatefulSet 作为 TDengine 的服务类型,创建文件 `tdengine.yaml`
```yaml
---
apiVersion: apps/v1
kind: StatefulSet
metadata:
name: "tdengine"
labels:
app: "tdengine"
spec:
serviceName: "taosd"
replicas: 2
updateStrategy:
type: RollingUpdate
selector:
matchLabels:
app: "tdengine"
template:
metadata:
name: "tdengine"
labels:
app: "tdengine"
spec:
containers:
- name: "tdengine"
image: "zitsen/taosd:develop"
imagePullPolicy: "Always"
envFrom:
- configMapRef:
name: taoscfg
ports:
- name: tcp6030
protocol: "TCP"
containerPort: 6030
- name: tcp6035
protocol: "TCP"
containerPort: 6035
- name: tcp6041
protocol: "TCP"
containerPort: 6041
- name: udp6030
protocol: "UDP"
containerPort: 6030
- name: udp6031
protocol: "UDP"
containerPort: 6031
- name: udp6032
protocol: "UDP"
containerPort: 6032
- name: udp6033
protocol: "UDP"
containerPort: 6033
- name: udp6034
protocol: "UDP"
containerPort: 6034
- name: udp6035
protocol: "UDP"
containerPort: 6035
- name: udp6036
protocol: "UDP"
containerPort: 6036
- name: udp6037
protocol: "UDP"
containerPort: 6037
- name: udp6038
protocol: "UDP"
containerPort: 6038
- name: udp6039
protocol: "UDP"
containerPort: 6039
- name: udp6040
protocol: "UDP"
containerPort: 6040
env:
# POD_NAME for FQDN config
- name: POD_NAME
valueFrom:
fieldRef:
fieldPath: metadata.name
# SERVICE_NAME and NAMESPACE for fqdn resolve
- name: SERVICE_NAME
value: "taosd"
- name: STS_NAME
value: "tdengine"
- name: STS_NAMESPACE
valueFrom:
fieldRef:
fieldPath: metadata.namespace
# TZ for timezone settings, we recommend to always set it.
- name: TZ
value: "Asia/Shanghai"
# TAOS_ prefix will configured in taos.cfg, strip prefix and camelCase.
- name: TAOS_SERVER_PORT
value: "6030"
# Must set if you want a cluster.
- name: TAOS_FIRST_EP
value: "$(STS_NAME)-0.$(SERVICE_NAME).$(STS_NAMESPACE).svc.cluster.local:$(TAOS_SERVER_PORT)"
# TAOS_FQND should always be setted in k8s env.
- name: TAOS_FQDN
value: "$(POD_NAME).$(SERVICE_NAME).$(STS_NAMESPACE).svc.cluster.local"
volumeMounts:
- name: taosdata
mountPath: /var/lib/taos
readinessProbe:
exec:
command:
- taos
- -s
- "show mnodes"
initialDelaySeconds: 5
timeoutSeconds: 5000
livenessProbe:
tcpSocket:
port: 6030
initialDelaySeconds: 15
periodSeconds: 20
volumeClaimTemplates:
- metadata:
name: taosdata
spec:
accessModes:
- "ReadWriteOnce"
storageClassName: "csi-rbd-sc"
resources:
requests:
storage: "10Gi"
```
## 启动集群
将前述三个文件添加到 Kubernetes 集群中:
```bash
kubectl apply -f taoscfg.yaml
kubectl apply -f taosd-service.yaml
kubectl apply -f tdengine.yaml
```
上面的配置将生成一个两节点的 TDengine 集群dnode 是自动配置的,可以使用 `show dnodes` 命令查看当前集群的节点:
```bash
kubectl exec -i -t tdengine-0 -- taos -s "show dnodes"
kubectl exec -i -t tdengine-1 -- taos -s "show dnodes"
```
输出如下:
```
Welcome to the TDengine shell from Linux, Client Version:2.1.1.0
Copyright (c) 2020 by TAOS Data, Inc. All rights reserved.
taos> show dnodes
id | end_point | vnodes | cores | status | role | create_time | offline reason |
======================================================================================================================================
1 | tdengine-0.taosd.default.sv... | 1 | 40 | ready | any | 2021-06-01 17:13:24.181 | |
2 | tdengine-1.taosd.default.sv... | 0 | 40 | ready | any | 2021-06-01 17:14:09.257 | |
Query OK, 2 row(s) in set (0.000997s)
```
## 集群扩容
TDengine 集群支持自动扩容:
```bash
kubectl scale statefulsets tdengine --replicas=4
```
上面命令行中参数 `--replica=4` 表示要将 TDengine 集群扩容到 4 个节点,执行后首先检查 POD 的状态:
```bash
kubectl get pods -l app=tdengine
```
输出如下:
```
NAME READY STATUS RESTARTS AGE
tdengine-0 1/1 Running 0 161m
tdengine-1 1/1 Running 0 161m
tdengine-2 1/1 Running 0 32m
tdengine-3 1/1 Running 0 32m
```
此时 POD 的状态仍然是 RunningTDengine 集群中的 dnode 状态要等 POD 状态为 `ready` 之后才能看到:
```bash
kubectl exec -i -t tdengine-0 -- taos -s "show dnodes"
```
扩容后的四节点 TDengine 集群的 dnode 列表:
```
Welcome to the TDengine shell from Linux, Client Version:2.1.1.0
Copyright (c) 2020 by TAOS Data, Inc. All rights reserved.
taos> show dnodes
id | end_point | vnodes | cores | status | role | create_time | offline reason |
======================================================================================================================================
1 | tdengine-0.taosd.default.sv... | 0 | 40 | ready | any | 2021-06-01 11:58:12.915 | |
2 | tdengine-1.taosd.default.sv... | 0 | 40 | ready | any | 2021-06-01 11:58:33.127 | |
3 | tdengine-2.taosd.default.sv... | 0 | 40 | ready | any | 2021-06-01 14:07:27.078 | |
4 | tdengine-3.taosd.default.sv... | 1 | 40 | ready | any | 2021-06-01 14:07:48.362 | |
Query OK, 4 row(s) in set (0.001293s)
```
## 集群缩容
TDengine 的缩容并没有自动化,我们尝试将一个三节点集群缩容到两节点。
首先,确认一个三节点 TDengine 集群正常工作,在 TDengine CLI 中查看 dnode 的状态:
```bash
taos> show dnodes
id | end_point | vnodes | cores | status | role | create_time | offline reason |
======================================================================================================================================
1 | tdengine-0.taosd.default.sv... | 1 | 40 | ready | any | 2021-06-01 16:27:24.852 | |
2 | tdengine-1.taosd.default.sv... | 0 | 40 | ready | any | 2021-06-01 16:27:53.339 | |
3 | tdengine-2.taosd.default.sv... | 0 | 40 | ready | any | 2021-06-01 16:28:49.787 | |
Query OK, 3 row(s) in set (0.001101s)
```
想要安全的缩容,首先需要将节点从 dnode 列表中移除,也即从集群中移除:
```bash
kubectl exec -i -t tdengine-0 -- taos -s "drop dnode 'tdengine-2.taosd.default.svc.cluster.local:6030'"
```
通过 `show dondes` 命令确认移除成功后,移除相应的 POD
```bash
kubectl scale statefulsets tdengine --replicas=2
```
最后一个 POD 会被删除,使用 `kubectl get pods -l app=tdengine` 查看集群状态:
```
NAME READY STATUS RESTARTS AGE
tdengine-0 1/1 Running 0 3h40m
tdengine-1 1/1 Running 0 3h40m
```
POD 删除后,需要手动删除 PVC否则下次扩容时会继续使用以前的数据导致无法正常加入集群。
```bash
kubectl delete pvc taosdata-tdengine-2
```
此时的集群状态是安全的,需要时还可以再次进行扩容:
```bash
kubectl scale statefulsets tdengine --replicas=3
```
`show dnodes` 输出如下:
```
taos> show dnodes
id | end_point | vnodes | cores | status | role | create_time | offline reason |
======================================================================================================================================
1 | tdengine-0.taosd.default.sv... | 1 | 40 | ready | any | 2021-06-01 16:27:24.852 | |
2 | tdengine-1.taosd.default.sv... | 0 | 40 | ready | any | 2021-06-01 16:27:53.339 | |
4 | tdengine-2.taosd.default.sv... | 0 | 40 | ready | any | 2021-06-01 16:40:49.177 | |
```
## 删除集群
完整移除 TDengine 集群,需要分别清理 statefulset、svc、configmap、pvc。
```bash
kubectl delete statefulset -l app=tdengine
kubectl delete svc -l app=tdengine
kubectl delete pvc -l app=tdengine
kubectl delete configmap taoscfg
```
## 常见错误
### 错误一
扩容到四节点之后缩容到两节点,删除的 POD 会进入 offline 状态:
```
Welcome to the TDengine shell from Linux, Client Version:2.1.1.0
Copyright (c) 2020 by TAOS Data, Inc. All rights reserved.
taos> show dnodes
id | end_point | vnodes | cores | status | role | create_time | offline reason |
======================================================================================================================================
1 | tdengine-0.taosd.default.sv... | 0 | 40 | ready | any | 2021-06-01 11:58:12.915 | |
2 | tdengine-1.taosd.default.sv... | 0 | 40 | ready | any | 2021-06-01 11:58:33.127 | |
3 | tdengine-2.taosd.default.sv... | 0 | 40 | offline | any | 2021-06-01 14:07:27.078 | status msg timeout |
4 | tdengine-3.taosd.default.sv... | 1 | 40 | offline | any | 2021-06-01 14:07:48.362 | status msg timeout |
Query OK, 4 row(s) in set (0.001236s)
```
`drop dnode` 的行为按不会按照预期进行,且下次集群重启后,所有的 dnode 节点将无法启动 dropping 状态无法退出。
### 错误二
TDengine 集群会持有 replica 参数,如果缩容后的节点数小于这个值,集群将无法使用:
创建一个库使用 replica 参数为 2插入部分数据
```bash
kubectl exec -i -t tdengine-0 -- \
taos -s \
"create database if not exists test replica 2;
use test;
create table if not exists t1(ts timestamp, n int);
insert into t1 values(now, 1)(now+1s, 2);"
```
缩容到单节点:
```bash
kubectl scale statefulsets tdengine --replicas=1
```
在 taos shell 中的所有数据库操作将无法成功。
```
taos> show dnodes;
id | end_point | vnodes | cores | status | role | create_time | offline reason |
======================================================================================================================================
1 | tdengine-0.taosd.default.sv... | 2 | 40 | ready | any | 2021-06-01 15:55:52.562 | |
2 | tdengine-1.taosd.default.sv... | 1 | 40 | offline | any | 2021-06-01 15:56:07.212 | status msg timeout |
Query OK, 2 row(s) in set (0.000845s)
taos> show dnodes;
id | end_point | vnodes | cores | status | role | create_time | offline reason |
======================================================================================================================================
1 | tdengine-0.taosd.default.sv... | 2 | 40 | ready | any | 2021-06-01 15:55:52.562 | |
2 | tdengine-1.taosd.default.sv... | 1 | 40 | offline | any | 2021-06-01 15:56:07.212 | status msg timeout |
Query OK, 2 row(s) in set (0.000837s)
taos> use test;
Database changed.
taos> insert into t1 values(now, 3);
DB error: Unable to resolve FQDN (0.013874s)
```