homework-jianmu/docs/en/13-operation/10-monitor.md

329 lines
12 KiB
Markdown

---
title: TDengine Monitoring
description: This document describes how to monitor your TDengine cluster.
---
After TDengine is started, it automatically writes monitoring data including CPU, memory and disk usage, bandwidth, number of requests, disk I/O speed, slow queries, into a designated database at a predefined interval through taosKeeper. Additionally, some important system operations, like logon, create user, drop database, and alerts and warnings generated in TDengine are written into the `log` database too. A system operator can view the data in `log` database from TDengine CLI or from a web console.
The collection of the monitoring information is enabled by default, but can be disabled by parameter `monitor` in the configuration file.
## TDinsight
TDinsight is a complete solution which uses the monitoring database `log` mentioned previously, and Grafana, to monitor a TDengine cluster.
A script `TDinsight.sh` is provided to deploy TDinsight automatically.
Download `TDinsight.sh` with the below command:
```bash
wget https://github.com/taosdata/grafanaplugin/raw/master/dashboards/TDinsight.sh
chmod +x TDinsight.sh
```
Prepare:
1. TDengine Server
- The URL of REST service: for example `http://localhost:6041` if TDengine is deployed locally
- User name and password
2. Grafana Alert Notification
You can use below command to setup Grafana alert notification.
An existing Grafana Notification Channel can be specified with parameter `-E`, the notifier uid of the channel can be obtained by `curl -u admin:admin localhost:3000/api/alert-notifications |jq`
```bash
./TDinsight.sh -a http://localhost:6041 -u root -p taosdata -E <notifier uid>
```
Launch `TDinsight.sh` with the command above and restart Grafana, then open Dashboard `http://localhost:3000/d/tdinsight`.
## log database
The data of tdinsight dashboard is stored in `log` database (default. You can change it in taoskeeper's config file. For more infrmation, please reference to [taoskeeper document](../../reference/taosKeeper)). The taoskeeper will create log database on taoskeeper startup.
### cluster\_info table
`cluster_info` table contains cluster information records.
|field|type|is\_tag|comment|
|:----|:---|:-----|:------|
|ts|TIMESTAMP||timestamp|
|first\_ep|VARCHAR||first ep of cluster|
|first\_ep\_dnode\_id|INT||dnode id or first\_ep|
|version|VARCHAR||tdengine version. such as: 3.0.4.0|
|master\_uptime|FLOAT||days of master's uptime|
|monitor\_interval|INT||monitor interval in second|
|dbs\_total|INT||total number of databases in cluster|
|tbs\_total|BIGINT||total number of tables in cluster|
|stbs\_total|INT||total number of stables in cluster|
|dnodes\_total|INT||total number of dnodes in cluster|
|dnodes\_alive|INT||total number of dnodes in ready state|
|mnodes\_total|INT||total number of mnodes in cluster|
|mnodes\_alive|INT||total number of mnodes in ready state|
|vgroups\_total|INT||total number of vgroups in cluster|
|vgroups\_alive|INT||total number of vgroups in ready state|
|vnodes\_total|INT||total number of vnode in cluster|
|vnodes\_alive|INT||total number of vnode in ready state|
|connections\_total|INT||total number of connections to cluster|
|topics\_total|INT||total number of topics in cluster|
|streams\_total|INT||total number of streams in cluster|
|protocol|INT||protocol version|
|cluster\_id|NCHAR|TAG|cluster id|
### d\_info table
`d_info` table contains dnodes information records.
|field|type|is\_tag|comment|
|:----|:---|:-----|:------|
|ts|TIMESTAMP||timestamp|
|status|VARCHAR||dnode status|
|dnode\_ep|NCHAR|TAG|dnode endpoint|
|cluster\_id|NCHAR|TAG|cluster id|
### m\_info table
`m_info` table contains mnode information records.
|field|type|is\_tag|comment|
|:----|:---|:-----|:------|
|ts|TIMESTAMP||timestamp|
|role|VARCHAR||the role of mnode. leader or follower|
|mnode\_id|INT|TAG|master node id|
|mnode\_ep|NCHAR|TAG|master node endpoint|
|cluster\_id|NCHAR|TAG|cluster id|
### dnodes\_info table
`dnodes_info` table contains dnodes information records.
|field|type|is\_tag|comment|
|:----|:---|:-----|:------|
|ts|TIMESTAMP||timestamp|
|uptime|FLOAT||dnode uptime in `days`|
|cpu\_engine|FLOAT||cpu usage of tdengine. read from `/proc/<taosd_pid>/stat`|
|cpu\_system|FLOAT||cpu usage of server. read from `/proc/stat`|
|cpu\_cores|FLOAT||cpu cores of server|
|mem\_engine|INT||memory usage of tdengine. read from `/proc/<taosd_pid>/status`|
|mem\_system|INT||available memory on the server in `KB`|
|mem\_total|INT||total memory of server in `KB`|
|disk\_engine|INT|||
|disk\_used|BIGINT||usage of data dir in `bytes`|
|disk\_total|BIGINT||the capacity of data dir in `bytes`|
|net\_in|FLOAT||network throughput rate in byte/s. read from `/proc/net/dev`|
|net\_out|FLOAT||network throughput rate in byte/s. read from `/proc/net/dev`|
|io\_read|FLOAT||io throughput rate in byte/s. read from `/proc/<taosd_pid>/io`|
|io\_write|FLOAT||io throughput rate in byte/s. read from `/proc/<taosd_pid>/io`|
|io\_read\_disk|FLOAT||io throughput rate of disk in byte/s. read from `/proc/<taosd_pid>/io`|
|io\_write\_disk|FLOAT||io throughput rate of disk in byte/s. read from `/proc/<taosd_pid>/io`|
|req\_select|INT||number of select queries received per dnode|
|req\_select\_rate|FLOAT||number of select queries received per dnode divided by monitor interval.|
|req\_insert|INT||number of insert queries received per dnode|
|req\_insert\_success|INT||number of successfully insert queries received per dnode|
|req\_insert\_rate|FLOAT||number of insert queries received per dnode divided by monitor interval|
|req\_insert\_batch|INT||number of batch insertions|
|req\_insert\_batch\_success|INT||number of successful batch insertions|
|req\_insert\_batch\_rate|FLOAT||number of batch insertions divided by monitor interval|
|errors|INT||dnode errors|
|vnodes\_num|INT||number of vnodes per dnode|
|masters|INT||number of master vnodes|
|has\_mnode|INT||if the dnode has mnode|
|has\_qnode|INT||if the dnode has qnode|
|has\_snode|INT||if the dnode has snode|
|has\_bnode|INT||if the dnode has bnode|
|dnode\_id|INT|TAG|dnode id|
|dnode\_ep|NCHAR|TAG|dnode endpoint|
|cluster\_id|NCHAR|TAG|cluster id|
### data\_dir table
`data_dir` table contains data directory information records.
|field|type|is\_tag|comment|
|:----|:---|:-----|:------|
|ts|TIMESTAMP||timestamp|
|name|NCHAR||data directory. default is `/var/lib/taos`|
|level|INT||level for multi-level storage|
|avail|BIGINT||available space for data directory in `bytes`|
|used|BIGINT||used space for data directory in `bytes`|
|total|BIGINT||total space for data directory in `bytes`|
|dnode\_id|INT|TAG|dnode id|
|dnode\_ep|NCHAR|TAG|dnode endpoint|
|cluster\_id|NCHAR|TAG|cluster id|
### log\_dir table
`log_dir` table contains log directory information records.
|field|type|is\_tag|comment|
|:----|:---|:-----|:------|
|ts|TIMESTAMP||timestamp|
|name|NCHAR||log directory. default is `/var/log/taos/`|
|avail|BIGINT||available space for log directory in `bytes`|
|used|BIGINT||used space for data directory in `bytes`|
|total|BIGINT||total space for data directory in `bytes`|
|dnode\_id|INT|TAG|dnode id|
|dnode\_ep|NCHAR|TAG|dnode endpoint|
|cluster\_id|NCHAR|TAG|cluster id|
### temp\_dir table
`temp_dir` table contains temp dir information records.
|field|type|is\_tag|comment|
|:----|:---|:-----|:------|
|ts|TIMESTAMP||timestamp|
|name|NCHAR||temp directory. default is `/tmp/`|
|avail|BIGINT||available space for temp directory in `bytes`|
|used|BIGINT||used space for temp directory in `bytes`|
|total|BIGINT||total space for temp directory in `bytes`|
|dnode\_id|INT|TAG|dnode id|
|dnode\_ep|NCHAR|TAG|dnode endpoint|
|cluster\_id|NCHAR|TAG|cluster id|
### vgroups\_info table
`vgroups_info` table contains vgroups information records.
|field|type|is\_tag|comment|
|:----|:---|:-----|:------|
|ts|TIMESTAMP||timestamp|
|vgroup\_id|INT||vgroup id|
|database\_name|VARCHAR||database for the vgroup|
|tables\_num|BIGINT||number of tables per vgroup|
|status|VARCHAR||status|
|dnode\_id|INT|TAG|dnode id|
|dnode\_ep|NCHAR|TAG|dnode endpoint|
|cluster\_id|NCHAR|TAG|cluster id|
### vnodes\_role table
`vnodes_role` table contains vnode role information records.
|field|type|is\_tag|comment|
|:----|:---|:-----|:------|
|ts|TIMESTAMP||timestamp|
|vnode\_role|VARCHAR||role. leader or follower|
|dnode\_id|INT|TAG|dnode id|
|dnode\_ep|NCHAR|TAG|dnode endpoint|
|cluster\_id|NCHAR|TAG|cluster id|
### log\_summary table
`log_summary` table contains log summary information records.
|field|type|is\_tag|comment|
|:----|:---|:-----|:------|
|ts|TIMESTAMP||timestamp|
|error|INT||error count|
|info|INT||info count|
|debug|INT||debug count|
|trace|INT||trace count|
|dnode\_id|INT|TAG|dnode id|
|dnode\_ep|NCHAR|TAG|dnode endpoint|
|cluster\_id|NCHAR|TAG|cluster id|
### grants\_info table
`grants_info` table contains grants information records.
|field|type|is\_tag|comment|
|:----|:---|:-----|:------|
|ts|TIMESTAMP||timestamp|
|expire\_time|BIGINT||time until grants expire in seconds|
|timeseries\_used|BIGINT||timeseries used|
|timeseries\_total|BIGINT||total timeseries|
|dnode\_id|INT|TAG|dnode id|
|dnode\_ep|NCHAR|TAG|dnode endpoint|
|cluster\_id|NCHAR|TAG|cluster id|
### keeper\_monitor table
`keeper_monitor` table contains keeper monitor information records.
|field|type|is\_tag|comment|
|:----|:---|:-----|:------|
|ts|TIMESTAMP||timestamp|
|cpu|FLOAT||cpu usage|
|mem|FLOAT||memory usage|
|identify|NCHAR|TAG||
### taosadapter\_restful\_http\_request\_total table
`taosadapter_restful_http_request_total` table contains taosadapter rest request information record. The timestamp column of this table is `_ts`.
|field|type|is\_tag|comment|
|:----|:---|:-----|:------|
|\_ts|TIMESTAMP||timestamp|
|gauge|DOUBLE||metric value|
|client\_ip|NCHAR|TAG|client ip|
|endpoint|NCHAR|TAG|taosadpater endpoint|
|request\_method|NCHAR|TAG|request method|
|request\_uri|NCHAR|TAG|request uri|
|status\_code|NCHAR|TAG|status code|
### taosadapter\_restful\_http\_request\_fail table
`taosadapter_restful_http_request_fail` table contains taosadapter failed rest request information record. The timestamp column of this table is `_ts`.
|field|type|is\_tag|comment|
|:----|:---|:-----|:------|
|\_ts|TIMESTAMP||timestamp|
|gauge|DOUBLE||metric value|
|client\_ip|NCHAR|TAG|client ip|
|endpoint|NCHAR|TAG|taosadpater endpoint|
|request\_method|NCHAR|TAG|request method|
|request\_uri|NCHAR|TAG|request uri|
|status\_code|NCHAR|TAG|status code|
### taosadapter\_restful\_http\_request\_in\_flight table
`taosadapter_restful_http_request_in_flight` table contains taosadapter rest request information record in real time. The timestamp column of this table is `_ts`.
|field|type|is\_tag|comment|
|:----|:---|:-----|:------|
|\_ts|TIMESTAMP||timestamp|
|gauge|DOUBLE||metric value|
|endpoint|NCHAR|TAG|taosadpater endpoint|
### taosadapter\_restful\_http\_request\_summary\_milliseconds table
`taosadapter_restful_http_request_summary_milliseconds` table contains the summary or rest information record. The timestamp column of this table is `_ts`.
|field|type|is\_tag|comment|
|:----|:---|:-----|:------|
|\_ts|TIMESTAMP||timestamp|
|count|DOUBLE|||
|sum|DOUBLE|||
|0.5|DOUBLE|||
|0.9|DOUBLE|||
|0.99|DOUBLE|||
|0.1|DOUBLE|||
|0.2|DOUBLE|||
|endpoint|NCHAR|TAG|taosadpater endpoint|
|request\_method|NCHAR|TAG|request method|
|request\_uri|NCHAR|TAG|request uri|
### taosadapter\_system\_mem\_percent table
`taosadapter_system_mem_percent` table contains taosadapter memory usage information. The timestamp of this table is `_ts`.
|field|type|is\_tag|comment|
|:----|:---|:-----|:------|
|\_ts|TIMESTAMP||timestamp|
|gauge|DOUBLE||metric value|
|endpoint|NCHAR|TAG|taosadpater endpoint|
### taosadapter\_system\_cpu\_percent table
`taosadapter_system_cpu_percent` table contains taosadapter cup usage information. The timestamp of this table is `_ts`.
|field|type|is\_tag|comment|
|:----|:---|:-----|:------|
|\_ts|TIMESTAMP||timestamp|
|gauge|DOUBLE||mertic value|
|endpoint|NCHAR|TAG|taosadpater endpoint|