TDengine Monitoring

This document describes how to monitor your TDengine cluster.

After TDengine is started, it automatically writes monitoring data into a designated database at a predefined interval through taosKeeper. This data includes CPU, memory, and disk usage, bandwidth, number of requests, disk I/O speed, and slow queries. In addition, important system operations such as logons, user creation, and database deletion, as well as alerts and warnings generated by TDengine, are also written into the `log` database. A system operator can view the data in the `log` database from the TDengine CLI or from a web console.

Collection of monitoring information is enabled by default, but it can be disabled with the `monitor` parameter in the configuration file.
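The parameter can also be changed from a script. This is a minimal sketch, assuming the default configuration file `/etc/taos/taos.cfg`, its one `<parameter> <value>` pair per line format, and that `0` disables and `1` enables monitoring. The function `set_monitor` is hypothetical and not shipped with TDengine. Restart `taosd` afterwards so that it rereads the file.

```bash
#!/usr/bin/env bash
# Hypothetical helper (not part of TDengine): set the "monitor" parameter in a
# taos.cfg-style file, where each active line has the form "<parameter> <value>".
# Usage: set_monitor [config_file] <0|1>   (config_file defaults to /etc/taos/taos.cfg)
set_monitor() {
  local cfg value
  if [ "$#" -ge 2 ]; then
    cfg="$1"; value="$2"
  else
    cfg="/etc/taos/taos.cfg"; value="$1"
  fi
  case "$value" in
    0|1) ;;
    *) echo "set_monitor: value must be 0 or 1" >&2; return 1 ;;
  esac
  # Match only the exact "monitor" key, not keys such as "monitorFqdn";
  # commented-out lines ("# monitor 1") are left untouched.
  if grep -qE '^[[:space:]]*monitor[[:space:]]' "$cfg"; then
    # Rewrite the existing line in place and keep the original as "$cfg.bak".
    sed -i.bak -E "s/^[[:space:]]*monitor[[:space:]].*/monitor $value/" "$cfg"
  else
    # No active "monitor" line yet: append one.
    printf 'monitor %s\n' "$value" >> "$cfg"
  fi
}

# Example: disable monitoring, then restart taosd so that it rereads taos.cfg.
#   set_monitor /etc/taos/taos.cfg 0
#   sudo systemctl restart taosd
```

Editing the existing line rather than always appending avoids leaving two conflicting `monitor` entries in the file.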

TDinsight

TDinsight is a complete solution that uses the `log` monitoring database mentioned above, together with Grafana, to monitor a TDengine cluster.

The script `TDinsight.sh` is provided to deploy TDinsight automatically.

Download `TDinsight.sh` with the commands below:

```bash
wget https://github.com/taosdata/grafanaplugin/raw/master/dashboards/TDinsight.sh
chmod +x TDinsight.sh
```

Prerequisites:

1. TDengine Server

   - The URL of the REST service, for example `http://localhost:6041` if TDengine is deployed locally
   - User name and password

2. Grafana Alert Notification

Use the command below to set up Grafana alert notification.

An existing Grafana notification channel can be specified with the `-E` parameter. The notifier UID of the channel can be obtained with `curl -u admin:admin localhost:3000/api/alert-notifications | jq`.

 ```bash
 ./TDinsight.sh -a http://localhost:6041 -u root -p taosdata -E <notifier uid>
 ```

Run `TDinsight.sh` with the command above and restart Grafana, then open the dashboard at `http://localhost:3000/d/tdinsight`.
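The deployment steps can be collected into one script. This is a sketch under stated assumptions: the REST URL and the `root`/`taosdata` credentials are the example values used on this page, `NOTIFIER_UID` is a placeholder for your own channel UID, and `grafana-server` is assumed to be the systemd unit name of your Grafana installation. The helper names `run` and `deploy_tdinsight` are hypothetical. By default (`DRY_RUN=1`) the script only prints the commands; set `DRY_RUN=0` to execute them.

```bash
#!/usr/bin/env bash
# Sketch of the TDinsight deployment steps. All values are the example
# defaults from this page; override them through environment variables.
TDENGINE_URL="${TDENGINE_URL:-http://localhost:6041}"
TD_USER="${TD_USER:-root}"
TD_PASS="${TD_PASS:-taosdata}"
NOTIFIER_UID="${NOTIFIER_UID:-}"   # uid of an existing Grafana notification channel (optional)
DRY_RUN="${DRY_RUN:-1}"            # 1 = only print the commands, 0 = also run them

# Print a command and execute it only when DRY_RUN=0.
run() {
  echo "+ $*"
  if [ "$DRY_RUN" = "0" ]; then "$@"; fi
}

deploy_tdinsight() {
  run wget https://github.com/taosdata/grafanaplugin/raw/master/dashboards/TDinsight.sh || return
  run chmod +x TDinsight.sh || return
  if [ -n "$NOTIFIER_UID" ]; then
    run ./TDinsight.sh -a "$TDENGINE_URL" -u "$TD_USER" -p "$TD_PASS" -E "$NOTIFIER_UID" || return
  else
    run ./TDinsight.sh -a "$TDENGINE_URL" -u "$TD_USER" -p "$TD_PASS" || return
  fi
  # "grafana-server" is an assumed systemd unit name; adjust it to your installation.
  run sudo systemctl restart grafana-server || return
  echo "Open http://localhost:3000/d/tdinsight in a browser"
}
```

Call `deploy_tdinsight` once with the default dry run to review the commands, then again with `DRY_RUN=0`.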

log database

The data of the TDinsight dashboard is stored in the `log` database by default. You can change the database name in the taosKeeper configuration file; for more information, see the taosKeeper documentation. taosKeeper creates the `log` database when it starts.
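Besides the TDengine CLI, the tables below can be queried through the taosAdapter REST endpoint (`/rest/sql`, port 6041 by default). A minimal sketch, assuming a local taosAdapter and the default `root`/`taosdata` credentials; the helper name `td_sql` is hypothetical. With `DRY_RUN=1` it prints the request instead of sending it.

```bash
#!/usr/bin/env bash
# Hypothetical helper: run one SQL statement through the TDengine REST API.
TD_REST_URL="${TD_REST_URL:-http://localhost:6041/rest/sql}"
TD_USER="${TD_USER:-root}"
TD_PASS="${TD_PASS:-taosdata}"

td_sql() {
  if [ "${DRY_RUN:-0}" = "1" ]; then
    # Show the request instead of sending it.
    printf 'POST %s\n%s\n' "$TD_REST_URL" "$1"
    return 0
  fi
  # The SQL statement is sent as the request body, authenticated with HTTP basic auth.
  curl -sS -u "$TD_USER:$TD_PASS" -d "$1" "$TD_REST_URL"
}

# Examples (require a running taosAdapter):
#   td_sql "SHOW log.STABLES;"
#   td_sql "SELECT LAST(version) FROM log.cluster_info;"
```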

cluster_info table

The cluster_info table contains cluster information records.

| field | type | is_tag | comment |
| :--- | :--- | :--- | :--- |
| ts | TIMESTAMP | | timestamp |
| first_ep | VARCHAR | | first ep of the cluster |
| first_ep_dnode_id | INT | | dnode ID of first_ep |
| version | VARCHAR | | TDengine version, such as 3.0.4.0 |
| master_uptime | FLOAT | | uptime of the master mnode in days |
| monitor_interval | INT | | monitoring interval in seconds |
| dbs_total | INT | | total number of databases in the cluster |
| tbs_total | BIGINT | | total number of tables in the cluster |
| stbs_total | INT | | total number of supertables in the cluster |
| dnodes_total | INT | | total number of dnodes in the cluster |
| dnodes_alive | INT | | number of dnodes in ready state |
| mnodes_total | INT | | total number of mnodes in the cluster |
| mnodes_alive | INT | | number of mnodes in ready state |
| vgroups_total | INT | | total number of vgroups in the cluster |
| vgroups_alive | INT | | number of vgroups in ready state |
| vnodes_total | INT | | total number of vnodes in the cluster |
| vnodes_alive | INT | | number of vnodes in ready state |
| connections_total | INT | | total number of connections to the cluster |
| topics_total | INT | | total number of topics in the cluster |
| streams_total | INT | | total number of streams in the cluster |
| protocol | INT | | protocol version |
| cluster_id | NCHAR | TAG | cluster ID |
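The paired `*_total` and `*_alive` columns allow a simple health check: if fewer nodes are alive than exist, part of the cluster is down. A sketch, assuming the default `log` database, a single cluster, and that you read the latest values with a query such as `CLUSTER_HEALTH_SQL` (for example with `taos -s`). The function `node_health` is hypothetical.

```bash
#!/usr/bin/env bash
# Latest node counts recorded by taosKeeper (assumes the default "log" database).
CLUSTER_HEALTH_SQL="SELECT LAST(dnodes_total), LAST(dnodes_alive), LAST(mnodes_total), LAST(mnodes_alive), LAST(vnodes_total), LAST(vnodes_alive) FROM log.cluster_info;"

# Compare a "total" count with its "alive" count and print a status line.
# Returns 1 when some nodes are not alive, so it can drive an alert.
# Usage: node_health <label> <total> <alive>
node_health() {
  local label="$1" total="$2" alive="$3"
  if [ "$alive" -lt "$total" ]; then
    echo "$label DEGRADED: $alive of $total alive"
    return 1
  fi
  echo "$label OK: $alive of $total alive"
}

# Example:
#   taos -s "$CLUSTER_HEALTH_SQL"
#   node_health dnodes <dnodes_total> <dnodes_alive>
```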

d_info table

The d_info table contains dnode status records.

| field | type | is_tag | comment |
| :--- | :--- | :--- | :--- |
| ts | TIMESTAMP | | timestamp |
| status | VARCHAR | | dnode status |
| dnode_ep | NCHAR | TAG | dnode endpoint |
| cluster_id | NCHAR | TAG | cluster ID |

m_info table

The m_info table contains mnode information records.

| field | type | is_tag | comment |
| :--- | :--- | :--- | :--- |
| ts | TIMESTAMP | | timestamp |
| role | VARCHAR | | role of the mnode: leader or follower |
| mnode_id | INT | TAG | mnode ID |
| mnode_ep | NCHAR | TAG | mnode endpoint |
| cluster_id | NCHAR | TAG | cluster ID |
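To find which mnode is currently the leader, you can take the latest role per endpoint. A sketch, assuming the default `log` database and that you have already turned the query result into `<mnode_ep> <role>` lines (that conversion depends on your client and is not shown). `MNODE_ROLE_SQL` and `mnode_leader` are hypothetical names.

```bash
#!/usr/bin/env bash
# Latest role reported for each mnode (assumes the default "log" database).
MNODE_ROLE_SQL="SELECT mnode_ep, LAST(role) FROM log.m_info GROUP BY mnode_ep;"

# Read "<mnode_ep> <role>" lines on stdin and print the endpoint(s)
# whose role is "leader".
mnode_leader() {
  awk '$2 == "leader" { print $1 }'
}
```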

dnodes_info table

The dnodes_info table contains dnode information records.

| field | type | is_tag | comment |
| :--- | :--- | :--- | :--- |
| ts | TIMESTAMP | | timestamp |
| uptime | FLOAT | | dnode uptime in days |
| cpu_engine | FLOAT | | CPU usage of TDengine, read from `/proc/<taosd_pid>/stat` |
| cpu_system | FLOAT | | CPU usage of the server, read from `/proc/stat` |
| cpu_cores | FLOAT | | number of CPU cores of the server |
| mem_engine | INT | | memory usage of TDengine, read from `/proc/<taosd_pid>/status` |
| mem_system | INT | | available memory on the server in KB |
| mem_total | INT | | total memory of the server in KB |
| disk_engine | INT | | |
| disk_used | BIGINT | | usage of the data directory in bytes |
| disk_total | BIGINT | | capacity of the data directory in bytes |
| net_in | FLOAT | | inbound network throughput in bytes/s, read from `/proc/net/dev` |
| net_out | FLOAT | | outbound network throughput in bytes/s, read from `/proc/net/dev` |
| io_read | FLOAT | | I/O read throughput in bytes/s, read from `/proc/<taosd_pid>/io` |
| io_write | FLOAT | | I/O write throughput in bytes/s, read from `/proc/<taosd_pid>/io` |
| io_read_disk | FLOAT | | disk read throughput in bytes/s, read from `/proc/<taosd_pid>/io` |
| io_write_disk | FLOAT | | disk write throughput in bytes/s, read from `/proc/<taosd_pid>/io` |
| req_select | INT | | number of SELECT queries received by the dnode |
| req_select_rate | FLOAT | | req_select divided by the monitoring interval |
| req_insert | INT | | number of insert queries received by the dnode |
| req_insert_success | INT | | number of successful insert queries received by the dnode |
| req_insert_rate | FLOAT | | req_insert divided by the monitoring interval |
| req_insert_batch | INT | | number of batch insertions |
| req_insert_batch_success | INT | | number of successful batch insertions |
| req_insert_batch_rate | FLOAT | | req_insert_batch divided by the monitoring interval |
| errors | INT | | number of dnode errors |
| vnodes_num | INT | | number of vnodes on the dnode |
| masters | INT | | number of master vnodes |
| has_mnode | INT | | whether the dnode has an mnode |
| has_qnode | INT | | whether the dnode has a qnode |
| has_snode | INT | | whether the dnode has an snode |
| has_bnode | INT | | whether the dnode has a bnode |
| dnode_id | INT | TAG | dnode ID |
| dnode_ep | NCHAR | TAG | dnode endpoint |
| cluster_id | NCHAR | TAG | cluster ID |
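The `*_rate` columns are the corresponding request counters divided by the monitoring interval. The sketch below reproduces that arithmetic, which is useful for a counter without its own rate column such as `req_insert_batch_success`, and shows a query for the latest load of each dnode. It assumes the default `log` database; `per_second` and `DNODE_LOAD_SQL` are hypothetical names.

```bash
#!/usr/bin/env bash
# Latest CPU, memory, and request rates per dnode (assumes the default "log" database).
DNODE_LOAD_SQL="SELECT dnode_ep, LAST(cpu_engine), LAST(mem_engine), LAST(req_select_rate), LAST(req_insert_rate) FROM log.dnodes_info GROUP BY dnode_ep;"

# Per-second rate of a counter: count divided by monitor_interval (seconds).
# Fails when the interval is not positive.
# Usage: per_second <count> <monitor_interval>
per_second() {
  awk -v n="$1" -v s="$2" 'BEGIN { if (s <= 0) exit 1; printf "%.2f\n", n / s }'
}
```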

data_dir table

The data_dir table contains data directory information records.

| field | type | is_tag | comment |
| :--- | :--- | :--- | :--- |
| ts | TIMESTAMP | | timestamp |
| name | NCHAR | | data directory; default is `/var/lib/taos` |
| level | INT | | level for multi-level storage |
| avail | BIGINT | | available space for the data directory in bytes |
| used | BIGINT | | used space for the data directory in bytes |
| total | BIGINT | | total space for the data directory in bytes |
| dnode_id | INT | TAG | dnode ID |
| dnode_ep | NCHAR | TAG | dnode endpoint |
| cluster_id | NCHAR | TAG | cluster ID |
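The `used` and `total` columns give the fill level of each data directory, which is a common alerting signal. A sketch, assuming the default `log` database, dnode 1, and the default directory `/var/lib/taos` in the example query; `DATA_DIR_SQL`, `usage_pct`, and `over_threshold` are hypothetical names.

```bash
#!/usr/bin/env bash
# Latest usage of one data directory (example dnode and path; adjust as needed).
DATA_DIR_SQL="SELECT LAST(used), LAST(total) FROM log.data_dir WHERE dnode_id = 1 AND name = '/var/lib/taos';"

# Percentage of the directory in use, from the "used" and "total" columns (bytes).
# Usage: usage_pct <used> <total>
usage_pct() {
  awk -v u="$1" -v t="$2" 'BEGIN { if (t <= 0) exit 1; printf "%.1f\n", 100 * u / t }'
}

# Succeeds (exit 0) when usage is at or above the threshold percentage.
# Usage: over_threshold <used> <total> <threshold_pct>
over_threshold() {
  awk -v u="$1" -v t="$2" -v th="$3" 'BEGIN { exit !(t > 0 && 100 * u / t >= th) }'
}
```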

log_dir table

The log_dir table contains log directory information records.

| field | type | is_tag | comment |
| :--- | :--- | :--- | :--- |
| ts | TIMESTAMP | | timestamp |
| name | NCHAR | | log directory; default is `/var/log/taos/` |
| avail | BIGINT | | available space for the log directory in bytes |
| used | BIGINT | | used space for the log directory in bytes |
| total | BIGINT | | total space for the log directory in bytes |
| dnode_id | INT | TAG | dnode ID |
| dnode_ep | NCHAR | TAG | dnode endpoint |
| cluster_id | NCHAR | TAG | cluster ID |

temp_dir table

The temp_dir table contains temporary directory information records.

| field | type | is_tag | comment |
| :--- | :--- | :--- | :--- |
| ts | TIMESTAMP | | timestamp |
| name | NCHAR | | temporary directory; default is `/tmp/` |
| avail | BIGINT | | available space for the temporary directory in bytes |
| used | BIGINT | | used space for the temporary directory in bytes |
| total | BIGINT | | total space for the temporary directory in bytes |
| dnode_id | INT | TAG | dnode ID |
| dnode_ep | NCHAR | TAG | dnode endpoint |
| cluster_id | NCHAR | TAG | cluster ID |

vgroups_info table

The vgroups_info table contains vgroup information records.

| field | type | is_tag | comment |
| :--- | :--- | :--- | :--- |
| ts | TIMESTAMP | | timestamp |
| vgroup_id | INT | | vgroup ID |
| database_name | VARCHAR | | database of the vgroup |
| tables_num | BIGINT | | number of tables in the vgroup |
| status | VARCHAR | | vgroup status |
| dnode_id | INT | TAG | dnode ID |
| dnode_ep | NCHAR | TAG | dnode endpoint |
| cluster_id | NCHAR | TAG | cluster ID |

vnodes_role table

The vnodes_role table contains vnode role information records.

| field | type | is_tag | comment |
| :--- | :--- | :--- | :--- |
| ts | TIMESTAMP | | timestamp |
| vnode_role | VARCHAR | | vnode role: leader or follower |
| dnode_id | INT | TAG | dnode ID |
| dnode_ep | NCHAR | TAG | dnode endpoint |
| cluster_id | NCHAR | TAG | cluster ID |

log_summary table

The log_summary table contains log summary information records.

| field | type | is_tag | comment |
| :--- | :--- | :--- | :--- |
| ts | TIMESTAMP | | timestamp |
| error | INT | | error count |
| info | INT | | info count |
| debug | INT | | debug count |
| trace | INT | | trace count |
| dnode_id | INT | TAG | dnode ID |
| dnode_ep | NCHAR | TAG | dnode endpoint |
| cluster_id | NCHAR | TAG | cluster ID |

grants_info table

The grants_info table contains grants information records.

| field | type | is_tag | comment |
| :--- | :--- | :--- | :--- |
| ts | TIMESTAMP | | timestamp |
| expire_time | BIGINT | | time until the grants expire, in seconds |
| timeseries_used | BIGINT | | number of time series used |
| timeseries_total | BIGINT | | total number of time series |
| dnode_id | INT | TAG | dnode ID |
| dnode_ep | NCHAR | TAG | dnode endpoint |
| cluster_id | NCHAR | TAG | cluster ID |
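Because `expire_time` is expressed in seconds, converting it to days makes renewal planning easier. A sketch, assuming the default `log` database; `GRANTS_SQL`, `expire_days`, and `expires_within` are hypothetical names.

```bash
#!/usr/bin/env bash
# Latest grants usage (assumes the default "log" database).
GRANTS_SQL="SELECT LAST(expire_time), LAST(timeseries_used), LAST(timeseries_total) FROM log.grants_info;"

# Whole days left before the grants expire, from expire_time (seconds).
# Usage: expire_days <expire_time>
expire_days() {
  echo $(( $1 / 86400 ))
}

# Succeeds when the grants expire within the given number of days.
# Usage: expires_within <expire_time> <days>
expires_within() {
  [ "$1" -le $(( $2 * 86400 )) ]
}
```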

keeper_monitor table

The keeper_monitor table contains taosKeeper monitoring information records.

| field | type | is_tag | comment |
| :--- | :--- | :--- | :--- |
| ts | TIMESTAMP | | timestamp |
| cpu | FLOAT | | CPU usage |
| mem | FLOAT | | memory usage |
| identify | NCHAR | TAG | |

taosadapter_restful_http_request_total table

The taosadapter_restful_http_request_total table contains taosAdapter REST request information records. The timestamp column of this table is `_ts`.

| field | type | is_tag | comment |
| :--- | :--- | :--- | :--- |
| _ts | TIMESTAMP | | timestamp |
| gauge | DOUBLE | | metric value |
| client_ip | NCHAR | TAG | client IP |
| endpoint | NCHAR | TAG | taosAdapter endpoint |
| request_method | NCHAR | TAG | request method |
| request_uri | NCHAR | TAG | request URI |
| status_code | NCHAR | TAG | status code |

taosadapter_restful_http_request_fail table

The taosadapter_restful_http_request_fail table contains taosAdapter failed REST request information records. The timestamp column of this table is `_ts`.

| field | type | is_tag | comment |
| :--- | :--- | :--- | :--- |
| _ts | TIMESTAMP | | timestamp |
| gauge | DOUBLE | | metric value |
| client_ip | NCHAR | TAG | client IP |
| endpoint | NCHAR | TAG | taosAdapter endpoint |
| request_method | NCHAR | TAG | request method |
| request_uri | NCHAR | TAG | request URI |
| status_code | NCHAR | TAG | status code |

taosadapter_restful_http_request_in_flight table

The taosadapter_restful_http_request_in_flight table contains real-time information about taosAdapter REST requests in flight. The timestamp column of this table is `_ts`.

| field | type | is_tag | comment |
| :--- | :--- | :--- | :--- |
| _ts | TIMESTAMP | | timestamp |
| gauge | DOUBLE | | metric value |
| endpoint | NCHAR | TAG | taosAdapter endpoint |

taosadapter_restful_http_request_summary_milliseconds table

The taosadapter_restful_http_request_summary_milliseconds table contains summary records of taosAdapter REST requests, in milliseconds. The timestamp column of this table is `_ts`.

| field | type | is_tag | comment |
| :--- | :--- | :--- | :--- |
| _ts | TIMESTAMP | | timestamp |
| count | DOUBLE | | |
| sum | DOUBLE | | |
| 0.5 | DOUBLE | | |
| 0.9 | DOUBLE | | |
| 0.99 | DOUBLE | | |
| 0.1 | DOUBLE | | |
| 0.2 | DOUBLE | | |
| endpoint | NCHAR | TAG | taosAdapter endpoint |
| request_method | NCHAR | TAG | request method |
| request_uri | NCHAR | TAG | request URI |
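Column names such as `0.5` start with a digit, and `count` and `sum` are also SQL function names, so they should be enclosed in backquotes in a query. In a shell, keep such a query in single quotes so that the backquotes are not treated as command substitution. The sketch below assumes the default `log` database and assumes that `count` and `sum` follow Prometheus summary semantics (number of observed requests and total latency in milliseconds), in which case `sum / count` is the average latency. `LATENCY_SQL` and `mean_latency_ms` are hypothetical names.

```bash
#!/usr/bin/env bash
# Single quotes keep the backquoted identifiers away from shell command substitution.
LATENCY_SQL='SELECT `count`, `sum`, `0.5`, `0.9`, `0.99` FROM log.taosadapter_restful_http_request_summary_milliseconds ORDER BY _ts DESC LIMIT 5;'

# Average latency in ms = sum / count (assumes Prometheus summary semantics).
# Fails when count is not positive.
# Usage: mean_latency_ms <count> <sum>
mean_latency_ms() {
  awk -v c="$1" -v s="$2" 'BEGIN { if (c <= 0) exit 1; printf "%.2f\n", s / c }'
}
```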

taosadapter_system_mem_percent table

The taosadapter_system_mem_percent table contains taosAdapter memory usage information. The timestamp column of this table is `_ts`.

| field | type | is_tag | comment |
| :--- | :--- | :--- | :--- |
| _ts | TIMESTAMP | | timestamp |
| gauge | DOUBLE | | metric value |
| endpoint | NCHAR | TAG | taosAdapter endpoint |

taosadapter_system_cpu_percent table

The taosadapter_system_cpu_percent table contains taosAdapter CPU usage information. The timestamp column of this table is `_ts`.

| field | type | is_tag | comment |
| :--- | :--- | :--- | :--- |
| _ts | TIMESTAMP | | timestamp |
| gauge | DOUBLE | | metric value |
| endpoint | NCHAR | TAG | taosAdapter endpoint |