Merge pull request #29732 from taosdata/docs/TS-5965
update TDinsight doc
commit d772b95f12

@@ -70,6 +70,7 @@ Metric details (from top to bottom, left to right):
- **Databases** - Number of databases.
- **Connections** - Current number of connections.
- **DNodes/MNodes/VGroups/VNodes**: Total and alive count of each resource.
- **Classified Connection Counts**: The current number of active connections, classified by user, application, and IP.
- **DNodes/MNodes/VGroups/VNodes Alive Percent**: The ratio of alive to total for each resource. An alert rule is enabled and triggers when the resource survival rate (the average ratio of healthy resources within 1 minute) is below 100%.
- **Measuring Points Used**: Number of measuring points used, with an alert rule enabled (no data in the community edition, where it is healthy by default).
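
The Alive Percent check described above can be sketched in a few lines of Python. This is a minimal illustration, not TDinsight's or Grafana's actual implementation: the `Sample` type, its fields, and the choice not to fire when the window holds no data are assumptions. Only the alive/total ratio, the 1-minute averaging window, and the 100% threshold come from the dashboard description.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Sample:
    """One observation of a resource type (hypothetical structure)."""
    ts: float   # sample time, in seconds
    alive: int  # alive resources, e.g. alive dnodes
    total: int  # total resources of that type


def alive_percent(samples: List[Sample], now: float, window_s: float = 60.0) -> Optional[float]:
    """Average alive/total ratio, in percent, over [now - window_s, now)."""
    ratios = [
        s.alive / s.total
        for s in samples
        if now - window_s <= s.ts < now and s.total > 0
    ]
    if not ratios:
        return None  # no data in the window
    return sum(ratios) / len(ratios) * 100.0


def should_alert(samples: List[Sample], now: float) -> bool:
    """Fire when the 1-minute average survival rate is below 100%."""
    pct = alive_percent(samples, now)
    # Assumption: an empty window does not fire in this sketch.
    return pct is not None and pct < 100.0


if __name__ == "__main__":
    healthy = [Sample(10, 3, 3), Sample(40, 3, 3)]
    degraded = [Sample(10, 3, 3), Sample(40, 2, 3)]
    print(should_alert(healthy, now=60))   # False
    print(should_alert(degraded, now=60))  # True
```

Averaging the per-sample ratios (rather than checking only the latest sample) means a single brief dip below 100% inside the window is enough to fire, which matches the "less than 100%" wording.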

@@ -183,22 +184,22 @@ After importing, click on "Alert rules" on the left side of the Grafana interfac

The specific configuration of the 14 alert rules is as follows:

| Alert rule | Rule threshold | Behavior when no data | Data scanning interval | Duration | SQL |
| ---------- | -------------- | --------------------- | ---------------------- | -------- | --- |
| CPU load of dnode | average > 80% | Trigger alert | 5 minutes | 5 minutes | `select now(), dnode_id, last(cpu_system) as cup_use from log.taosd_dnodes_info where _ts >= (now - 5m) and _ts < now partition by dnode_id having first(_ts) > 0` |
| Memory usage of dnode | average > 60% | Trigger alert | 5 minutes | 5 minutes | `select now(), dnode_id, last(mem_engine) / last(mem_total) * 100 as taosd from log.taosd_dnodes_info where _ts >= (now - 5m) and _ts < now partition by dnode_id` |
| Disk usage of dnode | > 80% | Trigger alert | 5 minutes | 5 minutes | `select now(), dnode_id, data_dir_level, data_dir_name, last(used) / last(total) * 100 as used from log.taosd_dnodes_data_dirs where _ts >= (now - 5m) and _ts < now partition by dnode_id, data_dir_level, data_dir_name` |
| Authorization expiration | < 60 days | Trigger alert | 1 day | 0 seconds | `select now(), cluster_id, last(grants_expire_time) / 86400 as expire_time from log.taosd_cluster_info where _ts >= (now - 24h) and _ts < now partition by cluster_id having first(_ts) > 0` |
| Used measuring points relative to the authorized quota | >= 90% | Trigger alert | 1 day | 0 seconds | `select now(), cluster_id, CASE WHEN max(grants_timeseries_total) > 0.0 THEN max(grants_timeseries_used) / max(grants_timeseries_total) * 100.0 ELSE 0.0 END AS result from log.taosd_cluster_info where _ts >= (now - 30s) and _ts < now partition by cluster_id having timetruncate(first(_ts), 1m) > 0` |
| Number of concurrent query requests | > 100 | Do not trigger alert | 1 minute | 0 seconds | `select now() as ts, count(*) as slow_count from performance_schema.perf_queries` |
| Maximum slow query execution time (no time window) | > 300 seconds | Do not trigger alert | 1 minute | 0 seconds | `select now() as ts, count(*) as slow_count from performance_schema.perf_queries where exec_usec > 300000000` |
| dnode offline | total != alive | Trigger alert | 30 seconds | 0 seconds | `select now(), cluster_id, last(dnodes_total) - last(dnodes_alive) as dnode_offline from log.taosd_cluster_info where _ts >= (now - 30s) and _ts < now partition by cluster_id having first(_ts) > 0` |
| vnode offline | total != alive | Trigger alert | 30 seconds | 0 seconds | `select now(), cluster_id, last(vnodes_total) - last(vnodes_alive) as vnode_offline from log.taosd_cluster_info where _ts >= (now - 30s) and _ts < now partition by cluster_id having first(_ts) > 0` |
| Number of data deletion requests | > 0 | Do not trigger alert | 30 seconds | 0 seconds | ``select now(), count(`count`) as `delete_count` from log.taos_sql_req where sql_type = 'delete' and _ts >= (now - 30s) and _ts < now`` |
| Adapter RESTful request failures | > 5 | Do not trigger alert | 30 seconds | 0 seconds | ``select now(), sum(`fail`) as `Failed` from log.adapter_requests where req_type = 0 and ts >= (now - 30s) and ts < now`` |
| Adapter WebSocket request failures | > 5 | Do not trigger alert | 30 seconds | 0 seconds | ``select now(), sum(`fail`) as `Failed` from log.adapter_requests where req_type = 1 and ts >= (now - 30s) and ts < now`` |
| Missing dnode data reports | < 3 | Trigger alert | 180 seconds | 0 seconds | `select now(), cluster_id, count(*) as dnode_report from log.taosd_cluster_info where _ts >= (now - 180s) and _ts < now partition by cluster_id having timetruncate(first(_ts), 1h) > 0` |
| dnode restart | max(uptime) > last(uptime) | Trigger alert | 90 seconds | 0 seconds | `select now(), dnode_id, max(uptime) - last(uptime) as dnode_restart from log.taosd_dnodes_info where _ts >= (now - 90s) and _ts < now partition by dnode_id` |
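
Several thresholds in the alert rule table depend on unit conversions or on how the SQL result column is read. The Python sketch below mirrors that arithmetic for four of the rules. The function names are hypothetical, and the sketch assumes that `grants_expire_time` holds the number of seconds remaining until the license expires and that a dnode's `uptime` drops back to a small value after a restart. It only illustrates the arithmetic; it is not how Grafana or TDinsight evaluates the rules.

```python
from typing import Sequence

SECONDS_PER_DAY = 86400
SLOW_QUERY_USEC = 300 * 1_000_000  # 300 seconds expressed in microseconds (exec_usec unit)


def authorization_expiring(grants_expire_time_s: float, threshold_days: float = 60) -> bool:
    """last(grants_expire_time) / 86400 gives days left; alert when < 60."""
    return grants_expire_time_s / SECONDS_PER_DAY < threshold_days


def measuring_points_used_pct(used: float, total: float) -> float:
    """Mirrors the CASE WHEN: used / total * 100, or 0 when total is 0."""
    return used / total * 100.0 if total > 0 else 0.0


def is_slow_query(exec_usec: int) -> bool:
    """Mirrors the filter exec_usec > 300000000 (strictly greater)."""
    return exec_usec > SLOW_QUERY_USEC


def dnode_restarted(uptimes: Sequence[float]) -> bool:
    """max(uptime) - last(uptime) > 0 means uptime fell inside the window."""
    return max(uptimes) - uptimes[-1] > 0


if __name__ == "__main__":
    print(authorization_expiring(30 * SECONDS_PER_DAY))  # True
    print(measuring_points_used_pct(950, 1000) >= 90)    # True
    print(is_slow_query(300_000_000))                    # False
    print(dnode_restarted([100.0, 130.0, 160.0, 5.0]))   # True
```

The restart rule works without any restart counter: as long as uptime only grows, the last value in the window is also the maximum, so any positive difference indicates a reset.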

TDengine users can modify and refine these alert rules to suit their own business needs. In Grafana 7.5 and earlier, the Dashboard and Alert rules features are combined; in later versions they are separate. For compatibility with Grafana 7.5 and earlier, TDinsight includes an Alert Used Only panel, which is needed only for those versions.

@@ -258,19 +259,19 @@ Install and configure TDinsight dashboard in Grafana on Ubuntu 18.04/20.04 syste

Most command-line options can also be set through environment variables.

| Short Option | Long Option | Environment Variable | Description |
| ------------ | -------------------------- | ---------------------------- | ----------------------------------------------------------------------- |
| -v | --plugin-version | TDENGINE_PLUGIN_VERSION | TDengine datasource plugin version. [default: latest] |
| -P | --grafana-provisioning-dir | GF_PROVISIONING_DIR | Grafana provisioning directory. [default: `/etc/grafana/provisioning/`] |
| -G | --grafana-plugins-dir | GF_PLUGINS_DIR | Grafana plugins directory. [default: `/var/lib/grafana/plugins`] |
| -O | --grafana-org-id | GF_ORG_ID | Grafana organization ID. [default: 1] |
| -n | --tdengine-ds-name | TDENGINE_DS_NAME | TDengine datasource name. [default: TDengine] |
| -a | --tdengine-api | TDENGINE_API | TDengine REST API endpoint. [default: `http://127.0.0.1:6041`] |
| -u | --tdengine-user | TDENGINE_USER | TDengine user name. [default: root] |
| -p | --tdengine-password | TDENGINE_PASSWORD | TDengine password. [default: taosdata] |
| -i | --tdinsight-uid | TDINSIGHT_DASHBOARD_UID | TDinsight dashboard `uid`. [default: tdinsight] |
| -t | --tdinsight-title | TDINSIGHT_DASHBOARD_TITLE | TDinsight dashboard title. [default: TDinsight] |
| -e | --tdinsight-editable | TDINSIGHT_DASHBOARD_EDITABLE | Whether the provisioned dashboard is editable. [default: false] |
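
Because most options can come either from a flag or from an environment variable, one way to model the lookup is to let the environment variable replace the documented default and let an explicit flag override both. The Python sketch below does this with `argparse` for a few long options from the table. That precedence order is an assumption about how the install script behaves, and the `resolve` helper and `OPTIONS` list are hypothetical.

```python
import argparse
import os
from typing import Mapping, Optional, Sequence

# (long option, environment variable, default) for a subset of the table rows.
OPTIONS = [
    ("--tdengine-api", "TDENGINE_API", "http://127.0.0.1:6041"),
    ("--tdengine-user", "TDENGINE_USER", "root"),
    ("--tdengine-ds-name", "TDENGINE_DS_NAME", "TDengine"),
    ("--grafana-org-id", "GF_ORG_ID", "1"),
]


def resolve(argv: Optional[Sequence[str]] = None,
            env: Optional[Mapping[str, str]] = None) -> argparse.Namespace:
    """Hypothetical option resolver: flag > environment variable > default."""
    env = os.environ if env is None else env
    parser = argparse.ArgumentParser(description="TDinsight option sketch")
    for flag, var, default in OPTIONS:
        # The environment value, if set, replaces the documented default;
        # argparse then lets an explicit flag override it.
        parser.add_argument(flag, default=env.get(var, default))
    return parser.parse_args(argv)


if __name__ == "__main__":
    opts = resolve(["--tdengine-user", "admin"], {"TDENGINE_API": "http://10.0.0.5:6041"})
    print(opts.tdengine_api)    # http://10.0.0.5:6041
    print(opts.tdengine_user)   # admin
    print(opts.grafana_org_id)  # 1
```

Passing `env` explicitly keeps the resolver testable without touching the real process environment.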

:::note

The new version of the plugin uses Grafana unified alerting, so the `-E` option is no longer supported.

Binary file not shown (image size: 104 KiB before, 131 KiB after).

@@ -60,6 +60,7 @@ The TDinsight dashboard is designed to show the usage and status of TDengine-related resources,

- **Databases** - Number of databases.
- **Connections** - Current number of connections.
- **DNodes/MNodes/VGroups/VNodes**: Total and alive count of each resource.
- **Classified Connection Counts**: Current number of active connections, classified by user, application, and IP.
- **DNodes/MNodes/VGroups/VNodes Alive Percent**: The ratio of alive to total for each resource. An alert rule is enabled and triggers when the resource survival rate (the average ratio of healthy resources within 1 minute) is below 100%.
- **Measuring Points Used**: Number of measuring points used, with an alert rule enabled (no data in the community edition, where it is healthy by default).