Merge pull request #29022 from taosdata/merge/mainto3.0

merge: from main to 3.0 branch
This commit is contained in:
Shengliang Guan 2024-12-04 09:03:46 +08:00 committed by GitHub
commit beec1c55e4
No known key found for this signature in database
GPG Key ID: B5690EEEBB952194
185 changed files with 14586 additions and 13215 deletions

View File

@ -24,16 +24,25 @@
English | [简体中文](README-CN.md) | [TDengine Cloud](https://cloud.tdengine.com) | [Learn more about TSDB](https://tdengine.com/tsdb/)
# Table of Contents
1. [What is TDengine?](#what-is-tdengine)
1. [Documentation](#documentation)
1. [Building](#building)
1. [Installing](#installing)
1. [Try TDengine](#try-tdengine)
1. [Developing with TDengine](#developing-with-tdengine)
1. [Contribute to TDengine](#contribute-to-tdengine)
1. [Join the TDengine Community](#join-the-tdengine-community)
# What is TDengine
1. [What is TDengine?](#1-what-is-tdengine)
2. [Documentation](#2-documentation)
3. [Building](#3-building)
1. [Install build tools](#31-install-build-tools)
1. [Get the source codes](#32-get-the-source-codes)
1. [Special Note](#33-special-note)
1. [Build TDengine](#34-build-tdengine)
4. [Installing](#4-installing)
1. [On Linux platform](#41-on-linux-platform)
1. [On Windows platform](#42-on-windows-platform)
1. [On macOS platform](#43-on-macos-platform)
1. [Quick Run](#44-quick-run)
5. [Try TDengine](#5-try-tdengine)
6. [Developing with TDengine](#6-developing-with-tdengine)
7. [Contribute to TDengine](#7-contribute-to-tdengine)
8. [Join the TDengine Community](#8-join-the-tdengine-community)
# 1. What is TDengine
TDengine is an open source, high-performance, cloud native [time-series database](https://tdengine.com/tsdb/) optimized for Internet of Things (IoT), Connected Cars, and Industrial IoT. It enables efficient, real-time data ingestion, processing, and monitoring of TB and even PB scale data per day, generated by billions of sensors and data collectors. TDengine differentiates itself from other time-series databases with the following advantages:
@ -43,19 +52,19 @@ TDengine is an open source, high-performance, cloud native [time-series database
- **[Cloud Native](https://tdengine.com/tdengine/cloud-native-time-series-database/)**: Through native distributed design, sharding and partitioning, separation of compute and storage, RAFT, support for kubernetes deployment and full observability, TDengine is a cloud native Time-Series Database and can be deployed on public, private or hybrid clouds.
- **[Ease of Use](https://tdengine.com/tdengine/easy-time-series-data-platform/)**: For administrators, TDengine significantly reduces the effort to deploy and maintain. For developers, it provides a simple interface, simplified solution and seamless integrations for third party tools. For data users, it gives easy data access.
- **[Easy Data Analytics](https://tdengine.com/tdengine/time-series-data-analytics-made-easy/)**: Through super tables, storage and compute separation, data partitioning by time interval, pre-computation and other means, TDengine makes it easy to explore, format, and get access to data in a highly efficient way.
- **[Open Source](https://tdengine.com/tdengine/open-source-time-series-database/)**: TDengine's core modules, including the cluster feature, are all available under open source licenses. It has gathered 19.9k stars on GitHub, and there is an active developer community with over 139k running instances worldwide.
For a full list of TDengine competitive advantages, please [check here](https://tdengine.com/tdengine/). The easiest way to experience TDengine is through [TDengine Cloud](https://cloud.tdengine.com).
# Documentation
# 2. Documentation
For user manual, system design and architecture, please refer to [TDengine Documentation](https://docs.tdengine.com) ([TDengine 文档](https://docs.taosdata.com))
# Building
# 3. Building
At the moment, TDengine server supports running on Linux, Windows, and macOS systems. Any application can also choose the RESTful interface provided by taosAdapter to connect to the taosd service. TDengine supports the x64 and ARM64 CPU architectures and will support MIPS64, Alpha64, ARM32, RISC-V, and other CPU architectures in the future. Building in a cross-compiling environment is not currently supported.
@ -65,7 +74,7 @@ TDengine provide a few useful tools such as taosBenchmark (was named taosdemo) a
To build TDengine, use [CMake](https://cmake.org/) 3.13.0 or higher versions in the project directory.
## Install build tools
## 3.1 Install build tools
### Ubuntu 18.04 and above or Debian
@ -158,7 +167,7 @@ cmake .. -DBUILD_HTTP=false
TDengine includes a few components developed in the Rust language. Please refer to the official documentation at rust-lang.org to set up the Rust environment.
## Get the source codes
## 3.2 Get the source codes
First of all, you can clone the source code from GitHub:
@ -174,11 +183,11 @@ You can modify the file ~/.gitconfig to use ssh protocol instead of https for be
insteadOf = https://github.com/
```
## Special Note
## 3.3 Special Note
The [JDBC Connector](https://github.com/taosdata/taos-connector-jdbc), [Go Connector](https://github.com/taosdata/driver-go), [Python Connector](https://github.com/taosdata/taos-connector-python), [Node.js Connector](https://github.com/taosdata/taos-connector-node), [C# Connector](https://github.com/taosdata/taos-connector-dotnet), [Rust Connector](https://github.com/taosdata/taos-connector-rust), and [Grafana plugin](https://github.com/taosdata/grafanaplugin) have been moved to standalone repositories.
## Build TDengine
## 3.4 Build TDengine
### On Linux platform
@ -254,9 +263,9 @@ mkdir debug && cd debug
cmake .. && cmake --build .
```
# Installing
# 4. Installing
## On Linux platform
## 4.1 On Linux platform
After building successfully, TDengine can be installed by
@ -282,7 +291,7 @@ taos
If the TDengine CLI connects to the server successfully, welcome messages and version info are printed. Otherwise, an error message is shown.
## On Windows platform
## 4.2 On Windows platform
After building successfully, TDengine can be installed by:
@ -290,8 +299,7 @@ After building successfully, TDengine can be installed by:
nmake install
```
## On macOS platform
## 4.3 On macOS platform
After building successfully, TDengine can be installed by:
@ -317,7 +325,7 @@ taos
If the TDengine CLI connects to the server successfully, welcome messages and version info are printed. Otherwise, an error message is shown.
## Quick Run
## 4.4 Quick Run
If you don't want to run TDengine as a service, you can run it in the current shell. For example, to quickly start a TDengine server after building, run the command below in the terminal (we take Linux as an example; on Windows, the command is `taosd.exe`):
@ -333,7 +341,7 @@ In another terminal, use the TDengine CLI to connect the server:
option "-c test/cfg" specifies the system configuration file directory.
# Try TDengine
# 5. Try TDengine
It is easy to run SQL commands from the TDengine CLI, which works in the same way as other SQL databases.
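For example, a minimal session might look like the following. This is a hedged sketch; the database name `demo`, table `t`, column `speed`, and sample values are illustrative and may differ from the README's own example.

```sql
CREATE DATABASE demo;
USE demo;
CREATE TABLE t (ts TIMESTAMP, speed INT);
INSERT INTO t VALUES ('2019-07-15 00:00:00', 10);
INSERT INTO t VALUES ('2019-07-15 01:00:00', 20);
SELECT * FROM t;
```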
@ -351,7 +359,7 @@ SELECT * FROM t;
Query OK, 2 row(s) in set (0.001700s)
```
# Developing with TDengine
# 6. Developing with TDengine
## Official Connectors
@ -366,11 +374,11 @@ TDengine provides abundant developing tools for users to develop on TDengine. Fo
- [C#](https://docs.tdengine.com/reference/connectors/csharp/)
- [RESTful API](https://docs.tdengine.com/reference/connectors/rest-api/)
# Contribute to TDengine
# 7. Contribute to TDengine
Please follow the [contribution guidelines](CONTRIBUTING.md) to contribute to the project.
# Join the TDengine Community
# 8. Join the TDengine Community
For more information about TDengine, you can follow us on social media and join our Discord server:

View File

@ -2,7 +2,7 @@
IF (DEFINED VERNUMBER)
SET(TD_VER_NUMBER ${VERNUMBER})
ELSE ()
SET(TD_VER_NUMBER "3.3.4.3.alpha")
SET(TD_VER_NUMBER "3.3.4.8.alpha")
ENDIF ()
IF (DEFINED VERCOMPATIBLE)

View File

@ -14,6 +14,6 @@ This website contains the user documentation for TDengine:
- Software developers can consult the [Developer's Guide](developer-guide/) for information about creating applications that interoperate with TDengine and writing user-defined functions that run within TDengine.
- Database administrators will find valuable information in [Operations and Maintenance](operations-and-maintenance/) and [TDengine Reference](tdengine-reference/) to assist in managing, maintaining, and monitoring their TDengine deployments.
TDengine, including this documentation, is an open-source project, and we welcome contributions from the community. If you find any errors or unclear descriptions, click **Edit this document** at the bottom of the page to submit your corrections. To view the source code, visit our [GitHub repository](https://github.com/taosdata/tdengine).
TDengine, including this documentation, is an open-source project, and we welcome contributions from the community. If you find any errors or unclear descriptions, click **Edit this page** at the bottom of the page to submit your corrections. To view the source code, visit our [GitHub repository](https://github.com/taosdata/tdengine).
Together, we make a difference!

docs/en/03-intro.md Normal file
View File

@ -0,0 +1,76 @@
---
sidebar_label: Introduction
title: Introduction to TDengine
slug: /introduction
---
import Image from '@theme/IdealImage';
import imgEcosystem from './assets/introduction-01.png';
TDengine is a time-series database designed to help traditional industries overcome the challenges of Industry 4.0 and Industrial IoT. It enables real-time ingestion, storage, analysis, and distribution of petabytes of data per day, generated by billions of sensors and data collectors. By making big data accessible and affordable, TDengine helps everyone — from independent developers and startups to industry stalwarts and multinationals — unlock the true value of their data.
## TDengine Offerings
- [TDengine OSS](https://tdengine.com/oss/) is an open-source, cloud-native time-series database. Its source code is licensed under the AGPL and publicly available on GitHub. TDengine OSS serves as the code base for our paid offerings and provides the same core functionality. Unlike some open-core products, TDengine OSS is a full-featured solution that includes the necessary components for production use, including clustering.
- [TDengine Enterprise](https://tdengine.com/enterprise/) is a high-performance big data platform designed for Industry 4.0 and the Industrial IoT. Built on the open-source TDengine OSS, it delivers an enterprise-grade feature set tailored to the needs of traditional industries.
- [TDengine Cloud](https://cloud.tdengine.com) delivers all features of TDengine Enterprise as a fully managed service that can run on Amazon Web Services (AWS), Microsoft Azure, and Google Cloud Platform (GCP).
## What Makes TDengine Different
TDengine differentiates itself from typical time-series databases with the following four core competencies:
1. **High Performance at Any Scale:** With its distributed scalable architecture that grows together with your business, TDengine can store and process massive datasets up to 10.6x faster than other TSDBs — all while providing the split-second latency that your real-time visualization and reporting apps demand.
2. **Efficient Data Storage:** With its unique design and data model, TDengine provides the most cost-effective solution for storing your operations data, including tiered storage, S3, and 10:1 data compression, ensuring that you can get valuable business insights from your data without breaking the bank.
3. **Data Consolidation Across Sites:** With built-in connectors for a wide variety of industrial sources — MQTT, Kafka, OPC, PI System, and more — TDengine delivers zero-code data ingestion and extract, transform, and load (ETL) in a centralized platform that acts as a single source of truth for your business.
4. **Comprehensive Solution for Industrial Data:** With out-of-the-box data subscription, caching, and stream processing, TDengine is more than just a time-series database — it includes all key components needed for industrial data storage and processing built into a single product and accessible through familiar SQL statements.
## What TDengine Delivers
With its innovative "one table per device" design, unique supertable concept, and highly optimized storage engine, TDengine is purpose-built to meet the unique needs of ingesting, querying, and storing massive time-series datasets. In its role at the core of the industrial data architecture, it provides the following functionality:
1. [Data Ingestion](../basic-features/data-ingestion/): You can write data into TDengine with standard SQL or in schemaless mode over the InfluxDB Line Protocol, OpenTSDB Telnet Protocol, and OpenTSDB JSON Protocol. TDengine also seamlessly integrates with data collectors like Telegraf and Prometheus.
2. [Data Querying](../basic-features/data-querying): In addition to standard SQL query syntax, TDengine includes time-series extensions such as downsampling and windowing and functions such as cumulative sum and time-weighted average to better meet the needs of time-series data processing (see the example query after this list). TDengine also supports user-defined functions (UDF), which can be written in C or Python.
3. [Read Caching](../advanced-features/caching/): TDengine uses a time-driven first-in, first-out (FIFO) cache management strategy, keeping the most recent data in the cache. This makes it easy and fast to access the real-time status of any metric without the need for other caching tools like Redis, simplifying system architecture and reducing operational costs.
4. [Stream Processing](../advanced-features/stream-processing/): TDengine's built-in stream processing engine provides the capability to process data streams in real-time as they are written, supporting not only continuous queries but also event-driven stream processing. This lightweight but optimized solution can return results in milliseconds even during high-throughput data ingestion.
5. [Data Subscription](../advanced-features/data-subscription): TDengine includes data subscription out of the box, eliminating the need to deploy other complex products to provide this critical feature. You can define topics in SQL, subscribing to a query, supertable, or database, and use a Kafka-like API to consume these topics in your applications.
6. [Visualization](../third-party-tools/visualization/) and [BI](../third-party-tools/analytics/): Through its REST API and standard JDBC and ODBC interfaces, TDengine seamlessly integrates with leading platforms like Grafana, Power BI, and Seeq.
7. [Clustering](../operations-and-maintenance/deploy-your-cluster/): TDengine supports clustered deployment so that you can add nodes to scale your system and increase processing capacity. At the same time, it provides high availability through multi-replica technology and supports Kubernetes deployment. It also offers various operational tools to facilitate system administrators in managing and maintaining robust cluster operations.
8. Data Migration: TDengine provides various convenient data import and export functions, including script file import/export, data file import/export, and the [taosdump](../tdengine-reference/tools/taosdump/) tool.
9. [Client Libraries](../tdengine-reference/client-libraries/): TDengine offers client libraries for a variety of different programming languages, including Java, Python, and C/C++, so that you can build custom applications in your favorite language. Sample code that you can copy and paste into your apps is also provided to make the development process even easier.
10. O&M Tools: You can use the interactive [command-line interface (CLI)](../tdengine-reference/tools/tdengine-cli/) for managing clusters, checking system status, and performing ad hoc queries. The stress-testing tool [taosBenchmark](../tdengine-reference/tools/taosbenchmark/) is a quick way to generate sample data and test the performance of TDengine. And TDengine's GUI component [taosExplorer](../tdengine-reference/components/taosexplorer/) simplifies the operations and management process.
11. [Data Security](https://tdengine.com/security/): With TDengine Enterprise, you can implement fine-grained access controls with rich user and permissions management features. IP whitelisting helps you control which accounts can access your cluster from which servers, and audit logs record sensitive operations. In TDengine Enterprise, you can also configure encryption in transit on the server level and encryption at rest on the database level, which is transparent to operations and has minimal impact on performance.
12. [Zero-Code Data Connectors](https://tdengine.com/data-sources/): TDengine Enterprise includes zero-code connectors for industrial data protocols like MQTT and OPC, traditional data historians like AVEVA PI System and Wonderware Historian, relational databases like Oracle Database and SQL Server, and other time-series databases like InfluxDB and OpenTSDB. With these connectors, you can synchronize or migrate diverse time-series datasets to TDengine in the GUI without touching a line of code.
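As an illustration of the time-series query extensions mentioned in item 2 above, the following is a hedged sketch that assumes a `meters` supertable with `current` and `voltage` metrics, as used elsewhere in the documentation:

```sql
-- Downsample the last day of data into 10-minute windows, per device (subtable)
SELECT _wstart, tbname, AVG(current) AS avg_current, MAX(voltage) AS max_voltage
FROM meters
WHERE ts >= NOW - 1d
PARTITION BY tbname
INTERVAL(10m);
```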
## How TDengine Benefits You
With its high performance, standard SQL support, and component integration, TDengine can reduce your total cost of data operations:
1. **Industry-leading performance:** TDengine significantly outperforms other time-series databases with up to 16 times faster ingestion and over 100 times higher query performance than InfluxDB or TimescaleDB while requiring fewer storage resources. Because TDengine ingests data faster, stores data more efficiently, and responds to queries more quickly, it uses fewer CPU and storage resources and adds less to your bills.
2. **Easy to use with no learning costs:** TDengine is easier to use than other time-series database solutions and does not require specialized training. This is because TDengine supports standard SQL, is easy to integrate with third-party tools, and comes with client libraries for various programming languages, including sample code.
3. **Simplified, fully integrated solution:** By including stream processing, caching, and data subscription as built-in components at no extra cost, TDengine eliminates the need to deploy third-party products just to process time-series data. Its components are simple, easy to use, and purpose-built to process time-series data.
## TDengine Ecosystem
With its open ecosystem, TDengine allows you the freedom to construct the data stack that is best for your business. Its support for standard SQL, zero-code connectors for a wide range of industrial protocols and data solutions, and seamless integration with visualization, analytics, and business intelligence (BI) applications make it easy to fit TDengine into your infrastructure.
<figure>
<Image img={imgEcosystem} alt="TDengine ecosystem"/>
<figcaption>Figure 1. TDengine ecosystem</figcaption>
</figure>
As shown in the figure, TDengine acts as the central source of truth in an industrial data ecosystem, ingesting data from a variety of sources and sharing that data with business applications and stakeholders.
## Application Scenarios
TDengine is the only time-series database purpose-built for industrial scenarios and is fully capable of storing and processing the massive, high-frequency datasets generated by a range of industries, especially the following:
- [Renewable energy](https://tdengine.com/renewable-energy/)
- [Manufacturing](https://tdengine.com/manufacturing/)
- [Connected cars](https://tdengine.com/connected-cars/)
TDengine can also form the core component of a data stack to enable the following industrial applications:
- [Predictive maintenance](https://tdengine.com/predictive-maintenance/)
- [Vibration analysis](https://tdengine.com/high-frequency-data/)
- [Condition monitoring](https://tdengine.com/condition-monitoring)

View File

@ -28,7 +28,7 @@ To install TDengine on your local machine instead of in a container, see [Get St
```bash
docker pull tdengine/tdengine:3.3.0.0
```
:::
2. Start a container with the following command:
@ -42,7 +42,7 @@ To install TDengine on your local machine instead of in a container, see [Get St
```bash
docker run -d -v <local-data-directory>:/var/lib/taos -v <local-log-directory>:/var/log/taos -p 6030:6030 -p 6041:6041 -p 6043-6060:6043-6060 -p 6043-6060:6043-6060/udp tdengine/tdengine
```
3. Verify that the container is running properly:
```bash
@ -89,7 +89,6 @@ When the ingestion process is finished, taosBenchmark outputs the time taken to
After inserting data with taosBenchmark as described above, you can use the TDengine CLI to test TDengine's query performance in your container:
1. Start the TDengine CLI:
```bash

View File

@ -48,7 +48,7 @@ The TDengine OSS installation package is provided for Linux users in .deb, .rpm,
```bash
sudo rpm -ivh TDengine-server-<version>-Linux-x64.rpm
```
Replace `<version>` with the version of the package that you downloaded.
</TabItem>
@ -62,7 +62,7 @@ The TDengine OSS installation package is provided for Linux users in .deb, .rpm,
```bash
tar -zxvf TDengine-server-<version>-Linux-x64.tar.gz
```
Replace `<version>` with the version of the package that you downloaded.
3. In the directory where you decompressed the package, run the following command to install TDengine:
@ -71,9 +71,9 @@ The TDengine OSS installation package is provided for Linux users in .deb, .rpm,
```
:::note
The `install.sh` script requires you to enter configuration information in the terminal. For a non-interactive installation, run `./install.sh -e no`. You can run `./install.sh -h` for detailed information about all parameters.
:::
</TabItem>
@ -118,9 +118,9 @@ The TDengine OSS installation package is provided for Linux users in .deb, .rpm,
2. Run the installation package to install TDengine.
:::note
If the installation is blocked, right-click on the package and choose **Open**.
:::
</TabItem>
@ -138,9 +138,9 @@ The TDengine OSS installation package is provided for Linux users in .deb, .rpm,
```bash
sudo start-all.sh
```
Alternatively, you can manage specific TDengine services through systemd:
```bash
sudo systemctl start taosd
sudo systemctl start taosadapter
@ -157,7 +157,7 @@ The TDengine OSS installation package is provided for Linux users in .deb, .rpm,
</TabItem>
<TabItem label="macOS" value="macos">
Run the following command to start all TDengine services:
```bash
@ -175,7 +175,7 @@ The TDengine OSS installation package is provided for Linux users in .deb, .rpm,
</TabItem>
</Tabs>
You can now work with TDengine on your local machine. For example, you can run the `taos` command to open the TDengine command-line interface.
## What to Do Next

View File

@ -23,7 +23,7 @@ You can register for a TDengine Cloud account for free and automatically obtain
2. Determine whether you want to use any public databases and click **Next**.
The TDengine DB Mart includes several public databases that you can use for testing purposes. To enable access to a public database in your account, select the toggle. You can modify these settings after the account creation process is finished.
3. Create an organization.
1. Enter a name for your organization in TDengine Cloud. This name must be unique.
2. Specify whether to enable single sign-on (SSO).

View File

@ -1,150 +1,157 @@
---
sidebar_label: Data Model
title: Understand the TDengine Data Model
title: The TDengine Data Model
slug: /basic-features/data-model
---
import Image from '@theme/IdealImage';
import dataModel from '../assets/data-model-01.png';
This document describes the data model and provides definitions of terms and concepts used in TDengine.
To clearly explain the concepts of time-series data and facilitate the writing of example programs, the TDengine documentation uses smart meters as an example. These example smart meters can collect three metrics: current, voltage, and phase. In addition, each smart meter also has two static attributes: location and group ID. The data collected by these smart meters is shown in the table below.
The TDengine data model is illustrated in the following figure.
|Device ID| Timestamp | Current | Voltage | Phase | Location | Group ID |
|:-------:|:---------:|:-------:|:-------:|:-----:|:--------:|:--------:|
|d1001 |1538548685000 | 10.3 | 219 | 0.31 | California.SanFrancisco |2|
|d1002 | 1538548684000 | 10.2 | 220 | 0.23 | California.SanFrancisco |3|
|d1003 | 1538548686500 | 11.5 | 221 | 0.35 | California.LosAngeles | 3 |
|d1004 | 1538548685500 | 13.4 | 223 | 0.29 | California.LosAngeles | 2 |
|d1001 | 1538548695000 | 12.6 | 218 | 0.33 | California.SanFrancisco |2|
|d1004 | 1538548696600 | 11.8 | 221 | 0.28 | California.LosAngeles | 2 |
|d1002 | 1538548696650 | 10.3 | 218 | 0.25 | California.SanFrancisco | 3 |
|d1001 | 1538548696800 | 12.3 | 221 | 0.31 | California.SanFrancisco | 2 |
These smart meters collect data based on external trigger events or preset periods, ensuring the continuity and temporality of the data, thus forming a continuously updated data stream.
## Basic Concepts
### Metric
A metric refers to a physical quantity, such as current, voltage, or temperature, obtained from a sensor, device, or other data collection point. Since these physical quantities change over time, the types of data collected are diverse, including integers, floating-point numbers, and strings. As time passes, the stored data will continue to grow. For example, in smart meters, current, voltage, and phase are typical metrics collected.
### Tag
A tag refers to a static attribute associated with a sensor, device, or other data collection point. These are attributes that do not change over time, such as device model, color, or location. The data type of tags can be any type. Although tags themselves are static, in practical applications, you may need to modify, delete, or add tags. Unlike quantities collected, the amount of tag data stored remains relatively stable over time and does not show a significant growth trend. In the example of smart meters, location and group ID are typical tags.
### Data Collection Point
A data collection point (DCP) refers to a hardware or software device responsible for collecting metrics at a certain preset time period or when triggered by specific events. A data collection point can collect one or more quantities at the same time, but these quantities are obtained at the same moment and have the same timestamp. Complex structured devices typically include multiple data collection points, each with different collection cycles, and they operate independently without interference. For example, a car might have a dedicated data collection point for collecting location information, some for monitoring engine status, and others focused on monitoring the interior environment. Thus, a car could contain three different types of data collection points. In the example of smart meters, identifiers such as d1001, d1002, and d1003 represent different data collection points.
### Table
Given that the time-series data collected from DCPs is usually structured, TDengine uses the traditional relational database model to manage data. At the same time, to fully utilize the characteristics of time-series data, TDengine adopts a "one table per device" design, requiring a separate table for each data collection point. For example, if there are millions of smart meters, a corresponding number of tables need to be created in TDengine. In the example data of smart meters, the smart meter with device ID d1001 corresponds to a table in TDengine, and all the time-series data collected by this meter is stored in this table. This design approach retains the usability of relational databases while fully utilizing the unique advantages of time-series data:
1. Since the data generation process at different data collection points is completely independent, and each data collection point has a unique data source, there is only one writer per table. This allows for lock-free data writing, significantly increasing the write speed.
2. For a data collection point, the data it generates is in chronological order, so the write operation can be implemented in an append-only manner, further greatly enhancing the data writing speed.
3. The data from a data collection point is stored continuously in blocks. Thus, reading data from a specific time period can significantly reduce random read operations, dramatically improving the speed of data reading and querying.
4. Within a data block, columnar storage is used, and different compression algorithms can be applied to different data types to improve the compression ratio. Moreover, since the rate of data collection changes is usually slow, the compression ratio will be higher.
If the traditional method of writing data from multiple data collection points into a single table is used, due to uncontrollable network latency, the sequence of data arrival at the server from different data collection points cannot be guaranteed, and the write operation needs to be protected by locks. Moreover, it is difficult to ensure that the data from one data collection point is stored continuously together. Using the method of one data collection point per table can ensure to the greatest extent that the performance of insertion and querying for a single data collection point is optimal, and the data compression ratio is the highest.
In TDengine, the name of the data collection point (e.g., d1001) is usually used as the table name, and each data collection point can have multiple metrics (such as current, voltage, phase, etc.), each corresponding to a column in a table. The data type of the metrics can be integer, floating-point, string, etc.
Additionally, the first column of the table must be a timestamp. For each metric, TDengine will use the first column timestamp to build an index and use columnar storage. For complex devices, such as cars, which have multiple data collection points, multiple tables need to be created for one car.
### Supertable
Although the "one table per device" design helps to manage each collection point specifically, as the number of devices increases, the number of tables also increases dramatically, posing challenges for database management and data analysis. When performing aggregation operations across data collection points, users need to deal with a large number of tables, making the work exceptionally cumbersome.
To solve this problem, TDengine introduces the supertable. A supertable is a data structure that can aggregate certain types of data collection points together into a logically unified table. These data collection points have the same table structure, but their static properties (such as tags) may differ. When creating a supertable, in addition to defining the metrics, it is also necessary to define the tags of the supertable. A supertable must contain at least one timestamp column, one or more metric columns, and one or more tag columns. Moreover, the tags of the supertable can be flexibly added, modified, or deleted.
In TDengine, a table represents a specific data collection point, while a supertable represents a collection of data collection points with the same attributes. Taking smart meters as an example, we can create a supertable for this type of meter, which includes all the common properties and metrics of smart meters. This design not only simplifies table management but also facilitates aggregation operations across data collection points, thereby improving the efficiency of data processing.
### Subtable
A subtable is a logical abstraction of a data collection point and is a specific table belonging to a supertable. You can use the definition of the supertable as a template and create subtables by specifying the tag values of the subtables. Thus, tables generated through the supertable are referred to as subtables. The relationship between the supertable and subtables is mainly reflected in the following aspects.
- A supertable contains multiple subtables, which have the same table structure but different tag values.
- The table structure of subtables cannot be directly modified, but the columns and tags of the supertable can be modified, and the modifications take effect immediately for all subtables.
- A supertable defines a template and does not store any data or tag information itself.
In TDengine, query operations can be performed on both subtables and supertables. For queries on supertables, TDengine treats the data from all subtables as a whole, first filtering out the tables that meet the query conditions through tags, then querying the time-series data on these subtables separately, and finally merging the query results from each subtable. Essentially, by supporting queries on supertables, TDengine achieves efficient aggregation of multiple similar data collection points. To better understand the relationship between metrics, tags, supertables, and subtables, here is an example of a data model for smart meters. You can refer to the data model diagram below for a more intuitive understanding of these concepts.
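For example, a query such as the following hedged sketch (using the smart meter supertable defined later in this document) filters subtables by tag before aggregating their time-series data:

```sql
-- Filter subtables by the group_id tag, then aggregate across the matching subtables
SELECT location, AVG(voltage) AS avg_voltage
FROM meters
WHERE group_id = 2
GROUP BY location;
```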
To better understand the relationship between metrics, tags, supertables, and subtables, taking smart meters as an example, refer to the following diagram.
<figure>
<Image img={dataModel} alt="Data Model Diagram"/>
<figcaption>Figure 1. The TDengine data model</figcaption>
</figure>
## Terminology
### Metric
A metric is a measurement obtained from a data collection point. With smart meters, for example, current, voltage, and phase are typical metrics.
### Tag
A tag is a static attribute associated with a data collection point and that does not typically change over time, such as device model, color, or location. With smart meters, for example, location and group ID are typical tags.
### Data Collection Point
A data collection point (DCP) is a hardware or software device responsible for collecting metrics at a predetermined time interval or upon a specific event trigger. A DCP can collect one or more metrics simultaneously, but all metrics from each DCP share the same timestamp.
Complex devices often have multiple DCPs, each with its own collection cycle, operating independently of each other. For example, in a car, one DCP may collect GPS data while a second monitors the engine status and a third monitors the interior environment.
### Table
A table in TDengine consists of rows of data and columns defining the type of data, like in the traditional relational database model. However, TDengine stores the data from each DCP in a separate table. This is known as the "one table per DCP" model and is a unique feature of TDengine.
Note that for complex devices like cars, which have multiple DCPs, this means that multiple tables are created for a single device.
Typically, the name of a DCP is stored as the name of the table, not as a separate tag. The `tbname` pseudocolumn is used for filtering by table name. Each metric collected by and tag associated with the DCP is represented as a column in the table, and each column has a defined data type. The first column in the table must be the timestamp, which is used to build an index.
TDengine includes two types of tables: subtables, which are created within a supertable, and basic tables, which are independent of supertables.
### Basic Table
A basic table cannot contain tags and does not belong to any supertable. The functionality of a basic table is similar to that of a table in a relational database management system.
### Supertable
A supertable is a data structure that groups together a specific type of DCP into a logical unified table. The tables created within a supertable are known as subtables. All subtables within a supertable have the same schema. The supertable is a unique concept in TDengine that simplifies table management and facilitates aggregation across DCPs.
Each supertable contains at least one timestamp column, at least one metric column, and at least one tag column. Tags in a supertable can be added, modified, or deleted at any time without creating new time series. Note that data is not stored within a supertable, but within the subtables in that supertable.
With smart meters, for example, one supertable would be created for all smart meters. Within that supertable, one subtable is created for each smart meter. For more information, see [TDengine Concepts: Supertable](https://tdengine.com/tdengine-concepts-supertable/).
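As a hedged illustration of modifying tags on an existing supertable, the following adds a hypothetical `model` tag that is not part of the smart meter example:

```sql
-- Add a new tag to the supertable; existing subtables get a NULL value for the new tag
ALTER STABLE meters ADD TAG model VARCHAR(32);
```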
### Subtable
A subtable is created within a supertable and inherits the schema of the supertable. The schema of a subtable cannot be modified; any modifications to the supertable schema affect all subtables that it contains.
### Database
A database defines storage policies for its supertables and tables. Each database can contain one or more supertables, but each supertable or table belongs to only one database.
A database in TDengine is used to manage a collection of tables. TDengine allows a running instance to contain multiple databases, and each database can be configured with different storage strategies. Since different types of data collection points usually have different data characteristics, such as data collection frequency, data retention period, number of replicas, data block size, etc., it is recommended to create supertables with different data characteristics in different databases.
A single TDengine deployment can contain multiple databases with different policies. You can create multiple databases to achieve finer-grained data management and optimization.
In a database, one to many supertables can be included, but each supertable can only belong to one database. At the same time, all subtables owned by a supertable are also stored in that database. This design helps to achieve more fine-grained data management and optimization, ensuring that TDengine can provide the best processing performance based on different data characteristics.
### Timestamp
### Timestamps
Timestamps are a complex but essential part of time-series data management. TDengine stores timestamps in Unix time, which represents the number of milliseconds elapsed since the Unix epoch of January 1, 1970 at 00:00 UTC. However, when an application queries data in TDengine, the TDengine client automatically converts the timestamp to the local time zone of the application.
Timestamps play a crucial role in time-series data processing, especially when applications need to access the database from multiple time zones, making the issue more complex. Before delving into how TDengine handles timestamps and time zones, let's first introduce a few basic concepts.
- When TDengine ingests a timestamp in RFC 3339 format, for example `2018-10-03T14:38:05.000+08:00`, the time zone specified in the timestamp is used to convert the timestamp to Unix time.
- When TDengine ingests a timestamp that does not contain time zone information, the local time zone of the application is used to convert the timestamp to Unix time.
- Local date and time: Refers to the local time of a specific region, usually expressed as a string in the format yyyy-MM-dd hh:mm:ss.SSS. This representation of time does not include any time zone information, such as "2021-07-21 12:00:00.000".
- Time zone: Standard time in different geographical locations on Earth. Coordinated Universal Time (UTC) or Greenwich Mean Time is the international time standard, and other time zones are usually expressed as an offset from UTC, such as "UTC+8" representing East Eight Zone time.
- UTC timestamp: Represents the number of milliseconds since the UNIX epoch (i.e., UTC time January 1, 1970, at 0:00). For example, "1700000000000" corresponds to the date and time "2023-11-14 22:13:20 (UTC+0)".

In TDengine, when saving time-series data, what is actually saved is the UTC timestamp. When writing data, TDengine handles timestamps in the following two ways.
- RFC-3339 format: When using this format, TDengine can correctly parse time strings with time zone information into UTC timestamps. For example, "2018-10-03T14:38:05.000+08:00" will be converted into a UTC timestamp.
- Non-RFC-3339 format: If the time string does not contain time zone information, TDengine will use the time zone setting of the application to automatically convert the time into a UTC timestamp.
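As a hedged illustration using the `d1001` subtable from the smart meter example, the following two inserts store the same UTC instant; the first carries an explicit UTC+8 offset, and the second provides the equivalent Unix timestamp in milliseconds:

```sql
-- Both rows resolve to the same stored timestamp (1538548685000 ms since the Unix epoch)
INSERT INTO d1001 VALUES ('2018-10-03T14:38:05.000+08:00', 10.3, 219, 0.31);
INSERT INTO d1001 VALUES (1538548685000, 10.3, 219, 0.31);
```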
## Sample Data
When querying data, the TDengine client will automatically convert the saved UTC timestamps into local time according to the current time zone setting of the application, ensuring that users in different time zones can see the correct time information.
In this documentation, a smart meters scenario is used as sample data. The smart meters in this scenario collect three metrics, current, voltage, and phase; and two tags, location and group ID. The device ID of each smart meter is used as its table name.
## Data Modeling
An example of data collected by these smart meters is shown in the following table.
This section uses smart meters as an example to briefly introduce how to use SQL to create databases, supertables, and basic table operations in TDengine.
| Device ID | Timestamp | Current | Voltage | Phase | Location | Group ID |
| :-------: | :-----------: | :-----: | :-----: | :---: | :---------------------: | :------: |
| d1001 | 1538548685000 | 10.3 | 219 | 0.31 | California.SanFrancisco | 2 |
| d1002 | 1538548684000 | 10.2 | 220 | 0.23 | California.SanFrancisco | 3 |
| d1003 | 1538548686500 | 11.5 | 221 | 0.35 | California.LosAngeles | 3 |
| d1004 | 1538548685500 | 13.4 | 223 | 0.29 | California.LosAngeles | 2 |
| d1001 | 1538548695000 | 12.6 | 218 | 0.33 | California.SanFrancisco | 2 |
| d1004 | 1538548696600 | 11.8 | 221 | 0.28 | California.LosAngeles | 2 |
| d1002 | 1538548696650 | 10.3 | 218 | 0.25 | California.SanFrancisco | 3 |
| d1001 | 1538548696800 | 12.3 | 221 | 0.31 | California.SanFrancisco | 2 |
### Creating a Database
## Data Management
This section describes how to create databases, supertables, and tables to store your data in TDengine.
### Create a Database
You use the `CREATE DATABASE` statement to create a database:
The SQL to create a database for storing meter data is as follows:
```sql
CREATE DATABASE power PRECISION 'ms' KEEP 3650 DURATION 10 BUFFER 16;
```
The name of the database created is `power` and its parameters are explained as follows:
This SQL will create a database named `power`, with the following parameters explained:
- `PRECISION 'ms'`: The time-series data in this database uses millisecond-precision timestamps.
- `KEEP 3650`: The data in this database is retained for 3650 days. Any data older than 3650 days is automatically deleted.
- `DURATION 10`: Each data file contains 10 days of data.
- `BUFFER 16`: A 16 MB memory buffer is used for data ingestion.
- `PRECISION 'ms'`: This database uses millisecond (ms) precision timestamps for its time-series data
- `KEEP 3650`: The data in this database will be retained for 3650 days, and data older than 3650 days will be automatically deleted
- `DURATION 10`: Data for every 10 days is stored in one data file
- `BUFFER 16`: Writing uses a memory pool of size 16MB.
For a list of all database parameters, see [Manage Databases](../../tdengine-reference/sql-manual/manage-databases/).
You use the `USE` statement to set a current database:
After creating the power database, you can execute the USE statement to switch databases.
```sql
USE power;
use power;
```
This SQL statement switches the current database to `power`, meaning that subsequent statements are performed within the `power` database.
This SQL switches the current database to `power`, indicating that subsequent insertions, queries, and other operations will be performed in the current `power` database.
### Create a Supertable
### Creating a Supertable
You use the `CREATE STABLE` statement to create a supertable.
The SQL to create a supertable named `meters` is as follows:
```sql
CREATE STABLE meters (
ts TIMESTAMP,
current FLOAT,
voltage INT,
phase FLOAT
ts timestamp,
current float,
voltage int,
phase float
) TAGS (
location VARCHAR(64),
group_id INT
location varchar(64),
group_id int
);
```
The name of the supertable created is `meters` and the parameters following the name define the columns in the supertable. Each column is defined as a name and a data type. The first group of columns are metrics and the second group, following the `TAGS` keyword, are tags.
In TDengine, the SQL statement to create a supertable is similar to that in relational databases. For example, in the SQL above, `CREATE STABLE` is the keyword, indicating the creation of a supertable; then, `meters` is the name of the supertable; in the parentheses following the table name, the columns of the supertable are defined (column names, data types, etc.), with the following rules:
:::note
1. The first column must be a timestamp column. For example, `ts timestamp` indicates that the timestamp column is named `ts` and its data type is `timestamp`.
2. Starting from the second column are the measurement columns. The data types of measurements can be integer, float, string, and so on. For example, `current float` indicates a measurement named `current` with data type `float`.
- The first metric column must be of type `TIMESTAMP`.
- Metric columns and tag columns cannot have the same name.
Finally, TAGS is a keyword, indicating tags, and in the parentheses following TAGS, the tags of the supertable are defined (tag names, data types, etc.).
:::
1. The data type of tags can be integer, float, string, and so on. For example, `location varchar(64)` indicates a tag named `location` with data type `varchar(64)`.
2. The names of tags cannot be the same as the names of measurement columns.
### Create a Subtable
### Creating a Table
You use the `CREATE TABLE` statement with the `USING` keyword to create a subtable:
The SQL to create a subtable `d1001` using the supertable is as follows:
```sql
CREATE TABLE d1001
@ -157,29 +164,65 @@ USING meters (
);
```
The name of the subtable created is `d1001` and it is created within the `meters` supertable. The `location` and `group_id` tag columns are used in this subtable, and their values are set to `California.SanFrancisco` and `2`, respectively.
In the SQL above, `CREATE TABLE` is a keyword indicating the creation of a table; `d1001` is the name of the subtable; `USING` is a keyword indicating the use of a supertable as a template; `meters` is the name of the supertable; in the parentheses following the supertable name, `location`, `group_id` are the names of the tag columns of the supertable; `TAGS` is a keyword, and the values of the tag columns for the subtable are specified in the following parentheses. `"California.SanFrancisco"` and `2` indicate that the location of subtable `d1001` is `California.SanFrancisco`, and the group ID is `2`.
Note that when creating a subtable, you can specify values for all or a subset of tag columns in the target supertable. However, these tag columns must already exist within the supertable.
When performing write or query operations on a supertable, users can use the pseudocolumn `tbname` to specify or output the name of the corresponding subtable.
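For example, in a hedged sketch using the `meters` supertable, the `tbname` pseudocolumn can list the subtables that match a tag filter or group results by data collection point:

```sql
-- List the subtables (data collection points) whose group_id tag is 2
SELECT DISTINCT tbname FROM meters WHERE group_id = 2;

-- Aggregate per subtable
SELECT tbname, MAX(current) AS max_current FROM meters GROUP BY tbname;
```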
### Create a Basic Table
### Automatic Table Creation
You use the `CREATE TABLE` statement to create a basic table.
In TDengine, to simplify user operations and ensure smooth data entry, even if a subtable does not exist, users can use the automatic table creation SQL with the `using` keyword to write data. This mechanism allows the system to automatically create the subtable when it encounters a non-existent subtable, and then perform the data writing operation. If the subtable already exists, the system will write the data directly without any additional steps.
The SQL for writing data while automatically creating tables is as follows:
```sql
CREATE TABLE d1003(
ts TIMESTAMP,
current FLOAT,
voltage INT,
phase FLOAT,
location VARCHAR(64),
group_id INT
);
```

```sql
INSERT INTO d1002
USING meters
TAGS (
"California.SanFrancisco",
2
) VALUES (
NOW,
10.2,
219,
0.32
);
```
The name of the basic table is `d1003` and it includes the columns `ts`, `current`, `voltage`, `phase`, `location`, and `group_id`. Note that this table is not associated with any supertable and its metric and tag columns are not separate.
In the SQL above, `INSERT INTO d1002` indicates writing data into the subtable `d1002`; `USING meters` indicates using the supertable `meters` as a template; `TAGS ("California.SanFrancisco", 2)` indicates the tag values for subtable `d1002` are `California.SanFrancisco` and `2`; `VALUES (NOW, 10.2, 219, 0.32)` indicates inserting a record into subtable `d1002` with values NOW (current timestamp), 10.2 (current), 219 (voltage), 0.32 (phase). When TDengine executes this SQL, if subtable `d1002` already exists, it writes the data directly; if subtable `d1002` does not exist, it first automatically creates the subtable, then writes the data.
## Multi-Column Model vs. Single-Column Model
### Creating Basic Tables
Typically, each supertable in TDengine contains multiple columns, one for each metric and one for each tag. However, in certain scenarios, it can be preferable to create supertables that contain only one column.
In TDengine, apart from subtables with tags, there are also basic tables without any tags. These tables are similar to tables in traditional relational databases, and users can create them using SQL.
For example, when the types of metrics collected by a DCP frequently change, the standard multi-column model would require frequent modifications to the schema of the supertable. In this situation, creating one supertable per metric may offer improved performance.
The differences between basic tables and subtables are:
1. Tag Extensibility: Subtables add static tags on top of basic tables, allowing them to carry more metadata. Additionally, the tags of subtables are mutable, and users can add, delete, or modify tags as needed.
2. Table Ownership: Subtables always belong to a supertable and are part of it. Basic tables, however, exist independently and do not belong to any supertable.
3. Conversion Restrictions: In TDengine, basic tables cannot be directly converted into subtables, and likewise, subtables cannot be converted into basic tables. These two types of tables determine their structure and properties at creation and cannot be changed later.
In summary, basic tables provide functionality similar to traditional relational database tables, while subtables introduce a tagging mechanism, offering richer descriptions and more flexible management for time-series data. Users can choose to create basic tables or subtables based on actual needs.
The SQL for creating a basic table without any tags is as follows:
```sql
CREATE TABLE d1003(
ts timestamp,
current float,
voltage int,
phase float,
location varchar(64),
group_id int
);
```
The SQL above indicates the creation of the basic table `d1003`, with a structure including columns `ts`, `current`, `voltage`, `phase`, `location`, `group_id`, totaling 6 columns. This data model is completely consistent with relational databases.
Using basic tables as the data model means that static tag data (such as location and group_id) will be repeatedly stored in each row of the table. This approach not only increases storage space consumption but also significantly lowers query performance compared to using a supertable data model, as it cannot directly utilize tag data for filtering.
### Multi-Column Model vs. Single-Column Model
TDengine supports flexible data model designs, including multi-column and single-column models. The multi-column model allows multiple physical quantities collected simultaneously from the same data collection point with the same timestamp to be stored in different columns of the same supertable. However, in some extreme cases, a single-column model might be used, where each collected physical quantity is established in a separate table. For example, for the three physical quantities of current, voltage, and phase, three separate supertables might be established.
Although TDengine recommends using the multi-column model because it generally offers better writing and storage efficiency, the single-column model might be more suitable in certain specific scenarios. For example, if the types of quantities collected at a data collection point frequently change, using a multi-column model would require frequent modifications to the supertable's structural definition, increasing the complexity of the application. In such cases, using a single-column model can simplify the design and management of the application, as it allows independent management and expansion of each physical quantity's supertable.
Overall, TDengine offers flexible data model options, allowing users to choose the most suitable model based on actual needs and scenarios to optimize performance and manage complexity.
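As a hedged sketch of the single-column model applied to the smart meter example (the supertable names below are illustrative), each metric would get its own supertable:

```sql
-- One supertable per metric instead of a single multi-column supertable
CREATE STABLE meters_current (ts TIMESTAMP, val FLOAT) TAGS (location VARCHAR(64), group_id INT);
CREATE STABLE meters_voltage (ts TIMESTAMP, val INT) TAGS (location VARCHAR(64), group_id INT);
CREATE STABLE meters_phase (ts TIMESTAMP, val FLOAT) TAGS (location VARCHAR(64), group_id INT);
```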

View File

@ -1,61 +1,63 @@
---
sidebar_label: Data Ingestion
title: Ingest, Update, and Delete Data
title: Data Ingestion
slug: /basic-features/data-ingestion
---
This document describes how to insert, update, and delete data using SQL. The databases and tables used as examples in this document are defined in [Sample Data](../data-model/#sample-data).
This chapter uses the data model of smart meters as an example to introduce how to write, update, and delete time-series data in TDengine using SQL.
TDengine can also ingest data from various data collection tools. For more information, see [Integrate with Data Collection Tools](../../third-party-tools/data-collection/).
## Writing
## Ingest Data
In TDengine, you can write time-series data using the SQL insert statement.
You use the `INSERT` statement to ingest data into TDengine. You can ingest one or more records into one or more tables.
### Writing One Record at a Time
### Insert a Record
Assume that the smart meter with device ID d1001 collected data on October 3, 2018, at 14:38:05: current 10.3 A, voltage 219 V, phase 0.31. We have already created a subtable d1001 belonging to the supertable meters in the power database in TDengine. Next, you can write time-series data into the subtable d1001 using the following insert statement.
The following SQL statement inserts one record into the `d1001` subtable:
1. You can write time-series data into the subtable d1001 using the following INSERT statement.
```sql
INSERT INTO d1001 (ts, current, voltage, phase) VALUES ("2018-10-03 14:38:05", 10.3, 219, 0.31);
insert into d1001 (ts, current, voltage, phase) values ( "2018-10-03 14:38:05", 10.3, 219, 0.31)
```
In this example, a smart meter with device ID `d1001` collected data on October 3, 2018 at 14:38:05. The data collected indicated a current of 10.3 A, voltage of 219 V, and phase of 0.31. The SQL statement provided inserts data into the `ts`, `current`, `voltage`, and `phase` columns of subtable `d1001` with the values `2018-10-03 14:38:05`, `10.3`, `219`, and `0.31`, respectively.
The above SQL writes `2018-10-03 14:38:05`, `10.3`, `219`, `0.31` into the columns `ts`, `current`, `voltage`, `phase` of the subtable `d1001`.
Note that when inserting data into every column of a subtable at once, you can omit the column list. The following SQL statement therefore achieves the same result:
2. When the `VALUES` part of the `INSERT` statement includes all columns of the table, the list of fields before `VALUES` can be omitted, as shown in the following SQL statement, which has the same effect as the previous INSERT statement specifying columns.
```sql
INSERT INTO d1001 VALUES ("2018-10-03 14:38:05", 10.3, 219, 0.31);
insert into d1001 values("2018-10-03 14:38:05", 10.3, 219, 0.31)
```
Also note that timestamps can be inserted in Unix time if desired:
3. For the table's timestamp column (the first column), you can also directly provide a numeric timestamp in the database's configured precision.
```sql
INSERT INTO d1001 VALUES (1538548685000, 10.3, 219, 0.31);
```
The three SQL statements above produce exactly the same result.
### Insert Multiple Records
The following SQL statement inserts multiple records into the `d1001` subtable:
```sql
INSERT INTO d1001 VALUES
("2018-10-03 14:38:05", 10.2, 220, 0.23),
("2018-10-03 14:38:15", 12.6, 218, 0.33),
("2018-10-03 14:38:25", 12.3, 221, 0.31);
insert into d1001 values
( "2018-10-03 14:38:05", 10.2, 220, 0.23),
( "2018-10-03 14:38:15", 12.6, 218, 0.33),
( "2018-10-03 14:38:25", 12.3, 221, 0.31)
```
This method can be useful in scenarios where a data collection point (DCP) collects data faster than it reports data. In this example, the smart meter with device ID `d1001` collects data every 10 seconds but reports data every 30 seconds, meaning that three records need to be inserted every 30 seconds.
### Insert into Multiple Tables
The following SQL statement inserts three records each into the `d1001`, `d1002`, and `d1003` subtables:
```sql
INSERT INTO d1001 VALUES
("2018-10-03 14:38:05", 10.2, 220, 0.23),
("2018-10-03 14:38:15", 12.6, 218, 0.33),
("2018-10-03 14:38:25", 12.3, 221, 0.31)
("2018-10-03 14:38:25", 12.3, 221, 0.31)
d1002 VALUES
("2018-10-03 14:38:04", 10.2, 220, 0.23),
("2018-10-03 14:38:14", 10.3, 218, 0.25),
("2018-10-03 14:38:24", 10.1, 220, 0.22)
d1003 VALUES
("2018-10-03 14:38:06", 11.5, 221, 0.35),
("2018-10-03 14:38:16", 10.4, 220, 0.36),
("2018-10-03 14:38:26", 10.3, 220, 0.33);
("2018-10-03 14:38:26", 10.3, 220, 0.33)
;
```
This statement writes a total of nine records.
### Insert into Specific Columns
The following SQL statement inserts a record containing only the `ts`, `voltage`, and `phase` columns into the `d1004` subtable:
```sql
INSERT INTO d1004 (ts, voltage, phase) VALUES ("2018-10-04 14:38:06", 223, 0.29);
```
A `NULL` value is written to any columns not included in the `INSERT` statement. Note that the timestamp column cannot be omitted and cannot be null.
### Automatic Table Creation on Insert
It is not necessary to create subtables in advance. You can use the `INSERT` statement with the `USING` keyword to create subtables automatically. If the subtable already exists, the data is inserted directly; if it does not exist, the subtable is created first and the data is then inserted. You can also specify only some of the tag columns; any unspecified tag columns are set to NULL in the automatically created subtable.
```sql
INSERT INTO d1002 USING meters TAGS ("California.SanFrancisco", 2) VALUES (now, 10.2, 219, 0.32);
```
If the subtable `d1002` already exists, the specified metrics are inserted into the subtable. If the subtable does not exist, it is created using the `meters` supertable with the specified tag values, and the specified metrics are then inserted into it. This can be useful when creating subtables programmatically for new DCPs.
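As noted above, you can specify only some of the tag columns during automatic table creation. In the following statement (adapted from the sample data; the subtable `d1005` is illustrative), only the `location` tag is specified, so the `group_id` tag of the automatically created subtable is left NULL:

```sql
INSERT INTO d1005 USING meters (location) TAGS ("California.SanFrancisco") VALUES ("2018-10-04 14:38:07", 10.15, 217, 0.33);
```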
An `INSERT` statement with automatic table creation can also write to multiple tables in a single statement. The following SQL statement uses automatic table creation to insert nine records:
```sql
INSERT INTO d1001 USING meters TAGS ("California.SanFrancisco", 2) VALUES
("2018-10-03 14:38:05", 10.2, 220, 0.23),
("2018-10-03 14:38:15", 12.6, 218, 0.33),
("2018-10-03 14:38:25", 12.3, 221, 0.31)
d1002 USING meters TAGS ("California.SanFrancisco", 3) VALUES
("2018-10-03 14:38:04", 10.2, 220, 0.23),
("2018-10-03 14:38:14", 10.3, 218, 0.25),
("2018-10-03 14:38:24", 10.1, 220, 0.22)
d1003 USING meters TAGS ("California.LosAngeles", 2) VALUES
("2018-10-03 14:38:06", 11.5, 221, 0.35),
("2018-10-03 14:38:16", 10.4, 220, 0.36),
("2018-10-03 14:38:26", 10.3, 220, 0.33)
;
```
### Insert via Supertable

TDengine also supports inserting data through a supertable. It is important to note that a supertable is a template and does not store data itself; the data is stored in the corresponding subtables. The following statement inserts a record into the `d1001` subtable by specifying the `tbname` column:

```sql
INSERT INTO meters (tbname, ts, current, voltage, phase, location, group_id) VALUES ("d1001", "2018-10-03 14:38:05", 10.2, 220, 0.23, "California.SanFrancisco", 2);
```

Note that the data is not stored in the supertable itself, but in the subtable specified as the value of the `tbname` column.

### Zero-Code Insertion

To facilitate easy data insertion, TDengine is seamlessly integrated with many well-known third-party tools, including Telegraf, Prometheus, EMQX, StatsD, collectd, and HiveMQ. You only need to perform simple configurations on these tools to import data into TDengine. In addition, TDengine Enterprise offers a variety of connectors, such as MQTT, OPC, AVEVA PI System, Wonderware, Kafka, MySQL, and Oracle. By configuring the corresponding connection information on the TDengine side, you can efficiently write data from different data sources into TDengine without writing any code.

## Update Data

You can update existing metric data by writing a new record with the same timestamp as the record that you want to replace:
```sql
INSERT INTO d1001 (ts, current) VALUES ("2018-10-03 14:38:05", 22);
```
This SQL statement updates the value of the `current` column at the specified time to `22`.
## Delete Data
TDengine automatically deletes expired data based on the retention period configured for your database. However, if necessary, you can manually delete data from a table.
:::warning
Deleted data cannot be recovered. Exercise caution when deleting data.
Before deleting data, run a `SELECT` statement with the same `WHERE` condition to query the data that you want to delete. Confirm that you want to delete all data returned by the `SELECT` statement, and only then run the `DELETE` statement.
:::
The following SQL statement deletes all data from the supertable `meters` whose timestamp is earlier than 2021-10-01 10:40:00.100. This can be useful, for example, to clean up abnormal data caused by equipment failures.
```sql
DELETE FROM meters WHERE ts < '2021-10-01 10:40:00.100';
```
Note that when deleting data, you can filter only on the timestamp column. Other filtering conditions are not supported.
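For example, a cautious workflow might first count the rows matched by the deletion condition and run the `DELETE` statement, with the same `WHERE` clause, only after confirming the result:

```sql
-- Check how many rows match the deletion condition.
SELECT COUNT(*) FROM meters WHERE ts < '2021-10-01 10:40:00.100';

-- After confirming the result, delete the rows with the same condition.
DELETE FROM meters WHERE ts < '2021-10-01 10:40:00.100';
```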

File diff suppressed because it is too large


@ -1,10 +1,9 @@
---
title: Basic Features
description: 'TDengine Basic Features'
slug: /basic-features
---
This section describes the basic features of TDengine, including its data model, how to manage databases and tables, how to ingest data, and how to query data stored in TDengine.
```mdx-code-block
import DocCardList from '@theme/DocCardList';


@ -3,31 +3,131 @@ title: Data Subscription
slug: /advanced-features/data-subscription
---
TDengine provides Kafka-like publish/subscribe data subscription as a built-in component. You create topics in TDengine using SQL statements, and your applications can subscribe to your topics as consumers.
With this built-in capability, in many scenarios there is no need to integrate an additional message queue product, which simplifies application design and reduces maintenance costs. TDengine's message queue provides an ACK (acknowledgment) mechanism to ensure at-least-once consumption in complex environments such as crashes and restarts.
To achieve this functionality, TDengine automatically creates indexes for its write-ahead log (WAL) files to support fast random access and provides a flexible, configurable file switching and retention mechanism. You can specify the retention time and size of the WAL files according to your needs. In this way, the WAL becomes a persistent storage engine that preserves the order of event arrival. For queries created in the form of topics, TDengine reads data from the WAL. During consumption, TDengine reads data directly from the WAL based on the current consumption progress, performs filtering, transformation, and other operations using a unified query engine, and then pushes the data to consumers.
Starting from version 3.2.0.0, data subscription supports vnode migration and splitting. Because data subscription depends on WAL files, and the WAL is not synchronized during vnode migration and splitting, any WAL data that has not yet been consumed cannot be consumed after migration or splitting. Make sure that all data has been consumed before migrating or splitting a vnode; otherwise, data may be lost during consumption.
## Topics
A topic can be a query, a supertable, or a database. You can filter by tag, table name, column, or expression and perform scalar function and UDF computations; data aggregation and time windows are not supported. The data granularity is determined by the SQL statement that defines the topic, and data filtering and preprocessing are handled automatically by TDengine. Compared with other message queue tools, this gives TDengine's data subscription greater flexibility while reducing the amount of data transmitted and simplifying application design.
Topics are created with SQL; the three types of topics are introduced below. For more information, see [Create a Topic](../../tdengine-reference/sql-manual/manage-topics-and-consumer-groups/#create-a-topic).
### Query Topic
A query topic subscribes to the results of an SQL query. It is essentially a continuous query that returns only the latest values each time. The creation syntax is as follows:
```sql
CREATE TOPIC [IF NOT EXISTS] topic_name AS subquery
```
The subquery is a SELECT statement (for example `SELECT *`, or a query on specific columns such as `SELECT ts, c1`) and may include filter conditions and scalar function computations, but not aggregate functions or time-window aggregation. Note that:
1. Once this type of TOPIC is created, the structure of the subscribed data is fixed.
2. Columns or tags that are subscribed to or used for calculations cannot be deleted (ALTER table DROP) or modified (ALTER table MODIFY).
3. If table structure changes occur, newly added columns will not appear in the results.
4. For select *, it subscribes to all columns at the time of creation (data columns for subtables and basic tables, data columns plus tag columns for supertables).
For example, to subscribe to data from all smart meters where the voltage is greater than 200, returning only the timestamp, current, and voltage (not the phase), create the topic `power_topic` with the following SQL:
```sql
CREATE TOPIC power_topic AS SELECT ts, current, voltage FROM power.meters WHERE voltage > 200;
```
### Supertable Topic
Subscribe to all data in a supertable, with the following syntax:
```sql
CREATE TOPIC [IF NOT EXISTS] topic_name [WITH META] AS STABLE stb_name [where_condition]
```
The differences from subscribing with `SELECT * FROM stb_name` are:
1. It does not restrict user table structure changes; both structure changes and the new data written after such changes can be subscribed to.
2. It returns unstructured data, and the structure of the returned data changes along with the structure of the supertable.
3. The `WITH META` parameter is optional; when specified, the topic also returns the statements for creating supertables and subtables. It is mainly used for supertable migration in taosX.
4. The `where_condition` parameter is optional; when specified, it filters the subtables that meet the condition, and only those subtables are subscribed to. The WHERE condition cannot include ordinary columns, only tags or `tbname`. Functions can be used on tags, but not aggregate functions, because subtable tag values cannot be aggregated. The condition can also be a constant expression, such as `2 > 1` (subscribe to all subtables) or `false` (subscribe to no subtables).
5. The returned data does not include tags.
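For example, the following sketch, based on the syntax above (the topic name and the tag filter are illustrative), subscribes to the subtables of the sample `meters` supertable whose `location` tag is `California.SanFrancisco` and also returns table-creation metadata:

```sql
CREATE TOPIC IF NOT EXISTS meters_sf_topic WITH META AS STABLE power.meters WHERE location = 'California.SanFrancisco';
```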
### Database Topics
Subscribe to all data in a database, with the syntax as follows:
```sql
CREATE TOPIC [IF NOT EXISTS] topic_name [WITH META] AS DATABASE db_name;
```
This statement creates a subscription that includes all table data in the database:
1. The `WITH META` parameter is optional. When specified, the topic also returns the metadata statements (creation, deletion, and modification) of all supertables, subtables, and basic tables in the database. It is mainly used for database migration in taosX.
2. Subscriptions to supertables and databases are advanced subscription modes and are easy to misuse. If you really need to use them, please consult technical support.
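As a sketch (the topic name is illustrative), the following statement subscribes to all data in the sample `power` database, including metadata:

```sql
CREATE TOPIC IF NOT EXISTS power_db_topic WITH META AS DATABASE power;
```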
## Delete Topic
If you no longer need to subscribe to the data, you can delete the topic. Note that only topics that are not currently subscribed can be deleted.
```sql
DROP TOPIC [IF EXISTS] topic_name;
```
## View Topics
```sql
SHOW TOPICS;
```
The above SQL will display information about all topics under the current database.
## Consumers and Consumer Groups
Consumers that subscribe to a topic receive the latest data in real time. A single consumer can subscribe to multiple topics. If a topic corresponds to a supertable or a database, its data may be distributed across multiple nodes or data shards.
You can also create consumer groups to enable multithreaded, distributed data consumption. Consumers in the same consumer group share consumption progress, while consumers in different consumer groups do not share progress even if they consume the same topic.
### Creating Consumers
Consumers and consumer groups are created in your applications, not in TDengine, through the TDengine client driver or the APIs provided by the connectors. For details, see [Manage Consumers](../../developer-guide/manage-consumers/) or the connector reference manuals.
### View Consumers
```sql
SHOW CONSUMERS;
```
Displays information about all consumers in the current database, including the consumer's status, creation time, etc.
### Delete Consumer Group
When a consumer is created, it is assigned to a consumer group. Consumers cannot be deleted explicitly, but a consumer group can be deleted with the following statement once it no longer contains any consumers (see also [Manage Consumer Groups](../../tdengine-reference/sql-manual/manage-topics-and-consumer-groups/#manage-consumer-groups)):
```sql
DROP CONSUMER GROUP [IF EXISTS] cgroup_name ON topic_name;
```
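For example, assuming a consumer group named `power_group` that was subscribed to the `power_topic` topic created earlier (the group name is illustrative), you could remove it as follows:

```sql
DROP CONSUMER GROUP IF EXISTS power_group ON power_topic;
```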
## Data Subscription
### View Subscription Information
```sql
SHOW SUBSCRIPTIONS;
```
Displays consumption information of the topic on different vgroups, useful for viewing consumption progress.
### Subscribe to Data
TDengine provides comprehensive data subscription APIs designed to meet subscription needs across different programming languages and frameworks. These interfaces include, but are not limited to, creating consumers, subscribing to topics, unsubscribing, obtaining real-time data, committing consumption progress, and getting and setting consumption progress. Currently, TDengine supports a variety of mainstream programming languages, including C, Java, Go, Rust, Python, and C#, enabling developers to easily use TDengine's data subscription features in various application scenarios.
It is worth mentioning that TDengine's data subscription APIs are highly consistent with the popular Kafka subscription APIs, making it easy for developers to get started and leverage their existing knowledge and experience. To facilitate understanding and reference, the official documentation provides detailed descriptions and example code for these APIs, which can be found in the connectors section of the TDengine website. Through these APIs, developers can efficiently implement real-time data subscription and processing to meet the data handling needs of various complex scenarios.
### Replay Feature
TDengine's data subscription supports a replay function that replays a data stream in the actual order in which the data was written. This feature is based on TDengine's efficient WAL mechanism, which ensures data consistency and reliability.
To use the replay feature, specify a time range in the query statement to control the start and end times of the replay. This makes it easy to replay the data within a specific time period, whether for troubleshooting, data analysis, or other purposes.
For example, assume the following three records have been written to the database. During replay, the first record is returned immediately, the second record is returned 5 seconds later, and the third record is returned 3 seconds after the second record.
```text
2023/09/22 00:00:00.000
2023/09/22 00:00:05.000
2023/09/22 00:00:08.000
```
When using the replay feature, note the following:
- Replay supports query topics only. You cannot use replay with supertable or database topics.
- Replay does not support progress saving.
- Because replay itself requires processing time, the playback timing has a precision error of several tens of milliseconds.


@ -3,121 +3,98 @@ title: Caching
slug: /advanced-features/caching
---
TDengine includes caching as a built-in component. This includes write caching, read caching, metadata caching, and file system caching.
In IoT and Industrial IoT (IIoT) big data applications, the value of real-time data often far exceeds that of historical data. Enterprises need not only efficient real-time ingestion but also fast access to the latest status of devices and real-time calculations on the most recent data. Whether monitoring the status of industrial equipment, tracking vehicle locations in connected-vehicle scenarios, or reading smart meters in real time, current values are core business data that directly affect production safety, operational efficiency, and user experience.
For example, in industrial production, the current operating status of production line equipment is crucial: operators must monitor key indicators such as temperature, pressure, and speed in real time, and any anomaly must be surfaced immediately so that process parameters can be adjusted quickly to avoid downtime or larger losses. In the connected-vehicle field, ride-hailing platforms such as DiDi rely on real-time vehicle location data to optimize dispatch strategies and improve operational efficiency, ensuring that passengers are picked up quickly. Likewise, dashboard systems and smart meters depend on real-time data, whether a factory manager is viewing production indicators on a dashboard or a household user is checking water and electricity usage; this data affects operational and decision-making efficiency as well as user satisfaction.
## Limitations of Traditional Caching Solutions
To meet these high-frequency real-time query needs, many enterprises integrate caching technologies such as Redis into their big data platforms, adding a caching layer between the database and applications. However, this approach brings several problems:
- Increased system complexity: a separate cache cluster must be deployed and maintained, raising the demands on system architecture.
- Higher operational costs: additional hardware resources are needed to support the cache, increasing maintenance and management expenses.
- Consistency issues: keeping the cache and the database synchronized requires additional mechanisms; otherwise, data inconsistencies may occur.
## Write Cache
TDengine uses a time-driven cache management strategy that prioritizes caching the most recently ingested data. When the size of the data stored in cache reaches a preset threshold, the earliest data in cache is written to disk in batches. This improves query efficiency and reduces the disk write load, extending hardware lifespan.
You can optimize database performance for your use case by specifying the number of vgroups in the database and the size of the write cache allocated to each vnode. For example, the following SQL statement creates a database with 10 vgroups, with each vnode having a 256 MB write cache:
```sql
CREATE DATABASE power VGROUPS 10 BUFFER 256 CACHEMODEL 'none' PAGES 128 PAGESIZE 16;
```
Generally, a larger cache size results in improved performance. However, there exists a certain point beyond which further increasing the cache size has no significant effect on performance.
## Read Cache
To address these issues without introducing third-party caching technologies such as Redis, TDengine provides a built-in read cache mechanism designed for high-frequency real-time queries in IoT and IIoT scenarios. You can configure a database to automatically cache the most recent data of each subtable in memory, so that current-value queries are answered quickly without accessing the disk.
You specify a cache model for your database by setting the `CACHEMODEL` parameter to one of the following values:
- `none`: The read cache is disabled.
- `last_row`: The most recent row of data from each subtable is cached. The `LAST_ROW()` function will then retrieve this data from cache.
- `last_value`: The most recent non-null value for each column of each subtable is cached. The `LAST()` function will then retrieve this data from cache, provided the query contains no special clauses such as WHERE, ORDER BY, GROUP BY, or INTERVAL.
- `both`: The most recent row of each subtable and the most recent non-null value of each column of each subtable are cached. This simultaneously activates the behavior of both the `last_row` and `last_value` cache models.
You can also configure the memory used by the read cache in each vnode by specifying a value for the `CACHESIZE` parameter. This parameter can be set from 1 MB to 65536 MB; the default value is 1 MB. Configure it according to the available memory of the machine.
This built-in read cache significantly reduces query latency, avoids the complexity and operational costs of external systems such as Redis, and reduces the pressure of frequent queries on the storage system, improving overall throughput and ensuring stable operation in high-concurrency scenarios. For the database creation parameters and related operations, see [Creating a Database](../../tdengine-reference/sql-manual/manage-databases/).
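For example, the following sketch (the 16 MB cache size is illustrative) enables caching of both the latest row and the latest column values when creating a database, or changes the cache model of an existing database:

```sql
-- Enable the read cache at database creation time.
CREATE DATABASE power CACHEMODEL 'both' CACHESIZE 16;

-- Or enable it on an existing database.
ALTER DATABASE power CACHEMODEL 'both';
```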
## Metadata Cache
Each vnode caches metadata that it has previously accessed. The size of this metadata cache is determined by the `PAGES` and `PAGESIZE` parameters of the database. For example, the following SQL statement creates a database whose vnodes have a metadata cache of 128 pages with each page being 16 KB:
```sql
CREATE DATABASE power PAGES 128 PAGESIZE 16;
```
## File System Cache
For reliability purposes, TDengine records changes in a write-ahead log (WAL) file before any data is written to the data storage layer. The `fsync` function is then called to write the data from the WAL to disk. You can control when the `fsync` function is called for a database by specifying the `WAL_LEVEL` and `WAL_FSYNC_PERIOD` parameters.
- `WAL_LEVEL`:
- Specify `1` to wait for the operating system to call `fsync`. In this configuration, TDengine does not call `fsync` itself.
- Specify `2` for TDengine to call `fsync` at a certain interval, specified by the WAL_FSYNC_PERIOD parameter.
The default value is `1`.
- `WAL_FSYNC_PERIOD`:
- Specify `0` to call `fsync` every time data is written to the WAL.
- Specify a value between `1` and `180000` milliseconds to call `fsync` each time this interval has elapsed.
Note that this parameter takes effect only when `WAL_LEVEL` is set to `2`.
The following SQL statement creates a database in which data in the WAL is written to disk every 3000 milliseconds:
```sql
CREATE DATABASE power WAL_LEVEL 2 WAL_FSYNC_PERIOD 3000;
```
The default configuration of `WAL_LEVEL 1` delivers the highest performance. In use cases where data reliability is a higher priority than performance, you can set `WAL_LEVEL` to `2`.
## Example: Enhancing Query Performance with Read Caching
This example uses smart meters to demonstrate the performance improvements delivered by read caching. The sample data from [Data Querying](../../basic-features/data-querying/) is used in this section. This data is generated by the following command:
```shell
taosBenchmark -d power -Q --start-timestamp=1600000000000 --tables=10000 --records=10000 --time-step=10000 -y
```
This command uses taosBenchmark to create a test database named `power` containing 100 million rows of smart meter data. The timestamps start from 1600000000000 (2020-09-13T20:26:40+08:00); the supertable `meters` contains 10,000 devices (subtables), each with 10,000 records collected at 10-second intervals. Note that read caching is disabled by default on this database.
1. To establish a performance baseline, query the latest current and timestamp of any smart meter by running the following SQL statements:
```text
taos> SELECT LAST(ts, current) FROM meters;
last(ts) | last(current) |
=================================================
2020-09-15 00:13:10.000 | 1.1294620 |
Query OK, 1 row(s) in set (0.353815s)
taos> SELECT LAST_ROW(ts, current) FROM meters;
last_row(ts) | last_row(current) |
=================================================
2020-09-15 00:13:10.000 | 1.1294620 |
Query OK, 1 row(s) in set (0.344070s)
```
These queries return the most recent non-null value and the most recent row from any subtable in the `meters` supertable. It can be seen that they return in 353 and 344 milliseconds, respectively.
2. Enable read caching on the database and confirm that the cache model has changed:
```text
taos> ALTER DATABASE power CACHEMODEL 'both';
Query OK, 0 row(s) affected (0.046092s)
taos> SHOW CREATE DATABASE power\G;
*************************** 1.row ***************************
Database: power
Create Database: CREATE DATABASE `power` BUFFER 256 CACHESIZE 1 CACHEMODEL 'both' COMP 2 DURATION 14400m WAL_FSYNC_PERIOD 3000 MAXROWS 4096 MINROWS 100 STT_TRIGGER 2 KEEP 5256000m,5256000m,5256000m PAGES 256 PAGESIZE 4 PRECISION 'ms' REPLICA 1 WAL_LEVEL 1 VGROUPS 10 SINGLE_STABLE 0 TABLE_PREFIX 0 TABLE_SUFFIX 0 TSDB_PAGESIZE 4 WAL_RETENTION_PERIOD 3600 WAL_RETENTION_SIZE 0 KEEP_TIME_OFFSET 0
Query OK, 1 row(s) in set (0.000282s)
```
3. Run the two queries from Step 1 again:
```text
taos> SELECT LAST(ts, current) FROM meters;
last(ts) | last(current) |
=================================================
2020-09-15 00:13:10.000 | 1.1294620 |
Query OK, 1 row(s) in set (0.044021s)
taos> SELECT LAST_ROW(ts, current) FROM meters;
last_row(ts) | last_row(current) |
=================================================
2020-09-15 00:13:10.000 | 1.1294620 |
Query OK, 1 row(s) in set (0.046682s)
```
It can be seen that these queries now return in 44 and 47 milliseconds, respectively; compared with the 353 ms and 344 ms measured earlier, this is an improvement of approximately 8 times. Note that the first query after enabling the cache performs the cache computation, and subsequent queries benefit from the significantly reduced latency.


@ -6,152 +6,279 @@ slug: /advanced-features/stream-processing
import Image from '@theme/IdealImage';
import watermarkImg from '../assets/stream-processing-01.png';
TDengine includes stream processing as a built-in component. In traditional time-series solutions, raw data often needs to be cleaned and preprocessed, or used to derive new time series, which typically requires deploying stream processing systems such as Kafka or Flink and adds development and operational complexity. With TDengine, you define real-time stream transformations using SQL statements. Data written to the source table of the stream is then automatically processed in the specified manner and written to the target supertable based on the specified trigger mode. This provides a lightweight alternative to complex stream processing systems while delivering results in milliseconds even under high-throughput conditions.
Streams can include data filtering, scalar functions (including UDFs), and windowing (sliding windows, session windows, and state windows). The source table of a stream can be a supertable, subtable, or basic table, but the target must be a supertable, which is created automatically when the stream is created. You can use the `PARTITION BY` clause to partition data by table name or tag, and each partition is written to a different subtable in the target supertable.
Streams can aggregate data from supertables distributed across multiple nodes and can handle out-of-order data ingestion. You can specify a tolerance for out-of-order data by using a watermark and decide whether to discard or recompute such data with the `IGNORE EXPIRED` option.
For information about creating and managing streams, see [Manage Streams](../../tdengine-reference/sql-manual/manage-streams/). The sections below describe the main concepts in more detail.
## Creating a Stream
The syntax for creating a stream is as follows:
```sql
CREATE STREAM [IF NOT EXISTS] stream_name [stream_options] INTO stb_name
[(field1_name, ...)] [TAGS (column_definition [, column_definition] ...)]
SUBTABLE(expression) AS subquery
stream_options: {
TRIGGER [AT_ONCE | WINDOW_CLOSE | MAX_DELAY time]
WATERMARK time
IGNORE EXPIRED [0|1]
DELETE_MARK time
FILL_HISTORY [0|1]
IGNORE UPDATE [0|1]
}
column_definition:
col_name col_type [COMMENT 'string_value']
```
The subquery is a subset of the regular query syntax.
```sql
subquery: SELECT select_list
from_clause
[WHERE condition]
[PARTITION BY tag_list]
[window_clause]
window_clause: {
SESSION(ts_col, tol_val)
| STATE_WINDOW(col)
| INTERVAL(interval_val [, interval_offset]) [SLIDING (sliding_val)]
| EVENT_WINDOW START WITH start_trigger_condition END WITH end_trigger_condition
| COUNT_WINDOW(count_val[, sliding_val])
}
```
The subquery supports session windows, state windows, and sliding windows; event windows and count windows are also available, as described below. When used with supertables, session windows and state windows must be used together with `PARTITION BY tbname`.
1. SESSION is a session window, where `tol_val` is the maximum gap of the time interval. All data whose timestamps are within `tol_val` of each other belong to the same window. If the gap between two consecutive records exceeds `tol_val`, the next window automatically starts.
2. EVENT_WINDOW is an event window, defined by a start condition and an end condition. The window starts when `start_trigger_condition` is met and closes when `end_trigger_condition` is met. Both conditions can be any condition expressions supported by TDengine and can involve different columns.
3. COUNT_WINDOW is a counting window, divided by a fixed number of rows. `count_val` is a positive integer constant, at least 2 and less than 2147483648, that specifies the maximum number of rows in each window. If the total number of rows is not evenly divisible by `count_val`, the last window contains fewer rows. `sliding_val` is a constant that specifies the number of rows by which the window slides, similar to SLIDING in INTERVAL.
The definition of a window is exactly the same as in time-series window queries; for details, refer to the TDengine window functions documentation.
The following SQL statement creates a stream. After execution, TDengine automatically creates the supertable `avg_vol`. The stream uses a 1-minute time window that slides forward every 30 seconds to compute the average voltage of the smart meters and writes the results from `meters` into `avg_vol`. Data from different partitions is written into separate subtables.
```sql
CREATE STREAM avg_vol_s INTO avg_vol AS
SELECT _wstart, count(*), avg(voltage) FROM power.meters PARTITION BY tbname INTERVAL(1m) SLIDING(30s);
```
The relevant parameters in the stream creation syntax are explained as follows:
- `stb_name` is the name of the supertable in which the computation results are saved. If this supertable does not exist, it is created automatically; if it already exists, its column schema is checked against the stream output (see "Writing to an Existing Supertable" below).
- The TAGS clause defines the rules for creating tags in the stream, allowing custom tag values to be generated for the subtable corresponding to each partition.
## Partitioning in Streams
You can use the `PARTITION BY` clause with the `tbname` pseudocolumn, tag columns, regular columns, or expressions to perform partitioned computations in a stream. Each partition has its own independent timeline and time window, and data is aggregated separately and written to different subtables in the target supertable. In a stream without a `PARTITION BY` clause, all data is written to the same subtable.
`PARTITION BY tbname` is a particularly practical form because it performs the stream computation separately for each subtable. This allows processing to be tailored to the characteristics of each subtable and improves computational efficiency.
If the `SUBTABLE` clause is not used, the supertable created by the stream contains a unique tag column `groupId`. Each partition is assigned a unique group ID, and the corresponding subtable name is derived from it using the MD5 algorithm. TDengine automatically creates these subtables to store the results of each partition, which keeps data management flexible and simplifies subsequent querying and analysis. If the `SUBTABLE` clause is present, you can generate a custom name for the subtable corresponding to each partition. For example:
```sql
CREATE STREAM avg_vol_s INTO avg_vol SUBTABLE(CONCAT('new-', tname)) AS SELECT _wstart, count(*), avg(voltage) FROM meters PARTITION BY tbname tname INTERVAL(1m);
```
In the `PARTITION BY` clause, `tname` is defined as an alias of `tbname`, and this alias can be used in the expression of the `SUBTABLE` clause. As a result, this statement creates subtables using the naming convention `new-<subtable-name>_<supertable-name>_<group-id>`.
:::info[Version Info]

Prior to TDengine 3.2.3.0, the supertable name and group ID were not appended to the name defined in the `SUBTABLE` clause. Therefore the naming convention in this example would be `new-<subtable-name>` in earlier versions.

:::

:::note

- `tname` is an alias of `tbname` for use in expressions within the `SUBTABLE` clause.
- Subtable names that exceed the table name limit of 192 bytes are truncated.
- If the generated subtable name already exists in another supertable, the subtable cannot be created and its data will not be written, because subtable names must be unique.

:::
## Handling Historical Data
By default, a stream processes only data ingested after the stream is created. If you want a stream to process pre-existing data, you can specify the `FILL_HISTORY 1` parameter. This parameter enables streams to process data ingested at any time before, during, or after the creation of the stream.
For example, the following SQL statement creates a stream that counts the number of records generated by all smart meters every 10 seconds, including all historical data:
```sql
CREATE STREAM IF NOT EXISTS count_history_s FILL_HISTORY 1 INTO count_history AS SELECT COUNT(*) FROM power.meters INTERVAL(10s);
```
You can also combine `FILL_HISTORY 1` with a time range. For example, the following SQL statement processes only records after January 30, 2020:
```sql
CREATE STREAM IF NOT EXISTS count_history_s FILL_HISTORY 1 INTO count_history AS SELECT COUNT(*) FROM power.meters WHERE ts > '2020-01-30' INTERVAL(10s);
```
The following statement processes records between January 30, 2020 and January 1, 2023. Note that you can specify an end time in the future.
```sql
CREATE STREAM IF NOT EXISTS count_history_s FILL_HISTORY 1 INTO count_history AS SELECT COUNT(*) FROM power.meters WHERE ts > '2020-01-30' AND ts < '2023-01-01' INTERVAL(10s);
```
:::note

- A stream can process a maximum of 20 million records; exceeding this limit will cause an error.
- If a stream task has completely expired and you no longer want it to monitor or process data, you can delete it manually. The data that has already been computed is still retained.

:::

## Trigger Modes
When creating a stream, you use the `TRIGGER` directive to specify when stream processing occurs. For non-windowed computations, results are produced in real time as data is written. For windowed computations, there are currently four trigger modes, with `WINDOW_CLOSE` as the default:

1. `AT_ONCE`: Triggered immediately upon ingestion.
2. `WINDOW_CLOSE`: Triggered when the window closes. Window closure is determined by the event time and can be used together with a watermark.
3. `MAX_DELAY time`: Triggered when the window closes or when the specified time elapses, whichever is earlier.
4. `FORCE_WINDOW_CLOSE`: Based on the current time of the operating system, only the results of the currently closed window are calculated and pushed out. The window is calculated only once at the moment of closure and is not recalculated subsequently. This mode currently supports only INTERVAL windows (without SLIDING); it requires `FILL_HISTORY 0`, `IGNORE EXPIRED 1`, and `IGNORE UPDATE 1`; and FILL supports only PREV, NULL, NONE, and VALUE.

Because window closure is determined by the event time, an interrupted or continuously delayed event stream prevents the event time from advancing, which can leave computation results outdated. For this reason, stream processing provides the `MAX_DELAY` trigger mode, which combines event time with processing time: computation is triggered when the window closes, or as soon as the time since data was written exceeds the specified `MAX_DELAY`, whichever comes first. The time unit can be a (milliseconds), s (seconds), m (minutes), h (hours), d (days), or w (weeks).
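For example, the following sketch (the stream and target-table names are illustrative) computes the per-meter average voltage in 1-minute windows but pushes intermediate results no later than 5 seconds after data is written:

```sql
CREATE STREAM avg_vol_delay_s TRIGGER MAX_DELAY 5s INTO avg_vol_delay AS
SELECT _wstart, AVG(voltage) FROM power.meters PARTITION BY tbname INTERVAL(1m);
```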
## Watermark

The time at which a window closes is determined by the event time, that is, the timestamp primary key of the ingested record, rather than the TDengine server time. Using event time avoids problems caused by discrepancies between client and server clocks and properly handles challenges such as out-of-order data ingestion.

To control the tolerance for out-of-order data, you can specify a watermark when creating the stream. The watermark defines the upper bound of tolerated disorder; its default value is 0, indicating that out-of-order data is not tolerated.
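As a sketch (the stream and target-table names are illustrative), the following statement tolerates up to 10 seconds of out-of-order data before each 1-minute window is closed:

```sql
CREATE STREAM avg_current_s TRIGGER WINDOW_CLOSE WATERMARK 10s INTO avg_current_result AS
SELECT _wstart, AVG(current) FROM power.meters PARTITION BY tbname INTERVAL(1m);
```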
When data is ingested, the window closure time <math><mi>T</mi></math> is calculated as <math><mrow><mi>T</mi><mo>=</mo><mi>latest event time</mi><mo>−</mo><mi>watermark</mi></mrow></math>. Each time new data arrives, the system updates <math><mi>T</mi></math> using this formula and closes all open windows whose end time is earlier than <math><mi>T</mi></math>; if the trigger mode is `WINDOW_CLOSE` or `MAX_DELAY`, the aggregated results of those windows are pushed. This process is described in the following figure.
<figure>
<Image img={watermarkImg} alt="Window closure in stream processing"/>
<figcaption>Figure 1. Window closure diagram</figcaption>
</figure>
In the diagram, the vertical axis represents time, while the dots on the horizontal axis represent the received data points.
In the diagram above, the vertical axis represents moments, and the dots on the horizontal axis represent the data received. The related process is described as follows.
1. At time <math><msub><mi>T</mi><mn>1</mn></msub></math>, the 7th data point arrives. The calculated time falls within the second window, so the second window does not close.
2. At time <math><msub><mi>T</mi><mn>2</mn></msub></math>, the 6th and 8th data points are delayed. Since the latest event has not changed, <math><mi>T</mi></math> also remains unchanged, and the out-of-order data in the second window is processed.
3. At time <math><msub><mi>T</mi><mn>3</mn></msub></math>, the 10th data point arrives, and <math><mi>T</mi></math> moves past the closure time of the second window, which is then closed, allowing the out-of-order data to be correctly processed.
1. At moment T1, the 7th data point arrives, and based on T = Latest event - watermark, the calculated time falls within the second window, so the second window does not close.
2. At moment T2, the 6th and 8th data points arrive late to TDengine, and since the Latest event has not changed, T also remains unchanged, and the out-of-order data entering the second window has not yet been closed, thus it can be correctly processed.
3. At moment T3, the 10th data point arrives, T moves forward beyond the closure time of the second window, which is then closed, and the out-of-order data is correctly processed.
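As a concrete illustration of the formula above, the sketch below creates a stream with a 10-second watermark; the stream name, target supertable, and interval are hypothetical and only show where the `WATERMARK` option fits into the `CREATE STREAM` syntax. With this setting, once the latest event time reaches 10:00:31, T = 10:00:21, so the window covering [10:00:10, 10:00:20) is closed, while records that are up to 10 seconds late still fall into an open window.

```sql
-- Hypothetical example: tolerate up to 10 seconds of out-of-order data.
-- A window is closed only when the latest event time minus 10s has moved
-- past the window's end time.
CREATE STREAM IF NOT EXISTS avg_current_stream
  TRIGGER WINDOW_CLOSE
  WATERMARK 10s
  INTO power.avg_current_st AS
    SELECT _wstart, AVG(current) AS avg_current
    FROM power.meters
    PARTITION BY tbname
    INTERVAL(10s);
```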
:::note
In window_close or max_delay modes, window closure directly affects the push results. In at_once mode, window closure only relates to memory usage.
For streams whose trigger mode is `WINDOW_CLOSE` or `MAX_DELAY`, window closure triggers computation. However, streams in `AT_ONCE` mode compute results immediately upon data ingestion regardless of window closure.
:::
### Expired Data Handling Strategy
For windows that have closed, data that falls into such windows again is marked as expired data. TDengine offers two ways to handle expired data, specified by the IGNORE EXPIRED option.
## Handling Expired Data
1. Recalculate, i.e., IGNORE EXPIRED 0: Re-find all data corresponding to the window from the TSDB and recalculate to get the latest result.
2. Directly discard, i.e., IGNORE EXPIRED 1: Default configuration, ignore expired data.
Data that is ingested into a closed window is considered to be expired. You can specify the `IGNORE EXPIRED` parameter to determine how to handle expired data:
Regardless of the mode, the watermark should be properly set to obtain correct results (direct discard mode) or avoid frequent re-triggering of recalculations that lead to performance overhead (recalculation mode).
1. `IGNORE EXPIRED 0`: Recalculate the latest results taking expired data into account.
2. `IGNORE EXPIRED 1`: Ignore expired data.
### Data Update Handling Strategy
The default value is `IGNORE EXPIRED 1`.
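The following sketch shows where the `IGNORE EXPIRED` option sits in a stream definition; the stream and table names are hypothetical, and whether a given combination of options is accepted may depend on your TDengine version.

```sql
-- Hypothetical example: IGNORE EXPIRED 0 recomputes a window when expired
-- data belonging to it arrives; IGNORE EXPIRED 1 (the default) discards it.
CREATE STREAM IF NOT EXISTS expired_demo
  WATERMARK 5s
  IGNORE EXPIRED 0
  INTO power.expired_demo_st AS
    SELECT _wstart, COUNT(*) AS cnt
    FROM power.meters
    INTERVAL(30s);
```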
TDengine offers two ways to handle modified data, specified by the IGNORE UPDATE option.
:::note
1. Check whether the data has been modified, i.e., IGNORE UPDATE 0: Default configuration, if modified, recalculate the corresponding window.
2. Do not check whether the data has been modified, calculate all as incremental data, i.e., IGNORE UPDATE 1.
Ensure that an appropriate watermark has been set regardless of how you choose to handle expired data.
:::
## Other Strategies for Stream Computing
### Writing to an Existing Supertable
## Handling Updated Data
When the result of stream computing needs to be written into an existing supertable, ensure that the columns of `stb_name` correspond correctly to the subquery output. If the number and position of the columns match the subquery output exactly, there is no need to explicitly specify the correspondence; if the data types do not match, the system automatically converts the subquery output to the types of the corresponding `stb_name` columns.
You can specify the `IGNORE UPDATE` parameter to determine how to handle data that is updated after ingestion:
For already existing supertables, the system will check the schema information of the columns to ensure they match the subquery output results. Here are some key points:
1. `IGNORE UPDATE 0`: Check for updates and recompute results accordingly.
2. `IGNORE UPDATE 1`: Do not check for updates.
1. Check if the schema information of the columns matches; if not, automatically perform type conversion. Currently, an error is reported only if the data length exceeds 4096 bytes; otherwise, type conversion can be performed.
2. Check if the number of columns is the same; if different, explicitly specify the correspondence between the supertable and the subquery columns, otherwise, an error is reported. If the same, you can specify the correspondence or not; if not specified, they correspond by position order.
The default value is `IGNORE UPDATE 0`.
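A minimal sketch of the `IGNORE UPDATE` option in a stream definition; the names are hypothetical.

```sql
-- Hypothetical example: IGNORE UPDATE 1 treats updated rows as ordinary
-- incremental data; IGNORE UPDATE 0 (the default) detects updates and
-- recomputes the affected windows.
CREATE STREAM IF NOT EXISTS update_demo
  IGNORE UPDATE 1
  INTO power.update_demo_st AS
    SELECT _wstart, AVG(voltage) AS avg_voltage
    FROM power.meters
    INTERVAL(1m);
```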
**Note** Although stream computing can write results to an existing supertable, two streams cannot write their results to the same (super)table. This avoids data conflicts and inconsistencies, ensuring data integrity and accuracy. In practice, set the column correspondence according to your actual needs and data structure to achieve efficient and accurate data processing.
## Writing to an Existing Supertable
### Customizing Tags for Target Tables
Generally, the results of stream processing are stored in new supertables. If it is necessary to write results to an existing supertable, ensure that the columns in the supertable correspond exactly to the results of the subquery in your stream.
When writing to an existing supertable, note the following:
1. If the data types of the columns in the subquery results do not match those of the target supertable, the system will automatically convert them to the types specified in the supertable. If the length of the resultant data exceeds 4096 bytes, an error will occur.
2. If the number and position of the columns in the subquery results do not match those of the target supertable, you must explicitly specify the relationships between columns, as sketched in the example after this list.
3. Multiple streams cannot write to the same target supertable.
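The sketch below assumes an existing supertable `power.meters_summary` with columns `(ts, cnt, avg_current)` and a tag schema compatible with the stream output; the table and column names are hypothetical. The column list after the target table name maps the subquery output to the existing columns explicitly.

```sql
-- Hypothetical example: the column list after the existing supertable
-- maps _wstart -> ts, COUNT(*) -> cnt, AVG(current) -> avg_current.
CREATE STREAM IF NOT EXISTS summary_stream
  INTO power.meters_summary (ts, cnt, avg_current) AS
    SELECT _wstart, COUNT(*), AVG(current)
    FROM power.meters
    PARTITION BY tbname
    INTERVAL(1m);
```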
## Customizing Tag Values for Target Tables
You can specify custom tag values for the subtable corresponding to each partition. The syntax is described as follows:
Users can generate custom tag values for each partition's subtable, as shown in the stream creation statement below:
```sql
CREATE STREAM output_tag TRIGGER AT_ONCE INTO output_tag_s TAGS(alias_tag varchar(100)) AS SELECT _wstart, COUNT(*) FROM power.meters PARTITION BY CONCAT("tag-", tbname) AS alias_tag INTERVAL(10s);
CREATE STREAM output_tag trigger at_once INTO output_tag_s TAGS(alias_tag varchar(100)) as select _wstart, count(*) from power.meters partition by concat("tag-", tbname) as alias_tag interval(10s);
```
In the `PARTITION BY` clause, an alias `alias_tag` is defined for `CONCAT("tag-", tbname)`, corresponding to the custom tag name of the supertable `output_tag_s`. In this example, the tag of the newly created subtables for the stream will have the prefix `tag-` concatenated with the original table name as the tag value.
In the PARTITION clause, an alias `alias_tag` is defined for `concat("tag-", tbname)`, corresponding to the custom tag name of the supertable `output_tag_s`. In the example above, the tag of the newly created subtable by the stream will use the prefix 'tag-' connected to the original table name as the tag value. The following checks will be performed on the tag information:
When defining custom tag values, note the following:
1. Check if the schema information of the tag matches; if not, automatically perform data type conversion. Currently, an error is reported only if the data length exceeds 4096 bytes; otherwise, type conversion can be performed.
2. Check if the number of tags is the same; if different, explicitly specify the correspondence between the supertable and the subquery tags, otherwise, an error is reported. If the same, you can specify the correspondence or not; if not specified, they correspond by position order.
1. If the data types of the defined tags do not match those of the target supertable, the system will automatically convert them to the types specified in the supertable. If the length of the resultant data exceeds 4096 bytes, an error will occur.
2. If the number and position of the defined tags do not match those of the target supertable, you must explicitly specify the relationships between the defined tags and the tag columns in the target supertable.
### Cleaning Up Intermediate States of Stream Computing
```sql
DELETE_MARK time
```
DELETE_MARK is used to delete cached window states, i.e., the intermediate results of stream computing. Cached window states are mainly used to update window results when expired data arrives. If not set, the default value is 10 years.
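As a rough sketch, `DELETE_MARK` is specified alongside the other options when creating the stream; the names and the 7-day retention below are hypothetical.

```sql
-- Hypothetical example: keep intermediate window state for 7 days instead
-- of the 10-year default, limiting the state retained for expired-data
-- recomputation.
CREATE STREAM IF NOT EXISTS short_state_stream
  DELETE_MARK 7d
  IGNORE EXPIRED 0
  INTO power.short_state_st AS
    SELECT _wstart, MAX(current) AS max_current
    FROM power.meters
    INTERVAL(10s);
```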
## Specific Operations of Stream Computing
### Deleting Stream Computing
Dropping a stream deletes only the stream computing task; data written by the stream is not deleted. The SQL is as follows:
```sql
DROP STREAM [IF EXISTS] stream_name;
```
### Displaying Stream Computing
View the SQL of stream computing tasks as follows:
```sql
SHOW STREAMS;
```
To display more detailed information, you can use:
```sql
SELECT * from information_schema.`ins_streams`;
```
### Pausing Stream Computing Tasks
The SQL to pause stream computing tasks is as follows:
```sql
PAUSE STREAM [IF EXISTS] stream_name;
```
If the stream exists, it is paused. If the stream does not exist, an error is reported unless IF EXISTS is specified, in which case success is returned.
### Resuming Stream Computing Tasks
The SQL to resume stream computing tasks is as follows. If IGNORE UNTREATED is specified, it ignores the data written during the pause period of the stream computing task when resuming.
```sql
RESUME STREAM [IF EXISTS] [IGNORE UNTREATED] stream_name;
```
If the stream exists, it is resumed. If the stream does not exist, an error is reported unless IF EXISTS is specified, in which case success is returned. If IGNORE UNTREATED is specified, data written while the stream was paused is ignored when the stream resumes.
### Stream Computing Upgrade Fault Recovery
After upgrading TDengine, if the stream computing is not compatible, you need to delete the stream computing and then recreate it. The steps are as follows:
1. Modify taos.cfg, add `disableStream 1`
2. Restart taosd. If startup fails, rename the stream directories so that taosd does not try to load stream computing state during startup. Do not delete them, to avoid the risks of misoperation. The directories to rename are `$dataDir/vnode/vnode*/tq/stream`, where `$dataDir` is the directory in which TDengine stores data. The `$dataDir/vnode/` directory contains multiple subdirectories such as vnode1, vnode2 ... vnode*; in each of them, rename the tq/stream directory to tq/stream.bk.
3. Start the taos CLI and execute the following statements:
```sql
drop stream xxxx; ---- xxxx refers to the stream name
flush database stream_source_db; ---- The database where the supertable for stream computing data reading is located
flush database stream_dest_db; ---- The database where the supertable for stream computing data writing is located
```
Example:
```sql
create stream streams1 into test1.streamst as select _wstart, count(a) c1 from test.st interval(1s) ;
drop stream streams1;
flush database test;
flush database test1;
```
4. Stop taosd
5. Modify taos.cfg, remove `disableStream 1`, or change `disableStream` to 0
6. Start taosd
@ -6,51 +6,52 @@ slug: /advanced-features/edge-cloud-orchestration
import Image from '@theme/IdealImage';
import edgeCloud from '../assets/edge-cloud-orchestration-01.png';
## Overview of Edge-Cloud Orchestration
## Why Edge-Cloud Collaboration is Needed
In the context of the Industrial Internet, edge devices are primarily used to process local data, and decision-makers cannot form a global understanding of the entire system based solely on the information collected from edge devices. In practice, edge devices need to report data to a cloud computing platform (either public or private), where data aggregation and information fusion occur, allowing decision-makers to gain a comprehensive insight into the data. This architecture of edge-cloud orchestration has gradually become an essential pillar supporting the development of the Industrial Internet.
In industrial Internet scenarios, edge devices are used only to handle local data, and decision-makers cannot form a global understanding of the entire system based solely on information collected by edge devices. In practical applications, edge devices need to report data to cloud computing platforms (public or private clouds), where data aggregation and information integration are carried out, providing decision-makers with a global insight into the entire dataset. This edge-cloud collaboration architecture has gradually become an important pillar supporting the development of the industrial Internet.
Edge devices mainly monitor and alert on specific data points from the production line, such as real-time production data from a workshop, and then synchronize this production data to a cloud-based big data platform. The requirement for real-time processing is high on the edge, but the volume of data may not be large; typically, a production workshop may have a few thousand to tens of thousands of monitoring points. In contrast, the central side often has sufficient computing resources to aggregate edge data for analysis.
Edge devices mainly monitor and alert on specific data on the production line, such as real-time production data in a particular workshop, and then synchronize this edge-side production data to the big data platform in the cloud.
On the edge side, there is a high requirement for real-time performance, but the data volume may not be large, typically ranging from a few thousand to tens of thousands of monitoring points in a workshop. On the central side, computing resources are generally abundant, capable of aggregating data from the edge side for analysis and computation.
To achieve this operation, the database or data storage layer must ensure that data can be reported hierarchically and selectively. In some scenarios, the overall data volume is very large, necessitating selective reporting. For example, raw records collected once every second on the edge may be downsampled to once every minute when reported to the central side. This downsampling significantly reduces the data volume while still retaining key information for long-term analysis and forecasting.
To achieve this operation, the requirements for the database or data storage layer are to ensure that data can be reported step by step and selectively. In some scenarios, where the overall data volume is very large, selective reporting is necessary. For example, raw records collected every second on the edge side, when reported to the central side, are downsampled to once a minute, which greatly reduces the data volume but still retains key information for long-term data analysis and prediction.
In the traditional industrial data collection process, data is collected from Programmable Logic Controllers (PLC) and then enters a historian (an industrial real-time database), which supports business applications. Such systems typically adopt a master-slave architecture that is difficult to scale horizontally and heavily relies on the Windows ecosystem, resulting in a relatively closed environment.
In the past industrial data collection process, data was collected from programmable logic controllers (PLCs), then entered into Historian, the industrial real-time database, to support business applications. These systems are not easy to scale horizontally and are heavily dependent on the Windows ecosystem, which is relatively closed.
## TDengine's Solution
## TDengine's Edge-Cloud Collaboration Solution
TDengine Enterprise is committed to providing powerful edge-cloud orchestration capabilities, featuring the following significant characteristics:
TDengine Enterprise is committed to providing powerful edge-cloud collaboration capabilities, with the following notable features:
- **Efficient Data Synchronization**: Supports synchronization efficiency of millions of data points per second, ensuring rapid and stable data transmission between the edge and the cloud.
- **Multi-Data Source Integration**: Compatible with various external data sources, such as AVEVA PI System, OPC-UA, OPC-DA, and MQTT, achieving broad data access and integration.
- **Flexible Configuration of Synchronization Rules**: Provides configurable synchronization rules, allowing users to customize data synchronization strategies and methods based on actual needs.
- **Resume Transmission and Re-Subscription**: Supports resume transmission and re-subscription functionalities, ensuring continuity and integrity of data synchronization during network instability or interruptions.
- **Historical Data Migration**: Supports the migration of historical data, enabling users to seamlessly transfer historical data to a new system during upgrades or system changes.
- Efficient data synchronization: Supports synchronization efficiency of millions of data per second, ensuring fast and stable data transmission between the edge side and the cloud.
- Multi-data source integration: Compatible with various external data sources, such as AVEVA PI System, OPC-UA, OPC-DA, MQTT, etc., to achieve broad data access and integration.
- Flexible configuration of synchronization rules: Provides configurable synchronization rules, allowing users to customize the strategy and method of data synchronization according to actual needs.
- Offline continuation and re-subscription: Supports offline continuation and re-subscription functions, ensuring the continuity and integrity of data synchronization in the event of unstable or interrupted networks.
- Historical data migration: Supports the migration of historical data, facilitating users to seamlessly migrate historical data to a new system when upgrading or replacing systems.
TDengine's data subscription feature offers significant flexibility for subscribers, allowing users to configure subscription objects as needed. Users can subscribe to a database, a supertable, or even a query statement with filter conditions. This allows users to achieve selective data synchronization, transferring only the relevant data (including offline and out-of-order data) from one cluster to another to meet various complex data demands.
TDengine's data subscription feature offers great flexibility to subscribers, allowing users to configure subscription objects as needed. Users can subscribe to a database, a supertable, or even a query statement with filtering conditions. This enables users to implement selective data synchronization, syncing truly relevant data (including offline and out-of-order data) from one cluster to another to meet the data needs of various complex scenarios.
The following diagram illustrates the implementation of the edge-cloud orchestration architecture in TDengine Enterprise using a specific example of a production workshop. In the workshop, real-time data generated by equipment is stored in TDengine deployed on the edge. The TDengine deployed at the branch factory subscribes to data from the workshop's TDengine. To better meet business needs, data analysts can set subscription rules, such as downsampling data or only synchronizing data that exceeds a specified threshold. Similarly, TDengine deployed at the group level subscribes to data from various branch factories, achieving data aggregation at the group level for further analysis and processing.
The following diagram illustrates the implementation of an edge-cloud collaboration architecture in TDengine Enterprise using a specific production workshop example. In the production workshop, real-time data generated by equipment is stored in TDengine deployed on the edge side. The TDengine deployed in the branch factory subscribes to the data from the TDengine in the production workshop. To better meet business needs, data analysts set some subscription rules, such as data downsampling or syncing only data exceeding a specified threshold. Similarly, the TDengine deployed on the corporate side then subscribes to data from various branch factories, achieving corporate-level data aggregation, ready for further analysis and processing.
<figure>
<Image img={edgeCloud} alt="Edge-cloud orchestration diagram"/>
<figcaption>Edge-cloud orchestration diagram</figcaption>
</figure>
This implementation approach has several advantages:
This implementation approach has the following advantages:
- Requires no coding; only simple configurations are needed on the edge and cloud sides.
- Significantly increases the automation level of cross-region data synchronization, reducing error rates.
- Data does not need to be cached, minimizing batch transmissions and avoiding bandwidth congestion during peak flow.
- Data is synchronized through a subscription method, which is configurable, simple, flexible, and real-time.
- Both edge and cloud use TDengine, ensuring a unified data model that reduces the difficulty of data governance.
- No coding required, just simple configuration on the edge side and cloud.
- Greatly improved automation of cross-regional data synchronization, reducing error rates.
- No need for data caching, reducing batch sending and avoiding bandwidth congestion at traffic peaks.
- Data synchronization through subscription, with configurable rules, simple, flexible, and highly real-time.
- Both edge and cloud use TDengine, completely unifying the data model, reducing data governance difficulty.
A common pain point faced by manufacturing enterprises is data synchronization. Many companies currently use offline methods to synchronize data, but TDengine Enterprise enables real-time data synchronization with configurable rules. This approach can prevent resource waste and bandwidth congestion risks caused by periodically transmitting large volumes of data.
Manufacturing enterprises often face a pain point in data synchronization. Many enterprises currently use offline methods to synchronize data, but TDengine Enterprise achieves real-time data synchronization with configurable rules. This method can avoid the resource waste and bandwidth congestion risks caused by regular large data transfers.
## Advantages of Edge-Cloud Orchestration
## Advantages of Edge-Cloud Collaboration
The IT and OT (Operational Technology) construction status of traditional industries varies greatly. Compared to the Internet sector, most enterprises are significantly lagging in their investments in digitization. Many enterprises are still using outdated systems to process data, which often operate independently, leading to so-called data silos.
The IT and OT (Operational Technology) construction conditions of traditional industries vary, and compared to the internet industry, most enterprises are significantly behind in digital investment. Many enterprises still use outdated systems to process data, which are often independent of each other, forming so-called data silos.
In this context, injecting new vitality into traditional industries with AI requires first integrating the dispersed systems and their collected data, breaking the limitations of data silos. However, this process is challenging, as it involves multiple systems and various Industrial Internet protocols, making data aggregation far more than a simple merging task. It requires cleaning, processing, and handling data from different sources to integrate it into a unified platform.
In this context, to inject new vitality into traditional industries with AI, the primary task is to integrate systems scattered in various corners and their collected data, breaking the limitations of data silos. However, this process is full of challenges, as it involves multiple systems and a plethora of industrial Internet protocols, and data aggregation is not a simple merging task. It requires cleaning, processing, and handling data from different sources to integrate it into a unified platform.
When all data is aggregated into a single system, the efficiency of accessing and processing data will be significantly improved. Enterprises will be able to respond more quickly to real-time data and resolve issues more effectively. Employees both inside and outside the enterprise can also collaborate efficiently, enhancing overall operational efficiency.
When all data is aggregated into one system, the efficiency of accessing and processing data is significantly improved. Enterprises can respond more quickly to real-time data, solve problems more effectively, and achieve efficient collaboration among internal and external staff, enhancing overall operational efficiency.
Moreover, once data is aggregated, advanced third-party AI analysis tools can be utilized for better anomaly monitoring, real-time alerts, and more accurate predictions regarding capacity, costs, and equipment maintenance. This will enable decision-makers to better grasp the overall macro situation, providing strong support for enterprise development and facilitating the digital transformation and intelligent upgrade of traditional industries.
Additionally, after data aggregation, advanced third-party AI analysis tools can be utilized for improved anomaly detection, real-time alerts, and provide more accurate predictions for production capacity, cost, and equipment maintenance. This will enable decision-makers to better grasp the overall macro situation, provide strong support for the development of the enterprise, and help traditional industries achieve digital transformation and intelligent upgrades.
@ -9,46 +9,48 @@ import imgStep2 from '../../assets/tdengine-2-02.png';
import imgStep3 from '../../assets/tdengine-2-03.png';
import imgStep4 from '../../assets/tdengine-2-04.png';
This section explains how to create a data migration task through the Explorer interface to migrate data from the TDengine 2.x to the current cluster.
This section describes how to create a data migration task through the Explorer interface to migrate data from the old version of TDengine2 to the current cluster.
## Feature Overview
`taosX` uses SQL queries to retrieve data from the source cluster and writes the query results to the target database. Specifically, `taosX` treats a subtable's data for a specific time period as the basic unit of the query, and the data to be migrated is written to the target database in batches.
taosX migrates data by querying the source cluster and writing the results to the target database. Specifically, taosX uses the data of a subtable over a period of time as the basic unit of query, and writes the data to be migrated to the target database in batches.
`taosX` supports three migration modes:
taosX supports three migration modes:
1. **history** mode: Migrates data within a specified time range. If no time range is specified, it migrates all data up to the task creation time. Once the migration is complete, the task stops.
2. **realtime** mode: Synchronizes data from the task creation time onward. The task will continue running unless manually stopped.
3. **both** mode: Executes history mode first, then switches to realtime mode.
1. **history** mode. This refers to migrating data within a specified time range. If no time range is specified, it migrates all data up to the time the task was created. The task stops once migration is complete.
2. **realtime** mode. It synchronizes data from the time the task is created onwards. The task will continue to run unless manually stopped.
3. **both** mode. It first executes in history mode, then in realtime mode.
In each migration mode, you can specify whether to migrate the table structure. If "always" is selected, the table structure will be synced to the target database before migrating the data. If there are many subtables, this process may take a while. If you are sure that the target database already has the same table schema as the source database, it is recommended to select "none" to save time.
Under each migration mode, you can specify whether to migrate the table structure. If "always" is selected, the structure of the table is synchronized to the target database before migrating data. This process may take longer if there are many subtables. If it is certain that the target database already has the same table schema as the source database, it is recommended to choose "none" to save time.
During task execution, progress is saved to disk, so if a task is paused and restarted, or automatically recovers from an error, it will not restart from the beginning.
The task saves progress information to the disk during operation, so if the task is paused and then restarted, or if it automatically recovers from an anomaly, the task will not start over from the beginning.
For more detailed information, we recommend reading the description of each form field on the task creation page.
For more options, it is recommended to read the description of each form field on the task creation page in detail.
## Steps
## Specific Steps
First, click the "Data Ingestion" menu on the left, then click the "Add Data Source" button on the right.
First, click on the "Data Writing" menu on the left, then click the "Add Data Source" button on the right.
<figure>
<Image img={imgStep1} alt="Add data source"/>
<figcaption>Figure 1. Add a data source</figcaption>
</figure>
Next, enter the task name, such as "migrate-test", and select the type "TDengine2". At this point, the form will switch to the dedicated TDengine2 migration form, which contains many options, each with a detailed description, as shown in the images below.
Then enter the task name, such as "migrate-test", and finally select the type "TDengine2". At this point, the form switches to a form dedicated to migrating data from TDengine2, containing a large number of options, each with detailed explanations, as shown in the images below.
<figure>
<Image img={imgStep2} alt="Add data source"/>
<figcaption>Figure 2. Add a data source</figcaption>
</figure>
<figure>
<Image img={imgStep3} alt="Add data source"/>
<figcaption>Figure 3. Add a data source</figcaption>
</figure>
<figure>
<Image img={imgStep4} alt="Add data source"/>
<figcaption>Figure 4. Add a data source</figcaption>
</figure>
After clicking the "Submit" button to submit the task, return to the "Data Source" task list page, where you can monitor the task's execution status.
After clicking the "Submit" button to submit the task, return to the "Data Source" task list page to monitor the status of the task.
@ -14,90 +14,90 @@ import imgStep7 from '../../assets/tdengine-3-07.png';
import imgStep8 from '../../assets/tdengine-3-08.png';
import imgStep9 from '../../assets/tdengine-3-09.png';
This document explains how to use the Explorer interface to subscribe to data from another cluster into the current one.
This document describes how to use Explorer to subscribe to data from another cluster to this cluster.
## Preparation
Create the necessary subscription topic on the source cluster. You can subscribe to the entire database, a supertable, or a subtable. In this example, we will demonstrate subscribing to a database named `test`.
Create the required Topic in the source cluster, which can subscribe to the entire database, supertable, or subtable. In this example, we demonstrate subscribing to a database named test.
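If you prefer SQL to the Explorer UI for this preparation step, a topic covering the whole database can be created roughly as follows; the topic name is hypothetical.

```sql
-- Hypothetical topic name; subscribes to all data in database `test`.
CREATE TOPIC IF NOT EXISTS topic_test AS DATABASE test;
```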
### Step 1: Access the Data Subscription page
### Step One: Enter the "Data Subscription" page
Open the Explorer interface for the source cluster, click on the "Data Subscription" menu on the left, and then click on "Add New Topic."
Open the Explorer interface of the source cluster, click the "Data Subscription" menu on the left, then click "Add New Topic".
<figure>
<Image img={imgStep1} alt=""/>
</figure>
### Step 2: Add a New Topic
### Step Two: Add a New Topic
Enter the topic name and select the database you want to subscribe to.
Enter the topic name, select the database to subscribe to.
<figure>
<Image img={imgStep2} alt=""/>
</figure>
### Step 3: Copy the Topic's DSN
### Step Three: Copy the Topic's DSN
Click the "Create" button, go back to the topic list, and copy the topic's **DSN** for later use.
Click the "Create" button, return to the topic list and copy the **DSN** of the topic for later use.
<figure>
<Image img={imgStep3} alt=""/>
</figure>
## Create a Subscription Task
## Create Subscription Task
### Step 1: Go to the "Add Data Source" page
### Step One: Enter the "Add Data Source" page
1. Click the "Data Ingestion" menu on the left.
2. Click "Add Data Source."
1. Click the "Data Writing" menu on the left
2. Click "Add Data Source"
<figure>
<Image img={imgStep4} alt=""/>
</figure>
### Step 2: Enter Data Source Information
### Step Two: Enter Data Source Information
1. Enter the task name.
2. Select the task type "TDengine3."
3. Choose the target database.
4. Paste the DSN copied from the preparation step into the **Topic DSN** field. For example: `tmq+ws://root:taosdata@localhost:6041/topic`
5. After completing the above steps, click the "Connectivity Check" button to test connectivity with the source.
1. Enter the task name
2. Select the task type "TDengine3"
3. Select the target database
4. Paste the DSN copied in the preparation step into the **Topic DSN** field. For example: tmq+ws://root:taosdata@localhost:6041/topic
5. After completing the above steps, click the "Connectivity Check" button to test connectivity with the source
<figure>
<Image img={imgStep5} alt=""/>
</figure>
### Step 3: Configure Subscription Settings and Submit the Task
### Step Three: Fill in Subscription Settings and Submit Task
1. Choose the subscription starting point. You can configure it to start from the earliest or latest data, with the default being the earliest.
2. Set the timeout. Supported units include ms (milliseconds), s (seconds), m (minutes), h (hours), d (days), M (months), y (years).
3. Set the subscription group ID. The subscription group ID is an arbitrary string used to identify a subscription group, with a maximum length of 192 characters. Subscribers within the same group share consumption progress. If not specified, a randomly generated group ID will be used.
4. Set the client ID. The client ID is an arbitrary string used to identify the client, with a maximum length of 192 characters.
5. Synchronize data that has already been written to disk. If enabled, it will synchronize data that has already been flushed to the TSDB storage file (i.e., not in the WAL). If disabled, it will only synchronize data that has not yet been flushed (i.e., still in the WAL).
6. Synchronize table deletion operations. If enabled, it will synchronize table deletion operations to the target database.
7. Synchronize data deletion operations. If enabled, it will synchronize data deletion operations to the target database.
8. Compression. Enable WebSocket compression to reduce network bandwidth usage.
9. Click the "Submit" button to submit the task.
1. Choose the subscription start position. Configurable to start from the earliest or latest data, default is earliest
2. Set the timeout period. Supports units ms (milliseconds), s (seconds), m (minutes), h (hours), d (days), M (months), y (years)
3. Set the subscription group ID. The subscription group ID is an arbitrary string used to identify a subscription group, with a maximum length of 192. If not specified, a randomly generated group ID will be used.
4. Set the client ID. The client ID is an arbitrary string used to identify the client, with a maximum length of 192.
5. Synchronize data that has been written to disk. If enabled, it can synchronize data that has been written to the TSDB time-series data storage file (i.e., not in WAL). If disabled, only data that has not yet been written to disk (i.e., saved in WAL) will be synchronized.
6. Synchronize table deletion operations. If enabled, table deletion operations will be synchronized to the target database.
7. Synchronize data deletion operations. If enabled, data deletion operations will be synchronized to the target database.
8. Compression. Enable WebSocket compression support to reduce network bandwidth usage.
9. Click the "Submit" button to submit the task
<figure>
<Image img={imgStep6} alt=""/>
</figure>
## Monitoring Task Progress
## Monitor Task Execution
After submitting the task, return to the data source page to view the task status. The task will first be added to the execution queue and will start running shortly after.
After submitting the task, return to the data source page to view the task status. The task will first be added to the execution queue and will start running shortly.
<figure>
<Image img={imgStep7} alt=""/>
</figure>
Click the "View" button to monitor dynamic statistical information about the task.
Click the "View" button to monitor the dynamic statistical information of the task.
<figure>
<Image img={imgStep8} alt=""/>
</figure>
You can also click the collapse button on the left to expand the task's activity information. If the task encounters any issues, detailed explanations will be provided here.
You can also click the left collapse button to expand the task's activity information. If the task runs abnormally, detailed explanations can be seen here.
<figure>
<Image img={imgStep9} alt=""/>
@ -105,6 +105,6 @@ You can also click the collapse button on the left to expand the task's activity
## Advanced Usage
1. The FROM DSN supports multiple Topics, separated by commas. For example: `tmq+ws://root:taosdata@localhost:6041/topic1,topic2,topic3`
2. In the FROM DSN, you can also use database names, supertable names, or subtable names in place of the Topic names. For example: `tmq+ws://root:taosdata@localhost:6041/db1,db2,db3`. In this case, it is not necessary to create Topics in advance; `taosX` will automatically recognize the use of database names and create the database subscription Topics in the source cluster.
3. The FROM DSN supports the `group.id` parameter to explicitly specify the group ID for the subscription. If not specified, a randomly generated group ID will be used.
1. FROM DSN supports multiple Topics, with multiple Topic names separated by commas. For example: `tmq+ws://root:taosdata@localhost:6041/topic1,topic2,topic3`
2. In the FROM DSN, you can also use the database name, supertable name, or subtable name instead of the Topic name. For example: `tmq+ws://root:taosdata@localhost:6041/db1,db2,db3`, in this case, there is no need to create a Topic in advance, taosX will automatically recognize that a database name is used and automatically create a subscription Topic in the source cluster.
3. FROM DSN supports the group.id parameter, to explicitly specify the group ID used for subscription. If not specified, a randomly generated group ID will be used.
@ -10,31 +10,31 @@ import imgStep2 from '../../assets/pi-system-02.png';
import imgStep3 from '../../assets/pi-system-03.png';
import imgStep4 from '../../assets/pi-system-04.png';
This section explains how to create a task through the Explorer interface to migrate data from PI System to TDengine.
This section describes how to create data migration tasks through the Explorer interface, migrating data from the PI system to the current TDengine cluster.
## Overview
## Feature Overview
PI System is a suite of software products for data collection, retrieval, analysis, transmission, and visualization. It can serve as the infrastructure for enterprise-level systems that manage real-time data and events. The `taosX` PI System connector plugin can extract both real-time and historical data from PI System.
The PI system is a software product suite used for data collection, retrieval, analysis, transmission, and visualization, serving as the infrastructure for enterprise-level systems managing real-time data and events. taosX can extract real-time or historical data from the PI system using the PI connector plugin.
From a data timeliness perspective, PI System data source tasks are divided into two types: **real-time tasks** and **backfill tasks**. In the task type dropdown list, these two types are labeled: **PI** and **PI backfill**.
From the perspective of data timeliness, PI data source tasks are divided into two categories: **real-time tasks** and **backfill tasks**. In the task type dropdown list, these two categories correspond to the names: **PI** and **PI backfill**.
From a data model perspective, PI System data source tasks are divided into **single-column model** tasks and **multi-column model** tasks:
From the data model perspective, PI data source tasks are divided into **single-column model** tasks and **multi-column model** tasks:
1. **Single-column model** tasks map a PI Point to a TDengine table.
2. **Multi-column model** tasks map a PI AF element to a TDengine table.
1. **Single-column model** tasks map one PI Point to one table in TDengine
2. **Multi-column model** tasks map one PI AF element to one table
For the type of connected data source, PI System data source tasks are divided into **Archive Server** data sources and **AF Server** data sources. For **Archive Server** data sources, only the **single-column model** can be used. For **AF Server** data sources, both **single-column model** and **multi-column model** can be selected.
Regarding the type of connected data source, PI data source tasks are further divided into **Archive Server** data sources and **AF Server** data sources. For **Archive Server** data sources, only the **single-column model** can be used. For **AF Server** data sources, both **single-column model** and **multi-column model** can be chosen.
Users configure data mapping rules from PI System to TDengine via a CSV file, referred to as the **model configuration file**:
Users configure the data mapping rules from PI to TDengine through a CSV file, referred to as the **model configuration file**:
1. For tasks using the AF Server's single-column model, `taosX` automatically identifies which attributes of an element reference PI Point data, mapping a PI Point attribute to a table.
2. For tasks using the AF Server's multi-column model, one element corresponds to one table. By default, `taosX` maps PI Point attributes to TDengine metric columns and other attributes to TDengine tag columns.
1. For tasks using the AF Server single-column model, taosX automatically identifies which attributes of the element are referencing PI Point data, mapping one PI Point attribute to one table.
2. For tasks using the AF Server multi-column model, one element corresponds to one table. taosX by default maps PI Point attributes to TDengine Metric columns and other attributes to TDengine tag columns.
## Creating a Task
## Creating Tasks
### Add Data Source
In the "Data Ingestion" page, click the **+Add Data Source** button to go to the add data source page.
In the data writing page, click the **+Add Data Source** button to enter the add data source page.
<figure>
<Image img={imgStep1} alt=""/>
@ -42,13 +42,13 @@ In the "Data Ingestion" page, click the **+Add Data Source** button to go to the
### Basic Configuration
Enter a task name in the **Name** field, such as: "test."
Enter the task name in **Name**, such as "test";
Select **PI** or **PI backfill** from the **Type** dropdown list.
If the `taosX` service runs on the same server as the PI system or can connect directly to it (requires PI AF SDK), an agent is not necessary. Otherwise, configure an agent: select a specified agent from the dropdown, or click the **+Create New Agent** button on the right to create a new agent, following the prompts to configure it. `taosX` or its agent must be deployed on a host that can directly connect to the PI System.
If the taosX service is running on or can directly connect to the server where the PI system is located (dependent on PI AF SDK), **Proxy** is not necessary; otherwise, configure **Proxy**: select the specified proxy from the dropdown, or click the **+Create New Proxy** button on the right to create a new proxy and follow the prompts to configure the proxy. That is, taosX or its proxy needs to be deployed on a host that can directly connect to the PI system.
Select a target database from the **Target Database** dropdown list, or click the **+Create Database** button on the right to create a new one.
Select a target database from the **Target Database** dropdown list, or click the **+Create Database** button on the right to create a new database.
<figure>
<Image img={imgStep2} alt=""/>
@ -56,31 +56,31 @@ Select a target database from the **Target Database** dropdown list, or click th
### Connection Configuration
The PI System connector supports two connection methods:
The PI connector supports two connection methods:
1. **PI Data Archive Only**: Does not use the AF model. In this mode, fill in the **PI Service Name** (server address, typically the hostname).
1. **PI Data Archive Only**: Does not use AF mode. In this mode, directly fill in the **PI Service Name** (server address, usually using the hostname).
<figure>
<Image img={imgStep3} alt=""/>
</figure>
2. **PI Data Archive and Asset Framework (AF) Server**: Uses the AF SDK. In this mode, in addition to configuring the service name, you also need to configure the PI System (AF Server) name (hostname) and the AF Database name.
2. **PI Data Archive and Asset Framework (AF) Server**: Uses AF SDK. In addition to configuring the service name, this mode also requires configuring the PI system (AF Server) name (hostname) and AF database name.
<figure>
<Image img={imgStep4} alt=""/>
</figure>
Click the **Connectivity Check** button to check if the data source is available.
Click the **Connectivity Check** button to verify if the data source is available.
### Data Model Configuration
This section has two tabs corresponding to the single-column model configuration and the multi-column model configuration. If this is your first time configuring it, whether you choose the single-column model or the multi-column model, be sure to click the "Download Default Configuration" button. This will generate the default **model configuration file**, which will also be downloaded to your local machine, where you can view or edit it. After editing, you can re-upload it to override the default configuration.
This part has two tabs, corresponding to the configuration of the single-column model and the multi-column model. If this is your first configuration, whether you choose a single-column model or a multi-column model, be sure to click the "Download Default Configuration" button. This action will trigger the generation of the default **model configuration file** and also download the **model configuration file** to your local machine, which you can view or edit. After editing, you can also upload it again to overwrite the default configuration.
If you want to synchronize all points or all elements of a template, the default configuration is sufficient. If you want to filter specific naming patterns of points or element templates, you need to fill in the filter conditions before clicking "Download Default Configuration."
If you want to synchronize all points or all template elements, then the default configuration is sufficient. If you want to filter specific naming patterns of points or element templates, you need to fill in the filter conditions before clicking "Download Default Configuration".
#### Multi-Column Model Configuration File
#### Multi-column Model Configuration File
Below is an example of a multi-column model configuration file. This configuration file includes configurations for two supertables: one for the `metertemplate` table, whose data comes from the `MeterTemplate` element, and another for the `farm` table, whose data comes from the `Farm` element.
Below is an example of a multi-column model configuration file. This configuration file includes configurations for two supertables: one is the metertemplate table, which receives data from elements of the MeterTemplate template; the other is the farm table, which receives data from elements of the Farm template.
```csv
SuperTable,metertemplate
@ -92,10 +92,10 @@ voltage,COLUMN,DOUBLE,$voltage
voltage_status,COLUMN,INT,$voltage_status
current,COLUMN,DOUBLE,$current
current_status,COLUMN,INT,$current_status
element_id,TAG,VARCHAR(100),$element_id
element_name,TAG,VARCHAR(100),$element_name
path,TAG,VARCHAR(100),$path
categories,TAG,VARCHAR(100),$categories
element_id,tag,VARCHAR(100),$element_id
element_name,tag,VARCHAR(100),$element_name
path,tag,VARCHAR(100),$path
categories,tag,VARCHAR(100),$categories
SuperTable,farm
SubTable,${element_name}_${element_id}
@ -112,21 +112,21 @@ farm_lifetime_production__weekly_,COLUMN,FLOAT,$farm_lifetime_production__weekly
farm_lifetime_production__weekly__status,COLUMN,INT,$farm_lifetime_production__weekly__status
farm_lifetime_production__hourly_,COLUMN,FLOAT,$farm_lifetime_production__hourly_
farm_lifetime_production__hourly__status,COLUMN,INT,$farm_lifetime_production__hourly__status
element_id,TAG,VARCHAR(100),$element_id
element_name,TAG,VARCHAR(100),$element_name
path,TAG,VARCHAR(100),$path
categories,TAG,VARCHAR(100),$categories
element_id,tag,VARCHAR(100),$element_id
element_name,tag,VARCHAR(100),$element_name
path,tag,VARCHAR(100),$path
categories,tag,VARCHAR(100),$categories
```
A multi-column model configuration file consists of one or more supertable definitions. Each supertable configuration includes:
The multi-column model configuration file consists of one or more supertable definitions. Each supertable configuration includes:
1. The mapping between supertables and templates.
2. The mapping between attributes and TDengine metric columns.
3. The mapping between attributes and TDengine tag columns.
4. Source data filtering conditions.
5. For each column, whether it is a metric column or a tag column, you can configure a mapping rule. For details, see [Zero-Code Third-Party Data Integration](../), "Data Extraction, Filtering, and Transformation."
1. Correspondence between supertables and templates
2. Correspondence between attributes and TDengine Metric columns
3. Correspondence between attributes and TDengine tag columns
4. Source data filtering conditions
5. For each column, whether it is a Metrics column or a tag column, a mapping rule can be configured, see [Zero-code third-party data access](../) "Data extraction, filtering, and transformation" section
#### Single-Column Model Configuration File
#### Single-column model configuration file
Below is an example of a single-column model configuration file.
@ -137,18 +137,18 @@ Filter,
ts,KEY,TIMESTAMP,$ts
value,COLUMN,FLOAT,$value
status,COLUMN,INT,$status
path,TAG,VARCHAR(200),$path
point_name,TAG,VARCHAR(100),$point_name
ptclassname,TAG,VARCHAR(100),$ptclassname
sourcetag,TAG,VARCHAR(100),$sourcetag
tag,TAG,VARCHAR(100),$tag
descriptor,TAG,VARCHAR(100),$descriptor
exdesc,TAG,VARCHAR(100),$exdesc
engunits,TAG,VARCHAR(100),$engunits
pointsource,TAG,VARCHAR(100),$pointsource
step,TAG,VARCHAR(100),$step
future,TAG,VARCHAR(100),$future
element_paths,TAG,VARCHAR(512),`$element_paths.replace("\\", ".")`
path,tag,VARCHAR(200),$path
point_name,tag,VARCHAR(100),$point_name
ptclassname,tag,VARCHAR(100),$ptclassname
sourcetag,tag,VARCHAR(100),$sourcetag
tag,tag,VARCHAR(100),$tag
descriptor,tag,VARCHAR(100),$descriptor
exdesc,tag,VARCHAR(100),$exdesc
engunits,tag,VARCHAR(100),$engunits
pointsource,tag,VARCHAR(100),$pointsource
step,tag,VARCHAR(100),$step
future,tag,VARCHAR(100),$future
element_paths,tag,VARCHAR(512),`$element_paths.replace("\\", ".")`
SuperTable,milliampere_float32
SubTable,${point_name}
@ -156,18 +156,18 @@ Filter,
ts,KEY,TIMESTAMP,$ts
value,COLUMN,FLOAT,$value
status,COLUMN,INT,$status
path,TAG,VARCHAR(200),$path
point_name,TAG,VARCHAR(100),$point_name
ptclassname,TAG,VARCHAR(100),$ptclassname
sourcetag,TAG,VARCHAR(100),$sourcetag
tag,TAG,VARCHAR(100),$tag
descriptor,TAG,VARCHAR(100),$descriptor
exdesc,TAG,VARCHAR(100),$exdesc
engunits,TAG,VARCHAR(100),$engunits
pointsource,TAG,VARCHAR(100),$pointsource
step,TAG,VARCHAR(100),$step
future,TAG,VARCHAR(100),$future
element_paths,TAG,VARCHAR(512),`$element_paths.replace("\\", ".")`
path,tag,VARCHAR(200),$path
point_name,tag,VARCHAR(100),$point_name
ptclassname,tag,VARCHAR(100),$ptclassname
sourcetag,tag,VARCHAR(100),$sourcetag
tag,tag,VARCHAR(100),$tag
descriptor,tag,VARCHAR(100),$descriptor
exdesc,tag,VARCHAR(100),$exdesc
engunits,tag,VARCHAR(100),$engunits
pointsource,tag,VARCHAR(100),$pointsource
step,tag,VARCHAR(100),$step
future,tag,VARCHAR(100),$future
element_paths,tag,VARCHAR(512),`$element_paths.replace("\\", ".")`
Meter_1000004_Voltage,POINT,volt_float32
Meter_1000004_Current,POINT,milliampere_float32
@ -177,25 +177,25 @@ Meter_1000474_Voltage,POINT,volt_float32
Meter_1000474_Current,POINT,milliampere_float32
```
A single-column model configuration file is divided into two parts. The first part is similar to the multi-column model configuration file and consists of several supertable definitions. The second part is the point list, which configures the mapping between points and supertables. The default configuration maps points with the same UOM and data type to the same supertable.
The single-column model configuration file is divided into two parts. The first part, like the multi-column model configuration file, consists of several supertable definitions. The second part is the point list, which configures the mapping between points and supertables. The default configuration maps points with the same UOM and data type to the same supertable.
### Backfill Configuration
1. For PI tasks, a "restart compensation time" can be configured. If the task is interrupted unexpectedly, this parameter is useful upon restart, as it allows `taosX` to automatically backfill a period of data.
2. For PI backfill tasks, you must configure the start and end times for the backfill.
1. For PI tasks, you can configure the "restart compensation time." If the task is unexpectedly interrupted, configuring this parameter when restarting is very useful as it allows taosX to automatically backfill data for a period.
2. For PI backfill tasks, you must configure the start and end times of the backfill.
### Advanced Options
The advanced options differ for different task types. The common advanced options are:
The advanced options vary for different types of tasks. Common advanced options include:
1. Connector log level.
2. Batch size for querying and sending data.
3. Maximum delay for a single read.
1. Connector log level
2. Batch size for connector queries and data sending
3. Maximum delay for a single read
For **multi-column real-time tasks**, there are the following toggle options:
For **real-time tasks of the multi-column model**, there are also the following switch options:
1. Sync new elements. If enabled, the PI connector will monitor new elements in the template. Without restarting the task, it can automatically synchronize the data for new elements.
2. Sync static attribute changes. If enabled, the PI connector will sync changes to all static attributes (non-PI Point attributes). This means that if a static attribute of an element is modified in the PI AF Server, the corresponding tag value in the TDengine table will also be modified.
3. Sync delete element operations. If enabled, the PI connector will listen for element deletion events in the configured template and sync the deletion of the corresponding subtable in TDengine.
4. Sync delete historical data operations. If enabled, for the time-series data of an element, if data from a certain time is deleted in PI, the corresponding column data in TDengine for that time will be set to null.
5. Sync historical data modifications. If enabled, for the time-series data of an element, if historical data is modified in PI, the corresponding data in TDengine will also be updated.
1. Whether to synchronize newly added elements. If enabled, the PI connector will listen for newly added elements under the template and automatically synchronize the data of the newly added elements without needing to restart the task.
2. Whether to synchronize changes in static attributes. If enabled, the PI connector will synchronize all changes in static attributes (non-PI Point attributes). That is, if a static attribute value of an element in the PI AF Server is modified, the corresponding tag value in the TDengine table will also be modified.
3. Whether to synchronize the deletion of elements. If enabled, the PI connector will listen for events of element deletions under the configured template and synchronize the deletion of the corresponding subtable in TDengine.
4. Whether to synchronize the deletion of historical data. If enabled, for the time-series data of an element, if data at a certain time is deleted in PI, the corresponding column data at that time in TDengine will be set to null.
5. Whether to synchronize the modification of historical data. If enabled, for the time-series data of an element, if historical data is modified in PI, the corresponding data at that time in TDengine will also be updated.
@ -14,21 +14,21 @@ import imgStep7 from '../../assets/opc-ua-07.png';
import imgStep8 from '../../assets/opc-ua-08.png';
import imgStep9 from '../../assets/opc-ua-09.png';
This section explains how to create a data migration task through the Explorer interface to synchronize data from an OPC-UA server to the current TDengine cluster.
## Overview
OPC is one of the interoperability standards for securely and reliably exchanging data in industrial automation and other industries.
OPC-UA is the next-generation standard of the classic OPC specification: a platform-independent, service-oriented architecture that integrates all the functionality of OPC Classic and provides a path to a more secure and scalable solution.
TDengine can efficiently read data from OPC-UA servers and write it to TDengine, enabling real-time data ingestion.
## Creating a Task
### 1. Add a Data Source
On the Data Ingestion page, click the **+Add Data Source** button to go to the Add Data Source page.
<figure>
<Image img={imgStep1} alt=""/>
@ -36,13 +36,13 @@ On the Data Ingestion page, click the **+Add Data Source** button to go to the A
### 2. Configure Basic Information
Enter a task name in the **Name** field; for example, a task monitoring environmental temperature and humidity could be named **environment-monitoring**.
Select **OPC-UA** from the **Type** dropdown list.
The agent is optional. If needed, select an agent from the dropdown list, or click the **+Create New Agent** button on the right.
Select a target database from the **Target Database** dropdown list, or click the **+Create Database** button on the right.
<figure>
<Image img={imgStep2} alt=""/>
@ -50,20 +50,20 @@ Select a target database from the **Target Database** dropdown list, or click th
### 3. Configure Connection Information
In the **Connection Configuration** section, fill in the **OPC-UA Server Address**, for example `127.0.0.1:5000`, and configure the data transmission security mode. Three security modes are available:
1. None: Data is transmitted in plaintext.
2. Sign: Digital signatures are used to verify the communication data, ensuring data integrity.
3. SignAndEncrypt: Digital signatures are used to verify the communication data, and encryption algorithms are applied to encrypt the data, ensuring data integrity, authenticity, and confidentiality.
If you select Sign or SignAndEncrypt as the security mode, you must select a valid security policy. The security policy defines how the encryption and verification mechanisms of the security mode are implemented, including the encryption algorithm used, key length, digital certificates, and so on. The available security policies are:
1. None: Only selectable when the security mode is None.
2. Basic128Rsa15: Uses the RSA algorithm with a 128-bit key to sign or encrypt communication data.
3. Basic256: Uses the AES algorithm with a 256-bit key to sign or encrypt communication data.
4. Basic256Sha256: Uses the AES algorithm with a 256-bit key and encrypts the digital signature using the SHA-256 algorithm.
5. Aes128Sha256RsaOaep: Uses the AES-128 algorithm to encrypt and decrypt communication data, encrypts the digital signature using the SHA-256 algorithm, and uses the RSA algorithm in OAEP mode to encrypt and decrypt symmetric communication keys.
6. Aes256Sha256RsaPss: Uses the AES-256 algorithm to encrypt and decrypt communication data, encrypts the digital signature using the SHA-256 algorithm, and uses the RSA algorithm in PSS mode to encrypt and decrypt symmetric communication keys.
<figure>
<Image img={imgStep3} alt=""/>
@ -71,31 +71,31 @@ If you select Sign or SignAndEncrypt as the security mode, you must select a val
### 4. Choose Authentication Method
As shown in the image below, switch tabs to choose one of the following authentication methods:
1. Anonymous
2. Username
3. Certificate access: the certificate can be the same as the security communication certificate or a different one.
<figure>
<Image img={imgStep4} alt=""/>
</figure>
After configuring the connection properties and authentication method, click the **Connectivity Check** button to verify that the data source is available. If a security communication certificate or authentication certificate is used, it must be trusted by the OPC UA server; otherwise, the check fails.
### 5. Configure Data Points Set
For the **Data Points Set**, you can either upload a CSV configuration file or use **Select All Data Points**.
#### 5.1. Upload CSV Configuration File
You can download an empty CSV template, fill in the data point information according to the template, and then upload the CSV configuration file to configure data points; alternatively, you can download the data points that match the configured filtering conditions in the format specified by the CSV template.
The CSV file must follow these rules:
1. File Encoding
The uploaded CSV file must be encoded in one of the following formats:
(1) UTF-8 with BOM
@ -103,75 +103,75 @@ The uploaded CSV file must be encoded in one of the following formats:
2. Header Configuration Rules
The header is the first row of the CSV file. The rules are as follows:
(1) The following columns can be configured in the CSV header:
| No. | Column Name | Description | Required | Default Behavior |
| ---- | ----------------------- | ----------- | -------- | ---------------- |
| 1 | point_id | The ID of the data point on the OPC UA server | Yes | None |
| 2 | stable | The supertable in TDengine corresponding to the data point | Yes | None |
| 3 | tbname | The subtable in TDengine corresponding to the data point | Yes | None |
| 4 | enable | Whether to collect data for this point | No | The default value `1` is used as the enable value |
| 5 | value_col | The column name in TDengine that stores the collected value of the data point | No | The default value `val` is used as the value_col value |
| 6 | value_transform | The transformation function executed in taosX on the collected value | No | No transformation is applied |
| 7 | type | The data type of the collected value | No | The original type of the collected value is used as the data type in TDengine |
| 8 | quality_col | The column name in TDengine that stores the quality of the collected value | No | No quality column is added in TDengine |
| 9 | ts_col | The timestamp column in TDengine that stores the original timestamp of the data point | No | If both ts_col and received_ts_col are non-empty, the former is used as the timestamp column; if only one of them is non-empty, that column is used; if both are empty, the original timestamp of the data point is used as the timestamp column, with the default column name `ts`. |
| 10 | received_ts_col | The timestamp column in TDengine that stores the timestamp at which the data point value was received | No | Same as above |
| 11 | ts_transform | The transformation function executed in taosX on the data point's original timestamp | No | No transformation is applied to the original timestamp |
| 12 | received_ts_transform | The transformation function executed in taosX on the received timestamp of the data point | No | No transformation is applied to the received timestamp |
| 13 | tag::VARCHAR(200)::name | The tag column in TDengine corresponding to the data point. `tag` is a reserved keyword indicating a tag column; `VARCHAR(200)` is the tag type; `name` is the actual tag name. | No | If one or more tag columns are configured, the configured tag columns are used. If no tag columns are configured and the supertable already exists in TDengine, the supertable's tags are used; if the supertable does not exist, the following two tag columns are added by default: `tag::VARCHAR(256)::point_id` and `tag::VARCHAR(256)::point_name`. |
(2) The CSV header must not contain duplicate columns.
(3) Columns like `tag::VARCHAR(200)::name` can be configured multiple times, corresponding to multiple tags in TDengine, but tag names must not be duplicated.
(4) The order of columns in the CSV header does not affect the CSV file validation rules.
(5) Columns not listed in the table above, such as a serial number column, can also be included; they are automatically ignored.
3. Row Configuration Rules
Each row in the CSV file configures one OPC data point. The rules for rows are as follows:
(1) Correspondence with the columns in the header:
| No. | Header Column | Value Type | Value Range | Required | Default Value |
| ---- | ----------------------- | ---------- | ----------- | -------- | ------------- |
| 1 | point_id | String | A string like `ns=3;i=1005`, which must conform to the OPC UA ID specification, i.e., contain the ns and id parts | Yes | |
| 2 | enable | int | 0: do not collect data for this point, and delete the corresponding subtable in TDengine before the OPC DataIn task starts; 1: collect data for this point and do not delete the subtable | No | 1 |
| 3 | stable | String | Any string that conforms to the TDengine supertable naming convention; special characters such as `.` are replaced with underscores. If `{type}` is present: when the `type` column in the CSV file is non-empty, it is replaced with the value of `type`; when `type` is empty, it is replaced with the original type of the collected value | Yes | |
| 4 | tbname | String | Any string that conforms to the TDengine subtable naming convention; special characters such as `.` are replaced with underscores. For OPC UA: if `{ns}` is present, it is replaced with the ns part of point_id; if `{id}` is present, it is replaced with the id part of point_id | Yes | |
| 5 | value_col | String | A column name that conforms to the TDengine naming convention | No | val |
| 6 | value_transform | String | An expression supported by the Rhai engine, such as `(val + 10) / 1000 * 2.0` or `log(val) + 10` | No | None |
| 7 | type | String | Supported types: b/bool/i8/tinyint/i16/smallint/i32/int/i64/bigint/u8/tinyint unsigned/u16/smallint unsigned/u32/int unsigned/u64/bigint unsigned/f32/float/f64/double/timestamp/timestamp(ms)/timestamp(us)/timestamp(ns)/json | No | The original type of the collected value |
| 8 | quality_col | String | A column name that conforms to the TDengine naming convention | No | None |
| 9 | ts_col | String | A column name that conforms to the TDengine naming convention | No | ts |
| 10 | received_ts_col | String | A column name that conforms to the TDengine naming convention | No | rts |
| 11 | ts_transform | String | Supports the +, -, *, / and % operators, for example: `ts / 1000 * 1000` sets the last 3 digits of a millisecond timestamp to 0; `ts + 8 * 3600 * 1000` adds 8 hours to a millisecond-precision timestamp; `ts - 8 * 3600 * 1000` subtracts 8 hours from a millisecond-precision timestamp | No | None |
| 12 | received_ts_transform | String | Same as above | No | None |
| 13 | tag::VARCHAR(200)::name | String | The value of the tag; when the tag type is VARCHAR, it can contain Chinese characters | No | NULL |
(2) point_id must be unique within a DataIn task; that is, in an OPC DataIn task, a data point can be written to only one subtable in TDengine. If you need to write a data point to multiple subtables, create multiple OPC DataIn tasks.
(3) If point_id differs but tbname is the same, value_col must be different. This configuration allows data from multiple data points of different data types to be written to different columns of the same subtable, which corresponds to the "wide tables for OPC data ingestion into TDengine" use case (see the CSV sketch after these rules).
4. Other Rules
(1) If the number of columns in the header and in a row is inconsistent, validation fails and the user is prompted with the number of the offending row.
(2) The header must be on the first row and cannot be empty.
(3) At least one data point is required.
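To make the rules above concrete, here is a minimal illustrative CSV sketch; the point IDs, table names, transform expressions, and tag values are hypothetical and only follow the documented format. It maps two OPC UA data points to different columns of the same subtable, the wide-table scenario from rule (3) of the row configuration rules:

```csv
point_id,stable,tbname,enable,value_col,type,value_transform,ts_transform,tag::VARCHAR(200)::location
ns=3;i=1005,power_meter,meter_001,1,current,f32,val / 1000.0,ts / 1000 * 1000,workshop_a
ns=3;i=1006,power_meter,meter_001,1,voltage,f32,val / 10.0,ts / 1000 * 1000,workshop_a
```

Both rows share the same stable and tbname but declare different value_col names, so the two points are written to separate columns of the subtable `meter_001`; the ts_transform expression sets the last three digits of the millisecond timestamps to 0, as described in the row rules above.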
#### 5.2. Select Data Points
You can filter data points by configuring the **Root Node ID**, **Namespace**, **Regular Expression Match**, and other conditions.
Specify the supertable and subtable where the data will be written by configuring the **Supertable Name** and **Table Name**.
Configure the **Primary Key Column**: select `origin_ts` to use the original timestamp of the OPC data point as the primary key in TDengine, or select `received_ts` to use the reception timestamp as the primary key. You can also configure a **Primary Key Alias** to specify the name of the timestamp column in TDengine.
<figure>
<Image img={imgStep5} alt=""/>
@ -179,7 +179,7 @@ Configure the **Primary Key Column**: select `origin_ts` to use the original tim
### 6. Collection Configuration
In the collection configuration, configure the collection mode, collection interval, collection timeout, and other options for the current task.
<figure>
<Image img={imgStep6} alt=""/>
@ -187,19 +187,19 @@ In the collection configuration, configure the collection mode, collection inter
As shown in the image above:
- **Collection Mode**: `subscribe` or `observe` mode can be used.
  - `subscribe`: subscription mode; data is reported on change and written to TDengine.
  - `observe`: the latest value of the data point is polled at the **Collection Interval** and written to TDengine.
- **Collection Interval**: the default is 10 seconds. This is the interval, measured from the end of the previous collection, at which the latest value of the data point is polled and written to TDengine. Only configurable when the **Collection Mode** is `observe`.
- **Collection Timeout**: if the OPC server does not return data within this time when a data point is read, the read fails. The default is 10 seconds. Only configurable when the **Collection Mode** is `observe`.
When the **Data Points Set** is configured using **Select Data Points**, you can configure the **Data Point Update Mode** and **Data Point Update Interval** in the collection configuration to enable dynamic data point updates. Dynamic data point update means that, while the task is running, if the OPC Server adds or deletes data points, matching data points are automatically added to the current task without restarting the OPC task.
- Data Point Update Mode: `None`, `Append`, or `Update`.
  - None: dynamic data point updates are not enabled.
  - Append: dynamic data point updates are enabled, but data points are only added.
  - Update: dynamic data point updates are enabled; data points can be added or removed.
- Data Point Update Interval: effective when the **Data Point Update Mode** is `Append` or `Update`. Unit: seconds; default 600, minimum 60, maximum 2147483647.
### 7. Advanced Options
@ -207,35 +207,43 @@ When **Data Points Set** is configured using the **Select Data Points** method,
<Image img={imgStep7} alt=""/>
</figure>
As shown in the image above, you can configure advanced options to further tune performance, logging, and more.
**Log Level** defaults to `info`; the available options are `error`, `warn`, `info`, `debug`, and `trace`.
**Max Write Concurrency** sets the maximum concurrency limit for writing to taosX. The default value is 0, which means auto: the concurrency is configured automatically.
**Batch Size** sets the batch size for each write, that is, the maximum number of messages sent at once.
**Batch Delay** sets the maximum delay for a single send, in seconds. When the timeout expires, any pending data is sent immediately, even if the **Batch Size** is not reached.
**Save Raw Data** determines whether raw data is saved. The default is No.
When raw data is saved, the following two parameters take effect.
**Max Retention Days** sets the maximum number of days raw data is retained.
**Raw Data Storage Directory** sets the path for storing raw data. If an agent is used, the storage path refers to a path on the server where the agent runs; otherwise, it is a path on the taosX server. The path can use the `$DATA_DIR` placeholder and `:id` as part of the path.
- On Linux platforms, `$DATA_DIR` is `/var/lib/taos/taosx`. By default, the storage path is `/var/lib/taos/taosx/tasks/<task_id>/rawdata`.
- On Windows platforms, `$DATA_DIR` is `C:\TDengine\data\taosx`. By default, the storage path is `C:\TDengine\data\taosx\tasks\<task_id>\rawdata`.
### 8. Task Completion
Click the **Submit** button to complete the creation of the OPC UA to TDengine data synchronization task. Return to the **Data Sources List** page to view the task's execution status.
## Adding Data Points
While the task is running, click **Edit**, then click the **Add Data Points** button to append data points to the CSV file.
<figure>
<Image img={imgStep8} alt=""/>
</figure>
In the pop-up form, fill in the data point information.
<figure>
<Image img={imgStep9} alt=""/>
</figure>
Click the **Confirm** button to complete the addition of the data points.

View File

@ -13,21 +13,21 @@ import imgStep6 from '../../assets/opc-da-06.png';
import imgStep7 from '../../assets/opc-da-07.png';
import imgStep8 from '../../assets/opc-da-08.png';
This section explains how to create a data migration task through the Explorer interface to synchronize data from an OPC-DA server to the current TDengine cluster.
## Overview
OPC is one of the interoperability standards for securely and reliably exchanging data in industrial automation and other industries.
OPC DA (Data Access) is a classic COM-based specification that runs only on Windows. Although OPC DA is not the most modern or efficient data communication specification, it is widely used, mainly because some older devices only support OPC DA.
TDengine can efficiently read data from OPC-DA servers and write it to TDengine, enabling real-time data ingestion.
## Creating a Task
### 1. Add a Data Source
On the Data Ingestion page, click the **+Add Data Source** button to go to the Add Data Source page.
<figure>
<Image img={imgStep1} alt=""/>
@ -35,13 +35,13 @@ On the Data Ingestion page, click the **+Add Data Source** button to go to the A
### 2. Configure Basic Information
Enter a task name in the **Name** field; for example, a task monitoring environmental temperature and humidity could be named **environment-monitoring**.
Select **OPC-DA** from the **Type** dropdown list.
If the taosX service runs on the same server as the OPC-DA server, an agent is not required; otherwise, configure an agent: select an agent from the dropdown list, or click the **+Create New Agent** button on the right to create one and follow the prompts to configure it.
Select a target database from the **Target Database** dropdown list, or click the **+Create Database** button on the right to create a new database.
<figure>
<Image img={imgStep2} alt=""/>
@ -49,7 +49,7 @@ Select a target database from the **Target Database** dropdown list, or click th
### 3. Configure Connection Information
In the **Connection Configuration** section, fill in the **OPC-DA Server Address**, for example `127.0.0.1/Matrikon.OPC.Simulation.1`, and configure the authentication method.
Click the **Connectivity Check** button to check if the data source is available.
@ -57,19 +57,19 @@ Click the **Connectivity Check** button to check if the data source is available
<Image img={imgStep3} alt=""/>
</figure>
### 4. Configure Data Points Set
For the **Data Points Set**, you can either upload a CSV configuration file or use **Select All Data Points**.
#### 4.1. Upload CSV Configuration File
You can download an empty CSV template, fill in the data point information according to the template, and then upload the CSV configuration file to configure data points; alternatively, you can download the data points that match the configured filtering conditions in the format specified by the CSV template.
The CSV file must follow these rules:
1. File Encoding
The uploaded CSV file must be encoded in one of the following formats:
(1) UTF-8 with BOM
@ -77,75 +77,75 @@ The uploaded CSV file must be encoded in one of the following formats:
2. Header Configuration Rules
The header is the first row of the CSV file. The rules are as follows:
(1) The following columns can be configured in the CSV header:
| No. | Column Name | Description | Required | Default Behavior |
| ---- | ----------------------- | ----------- | -------- | ---------------- |
| 1 | tag_name | The ID of the data point on the OPC DA server | Yes | None |
| 2 | stable | The supertable in TDengine corresponding to the data point | Yes | None |
| 3 | tbname | The subtable in TDengine corresponding to the data point | Yes | None |
| 4 | enable | Whether to collect data for this point | No | The default value `1` is used as the enable value |
| 5 | value_col | The column name in TDengine that stores the collected value of the data point | No | The default value `val` is used as the value_col value |
| 6 | value_transform | The transformation function executed in taosX on the collected value | No | No transformation is applied |
| 7 | type | The data type of the collected value | No | The original type of the collected value is used as the data type in TDengine |
| 8 | quality_col | The column name in TDengine that stores the quality of the collected value | No | No quality column is added in TDengine |
| 9 | ts_col | The timestamp column in TDengine that stores the original timestamp of the data point | No | If both ts_col and received_ts_col are non-empty, the former is used as the timestamp column; if only one of them is non-empty, that column is used; if both are empty, the original timestamp of the data point is used as the timestamp column, with the default column name `ts`. |
| 10 | received_ts_col | The timestamp column in TDengine that stores the timestamp at which the data point value was received | No | Same as above |
| 11 | ts_transform | The transformation function executed in taosX on the data point's original timestamp | No | No transformation is applied to the original timestamp |
| 12 | received_ts_transform | The transformation function executed in taosX on the received timestamp of the data point | No | No transformation is applied to the received timestamp |
| 13 | tag::VARCHAR(200)::name | The tag column in TDengine corresponding to the data point. `tag` is a reserved keyword indicating a tag column; `VARCHAR(200)` is the tag type, which can also be another legal type; `name` is the actual tag name. | No | If one or more tag columns are configured, the configured tag columns are used. If no tag columns are configured and the supertable already exists in TDengine, the supertable's tags are used; if the supertable does not exist, the following two tag columns are added by default: `tag::VARCHAR(256)::point_id` and `tag::VARCHAR(256)::point_name`. |
(2) The CSV header must not contain duplicate columns.
(3) Columns like `tag::VARCHAR(200)::name` can be configured multiple times, corresponding to multiple tags in TDengine, but tag names must not be duplicated.
(4) The order of columns in the CSV header does not affect the CSV file validation rules.
(5) Columns not listed in the table above, such as a serial number column, can also be included; they are automatically ignored.
3. Row Configuration Rules
Each row in the CSV file configures one OPC data point. The rules for rows are as follows:
(1) Correspondence with the columns in the header:
| No. | Header Column | Value Type | Value Range | Required | Default Value |
| ---- | ----------------------- | ---------- | ----------- | -------- | ------------- |
| 1 | tag_name | String | A string like `root.parent.temperature`, which must conform to the OPC DA ID specification | Yes | |
| 2 | enable | int | 0: do not collect data for this point, and delete the corresponding subtable in TDengine before the OPC DataIn task starts; 1: collect data for this point and do not delete the subtable | No | 1 |
| 3 | stable | String | Any string that conforms to the TDengine supertable naming convention; special characters such as `.` are replaced with underscores. If `{type}` is present: when the `type` column in the CSV file is non-empty, it is replaced with the value of `type`; when `type` is empty, it is replaced with the original type of the collected value | Yes | |
| 4 | tbname | String | Any string that conforms to the TDengine subtable naming convention; special characters such as `.` are replaced with underscores. If `{tag_name}` is present, it is replaced with the tag_name | Yes | |
| 5 | value_col | String | A column name that conforms to the TDengine naming convention | No | val |
| 6 | value_transform | String | An expression supported by the Rhai engine, such as `(val + 10) / 1000 * 2.0` or `log(val) + 10` | No | None |
| 7 | type | String | Supported types: b/bool/i8/tinyint/i16/smallint/i32/int/i64/bigint/u8/tinyint unsigned/u16/smallint unsigned/u32/int unsigned/u64/bigint unsigned/f32/float/f64/double/timestamp/timestamp(ms)/timestamp(us)/timestamp(ns)/json | No | The original type of the collected value |
| 8 | quality_col | String | A column name that conforms to the TDengine naming convention | No | None |
| 9 | ts_col | String | A column name that conforms to the TDengine naming convention | No | ts |
| 10 | received_ts_col | String | A column name that conforms to the TDengine naming convention | No | rts |
| 11 | ts_transform | String | Supports the +, -, *, / and % operators, for example: `ts / 1000 * 1000` sets the last 3 digits of a millisecond timestamp to 0; `ts + 8 * 3600 * 1000` adds 8 hours to a millisecond-precision timestamp; `ts - 8 * 3600 * 1000` subtracts 8 hours from a millisecond-precision timestamp | No | None |
| 12 | received_ts_transform | String | Same as above | No | None |
| 13 | tag::VARCHAR(200)::name | String | The value of the tag; when the tag type is VARCHAR, it can contain Chinese characters | No | NULL |
(2) tag_name must be unique within a DataIn task; that is, in an OPC DataIn task, a data point can be written to only one subtable in TDengine. If you need to write a data point to multiple subtables, create multiple OPC DataIn tasks.
(3) If tag_name differs but tbname is the same, value_col must be different. This configuration allows data from multiple data points of different data types to be written to different columns of the same subtable, which corresponds to the "wide tables for OPC data ingestion into TDengine" use case.
4. Other Rules
(1) If the number of columns in the header and in a row is inconsistent, validation fails and the user is prompted with the number of the offending row.
(2) The header must be on the first row and cannot be empty.
(3) At least one data point is required.
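As a hedged illustration of the rules above (the tag names, table names, and tag values are hypothetical), the following minimal CSV sketch configures two OPC DA data points and uses the `{tag_name}` placeholder to derive the subtable names:

```csv
tag_name,stable,tbname,enable,value_col,type,ts_col,tag::VARCHAR(200)::device
root.parent.temperature,environment,env_{tag_name},1,val,f32,ts,sensor_01
root.parent.humidity,environment,env_{tag_name},1,val,f32,ts,sensor_02
```

Here `{tag_name}` expands to each point's tag_name and, per the naming rule above, the `.` characters should be replaced with underscores, so the points would land in subtables such as `env_root_parent_temperature` and `env_root_parent_humidity` under the supertable `environment`.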
#### 4.2. Select Data Points
You can filter data points by configuring the **Root Node ID** and **Regular Expression Match** as filtering conditions.
Specify the supertable and subtable where the data will be written by configuring the **Supertable Name** and **Table Name**.
Configure the **Primary Key Column**: select `origin_ts` to use the original timestamp of the OPC data point as the primary key in TDengine, or select `received_ts` to use the reception timestamp as the primary key. You can also configure a **Primary Key Alias** to specify the name of the timestamp column in TDengine.
<figure>
<Image img={imgStep4} alt=""/>
@ -153,25 +153,25 @@ Configure the **Primary Key Column**: select origin_ts to use the original times
### 5. Collection Configuration
In the collection configuration, configure the collection interval, connection timeout, and collection timeout for the current task.
<figure>
<Image img={imgStep5} alt=""/>
</figure>
As shown in the image above:
- **Connection Timeout**: the timeout for connecting to the OPC server. The default is 10 seconds.
- **Collection Timeout**: if the OPC server does not return data within this time when a data point is read, the read fails. The default is 10 seconds.
- **Collection Interval**: the default is 10 seconds. This is the interval, measured from the end of the previous collection, at which the latest value of the data point is polled and written to TDengine.
When the **Data Points Set** is configured using **Select Data Points**, you can configure the **Data Point Update Mode** and **Data Point Update Interval** in the collection configuration to enable dynamic data point updates. Dynamic data point update means that, while the task is running, if the OPC Server adds or deletes data points, matching data points are automatically added to the current task without restarting the OPC task.
- Data Point Update Mode: `None`, `Append`, or `Update`.
  - None: dynamic data point updates are not enabled.
  - Append: dynamic data point updates are enabled, but data points are only added.
  - Update: dynamic data point updates are enabled; data points can be added or removed.
- Data Point Update Interval: effective when the **Data Point Update Mode** is `Append` or `Update`. Unit: seconds; default 600, minimum 60, maximum 2147483647.
### 6. Advanced Options
@ -179,35 +179,43 @@ When **Data Points Set** is configured using the **Select Data Points** method,
<Image img={imgStep6} alt=""/>
</figure>
As shown in the image above, advanced options can be configured to further optimize performance, logging, and more.

**Log Level** defaults to `info`. The available options are `error`, `warn`, `info`, `debug`, and `trace`.

**Max Write Concurrency** sets the maximum concurrency limit for writing to taosX. The default value is 0, which means auto, and concurrency is configured automatically.

**Batch Size** sets the batch size for each write, that is, the maximum number of messages sent at once.

**Batch Delay** sets the maximum delay for a single send (in seconds). When the timeout expires, as long as there is data, it is sent immediately even if the **Batch Size** is not met.
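The interplay between **Batch Size** and **Batch Delay** can be pictured with a small sketch. This is illustrative Python only, not the taosX implementation, and all names are made up:

```python
import queue
import time

def flush(batch):
    # Placeholder for the actual write to taosX/TDengine.
    print(f"sending {len(batch)} message(s)")

def batching_loop(q, batch_size=500, batch_delay=1.0):
    """Send when the batch is full, or when batch_delay has elapsed and
    at least one message is buffered, whichever happens first."""
    batch = []
    deadline = time.monotonic() + batch_delay
    while True:
        try:
            batch.append(q.get(timeout=max(0.0, deadline - time.monotonic())))
        except queue.Empty:
            pass
        if len(batch) >= batch_size or (batch and time.monotonic() >= deadline):
            flush(batch)
            batch = []
            deadline = time.monotonic() + batch_delay
        elif time.monotonic() >= deadline:
            # Timer expired with nothing buffered: just restart the timer.
            deadline = time.monotonic() + batch_delay
```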
In **Save Raw Data**, choose whether to save the raw data. The default is No.

When saving raw data, the following two parameters take effect.

**Max Retention Days** sets the maximum number of days to retain the raw data.

**Raw Data Storage Directory** sets the path for storing raw data. If an agent is used, the storage path refers to the path on the server where the agent is located; otherwise it is the path on the taosX server. The path can use the `$DATA_DIR` placeholder and `:id` as part of the path.

- On Linux platforms, `$DATA_DIR` is `/var/lib/taos/taosx`. By default, the storage path is `/var/lib/taos/taosx/tasks/<task_id>/rawdata`.
- On Windows platforms, `$DATA_DIR` is `C:\TDengine\data\taosx`. By default, the storage path is `C:\TDengine\data\taosx\tasks\<task_id>\rawdata`.
### 7. Task Completion

Click the **Submit** button to complete the creation of the OPC DA to TDengine data synchronization task. Return to the **Data Source List** page to view the task's execution status.

## Adding Data Points

While the task is running, click **Edit**, then click the **Add Data Points** button to append data points to the CSV file.
<figure>
<Image img={imgStep7} alt=""/>
</figure>
In the pop-up form, fill in the data point information.
<figure>
<Image img={imgStep8} alt=""/>
</figure>
Click the **Confirm** button to complete the addition of data points.

View File

@ -19,19 +19,19 @@ import imgStep12 from '../../assets/mqtt-12.png';
import imgStep13 from '../../assets/mqtt-13.png';
import imgStep14 from '../../assets/mqtt-14.png';
This section explains how to create a data migration task through the Explorer interface to migrate data from MQTT to the current TDengine cluster.
## Overview
MQTT stands for Message Queuing Telemetry Transport. It is a lightweight messaging protocol that is easy to implement and use.

TDengine can use the MQTT connector to subscribe to data from an MQTT broker and write it to TDengine, enabling real-time data ingestion.
## Creating a Task
### 1. Add a Data Source

On the Data Ingestion page, click the **+Add Data Source** button to go to the Add Data Source page.
<figure>
<Image img={imgStep01} alt=""/>
@ -39,11 +39,11 @@ On the Data Ingestion page, click the **+Add Data Source** button to go to the A
### 2. Configure Basic Information
Enter the task name in the **Name** field, for example "test_mqtt".
Select **MQTT** from the **Type** dropdown list.
The **Agent** field is optional. If needed, select an agent from the dropdown list, or click the **+Create New Agent** button on the right.
Select a target database from the **Target Database** dropdown list, or click the **+Create Database** button on the right.
@ -53,21 +53,21 @@ Select a target database from the **Target Database** dropdown list, or click th
### 3. Configure Connection and Authentication Information
In the **MQTT Address** field, enter the address of the MQTT broker, for example: `192.168.1.42`.

In the **MQTT Port** field, enter the port of the MQTT broker, for example: `1883`.

In the **User** field, enter the username for the MQTT broker.

In the **Password** field, enter the password for the MQTT broker.
<figure>
<Image img={imgStep03} alt=""/>
</figure>
### 4. Configure SSL Certificate

If the MQTT broker uses an SSL certificate, upload the certificate file in the **SSL Certificate** field.
<figure>
<Image img={imgStep04} alt=""/>
@ -75,19 +75,20 @@ If the MQTT broker uses SSL certificates, upload the certificate file in the **S
### 5. Configure Collection Information
In the **Collection Configuration** section, enter the relevant parameters for the collection task.

Select the MQTT protocol version from the **MQTT Protocol** dropdown list. There are three options: `3.1`, `3.1.1`, and `5.0`. The default is 3.1.
In the **Client ID** field, enter the client identifier. A client ID with the `taosx` prefix is generated from it (for example, if you enter `foo`, the generated client ID is `taosxfoo`). If the switch at the end is enabled, the current task's ID is inserted between `taosx` and the entered identifier (the generated client ID then looks like `taosx100foo`). All client IDs connecting to the same MQTT address must be unique.
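The naming rule can be summed up in a short sketch (hypothetical Python that mirrors the rule described above, not taosX internals):

```python
def make_client_id(identifier: str, task_id: int, include_task_id: bool) -> str:
    # Always prefixed with "taosx"; the task ID goes between the prefix
    # and the identifier when the switch is enabled.
    return f"taosx{task_id if include_task_id else ''}{identifier}"

print(make_client_id("foo", 100, False))  # taosxfoo
print(make_client_id("foo", 100, True))   # taosx100foo
```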
In the **Keep Alive** field, enter the keep-alive interval. The keep-alive interval is the time negotiated between the client and the broker for detecting whether the client is active; if the broker does not receive any message from the client within this interval, it assumes the client has disconnected and closes the connection.

In the **Clean Session** field, choose whether to clean the session. The default value is true.

In the **Subscription Topics and QoS Configuration** field, enter the Topic names to consume, using the following format: `topic1::0,topic2::1`.

Click the **Connectivity Check** button to check whether the data source is available.
<figure>
<Image img={imgStep05} alt=""/>
@ -95,40 +96,40 @@ Click the **Connectivity Check** button to test if the data source is available.
### 6. Configure MQTT Payload Parsing
In the **MQTT Payload Parsing** section, enter the configuration parameters related to parsing the Payload.

taosX can use a JSON extractor to parse the data and allows users to specify the data model in the database, including specifying table names and supertable names, and setting regular columns and tag columns.
#### 6.1 Parsing
There are three ways to obtain sample data:
Click the **Retrieve from Server** button to get sample data from MQTT.
Click the **File Upload** button to upload a CSV file and obtain sample data.

Enter sample data from the MQTT message body in the **Message Body** field.

JSON data supports `JSONObject` or `JSONArray`. The JSON parser can parse the following data:
```json
{"id": 1, "message": "hello-world"}
{"id": 2, "message": "hello-world"}
```
or
```json
[{"id": 1, "message": "hello-world"},{"id": 2, "message": "hello-world"}]
```
The parsing result is shown below:
<figure>
<Image img={imgStep06} alt=""/>
</figure>
Click the **magnifying glass icon** to preview the parsing result.
<figure>
<Image img={imgStep07} alt=""/>
@ -136,7 +137,7 @@ Click the **Magnifier Icon** to preview the parsing result.
#### 6.2 Field Splitting
In the **Extract or Split from Column** section, enter the fields to extract or split from the message body. For example, to split the `message` field into `message_0` and `message_1`, select the `split` extractor, enter `-` as the separator, and `2` as the number.
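As a rough illustration of what this `split` rule does to each message (plain Python, not the taosX extractor itself):

```python
def split_field(value: str, separator: str, number: int, name: str = "message") -> dict:
    # Split into at most `number` parts and name them <name>_0 .. <name>_{number-1}.
    parts = value.split(separator, number - 1)
    return {f"{name}_{i}": part for i, part in enumerate(parts)}

print(split_field("hello-world", "-", 2))
# {'message_0': 'hello', 'message_1': 'world'}
```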
<figure>
<Image img={imgStep08} alt=""/>
@ -144,9 +145,9 @@ In the **Extract or Split from Column** section, enter the fields to extract or
Click **Delete** to remove the current extraction rule.
Click **Add** to add more extraction rules.

Click the **magnifying glass icon** to preview the extraction/split results.
<figure>
<Image img={imgStep09} alt=""/>
@ -154,15 +155,15 @@ Click the **Magnifier Icon** to preview the extraction/split results.
#### 6.3 Data Filtering
In the **Filter** section, enter filtering conditions. For example, if you enter `id != 1`, only data where `id` is not equal to 1 will be written to TDengine.
<figure>
<Image img={imgStep10} alt=""/>
</figure>
Click **Delete** to remove the current filtering rule.

Click the **magnifying glass icon** to preview the filtering results.
<figure>
<Image img={imgStep11} alt=""/>
@ -170,9 +171,9 @@ Click the **Magnifier Icon** to preview the filtering results.
#### 6.4 Table Mapping
In the **Target Supertable** dropdown list, select a target supertable, or click the **Create Supertable** button on the right to create a new one.

In the **Mapping** section, enter the subtable name in the target supertable, for example `t_{id}`. Fill in the mapping rules as needed; mapping supports setting default values.
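Conceptually, the mapping rule is a template evaluated against each parsed row; a minimal sketch with made-up rows (illustrative Python only):

```python
rows = [{"id": 1, "message": "hello-world"},
        {"id": 2, "message": "hello-world"}]

# The rule t_{id} substitutes each row's `id` value into the subtable name.
for row in rows:
    print("t_{id}".format(**row))
# t_1
# t_2
```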
<figure>
<Image img={imgStep12} alt=""/>
@ -186,13 +187,13 @@ Click **Preview** to view the mapping results.
### 7. Advanced Options
In the **Log Level** dropdown list, select the log level. There are five options: `TRACE`, `DEBUG`, `INFO`, `WARN`, and `ERROR`. The default is `INFO`.

When saving raw data, the following two parameters take effect.

**Max Retention Days**: Set the maximum number of days to retain raw data.

**Raw Data Storage Directory**: Set the path for storing raw data. If an agent is used, the path refers to the server where the agent is located; otherwise it is the server where taosX runs. The path can use the `$DATA_DIR` placeholder and `:id` as part of the path.
<figure>
<Image img={imgStep14} alt=""/>
@ -200,4 +201,4 @@ When saving raw data, the following two parameters are enabled:
### 8. Completion
Click the **Submit** button to complete the creation of the MQTT to TDengine data synchronization task. Return to the **Data Source List** page to view the task's execution status.

View File

@ -24,19 +24,19 @@ import imgStep16 from '../../assets/kafka-16.png';
import imgStep17 from '../../assets/kafka-17.png';
import imgStep18 from '../../assets/kafka-18.png';
This section explains how to create a data migration task through the Explorer interface to migrate data from Kafka to the current TDengine cluster.

## Overview

Apache Kafka is an open-source distributed streaming platform used for stream processing, real-time data pipelines, and large-scale data integration.

TDengine can efficiently read data from Kafka and write it into TDengine, enabling historical data migration or real-time data ingestion.
## Creating a Task
### 1. Add a Data Source

On the Data Ingestion page, click the **+Add Data Source** button to go to the Add Data Source page.
<figure>
<Image img={imgStep01} alt=""/>
@ -44,11 +44,11 @@ On the Data Ingestion page, click the **+Add Data Source** button to go to the A
### 2. Configure Basic Information
Enter the task name in the **Name** field, such as "test_kafka".
Enter the task name in **Name**, such as: "test_kafka";
Select **Kafka** from the **Type** dropdown list.
The **Agent** field is optional. If needed, select an agent from the dropdown list, or click the **+Create New Agent** button on the right.
Select a target database from the **Target Database** dropdown list, or click the **+Create Database** button on the right.
@ -58,11 +58,11 @@ Select a target database from the **Target Database** dropdown list, or click th
### 3. Configure Connection Information
Enter the **bootstrap-server**, for example: `192.168.1.92`.

Enter the **Port**, for example: `9092`.

If there are multiple broker addresses, click the **Add Broker** button at the bottom right of the connection configuration section to add more bootstrap-server and port pairs.
<figure>
<Image img={imgStep03} alt=""/>
@ -70,7 +70,7 @@ For multiple broker addresses, add more pairs of bootstrap-server and ports by c
### 4. Configure SASL Authentication Mechanism
If the server has SASL authentication enabled, enable SASL here and configure the relevant settings. Three authentication mechanisms are currently supported: PLAIN, SCRAM-SHA-256, and GSSAPI. Choose the one that matches your setup.
#### 4.1. PLAIN Authentication
@ -90,27 +90,24 @@ Select the `SCRAM-SHA-256` authentication mechanism and enter the username and p
#### 4.3. GSSAPI Authentication
Select `GSSAPI`, which uses the [RDkafka client](https://github.com/confluentinc/librdkafka) to invoke GSSAPI for Kerberos authentication:
<figure>
<Image img={imgStep06} alt=""/>
</figure>
You will need to provide:

- The Kerberos service name, typically `kafka`.
- The Kerberos principal (the authentication username), such as `kafkaclient`.
- The Kerberos initialization command (optional).
- The Kerberos keytab, provided as a file that you must upload.

All of the above information must be provided by the Kafka administrator.

In addition, [Kerberos](https://web.mit.edu/kerberos/) client support must be configured on the server. Install it using the following commands:

- On Ubuntu: `apt install krb5-user`
- On CentOS: `yum install krb5-workstation`
After configuration, you can use the [kcat](https://github.com/edenhill/kcat) tool to verify Kafka topic consumption:
```bash
kcat <topic> \
@ -123,11 +120,11 @@ kcat <topic> \
-X sasl.kerberos.service.name=kafka
```
If you get the error "Server xxxx not found in kerberos database", make sure the domain name of the Kafka node resolves correctly and set `rdns = true` in the Kerberos client configuration file (`/etc/krb5.conf`).
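For reference, `rdns` belongs in the `[libdefaults]` section of `/etc/krb5.conf`; a minimal excerpt might look like the following (the realm is a placeholder and your other settings will differ):

```ini
[libdefaults]
    default_realm = EXAMPLE.COM
    rdns = true
```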
### 5. Configure SSL Certificate
If SSL encryption authentication is enabled on the server, enable SSL here and configure the relevant settings.
<figure>
<Image img={imgStep07} alt=""/>
@ -135,24 +132,24 @@ If SSL encryption authentication is enabled on the server, enable SSL here and c
### 6. Configure Collection Information
In the **Collection Configuration** section, fill in the relevant parameters for the collection task.

Enter the **Timeout**. If Kafka delivers no data before the timeout expires, the data collection task exits. The default is 0 ms; when set to 0, the task waits indefinitely until data becomes available or an error occurs.

Enter the **Topic** name to consume. Multiple topics can be configured, separated by commas, for example `tp1,tp2`.

In the **Client ID** field, enter the client identifier. A client ID with the `taosx` prefix is generated from it (for example, if you enter `foo`, the generated client ID is `taosxfoo`). If the switch at the end is enabled, the current task's ID is inserted between `taosx` and the entered identifier (the generated client ID then looks like `taosx100foo`). Note that when multiple taosX instances subscribe to the same Topic for load balancing, you must enter a consistent Client ID to achieve the balancing effect.

In the **Consumer Group ID** field, enter the consumer group identifier. A consumer group ID with the `taosx` prefix is generated from it (for example, if you enter `foo`, the generated consumer group ID is `taosxfoo`). If the switch at the end is enabled, the current task's ID is inserted between `taosx` and the entered identifier (the generated consumer group ID then looks like `taosx100foo`).

From the **Offset** dropdown list, select the offset from which to start consuming data. There are three options: `Earliest`, `Latest`, and `ByTime(ms)`. The default is `Earliest`.

- Earliest: requests the earliest offset.
- Latest: requests the latest offset.

In **Maximum Duration to Fetch Data**, set the maximum time (in milliseconds) to wait for data when the available data is insufficient. The default is 100 ms.

Click the **Connectivity Check** button to check whether the data source is available.
<figure>
<Image img={imgStep08} alt=""/>
@ -160,38 +157,38 @@ Click the **Check Connectivity** button to check if the data source is available
### 7. Configure Payload Parsing
In the **Payload Parsing** section, fill in the relevant parameters for payload parsing.
#### 7.1 Parsing
There are three ways to obtain sample data:

Click the **Retrieve from Server** button to get sample data from Kafka.

Click the **File Upload** button to upload a CSV file and obtain sample data.

Enter sample data from the Kafka message body in the **Message Body** field.

JSON data supports `JSONObject` or `JSONArray`. The JSON parser can parse the following data:
```json
{"id": 1, "message": "hello-world"}
{"id": 2, "message": "hello-world"}
```
or
```json
[{"id": 1, "message": "hello-world"},{"id": 2, "message": "hello-world"}]
```
The parsed result is as follows:
<figure>
<Image img={imgStep09} alt=""/>
</figure>
Click the **magnifying glass icon** to preview the parsed result.
<figure>
<Image img={imgStep10} alt=""/>
@ -199,17 +196,17 @@ Click the **Magnifying Glass Icon** to preview the parsed result.
#### 7.2 Field Splitting
In the **Extract or Split from Column** field, enter the fields to be extracted or split from the message body. For example, to split the `message` field into `message_0` and `message_1`, select the `split` extractor, enter `-` as the separator, and `2` as the number.
Click **Add** to add more extraction rules.
Click **Delete** to remove the current extraction rule.
<figure>
<Image img={imgStep11} alt=""/>
</figure>
Click the **magnifying glass icon** to preview the extraction/split results.
<figure>
<Image img={imgStep12} alt=""/>
@ -217,17 +214,17 @@ Click the **Magnifying Glass Icon** to preview the extracted/split result.
#### 7.3 Data Filtering
In the **Filter** section, enter filtering conditions. For example, if you enter `id != 1`, only data where `id` is not equal to 1 will be written to TDengine.
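The effect is equivalent to keeping only the rows for which the expression is true; roughly, on the sample data above (illustrative Python, not the taosX filter engine):

```python
rows = [{"id": 1, "message": "hello-world"},
        {"id": 2, "message": "hello-world"}]

# Keep only rows matching `id != 1`; only these are written to TDengine.
kept = [row for row in rows if row["id"] != 1]
print(kept)  # [{'id': 2, 'message': 'hello-world'}]
```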
Click **Add** to add more filtering rules.
Click **Delete** to remove the current filtering rule.
<figure>
<Image img={imgStep13} alt=""/>
</figure>
Click the **magnifying glass icon** to preview the filtering results.
<figure>
<Image img={imgStep14} alt=""/>
@ -235,15 +232,15 @@ Click the **Magnifying Glass Icon** to preview the filtered result.
#### 7.4 Table Mapping
From the **Target Supertable** dropdown list, select a target supertable, or click the **Create Supertable** button on the right.

In the **Mapping** field, enter the subtable name in the target supertable, for example `t_{id}`. Fill in the mapping rules as needed; mapping supports setting default values.
<figure>
<Image img={imgStep15} alt=""/>
</figure>
Click **Preview** to view the mapping results.
<figure>
<Image img={imgStep16} alt=""/>
@ -251,7 +248,7 @@ Click **Preview** to view the mapping result.
### 8. Configure Advanced Options
The **Advanced Options** section is collapsed by default. Click the `>` on the right to expand it, as shown below:
<figure>
<Image img={imgStep17} alt=""/>
@ -261,6 +258,6 @@ The **Advanced Options** section is collapsed by default. Click the right `>` to
<Image img={imgStep18} alt=""/>
</figure>
### 9. Completion

Click the **Submit** button to complete the creation of the Kafka to TDengine data synchronization task. Return to the **Data Source List** page to view the task's execution status.

View File

@ -15,19 +15,19 @@ import imgStep08 from '../../assets/influxdb-08.png';
import imgStep09 from '../../assets/influxdb-09.png';
import imgStep10 from '../../assets/influxdb-10.png';
This section explains how to create a data migration task through the Explorer interface to migrate data from InfluxDB to the current TDengine cluster.

## Overview

InfluxDB is a popular open-source time-series database optimized for handling large volumes of time-series data. TDengine can efficiently read data from InfluxDB through the InfluxDB connector and write it into TDengine, enabling historical data migration or real-time data synchronization.

During task execution, progress information is saved to disk, so if the task is paused and restarted, or recovers automatically from an error, it does not start over from the beginning. For more options, it is recommended to read the description of each form field on the task creation page.
## Creating a Task
### 1. Add a Data Source
Click the **+Add Data Source** button in the top left corner of the data writing page to enter the Add Data Source page, as shown below:
<figure>
<Image img={imgStep01} alt=""/>
@ -35,13 +35,13 @@ Click the **+Add Data Source** button in the top left of the data writing page t
### 2. Configure Basic Information
In the **Name** field, enter a task name, such as *`test_influxdb_01`*.

Select *`InfluxDB`* from the **Type** dropdown list, as shown below (the fields on the page will change after selection).

The **Agent** field is optional. If needed, select an agent from the dropdown list, or click the **+Create New Agent** button on the right.

The **Target Database** is required. Since InfluxDB can store data with time precision of seconds, milliseconds, microseconds, or nanoseconds, you need to select a *`nanosecond-precision database`* here. You can also click the **+Create Database** button on the right to create a new database.
<figure>
<Image img={imgStep02} alt=""/>
@ -49,7 +49,7 @@ The **Target Database** is required. Since InfluxDB stores data in various time
### 3. Configure Connection Information
In the **Connection Configuration** area, fill in the *`connection information of the source InfluxDB database`*, as shown below:
<figure>
<Image img={imgStep03} alt=""/>
@ -57,54 +57,60 @@ In the **Connection Configuration** area, fill in the *`connection information o
### 4. Configure Authentication Information
In the **Authentication** section, there are two tabs, *`1.x version`* and *`2.x version`*, because different versions of InfluxDB require different authentication parameters and their APIs differ significantly. Select the tab that matches your situation:
*`1.x version`*
**Version**: Select the version of the source InfluxDB database from the dropdown list.

**User**: Enter the user of the source InfluxDB database; the user must have read permissions in the organization.

**Password**: Enter the password for the above user in the source InfluxDB database.
<figure>
<Image img={imgStep04} alt=""/>
</figure>
*`2.x version`*
**Version**: Select the version of the source InfluxDB database from the dropdown list.

**Organization ID**: Enter the organization ID of the source InfluxDB database, which is a string of hexadecimal characters (not the organization name). It can be obtained from the Organization -> About page of the InfluxDB console.

**Token**: Enter the access token for the source InfluxDB database; the token must have read permissions in the organization.

**Add Database Retention Policy**: This is a *`Yes/No`* toggle. InfluxQL requires a combination of database and retention policy (DBRP) to query data. The InfluxDB Cloud version and some 2.x versions require this mapping to be added manually. Turning on this switch lets the connector add the mapping automatically during task execution (a manual CLI alternative is sketched after the figure below).
<figure>
<Image img={imgStep05} alt=""/>
</figure>
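If you prefer to create the DBRP mapping yourself instead of enabling the switch, the InfluxDB 2.x CLI provides a `dbrp` subcommand. The sketch below is only indicative: the exact flags can vary between CLI versions, and all values are placeholders:

```bash
influx v1 dbrp create \
  --org <your-org> \
  --token <your-token> \
  --bucket-id <your-bucket-id> \
  --db <database-name> \
  --rp <retention-policy-name> \
  --default
```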
Below the **Authentication** area, there is a **Connectivity Check** button. Click it to check whether the information entered above can correctly retrieve data from the source InfluxDB database. The check results are shown below:

**Failure**
<figure>
<Image img={imgStep06} alt=""/>
</figure>
**Success**
<figure>
<Image img={imgStep07} alt=""/>
</figure>
### 5. Configure Task Information
**Bucket**: In InfluxDB, a bucket is a namespace for storing data. Each task must specify a bucket. Click the **Get Schema** button on the right to fetch the data structure information of the current source InfluxDB database, then select a bucket from the dropdown list, as shown below:
<figure>
<Image img={imgStep08} alt=""/>
</figure>
**Measurements**: Optional. You can select one or more measurements to synchronize; if none are specified, all measurements are synchronized.

**Start Time**: The start time of the data in the source InfluxDB database. The time zone of the start time uses the time zone selected in Explorer. This field is required.

**End Time**: The end time of the data in the source InfluxDB database. If no end time is specified, synchronization of the latest data continues; if an end time is specified, synchronization stops at that point. The time zone of the end time uses the time zone selected in Explorer. This field is optional.

**Time Range per Read (minutes)**: The maximum time range for a single read from the source InfluxDB database. This is an important parameter that you need to decide based on server performance and data storage density. If the range is too small, the synchronization task executes slowly; if it is too large, it may cause the InfluxDB system to fail due to high memory usage.
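To get a feel for this parameter, you can estimate how many read requests a migration will issue; a quick back-of-the-envelope sketch (illustrative Python, the numbers are only examples):

```python
from math import ceil

days_of_data = 30      # span between Start Time and End Time
window_minutes = 60    # Time Range per Read

reads = ceil(days_of_data * 24 * 60 / window_minutes)
print(reads)  # 720 read requests for this example
```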
**Delay (seconds)**: An integer between 1 and 30. To eliminate the impact of out-of-order data, TDengine always waits for the duration specified here before reading the data.
### 6. Configure Advanced Options
The **Advanced Options** section is collapsed by default. Click the `>` on the right to expand it, as shown below:
<figure>
<Image img={imgStep09} alt=""/>
@ -114,6 +120,6 @@ The **Advanced Options** section is collapsed by default. Click the `>` on the r
<Image img={imgStep10} alt=""/>
</figure>
### 7. Completion

Click the **Submit** button to complete the creation of the InfluxDB to TDengine data synchronization task. Return to the **Data Source List** page to view the task's execution status.

View File

@ -13,19 +13,19 @@ import imgStep06 from '../../assets/opentsdb-06.png';
import imgStep07 from '../../assets/opentsdb-07.png';
import imgStep08 from '../../assets/opentsdb-08.png';
This section explains how to create a data migration task through the Explorer interface to migrate data from OpenTSDB to the current TDengine cluster.

## Overview

OpenTSDB is a real-time monitoring information collection and display platform built on top of the HBase system. TDengine can efficiently read data from OpenTSDB through the OpenTSDB connector and write it into TDengine, enabling historical data migration or real-time data synchronization.

During task execution, progress information is saved to disk, so if the task is paused and restarted, or recovers automatically from an error, it does not start over from the beginning. For more options, it is recommended to read the description of each form field on the task creation page.
## Creating a Task
### 1. Add a Data Source
Click the **+Add Data Source** button in the top left corner of the data writing page to enter the Add Data Source page, as shown below:
<figure>
<Image img={imgStep01} alt=""/>
@ -33,13 +33,13 @@ Click the **+Add Data Source** button in the top left of the data writing page t
### 2. Configure Basic Information
In the **Name** field, enter a task name, such as *`test_opentsdb_01`*.

Select *`OpenTSDB`* from the **Type** dropdown list, as shown below (the fields on the page will change after selection).

The **Agent** field is optional. If needed, select an agent from the dropdown list, or click the **+Create New Agent** button on the right.

The **Target Database** is required. Since OpenTSDB stores data with millisecond precision, you need to select a *`millisecond-precision database`* here. You can also click the **+Create Database** button on the right to create a new database.
<figure>
<Image img={imgStep02} alt=""/>
@ -47,41 +47,44 @@ The **Target Database** is required. Since OpenTSDB stores data with a time prec
### 3. Configure Connection Information
In the **Connection Configuration** area, fill in the *`connection information of the source OpenTSDB database`*, as shown below:
<figure>
<Image img={imgStep03} alt=""/>
</figure>
Below the **Connection Configuration** area, there is a **Connectivity Check** button. Click it to check whether the information entered above can correctly retrieve data from the source OpenTSDB database. The check results are shown below:

**Failure**
<figure>
<Image img={imgStep04} alt=""/>
</figure>
**Success**
<figure>
<Image img={imgStep05} alt=""/>
</figure>
### 4. Configure Task Information
**Metrics**: The physical quantities stored in the OpenTSDB database. You can specify multiple metrics to synchronize; if none are specified, all data in the database is synchronized. If you specify metrics, click the **Get Metrics** button on the right to fetch all metric information from the current source OpenTSDB database, then select from the dropdown list, as shown below:
<figure>
<Image img={imgStep06} alt=""/>
</figure>
**Start Time**: The start time of the data in the source OpenTSDB database. The time zone of the start time uses the time zone selected in Explorer. This field is required.

**End Time**: The end time of the data in the source OpenTSDB database. If no end time is specified, synchronization of the latest data continues; if an end time is specified, synchronization stops at that point. The time zone of the end time uses the time zone selected in Explorer. This field is optional.

**Time Range per Read (minutes)**: The maximum time range for a single read from the source OpenTSDB database. This is an important parameter that you need to decide based on server performance and data storage density. If the range is too small, the synchronization task executes slowly; if it is too large, it may cause the OpenTSDB system to fail due to high memory usage.

**Delay (seconds)**: An integer between 1 and 30. To eliminate the impact of out-of-order data, TDengine always waits for the duration specified here before reading the data.
### 5. Configure Advanced Options
The **Advanced Options** section is collapsed by default. Click the `>` on the right to expand it, as shown below:
<figure>
<Image img={imgStep07} alt=""/>
@ -91,6 +94,6 @@ The **Advanced Options** section is collapsed by default. Click the `>` on the r
<Image img={imgStep08} alt=""/>
</figure>
### 6. Completion

Click the **Submit** button to complete the creation of the OpenTSDB to TDengine data synchronization task. Return to the **Data Source List** page to view the task's execution status.

View File

@ -14,17 +14,17 @@ import imgStep07 from '../../assets/csv-file-07.png';
import imgStep10 from '../../assets/csv-file-10.png';
import imgStep11 from '../../assets/csv-file-11.png';
This section explains how to create a data migration task through the Explorer interface to migrate data from CSV files to the current TDengine cluster.

## Overview

Import data from one or more CSV files into TDengine.

## Creating a Task

### 1. Add a Data Source

Click the **+Add Data Source** button on the data writing page to enter the Add Data Source page.
<figure>
<Image img={imgStep01} alt=""/>
@ -32,11 +32,11 @@ Click the **+Add Data Source** button on the data writing page to enter the Add
### 2. Configure Basic Information
In the **Name** field, enter a task name, such as: "test_csv";
Enter the task name in **Name**, such as: "test_csv";
Select **CSV** from the **Type** dropdown list.
In the **Target Database** dropdown list, select a target database, or click the **+Create Database** button on the right.
Select a target database from the **Target Database** dropdown list, or click the **+Create Database** button on the right.
<figure>
<Image img={imgStep02} alt=""/>
@ -44,27 +44,27 @@ In the **Target Database** dropdown list, select a target database, or click the
### 3. Configure CSV Options
In the **Contains Header** option, toggle it on or off; if enabled, the first row is treated as column information.

In the **Ignore First N Rows** option, enter N to ignore the first N rows of the CSV file.

In the **Field Separator** option, select the separator between CSV fields; the default is ",".

In the **Field Quotation Character** option, select the character used to surround field content when a CSV field contains separators or newline characters, so that the entire field is correctly identified; the default is `"`.

In the **Comment Prefix Character** option, select the character; if a line in the CSV file begins with this character, that line is ignored; the default is "#". A hypothetical sample file illustrating these options is shown after the figure below.
<figure>
<Image img={imgStep03} alt=""/>
</figure>
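For example, with the defaults above (header enabled, `,` as the separator, `"` as the quotation character, and `#` as the comment prefix), a hypothetical input file could look like this (this is not the actual test-json.csv used later):

```csv
# exported device readings
ts,groupid,message
2024-12-01 00:00:00.000,1,hello-world
2024-12-01 00:00:01.000,2,"hello-world, again"
```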
### 4. Configure CSV File Parsing

Upload a CSV file locally, for example test-json.csv; this sample CSV file will be used later to configure extraction and filtering conditions.

#### 4.1 Parsing

Click **Select File**, choose test-json.csv, then click **Parse** to preview the identified columns.
<figure>
<Image img={imgStep04} alt=""/>
@ -78,10 +78,8 @@ After clicking **Select File**, choose test-json.csv, then click **Parse** to pr
#### 4.2 Field Splitting
In the **Extract or Split from Column** section, enter the fields to extract or split from the message body. For example, to split the `message` field into `text_0` and `text_1`, select the `split` extractor, enter `-` as the separator, and `2` as the number.
Click **Delete** to remove the current extraction rule.
Click **Add** to add more extraction rules.
<figure>
@ -94,23 +92,31 @@ Click the **Magnifying Glass Icon** to preview the extraction or splitting resul
<Image img={imgStep07} alt=""/>
</figure>
<!-- In the **Filter** section, enter filtering conditions, for example `id != 1`, so that only data where id is not equal to 1 will be written to TDengine.

Click **Delete** to remove the current filtering rule.

![csv-08.png](../../assets/csv-file-08.png)

Click the **magnifying glass icon** to preview the filtering results.

![csv-09.png](../../assets/csv-file-09.png) -->
#### 4.3 Table Mapping
In the **Target Supertable** dropdown list, select a target supertable, or click the **Create Supertable** button on the right.

In the **Mapping** section, fill in the subtable name in the target supertable, for example `t_${groupid}`.
Click **Preview** to see the mapping results.
<figure>
<Image img={imgStep10} alt=""/>
</figure>
Click **Preview** to view the mapping results.
<figure>
<Image img={imgStep11} alt=""/>
</figure>
### 5. Completion
Click the **Submit** button to complete the creation of the CSV to TDengine data synchronization task. Return to the **Data Source List** page to view the task execution status.
View File
@ -13,19 +13,19 @@ import imgStep06 from '../../assets/aveva-historian-06.png';
import imgStep07 from '../../assets/aveva-historian-07.png';
import imgStep08 from '../../assets/aveva-historian-08.png';
This section explains how to create data migration/data synchronization tasks through the Explorer interface to migrate/synchronize data from AVEVA Historian to the current TDengine cluster.
## Function Overview
AVEVA Historian is an industrial big data analytics software, formerly known as Wonderware. It captures and stores high-fidelity industrial big data, unlocking its untapped potential to improve operations.
TDengine can efficiently read data from AVEVA Historian and write it to TDengine for historical data migration or real-time data synchronization.
## Creating a Task
### 1. Add a Data Source
Click the **+Add Data Source** button on the data writing page to enter the Add Data Source page.
<figure>
<Image img={imgStep01} alt=""/>
@ -33,13 +33,13 @@ Click the **+Add Data Source** button on the data writing page to enter the Add
### 2. Configure Basic Information
In the **Name** field, enter a task name, such as: "test_avevaHistorian";
Select **AVEVA Historian** from the **Type** dropdown list.
The **Agent** field is optional; if needed, you can select a specified agent from the dropdown, or click the **+Create New Agent** button on the right.
In the **Target Database** dropdown list, select a target database, or click the **+Create Database** button on the right.
<figure>
<Image img={imgStep02} alt=""/>
@ -57,25 +57,25 @@ Click the **Connectivity Check** button to check if the data source is available
<Image img={imgStep03} alt=""/>
</figure>
### 4. Configure Data Collection Information
In the **Collection Configuration** area, fill in the parameters related to the collection task.
#### 4.1. Migrate Data
To perform data migration, configure the following parameters:
Select **migrate** from the **Collection Mode** dropdown list.
In the **Tags** field, enter the list of tags to migrate, separated by commas (,).
In the **Tag Group Size** field, specify the size of the tag group.
In the **Task Start Time** field, enter the start time for the data migration task.
In the **Task End Time** field, enter the end time for the data migration task.
In the **Query Time Window** field, specify a time interval; the data migration task will segment the time window according to this interval.
<figure>
<Image img={imgStep04} alt=""/>
@ -83,23 +83,23 @@ In the **Query Time Window** field, specify a time interval; the data migration
#### 4.2. Synchronize Data from the History Table
To synchronize data from the **Runtime.dbo.History** table to TDengine, configure the following parameters:
Select **synchronize** from the **Collection Mode** dropdown list.
In the **Table** field, select **Runtime.dbo.History**.
In the **Tags** field, enter the list of tags to migrate, separated by commas (,).
In the **Tag Group Size** field, specify the size of the tag group.
In the **Task Start Time** field, enter the start time for the data migration task.
In the **Query Time Window** field, specify a time interval; the historical data part will be segmented according to this time interval.
In the **Real-Time Synchronization Interval** field, specify a time interval for polling real-time data.
In the **Out-of-Order Time Limit** field, specify a time interval; data that arrives later than this interval may be lost during real-time synchronization.
<figure>
<Image img={imgStep05} alt=""/>
@ -107,15 +107,15 @@ In the **Out-of-Order Time Limit** field, specify a time interval; data that arr
#### 4.3. Synchronize Data from the Live Table
To synchronize data from the **Runtime.dbo.Live** table to TDengine, configure the following parameters:
Select **synchronize** from the **Collection Mode** dropdown list.
In the **Table** field, select **Runtime.dbo.Live**.
In the **Tags** field, enter the list of tags to migrate, separated by commas (,).
In the **Real-Time Synchronization Interval** field, specify a time interval for polling real-time data.
<figure>
<Image img={imgStep06} alt=""/>
@ -123,17 +123,17 @@ In the **Real-Time Synchronization Interval** field, specify a time interval for
### 5. Configure Data Mapping
In the **Data Mapping** area, fill in the parameters related to data mapping.
Click the **Retrieve from Server** button to get sample data from the AVEVA Historian server.
In the **Extract or Split from Columns** section, fill in the fields to extract or split from the message body. For example, split the `vValue` field into `vValue_0` and `vValue_1` using the split extractor, specifying `,` as the separator and `2` for the number.
In the **Filtering** section, enter filtering conditions; for example, entering `Value > 0` means that only data where Value is greater than 0 will be written to TDengine.
In the **Mapping** section, select the supertable to map to TDengine, and specify the columns to map to the supertable.
Click **Preview** to view the mapping results.
<figure>
<Image img={imgStep07} alt=""/>
@ -141,24 +141,24 @@ Click **Preview** to view the mapping results.
### 6. Configure Advanced Options
In the **Advanced Options** area, fill in the parameters related to advanced options.
In the **Maximum Read Concurrency** field, set the maximum read concurrency. The default value is 0, which means auto, automatically configuring the concurrency.
In the **Batch Size** field, set the batch size for each write, that is, the maximum number of messages sent at one time.
In the **Save Raw Data** section, choose whether to save the raw data. The default is no.
When saving raw data, the following two parameters take effect.
In the **Maximum Retention Days** field, set the maximum retention days for the raw data.
In the **Raw Data Storage Directory** field, set the path to save the raw data.
<figure>
<Image img={imgStep08} alt=""/>
</figure>
### 7. Completion
Click the **Submit** button to complete the task creation. After submitting the task, return to the **Data Writing** page to check the task status.
View File
@ -13,17 +13,17 @@ import imgStep06 from '../../assets/mysql-06.png';
import imgStep07 from '../../assets/mysql-07.png';
import imgStep08 from '../../assets/mysql-08.png';
This section explains how to create data migration tasks through the Explorer interface to migrate data from MySQL to the current TDengine cluster.
## Function Overview
MySQL is one of the most popular relational databases. Many systems have used or are currently using MySQL databases to store data reported by IoT and industrial internet devices. However, as the number of devices connected to these systems continues to grow and user demands for real-time data feedback increase, MySQL can no longer meet business needs. Starting from TDengine Enterprise Edition 3.3.0.0, TDengine can efficiently read data from MySQL and write it to TDengine for historical data migration or real-time data synchronization, addressing the technical pain points faced by businesses.
## Creating a Task
### 1. Add a Data Source
Click the **+Add Data Source** button in the upper left corner of the data writing page to enter the Add Data Source page, as shown below:
<figure>
<Image img={imgStep01} alt=""/>
@ -31,13 +31,13 @@ Click the **+Add Data Source** button in the upper left corner of the data writi
### 2. Configure Basic Information
In the **Name** field, enter a task name, such as *`test_mysql_01`*.
Select *`MySQL`* from the **Type** dropdown list, as shown below (the fields on the page will change after selection).
The **Agent** field is optional; if needed, you can select a specified agent from the dropdown or click the **+Create New Agent** button on the right to create a new agent.
The **Target Database** field is required; you can first click the **+Create Database** button on the right to create a new database.
<figure>
<Image img={imgStep02} alt=""/>
@ -45,7 +45,7 @@ The **Target Database** field is required; you can first click the **+Create Dat
### 3. Configure Connection Information
In the **Connection Configuration** area, fill in the *`source MySQL database connection information`*, as shown below:
<figure>
<Image img={imgStep03} alt=""/>
@ -53,9 +53,9 @@ In the **Connection Configuration** area, fill in the *`source MySQL database co
### 4. Configure Authentication Information
In the **User** field, enter the user for the source MySQL database; this user must have read permissions in the organization.
In the **Password** field, enter the login password for the user in the source MySQL database.
<figure>
<Image img={imgStep04} alt=""/>
@ -63,43 +63,42 @@ In the **Password** field, enter the login password for the user in the source M
### 5. Configure Connection Options
In the **Character Set** field, set the character set for the connection. The default character set is utf8mb4, which is supported by MySQL 5.5.3 and later; if you need to connect to an older version, it is recommended to change it to utf8. Optional values include utf8, utf8mb4, utf16, utf32, gbk, big5, latin1, and ascii.
In the **SSL Mode** field, set whether to negotiate a secure SSL TCP/IP connection with the server, or the priority of such negotiation. The default value is PREFERRED. Optional values include DISABLED, PREFERRED, and REQUIRED.
<figure>
<Image img={imgStep05} alt=""/>
</figure>
Then click the **Check Connectivity** button to check whether the information filled in above can successfully retrieve data from the source MySQL database.
### 6. Configure SQL Query
The **Subtable Fields** setting is used to split subtables. It is a `select distinct` SQL statement that queries unique combinations of the specified fields, typically corresponding to tags in the transform section:

> This configuration is mainly intended to solve the problem of data migration disorder, and it needs to be used together with the **SQL Template**, otherwise it cannot achieve the expected effect. Usage example:
>
> 1. Fill in the subtable field statement `select distinct col_name1, col_name2 from table`, which means using the fields col_name1 and col_name2 in the source table to split the subtables of the target supertable
> 2. Add subtable field placeholders in the **SQL Template**, for example, the `${col_name1} and ${col_name2}` part in `select * from table where ts >= ${start} and ts < ${end} and ${col_name1} and ${col_name2}`
> 3. Configure `col_name1` and `col_name2` as two tag mappings in **transform**

The **SQL Template** is the SQL statement template used for querying. The SQL statement must include time range conditions, and the start and end times must appear in pairs. The time range defined in the SQL statement template consists of a column representing time in the source database and the placeholders defined below.

> SQL uses different placeholders to represent different time format requirements, specifically the following placeholder formats:
>
> 1. `${start}`, `${end}`: RFC3339 format timestamps, e.g.: 2024-03-14T08:00:00+0800
> 2. `${start_no_tz}`, `${end_no_tz}`: RFC3339 strings without timezone, e.g.: 2024-03-14T08:00:00
> 3. `${start_date}`, `${end_date}`: date only, e.g.: 2024-03-14
>
> To solve the problem of data migration disorder, it is advisable to add a sorting condition to the query statement, such as `order by ts asc`.

**Start Time** is the start time for migrating data; this field is required.

**End Time** is the end time for migrating data and can be left blank. If set, the migration task will stop automatically after reaching the end time; if left blank, it will continuously synchronize real-time data and the task will not stop automatically.

**Query Interval** is the time interval for querying data in segments; the default is 1 day. To avoid querying a large amount of data at once, a data synchronization sub-task will use the query interval to segment the data retrieval.

**Delay Duration** is an integer ranging from 1 to 30; to avoid losing data that is written with a delay in real-time synchronization scenarios, each synchronization task reads data from before the specified delay duration.
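Putting these settings together, a complete **Subtable Fields** plus **SQL Template** configuration might look like the sketch below; `table`, `ts`, `col_name1`, and `col_name2` are the placeholder names used in the examples above, not real objects in your database.

```sql
-- Subtable Fields: one row per subtable of the target supertable
select distinct col_name1, col_name2 from table

-- SQL Template: paired time-range placeholders, the subtable-field
-- placeholders, and a sort on the time column to avoid disorder
select * from table
where ts >= ${start} and ts < ${end}
  and ${col_name1} and ${col_name2}
order by ts asc
```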
<figure>
<Image img={imgStep06} alt=""/>
@ -107,17 +106,17 @@ To solve the problem of migration data disorder, sorting conditions should be ad
### 7. Configure Data Mapping
In the **Data Mapping** area, fill in the parameters related to data mapping.
Click the **Retrieve from Server** button to get sample data from the MySQL server.
In the **Extract or Split from Columns** section, fill in the fields to extract or split from the message body. For example, split the `vValue` field into `vValue_0` and `vValue_1` using the split extractor, specifying `,` as the separator and `2` for the number.
In the **Filtering** section, enter filtering conditions; for example, entering `Value > 0` means that only data where Value is greater than 0 will be written to TDengine.
In the **Mapping** section, select the supertable to map to TDengine and specify the columns to map to the supertable.
Click **Preview** to view the mapping results.
<figure>
<Image img={imgStep07} alt=""/>
@ -125,11 +124,11 @@ Click **Preview** to view the mapping results.
### 8. Configure Advanced Options
The **Advanced Options** area is collapsed by default; click the `>` button on the right to expand it, as shown below:
**Maximum Read Concurrency** limits the number of connections to the data source or the number of reading threads. Modify this parameter when the default value does not meet your needs or when you need to adjust resource usage.
**Batch Size** is the maximum number of messages or rows sent at one time. The default is 10,000.
<figure>
<Image img={imgStep08} alt=""/>
@ -137,4 +136,4 @@ The **Advanced Options** area is folded by default; click the `>` button on the
### 9. Completion
Click the **Submit** button to complete the creation of the data synchronization task from MySQL to TDengine. Return to the **Data Source List** page to view the task execution status.
View File
@ -13,19 +13,19 @@ import imgStep06 from '../../assets/postgresql-06.png';
import imgStep07 from '../../assets/postgresql-07.png';
import imgStep08 from '../../assets/postgresql-08.png';
This section explains how to create data migration tasks through the Explorer interface to migrate data from PostgreSQL to the current TDengine cluster.
## Function Overview
PostgreSQL is a powerful open-source client/server relational database management system that has many features found in large commercial RDBMSs, including transactions, subqueries, triggers, views, foreign key referential integrity, and complex locking capabilities.
TDengine can efficiently read data from PostgreSQL and write it to TDengine for historical data migration or real-time data synchronization.
## Creating a Task
### 1. Add a Data Source
Click the **+Add Data Source** button in the upper left corner of the data writing page to enter the Add Data Source page, as shown below:
<figure>
<Image img={imgStep01} alt=""/>
@ -33,13 +33,13 @@ Click the **+Add Data Source** button in the upper left corner of the data writi
### 2. Configure Basic Information
In the **Name** field, enter a task name, such as *`test_postgres_01`*.
Select *`PostgreSQL`* from the **Type** dropdown list, as shown below (the fields on the page will change after selection).
The **Agent** field is optional; if needed, you can select a specified agent from the dropdown or click the **+Create New Agent** button on the right to create a new agent.
The **Target Database** field is required; you can first click the **+Create Database** button on the right to create a new database.
<figure>
<Image img={imgStep02} alt=""/>
@ -47,7 +47,7 @@ The **Target Database** field is required; you can first click the **+Create Dat
### 3. Configure Connection Information
In the **Connection Configuration** area, fill in the *`source PostgreSQL database connection information`*, as shown below:
<figure>
<Image img={imgStep03} alt=""/>
@ -55,9 +55,9 @@ In the **Connection Configuration** area, fill in the *`source PostgreSQL databa
### 4. Configure Authentication Information
In the **User** field, enter the user for the source PostgreSQL database; this user must have read permissions in the organization.
In the **Password** field, enter the login password for the user in the source PostgreSQL database.
<figure>
<Image img={imgStep04} alt=""/>
@ -65,43 +65,41 @@ In the **Password** field, enter the login password for the user in the source P
### 5. Configure Connection Options
In the **Application Name** field, set the application name used to identify the connecting application.
In the **SSL Mode** field, set whether to negotiate a secure SSL TCP/IP connection with the server, or the priority of such negotiation. The default value is PREFER. Optional values include DISABLE, ALLOW, PREFER, and REQUIRE.
<figure>
<Image img={imgStep05} alt=""/>
</figure>
Then click the **Check Connectivity** button to check whether the information filled in above can successfully retrieve data from the source PostgreSQL database.
### 6. Configure SQL Query
The **Subtable Fields** setting is used to split subtables. It is a `select distinct` SQL statement that queries unique combinations of the specified fields, typically corresponding to tags in the transform section:

> This configuration is mainly intended to solve the problem of data migration disorder, and it needs to be used together with the **SQL Template**, otherwise it cannot achieve the expected effect. Usage example:
>
> 1. Fill in the subtable field statement `select distinct col_name1, col_name2 from table`, which means using the fields col_name1 and col_name2 in the source table to split the subtables of the target supertable
> 2. Add subtable field placeholders in the **SQL Template**, for example, the `${col_name1} and ${col_name2}` part in `select * from table where ts >= ${start} and ts < ${end} and ${col_name1} and ${col_name2}`
> 3. Configure `col_name1` and `col_name2` as two tag mappings in **transform**

The **SQL Template** is the SQL statement template used for querying. The SQL statement must include time range conditions, and the start and end times must appear in pairs. The time range defined in the SQL statement template consists of a column representing time in the source database and the placeholders defined below.

> SQL uses different placeholders to represent different time format requirements, specifically the following placeholder formats:
>
> 1. `${start}`, `${end}`: RFC3339 format timestamps, e.g.: 2024-03-14T08:00:00+0800
> 2. `${start_no_tz}`, `${end_no_tz}`: RFC3339 strings without timezone, e.g.: 2024-03-14T08:00:00
> 3. `${start_date}`, `${end_date}`: date only, e.g.: 2024-03-14
>
> To solve the problem of data migration disorder, it is advisable to add a sorting condition to the query statement, such as `order by ts asc`.

**Start Time** is the start time for migrating data; this field is required.

**End Time** is the end time for migrating data and can be left blank. If set, the migration task will stop automatically after reaching the end time; if left blank, it will continuously synchronize real-time data and the task will not stop automatically.

**Query Interval** is the time interval for querying data in segments; the default is 1 day. To avoid querying a large amount of data at once, a data synchronization sub-task will use the query interval to segment the data retrieval.

**Delay Duration** is an integer ranging from 1 to 30; to avoid losing data that is written with a delay in real-time synchronization scenarios, each synchronization task reads data from before the specified delay duration.
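As an illustration of how the time placeholders are used, the sketch below shows a template and what one sub-task's query might look like once the placeholders have been filled in for a single one-day window; the table name, time column, and literal values are assumptions for illustration only.

```sql
-- SQL Template as configured in the task
select * from table where ts >= ${start} and ts < ${end} order by ts asc

-- What one sub-task query could look like after the RFC3339 placeholders
-- are substituted for a single one-day query interval (illustrative values)
select * from table
where ts >= '2024-03-14T08:00:00+0800' and ts < '2024-03-15T08:00:00+0800'
order by ts asc
```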
<figure>
<Image img={imgStep06} alt=""/>
@ -109,17 +107,17 @@ To solve the problem of migration data disorder, sorting conditions should be ad
### 7. Configure Data Mapping
In the **Data Mapping** area, fill in the parameters related to data mapping.
Click the **Retrieve from Server** button to get sample data from the PostgreSQL server.
In the **Extract or Split from Columns** section, fill in the fields to extract or split from the message body. For example, split the `vValue` field into `vValue_0` and `vValue_1` using the split extractor, specifying `,` as the separator and `2` for the number.
In the **Filtering** section, enter filtering conditions; for example, entering `Value > 0` means that only data where Value is greater than 0 will be written to TDengine.
In the **Mapping** section, select the supertable to map to TDengine and specify the columns to map to the supertable.
Click **Preview** to view the mapping results.
<figure>
<Image img={imgStep07} alt=""/>
@ -127,11 +125,11 @@ Click **Preview** to view the mapping results.
### 8. Configure Advanced Options
The **Advanced Options** area is collapsed by default; click the `>` button on the right to expand it, as shown below:
**Maximum Read Concurrency** limits the number of connections to the data source or the number of reading threads. Modify this parameter when the default value does not meet your needs or when you need to adjust resource usage.
**Batch Size** is the maximum number of messages or rows sent at one time. The default is 10,000.
<figure>
<Image img={imgStep08} alt=""/>
@ -139,4 +137,4 @@ The **Advanced Options** area is folded by default; click the `>` button on the
### 9. Completion
Click the **Submit** button to complete the creation of the data synchronization task from PostgreSQL to TDengine. Return to the **Data Source List** page to view the task execution status.
View File
@ -12,19 +12,19 @@ import imgStep05 from '../../assets/oracle-database-05.png';
import imgStep06 from '../../assets/oracle-database-06.png';
import imgStep07 from '../../assets/oracle-database-07.png';
This section explains how to create data migration tasks through the Explorer interface to migrate data from Oracle to the current TDengine cluster.
## Function Overview
The Oracle database system is one of the most popular relational database management systems in the world, known for its good portability, ease of use, and powerful features, suitable for various large, medium, and small computing environments. It is an efficient, reliable database solution capable of handling high throughput.
TDengine can efficiently read data from Oracle and write it to TDengine for historical data migration or real-time data synchronization.
## Creating a Task
### 1. Add a Data Source
Click the **+Add Data Source** button in the upper left corner of the data writing page to enter the Add Data Source page, as shown below:
<figure>
<Image img={imgStep01} alt=""/>
@ -32,13 +32,13 @@ Click the **+Add Data Source** button in the upper left corner of the data writi
### 2. Configure Basic Information
In the **Name** field, enter a task name, such as *`test_oracle_01`*.
Select *`Oracle`* from the **Type** dropdown list, as shown below (the fields on the page will change after selection).
The **Agent** field is optional; if needed, you can select a specified agent from the dropdown or click the **+Create New Agent** button on the right to create a new agent.
The **Target Database** field is required; you can first click the **+Create Database** button on the right to create a new database.
<figure>
<Image img={imgStep02} alt=""/>
@ -46,7 +46,7 @@ The **Target Database** field is required; you can first click the **+Create Dat
### 3. Configure Connection Information
In the **Connection Configuration** area, fill in the *`source Oracle database connection information`*, as shown below:
<figure>
<Image img={imgStep03} alt=""/>
@ -54,43 +54,41 @@ In the **Connection Configuration** area, fill in the *`source Oracle database c
### 4. Configure Authentication Information
In the **User** field, enter the user for the source Oracle database; this user must have read permissions in the organization.
In the **Password** field, enter the login password for the user in the source Oracle database.
<figure>
<Image img={imgStep04} alt=""/>
</figure>
Then click the **Check Connectivity** button to check whether the information filled in above can successfully retrieve data from the source Oracle database.
### 5. Configure SQL Query
The **Subtable Fields** setting is used to split subtables. It is a `select distinct` SQL statement that queries unique combinations of the specified fields, typically corresponding to tags in the transform section:

> This configuration is mainly intended to solve the problem of data migration disorder, and it needs to be used together with the **SQL Template**, otherwise it cannot achieve the expected effect. Usage example:
>
> 1. Fill in the subtable field statement `select distinct col_name1, col_name2 from table`, which means using the fields col_name1 and col_name2 in the source table to split the subtables of the target supertable
> 2. Add subtable field placeholders in the **SQL Template**, for example, the `${col_name1} and ${col_name2}` part in `select * from table where ts >= ${start} and ts < ${end} and ${col_name1} and ${col_name2}`
> 3. Configure `col_name1` and `col_name2` as two tag mappings in **transform**

The **SQL Template** is the SQL statement template used for querying. The SQL statement must include time range conditions, and the start and end times must appear in pairs. The time range defined in the SQL statement template consists of a column representing time in the source database and the placeholders defined below.

> SQL uses different placeholders to represent different time format requirements, specifically the following placeholder formats:
>
> 1. `${start}`, `${end}`: RFC3339 format timestamps, e.g.: 2024-03-14T08:00:00+0800
> 2. `${start_no_tz}`, `${end_no_tz}`: RFC3339 strings without timezone, e.g.: 2024-03-14T08:00:00
> 3. `${start_date}`, `${end_date}`: date only; however, since Oracle has no pure date type, the value will include zero hours, minutes, and seconds, e.g., 2024-03-14 00:00:00, so be careful when using `date <= ${end_date}`, as it does not include data from the day 2024-03-14 itself
>
> To solve the problem of data migration disorder, it is advisable to add a sorting condition to the query statement, such as `order by ts asc`.

**Start Time** is the start time for migrating data; this field is required.

**End Time** is the end time for migrating data and can be left blank. If set, the migration task will stop automatically after reaching the end time; if left blank, it will continuously synchronize real-time data and the task will not stop automatically.

**Query Interval** is the time interval for querying data in segments; the default is 1 day. To avoid querying a large amount of data at once, a data synchronization sub-task will use the query interval to segment the data retrieval.

**Delay Duration** is an integer ranging from 1 to 30; to avoid losing data that is written with a delay in real-time synchronization scenarios, each synchronization task reads data from before the specified delay duration.
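The sketch below illustrates the date-placeholder caveat above; `table` and `ts` are the placeholder names from the examples and not real objects in your schema.

```sql
-- In Oracle, ${start_date} and ${end_date} expand to midnight of the given day,
-- e.g. 2024-03-14 00:00:00, so "ts <= ${end_date}" would not include data from
-- the day 2024-03-14 itself. Keeping the half-open form used by the other
-- placeholders avoids that surprise:
select * from table
where ts >= ${start_date} and ts < ${end_date}
order by ts asc
```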
<figure>
<Image img={imgStep05} alt=""/>
@ -98,17 +96,17 @@ To solve the problem of migration data disorder, sorting conditions should be ad
### 6. Configure Data Mapping
In the **Data Mapping** area, fill in the parameters related to data mapping.
Click the **Retrieve from Server** button to get sample data from the Oracle server.
In the **Extract or Split from Columns** section, fill in the fields to extract or split from the message body. For example, split the `vValue` field into `vValue_0` and `vValue_1` using the split extractor, specifying `,` as the separator and `2` for the number.
In the **Filtering** section, enter filtering conditions; for example, entering `Value > 0` means that only data where Value is greater than 0 will be written to TDengine.
In the **Mapping** section, select the supertable to map to TDengine and specify the columns to map to the supertable.
Click **Preview** to view the mapping results.
<figure>
<Image img={imgStep06} alt=""/>
@ -116,11 +114,11 @@ Click **Preview** to view the mapping results.
### 7. Configure Advanced Options
The **Advanced Options** area is collapsed by default; click the `>` button on the right to expand it, as shown below:
**Maximum Read Concurrency** limits the number of connections to the data source or the number of reading threads. Modify this parameter when the default value does not meet your needs or when you need to adjust resource usage.
**Batch Size** is the maximum number of messages or rows sent at one time. The default is 10,000.
<figure>
<Image img={imgStep07} alt=""/>
@ -128,4 +126,4 @@ The **Advanced Options** area is folded by default; click the `>` button on the
### 8. Completion
Click the **Submit** button to complete the creation of the data synchronization task from Oracle to TDengine. Return to the **Data Source List** page to view the task execution status.
View File
@ -14,17 +14,17 @@ import imgStep06 from '../../assets/sql-server-06.png';
import imgStep07 from '../../assets/sql-server-07.png';
import imgStep08 from '../../assets/sql-server-08.png';
This section explains how to create data migration tasks through the Explorer interface to migrate data from Microsoft SQL Server to the current TDengine cluster.
## Function Overview
Microsoft SQL Server is one of the most popular relational databases. Many systems have used or are currently using Microsoft SQL Server databases to store data reported by IoT and industrial IoT devices. However, as the number of devices connected to the system increases and users' demands for real-time data feedback grow, Microsoft SQL Server can no longer meet business needs. Starting from TDengine Enterprise Edition 3.3.2.0, TDengine can efficiently read data from Microsoft SQL Server and write it to TDengine for historical data migration or real-time data synchronization, addressing the technical challenges faced by businesses.
## Creating a Task
### 1. Add a Data Source
Click the **+Add Data Source** button in the upper left corner of the data writing page to enter the Add Data Source page, as shown below:
<figure>
<Image img={imgStep01} alt=""/>
@ -32,13 +32,13 @@ Click the **+Add Data Source** button in the upper left corner of the data writi
### 2. Configure Basic Information
In the **Name** field, enter a task name, such as *`test_mssql_01`*.
Select *`Microsoft SQL Server`* from the **Type** dropdown list, as shown below (the fields on the page will change after selection).
The **Agent** field is optional; if needed, you can select a specified agent from the dropdown or click the **+Create New Agent** button on the right to create a new agent.
The **Target Database** field is required; you can first click the **+Create Database** button on the right to create a new database.
<figure>
<Image img={imgStep02} alt=""/>
@ -46,7 +46,7 @@ The **Target Database** field is required; you can first click the **+Create Dat
### 3. Configure Connection Information
In the **Connection Configuration** area, fill in the *`source Microsoft SQL Server database connection information`*, as shown below:
<figure>
<Image img={imgStep03} alt=""/>
@ -54,9 +54,9 @@ In the **Connection Configuration** area, fill in the *`source Microsoft SQL Ser
### 4. Configure Authentication Information
In the **User** field, enter the user for the source Microsoft SQL Server database; this user must have read permissions in the organization.
In the **Password** field, enter the login password for the user in the source Microsoft SQL Server database.
<figure>
<Image img={imgStep04} alt=""/>
@ -64,49 +64,49 @@ In the **Password** field, enter the login password for the user in the source M
### 5. Configure Connection Options
In the **Instance Name** field, set the Microsoft SQL Server instance name (the instance name defined in SQL Browser, only available on Windows platforms; if specified, the port will be replaced by the value returned from SQL Browser).
In the **Application Name** field, set the application name used to identify the connecting application.
In the **Encryption** field, set whether to use an encrypted connection. The default value is Off. The options are Off, On, NotSupported, and Required.
In the **Trust Server Certificate** field, set whether to trust the server certificate; if enabled, the server certificate will not be validated and will be accepted as is (if trust is enabled, the **Trust Certificate CA** field below will be hidden).
In the **Trust Certificate CA** field, set whether to trust the server's certificate CA. If a CA file is uploaded, the server certificate will be verified against the provided CA certificate in addition to the system trust store.
<figure>
<Image img={imgStep05} alt=""/>
</figure>
Then click the **Check Connectivity** button; users can click this button to check if the information filled in above can successfully retrieve data from the source Microsoft SQL Server database.
Then click the **Check Connectivity** button. Users can click this button to check if the information filled in above can normally retrieve data from the source Microsoft SQL Server database.
### 6. Configure SQL Query
The **Subtable Fields** are used to split the subtable fields. It is a `select distinct` SQL statement that queries unique combinations of specified fields, typically corresponding to tags in the transform section:
**Subtable Field** is used to split subtables; it is a `select distinct` SQL statement that queries unique combinations of the specified fields, usually corresponding to tags in transform:
> This configuration is mainly to solve the data migration disorder problem, and needs to be used in conjunction with **SQL Template**, otherwise it cannot achieve the expected effect, usage examples are as follows:
>
> 1. Fill in the subtable field statement `select distinct col_name1, col_name2 from table`, which means using the fields col_name1 and col_name2 in the source table to split the subtables of the target supertable
> 2. Add subtable field placeholders in the **SQL Template**, for example, the `${col_name1} and ${col_name2}` part in `select * from table where ts >= ${start} and ts < ${end} and ${col_name1} and ${col_name2}`
> 3. Configure the mappings of `col_name1` and `col_name2` as two tags in **transform**
This configuration is primarily aimed at solving the problem of data migration disorder and needs to be used in conjunction with the **SQL Template**; otherwise, the expected effect cannot be achieved. Here are usage examples:
**SQL Template** is the SQL statement template used for querying; it must include a time range condition, and the start time and end time must appear in pairs. The time range defined in the SQL statement template consists of a column representing time in the source database and the placeholders defined below.
> SQL uses different placeholders to represent different time format requirements, specifically the following placeholder formats:
>
> 1. `${start}`, `${end}`: Represents RFC3339 format timestamp, e.g.: 2024-03-14T08:00:00+0800
> 2. `${start_no_tz}`, `${end_no_tz}`: Represents an RFC3339 string without timezone: 2024-03-14T08:00:00
> 3. `${start_date}`, `${end_date}`: Represents date only, e.g.: 2024-03-14
>
> Note: Only `datetime2` and `datetimeoffset` support querying using start/end, `datetime` and `smalldatetime` can only use start_no_tz/end_no_tz for querying, and `timestamp` cannot be used as a query condition.
>
> To solve the problem of data migration disorder, it is advisable to add a sorting condition in the query statement, such as `order by ts asc`.
1. Fill in the subtable fields with the statement `select distinct col_name1, col_name2 from table`, indicating that the fields col_name1 and col_name2 from the source table will be used to split the subtables of the target supertable.
2. In the **SQL Template**, add placeholders for the subtable fields, such as `${col_name1} and ${col_name2}` in `select * from table where ts >= ${start} and ts < ${end} and ${col_name1} and ${col_name2}`.
3. In the **transform** section, configure the tag mappings for `col_name1` and `col_name2`.
**Start Time** The start time of the data migration, this field is mandatory.
The **SQL Template** is an SQL statement template used for querying. The SQL statement must include time range conditions, and the start and end times must appear in pairs. The time range defined in the SQL statement template consists of a column representing time from the source database and the placeholders defined below.
**End Time** The end time of the data migration, which can be left blank. If set, the migration task will stop automatically after reaching the end time; if left blank, it will continuously synchronize real-time data, and the task will not stop automatically.
SQL uses different placeholders to represent different time format requirements, specifically the following placeholder formats:
**Query Interval** The time interval for segmenting data queries, default is 1 day. To avoid querying too much data at once, a data synchronization subtask will use the query interval to segment the data.
1. `${start}` and `${end}`: Represent RFC3339 formatted timestamps, e.g., 2024-03-14T08:00:00+0800
2. `${start_no_tz}` and `${end_no_tz}`: Represent RFC3339 strings without timezone: 2024-03-14T08:00:00
3. `${start_date}` and `${end_date}`: Represent only the date; however, only `datetime2` and `datetimeoffset` support using start/end queries, while `datetime` and `smalldatetime` can only use start_no_tz/end_no_tz queries, and `timestamp` cannot be used as a query condition.
To solve the problem of migration data disorder, sorting conditions should be added to the query statement, such as `order by ts asc`.
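Putting the above together, a configuration for a hypothetical source table `meters` with a `datetime2` column `ts` and device columns `device_id` and `location` (all names are illustrative) might look like the following sketch:

```sql
-- Subtable Fields: one distinct combination per subtable of the target supertable
select distinct device_id, location from meters

-- SQL Template: paired time placeholders, the subtable field placeholders,
-- and an ascending sort on ts to avoid out-of-order migration
select * from meters
where ts >= ${start} and ts < ${end} and ${device_id} and ${location}
order by ts asc
```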
**Start Time** is the starting time for migrating data; this is a required field.
**End Time** is the end time for migrating data and can be left blank. If set, the migration task will complete automatically when it reaches the end time; if left blank, it will continuously synchronize real-time data, and the task will not automatically stop.
**Query Interval** is the time interval for segmenting queries. The default is 1 day. To avoid querying an excessive amount of data, a sub-task for data synchronization will query data by time segments according to the query interval.
**Delay Duration** is an integer range from 1 to 30; to avoid the loss of delayed written data in real-time synchronization scenarios, each synchronization task will read data before the specified delay duration.
**Delay Duration** In real-time data synchronization scenarios, to avoid losing data due to delayed writing, each synchronization task will read data from before the delay duration.
<figure>
<Image img={imgStep06} alt=""/>
@ -114,17 +114,17 @@ To solve the problem of migration data disorder, sorting conditions should be ad
### 7. Configure Data Mapping
In the **Data Mapping** area, fill in the parameters related to data mapping.
Fill in the related configuration parameters in the **Data Mapping** area.
Click the **Retrieve from Server** button to get sample data from the Microsoft SQL Server.
In the **Extract or Split from Columns** section, fill in the fields to extract or split from the message body. For example, split the `vValue` field into `vValue_0` and `vValue_1` using the split extractor, specifying `,` as the separator and `2` for the number.
In **Extract or Split from Column**, fill in the fields to extract or split from the message body, for example: split the `vValue` field into `vValue_0` and `vValue_1`, select the split extractor, fill in the separator as `,`, and number as 2.
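For instance, with an illustrative value `vValue = "12.3,13.1"`, this split configuration produces `vValue_0 = "12.3"` and `vValue_1 = "13.1"`.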
In the **Filtering** section, enter filtering conditions; for example, entering `Value > 0` means that only data where Value is greater than 0 will be written to TDengine.
In **Filter**, fill in the filter conditions, for example: write `Value > 0`, then only data where Value is greater than 0 will be written to TDengine.
In the **Mapping** section, select the supertable to map to TDengine and specify the columns to map to the supertable.
In **Mapping**, select the supertable in TDengine to which you want to map, and the columns to map to the supertable.
Click **Preview** to view the mapping results.
Click **Preview** to view the results of the mapping.
<figure>
<Image img={imgStep07} alt=""/>
@ -132,11 +132,11 @@ Click **Preview** to view the mapping results.
### 8. Configure Advanced Options
The **Advanced Options** area is folded by default; click the `>` button on the right to expand, as shown below:
The **Advanced Options** area is collapsed by default, click the `>` on the right to expand it, as shown below:
**Maximum Read Concurrency** limits the number of connections to the data source or the number of reading threads. Modify this parameter when the default parameters do not meet your needs or when you need to adjust resource usage.
**Maximum Read Concurrency** Limit on the number of data source connections or reading threads, modify this parameter when default parameters do not meet the needs or when adjusting resource usage.
**Batch Size** is the maximum number of messages or rows sent at one time. The default is 10,000.
**Batch Size** The maximum number of messages or rows sent at once. The default is 10000.
<figure>
<Image img={imgStep08} alt=""/>
@ -144,4 +144,4 @@ The **Advanced Options** area is folded by default; click the `>` button on the
### 9. Completion
Click the **Submit** button to complete the creation of the data synchronization task from Microsoft SQL Server to TDengine. Return to the **Data Source List** page to view the task execution status.
Click the **Submit** button to complete the creation of the data synchronization task from Microsoft SQL Server to TDengine, and return to the **Data Source List** page to view the status of the task execution.
View File
@ -13,17 +13,17 @@ import imgStep06 from '../../assets/mongodb-06.png';
import imgStep07 from '../../assets/mongodb-07.png';
import imgStep08 from '../../assets/mongodb-08.png';
This section explains how to create data migration tasks through the Explorer interface to migrate data from MongoDB to the current TDengine cluster.
This section describes how to create data migration tasks through the Explorer interface, migrating data from MongoDB to the current TDengine cluster.
## Function Overview
## Feature Overview
MongoDB is a product that sits between relational and non-relational databases and is widely used in various fields such as content management systems, mobile applications, and the Internet of Things. Starting from TDengine Enterprise Edition 3.3.3.0, TDengine can efficiently read data from MongoDB and write it to TDengine for historical data migration or real-time data synchronization, addressing the technical challenges faced by businesses.
MongoDB is a product that lies between relational and non-relational databases, widely used in content management systems, mobile applications, and the Internet of Things, among other fields. Starting from TDengine Enterprise Edition 3.3.3.0, TDengine can efficiently read data from MongoDB and write it into TDengine, achieving historical data migration or real-time data synchronization, and addressing technical pain points faced by businesses.
## Creating a Task
### 1. Add a Data Source
Click the **+Add Data Source** button in the upper right corner of the data writing page to enter the Add Data Source page, as shown below:
Click the **+ Add Data Source** button in the top right corner of the data writing page to enter the Add Data Source page, as shown below:
<figure>
<Image img={imgStep01} alt=""/>
@ -31,13 +31,13 @@ Click the **+Add Data Source** button in the upper right corner of the data writ
### 2. Configure Basic Information
In the **Name** field, enter a task name, such as `test_mongodb_01`.
Enter the task name in the **Name** field, for example `test_mongodb_01`.
Select `MongoDB` from the **Type** dropdown list, as shown below (the fields on the page will change after selection).
Select `MongoDB` from the **Type** dropdown menu, as shown below (the fields on the page will change after selection).
The **Agent** field is optional; if needed, you can select a specified agent from the dropdown or click the **+Create New Agent** button on the right to create a new agent.
**Proxy** is optional. If needed, you can select a specific proxy from the dropdown menu, or click the **+ Create New Proxy** button on the right to create a new proxy.
The **Target Database** field is required; you can select a specified database from the dropdown or click the **+Create Database** button on the right to create a new database.
**Target Database** is mandatory. You can select a specific database from the dropdown menu, or click the **+ Create Database** button on the right to create a new database.
<figure>
<Image img={imgStep02} alt=""/>
@ -45,7 +45,7 @@ The **Target Database** field is required; you can select a specified database f
### 3. Configure Connection Information
In the **Connection Configuration** area, fill in the *`source MongoDB database connection information`*, as shown below:
Fill in the *connection information for the source MongoDB database* in the **Connection Configuration** area, as shown below:
<figure>
<Image img={imgStep03} alt=""/>
@ -53,11 +53,11 @@ In the **Connection Configuration** area, fill in the *`source MongoDB database
### 4. Configure Authentication Information
In the **User** field, enter the user for the source MongoDB database; this user must have read permissions in the MongoDB system.
**User** Enter the user of the source MongoDB database, who must have read permissions in the MongoDB system.
In the **Password** field, enter the login password for the user in the source MongoDB database.
**Password** Enter the login password for the user mentioned above in the source MongoDB database.
In the **Authentication Database** field, enter the database in MongoDB that stores user information, which defaults to admin.
**Authentication Database** The database in MongoDB where user information is stored, default is admin.
<figure>
<Image img={imgStep04} alt=""/>
@ -65,64 +65,65 @@ In the **Authentication Database** field, enter the database in MongoDB that sto
### 5. Configure Connection Options
In the **Application Name** field, set the application name used to identify the connecting application.
**Application Name** Set the application name to identify the connected application.
In the **SSL Certificate** field, set whether to use an encrypted connection, which is off by default. If enabled, you need to upload the following two files:
**SSL Certificate** Set whether to use an encrypted connection, which is off by default. If enabled, you need to upload the following two files:
1. **CA File**: Upload the SSL encrypted certificate authorization file.
2. **Certificate File**: Upload the SSL encrypted certificate file.
&emsp; 1. **CA File** Upload the SSL encryption certificate authority file.
&emsp; 2. **Certificate File** Upload the SSL encryption certificate file.
<figure>
<Image img={imgStep05} alt=""/>
</figure>
Then click the **Check Connectivity** button; users can click this button to check if the information filled in above can successfully retrieve data from the source MongoDB database.
Then click the **Check Connectivity** button, where users can click this button to check if the information filled in above can normally retrieve data from the source MongoDB database.
### 6. Configure Data Query
In the **Database** field, specify the source database in MongoDB, and you can use placeholders for dynamic configuration, such as `database_${Y}`. See the table below for the available placeholders.
**Database** The source database in MongoDB, which can be dynamically configured using placeholders, such as `database_${Y}`. See the table below for a list of available placeholders.
In the **Collection** field, specify the collection in MongoDB, and you can also use placeholders for dynamic configuration, such as `collection_${md}`. See the table below for the available placeholders.
**Collection** The collection in MongoDB, which can be dynamically configured using placeholders, such as `collection_${md}`. See the table below for a list of available placeholders.
| Placeholder | Description | Example Data |
| :---------: | :----------------------------------------------------------: | :----------: |
| Y | Complete year in Gregorian calendar, zero-padded 4-digit integer | 2024 |
| y | Year in Gregorian calendar divided by 100, zero-padded 2-digit integer | 24 |
| M | Integer month (1 - 12) | 1 |
| m | Integer month (01 - 12) | 01 |
| B | Full name of the month in English | January |
| b | Abbreviation of the month in English (3 letters) | Jan |
| D | Numeric representation of the date (1 - 31) | 1 |
| d | Numeric representation of the date (01 - 31) | 01 |
| J | Day of the year (1 - 366) | 1 |
| j | Day of the year (001 - 366) | 001 |
| F | Equivalent to `${Y}-${m}-${d}` | 2024-01-01 |
|Placeholder|Description|Example Data|
| :-----: | :------------: |:--------:|
|Y|Complete Gregorian year, zero-padded 4-digit integer|2024|
|y|Gregorian year divided by 100, zero-padded 2-digit integer|24|
|M|Integer month (1 - 12)|1|
|m|Integer month (01 - 12)|01|
|B|Full English spelling of the month|January|
|b|Abbreviation of the month in English (3 letters)|Jan|
|D|Numeric representation of the date (1 - 31)|1|
|d|Numeric representation of the date (01 - 31)|01|
|J|Day of the year (1 - 366)|1|
|j|Day of the year (001 - 366)|001|
|F|Equivalent to `${Y}-${m}-${d}`|2024-01-01|
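For example, assuming the source data is organized into yearly databases and daily collections (the names are illustrative), the placeholders would resolve as follows on March 14, 2024:

```
database_${Y}    ->  database_2024
collection_${F}  ->  collection_2024-03-14
```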
The **Subtable Fields** are used to split the subtable fields, which typically correspond to tags in the transform section. Multiple fields are separated by commas, e.g., `col_name1,col_name2`.
This configuration is primarily aimed at solving the problem of data migration disorder and needs to be used in conjunction with the **Query Template**; otherwise, the expected effect cannot be achieved. Usage examples:
**Subtable Fields** Fields used to split subtables, usually corresponding to tags in transform, separated by commas, such as `col_name1,col_name2`.
This configuration is mainly to solve the problem of data migration disorder, and needs to be used in conjunction with **Query Template**, otherwise it cannot achieve the expected effect. Usage examples are as follows:
1. Configure two subtable fields `col_name1,col_name2`.
2. In the **Query Template**, add placeholders for the subtable fields, for example, `{"ddate":{"$gte":${start_datetime},"$lt":${end_datetime}}, ${col_name1}, ${col_name2}}` where `${col_name1}` and `${col_name2}` are the placeholders.
3. In the **transform** section, configure the tag mappings for `col_name1` and `col_name2`.
1. Configure two subtable fields `col_name1,col_name2`
2. Add subtable field placeholders in the **Query Template**, such as the `${col_name1}, ${col_name2}` part in `{"ddate":{"$gte":${start_datetime},"$lt":${end_datetime}}, ${col_name1}, ${col_name2}}`
3. Configure the two tag mappings for `col_name1` and `col_name2` in **transform**
The **Query Template** is used for querying data. It must be in JSON format and must include time range conditions, with start and end times appearing in pairs. The defined time range in the template is composed of a column representing time from the source database and the placeholders defined below.
Using different placeholders represents different time format requirements, specifically the following placeholder formats:
**Query Template** is used for querying data with a JSON format query statement, which must include a time range condition, and the start and end times must appear in pairs. The time range defined in the template consists of a time-representing column from the source database and the placeholders defined below.
Different placeholders represent different time format requirements, specifically the following placeholder formats:
1. `${start_datetime}` and `${end_datetime}`: Correspond to filtering based on backend datetime type fields, e.g., `{"ddate":{"$gte":${start_datetime},"$lt":${end_datetime}}}` will be converted to `{"ddate":{"$gte":{"$date":"2024-06-01T00:00:00+00:00"},"$lt":{"$date":"2024-07-01T00:00:00"}}}`
2. `${start_timestamp}` and `${end_timestamp}`: Correspond to filtering based on backend timestamp type fields, e.g., `{"ttime":{"$gte":${start_timestamp},"$lt":${end_timestamp}}}` will be converted to `{"ttime":{"$gte":{"$timestamp":{"t":123,"i":456}},"$lt":{"$timestamp":{"t":123,"i":456}}}}`
1. `${start_datetime}`, `${end_datetime}`: Corresponds to filtering by the backend datetime type field, e.g., `{"ddate":{"$gte":${start_datetime},"$lt":${end_datetime}}}` will be converted to `{"ddate":{"$gte":{"$date":"2024-06-01T00:00:00+00:00"},"$lt":{"$date":"2024-07-01T00:00:00+00:00"}}}`
2. `${start_timestamp}`, `${end_timestamp}`: Corresponds to filtering by the backend timestamp type field, e.g., `{"ttime":{"$gte":${start_timestamp},"$lt":${end_timestamp}}}` will be converted to `{"ttime":{"$gte":{"$timestamp":{"t":123,"i":456}},"$lt":{"$timestamp":{"t":123,"i":456}}}}`
In the **Query Sorting** field, specify sorting conditions for executing the query in JSON format. It must comply with MongoDB's sorting condition format. Example usages:
**Query Sorting** Sorting conditions during query execution, in JSON format, must comply with MongoDB's sorting condition format specifications, with usage examples as follows:
1. `{"createtime":1}`: Returns MongoDB query results sorted by `createtime` in ascending order.
2. `{"createdate":1, "createtime":1}`: Returns MongoDB query results sorted by `createdate` in ascending order, followed by `createtime` in ascending order.
1. `{"createtime":1}`: MongoDB query results are returned in ascending order by createtime.
2. `{"createdate":1, "createtime":1}`: MongoDB query results are returned in ascending order by createdate and createtime.
**Start Time** is the starting time for migrating data; this is a required field.
**Start Time** The start time for migrating data, this field is mandatory.
**End Time** is the end time for migrating data and can be left blank. If set, the migration task will complete automatically when it reaches the end time; if left blank, it will continuously synchronize real-time data, and the task will not automatically stop.
**End Time** The end time for migrating data, can be left blank. If set, the migration task will stop automatically after reaching the end time; if left blank, it will continuously synchronize real-time data, and the task will not stop automatically.
**Query Interval** is the time interval for segmenting queries. The default is 1 day. To avoid querying an excessive amount of data, a sub-task for data synchronization will query data by time segments according to the query interval.
**Query Interval** The time interval for segmenting data queries, default is 1 day. To avoid querying too much data at once, a data synchronization subtask will use the query interval to segment the data.
**Delay Duration** is an integer range from 1 to 30; to avoid the loss of delayed written data in real-time synchronization scenarios, each synchronization task will read data before the specified delay duration.
**Delay Duration** In real-time data synchronization scenarios, to avoid losing data due to delayed writes, each synchronization task will read data from before the delay duration.
<figure>
<Image img={imgStep06} alt=""/>
@ -130,17 +131,17 @@ In the **Query Sorting** field, specify sorting conditions for executing the que
### 7. Configure Data Mapping
In the **Payload Transformation** area, fill in the parameters related to data mapping.
Fill in the data mapping related configuration parameters in the **Payload Transformation** area.
Click the **Retrieve from Server** button to get sample data from the MongoDB server.
Click the **Retrieve from Server** button to fetch sample data from the MongoDB server.
In the **Parsing** section, choose from JSON/Regex/UDT parsing rules for the raw message body; after configuration, click the **Preview** button on the right to view the parsing results.
In **Parsing**, choose from JSON/Regex/UDT to parse the original message body, and click the **Preview** button on the right to view the parsing results after configuration.
In the **Extract or Split from Columns** section, fill in the fields to extract or split from the message body. For example, split the `vValue` field into `vValue_0` and `vValue_1` using the split extractor, specifying `,` as the separator and `2` for the number. After configuration, click the **Preview** button on the right to view the transformation results.
In **Extract or Split from Column**, fill in the fields to extract or split from the message body, for example: split the `vValue` field into `vValue_0` and `vValue_1`, select the split extractor, fill in the separator as `,`, number as 2, and click the **Preview** button on the right to view the transformation results after configuration.
In the **Filtering** section, enter filtering conditions; for example, entering `Value > 0` means that only data where Value is greater than 0 will be written to TDengine. After configuration, click the **Preview** button on the right to view the filtering results.
In **Filter**, fill in the filtering conditions, for example: write `Value > 0`, then only data where Value is greater than 0 will be written to TDengine, and click the **Preview** button on the right to view the filtering results after configuration.
In the **Mapping** section, select the supertable to map to TDengine and specify the columns to map to the supertable. After configuration, click the **Preview** button on the right to view the mapping results.
In **Mapping**, select the supertable in TDengine to which the data will be mapped, as well as the columns to map to the supertable, and click the **Preview** button on the right to view the mapping results after configuration.
<figure>
<Image img={imgStep07} alt=""/>
@ -148,11 +149,11 @@ In the **Mapping** section, select the supertable to map to TDengine and specify
### 8. Configure Advanced Options
The **Advanced Options** area is folded by default; click the `>` button on the right to expand, as shown below:
The **Advanced Options** area is collapsed by default, click the `>` on the right to expand it, as shown below:
**Maximum Read Concurrency** limits the number of connections to the data source or the number of reading threads. Modify this parameter when the default parameters do not meet your needs or when you need to adjust resource usage.
**Maximum Read Concurrency** Limit on the number of data source connections or reading threads, modify this parameter when default parameters do not meet the needs or when adjusting resource usage.
**Batch Size** is the maximum number of messages or rows sent at one time. The default is 10,000.
**Batch Size** The maximum number of messages or rows sent at once. Default is 10000.
<figure>
<Image img={imgStep08} alt=""/>
@ -160,4 +161,4 @@ The **Advanced Options** area is folded by default; click the `>` button on the
### 9. Completion
Click the **Submit** button to complete the creation of the data synchronization task from MongoDB to TDengine. Return to the **Data Source List** page to view the task execution status.
Click the **Submit** button to complete the creation of the data synchronization task from MongoDB to TDengine, and return to the **Data Source List** page to view the task execution status.
View File
@ -13,9 +13,9 @@ import imgSplit from '../../assets/data-connectors-06.png';
## Overview
TDengine Enterprise is equipped with a powerful visual data management tool—taosExplorer. With taosExplorer, users can easily configure tasks in their browsers to seamlessly import data from various sources into TDengine with zero code. During the import process, TDengine automatically extracts, filters, and transforms data to ensure its quality. This zero-code data source access approach has successfully transformed TDengine into an outstanding time-series big data aggregation platform. Users do not need to deploy additional ETL tools, significantly simplifying the overall architecture design and improving data processing efficiency.
TDengine Enterprise is equipped with a powerful visual data management tool—taosExplorer. With taosExplorer, users can easily submit tasks to TDengine through simple configurations in the browser, achieving seamless data import from various data sources into TDengine with zero coding. During the import process, TDengine automatically extracts, filters, and transforms the data to ensure the quality of the imported data. Through this zero-code data source integration method, TDengine has successfully transformed into an outstanding platform for aggregating time-series big data. Users do not need to deploy additional ETL tools, thereby greatly simplifying the overall architecture design and improving data processing efficiency.
The following figure illustrates the system architecture of the zero-code access platform.
The diagram below shows the system architecture of the zero-code integration platform.
<figure>
<Image img={imgZeroCode} alt="Zero-code access platform"/>
@ -24,33 +24,41 @@ The following figure illustrates the system architecture of the zero-code access
## Supported Data Sources
Currently, TDengine supports the following data sources:
The data sources currently supported by TDengine are as follows:
1. Aveva PI System: An industrial data management and analysis platform, formerly known as OSIsoft PI System, which can collect, integrate, analyze, and visualize industrial data in real-time, helping enterprises achieve intelligent decision-making and refined management.
2. Aveva Historian: An industrial big data analysis software, formerly known as Wonderware Historian, designed for industrial environments to store, manage, and analyze real-time and historical data from various industrial devices and sensors.
3. OPC DA/UA: OPC stands for Open Platform Communications, an open and standardized communication protocol used for data exchange between automation devices from different vendors. It was initially developed by Microsoft to address interoperability issues among different devices in the industrial control field. The OPC protocol was first released in 1996 as OPC DA (Data Access), primarily for real-time data collection and control. In 2006, the OPC Foundation released the OPC UA (Unified Architecture) standard, a service-oriented, object-oriented protocol with greater flexibility and scalability, which has become the mainstream version of the OPC protocol.
4. MQTT: Short for Message Queuing Telemetry Transport, a lightweight communication protocol based on a publish/subscribe model, designed for low-overhead, low-bandwidth instant messaging, widely used in IoT, small devices, mobile applications, and other fields.
5. Kafka: An open-source stream processing platform developed by the Apache Software Foundation, primarily used for processing real-time data and providing a unified, high-throughput, low-latency messaging system. It features high speed, scalability, persistence, and a distributed design, allowing it to handle hundreds of thousands of read and write operations per second, supporting thousands of clients while maintaining data reliability and availability.
6. OpenTSDB: A distributed and scalable time-series database based on HBase. It is mainly used to store, index, and provide metrics data collected from large-scale clusters (including network devices, operating systems, applications, etc.), making it easier to access and visualize this data.
7. CSV: Short for Comma Separated Values, a plain text file format that uses commas to separate values, typically used in spreadsheet or database software.
8. TDengine 2: Refers to instances of TDengine running version 2.x.
9. TDengine 3: Refers to instances of TDengine running version 3.x.
10. Relational databases such as MySQL, PostgreSQL, and Oracle.
| Data Source | Supported Version | Description |
| --- | --- | --- |
| Aveva PI System | PI AF Server Version 2.10.9.593 or above | An industrial data management and analytics platform, formerly known as OSIsoft PI System, capable of real-time collection, integration, analysis, and visualization of industrial data, helping enterprises achieve intelligent decision-making and refined management |
| Aveva Historian | AVEVA Historian 2020 RS SP1 | Industrial big data analytics software, formerly known as Wonderware Historian, designed for industrial environments to store, manage, and analyze real-time and historical data from various industrial devices and sensors |
| OPC DA | Matrikon OPC version: 1.7.2.7433 | Abbreviation for Open Platform Communications, an open, standardized communication protocol for data exchange between automation devices from different manufacturers. Initially developed by Microsoft, it was aimed at addressing interoperability issues in the industrial control field; the OPC protocol was first released in 1996, then known as OPC DA (Data Access), mainly for real-time data collection and control. |
| OPC UA | KeepWare KEPServerEx 6.5 | In 2006, the OPC Foundation released the OPC UA (Unified Architecture) standard, a service-oriented, object-oriented protocol with higher flexibility and scalability, which has become the mainstream version of the OPC protocol |
| MQTT | emqx: 3.0.0 to 5.7.1<br/> hivemq: 4.0.0 to 4.31.0<br/> mosquitto: 1.4.4 to 2.0.18 | Abbreviation for Message Queuing Telemetry Transport, a lightweight communication protocol based on the publish/subscribe pattern, designed for low overhead, low bandwidth usage instant messaging, widely applicable in IoT, small devices, mobile applications, and other fields. |
| Kafka | 2.11 ~ 3.8.0 | An open-source stream processing platform developed by the Apache Software Foundation, primarily used for processing real-time data and providing a unified, high-throughput, low-latency messaging system. It features high speed, scalability, persistence, and a distributed design, enabling it to handle hundreds of thousands of read/write operations per second, support thousands of clients, while maintaining data reliability and availability. |
| InfluxDB | 1.7, 1.8, 2.0-2.7 | A popular open-source time-series database optimized for handling large volumes of time-series data.|
| OpenTSDB | 2.4.1 | A distributed, scalable time-series database based on HBase. It is primarily used for storing, indexing, and providing access to metric data collected from large-scale clusters (including network devices, operating systems, applications, etc.), making this data more accessible and graphically presentable. |
| MySQL | 5.6,5.7,8.0+ | One of the most popular relational database management systems, known for its small size, fast speed, low overall ownership cost, and particularly its open-source nature, making it the choice for website database development for both medium-sized and large websites. |
| Oracle | 11G/12c/19c | Oracle Database System is one of the world's popular relational database management systems, known for its good portability, ease of use, powerful features, suitable for various large, medium, and small computer environments. It is an efficient, reliable, and high-throughput database solution. |
| PostgreSQL | v15.0+ | PostgreSQL is a very powerful open-source client/server relational database management system, with many features found in large commercial RDBMS, including transactions, sub-selects, triggers, views, foreign key referential integrity, and complex locking capabilities.|
| SQL Server | 2012/2022 | Microsoft SQL Server is a relational database management system developed by Microsoft, known for its ease of use, good scalability, and high integration with related software. |
| MongoDB | 3.6+ | MongoDB is a product between relational and non-relational databases, widely used in content management systems, mobile applications, and the Internet of Things, among many other fields. |
| CSV | - | Abbreviation for Comma Separated Values, a plain text file format separated by commas, commonly used in spreadsheet or database software. |
| TDengine 2.x | 2.4 or 2.6+ | Older version of TDengine, no longer maintained, upgrade to the latest version 3.0 is recommended. |
| TDengine 3.x | Source version+ | Use TMQ for subscribing to specified databases or supertables from TDengine. |
## Data Extraction, Filtering, and Transformation
Since there can be multiple data sources, the physical units, naming conventions, and time zones may vary. To address this issue, TDengine has built-in ETL capabilities to parse and extract the necessary data from data packets and perform filtering and transformation to ensure the quality of the written data and provide a unified naming space. The specific functionalities are as follows:
Since there can be multiple data sources, each data source may have different physical units, naming conventions, and time zones. To address this issue, TDengine has built-in ETL capabilities that can parse and extract the required data from the data packets of data sources, and perform filtering and transformation to ensure the quality of the data written and provide a unified namespace. The specific functions are as follows:
1. Parsing: Use JSON Path or regular expressions to parse fields from raw messages.
2. Extracting or Splitting from Columns: Use split or regular expressions to extract multiple fields from a raw field.
3. Filtering: Only messages with a true expression value will be written to TDengine.
4. Transformation: Establish a conversion and mapping relationship between the parsed fields and TDengine supertable fields.
1. Parsing: Use JSON Path or regular expressions to parse fields from the original message.
2. Extracting or splitting from columns: Use split or regular expressions to extract multiple fields from an original field.
3. Filtering: Messages are only written to TDengine if the expression's value is true.
4. Transformation: Establish conversion and mapping relationships between parsed fields and TDengine supertable fields.
Below are detailed explanations of the data transformation rules.
Below is a detailed explanation of the data transformation rules.
### Parsing
This step is only required for unstructured data sources. Currently, MQTT and Kafka data sources use the provided rules to parse unstructured data and initially obtain structured data that can be represented as row and column data described by fields. In the explorer, you need to provide sample data and parsing rules to preview the structured data presented in a table format.
Only unstructured data sources need this step. Currently, MQTT and Kafka data sources use the rules provided in this step to parse unstructured data to preliminarily obtain structured data, i.e., row and column data that can be described by fields. In the explorer, you need to provide sample data and parsing rules to preview the parsed structured data presented in a table.
#### Sample Data
@ -59,23 +67,23 @@ This step is only required for unstructured data sources. Currently, MQTT and Ka
<figcaption>Figure 2. Sample data</figcaption>
</figure>
As shown in the figure, the textarea input box contains the sample data, which can be obtained in three ways:
As shown in the image, the textarea input box contains the sample data, which can be obtained in three ways:
1. Directly inputting sample data into the textarea.
2. Clicking the button on the right "Retrieve from Server" retrieves sample data from the configured server and appends it to the sample data textarea.
3. Uploading a file to append its content to the sample data textarea.
1. Directly enter the sample data in the textarea;
2. Click the button on the right "Retrieve from Server" to get the sample data from the configured server and append it to the sample data textarea;
3. Upload a file, appending the file content to the sample data textarea.
Each sample data entry ends with a newline character.
Each piece of sample data ends with a newline character.
#### Parsing
#### Parsing<a name="parse"></a>
Parsing involves converting unstructured strings into structured data through parsing rules. The current parsing rules for message bodies support JSON, Regex, and UDT.
Parsing is the process of converting unstructured strings into structured data. The message body's parsing rules currently support JSON, Regex, and UDT.
##### JSON Parsing
JSON parsing supports JSONObject or JSONArray. The following JSON sample data can automatically parse the fields: `groupid`, `voltage`, `current`, `ts`, `inuse`, `location`.
JSON parsing supports JSONObject or JSONArray. The following JSON sample data can automatically parse fields: `groupid`, `voltage`, `current`, `ts`, `inuse`, `location`.
```json
``` json
{"groupid": 170001, "voltage": "221V", "current": 12.3, "ts": "2023-12-18T22:12:00", "inuse": true, "location": "beijing.chaoyang.datun"}
{"groupid": 170001, "voltage": "220V", "current": 12.2, "ts": "2023-12-18T22:12:02", "inuse": true, "location": "beijing.chaoyang.datun"}
{"groupid": 170001, "voltage": "216V", "current": 12.5, "ts": "2023-12-18T22:12:04", "inuse": false, "location": "beijing.chaoyang.datun"}
@ -83,17 +91,17 @@ JSON parsing supports JSONObject or JSONArray. The following JSON sample data ca
Or
```json
``` json
[{"groupid": 170001, "voltage": "221V", "current": 12.3, "ts": "2023-12-18T22:12:00", "inuse": true, "location": "beijing.chaoyang.datun"},
{"groupid": 170001, "voltage": "220V", "current": 12.2, "ts": "2023-12-18T22:12:02", "inuse": true, "location": "beijing.chaoyang.datun"},
{"groupid": 170001, "voltage": "216V", "current": 12.5, "ts": "2023-12-18T22:12:04", "inuse": false, "location": "beijing.chaoyang.datun"}]
```
Subsequent examples will illustrate with JSONObject as an example.
Subsequent examples will only explain using JSONObject.
The following nested JSON structure can automatically parse the fields `groupid`, `data_voltage`, `data_current`, `ts`, `inuse`, `location_0_province`, `location_0_city`, `location_0_datun`, and you can also choose which fields to parse and set aliases.
The following nested JSON data can automatically parse fields `groupid`, `data_voltage`, `data_current`, `ts`, `inuse`, `location_0_province`, `location_0_city`, `location_0_datun`, and you can also choose which fields to parse and set aliases for the parsed fields.
```json
``` json
{"groupid": 170001, "data": { "voltage": "221V", "current": 12.3 }, "ts": "2023-12-18T22:12:00", "inuse": true, "location": [{"province": "beijing", "city":"chaoyang", "street": "datun"}]}
```
@ -102,11 +110,11 @@ The following nested JSON structure can automatically parse the fields `groupid`
<figcaption>Figure 3. JSON parsing</figcaption>
</figure>
##### Regex Regular Expression
##### Regex Regular Expressions<a name="regex"></a>
You can use **named capture groups** in regular expressions to extract multiple fields from any string (text) field. As shown in the figure, this extracts the access IP, timestamp, and accessed URL from the nginx log.
You can use **named capture groups** in regular expressions to extract multiple fields from any string (text) field. As shown in the figure, extract fields such as access IP, timestamp, and accessed URL from nginx logs.
```re
``` re
(?<ip>\b(?:[0-9]{1,3}\.){3}[0-9]{1,3}\b)\s-\s-\s\[(?<ts>\d{2}/\w{3}/\d{4}:\d{2}:\d{2}:\d{2}\s\+\d{4})\]\s"(?<method>[A-Z]+)\s(?<url>[^\s"]+).*(?<status>\d{3})\s(?<length>\d+)
```
@ -115,17 +123,17 @@ You can use **named capture groups** in regular expressions to extract multiple
<figcaption>Figure 4. Regex parsing</figcaption>
</figure>
##### UDT Custom Parsing Script
##### UDT Custom Parsing Scripts
Custom Rhai syntax scripts can be used to parse input data (refer to `https://rhai.rs/book/`). The script currently only supports raw JSON data.
Custom Rhai syntax scripts can parse input data (refer to `https://rhai.rs/book/`); the script currently only supports JSON-format raw data.
**Input**: The script can use the parameter data, which is the Object Map after parsing the raw JSON data.
**Input**: In the script, you can use the parameter data, which is the Object Map after the raw data is parsed from json;
**Output**: The output data must be an array.
For example, for data reporting three-phase voltage values, which are to be entered into three subtables, parsing is required.
For example, data that reports three-phase voltage values, which are written into three subtables respectively, needs to be parsed.
```json
``` json
{
"ts": "2024-06-27 18:00:00",
"voltage": "220.1,220.3,221.1",
@ -133,7 +141,7 @@ For example, for data reporting three-phase voltage values, which are to be ente
}
```
You can use the following script to extract the three voltage data.
Then you can use the following script to extract the three voltage data.
```rhai
let v3 = data["voltage"].split(",");
@ -145,72 +153,72 @@ let v3 = data["voltage"].split(",");
]
```
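A complete script of this kind might look like the following sketch; the `phase` field name and the exact output layout are illustrative assumptions:

```rhai
// split the comma-separated three-phase voltage string into an array
let v3 = data["voltage"].split(",");
// return an array: one output row (object map) per phase
[
    #{"ts": data["ts"], "voltage": v3[0], "phase": 1},
    #{"ts": data["ts"], "voltage": v3[1], "phase": 2},
    #{"ts": data["ts"], "voltage": v3[2], "phase": 3}
]
```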
The final parsed result is as follows:
The final parsing result is shown below:
<figure>
<Image img={imgResults} alt="Parsed results"/>
<figcaption>Figure 5. Parsed results</figcaption>
</figure>
### Extracting or Splitting
### Extraction or Splitting
The parsed data may not meet the requirements of the target table. For instance, the raw data collected from the smart meter is as follows (in JSON format):
The parsed data may still not meet the data requirements of the target table. For example, the original data collected by a smart meter is as follows (in json format):
```json
``` json
{"groupid": 170001, "voltage": "221V", "current": 12.3, "ts": "2023-12-18T22:12:00", "inuse": true, "location": "beijing.chaoyang.datun"}
{"groupid": 170001, "voltage": "220V", "current": 12.2, "ts": "2023-12-18T22:12:02", "inuse": true, "location": "beijing.chaoyang.datun"}
{"groupid": 170001, "voltage": "216V", "current": 12.5, "ts": "2023-12-18T22:12:04", "inuse": false, "location": "beijing.chaoyang.datun"}
```
The voltage parsed using the JSON rules is expressed as a string with units. Ultimately, it is hoped to store the voltage and current values as integers for statistical analysis, which requires further splitting of the voltage; in addition, the date is expected to be split into date and time for storage.
Using json rules, the voltage is parsed as a string with units, and it is desired to use int type to record voltage and current values for statistical analysis, so further splitting of the voltage is needed; additionally, the date is expected to be split into date and time for storage.
You can use the split rule on the source field `ts` to split it into date and time, and use regex to extract the voltage value and unit from the `voltage` field. The split rule requires setting the **delimiter** and **number of splits**, and the naming convention for the split fields is `{original_field_name}_{order_number}`, while the Regex rule is the same as in the parsing process, using **named capture groups** to name the extracted fields.
As shown in the figure below, you can use the split rule on the source field `ts` to split it into date and time, and use regex to extract the voltage value and unit from the field `voltage`. The split rule needs to set **delimiter** and **number of splits**, and the naming rule for the split fields is `{original field name}_{sequence number}`. The Regex rule is the same as in the parsing process, using **named capture groups** to name the extracted fields.
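As an illustration of these two rules on the sample above (the rule parameters and group names are assumptions): a split rule on `ts` with delimiter `T` and 2 splits yields `ts_0` (the date) and `ts_1` (the time), while a regex rule such as the following, applied to `voltage`, extracts the numeric value and the unit into the named fields `voltage_val` and `voltage_unit`:

```re
(?<voltage_val>[0-9.]+)(?<voltage_unit>[A-Za-z]+)
```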
### Filtering
### Filtering<a name="filter"></a>
The filtering function allows you to set filtering conditions so that only rows of data meeting the conditions will be written to the target table. The result of the filtering condition expression must be of boolean type. Before writing filtering conditions, you must determine the type of the parsed fields, and based on the type, you can use judgment functions and comparison operators (`>`, `>=`, `<=`, `<`, `==`, `!=`) for judgment.
The filtering feature can set filtering conditions, and only data rows that meet the conditions will be written to the target table. The result of the filter condition expression must be of boolean type. Before writing filter conditions, it is necessary to determine the type of parsed fields, and based on the type of parsed fields, judgment functions and comparison operators (`>`, `>=`, `<=`, `<`, `==`, `!=`) can be used to judge.
#### Field Types and Conversion
Only by clearly defining the type of each parsed field can you use the correct syntax for data filtering.
Only by clearly parsing the type of each field can you use the correct syntax for data filtering.
Fields parsed using JSON rules automatically set types based on their attribute values:
Fields parsed using the json rule are automatically set to types based on their attribute values:
1. bool type: `"inuse": true`
2. int type: `"voltage": 220`
3. float type: `"current" : 12.2`
4. String type: `"location": "MX001"`
Data parsed using regex rules are all of string type.
Data extracted or split using split and regex rules are of string type.
Data parsed using regex rules are all string types.
Data extracted or split using split and regex are string types.
If the extracted data type does not match the expected type, you can perform type conversion. Common type conversions involve converting strings to numeric types. The supported conversion functions are as follows:
If the extracted data type is not the expected type, data type conversion can be performed. A common data type conversion is converting a string to a numeric type. Supported conversion functions are as follows:
|Function|From type|To type|e.g.|
|:----|:----|:----|:----|
| parse_int | string | int | parse_int("56") // Resulting integer 56 |
| parse_float | string | float | parse_float("12.3") // Resulting float 12.3 |
| parse_int | string | int | parse_int("56") // Results in integer 56 |
| parse_float | string | float | parse_float("12.3") // Results in float 12.3 |
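For example, if a regex rule has extracted the numeric part of the voltage into a string field (assumed here to be named `voltage_val`), a type conversion can be combined with a comparison so that only rows above 220 volts pass the filter:

> parse_float(voltage_val) > 220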
#### Judgment Expressions
#### Conditional Expressions
Different data types have their respective ways of writing judgment expressions.
Different data types have their own ways of writing conditional expressions.
##### BOOL Type
##### BOOL type
You can use variables or the operator `!`. For example, for the field `"inuse": true`, you can write the following expressions:
You can use variables or the `!` operator; for example, for the field `"inuse": true`, you can write the following expressions:
> 1. inuse
> 2. !inuse
##### Numeric Types (int/float)
##### Numeric types (int/float)
Numeric types support the comparison operators `==`, `!=`, `>`, `>=`, `<`, `<=`.
Numeric types support comparison operators `==`, `!=`, `>`, `>=`, `<`, `<=`.
##### String Types
##### String type
Use comparison operators to compare strings.
String Functions
String functions
|Function|Description|e.g.|
|:----|:----|:----|
@ -218,93 +226,95 @@ String Functions
| contains | checks if a certain character or sub-string occurs in the string | s.contains("substring") |
| starts_with | returns true if the string starts with a certain string | s.starts_with("prefix") |
| ends_with | returns true if the string ends with a certain string | s.ends_with("suffix") |
| len | returns the number of characters (not number of bytes) in the string; must be used with a comparison operator | s.len == 5 Determines whether the string length is 5; len as an attribute returns int, different from the first four functions, which directly return bool. |
| len | returns the number of characters (not number of bytes) in the string, must be used with comparison operator | s.len == 5 to check if the string length is 5; len as a property returns int, different from the first four functions which directly return bool. |
##### Composite Expressions
##### Compound Expressions
Multiple judgment expressions can be combined using logical operators (&&, ||, !).
For example, the following expression retrieves the data from smart meters installed in Beijing with a voltage greater than 200.
Multiple conditional expressions can be combined using logical operators (&&, ||, !).
For example, the following expression represents fetching data from smart meters installed in Beijing with a voltage value greater than 200.
> location.starts_with("beijing") && voltage > 200
### Mapping
Mapping refers to matching the **source fields** parsed, extracted, and split to the **target table fields**. It can be done directly or calculated through some rules before mapping to the target table.
Mapping means matching the parsed, extracted, or split **source fields** to the **target table fields**. A field can be mapped directly, or mapped to the target table after some rule-based calculation.
#### Selecting Target Supertable
#### Selecting the target supertable
After selecting the target supertable, all tags and columns of the supertable will be loaded.
Source fields automatically use mapping rules to map to the target supertable's tags and columns based on their names.
For example, there is preview data after parsing, extracting, and splitting as follows:
The source field is automatically mapped to the tag and column of the target supertable using the mapping rule based on the name.
For example, consider the following preview data obtained after parsing, extraction, or splitting:
#### Mapping Rules
#### Mapping Rules <a name="expression"></a>
The supported mapping rules are shown in the table below:
The supported mapping rules are shown in the following table:
|rule|description|
|:----|:----|
| mapping | Direct mapping, requires selecting the mapping source field.|
| value | Constant; you can input string constants or numeric constants, and the constant value will be directly stored.|
| generator | Generator; currently only the timestamp generator is supported, which stores the current time.|
| join | String concatenator, can specify connection characters to concatenate multiple source fields.|
| format | **String formatting tool**, fill in the formatting string. For example, if there are three source fields year, month, day representing year, month, and day respectively, and you want to store the date in yyyy-MM-dd format, you can provide the formatting string as `${year}-${month}-${day}`. Here `${}` serves as a placeholder, which can be a source field or a function processing a string-type field.|
| sum | Select multiple numeric fields for addition.|
| mapping | Direct mapping, need to select the mapping source field.|
| value | Constant, can enter string constants or numeric constants, the entered constant value is directly stored.|
| generator | Generator, currently only supports the timestamp generator, which stores the current time at write time.|
| join | String connector, can specify connecting characters to concatenate selected multiple source fields.|
| format | **String formatting tool**, fill in the formatting string. For example, if there are three source fields year, month, day representing year, month, and day, and you wish to store them in the yyyy-MM-dd date format, you can provide a formatting string as `${year}-${month}-${day}`, where `${}` acts as a placeholder; the placeholder can be a source field or a function that processes a string-type field.|
| sum | Select multiple numeric fields for addition calculation.|
| expr | **Numeric operation expression**, can perform more complex function processing and mathematical operations on numeric fields.|
##### Supported String Processing Functions in Format
##### Supported string processing functions in `format`
|Function|description|e.g.|
|:----|:----|:----|
| pad(len, pad_chars) | pads the string with a character or a string to at least a specified length | "1.2".pad(5, '0') // Resulting "1.200" |
|trim|trims the string of whitespace at the beginning and end|" abc ee ".trim() // Resulting "abc ee"|
|sub_string(start_pos, len)|extracts a sub-string; two parameters:<br />1. start position, counting from end if < 0<br />2. (optional) number of characters to extract, none if ≤ 0, to end if omitted|"012345678".sub_string(5) // "5678"<br />"012345678".sub_string(5, 2) // "56"<br />"012345678".sub_string(-2) // "78"|
| pad(len, pad_chars) | pads the string with a character or a string to at least a specified length | "1.2".pad(5, '0') // Result is "1.200" |
|trim|trims the string of whitespace at the beginning and end|" abc ee ".trim() // Result is "abc ee"|
|sub_string(start_pos, len)|extracts a sub-string, two parameters:<br />1. start position, counting from end if < 0<br />2. (optional) number of characters to extract, none if ≤ 0, to end if omitted|"012345678".sub_string(5) // "5678"<br />"012345678".sub_string(5, 2) // "56"<br />"012345678".sub_string(-2) // "78"|
|replace(substring, replacement)|replaces a sub-string with another|"012345678".replace("012", "abc") // "abc345678"|
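For illustration, these functions can be applied to placeholders inside a `format` expression. This is only a sketch; the source field names `device` and `sn` below are hypothetical:

```text
${device.trim()}-${sn.sub_string(0, 4)}
```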
##### Numeric Calculation Expressions in `expr`
Basic mathematical operations support addition `+`, subtraction `-`, multiplication `*`, and division `/`.
For example, if the data source collects temperature values in Celsius and the target database stores them in Fahrenheit, the collected temperature data needs to be converted.
If the parsed source field is `temperature`, use the expression `temperature * 1.8 + 32`.
Numeric expressions also support mathematical functions, as shown in the table below:
|Function|description|e.g.|
|:----|:----|:----|
|sin, cos, tan, sinh, cosh|Trigonometry|a.sin()|
|asin, acos, atan, asinh, acosh|Arc-trigonometry|a.asin()|
|sqrt|Square root|a.sqrt() // 4.sqrt() == 2|
|exp|Exponential|a.exp()|
|ln, log|Logarithmic|a.ln() // e.ln() == 1<br />a.log() // 10.log() == 1|
|floor, ceiling, round, int, fraction|Rounding|a.floor() // (4.2).floor() == 4<br />a.ceiling() // (4.2).ceiling() == 5<br />a.round() // (4.2).round() == 4<br />a.int() // (4.2).int() == 4<br />a.fraction() // (4.2).fraction() == 0.2|
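As another sketch, an `expr` expression can combine several parsed numeric fields with basic arithmetic; the field names `voltage` and `current` below are hypothetical, and the result would be written to the mapped target column:

```text
voltage * current / 1000.0
```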
#### Subtable Name Mapping

The subtable name is a string and can be defined using a string-formatting `format` expression in the mapping rules.
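For example, a subtable name could be built from a fixed prefix and a parsed field; the field name `groupid` below is illustrative only:

```text
d_${groupid}
```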
## Creating a Task

Taking the MQTT data source as an example, this section describes how to create an MQTT-type task that consumes data from the MQTT broker and writes it into TDengine.
1. Log in to taosExplorer, then click "Data Writing" in the left navigation bar to enter the task list page.
2. On the task list page, click "+ Add Data Source" to enter the task creation page.
3. After entering the task name, select MQTT as the type, and then create a new agent or select an existing agent.
4. Enter the IP address and port number of the MQTT broker, for example: 192.168.1.100:1883.
5. Configure authentication and SSL encryption:
- If the MQTT broker has user authentication enabled, enter the username and password of the MQTT broker in the authentication section.
- If the MQTT broker has SSL encryption enabled, you can turn on the SSL certificate switch on the page and upload the CA certificate, as well as the client certificate and private key files.
6. In the "Acquisition Configuration" section, you can choose the MQTT protocol version; versions 3.1, 3.1.1, and 5.0 are currently supported. When configuring the Client ID, note that if you create multiple tasks for the same MQTT broker, the Client IDs must be different; otherwise, Client ID conflicts will prevent the tasks from running properly. When configuring topics and QoS, use the format `<topic name>::<QoS>`, where two colons separate the subscribed topic from the QoS, and QoS values of 0, 1, or 2 represent at most once, at least once, and exactly once, respectively (see the example after this list). After completing the above configuration, you can click the "Check Connectivity" button to verify the configuration. If the connectivity check fails, modify the configuration according to the specific error prompts returned on the page.
7. During the synchronization of data from the MQTT broker, taosX also supports extracting, filtering, and mapping fields in the message body. In the text box below "Payload Transformation", you can directly input sample messages or upload files. In the future, it will also support directly retrieving sample messages from the configured server.
8. Currently, there are two ways to extract fields from the message body: JSON and regular expressions. For simple key/value format JSON data, you can directly click the extract button to display the parsed field names. For complex JSON data, you can use JSON Path to extract the fields of interest. When using regular expressions to extract fields, ensure the correctness of the regular expressions.
9. After the fields in the message body are parsed, you can set filtering rules based on the parsed field names. Only data meeting the filtering rules will be written to TDengine; otherwise, the message will be ignored. For example, you can configure the filtering rule as voltage > 200, meaning only data with voltage greater than 200V will be synchronized to TDengine.
10. Finally, after configuring the mapping rules between the fields in the message body and those in the supertable, you can submit the task. Besides basic mapping, you can also convert the values of fields in the message, for example, you can use expressions (expr) to calculate power from the voltage and current in the original message body before writing them into TDengine.
11. Once the task is submitted, you will be automatically returned to the task list page. If the submission is successful, the task status will switch to "Running." If the submission fails, you can check the task's activity log to find the error cause.
12. For running tasks, clicking the metrics view button allows you to see detailed running metrics for the task. The pop-up window is divided into two tabs, displaying the accumulated metrics from multiple runs of the task and the metrics for the current run. These metrics will automatically refresh every two seconds.
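For reference, a topic/QoS entry (step 6) and a filter rule (step 9) might look like the following; the topic name and threshold are illustrative only:

```text
# Subscribed topic and QoS, separated by two colons
sensor/d1001/data::1

# Filter rule evaluated against the parsed fields
voltage > 200
```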
## Task Management
On the task list page, you can also start, stop, view, delete, copy, and perform other operations on tasks, as well as check the running status of each task, including the number of records written, traffic, etc.
```mdx-code-block
import DocCardList from '@theme/DocCardList';
View File
@ -1,10 +1,9 @@
---
title: Advanced Features
description: 'TDengine Advanced Features'
slug: /advanced-features
---
This section describes the advanced features of TDengine, including data subscription, caching, stream processing, edge-cloud orchestration, and connectors for various data sources.
```mdx-code-block
import DocCardList from '@theme/DocCardList';
View File
@ -21,74 +21,89 @@ import VerifyLinux from "../14-reference/05-connector/_verify_linux.mdx";
import VerifyMacOS from "../14-reference/05-connector/_verify_macos.mdx";
import VerifyWindows from "../14-reference/05-connector/_verify_windows.mdx";
TDengine provides a rich set of application development interfaces. To facilitate users in quickly developing their applications, TDengine supports connectors for multiple programming languages, including official connectors for C/C++, Java, Python, Go, Node.js, C#, Rust, Lua (community-contributed), and PHP (community-contributed). These connectors support connecting to TDengine clusters using the native interface (taosc) and the REST interface (not supported by some languages). Community developers have also contributed several unofficial connectors, such as the ADO.NET connector, Lua connector, and PHP connector. Additionally, applications can directly call the REST API provided by taosAdapter for data writing and querying operations.
## Connection Methods
TDengine provides three methods for establishing connections:
1. Directly connect to the server program taosd using the client driver taosc; this method is referred to as "native connection."
2. Establish a connection to taosd via the REST API provided by the taosAdapter component; this method is referred to as "REST connection."
3. Establish a connection to taosd via the WebSocket API provided by the taosAdapter component; this method is referred to as "WebSocket connection."
<figure>
<Image img={imgConnect} alt="Connecting to TDengine"/>
<figcaption>Figure 1. Connecting to TDengine</figcaption>
</figure>
Regardless of the method used to establish a connection, the connectors provide the same or similar API operations for databases and can execute SQL statements. The only difference lies in how the connection is initialized, and users should not notice any difference in usage. For the various connection methods and language connector support, refer to [Feature Support](../../tdengine-reference/client-libraries/#feature-support).
Key differences include:
1. For the native connection, it is necessary to ensure that the client driver taosc and the TDengine server version are compatible.
2. With the REST connection, users do not need to install the client driver taosc, which offers cross-platform ease of use; however, users cannot experience features like data subscription and binary data types. Moreover, REST connections have the lowest performance compared to native and WebSocket connections. The REST API is stateless, and when using REST connections, users must specify the database name for tables and supertables in SQL.
3. For the WebSocket connection, users also do not need to install the client driver taosc.
4. To connect to cloud service instances, users must use REST or WebSocket connections.
**It is recommended to use WebSocket connections.**
## Installing the Client Driver taosc

If you choose the native connection and your application is not running on the same server as TDengine, you need to install the client driver first; otherwise, this step can be skipped. To avoid incompatibility between the client driver and the server, please use matching versions.
### Installation Steps
<Tabs defaultValue="linux" groupId="os">
<TabItem value="linux" label="Linux">
<InstallOnLinux />
</TabItem>
<TabItem value="windows" label="Windows">
<InstallOnWindows />
</TabItem>
<TabItem value="macos" label="macOS">
<InstallOnMacOS />
</TabItem>
</Tabs>
### Installation Verification
After completing the installation and configuration, and confirming that the TDengine service is running normally, you can log in using the TDengine command-line program `taos` included in the installation package.
<Tabs defaultValue="linux" groupId="os">
<TabItem value="linux" label="Linux">
<VerifyLinux />
</TabItem>
<TabItem value="windows" label="Windows">
<VerifyWindows />
</TabItem>
<TabItem value="macos" label="macOS">
<VerifyMacOS />
</TabItem>
</Tabs>
## Installing Connectors
<Tabs defaultValue="java" groupId="lang">
<TabItem label="Java" value="java">
If you are using Maven to manage your project, simply add the following dependency to your pom.xml.
```xml
<dependency>
@ -102,60 +117,62 @@ If you are using Maven to manage the project, simply add the following dependenc
<TabItem label="Python" value="python">
- **Installation Prerequisites**
- Install Python. The latest version of the taospy package requires Python 3.6.2+. Earlier versions require Python 3.7+. The taos-ws-py package requires Python 3.7+. If Python is not already installed on your system, refer to the [Python BeginnersGuide](https://wiki.python.org/moin/BeginnersGuide/Download) for installation.
- Install [pip](https://pypi.org/project/pip/). Most Python installation packages come with the pip tool. If not, refer to the [pip documentation](https://pip.pypa.io/en/stable/installation/) for installation.
- If you are using the native connection, you also need to [install the client driver](#installing-the-client-driver-taosc). The client software includes the TDengine client dynamic link library (libtaos.so or taos.dll) and the TDengine CLI.
- **Install using pip**
- Uninstall old versions
If you previously installed an older version of the Python connector, please uninstall it first.
```shell
pip3 uninstall taos taospy
pip3 uninstall taos taos-ws-py
```

- Install `taospy`
  - Latest version

    ```shell
    pip3 install taospy
    ```

  - Install a specific version

    ```shell
    pip3 install taospy==2.3.0
    ```

  - Install from GitHub

    ```shell
    pip3 install git+https://github.com/taosdata/taos-connector-python.git
    ```

  Note: This package is for the native connection.

- Install `taos-ws-py`

  ```shell
  pip3 install taos-ws-py
  ```

  Note: This package is for the WebSocket connection.

- Install both `taospy` and `taos-ws-py`

  ```shell
  pip3 install taospy[ws]
  ```
- **Installation Verification**
<Tabs defaultValue="rest">
<TabItem value="native" label="Native Connection">
For the native connection, you need to verify that both the client driver and the Python connector are correctly installed. If you can successfully import the `taos` module, the client driver and Python connector have been installed correctly. You can enter the following in the Python interactive shell:
```python
import taos
@ -164,8 +181,7 @@ import taos
</TabItem>
<TabItem value="rest" label="REST Connection">
For the REST connection, you only need to verify whether the `taosrest` module can be successfully imported. You can enter the following in the Python interactive shell:
```python
import taosrest
@ -174,7 +190,8 @@ import taosrest
</TabItem>
<TabItem value="ws" label="WebSocket Connection">
For the WebSocket connection, you only need to verify whether the `taosws` module can be successfully imported. You can enter the following in the Python interactive shell:
```python
import taosws
@ -182,8 +199,8 @@ import taosws
</TabItem>
</Tabs>
</TabItem>
<TabItem label="Go" value="go">
Edit `go.mod` to add the `driver-go` dependency.
@ -197,7 +214,8 @@ require github.com/taosdata/driver-go/v3 latest
```
:::note
driver-go uses cgo to wrap the taosc API. cgo requires GCC to compile C source code, so make sure GCC is installed on your system.
:::
@ -213,7 +231,8 @@ taos = { version = "*"}
```
:::info
The Rust connector differentiates between connection methods through different features. By default, it supports both native and WebSocket connections. If you only need to establish a WebSocket connection, you can set the `ws` feature:
```toml
taos = { version = "*", default-features = false, features = ["ws"] }
@ -225,34 +244,35 @@ taos = { version = "*", default-features = false, features = ["ws"] }
<TabItem label="Node.js" value="node">
- **Installation Prerequisites**
- Install the Node.js development environment, using version 14 or higher. [Download link](https://nodejs.org/en/download/)
- **Installation**
- Install the Node.js connector using npm

  ```shell
  npm install @tdengine/websocket
  ```

  Note: Node.js currently only supports WebSocket connections.
- **Installation Verification**
- Create an installation verification directory, for example: `~/tdengine-test`, and download the [nodejsChecker.js source code](https://github.com/taosdata/TDengine/tree/main/docs/examples/node/websocketexample/nodejsChecker.js) from GitHub to your local machine.
- Execute the following commands in the terminal.

  ```bash
  npm init -y
  npm install @tdengine/websocket
  node nodejsChecker.js
  ```

- After executing the above steps, the command line will output the results of nodejsChecker.js connecting to the TDengine instance and performing a simple insert and query.
</TabItem>
<TabItem label="C#" value="csharp">
Edit the project configuration file to add a reference to [TDengine.Connector](https://www.nuget.org/packages/TDengine.Connector/):
```xml title=csharp.csproj
<Project Sdk="Microsoft.NET.Sdk">
@ -272,14 +292,15 @@ Add the reference for [TDengine.Connector](https://www.nuget.org/packages/TDengi
</Project>
```
You can also add it using the dotnet command:
```shell
dotnet add package TDengine.Connector
```
:::note
The following example code is based on dotnet 6.0; if you are using other versions, you may need to make appropriate adjustments.
:::
@ -287,89 +308,86 @@ The following example code is based on dotnet 6.0; if you are using other versio
<TabItem label="C" value="c">
If you have already installed the TDengine server software or the TDengine client driver taosc, then the C connector is already installed, and no additional action is required.
</TabItem>
<TabItem label="REST API" value="rest">
Using the REST API to access TDengine does not require the installation of any drivers or connectors.
</TabItem>
</Tabs>
## Establishing Connections

Before proceeding, ensure that there is a running and accessible TDengine instance and that the server's FQDN is configured correctly. The following example code assumes that TDengine is installed on the local machine, with the FQDN (default localhost) and serverPort (default 6030) using the default configuration.
### Connection Parameters
There are many connection configuration options. Before establishing a connection, let's first introduce the parameters used by each language connector to establish a connection.
<Tabs defaultValue="java" groupId="lang">
<TabItem label="Java" value="java">
The parameters for establishing a connection with the Java connector are the URL and Properties.

The JDBC URL format for TDengine is: `jdbc:[TAOS|TAOS-WS|TAOS-RS]://[host_name]:[port]/[database_name]?[user={user}|&password={password}|&charset={charset}|&cfgdir={config_dir}|&locale={locale}|&timezone={timezone}|&batchfetch={batchfetch}]`

**Note**: Adding the `batchfetch` parameter and setting it to true in a REST connection enables the WebSocket connection.

For detailed parameter descriptions of the URL and Properties and how to use them, refer to the [URL Specification](../../tdengine-reference/client-libraries/java/#url-specification).
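For example, a WebSocket connection URL to a local instance might look like the following; the credentials shown are the defaults and are for illustration only:

```text
jdbc:TAOS-WS://localhost:6041/power?user=root&password=taosdata
```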
</TabItem>
<TabItem label="Python" value="python">
The Python connector uses the `connect()` method to establish a connection. The connection parameters are described as follows:

- url: The URL of the `taosAdapter` REST service. The default is port `6041` on `localhost`.
- user: The TDengine username. The default is `root`.
- password: The TDengine user password. The default is `taosdata`.
- timeout: The HTTP request timeout, in seconds. The default is `socket._GLOBAL_DEFAULT_TIMEOUT`, which usually does not need to be configured.
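Based on the parameters above, a minimal REST connection sketch might look like this; the values are the defaults and purely illustrative:

```python
import taosrest

# Connect to the taosAdapter REST service using the parameters described above
conn = taosrest.connect(url="http://localhost:6041",
                        user="root",
                        password="taosdata",
                        timeout=30)

# Print the server version to verify that the connection works
print(conn.server_info)
```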
</TabItem>
<TabItem label="Go" value="go">
The data source name has a general format, such as [PEAR DB](http://pear.php.net/manual/en/package.database.db.intro-dsn.php), but without a type prefix (brackets indicate optional parts):

```text
[username[:password]@][protocol[(address)]]/[dbname][?param1=value1&...&paramN=valueN]
```

The complete form of the DSN:

```text
username:password@protocol(address)/dbname?param=value
```

Supported DSN parameters are as follows.

For native connections:

- `cfg`: Specifies the taos.cfg directory.
- `cgoThread`: Specifies the number of cgo operations that can execute concurrently, defaulting to the number of system cores.
- `cgoAsyncHandlerPoolSize`: Specifies the size of the async function handler pool, defaulting to 10,000.

For REST connections:

- `disableCompression`: Whether to accept compressed data; the default is true (do not accept compressed data). Set it to false if the transmission uses gzip compression.
- `readBufferSize`: The size of the read buffer, defaulting to 4K (4096). This value can be increased when query results are large.
- `token`: The token used when connecting to cloud services.
- `skipVerify`: Whether to skip certificate verification; the default is false (do not skip verification). Set it to true when connecting to an insecure service.

For WebSocket connections:

- `enableCompression`: Whether to send compressed data; the default is false (do not send compressed data). Set it to true if the transmission uses compression.
- `readTimeout`: The read timeout, defaulting to 5m.
- `writeTimeout`: The write timeout, defaulting to 10s.
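For illustration, concrete DSNs could look like the following; the `cfg` and `ws` protocol names, the configuration path, and the credentials are assumptions based on the defaults described above:

```text
# Native connection (taos.cfg directory assumed to be /etc/taos)
root:taosdata@cfg(/etc/taos)/power

# WebSocket connection via taosAdapter
root:taosdata@ws(localhost:6041)/power
```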
</TabItem>
<TabItem label="Rust" value="rust">
The Rust connector uses DSN to create connections. The basic structure of the DSN description string is as follows:
```text
<driver>[+<protocol>]://[[<username>:<password>@]<host>:<port>][/<database>][?<p1>=<v1>[&<p2>=<v2>]]
@ -377,37 +395,36 @@ The Rust connector uses DSN to create connections. The basic structure of the DS
|driver| protocol | | username | password | host | port | database | params |
```
For detailed DSN explanations and usage, refer to [Connection Functionality](../../tdengine-reference/client-libraries/rust/#connection-functionality).
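For illustration, typical DSNs might look like the following; the schemes and defaults here are assumptions, so treat them as a sketch rather than authoritative forms:

```text
# Native connection
taos://localhost:6030

# WebSocket connection via taosAdapter
taos+ws://root:taosdata@localhost:6041
```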
</TabItem>
<TabItem label="Node.js" value="node">
The Node.js connector uses DSN to create connections. The basic structure of the DSN description string is as follows:
```text
[+<protocol>]://[[<username>:<password>@]<host>:<port>][/<database>][?<p1>=<v1>[&<p2>=<v2>]]
|------------|---|-----------|-----------|------|------|------------|-----------------------|
| protocol | | username | password | host | port | database | params |
```
- **protocol**: Establish a connection using the WebSocket protocol. For example, `ws://localhost:6041`.
- **username/password**: The database username and password.
- **host/port**: The host address and port number. For example, `localhost:6041`.
- **database**: The database name.
- **params**: Other parameters, such as token.
- Complete DSN example:
```js
ws://root:taosdata@localhost:6041
```
</TabItem>
<TabItem label="C#" value="csharp">
The ConnectionStringBuilder sets connection parameters using a key-value approach, where the key is the parameter name and the value is the parameter value, separated by semicolons `;`.
For example:
@ -415,34 +432,36 @@ For example:
"protocol=WebSocket;host=127.0.0.1;port=6041;useSSL=false"
```
Supported parameters include:
- `host`: The address of the TDengine instance.
- `port`: The port of the TDengine instance.
- `username`: The username for the connection.
- `password`: The password for the connection.
- `protocol`: The connection protocol, with optional values of Native or WebSocket, defaulting to Native.
- `db`: The connected database.
- `timezone`: The timezone, defaulting to the local timezone.
- `connTimeout`: The connection timeout, defaulting to 1 minute.
WebSocket connections also support the following parameters:
- `readTimeout`: The read timeout, defaulting to 5 minutes.
- `writeTimeout`: The send timeout, defaulting to 10 seconds.
- `token`: The token for connecting to TDengine cloud.
- `useSSL`: Whether to use SSL for the connection, defaulting to false.
- `enableCompression`: Whether to enable WebSocket compression, defaulting to false.
- `autoReconnect`: Whether to automatically reconnect, defaulting to false.
- `reconnectRetryCount`: The number of reconnection attempts, defaulting to 3.
- `reconnectIntervalMs`: The reconnection interval in milliseconds, defaulting to 2000.
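For illustration, a WebSocket connection string that enables compression and automatic reconnection might look like the following; the values are assumptions, not recommendations:

```text
protocol=WebSocket;host=127.0.0.1;port=6041;username=root;password=taosdata;enableCompression=true;autoReconnect=true;reconnectRetryCount=5
```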
</TabItem>
<TabItem label="C" value="c">
**WebSocket Connection**

For the C/C++ connector, a WebSocket connection is established with the `ws_connect()` function. Its parameter is a DSN description string with the following basic structure:
```text
<driver>[+<protocol>]://[[<username>:<password>@]<host>:<port>][/<database>][?<p1>=<v1>[&<p2>=<v2>]]
@ -450,35 +469,37 @@ The C/C++ language connector uses the `ws_connect()` function to establish a con
|driver| protocol | | username | password | host | port | database | params |
```
For detailed DSN explanations and usage, refer to [DSN](../../tdengine-reference/client-libraries/cpp/#dsn).
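As an illustration only (the scheme and the default credentials here are assumptions; see the DSN reference above for the authoritative forms):

```text
ws://root:taosdata@localhost:6041
```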
**Native Connection**

The C/C++ connector uses the `taos_connect()` function to establish a native connection to the TDengine database. The parameters are described in detail as follows:

- `host`: The hostname or IP address of the database server to connect to. For a local database, you can use `"localhost"`.
- `user`: The username used to log in to the database.
- `passwd`: The password corresponding to the username.
- `db`: The default database to select when connecting. If no database is specified, pass `NULL` or an empty string.
- `port`: The port number that the database server listens on. The default port number is `6030`.

The `taos_connect_auth()` function is also provided for establishing a connection using an MD5-encrypted password. It works the same as `taos_connect()`, except that it expects the MD5 hash of the password instead of the plain-text password.
</TabItem>
<TabItem label="REST API" value="rest">
When accessing TDengine via the REST API, the application establishes an HTTP connection directly with taosAdapter. It is recommended to use a connection pool to manage connections.

For the specific parameters of the REST API, refer to [HTTP Request Format](../../tdengine-reference/client-libraries/rest-api/#http-request-format).
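For example, a quick connectivity check can be done with `curl`, mirroring the REST examples later in this guide; the default credentials and local port are assumed:

```bash
curl --location -uroot:taosdata 'http://127.0.0.1:6041/rest/sql' \
--data 'SHOW DATABASES'
```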
</TabItem>
</Tabs>
### WebSocket Connection
Below are code samples for establishing a WebSocket connection using each language connector. They demonstrate how to connect to the TDengine database using the WebSocket connection method and set some parameters for the connection. The entire process mainly involves establishing the database connection and handling exceptions.
<Tabs defaultValue="java" groupId="lang">
<TabItem label="Java" value="java">
```java
@ -544,7 +565,7 @@ Not supported
### Native Connection
Below are code samples for establishing a native connection using each language connector. They demonstrate how to connect to the TDengine database using the native connection method and set some parameters for the connection. The entire process mainly involves establishing the database connection and handling exceptions.
<Tabs defaultValue="java" groupId="lang">
<TabItem label="Java" value="java">
@ -556,7 +577,9 @@ Below are code samples for establishing a native connection using each language
</TabItem>
<TabItem label="Python" value="python">
<ConnPythonNative />
</TabItem>
<TabItem label="Go" value="go">
@ -590,7 +613,9 @@ Not supported
</TabItem>
<TabItem label="C" value="c">
<ConnC />
</TabItem>
<TabItem label="REST API" value="rest">
@ -602,7 +627,7 @@ Not supported
### REST Connection
Below are code samples for establishing a REST connection using each language connector. They demonstrate how to connect to the TDengine database using the REST connection method. The entire process mainly involves establishing the database connection and handling exceptions.
<Tabs defaultValue="java" groupId="lang">
<TabItem label="Java" value="java">
@ -655,56 +680,56 @@ Not supported
<TabItem label="REST API" value="rest">
Using the REST API to access TDengine allows the application to independently establish HTTP connections.
</TabItem>
</Tabs>
:::tip
If the connection fails, in most cases, it is due to incorrect FQDN or firewall configuration. For detailed troubleshooting methods, refer to [Frequently Asked Questions](../../frequently-asked-questions/) under "If I encounter the error Unable to establish connection, what should I do?"
:::
## Connection Pool
Some connectors provide connection pools or can work with existing connection pool components. Using a connection pool allows applications to quickly obtain available connections from the pool, avoiding the overhead of creating and destroying connections for each operation. This not only reduces resource consumption but also improves response speed. Additionally, connection pools support connection management, such as limiting the maximum number of connections and checking connection validity, ensuring efficient and reliable use of connections. We **recommend using connection pools to manage connections**.
Below are code samples for connection pool support in various language connectors.
<Tabs defaultValue="java" groupId="lang">
<TabItem label="Java" value="java">
**HikariCP**
Usage example:
```java
{{#include docs/examples/java/src/main/java/com/taos/example/HikariDemo.java:connection_pool}}
```
> After obtaining a connection via HikariDataSource.getConnection(), you need to call the close() method after use; it does not actually close the connection, but returns it to the pool.
> For more issues related to HikariCP usage, refer to the [official documentation](https://github.com/brettwooldridge/HikariCP).
**Druid**
Usage example:
```java
{{#include docs/examples/java/src/main/java/com/taos/example/DruidDemo.java:connection_pool}}
```
> For more issues related to Druid usage, refer to the [official documentation](https://github.com/alibaba/druid).
</TabItem>
<TabItem label="Python" value="python">
<ConnPythonNative />
</TabItem>
<TabItem label="Go" value="go">
Using `sql.Open`, the created connection already implements a connection pool. You can set connection pool parameters via the API, as shown below:
```go
{{#include docs/examples/go/connect/connpool/main.go:pool}}
@ -714,15 +739,18 @@ Using `sql.Open`, the created connection already implements a connection pool. Y
<TabItem label="Rust" value="rust">
In complex applications, it is recommended to enable the connection pool. The [taos] connection pool, by default (in asynchronous mode), uses [deadpool] for its implementation.

Here is how to create a connection pool with default parameters.
```rust
let pool: Pool<TaosBuilder> = TaosBuilder::from_dsn("taos:///")
    .unwrap()
    .pool()
    .unwrap();
```
You can also use the pool constructor to set connection pool parameters:
```rust
let pool: Pool<TaosBuilder> = Pool::builder(Manager::from_dsn(self.dsn.clone()).unwrap().0)
@ -731,7 +759,7 @@ let pool: Pool<TaosBuilder> = Pool::builder(Manager::from_dsn(self.dsn.clone()).
.unwrap();
```
In your application code, use `pool.get()?` to obtain a [Taos] connection object.
```rust
let taos = pool.get()?;
View File
@ -7,20 +7,21 @@ slug: /developer-guide/running-sql-statements
import Tabs from "@theme/Tabs";
import TabItem from "@theme/TabItem";
TDengine provides comprehensive support for SQL, allowing users to perform data queries, inserts, and deletions using familiar SQL syntax. TDengine's SQL also supports database and table management operations, such as creating, modifying, and deleting databases and tables. TDengine extends standard SQL by introducing features specific to time-series data processing, such as aggregation queries, downsampling, and interpolation queries, to accommodate the characteristics of time-series data. These extensions enable users to handle time-series data more efficiently and conduct complex data analysis and processing. For specific supported SQL syntax, please refer to [TDengine SQL](../../tdengine-reference/sql-manual/).
Below is an introduction to how to use various language connectors to execute SQL commands for creating databases, creating tables, inserting data, and querying data.
:::note
REST connection: Each programming language's connector encapsulates the connection using `HTTP` requests, supporting data writing and querying operations. Developers still access `TDengine` through the interfaces provided by the connector.
REST API: Directly calls the REST API interface provided by `taosadapter` to perform data writing and querying operations. Code examples demonstrate using the `curl` command.
:::
## Creating Databases and Tables

Using smart meters as an example, the following shows how to use the language connectors to execute SQL to create a database named `power` and set it as the default database. Next, create a supertable (STABLE) named `meters`, whose columns include a timestamp, current, voltage, and phase, and whose tags are the group ID and location.
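For reference, the SQL executed by these examples is roughly the following sketch; the exact column types are assumptions based on the smart-meter schema described above:

```sql
CREATE DATABASE IF NOT EXISTS power;
CREATE STABLE IF NOT EXISTS power.meters (
    ts TIMESTAMP,
    current FLOAT,
    voltage INT,
    phase FLOAT
) TAGS (
    groupid INT,
    location BINARY(24)
);
```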
<Tabs defaultValue="java" groupId="lang">
<TabItem value="java" label="Java">
@ -30,10 +31,9 @@ Using a smart meter as an example, below demonstrates how to execute SQL command
```
</TabItem>
<TabItem label="Python" value="python">
```python title="WebSocket Connection"
{{#include docs/examples/python/create_db_ws.py}}
```
@ -41,20 +41,16 @@ Using a smart meter as an example, below demonstrates how to execute SQL command
{{#include docs/examples/python/create_db_native.py}}
```
```python title="REST Connection"
{{#include docs/examples/python/create_db_rest.py}}
```
</TabItem>
<TabItem label="Go" value="go">
```go
{{#include docs/examples/go/sqlquery/main.go:create_db_and_table}}
```
</TabItem>
<TabItem label="Rust" value="rust">
```rust
@ -62,35 +58,27 @@ Using a smart meter as an example, below demonstrates how to execute SQL command
```
</TabItem>
<TabItem label="Node.js" value="node">
```js
{{#include docs/examples/node/websocketexample/sql_example.js:create_db_and_table}}
```
</TabItem>
<TabItem label="C#" value="csharp">
```csharp
{{#include docs/examples/csharp/wsInsert/Program.cs:create_db_and_table}}
```
</TabItem>
<TabItem label="C" value="c">
```c title="WebSocket Connection"
{{#include docs/examples/c-ws/create_db_demo.c:create_db_and_table}}
```
```c title="Native Connection"
```c title="Native Connection"
{{#include docs/examples/c/create_db_demo.c:create_db_and_table}}
```
</TabItem>
<TabItem label="REST API" value="rest">
Create Database
@ -100,7 +88,7 @@ curl --location -uroot:taosdata 'http://127.0.0.1:6041/rest/sql' \
--data 'CREATE DATABASE IF NOT EXISTS power'
```
Create Table, specifying the database as `power` in the URL
```bash
curl --location -uroot:taosdata 'http://127.0.0.1:6041/rest/sql/power' \
@ -109,35 +97,25 @@ curl --location -uroot:taosdata 'http://127.0.0.1:6041/rest/sql/power' \
</TabItem>
</Tabs>
:::note
It is recommended to construct SQL statements in the `<dbName>.<tableName>` format; relying on `USE DBName` in applications is not recommended.
:::
## Insert Data
Using smart meters as an example, the following shows how to execute SQL that inserts data into the `meters` supertable in the `power` database. The examples use TDengine's auto table creation SQL syntax to write 3 rows into the `d1001` subtable and 1 row into the `d1002` subtable, and then print the actual number of rows inserted.
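As a rough sketch of what these examples execute, the following uses the native Python connector to write rows with TDengine's auto table creation syntax; the subtable names, tag values, and measurements are illustrative only.

```python
import taos  # native Python connector (taospy)

conn = taos.connect(database="power")
cursor = conn.cursor()

# Auto table creation: d1001/d1002 are created on first insert if they do not exist.
# NOW resolves to the current time; NOW + 1s adds one second (see the note below).
cursor.execute(
    "INSERT INTO power.d1001 USING power.meters TAGS(2, 'California.SanFrancisco') "
    "VALUES (NOW, 10.30, 219, 0.31) (NOW + 1s, 12.60, 218, 0.33) (NOW + 2s, 12.30, 221, 0.31) "
    "power.d1002 USING power.meters TAGS(3, 'California.SanFrancisco') "
    "VALUES (NOW, 10.30, 218, 0.25)"
)

cursor.close()
conn.close()
```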
<Tabs defaultValue="java" groupId="lang">
<TabItem value="java" label="Java">
```java
{{#include docs/examples/java/src/main/java/com/taos/example/JdbcInsertDataDemo.java:insert_data}}
```
:::note
NOW is an internal function whose default value is the current time of the client machine. NOW + 1s means the client's current time plus 1 second, where the number is followed by a time unit: a (millisecond), s (second), m (minute), h (hour), d (day), w (week), n (month), y (year).
:::
</TabItem>
<TabItem label="Python" value="python">
```python title="Websocket Connection"
```python title="WebSocket Connection"
{{#include docs/examples/python/insert_ws.py}}
```
@ -145,20 +123,16 @@ NOW is an internal function that defaults to the current time of the client's co
{{#include docs/examples/python/insert_native.py}}
```
```python title="REST Connection"
```python title="Rest Connection"
{{#include docs/examples/python/insert_rest.py}}
```
</TabItem>
<TabItem label="Go" value="go">
```go
{{#include docs/examples/go/sqlquery/main.go:insert_data}}
```
</TabItem>
<TabItem label="Rust" value="rust">
```rust
@ -166,26 +140,19 @@ NOW is an internal function that defaults to the current time of the client's co
```
</TabItem>
<TabItem label="Node.js" value="node">
```js
{{#include docs/examples/node/websocketexample/sql_example.js:insertData}}
```
</TabItem>
<TabItem label="C#" value="csharp">
```csharp
{{#include docs/examples/csharp/wsInsert/Program.cs:insert_data}}
```
</TabItem>
<TabItem label="C" value="c">
```c title="Websocket Connection"
```c title="WebSocket Connection"
{{#include docs/examples/c-ws/insert_data_demo.c:insert_data}}
```
@ -193,17 +160,12 @@ NOW is an internal function that defaults to the current time of the client's co
{{#include docs/examples/c/insert_data_demo.c:insert_data}}
```
:::note
NOW is an internal function whose default value is the current time of the client machine. NOW + 1s means the client's current time plus 1 second, where the number is followed by a time unit: a (millisecond), s (second), m (minute), h (hour), d (day), w (week), n (month), y (year).
:::
</TabItem>
<TabItem label="REST API" value="rest">
Write Data
```bash
curl --location -uroot:taosdata 'http://127.0.0.1:6041/rest/sql' \
@ -213,9 +175,9 @@ curl --location -uroot:taosdata 'http://127.0.0.1:6041/rest/sql' \
</TabItem>
</Tabs>
## Query Data
Using smart meters as an example, the following shows how to use the various language connectors to execute SQL that queries up to 100 rows from the `meters` supertable in the `power` database and prints the results line by line.
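As a sketch of the same query with the native Python connector (the selected column names are assumptions taken from the smart-meter schema above):

```python
import taos  # native Python connector (taospy)

conn = taos.connect(database="power")
cursor = conn.cursor()

# Query at most 100 rows from the supertable and print them one by one.
cursor.execute("SELECT ts, current, voltage, phase, groupid, location FROM meters LIMIT 100")
for row in cursor.fetchall():
    print(row)

cursor.close()
conn.close()
```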
<Tabs defaultValue="java" groupId="lang">
<TabItem label="Java" value="java">
@ -224,17 +186,12 @@ Using a smart meter as an example, below demonstrates how to execute SQL using v
{{#include docs/examples/java/src/main/java/com/taos/example/JdbcQueryDemo.java:query_data}}
```
:::note
Query operations work the same way as in relational databases. When accessing returned fields by index, indexing starts from 1; using field names for retrieval is recommended.
:::
</TabItem>
<TabItem label="Python" value="python">
```python title="Websocket Connection"
```python title="WebSocket Connection"
{{#include docs/examples/python/query_ws.py}}
```
@ -242,62 +199,50 @@ Query operations are consistent with relational databases. When accessing return
{{#include docs/examples/python/query_native.py}}
```
```python title="REST Connection"
```python title="Rest Connection"
{{#include docs/examples/python/query_rest.py}}
```
</TabItem>
<TabItem label="Go" value="go">
```go
{{#include docs/examples/go/sqlquery/main.go:select_data}}
```
</TabItem>
<TabItem label="Rust" value="rust">
```rust
{{#include docs/examples/rust/nativeexample/examples/query.rs:query_data}}
```
The Rust connector also supports using **serde** to deserialize the results into structured data:
```rust
{{#include docs/examples/rust/nativeexample/examples/query.rs:query_data_2}}
```
</TabItem>
<TabItem label="Node.js" value="node">
```js
{{#include docs/examples/node/websocketexample/sql_example.js:queryData}}
```
</TabItem>
<TabItem label="C#" value="csharp">
```csharp
{{#include docs/examples/csharp/wsInsert/Program.cs:select_data}}
```
</TabItem>
<TabItem label="C" value="c">
```c title="Websocket Connection"
```c title="WebSocket Connection"
{{#include docs/examples/c-ws/query_data_demo.c:query_data}}
```
```c title="Native Connection"
```c title="Native Connection"
{{#include docs/examples/c/query_data_demo.c:query_data}}
```
</TabItem>
<TabItem label="REST API" value="rest">
Query Data
@ -312,17 +257,17 @@ curl --location -uroot:taosdata 'http://127.0.0.1:6041/rest/sql' \
## Execute SQL with reqId
reqId can be used for request tracing, similar to traceId in distributed systems. A request may need to pass through multiple services or modules to be completed; reqId identifies and associates all related operations of the request, making it possible to trace and analyze its complete path.
Using reqId has the following benefits:
- Request tracing: By associating the same reqId with all related operations of a request, you can trace the complete path of the request within the system.
- Performance analysis: By analyzing a request's reqId, you can understand its processing time across services and modules and identify performance bottlenecks.
- Fault diagnosis: When a request fails, you can locate where the problem occurred by examining the reqId associated with that request.
If a reqId is not set, the connector generates one randomly, but it is recommended to set it explicitly so that it can be better associated with your requests.
Below are code examples of setting reqId when executing SQL with the various language connectors.
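For instance, with the REST interface shown in the REST API tab below, a reqId can be attached as the `req_id` query parameter. The sketch below does this from Python with the `requests` package; the SQL text is illustrative, and the endpoint and credentials follow the curl examples in this guide.

```python
import requests

sql = "SELECT ts, current, voltage FROM power.meters LIMIT 100"

# req_id=3 ties this call to a trace; credentials match the curl examples above.
resp = requests.post(
    "http://127.0.0.1:6041/rest/sql?req_id=3",
    auth=("root", "taosdata"),
    data=sql,
)
print(resp.json())
```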
<Tabs defaultValue="java" groupId="lang">
<TabItem label="Java" value="java">
@ -332,10 +277,9 @@ Below are code samples for setting reqId while executing SQL with various langua
```
</TabItem>
<TabItem label="Python" value="python">
```python title="Websocket Connection"
```python title="WebSocket Connection"
{{#include docs/examples/python/reqid_ws.py}}
```
@ -343,20 +287,16 @@ Below are code samples for setting reqId while executing SQL with various langua
{{#include docs/examples/python/reqid_native.py}}
```
```python title="REST Connection"
```python title="Rest Connection"
{{#include docs/examples/python/reqid_rest.py}}
```
</TabItem>
<TabItem label="Go" value="go">
```go
{{#include docs/examples/go/queryreqid/main.go:query_id}}
```
</TabItem>
<TabItem label="Rust" value="rust">
```rust
@ -364,26 +304,19 @@ Below are code samples for setting reqId while executing SQL with various langua
```
</TabItem>
<TabItem label="Node.js" value="node">
```js
{{#include docs/examples/node/websocketexample/sql_example.js:sqlWithReqid}}
```
</TabItem>
<TabItem label="C#" value="csharp">
```csharp
{{#include docs/examples/csharp/wsInsert/Program.cs:query_id}}
```
</TabItem>
<TabItem label="C" value="c">
```c "Websocket Connection"
```c "WebSocket Connection"
{{#include docs/examples/c-ws/with_reqid_demo.c:with_reqid}}
```
@ -392,10 +325,9 @@ Below are code samples for setting reqId while executing SQL with various langua
```
</TabItem>
<TabItem label="REST API" value="rest">
Query Data, specifying reqId as 3
```bash
curl --location -uroot:taosdata 'http://127.0.0.1:6041/rest/sql?req_id=3' \


@ -7,52 +7,52 @@ slug: /developer-guide/schemaless-ingestion
import Tabs from "@theme/Tabs";
import TabItem from "@theme/TabItem";
In IoT applications, a large number of data items often need to be collected to support automated management, business analysis, and device monitoring. However, because of application logic upgrades and hardware adjustments to the devices themselves, the collected data items may change frequently. To address this challenge, TDengine provides schemaless writing, which simplifies the data recording process.
With schemaless writing, users do not need to create supertables or subtables in advance; TDengine automatically creates the corresponding storage structures based on the data actually written. When necessary, schemaless writing also automatically adds the required data columns or tag columns so that the data written by users is stored correctly.
It is worth noting that supertables and their subtables created through schemaless writing are functionally indistinguishable from those created directly with SQL, and users can still write data into them with SQL. However, because the table names generated by schemaless writing are derived from tag values according to a fixed mapping rule, these names may lack readability and are not easy to understand.
**When using schemaless writing, tables are created automatically; creating tables manually may lead to unknown errors.**
## Schemaless Writing Line Protocol
TDengine's schemaless writing line protocol is compatible with InfluxDB's line protocol, OpenTSDB's telnet line protocol, and OpenTSDB's JSON format protocol. For the standard writing protocols of InfluxDB and OpenTSDB, please refer to their respective official documentation.
The following introduces the protocol extensions TDengine makes on top of InfluxDB's line protocol. This protocol allows users to control the (supertable) schema in a more fine-grained way. A single string expresses one data row, and multiple strings can be passed to the writing API at once to write multiple rows in a batch. The format is specified as follows.
```text
measurement,tag_set field_set timestamp
```
The parameters are described as follows:
- measurement is the table name, separated from tag_set by a comma.
- tag_set has the format `<tag_key>=<tag_value>,<tag_key>=<tag_value>`, representing the tag column data; entries are separated by commas, and tag_set is separated from field_set by a space.
- field_set has the format `<field_key>=<field_value>,<field_key>=<field_value>`, representing the ordinary columns; entries are separated by commas, and field_set is separated from the timestamp by a space.
- timestamp is the primary key timestamp of this data row.
- Schemaless writing does not support writing data to tables with a second primary key column.
All data in tag_set is automatically converted to the nchar data type and does not require double quotes.

In the schemaless writing line protocol, each data item in field_set needs to describe its own data type, with the following requirements:
- If surrounded by double quotes, it indicates the varchar type, e.g., "abc".
- If surrounded by double quotes and prefixed with L or l, it indicates the nchar type, e.g., L" error message ".
- If surrounded by double quotes and prefixed with G or g, it indicates the geometry type, e.g., G"Point(4.343 89.342)".
- If surrounded by double quotes and prefixed with B or b, it indicates the varbinary type; the quoted content can be a hexadecimal string starting with \x or a regular string, e.g., B"\x98f46e" and B"hello".
- Spaces, equals signs (=), commas (,), double quotes ("), and backslashes (\) must be escaped with a preceding backslash (\) (all in half-width English characters). The field escape rules for the schemaless writing protocol are shown in the following table.
| **No.** | **Field** | **Characters to Escape** |
| -------- | -------- | ---------------- |
| 1 | Supertable name | Comma, space |
| 2 | Tag name | Comma, equals sign, space |
| 3 | Tag value | Comma, equals sign, space |
| 4 | Column name | Comma, equals sign, space |
| 5 | Column value | Double quotes, backslash |
If two consecutive backslashes appear, the first backslash acts as an escape character; if there is only one backslash, no escaping is needed. The backslash escape rules for the schemaless writing protocol are shown in the following table.
| **No.** | **Backslash** | **Escaped As** |
| -------- | ------------ | ---------- |
| 1 | \ | \ |
| 2 | \\\\ | \ |
@ -61,93 +61,93 @@ If two consecutive backslashes are used, the first backslash acts as an escape c
| 5 | \\\\\\\\\\ | \\\\\\ |
| 6 | \\\\\\\\\\\\ | \\\\\\ |
Numeric types are distinguished by suffix. The numeric type escape rules for the schemaless writing protocol are shown in the following table.
| **No.** | **Suffix** | **Mapped Type** | **Size (Bytes)** |
| -------- | ----------- | ----------------------------- | ---------------- |
| 1 | None or f64 | double | 8 |
| 2 | f32 | float | 4 |
| 3 | i8/u8 | TinyInt/UTinyInt | 1 |
| 4 | i16/u16 | SmallInt/USmallInt | 2 |
| 5 | i32/u32 | Int/UInt | 4 |
| 6 | i64/i/u64/u | BigInt/BigInt/UBigInt/UBigInt | 8 |
- t, T, true, True, TRUE, f, F, false, False are treated directly as the BOOL type.
For example, the following data row indicates: write a row into a subtable under the supertable named `st`, with tag t1 as "3" (NCHAR), t2 as "4" (NCHAR), and t3 as "t3" (NCHAR); column c1 is 3 (BIGINT), c2 is false (BOOL), c3 is "passit" (BINARY), c4 is 4 (DOUBLE), and the primary key timestamp is 1626006833639000000.
```json
st,t1=3,t2=4,t3=t3 c1=3i64,c3="passit",c2=false,c4=4f64 1626006833639000000
```
Note that incorrect case in a data type suffix, or specifying an incorrect data type, can trigger an error message and cause the write to fail.
TDengine provides idempotency for data writing, meaning you can repeatedly call the API to write data that previously failed. However, it does not provide atomicity for writing multiple rows: during a batch write, some rows may be written successfully while others fail.
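As a sketch of how a line such as the one above is written through a connector, the following assumes the native Python connector's `schemaless_insert()` interface with its `SmlProtocol`/`SmlPrecision` enums; check your connector version for the exact method and enum names.

```python
import taos
from taos import SmlProtocol, SmlPrecision

conn = taos.connect(database="power")  # assumes the target database already exists

lines = [
    'st,t1=3,t2=4,t3=t3 c1=3i64,c3="passit",c2=false,c4=4f64 1626006833639000000',
]

# The supertable st and its subtable are created automatically from the line protocol.
conn.schemaless_insert(lines, SmlProtocol.LINE_PROTOCOL, SmlPrecision.NANO_SECOND)

conn.close()
```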
## Schemaless Writing Processing Rules
Schemaless writing processes row data according to the following principles:
1. The subtable name is generated using the following rules: first, the measurement name and the tag keys and values are combined into the following string (a Python sketch of this mapping appears after this list).
```json
"measurement,tag_key1=tag_value1,tag_key2=tag_value2"
```
- Note that tag_key1 and tag_key2 here are not in the user's original input order but are sorted in ascending order by tag name, so tag_key1 is not necessarily the first tag in the line protocol.
  After sorting, the MD5 hash value "md5_val" of the string is calculated, and the subtable name is generated as "t_md5_val". The prefix "t_" is fixed; every table created automatically through this mapping has this prefix.
- If you do not want to use the automatically generated table name, there are two ways to specify subtable names (the first has higher priority).
  1. Configure the `smlAutoChildTableNameDelimiter` parameter in `taos.cfg` (the characters `@ # space carriage return newline tab` are not allowed).
     For example: with `smlAutoChildTableNameDelimiter=-` configured, inserting `st,t0=cpu1,t1=4 c1=3 1626006833639000000` creates a subtable named `cpu1-4`.
  2. Configure the `smlChildTableName` parameter in `taos.cfg`.
     For example: with `smlChildTableName=tname` configured, inserting `st,tname=cpu1,t1=4 c1=3 1626006833639000000` creates a subtable named `cpu1`. Note that if multiple rows have the same `tname` but different tag sets, only the tag set from the first row that triggers automatic table creation is used; the others are ignored.
2. If the supertable obtained from parsing the line protocol does not exist, it is created (manually creating supertables is not recommended, as it may lead to abnormal data insertion).
3. If the subtable obtained from parsing the line protocol does not exist, schemaless writing creates the subtable using the name determined in step 1 or 2.
4. If the tag columns or ordinary columns specified in a data row do not exist, the corresponding columns are added to the supertable (columns can only be added, not removed).
5. If some tag columns or ordinary columns exist in the supertable but are not given values in a data row, those columns are set to NULL for that row.
6. For BINARY or NCHAR columns, if the length of a provided value exceeds the column type limit, the maximum character length of that column is automatically increased (it can only be increased, not decreased) to ensure the data is stored in full.
7. Any error encountered during processing interrupts the write and returns an error code.
8. To improve writing efficiency, it is assumed by default that the order of field_set is the same within a supertable (the first row contains all fields, and subsequent rows follow that order). If the order differs, set the `smlDataFormat` parameter to false; otherwise the data is written in the assumed order and the data in the database will be incorrect. Starting from version 3.0.3.0, order consistency is detected automatically and this configuration is deprecated.
9. Since SQL table names do not support dots (.), schemaless writing also handles dots: if a generated table name contains a dot (.), it is automatically replaced with an underscore (_). If a subtable name is specified manually, dots in the name are also converted to underscores.
10. The `smlTsDefaultName` configuration (a string value) has been added to `taos.cfg` and takes effect only on the client side. Once set, it determines the name of the timestamp column of automatically created supertables; if not set, the default is `_ts`.
11. The names of supertables or subtables created through schemaless writing are case-sensitive.
12. Schemaless writing still follows TDengine's underlying limits on data structures, such as each row not exceeding 48 KB in total length (64 KB from version 3.0.5.0) and the total length of tag values not exceeding 16 KB.
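The following pure-Python sketch illustrates the table name mapping described in rule 1 (sort the tags by key, join them with the measurement, hash with MD5, and prepend `t_`); the exact byte-level normalization TDengine applies internally may differ.

```python
import hashlib

def auto_subtable_name(measurement: str, tags: dict) -> str:
    # Tags are ordered by tag name, not by the order they appear in the line protocol.
    joined = measurement + "," + ",".join(f"{k}={v}" for k, v in sorted(tags.items()))
    return "t_" + hashlib.md5(joined.encode("utf-8")).hexdigest()

# Example: the line 'st,t1=3,t2=4,t3=t3 ...' maps to a fixed name with the "t_" prefix.
print(auto_subtable_name("st", {"t1": "3", "t2": "4", "t3": "t3"}))
```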
## Time Resolution Identification
Schemaless writing supports three specified modes, as shown in the table below:
| **No.** | **Value** | **Description** |
| -------- | ------------------- | ----------------------------- |
| 1 | SML_LINE_PROTOCOL | InfluxDB Line Protocol |
| 2 | SML_TELNET_PROTOCOL | OpenTSDB Telnet Line Protocol |
| 3 | SML_JSON_PROTOCOL | JSON Format Protocol |
In SML_LINE_PROTOCOL parsing mode, users need to specify the time resolution of the input timestamps. The available time resolutions are as follows:
| **No.** | **Time Resolution Definition** | **Meaning** |
| -------- | --------------------------------- | -------------- |
| 1 | TSDB_SML_TIMESTAMP_NOT_CONFIGURED | Undefined (invalid) |
| 2 | TSDB_SML_TIMESTAMP_HOURS | Hours |
| 3 | TSDB_SML_TIMESTAMP_MINUTES | Minutes |
| 4 | TSDB_SML_TIMESTAMP_SECONDS | Seconds |
| 5 | TSDB_SML_TIMESTAMP_MILLI_SECONDS | Milliseconds |
| 6 | TSDB_SML_TIMESTAMP_MICRO_SECONDS | Microseconds |
| 7 | TSDB_SML_TIMESTAMP_NANO_SECONDS | Nanoseconds |
In SML_TELNET_PROTOCOL and SML_JSON_PROTOCOL modes, the time precision is determined by the length of the timestamp (consistent with standard OpenTSDB behavior), and the user-specified time resolution is ignored.
## Data Schema Mapping Rules
Data from the InfluxDB line protocol is mapped to schema-based data: the measurement maps to the supertable name, the tag names in tag_set map to tag names in the data schema, and the names in field_set map to column names. For example, consider the following data.
```json
st,t1=3,t2=4,t3=t3 c1=3i64,c3="passit",c2=false,c4=4f64 1626006833639000000
```
This data row maps to the creation of a supertable named `st`, which includes three nchar tags t1, t2, and t3, and five data columns: ts (timestamp), c1 (bigint), c3 (binary), c2 (bool), and c4 (double). It maps to the following SQL statement:
```json
create stable st (_ts timestamp, c1 bigint, c2 bool, c3 binary(6), c4 double) tags(t1 nchar(1), t2 nchar(1), t3 nchar(2))
@ -155,45 +155,45 @@ create stable st (_ts timestamp, c1 bigint, c2 bool, c3 binary(6), c4 bigint) ta
## Data Schema Change Handling
This section explains the impact on the data schema under different row-writing scenarios.
When a field is written with an explicit type identifier in the line protocol, a later change to that field's type definition produces a clear data schema error that triggers the write API to report an error, as shown below.
```json
st,t1=3,t2=4,t3=t3 c1=3i64,c3="passit",c2=false,c4=4 1626006833639000000
st,t1=3,t2=4,t3=t3 c1=3i64,c3="passit",c2=false,c4=4i 1626006833640000000
```
The first row maps column c4 to the double type, but the second row declares the column as bigint through the numeric suffix, which triggers a schemaless writing parsing error.
If earlier rows of the line protocol declare a data column as binary and a later row requires a longer binary length, this triggers a change in the supertable schema.
```json
st,t1=3,t2=4,t3=t3 c1=3i64,c5="pass" 1626006833639000000
st,t1=3,t2=4,t3=t3 c1=3i64,c5="passit" 1626006833640000000
```
Parsing the first row declares column c5 as a binary(4) field. The second row still parses c5 as a binary column, but its width is 6, so the width of the binary column is increased to accommodate the new string.
```json
st,t1=3,t2=4,t3=t3 c1=3i64 1626006833639000000
st,t1=3,t2=4,t3=t3 c1=3i64,c6="passit" 1626006833640000000
```
Compared with the first row, the second row adds a column c6 of type binary(6), so a column c6 of type binary(6) is automatically added.
## Schemaless Writing Example
Using smart meters as an example, the following code samples show how the various language connectors use the schemaless writing interface to write data, covering three protocols: InfluxDB's line protocol, OpenTSDB's TELNET line protocol, and OpenTSDB's JSON format protocol.
:::note
- Because the automatic table creation rules of schemaless writing differ from those in the previous SQL example sections, make sure the `meters`, `metric_telnet`, and `metric_json` tables do not exist before running the code samples.
- OpenTSDB's TELNET line protocol and JSON format protocol support only a single data column, so other examples are used for them.
:::
### WebSocket Connection
<Tabs defaultValue="java" groupId="lang">
<TabItem value="java" label="Java">
@ -202,14 +202,13 @@ Using the smart meter as an example, here are code samples demonstrating how var
{{#include docs/examples/java/src/main/java/com/taos/example/SchemalessWsTest.java:schemaless}}
```
Execute schemaless writing with reqId; the last parameter reqId can be used for request tracing.
```java
writer.write(lineDemo, SchemalessProtocolType.LINE, SchemalessTimestampType.NANO_SECONDS, 1L);
```
</TabItem>
<TabItem label="Python" value="python">
```python
@ -217,15 +216,11 @@ writer.write(lineDemo, SchemalessProtocolType.LINE, SchemalessTimestampType.NANO
```
</TabItem>
<TabItem label="Go" value="go">
```go
{{#include docs/examples/go/schemaless/ws/main.go}}
```
</TabItem>
<TabItem label="Rust" value="rust">
```rust
@ -233,23 +228,16 @@ writer.write(lineDemo, SchemalessProtocolType.LINE, SchemalessTimestampType.NANO
```
</TabItem>
<TabItem label="Node.js" value="node">
```js
{{#include docs/examples/node/websocketexample/line_example.js}}
```
</TabItem>
<TabItem label="C#" value="csharp">
```csharp
{{#include docs/examples/csharp/wssml/Program.cs:main}}
```
</TabItem>
<TabItem label="C" value="c">
```c
@ -257,31 +245,26 @@ writer.write(lineDemo, SchemalessProtocolType.LINE, SchemalessTimestampType.NANO
```
</TabItem>
<TabItem label="REST API" value="rest">
Not supported
</TabItem>
</Tabs>
### Native Connection
<Tabs defaultValue="java" groupId="lang">
<TabItem label="Java" value="java">
<TabItem label="Java" value="java">
```java
{{#include docs/examples/java/src/main/java/com/taos/example/SchemalessJniTest.java:schemaless}}
```
Execute schemaless writing with reqId; the last parameter reqId can be used for request tracing.
```java
writer.write(lineDemo, SchemalessProtocolType.LINE, SchemalessTimestampType.NANO_SECONDS, 1L);
```
</TabItem>
<TabItem label="Python" value="python">
```python
@ -289,15 +272,11 @@ writer.write(lineDemo, SchemalessProtocolType.LINE, SchemalessTimestampType.NANO
```
</TabItem>
<TabItem label="Go" value="go">
```go
{{#include docs/examples/go/schemaless/native/main.go}}
```
</TabItem>
<TabItem label="Rust" value="rust">
```rust
@ -305,39 +284,27 @@ writer.write(lineDemo, SchemalessProtocolType.LINE, SchemalessTimestampType.NANO
```
</TabItem>
<TabItem label="Node.js" value="node">
Not supported
</TabItem>
<TabItem label="C#" value="csharp">
```csharp
{{#include docs/examples/csharp/nativesml/Program.cs:main}}
```
</TabItem>
<TabItem label="C" value="c">
```c
{{#include docs/examples/c/sml_insert_demo.c:schemaless}}
```
</TabItem>
<TabItem label="REST API" value="rest">
Not supported
</TabItem>
</Tabs>
## Querying the Written Data
Running the code samples in the previous section automatically creates tables in the `power` database. We can query the data through the taos shell or an application. Below is an example of querying the supertables and the data in the `meters` table using the taos shell.
```shell
taos> show power.stables;


@ -7,36 +7,34 @@ slug: /developer-guide/parameter-binding
import Tabs from "@theme/Tabs";
import TabItem from "@theme/TabItem";
Writing data with parameter binding avoids the resource consumption of SQL parsing and can therefore significantly improve write performance. The reasons parameter binding improves writing efficiency include:
- **Reduced parsing time**: With parameter binding, the structure of the SQL statement is determined at the first execution; subsequent executions only replace the parameter values, avoiding syntax parsing on every execution and reducing parsing time.
- **Precompilation**: With parameter binding, the SQL statement can be precompiled and cached; later executions with different parameter values use the precompiled version directly, improving execution efficiency.
- **Reduced network overhead**: Parameter binding also reduces the amount of data sent to the database, since only the parameter values need to be sent rather than the full SQL statement. The difference is particularly noticeable when performing a large number of similar insert or update operations.
**Tip: Parameter binding is recommended for data writing.**
Next, we continue to use smart meters as an example to demonstrate how the various language connectors write data efficiently with parameter binding:
1. Prepare a parameterized SQL insert statement for inserting data into the supertable `meters`. This statement allows subtable names, tags, and column values to be specified dynamically.
2. Loop to generate multiple subtables and their corresponding data rows. For each subtable:
- Set the subtable name and tag values (group ID and location).
- Generate multiple rows of data, each including a timestamp, randomly generated current, voltage, and phase values.
- Execute a batch insert operation to write these rows into the corresponding subtable.
3. Finally, print the actual number of rows inserted into the table.
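Before the connector-specific examples, here is a minimal sketch of these steps using what we assume to be the native Python connector's STMT interface (`statement()`, `new_bind_params()`, `bind_param()`); treat the exact method names and the bound values as assumptions and refer to the examples below for the authoritative usage.

```python
import taos

conn = taos.connect(database="power")

# 1. Parameterized insert: subtable name, tags, and column values are bound at run time.
stmt = conn.statement("INSERT INTO ? USING meters TAGS(?, ?) VALUES(?, ?, ?, ?)")

# 2. Bind one subtable's tags, then one row of column values.
tags = taos.new_bind_params(2)
tags[0].int(1)                              # group ID
tags[1].binary("California.SanFrancisco")   # location
stmt.set_tbname_tags("d1001", tags)

row = taos.new_bind_params(4)
row[0].timestamp(1626861392589)
row[1].float(10.3)
row[2].int(219)
row[3].float(0.31)
stmt.bind_param(row)

stmt.execute()
# 3. Report how many rows were written (attribute name assumed).
print("affected rows:", getattr(stmt, "affected_rows", "n/a"))

stmt.close()
conn.close()
```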
## WebSocket Connection
<Tabs defaultValue="java" groupId="lang">
<TabItem value="java" label="Java">
```java
{{#include docs/examples/java/src/main/java/com/taos/example/WSParameterBindingBasicDemo.java:para_bind}}
```
Here is a [more detailed parameter binding example](https://github.com/taosdata/TDengine/blob/main/docs/examples/java/src/main/java/com/taos/example/WSParameterBindingFullDemo.java).
</TabItem>
<TabItem label="Python" value="python">
```python
@ -44,15 +42,11 @@ Here is a [more detailed parameter binding example](https://github.com/taosdata/
```
</TabItem>
<TabItem label="Go" value="go">
```go
{{#include docs/examples/go/stmt/ws/main.go}}
```
</TabItem>
<TabItem label="Rust" value="rust">
```rust
@ -60,50 +54,40 @@ Here is a [more detailed parameter binding example](https://github.com/taosdata/
```
</TabItem>
<TabItem label="Node.js" value="node">
```js
{{#include docs/examples/node/websocketexample/stmt_example.js:createConnect}}
```
</TabItem>
<TabItem label="C#" value="csharp">
```csharp
{{#include docs/examples/csharp/wsStmt/Program.cs:main}}
```
</TabItem>
<TabItem label="C" value="c">
```c
{{#include docs/examples/c-ws/stmt_insert_demo.c}}
```
</TabItem>
<TabItem label="REST API" value="rest">
Not supported
</TabItem>
</Tabs>
## Native Connection
<Tabs defaultValue="java" groupId="lang">
<Tabs defaultValue="java" groupId="lang">
<TabItem label="Java" value="java">
```java
{{#include docs/examples/java/src/main/java/com/taos/example/ParameterBindingBasicDemo.java:para_bind}}
```
Here is a [more detailed parameter binding example](https://github.com/taosdata/TDengine/blob/main/docs/examples/java/src/main/java/com/taos/example/ParameterBindingFullDemo.java).
</TabItem>
<TabItem label="Python" value="python">
```python
@ -111,15 +95,11 @@ Here is a [more detailed parameter binding example](https://github.com/taosdata/
```
</TabItem>
<TabItem label="Go" value="go">
```go
{{#include docs/examples/go/stmt/native/main.go}}
```
</TabItem>
<TabItem label="Rust" value="rust">
```rust
@ -127,32 +107,20 @@ Here is a [more detailed parameter binding example](https://github.com/taosdata/
```
</TabItem>
<TabItem label="Node.js" value="node">
Not supported
</TabItem>
<TabItem label="C#" value="csharp">
```csharp
{{#include docs/examples/csharp/stmtInsert/Program.cs:main}}
```
</TabItem>
<TabItem label="C" value="c">
```c
{{#include docs/examples/c/stmt_insert_demo.c}}
```
</TabItem>
<TabItem label="REST API" value="rest">
Not supported
</TabItem>
</Tabs>


@ -6,46 +6,46 @@ slug: /developer-guide/user-defined-functions
## Introduction to UDF
In some application scenarios, the query functionality required by the application logic cannot be implemented directly with built-in functions. TDengine allows writing user-defined functions (UDFs) to address such needs. Once a UDF is successfully registered in the cluster, it can be called in SQL just like a built-in function, with no difference in usage. UDFs are divided into scalar functions and aggregate functions. Scalar functions output one value for each row of data, such as absolute value (abs), sine (sin), or string concatenation (concat). Aggregate functions output one value for multiple rows of data, such as average (avg) or maximum (max).
TDengine supports writing UDFs in C and Python. UDFs written in C have performance nearly identical to built-in functions, while those written in Python can take advantage of the rich Python computation libraries. To prevent exceptions during UDF execution from affecting the database service, TDengine uses process isolation technology and executes UDFs in a separate process. Even if a user-written UDF crashes, it does not affect the normal operation of TDengine.
## Developing UDFs in C
When implementing a UDF in C, you need to implement the specified interface functions:
- Scalar functions need to implement the scalar interface function `scalarfn`.
- Aggregate functions need to implement the aggregate interface functions `aggfn_start`, `aggfn`, `aggfn_finish`.
- If initialization is required, implement `udf_init`.
- If cleanup is required, implement `udf_destroy`.
### Interface Definition
The interface function names are the UDF name, or the UDF name combined with specific suffixes (`_start`, `_finish`, `_init`, `_destroy`). The function names described below, such as `scalarfn` and `aggfn`, should be replaced with the UDF name.
#### Scalar Function Interface
A scalar function converts input data into output data and is typically used to compute and transform a single data value. The prototype of the scalar function interface is as follows.
```c
int32_t scalarfn(SUdfDataBlock* inputDataBlock, SUdfColumn *resultColumn);
```
The main parameter descriptions are as follows:

- `inputDataBlock`: the input data block.
- `resultColumn`: the output column.
#### Aggregate Function Interface
An aggregate function is a special function used to group and compute data to generate summary information. An aggregate function works as follows:
- Initialize the result buffer: the `aggfn_start` function is called first to generate a result buffer for storing intermediate results.
- Group the data: the relevant data is divided into multiple row data blocks, each containing a group of data with the same grouping key.
- Update intermediate results: for each data block, the `aggfn` function is called to update the intermediate results. `aggfn` performs the calculation according to the type of aggregate function (such as sum, avg, count) and stores the result in the result buffer.
- Generate the final result: after the intermediate results of all data blocks have been updated, the `aggfn_finish` function is called to extract the final result from the result buffer. The final result contains 0 or 1 rows, depending on the type of aggregate function and the input data.
The prototype of the aggregate function interface is as follows.
```c
int32_t aggfn_start(SUdfInterBuf *interBuf);
@ -53,29 +53,27 @@ int32_t aggfn(SUdfDataBlock* inputBlock, SUdfInterBuf *interBuf, SUdfInterBuf *n
int32_t aggfn_finish(SUdfInterBuf* interBuf, SUdfInterBuf *result);
```
Where `aggfn` is a placeholder for the function name. First, call `aggfn_start` to generate the result buffer, then the relevant data will be divided into multiple row data blocks, and the `aggfn` function will be called for each data block to update the intermediate results. Finally, call `aggfn_finish` to produce the final result from the intermediate results, which can only contain 0 or 1 result data.
The main parameter descriptions are as follows:

- `interBuf`: The intermediate result buffer.
- `inputBlock`: The input data block.
- `newInterBuf`: The new intermediate result buffer.
- `result`: The final result.
#### Initialization and Destruction Interfaces

The initialization and destruction interfaces are shared by both scalar and aggregate functions. The relevant APIs are as follows.
```c
int32_t udf_init()
int32_t udf_destroy()
```
The `udf_init` function performs initialization and the `udf_destroy` function performs cleanup. If there is no initialization work, the `udf_init` function does not need to be defined; if there is no cleanup work, the `udf_destroy` function does not need to be defined.
### Scalar Function Template
The template for developing scalar functions in C is as follows.
```c
#include "taos.h"
// ... (the rest of the scalar function template is elided in this diff hunk) ...
```
### Aggregate Function Template
The template for developing aggregate functions in C is as follows.
```c
#include "taos.h"
// ... (the rest of the aggregate function template is elided in this diff hunk) ...
```
### Compilation
In TDengine, to implement a UDF you need to write C source code and compile it into a dynamic link library file according to TDengine's specifications. Following the rules described earlier, prepare the UDF source code `bit_and.c`. On Linux, for example, execute the following command to compile it into a dynamic link library file.
```shell
gcc -g -O0 -fPIC -shared bit_and.c -o libbitand.so
```
To ensure reliable operation, it is recommended to use GCC version 7.5 or above.
### C UDF Data Structures
```c
// ... (the SUdfDataBlock, SUdfColumn, SUdfColumnMeta, and SUdfColumnData definitions are elided in this diff hunk) ...
typedef struct SUdfInterBuf {
  // ...
} SUdfInterBuf;
```
The structure descriptions are as follows:

- `SUdfDataBlock` contains the number of rows `numOfRows` and the number of columns `numOfCols`. `udfCols[i]` (0 \<= i \<= numOfCols - 1) represents each column of data, of type `SUdfColumn*`.
- `SUdfColumn` contains the column's data type definition `colMeta` and the column's data `colData`.
- The members of `SUdfColumnMeta` are defined similarly to the data type definitions in `taos.h`.
- `SUdfColumnData` can be variable-length: `varLenCol` defines variable-length data, and `fixLenCol` defines fixed-length data.
- `SUdfInterBuf` defines the intermediate structure buffer and the number of results in the buffer, `numOfResult`.
To better operate on the above data structures, some utility functions are provided, defined in `taosudf.h`.
### C UDF Example Code
#### Scalar Function Example [bit_and](https://github.com/taosdata/TDengine/blob/3.0/tests/script/sh/bit_and.c)
`bit_and` implements the bitwise AND operation across multiple columns. If there is only one column, it returns that column. `bit_and` ignores null values.
<details>
<summary>bit_and.c</summary>
@ -255,9 +254,9 @@ To better operate on the above data structures, some utility functions are provi
</details>
#### Aggregate Function Example 1: Returning a Numeric Type [l2norm](https://github.com/taosdata/TDengine/blob/3.0/tests/script/sh/l2norm.c)
`l2norm` implements the second-order norm of all data in the input column: each data point is squared, the squares are summed, and the square root of the sum is taken.
<details>
<summary>l2norm.c</summary>
@ -268,9 +267,9 @@ To better operate on the above data structures, some utility functions are provi
</details>
#### Aggregate Function Example 2: Returning a String Type [max_vol](https://github.com/taosdata/TDengine/blob/3.0/tests/script/sh/max_vol.c)
`max_vol` finds the maximum voltage across multiple input voltage columns and returns a composite string value consisting of the device ID + the position (row, column) of the maximum voltage + the maximum voltage value.
Create table:
@ -299,37 +298,37 @@ select max_vol(vol1, vol2, vol3, deviceid) from battery;
</details>
## Developing UDFs in Python
### Preparing the Environment
The specific steps to prepare the environment are as follows:

- Step 1: Prepare the Python runtime environment.
- Step 2: Install the Python package `taospyudf`. The command is as follows.
```shell
pip3 install taospyudf
```
- Step 3: Execute the command `ldconfig`.
- Step 4: Start the `taosd` service.
During installation, C++ source code is compiled, so the system must have `cmake` and `gcc`. The compiled file `libtaospyudf.so` is automatically copied to the `/usr/local/lib/` directory, so if you are not the root user, you need to add `sudo` during installation. After installation, you can check whether the file exists in that directory:
```shell
root@server11 ~/udf $ ls -l /usr/local/lib/libtaos*
-rw-r--r-- 1 root root 671344 May 24 22:54 /usr/local/lib/libtaospyudf.so
```
### Interface Definition
When developing UDFs in Python, you need to implement the specified interface functions. The specific requirements are as follows.
- Scalar functions need to implement the scalar interface function `process`.
- Aggregate functions need to implement the aggregate interface functions `start`, `reduce`, and `finish`.
- If initialization is required, implement the `init` function.
- If cleanup is required, implement the `destroy` function.
#### Scalar Function Interface
The interface for scalar functions is as follows.

```python
def process(input: datablock) -> tuple[output_type]:
```
The main parameter descriptions are as follows:

- `input`: `datablock` is similar to a two-dimensional matrix; the member method `data(row, col)` reads the Python object located at row `row` and column `col`.
- The return value is a tuple of Python objects, each element of which has the output type.
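As a minimal illustration (not one of the official examples), a scalar `process` that adds its first two input columns might look like the following sketch; the null-propagation behavior here is an assumption made for this sketch.

```python
def init():
    pass

def destroy():
    pass

def process(block):
    # block is the input datablock: shape() gives (rows, cols), data(i, j) reads one cell
    rows, cols = block.shape()
    results = []
    for i in range(rows):
        a = block.data(i, 0)
        b = block.data(i, 1) if cols > 1 else 0
        # propagate NULL (None) when either input is NULL -- an assumption of this sketch
        results.append(None if a is None or b is None else a + b)
    return results
```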
#### Aggregate Function Interface
The interface for aggregate functions is as follows.

```python
def start() -> bytes:
def reduce(inputs: datablock, buf: bytes) -> bytes
def finish(buf: bytes) -> output_type:
```
The above code defines three functions, which together implement a custom aggregate function. The process is as follows.

First, the `start` function is called to generate the initial result buffer. This result buffer stores the internal state of the aggregate function and is continuously updated as input data is processed.

Then, the input data is divided into multiple row data blocks. For each data block, the `reduce` function is called with the current data block (`inputs`) and the current intermediate result (`buf`) as parameters. The `reduce` function updates the internal state of the aggregate function based on the input data and the current state, and returns a new intermediate result.

Finally, when all data blocks have been processed, the `finish` function is called. This function takes the final intermediate result (`buf`) as a parameter and generates the final output from it. Due to the nature of aggregate functions, the final output can contain only 0 or 1 row of data. This output is returned to the caller as the result of the aggregate function.
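For instance, a minimal sum-style aggregate over a single numeric column could be sketched as follows; skipping NULL values and using `pickle` for the intermediate buffer are assumptions of this sketch, not requirements of the framework.

```python
import pickle

def init():
    pass

def destroy():
    pass

def start():
    # initial intermediate result: a running total of 0, serialized to bytes
    return pickle.dumps(0.0)

def reduce(inputs, buf):
    total = pickle.loads(buf)
    rows, _ = inputs.shape()
    for i in range(rows):
        v = inputs.data(i, 0)
        if v is not None:          # skip NULL values (assumption of this sketch)
            total += v
    return pickle.dumps(total)

def finish(buf):
    return pickle.loads(buf)
```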
#### Initialization and Destruction Interfaces

The initialization and destruction interfaces are as follows.
```Python
def init()
def destroy()
```
The function descriptions are as follows:

- `init`: Completes the initialization work.
- `destroy`: Completes the cleanup work.
:::note
When developing UDFs in Python, you must define both the `init` and `destroy` functions.
:::
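A typical use of these two functions is to open a resource once when the UDF is loaded and release it when the UDF is unloaded, as in the following sketch; the file path is hypothetical.

```python
LOG_FILE = None

def init():
    # open a resource once when the UDF is loaded (hypothetical log file path)
    global LOG_FILE
    LOG_FILE = open("/tmp/myudf.log", "wt")

def destroy():
    # release the resource when the UDF is unloaded
    if LOG_FILE is not None:
        LOG_FILE.close()
```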
### Scalar Function Template
@ -417,7 +412,7 @@ def finish(buf: bytes) -> output_type:
### Data Type Mapping
The following table describes the mapping between TDengine SQL data types and Python data types. Any type of NULL value is mapped to Python's `None` value.
| **TDengine SQL Data Type** | **Python Data Type** |
| :-----------------------: | ------------ |
@ -431,18 +426,14 @@ The following table describes the mapping between TDengine SQL data types and Py
### Development Examples
This section contains five example programs, progressing from simple to complex, and also includes a number of practical debugging tips.
:::note
Logging cannot be output with the `print` function inside a UDF; you need to write to a file yourself or use Python's built-in logging library.
:::
#### Example One
Write a UDF function that accepts only a single integer: input `n`, output `ln(n^2 + 1)`.

First, write a Python file located in a system directory, such as `/root/udf/myfun.py`, with the following content.
```python
from math import log

def init():
    pass

def destroy():
    pass

def process(block):
    rows, _ = block.shape()
    return [log(block.data(i, 0) ** 2 + 1) for i in range(rows)]
```
This file contains three functions: `init` and `destroy` are both empty functions; they are the lifecycle functions of the UDF and must be defined even if they do nothing. The key function is `process`, which accepts a data block. This data block object has two methods.

1. `shape()` returns the number of rows and columns of the data block.
2. `data(i, j)` returns the data at row `i`, column `j`.

The `process` method of a scalar function must return as many rows of data as there are in the input data block. The code above ignores the number of columns because it only needs to compute the first column of each row.
Next, create the corresponding UDF function by executing the following statement in the TDengine CLI.
```sql
create function myfun as '/root/udf/myfun.py' outputtype double language 'Python'
```
The output is as follows.
```shell
taos> create function myfun as '/root/udf/myfun.py' outputtype double language 'Python';
Create OK, 0 row(s) affected (0.005202s)
```
It looks like everything went smoothly. Next, check all the custom functions in the system to confirm that the creation was successful.
```text
taos> show functions;
Query OK, 1 row(s) in set (0.005767s)
```
Generate test data by executing the following commands in the TDengine CLI.
```sql
create database test;
insert into t values('2023-05-03 08:09:10', 2, 3, 4);
insert into t values('2023-05-10 07:06:05', 3, 4, 5);
```
Test the `myfun` function.
```sql
taos> select myfun(v1, v2) from t;
DB error: udf function execution failure (0.011088s)
```
Unfortunately, the execution failed. What is the reason? Check the logs of the `udfd` process.
```shell
tail -10 /var/log/taos/udfd.log
```
The following error messages are found.
```text
05/24 22:46:28.733545 01665799 UDF ERROR can not load library libtaospyudf.so. error: operation not permitted
05/24 22:46:28.733561 01665799 UDF ERROR can not load python plugin. lib path libtaospyudf.so
```
The error is clear: the Python plugin `libtaospyudf.so` could not be loaded. If you encounter this error, refer to the environment preparation section above.
After fixing the environment error, execute the command again as follows.
```sql
taos> select myfun(v1) from t;
2.302585093 |
```
Thus, we have completed the first UDF and learned some simple debugging methods.
#### Example Two
Although the above `myfun` passed the test, it has two shortcomings.
1. This scalar function only accepts one column of data as input; if the user passes in multiple columns, it will not raise an exception.
```sql
taos> select myfun(v1, v2) from t;
2.302585093 |
```
2. It does not handle null values. We expect that if there are null values in the input, it will raise an exception and terminate execution. Therefore, the `process` function is improved as follows.
```python
def process(block):
    # ... (the improved body is elided in this diff hunk; a hedged sketch follows the update statement below) ...
```

Execute the following statement to update the existing UDF.

```sql
create or replace function myfun as '/root/udf/myfun.py' outputtype double language 'Python';
```
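For reference, a sketch of the improved `process` body that is consistent with the behavior described above (reject more than one column, reject NULL input) could look like this; the NULL error message is an assumption of this sketch.

```python
from math import log

def process(block):
    rows, cols = block.shape()
    if cols > 1:
        raise Exception(f"require 1 parameter but given {cols}")
    if any(block.data(i, 0) is None for i in range(rows)):
        raise Exception("null value is not allowed")   # error message is an assumption
    return [log(block.data(i, 0) ** 2 + 1) for i in range(rows)]
```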
Passing two parameters to `myfun` will now cause it to fail.
```sql
taos> select myfun(v1, v2) from t;
DB error: udf function execution failure (0.014643s)
```
The custom exception message is printed in the plugin's log file `/var/log/taos/taospyudf.log`.
```text
2023-05-24 23:21:06.790 ERROR [1666188] [doPyUdfScalarProc@507] call pyUdfScalar proc function. context 0x7faade26d180. error: Exception: require 1 parameter but given 2
At:
/var/lib/taos//.udf/myfun_3_1884e1281d9.py(12): process
```
Thus, we have learned how to update a UDF and check the error logs produced by the UDF.

(Note: If the UDF does not take effect after being updated, in versions of TDengine earlier than 3.0.5.0 it is necessary to restart `taosd`; in version 3.0.5.0 and later, restarting `taosd` is not required for the update to take effect.)
#### Example Three
Input (x1, x2, ..., xn), output the sum of each value multiplied by its index: `1 * x1 + 2 * x2 + ... + n * xn`. If x1 to xn contain null, the result is null.
The difference from Example One is that this function accepts any number of columns as input and needs to process each column's value. Write the UDF file `/root/udf/nsum.py`.
```python
def init():
    pass
# ... (the rest of nsum.py and its test output are elided in this diff hunk; a hedged sketch follows) ...
```
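A sketch that matches the stated behavior (the 1-based column index as the weight, NULL result if any input is NULL) could be:

```python
def init():
    pass

def destroy():
    pass

def process(block):
    rows, cols = block.shape()
    results = []
    for i in range(rows):
        total = 0
        for j in range(cols):
            v = block.data(i, j)
            if v is None:       # any NULL input makes the whole row NULL
                total = None
                break
            total += (j + 1) * v
        results.append(total)
    return results
```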
#### Example Four
Write a UDF that takes a timestamp as input and outputs the next closest Sunday. For example, if today is 2023-05-25, then the next Sunday is 2023-05-28. This function uses the third-party library `moment`. First, install this library.
```shell
pip3 install moment
```
Then, write the UDF file `/root/udf/nextsunday.py`.
```python
import moment

def init():
    pass

def destroy():
    pass

def process(block):
    rows, cols = block.shape()
    # ... (the rest of nextsunday.py is elided in this diff hunk: it rejects more than one input
    #      column, converts each millisecond timestamp to the next Sunday with the moment package,
    #      and returns a fixed-length 10-character date string for each row) ...
```
The UDF framework will map TDengine's timestamp type to Python's int type, so this function only accepts an integer representing milliseconds. The `process` method first performs parameter checks, then uses the `moment` package to replace the weekday of the time with Sunday, and finally formats the output. The output string has a fixed length of 10 characters, so the UDF function can be created as follows.
```sql
create function nextsunday as '/root/udf/nextsunday.py' outputtype binary(10) language 'Python';
```
At this point, test the function; if you started `taosd` using `systemctl`, you will definitely encounter an error.
```sql
taos> select ts, nextsunday(ts) from t;
```

```shell
tail -20 taospyudf.log
2023-05-25 11:42:34.541 ERROR [1679419] [PyUdf::PyUdf@217] py udf load module failure. error ModuleNotFoundError: No module named 'moment'
```
This is because the location of "moment" is not in the default library search path of the Python UDF plugin. How can we confirm this? By searching `taospyudf.log` with the following command.
This is because the location of "moment" is not in the default library search path of the python udf plugin. How to confirm this? Search `taospyudf.log` with the following command.
```shell
grep 'sys path' taospyudf.log | tail -1
```
The output is as follows:
```text
2023-05-25 10:58:48.554 INFO [1679419] [doPyOpen@592] python sys path: ['', '/lib/python38.zip', '/lib/python3.8', '/lib/python3.8/lib-dynload', '/lib/python3/dist-packages', '/var/lib/taos//.udf']
```
It shows that the default third-party library search path of the Python UDF plugin is `/lib/python3/dist-packages`, while `moment` is installed by default in `/usr/local/lib/python3.8/dist-packages`. Next, we modify the default library search path of the Python UDF plugin.

First, open the Python 3 command line and check the current `sys.path`.
```python
>>> import sys
>>> ':'.join(sys.path)
'/usr/lib/python3.8:/usr/lib/python3.8/lib-dynload:/usr/local/lib/python3.8/dist-packages:/usr/lib/python3/dist-packages'
```
Copy the output string from the above script, then edit `/var/taos/taos.cfg` to add the following configuration.
```shell
UdfdLdLibPath /usr/lib/python3.8:/usr/lib/python3.8/lib-dynload:/usr/local/lib/python3.8/dist-packages:/usr/lib/python3/dist-packages
```
After saving, execute `systemctl restart taosd`, then test again; this time there are no errors.
```sql
taos> select ts, nextsunday(ts) from t;
Query OK, 4 row(s) in set (1.011474s)
```
#### Example Five
Write an aggregate function to calculate the difference between the maximum and minimum values of a column.

The difference between aggregate functions and scalar functions is that a scalar function produces multiple outputs for multiple rows of input, whereas an aggregate function produces a single output for multiple rows of input. The execution of an aggregate function is somewhat like the classic map-reduce framework: the framework divides the data into several chunks, each mapper processes one chunk, and the reducer aggregates the mappers' results. The difference is that in TDengine's Python UDF, the `reduce` function has both map and reduce capabilities. The `reduce` function takes two parameters: one is the data it needs to process, and the other is the result of the reduce function executed by another task. See the following example, `/root/udf/myspread.py`.
```python
import io
# ... (the body of myspread.py is elided in this diff hunk; a hedged reconstruction follows the numbered list below) ...
def finish(buf):
    # ...
    return max_number - min_number
```
In this example, we not only define an aggregate function but also add logging functionality to record execution logs.
1. The `init` function opens a file for logging.
2. The `log` function records logs, automatically converting the passed object to a string and adding a newline character.
3. The `destroy` function closes the log file after execution.
4. The `start` function returns the initial buffer for storing intermediate results of the aggregate function, initializing the maximum value to negative infinity and the minimum value to positive infinity.
5. The `reduce` function processes each data block and aggregates the results.
6. The `finish` function converts the buffer into the final output.
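Based on this description, a hedged reconstruction of `myspread.py` could look like the following sketch; skipping NULL values in `reduce` is an assumption, and the log file path matches the log shown later in this section.

```python
import math
import pickle

LOG_FILE = None

def init():
    global LOG_FILE
    LOG_FILE = open("/var/log/taos/spread.log", "wt")
    log("init function myspread success")

def log(o):
    LOG_FILE.write(str(o) + "\n")

def destroy():
    LOG_FILE.close()

def start():
    log("initial max_number=-inf, min_number=inf")
    return pickle.dumps((-math.inf, math.inf))

def reduce(block, buf):
    max_number, min_number = pickle.loads(buf)
    rows, _ = block.shape()
    for i in range(rows):
        v = block.data(i, 0)
        if v is None:                    # skip NULL values (assumption of this sketch)
            continue
        if v > max_number:
            log(f"max_number={v}")
            max_number = v
        if v < min_number:
            log(f"min_number={v}")
            min_number = v
    return pickle.dumps((max_number, min_number))

def finish(buf):
    max_number, min_number = pickle.loads(buf)
    return max_number - min_number
```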
Execute the following SQL statement to create the corresponding UDF.

```sql
create or replace aggregate function myspread as '/root/udf/myspread.py' outputtype double bufsize 128 language 'Python';
```
This SQL statement has two important differences from the SQL statement for creating scalar functions.
1. The `aggregate` keyword has been added.
2. The `bufsize` keyword has been added to specify the memory size for storing intermediate results, in bytes. This value can be larger than what is actually used. In this example, the intermediate result is a tuple of two floating-point numbers, which occupies only 32 bytes when serialized, but the specified `bufsize` is 128. You can use the Python command line to print the actual number of bytes used.
```python
>>> len(pickle.dumps((12345.6789, 23456789.9877)))
32
```
Test this function, and you will see that the output of `myspread` is consistent with the output of the built-in `spread` function.
```sql
taos> select myspread(v1) from t;
taos> select spread(v1) from t;
Query OK, 1 row(s) in set (0.005501s)
```
Finally, check the execution log, and you will see that the `reduce` function was executed three times, with the `max` value being updated four times and the `min` value being updated once.
```shell
root@server11 /var/log/taos $ cat spread.log
init function myspead success
initial max_number=-inf, min_number=inf
max_number=1
```

Through this example, we learned how to define aggregate functions and print custom logs.
#### Aggregate Function Example [pyl2norm](https://github.com/taosdata/TDengine/blob/3.0/tests/script/sh/pyl2norm.py)
`pyl2norm` implements the second-order norm of all data in the input column, which means squaring each data point, summing them, and then taking the square root.
<details>
<summary>pyl2norm.py</summary>
</details>
#### Aggregate Function Example [pycumsum](https://github.com/taosdata/TDengine/blob/3.0/tests/script/sh/pycumsum.py)
`pycumsum` calculates the cumulative sum of all data in the input column using `numpy`.
<details>
<summary>pycumsum.py</summary>
</details>
## Managing UDFs
The process of managing UDFs in a cluster involves creating, using, and maintaining these functions. Users can create and manage UDFs in the cluster through SQL. Once created, all users in the cluster can use these functions in SQL. Since UDFs are stored on the cluster's mnode, they remain available even after the cluster is restarted.
When creating UDFs, it is necessary to distinguish between scalar functions and aggregate functions. Scalar functions accept zero or more input parameters and return a single value. Aggregate functions accept a set of input values and return a single value through some computation (such as summation, counting, etc.). If the wrong function type is declared during creation, an error will occur when calling the function via SQL.
Additionally, users need to ensure that the input data types match the UDF program, and that the UDF output data types match the `outputtype`. This means that when creating a UDF, the correct data types must be specified for both input parameters and output values. This helps ensure that when calling the UDF, the input data can be correctly passed to the UDF, and that the UDF's output value matches the expected data type.
### Creating Scalar Functions
The SQL syntax for creating scalar functions is as follows.

```sql
CREATE [OR REPLACE] FUNCTION function_name AS library_path OUTPUTTYPE output_type LANGUAGE 'Python';
```
Parameter descriptions are as follows:

- `or replace`: If the function already exists, this modifies the existing function's properties.
- `function_name`: The name of the scalar function when called in SQL.
- `language`: Supports C and Python (version 3.7 and above); the default is C.
- `library_path`: If the programming language is C, the path is the absolute path to the dynamic link library containing the UDF implementation, usually pointing to a `.so` file. If the programming language is Python, the path is the path to the Python file containing the UDF implementation. The path must be enclosed in single or double quotes.
- `output_type`: The data type name of the function's computation result.
### Creating Aggregate Functions
The SQL syntax for creating aggregate functions is as follows.

```sql
CREATE [OR REPLACE] AGGREGATE FUNCTION function_name AS library_path OUTPUTTYPE output_type BUFSIZE buffer_size LANGUAGE 'Python';
```
Where `buffer_size` indicates the buffer size for intermediate calculation results, measured in bytes. The meanings of other parameters are the same as for scalar functions.
The following SQL creates a UDF named `l2norm`.
@ -912,7 +901,7 @@ CREATE AGGREGATE FUNCTION l2norm AS "/home/taos/udf_example/libl2norm.so" OUTPUT
### Deleting UDFs
The SQL syntax for deleting a UDF with the specified name is as follows.
```sql
DROP FUNCTION function_name;
```
### Viewing Function Information
Each time a UDF with the same name is updated, its version number increases by 1.
```sql
select * from ins_functions \G;
```
@ -8,102 +8,101 @@ import TabItem from "@theme/TabItem";
import Image from '@theme/IdealImage';
import imgThread from '../assets/ingesting-data-efficiently-01.png';
This section describes how to write data to TDengine efficiently.
## Principles of Efficient Writing {#principle}
### From the Perspective of the Client Program {#application-view}

From the perspective of the client program, efficient data writing should consider the following factors:
1. The amount of data written at a time. Generally, the larger the batch, the more efficient the writing (though the advantage diminishes beyond a certain threshold). When using SQL to write to TDengine, try to concatenate more data into a single SQL statement. Currently, the maximum length of a single SQL statement supported by TDengine is 1,048,576 (1 MB) characters.
2. The number of concurrent connections. Generally, the more concurrent connections writing data simultaneously, the more efficient the writing (though performance may decline beyond a certain threshold, depending on server capacity).
3. The distribution of data across different tables (or subtables), that is, the adjacency of the data being written. Generally, writing data only to the same table (or subtable) in each batch is more efficient than writing to multiple tables (or subtables).
4. The method of writing. In general:
   - Parameter binding is more efficient than writing SQL because it avoids SQL parsing (though it increases the number of C interface calls, which carries its own performance overhead).
   - Writing SQL without automatic table creation is more efficient than with automatic table creation because the latter frequently checks whether the table exists.
   - Writing SQL is more efficient than schemaless writing because schemaless writing automatically creates tables and supports dynamic changes to table structures.
The client program should make full and appropriate use of these factors. In each write operation, ideally write data only to the same table (or subtable); set the amount of data written per batch, after testing and tuning, to a value that best suits the current system's processing capacity; and likewise set the number of concurrent write connections after testing and tuning, in order to achieve the best writing speed on the current system.
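As a minimal sketch of points 1 and 3 above, using the Python connector (the database, subtable, and schema are hypothetical and the subtable is assumed to already exist), many rows for one subtable can be concatenated into a single INSERT statement:

```python
import taos

conn = taos.connect()  # assumes a local TDengine server with default credentials

def write_batch(table: str, rows) -> None:
    """Write one batch of (timestamp, value) rows for a single subtable with one SQL statement."""
    values = " ".join(f"('{ts}', {val})" for ts, val in rows)
    # one statement per batch; keep the total length under the 1,048,576-character SQL limit
    conn.execute(f"INSERT INTO {table} VALUES {values}")

# usage sketch with a hypothetical subtable test.d1001
write_batch("test.d1001", [("2024-01-01 00:00:00.000", 10.3), ("2024-01-01 00:00:01.000", 10.5)])
```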
### From the Perspective of the Data Source {#datasource-view}
Client programs typically need to read data from a data source and then write it to TDengine. From the data source perspective, the following scenarios necessitate a queue between the read and write threads:
1. Multiple data sources generate data at a rate that is significantly lower than the single-threaded write speed, but the overall data volume is considerable. In this case, the queue's role is to aggregate data from multiple sources to increase the amount of data written in a single operation.
2. A single data source generates data at a rate significantly higher than the single-threaded write speed. Here, the queue's role is to increase the write concurrency.
3. Data for a single table is scattered across multiple data sources. In this case, the queue's role is to aggregate data for the same table in advance, enhancing the adjacency of data during writing.
If the data source for the writing application is Kafka, and the writing application itself is a Kafka consumer, the characteristics of Kafka can be leveraged for efficient writing. For example:
1. Write data for the same table to the same Topic and the same Partition to increase data adjacency.
2. Aggregate data by subscribing to multiple Topics.
3. Increase write concurrency by adding more Consumer threads.
4. Increase the maximum amount of data fetched each time to raise the maximum amount written in a single operation (see the sketch after this list).
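A sketch of these ideas using the kafka-python package (topic names, group ID, and message format are hypothetical): subscribe to several topics, pull larger batches, and rely on the partition/key so that one table is always handled by the same consumer.

```python
from kafka import KafkaConsumer  # assumes the kafka-python package is installed

consumer = KafkaConsumer(
    "meters-topic-1", "meters-topic-2",      # aggregate data by subscribing to multiple topics
    bootstrap_servers="localhost:9092",
    group_id="tdengine-writers",
    max_poll_records=3000,                   # fetch more records per poll to enlarge each write batch
)

for message in consumer:
    # messages produced with the (sub)table name as the key land in a fixed partition,
    # so this consumer always sees the same tables and batches stay adjacent
    table_name = message.key.decode()
    record = message.value.decode()
    # ... buffer the record per table and flush one INSERT per table when the batch is full ...
```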
### From the Perspective of Server Configuration {#setting-view}
From the perspective of server configuration, it's important to set an appropriate number of vgroups when creating the database based on the number of disks in the system, their I/O capabilities, and processor capacity to fully utilize system performance. If there are too few vgroups, the system performance cannot be fully realized; if there are too many vgroups, unnecessary resource contention may occur. A general recommendation is to set the number of vgroups to twice the number of CPU cores, but tuning should still be based on the specific system resource configuration.
For more tuning parameters, refer to [Manage Databases](../../tdengine-reference/sql-manual/manage-databases/) and [taosd Reference](../../tdengine-reference/components/taosd/).
## Examples of Efficient Writing {#sample-code}
### Scenario Design {#scenario}
The following example program demonstrates how to write data efficiently, with the scenario designed as follows:
- The TDengine client program continuously reads data from other data sources, simulated in this example by generating mock data.
- A single connection cannot match the reading speed, so the client program starts multiple threads, each establishing a connection to TDengine with a dedicated fixed-size message queue.
- The client program hashes received data based on the associated table name (or subtable name) to determine the corresponding Queue index, ensuring that data belonging to a specific table (or subtable) is processed by a designated thread.
- Each sub-thread writes the data from its associated message queue to TDengine after emptying the queue or reaching a predefined data volume threshold, and continues processing the subsequently received data.
<figure>
<Image img={imgThread} alt="Thread model for efficient writing example"/>
<figcaption>Figure 1. Thread model for efficient writing example</figcaption>
</figure>
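Before the Java listing, the routing step can be sketched in a few lines of Python (the names are hypothetical and not part of the Java example): hash the (sub)table name to pick a queue, so a given table is always served by the same writer thread.

```python
import queue

NUM_WRITE_THREADS = 3
QUEUE_CAPACITY = 10000
queues = [queue.Queue(maxsize=QUEUE_CAPACITY) for _ in range(NUM_WRITE_THREADS)]

def route(table_name: str, line: str) -> None:
    idx = hash(table_name) % NUM_WRITE_THREADS   # same table -> same queue -> same writer thread
    queues[idx].put(line, block=True)            # block when the queue is full (back-pressure)
```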
### Example Code {#code}
This section provides example code for the above scenario. The principle of efficient writing is the same for other scenarios, but the code needs to be modified accordingly.
This example code assumes that the source data belongs to different subtables of the same supertable (`meters`). The program creates this supertable in the `test` database before writing data. Subtables are created automatically by the application based on the received data. If the actual scenario involves multiple supertables, only the code for automatic table creation in the write task needs to be modified.
<Tabs defaultValue="java" groupId="lang">
<TabItem label="Java" value="java">
**Program Listing**
| Class Name       | Function Description |
| ---------------- | -------------------- |
| FastWriteExample | Main program |
| ReadTask         | Reads data from the simulated source, hashes the table name to obtain the queue index, and writes to the corresponding queue |
| WriteTask        | Retrieves data from the queue, composes a batch, and writes it to TDengine |
| MockDataSource   | Simulates the generation of data for a number of meters subtables |
| SQLWriter        | WriteTask relies on this class for SQL concatenation, automatic table creation, SQL writing, and SQL length checking |
| StmtWriter       | Implements batch writing via parameter binding (not yet completed) |
| DataBaseMonitor  | Monitors the write speed and prints the current write speed to the console every 10 seconds |
Below is the complete code for each class and a more detailed function description.
<details>
<summary>FastWriteExample</summary>
The main program is responsible for:
1. Creating message queues
2. Starting write threads
3. Starting read threads
4. Monitoring write speed every 10 seconds
The main program exposes 4 parameters by default, which can be adjusted during each program start for testing and tuning:
1. The number of read threads. Default is 1.
2. The number of write threads. Default is 3.
3. The total number of simulated tables. Default is 1,000. This will be evenly distributed among the read threads. If the total number of tables is large, table creation will take longer, and the initial monitoring of write speed may be slower.
4. The maximum number of records to write in a single batch. Default is 3,000.
The queue capacity (`taskQueueCapacity`) is also a performance-related parameter that can be adjusted by modifying the program. Generally speaking, the larger the queue capacity, the lower the probability of being blocked during enqueuing, and the greater the throughput of the queue, although memory usage will also increase. The default value in the example program has been set sufficiently high.
```java
{{#include docs/examples/java/src/main/java/com/taos/example/highvolume/FastWriteExample.java}}
```

</details>
<details>
<summary>ReadTask</summary>
The read task is responsible for reading data from the data source. Each read task is associated with a simulated data source, and each simulated data source generates data for a certain number of tables. Different simulated data sources generate data for different tables.

The read task writes to the message queue in a blocking manner; that is, once the queue is full, the write operation blocks.
```java
{{#include docs/examples/java/src/main/java/com/taos/example/highvolume/ReadTask.java}}
```
</details>
<details>
<summary>MockDataSource</summary>
```java
```
</details>
<details>
<summary>SQLWriter</summary>
The `SQLWriter` class encapsulates the logic for SQL concatenation and data writing. Note that none of the tables are created in advance; they are batch-created using the supertable as a template when a table-not-found exception occurs, and the INSERT statement is then re-executed. For other exceptions, the SQL statement executed at that time is simply logged, and you can log more clues for error diagnosis and fault recovery.
```java
{{#include docs/examples/java/src/main/java/com/taos/example/highvolume/SQLWriter.java}}
@ -154,6 +155,7 @@ The `SQLWriter` class encapsulates the logic for SQL concatenation and data writ
</details>
<details>
<summary>DataBaseMonitor</summary>
```java
@ -165,15 +167,15 @@ The `SQLWriter` class encapsulates the logic for SQL concatenation and data writ
**Execution Steps**
<details>
<summary>Running the Java Example Program</summary>
Before running the program, configure the environment variable `TDENGINE_JDBC_URL`. If the TDengine Server is deployed locally and the username, password, and port are the default values, configure it as follows:
```shell
TDENGINE_JDBC_URL="jdbc:TAOS://localhost:6030?user=root&password=taosdata"
```
**Running the Example Program in a Local IDE**
1. Clone the TDengine repository
@ -181,59 +183,59 @@ TDENGINE_JDBC_URL="jdbc:TAOS://localhost:6030?user=root&password=taosdata"
git clone git@github.com:taosdata/TDengine.git --depth 1
```
2. Open the `docs/examples/java` directory in the IDE.
3. Configure the environment variable `TDENGINE_JDBC_URL` in the development environment. If you have already set a global environment variable for `TDENGINE_JDBC_URL`, you can skip this step.
4. Run the class `com.taos.example.highvolume.FastWriteExample`.
**Running the Example Program on a Remote Server**
To run the example program on a server, follow these steps:
1. Package the example code. Execute the following in the directory `TDengine/docs/examples/java`:
```shell
mvn package
```
2. Create an `examples` directory on the remote server:
```shell
mkdir -p examples/java
```
3. Copy the dependencies to the specified directory on the server:
- Copy the dependency packages (only do this once)
```shell
scp -r ./target/lib <user>@<host>:~/examples/java
```
- Copy the jar file for this program (need to copy each time the code is updated)
```shell
scp -r ./target/javaexample-1.0.jar <user>@<host>:~/examples/java
```
4. Configure the environment variable.
Edit `~/.bash_profile` or `~/.bashrc` to add the following content, for example:
```shell
export TDENGINE_JDBC_URL="jdbc:TAOS://localhost:6030?user=root&password=taosdata"
```
The above uses the default JDBC URL for a locally deployed TDengine Server. You need to modify it according to your actual situation.
5. Use the Java command to start the example program with the command template:
```shell
java -classpath lib/*:javaexample-1.0.jar com.taos.example.highvolume.FastWriteExample <read_thread_count> <write_thread_count> <total_table_count> <max_batch_size>
```
6. Terminate the test program. The test program will not automatically end. After achieving a stable write speed under the current configuration, press <kbd>CTRL</kbd> + <kbd>C</kbd> to terminate the program.
Below is an actual run log output, with a machine configuration of 16 cores + 64G + SSD.
```text
root@vm85$ java -classpath lib/*:javaexample-1.0.jar com.taos.example.highvolume.FastWriteExample 2 12
18:56:35.896 [main] INFO c.t.e.highvolume.FastWriteExample - readTaskCount=2, writeTaskCount=12 tableCount=1000 maxBatchSize=3000
18:56:36.011 [WriteThread-0] INFO c.taos.example.highvolume.WriteTask - started
18:56:36.015 [WriteThread-0] INFO c.taos.example.highvolume.SQLWriter - maxSQLLength=1048576
@ -270,13 +272,177 @@ To run the example program on a server, follow these steps:
18:57:49.375 [main] INFO c.t.e.highvolume.FastWriteExample - count=142952417 speed=2114521
18:58:00.689 [main] INFO c.t.e.highvolume.FastWriteExample - count=163650306 speed=2069788
18:58:11.646 [main] INFO c.t.e.highvolume.FastWriteExample - count=185019808 speed=2136950
```
</details>
</TabItem>
<TabItem label="Python" value="python">
**Program Listing**
The Python example program uses a multi-process architecture and employs a cross-process message queue.
| Function or Class | Description |
| ------------------------ | ------------------------------------------------------------------- |
| main function | Entry point of the program, creates various subprocesses and message queues |
| run_monitor_process function | Creates database, supertables, tracks write speed and periodically prints to console |
| run_read_task function | Main logic for read processes, responsible for reading data from other data systems and distributing it to assigned queues |
| MockDataSource class | Simulates a data source, implements iterator interface, returns the next 1,000 records for each table in batches |
| run_write_task function | Main logic for write processes. Retrieves as much data as possible from the queue and writes in batches |
| SQLWriter class | Handles SQL writing and automatic table creation |
| StmtWriter class | Implements batch writing with parameter binding (not yet completed) |
<details>
<summary>main function</summary>
The main function is responsible for creating message queues and launching subprocesses, which are of 3 types:
1. 1 monitoring process, responsible for database initialization and tracking write speed
2. n read processes, responsible for reading data from other data systems
3. m write processes, responsible for writing to the database
The main function can accept 5 startup parameters, in order:
1. Number of read tasks (processes), default is 1
2. Number of write tasks (processes), default is 1
3. Total number of simulated tables, default is 1,000
4. Queue size (in bytes), default is 1,000,000
5. Maximum number of records written per batch, default is 3,000
```python
{{#include docs/examples/python/fast_write_example.py:main}}
```
</details>
<details>
<summary>run_monitor_process</summary>
The monitoring process is responsible for initializing the database and monitoring the current write speed.
```python
{{#include docs/examples/python/fast_write_example.py:monitor}}
```
</details>
<details>
<summary>run_read_task function</summary>
The read process reads data from other data systems and distributes it to the queues assigned to it.
```python
{{#include docs/examples/python/fast_write_example.py:read}}
```
</details>
<details>
<summary>MockDataSource</summary>
Below is the implementation of the mock data source. We assume that each piece of data generated by the data source includes the target table name information. In practice, you might need certain rules to determine the target table name.
```python
{{#include docs/examples/python/mockdatasource.py}}
```
</details>
<details>
<summary>run_write_task function</summary>
The write process retrieves as much data as possible from the queue and writes in batches.
```python
{{#include docs/examples/python/fast_write_example.py:write}}
```
</details>
<details>
<summary>SQLWriter</summary>

The `SQLWriter` class encapsulates the logic for SQL concatenation and data writing. None of the tables are created in advance; they are batch-created using the supertable as a template when a "table does not exist" error occurs, and the INSERT statement is then re-executed. For other errors, the SQL being executed at the time is logged to facilitate error troubleshooting and fault recovery. This class also checks whether the SQL exceeds the maximum length limit; based on the TDengine 3.0 limit, the maximum supported SQL length of 1,048,576 is passed in through the `maxSQLLength` parameter.
```python
{{#include docs/examples/python/sql_writer.py}}
```
</details>
**Execution Steps**
<details>
<summary>Execute the Python Example Program</summary>
1. Prerequisites
- TDengine client driver installed
- Python3 installed, recommended version >= 3.8
- taospy installed
2. Install faster-fifo to replace Python's built-in multiprocessing.Queue:
```shell
pip3 install faster-fifo
```
3. Click the "View Source" link above to copy the `fast_write_example.py`, `sql_writer.py`, and `mockdatasource.py` files.
4. Execute the example program
```shell
python3 fast_write_example.py <READ_TASK_COUNT> <WRITE_TASK_COUNT> <TABLE_COUNT> <QUEUE_SIZE> <MAX_BATCH_SIZE>
```
Below is an actual output from a run, on a machine configured with 16 cores + 64G + SSD.
```text
root@vm85$ python3 fast_write_example.py 8 8
2022-07-14 19:13:45,869 [root] - READ_TASK_COUNT=8, WRITE_TASK_COUNT=8, TABLE_COUNT=1000, QUEUE_SIZE=1000000, MAX_BATCH_SIZE=3000
2022-07-14 19:13:48,882 [root] - WriteTask-0 started with pid 718347
2022-07-14 19:13:48,883 [root] - WriteTask-1 started with pid 718348
2022-07-14 19:13:48,884 [root] - WriteTask-2 started with pid 718349
2022-07-14 19:13:48,884 [root] - WriteTask-3 started with pid 718350
2022-07-14 19:13:48,885 [root] - WriteTask-4 started with pid 718351
2022-07-14 19:13:48,885 [root] - WriteTask-5 started with pid 718352
2022-07-14 19:13:48,886 [root] - WriteTask-6 started with pid 718353
2022-07-14 19:13:48,886 [root] - WriteTask-7 started with pid 718354
2022-07-14 19:13:48,887 [root] - ReadTask-0 started with pid 718355
2022-07-14 19:13:48,888 [root] - ReadTask-1 started with pid 718356
2022-07-14 19:13:48,889 [root] - ReadTask-2 started with pid 718357
2022-07-14 19:13:48,889 [root] - ReadTask-3 started with pid 718358
2022-07-14 19:13:48,890 [root] - ReadTask-4 started with pid 718359
2022-07-14 19:13:48,891 [root] - ReadTask-5 started with pid 718361
2022-07-14 19:13:48,892 [root] - ReadTask-6 started with pid 718364
2022-07-14 19:13:48,893 [root] - ReadTask-7 started with pid 718365
2022-07-14 19:13:56,042 [DataBaseMonitor] - count=6676310 speed=667631.0
2022-07-14 19:14:06,196 [DataBaseMonitor] - count=20004310 speed=1332800.0
2022-07-14 19:14:16,366 [DataBaseMonitor] - count=32290310 speed=1228600.0
2022-07-14 19:14:26,527 [DataBaseMonitor] - count=44438310 speed=1214800.0
2022-07-14 19:14:36,673 [DataBaseMonitor] - count=56608310 speed=1217000.0
2022-07-14 19:14:46,834 [DataBaseMonitor] - count=68757310 speed=1214900.0
2022-07-14 19:14:57,280 [DataBaseMonitor] - count=80992310 speed=1223500.0
2022-07-14 19:15:07,689 [DataBaseMonitor] - count=93805310 speed=1281300.0
2022-07-14 19:15:18,020 [DataBaseMonitor] - count=106111310 speed=1230600.0
2022-07-14 19:15:28,356 [DataBaseMonitor] - count=118394310 speed=1228300.0
2022-07-14 19:15:38,690 [DataBaseMonitor] - count=130742310 speed=1234800.0
2022-07-14 19:15:49,000 [DataBaseMonitor] - count=143051310 speed=1230900.0
2022-07-14 19:15:59,323 [DataBaseMonitor] - count=155276310 speed=1222500.0
2022-07-14 19:16:09,649 [DataBaseMonitor] - count=167603310 speed=1232700.0
2022-07-14 19:16:19,995 [DataBaseMonitor] - count=179976310 speed=1237300.0
```
</details>
:::note
When using the Python connector to connect to TDengine from multiple processes, there is a limitation: connections cannot be established in the parent process; all connections must be created in the child processes. If a connection is created in the parent process, any connection attempts in the child processes will block indefinitely. This is a known issue. A minimal sketch of the recommended pattern follows this note.
:::
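Below is a minimal sketch of this pattern, assuming the taospy connector, a locally running TDengine with default credentials, and a queue that carries pre-built INSERT statements; it only illustrates where connections are created and is not a replacement for the example program.

```python
# Sketch only: each worker creates its own connection inside the child process.
# Assumes taospy is installed and TDengine is reachable locally with default credentials.
from multiprocessing import Process, Queue

import taos


def write_worker(queue):
    # The connection is created here, inside the child process, never in the parent.
    conn = taos.connect(host="localhost", user="root", password="taosdata")
    try:
        while True:
            sql = queue.get()
            if sql is None:       # sentinel: no more work
                break
            conn.execute(sql)     # execute one pre-built INSERT statement
    finally:
        conn.close()


if __name__ == "__main__":
    q = Queue()
    workers = [Process(target=write_worker, args=(q,)) for _ in range(4)]
    for w in workers:
        w.start()                 # the parent only starts processes; it holds no connection
    # ... enqueue INSERT statements here, then send one sentinel per worker ...
    for _ in workers:
        q.put(None)
    for w in workers:
        w.join()
```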

View File

@ -5,4 +5,3 @@
```csharp title="WebSocket Connection"
{{#include docs/examples/csharp/wsConnect/Program.cs}}
```

View File

@ -1,4 +1,4 @@
#### Using a Unified Interface for Database Access
```go title="Native Connection"
{{#include docs/examples/go/connect/cgoexample/main.go}}
@ -8,10 +8,10 @@
{{#include docs/examples/go/connect/restexample/main.go}}
```
#### Using Advanced Wrappers
You can also use the `af` package from driver-go to establish connections. This module encapsulates advanced features of TDengine, such as parameter binding, subscription, etc.
```go title="Establishing Native Connection Using af Package"
{{#include docs/examples/go/connect/afconn/main.go}}
```

View File

@ -6,10 +6,10 @@
{{#include docs/examples/java/src/main/java/com/taos/example/RESTConnectExample.java:main}}
```
When using a REST connection, if the query returns a large amount of data, you can also enable the batch fetching feature.
```java title="Enable Batch Fetching" {4}
{{#include docs/examples/java/src/main/java/com/taos/example/WSConnectExample.java:main}}
```
For more connection parameter configurations, refer to [Java Connector](../../connector/java).

View File

@ -3,5 +3,6 @@
```
:::note
For the Rust connector, the difference between connection methods is only reflected in which features are enabled. If the "ws" feature is enabled, only the WebSocket implementation will be compiled in.
:::

View File

@ -1,32 +1,23 @@
---
title: Developer's Guide
description: A guide to help developers quickly get started.
slug: /developer-guide
---
When developing an application that will use TDengine as a tool for time-series data processing, there are several steps to follow:
1. **Determine the Connection Method to TDengine**: Regardless of the programming language you use, you can always connect via the REST interface. However, you can also use the dedicated connectors available for each programming language for a more convenient connection.
2. **Define the Data Model Based on Your Application Scenario**: Depending on the characteristics of your data, decide whether to create one or multiple databases; differentiate between static tags and collected metrics, establish the correct supertables, and create subtables.
3. **Decide on the Data Insertion Method**: TDengine supports standard SQL for writing data, but it also supports Schemaless mode, allowing you to write data directly without manually creating tables.
4. **Determine the SQL Queries Needed**: Based on business requirements, identify which SQL query statements you need to write.
5. **For Lightweight Real-time Statistical Analysis**: If you plan to perform lightweight real-time statistical analysis on time-series data, including various monitoring dashboards, it is recommended to use the stream computing capabilities of TDengine 3.0 instead of deploying complex stream processing systems such as Spark or Flink.
6. **For Applications Needing Data Consumption Notifications**: If your application has modules that need to consume inserted data and require notifications when new data arrives, it is recommended to use the data subscription feature provided by TDengine rather than deploying Kafka or other message queue software.
7. **Utilize TDengine's Cache Feature**: In many scenarios (e.g., vehicle management), if your application needs to retrieve the latest status of each data collection point, it is advisable to use TDengine's Cache feature instead of deploying separate caching software such as Redis.
8. **Use User-Defined Functions (UDFs) if Necessary**: If you find that TDengine's built-in functions do not meet your requirements, you can create user-defined functions (UDFs) to solve the problem.

This section is organized in the order above. For ease of understanding, TDengine provides example code for each feature and each supported programming language, located at [Example Code](https://github.com/taosdata/TDengine/tree/main/docs/examples). All example code is verified for correctness by CI, with the scripts located at [Example Code CI](https://github.com/taosdata/TDengine/tree/main/tests/docs-examples-test).

If you want to learn more about SQL, check the [SQL Manual](../tdengine-reference/sql-manual/). For more information on using the various connectors, read the [Connector Reference Guide](../tdengine-reference/client-libraries/). If you want to integrate TDengine with third-party systems, such as Grafana, refer to [Third-Party Tools](../third-party-tools/).

If you encounter any issues during development, please click the "Report Issue" link at the bottom of each page to submit an issue directly on GitHub: [Report Issue](https://github.com/taosdata/TDengine/issues/new/choose).
```mdx-code-block
import DocCardList from '@theme/DocCardList';

View File

@ -6,15 +6,7 @@ slug: /operations-and-maintenance/tdengine-components
import Image from '@theme/IdealImage';
import imgEcosys from '../assets/tdengine-components-01.png';
In the TDengine installation package, in addition to the TDengine database engine (taosd), several additional components are provided to facilitate user usage. The components include:
- **taosAdapter**: Acts as a bridge between applications and TDengine.
- **taosKeeper**: A tool for exporting TDengine monitoring metrics.
- **taosX**: A data pipeline tool.
- **taosExplorer**: A visual management tool.
- **taosc**: The TDengine client driver.
The following diagram shows the topological architecture of the entire TDengine product ecosystem (components taosX and taosX Agent are only available in TDengine Enterprise).
<figure>
<Image img={imgEcosys} alt="TDengine ecosystem"/>
@ -23,74 +15,81 @@ The following diagram shows the topological architecture of the entire TDengine
## taosd
In TDengine, **taosd** is a critical daemon process and the core service process. It is responsible for handling all data-related operations, including data writing, querying, and management. On Linux operating systems, users can conveniently start and stop the taosd process using the systemd command. To view all command-line parameters for taosd, users can execute `taosd -h`.
By default, the taosd process logs are stored in the `/var/log/taos/` directory for easy access and management.
TDengine uses a vnode mechanism to partition stored data, with each vnode containing a certain number of data collection points. To provide high availability, TDengine employs a multi-replica method to ensure data reliability and persistence. Vnodes on different nodes can form a vgroup for real-time data synchronization. This design not only improves data availability and fault tolerance but also helps achieve load balancing and efficient data processing.
## taosc
**taosc** is the client program for TDengine, providing developers with a set of functions and interfaces to write applications, connect to TDengine, and execute various SQL commands. Since taosc is written in C, it can be easily integrated with C/C++ applications.
When interacting with TDengine using other programming languages, taosc is also required for native connections. This is because taosc provides the underlying protocols and data structures necessary for communication with TDengine, ensuring smooth interactions between applications in different programming languages.
By using taosc, developers can easily build applications that interact with TDengine to implement functions such as data storage, querying, and management. This design improves the maintainability and scalability of applications while reducing development difficulty, allowing developers to focus on implementing business logic.
## taosAdapter
**taosAdapter** is a standard component in the TDengine installation package, serving as a bridge and adapter between the TDengine cluster and applications. It supports users in accessing TDengine services via RESTful interfaces and WebSocket connections, facilitating convenient data access and processing.
taosAdapter can seamlessly integrate with various data collection agents (such as Telegraf, StatsD, collectd, etc.) to import data into TDengine. Additionally, it provides data writing interfaces compatible with InfluxDB/OpenTSDB, allowing applications originally using InfluxDB/OpenTSDB to be easily migrated to TDengine with minimal modifications.
Through taosAdapter, users can flexibly integrate TDengine into existing application systems, enabling real-time data storage, querying, and analysis.
taosAdapter provides the following features:
- RESTful interface
- WebSocket connection
- Compatibility with InfluxDB v1 format writing
- Compatibility with OpenTSDB JSON and Telnet format writing
- Seamless integration with Telegraf
- Seamless integration with collectd
- Seamless integration with StatsD
- Support for Prometheus `remote_read` and `remote_write`
## taosKeeper
**taosKeeper** is a newly added monitoring metrics export tool in TDengine version 3.0, designed to facilitate real-time monitoring of TDengine's operational status and performance metrics. With simple configuration, TDengine can report its operational status and metrics to taosKeeper. Upon receiving the monitoring data, taosKeeper will utilize the RESTful interface provided by taosAdapter to store this data in TDengine.
One of the key values of taosKeeper is its ability to centrally store monitoring data from multiple TDengine clusters on a unified platform. This allows monitoring software to easily access this data for comprehensive monitoring and real-time analysis of the TDengine clusters. By using taosKeeper, users can more conveniently grasp the operational status of TDengine, promptly detect and resolve potential issues, ensuring system stability and efficiency.
## taosExplorer
To simplify user interaction with the database, TDengine Enterprise introduces a new visualization component—**taosExplorer**. This tool provides users with an intuitive interface for easily managing various elements within the database system, such as databases, supertables, subtables, and their lifecycles.
Through taosExplorer, users can execute SQL queries, monitor system status in real-time, manage user permissions, and perform data backup and recovery operations. Additionally, it supports data synchronization and export between different clusters, as well as managing topics and stream computing functions.
It is worth noting that there are functional differences between the community and enterprise editions of taosExplorer. The enterprise edition offers more features and higher levels of technical support to meet the needs of enterprise users. For specific differences and detailed information, users can refer to the official TDengine documentation.
## taosX
**taosX** serves as the data pipeline component of TDengine Enterprise, aiming to provide users with an easy way to connect to third-party data sources without needing to write code, enabling convenient data import. Currently, taosX supports numerous mainstream data sources, including AVEVA PI System, AVEVA Historian, OPC-UA/DA, InfluxDB, OpenTSDB, MQTT, Kafka, CSV, TDengine 2.x, TDengine 3.x, MySQL, PostgreSQL, and Oracle.
In practical use, users typically do not need to interact directly with taosX. Instead, they can easily access and utilize the powerful features of taosX through the browser user interface provided by taosExplorer. This design simplifies the operational process and lowers the usage threshold, allowing users to focus more on data processing and analysis, thereby improving work efficiency.
## taosX Agent
**taosX Agent** is a vital part of the data pipeline functionality in TDengine Enterprise. It works in conjunction with taosX to receive external data source import tasks issued by taosX. taosX Agent can initiate connectors or directly retrieve data from external data sources, subsequently forwarding the collected data to taosX for processing.
In edge-cloud collaborative scenarios, taosX Agent is typically deployed on the edge, particularly suited for situations where external data sources cannot be accessed directly through the public network. By deploying taosX Agent on the edge, network restrictions and data transmission delays can be effectively addressed, ensuring data timeliness and security.
## Applications or Third-Party Tools
By integrating with various applications, visualization, and BI (Business Intelligence) tools, as well as data sources, TDengine provides users with flexible and efficient data processing and analysis capabilities to meet business needs across different scenarios. Applications or third-party tools mainly include the following categories:
1. **Applications**: These applications are responsible for writing business data to the business cluster, querying business data, and subscribing to data. Applications can interact with the business cluster in three ways (a minimal connection sketch follows this list):
   - Applications based on taosc: applications using native connections that connect directly to the business cluster; the default port is 6030.
   - Applications based on RESTful connections: applications that access the business cluster through RESTful interfaces and must connect through taosAdapter; the default port is 6041.
   - Applications based on WebSocket connections: applications using WebSocket connections, which must also connect through taosAdapter; the default port is 6041.
2. **Visualization/BI Tools**: TDengine supports seamless integration with various visualization and BI tools, such as Grafana and Power BI, as well as domestic visualization and BI tools. In addition, tools such as Grafana can be used to monitor the operational status of the TDengine cluster.
3. **Data Sources**: TDengine has powerful data access capabilities and can integrate with various data sources, such as MQTT, OPC-UA/DA, Kafka, AVEVA PI System, and AVEVA Historian. This enables TDengine to easily consolidate data from different sources, providing users with a comprehensive and unified data view.
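For illustration, the sketch below shows the native connection path using the Python connector (taospy). It assumes a locally deployed cluster with the default user, password, and ports, and only points at the taosAdapter path (port 6041) in a comment rather than demonstrating a specific REST or WebSocket client.

```python
# Native connection sketch: taosc-based applications talk to the business cluster on port 6030.
# Assumes taospy is installed and TDengine is running locally with default credentials.
import taos

conn = taos.connect(host="localhost", user="root", password="taosdata", port=6030)
try:
    result = conn.query("SELECT SERVER_VERSION()")
    print(result.fetch_all())
finally:
    conn.close()

# RESTful and WebSocket applications do not connect to port 6030 directly; they go through
# taosAdapter on port 6041 (for example, the REST API at http://localhost:6041/rest/sql).
```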

View File

@ -3,99 +3,92 @@ title: System Requirements
slug: /operations-and-maintenance/system-requirements
---
If you plan to build a time-series data platform using TDengine, you must conduct detailed planning of computing resources, storage resources, and network resources in advance to ensure that they meet the needs of your business scenario. Typically, TDengine runs multiple processes, including taosd, taosadapter, taoskeeper, taos-explorer, and taosx.
Among these processes, the resource consumption of taoskeeper, taos-explorer, taosadapter, and taosx is relatively low and usually does not require special attention. These processes also have low storage space requirements, and their CPU and memory usage is generally one-tenth to a few tenths of that of the taosd process (except in special scenarios such as data synchronization and historical data migration, where the TDengine technical support team provides one-on-one service). System administrators should regularly monitor the resource consumption of these processes and take appropriate action in a timely manner.
In this section, we will focus on the resource planning of the core process of the TDengine database engine—taosd. Reasonable resource planning will ensure the efficient operation of the taosd process, thereby improving the performance and stability of the entire time-series data platform.
## Server Memory Requirements
Each database can create a fixed number of vgroups, two by default. When creating a database, the number of vgroups can be specified through the `vgroups <num>` parameter, and the number of replicas is determined by the `replica <num>` parameter. Since each replica in a vgroup corresponds to a vnode, the memory occupied by the database is determined by the parameters vgroups, replica, buffer, pages, pagesize, and cachesize.
To help users better understand and configure these parameters, the database management section of the official TDengine documentation provides detailed explanations. Based on these parameters, the memory required by a database can be estimated as follows (the specific values should be adjusted based on actual conditions):

```text
vgroups × replica × (buffer + pages × pagesize + cachesize)
```
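As a rough illustration of this formula, the sketch below plugs in assumed values (vgroups = 10, replica = 3, buffer = 256 MB, pages = 256, pagesize = 4 KB, cachesize = 1 MB); these numbers are illustrative assumptions, not recommended settings.

```python
# Per-database memory estimate: vgroups * replica * (buffer + pages * pagesize + cachesize).
# All input values below are assumptions for illustration; substitute your actual parameters.
def estimate_db_memory_mb(vgroups, replica, buffer_mb, pages, pagesize_kb, cachesize_mb):
    per_vnode_mb = buffer_mb + pages * pagesize_kb / 1024 + cachesize_mb
    return vgroups * replica * per_vnode_mb


total_mb = estimate_db_memory_mb(vgroups=10, replica=3, buffer_mb=256,
                                 pages=256, pagesize_kb=4, cachesize_mb=1)
print(f"Estimated memory for this database: {total_mb:.0f} MB (~{total_mb / 1024:.1f} GB)")
# 10 * 3 * (256 + 1 + 1) = 7740 MB, i.e. roughly 7.6 GB spread across the cluster's dnodes.
```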
It should be noted that these memory resources are not borne by a single server but are shared by all dnodes in the cluster; that is, the burden falls on the server cluster where these dnodes reside. If there are multiple databases in the cluster, the total memory required must also sum the requirements of these databases. The situation is more complex when the dnodes in the cluster are not all deployed from the beginning but are added gradually as node loads increase. In that case, new databases may cause an uneven load distribution between old and new dnodes, and simple theoretical calculations are not accurate enough; it is necessary to consider the actual load status of each dnode for a comprehensive assessment.
System administrators can use the following SQL to query the `ins_vnodes` table in the `information_schema` database and view the distribution of all vnodes of all databases across the dnodes.
```sql
select * from information_schema.ins_vnodes;
dnode_id | vgroup_id | db_name | status | role_time | start_time | restored |
===============================================================================================
1 | 3 | log | leader | 2024-01-16 13:52:13.618 | 2024-01-16 13:52:01.628 | true |
1 | 4 | log | leader | 2024-01-16 13:52:13.630 | 2024-01-16 13:52:01.702 | true |
```
## Client Memory Requirements
1. Native Connection Method
Since client applications communicate with the server using taosc, there will be a certain amount of memory consumption. This memory consumption mainly arises from: SQL in write operations, caching of table metadata information, and inherent structural overhead. Assuming the maximum number of tables supported by the database service is N (the metadata overhead for each table created via a supertable is about 256B), the maximum number of concurrent write threads is T, and the maximum SQL statement length is S (usually 1MB). Based on these parameters, we can estimate the client's memory consumption (in MB).
M = (T × S × 3 + (N / 4096) + 100)
For example, if the maximum number of concurrent write threads for a user is 100 and the number of subtables is 10,000,000, then the minimum memory requirement for the client is as follows:
100 × 3 + (10000000 / 4096) + 100 ≈ 2841 (MB)
That is, configuring 3GB of memory is the minimum requirement.
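The same estimate written out as a short helper, using the example values above (T = 100 write threads, S = 1 MB maximum SQL length, N = 10,000,000 subtables):

```python
# Client-side memory estimate for native (taosc) connections, in MB:
# M = T * S * 3 + N / 4096 + 100
def estimate_client_memory_mb(write_threads, max_sql_mb, table_count):
    return write_threads * max_sql_mb * 3 + table_count / 4096 + 100


print(estimate_client_memory_mb(write_threads=100, max_sql_mb=1, table_count=10_000_000))
# ~2841 MB, i.e. roughly 3 GB of client memory as a minimum.
```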
2. RESTful/WebSocket Connection Method
When writing data via the WebSocket connection method, memory usage is usually small enough not to be a concern. However, when executing query operations, the WebSocket connection method does consume a certain amount of memory. The following discusses memory usage in query scenarios in more detail.
When the client initiates a query request via the WebSocket connection method, sufficient memory space must be reserved to receive and process the query results. Thanks to the characteristics of the WebSocket connection method, data can be received and decoded in batches, allowing for the processing of large amounts of data while ensuring that the memory required for each connection remains fixed.
The method for calculating client memory usage is relatively simple: it suffices to sum the read/write buffer capacity required for each connection. Typically, each connection will additionally occupy 8MB of memory. Therefore, if there are C concurrent connections, the total additional memory requirement is 8 × C (in MB).
For example, if the maximum number of concurrent connections for a user is 10, then the minimum additional memory requirement for the client is 80 (8 × 10) MB.
Compared to the WebSocket connection method, the RESTful connection method has a larger memory footprint. In addition to the memory required for the buffer, it also needs to consider the memory overhead for the response results for each connection. This memory overhead is closely related to the size of the JSON data in the response results, especially when querying a large amount of data, which can consume significant memory.
Since the RESTful connection method does not support batch retrieval of query data, this can lead to particularly large memory consumption when querying large result sets, potentially causing memory overflow. Therefore, in large projects, it is recommended to enable the batchfetch=true option to utilize the WebSocket connection method, allowing for stream-like result set returns to avoid the risk of memory overflow.
:::note

- It is recommended to use the RESTful/WebSocket connection methods to access the TDengine cluster rather than the native taosc connection method.
- In the vast majority of cases, the RESTful/WebSocket connection methods satisfy the business requirements for writing and querying. Because these methods do not depend on taosc, cluster server upgrades are decoupled from client connections, making server maintenance and upgrades much easier.

:::
## CPU Requirements
The CPU requirements for TDengine users are mainly influenced by the following three factors:
- Data Sharding: In TDengine, each CPU core can serve 1 to 2 vnodes. If a cluster is configured with 100 vgroups and uses a three-replica strategy, the recommended number of CPU cores for that cluster is 150 to 300 to achieve optimal performance.
- Data Writing: A single core of TDengine can handle at least 10,000 write requests per second. Notably, each write request can contain multiple records, and the computational resources consumed are similar whether writing a single record or writing 10 records simultaneously. Therefore, the more records written in each request, the higher the writing efficiency. For example, if a write request contains over 200 records, a single core can achieve a writing speed of 1 million records per second. However, this requires the front-end data collection system to have higher capability, as it needs to buffer records and write them in batches.
- Query Demand: Although TDengine provides efficient query capabilities, the varying nature of queries in each application scenario and changes in query frequency make it difficult to provide a specific number to measure the computational resources required for queries. Users need to write some query statements based on their actual scenarios to more accurately determine the necessary computational resources.
In summary, the CPU requirements for data sharding and data writing can be estimated. However, the computational resources consumed by query demand are more challenging to predict. During actual operation, it is recommended to maintain CPU usage below 50% to ensure system stability and performance. Once CPU usage exceeds this threshold, consideration should be given to adding new nodes or increasing the number of CPU cores to provide more computational resources.
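For the two estimable factors, a back-of-the-envelope sketch based on the rules of thumb above (1 to 2 vnodes per core, and at least 10,000 write requests per second per core); the workload numbers are assumptions for illustration, and query load still has to be measured with real statements.

```python
# Rough CPU sizing for sharding and ingestion, using the rules of thumb quoted above.
def cores_for_vnodes(vgroups, replica):
    vnodes = vgroups * replica
    # Each core serves 1 to 2 vnodes, so the core count falls between vnodes / 2 and vnodes.
    return vnodes // 2, vnodes


def cores_for_ingest(records_per_second, records_per_request):
    # A single core handles at least 10,000 write requests per second (a lower bound).
    requests_per_second = records_per_second / records_per_request
    return requests_per_second / 10_000


print(cores_for_vnodes(vgroups=100, replica=3))        # (150, 300) cores for data sharding
print(cores_for_ingest(records_per_second=1_000_000,
                       records_per_request=100))       # ~1 core for ingestion (lower bound)
```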
## Storage Requirements
Compared to traditional general-purpose databases, TDengine excels in data compression, achieving extremely high compression ratios. In most application scenarios, the compression ratio of TDengine is usually no less than 5 times; in certain cases it can reach 10 times or even hundreds of times, depending mainly on the data characteristics of the actual scenario.
To calculate the size of the raw data before compression, the following method can be used:
RawDataSize = numOfTables × rowSizePerTable × rowsPerTable
Example: For 10 million smart meters, each collecting one 20 B record every 15 minutes, the raw data volume for one year is about 7 TB. TDengine would require approximately 1.4 TB of storage space.
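The example figures can be reproduced with a short calculation; the device count, collection interval, and record size are the example's assumptions, and a 5x compression ratio is taken as the conservative lower bound mentioned above.

```shell
# Raw data volume for one year and the storage estimate after 5x compression.
numOfTables=10000000           # 10 million smart meters
rowSize=20                     # bytes per collected record
rowsPerYear=$((365 * 24 * 4))  # one record every 15 minutes = 35,040 rows per year
awk -v n=$numOfTables -v s=$rowSize -v r=$rowsPerYear 'BEGIN {
  raw = n * s * r                                    # bytes
  printf "raw data:      %.1f TB\n", raw / 1e12      # about 7.0 TB
  printf "after 5x comp: %.1f TB\n", raw / 5 / 1e12  # about 1.4 TB
}'
```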
To cater to different users' needs for data retention duration and storage cost, TDengine provides great flexibility. Users can customize their storage strategy through a series of database configuration parameters. Among them, the keep parameter is particularly noteworthy: it lets users set the maximum retention period for data in storage. This design allows users to control the data storage lifecycle according to the importance of the business and the timeliness of the data, achieving fine-grained control over storage costs.
However, relying solely on the keep parameter to optimize storage costs is still insufficient. For this reason, TDengine has further introduced a multi-tier storage strategy.
Additionally, to accelerate data processing, TDengine supports configuring multiple hard disks for concurrent data writing and reading. This parallel processing mechanism makes full use of multi-core CPU processing power and hard disk I/O bandwidth, significantly increasing data transfer speed and effectively addressing high-concurrency, large-data-volume application scenarios.
:::tip
How to Estimate the TDengine Compression Ratio

Users can use the performance testing tool taosBenchmark to assess the data compression effect of TDengine. By using the -f option to specify a write configuration file, taosBenchmark can write a specified amount of CSV sample data with the specified database parameters and table structure. After the write completes, execute the flush database command in the taos shell to force all data onto the hard disk. Then use the du command of the Linux operating system to get the size of the data directory of the specified vnode. Finally, divide the raw data size by the actual stored data size to obtain the real compression ratio.
The following command can be used to obtain the storage space occupied by TDengine.
```shell
taos> flush database <dbname>;
$ du -hd1 <dataDir>/vnode --exclude=wal
```
:::
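A minimal end-to-end sketch of this procedure follows; the configuration file name and database name are placeholders, and the path assumes the default dataDir of /var/lib/taos.

```shell
# 1. Write CSV sample data according to a taosBenchmark write configuration file (placeholder name).
taosBenchmark -f write_config.json

# 2. Force all cached data to disk (replace "test" with the database used in the configuration).
taos -s "flush database test"

# 3. Measure the on-disk size of the vnode data directories, excluding the WAL;
#    raw data size divided by this value gives the real compression ratio.
du -hd1 /var/lib/taos/vnode --exclude=wal
```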
## Multi-Tier Storage
In addition to storage capacity needs, users may also want to reduce storage costs at a given capacity. To meet this need, TDengine provides a multi-tier storage feature. It allows recently generated, frequently accessed data to be stored on high-cost storage devices, while older, less frequently accessed data is stored on low-cost storage devices. Through this approach, TDengine achieves the following goals:

- Reduce storage costs: By storing massive amounts of cold data on inexpensive storage devices, storage costs can be significantly reduced.
- Improve write performance: Each storage tier supports multiple mount points, and WAL files also support parallel writing across multiple level-0 mount points, which greatly improves write performance (sustained write speeds of up to 300 million measurements per second) and delivers very high I/O throughput even on mechanical hard disks (measured at up to 2 GB/s).
Users can decide the capacity allocation between high-speed and low-cost storage devices based on the ratio of hot to cold data.

TDengine's multi-tier storage feature also offers the following advantages:
- Easy maintenance: Once the mount points for each storage tier are configured, tasks such as data migration require no manual intervention, making storage expansion more flexible and convenient.
- Transparent to SQL: Whether or not the queried data spans tiers, a single SQL statement returns all the data, simple and efficient.
## Network Bandwidth Requirements
Network bandwidth requirements can be divided into two main parts: write requests and intra-cluster communication.
Write requests refer to business traffic, that is, the bandwidth needed for clients to send data to the server for writing.
The bandwidth requirements for intra-cluster communication can be further divided into two parts:
- Bandwidth required for normal communication between nodes, for example, the leader distributing data to followers, or taosAdapter distributing write requests to each vgroup leader.
- Additional internal communication bandwidth required for the cluster to respond to maintenance commands, such as data replication caused by switching from a single replica to three replicas, or data replication triggered by repairing a specified dnode.
To estimate inbound bandwidth requirements, we can use the following method:
Since taosc writes apply compression during RPC communication, their bandwidth requirements are lower than those of RESTful/WebSocket connections. Here, the bandwidth requirements for write requests are estimated based on RESTful/WebSocket connections.
Example: For 10 million smart meters, each collecting one 20 B record every 15 minutes, the average bandwidth requirement is about 0.22 MB/s.
Considering that smart meters may report data in bursts, the bandwidth requirement should be increased according to the actual project situation. Since RESTful requests are transmitted in plaintext, the actual bandwidth requirement should also be multiplied by a factor to obtain the estimated value.
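The 0.22 MB/s average above comes from a simple division; a sketch of the calculation, using the example's assumptions, is:

```shell
# Average inbound write bandwidth = devices x record size / reporting interval.
devices=10000000
recordBytes=20
intervalSeconds=$((15 * 60))
awk -v d=$devices -v b=$recordBytes -v i=$intervalSeconds 'BEGIN {
  printf "average bandwidth: %.2f MB/s\n", d * b / i / 1e6   # about 0.22 MB/s
}'
```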
:::tip
Network bandwidth and communication latency are crucial for the performance and stability of distributed systems, especially for the network connections between server nodes.

We strongly recommend allocating a dedicated VLAN for the network between server nodes to avoid interference from external traffic. For bandwidth, a 10 Gb network is preferred, or at least a 1 Gb network, with a packet loss rate kept below 0.01%.

If a distributed storage solution is adopted, the storage network and the intra-cluster communication network must be planned separately. A common practice is dual 10 Gb networks, that is, two independent 10 Gb networks, which ensures that storage traffic and intra-cluster communication do not interfere with each other and improves overall performance.

For inbound networks, in addition to ensuring sufficient access bandwidth, the packet loss rate must also be kept below 0.01%. This helps reduce errors and retransmissions during data transmission, improving the reliability and efficiency of the system.
:::
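As a simple sanity check of the packet loss requirement between two server nodes, a ping-based sample such as the following can be used; the hostname is a placeholder, and a dedicated network testing tool will give more rigorous results.

```shell
# Send a burst of probes and report the loss percentage; the target is below 0.01%.
# Intervals shorter than 0.2 s typically require root privileges.
sudo ping -c 10000 -i 0.01 -q h2.tdengine.com | grep "packet loss"
```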
## Number of Servers
Based on the above estimates of memory, CPU, storage, and network bandwidth, we can derive the total memory capacity, CPU cores, storage space, and network bandwidth required for the entire TDengine cluster. If the number of data replicas is not 1, the total demand must be multiplied by the number of replicas to obtain the actual required resources.
Thanks to TDengine's excellent horizontal scalability, the total resource demand is easy to calculate. Simply divide this total by the resource capacity of a single physical or virtual machine to roughly determine how many physical or virtual machines need to be purchased to deploy the TDengine cluster.
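A minimal sketch of this division, using made-up totals and per-machine capacities purely for illustration; the machine count is driven by whichever resource is most constrained.

```shell
# Hypothetical cluster totals (already multiplied by the replica count) and per-machine capacity.
totalCores=300; totalMemGB=1200; totalStorageTB=15
coresPerMachine=32; memPerMachineGB=128; storagePerMachineTB=4

awk -v c=$totalCores -v cm=$coresPerMachine \
    -v m=$totalMemGB -v mm=$memPerMachineGB \
    -v s=$totalStorageTB -v sm=$storagePerMachineTB 'BEGIN {
  need = c / cm
  if (m / mm > need) need = m / mm
  if (s / sm > need) need = s / sm
  printf "machines needed: %d\n", (need == int(need)) ? need : int(need) + 1
}'
```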
## Network Port Requirements
The table below lists the ports used by common TDengine interfaces or components; all of them can be modified through parameters in the configuration file.
| Interface or Component Name | Port | Protocol |
|:---------------------------------------------------------:|:----------:|:-------------:|
| Native Interface (taosc) | 6030 | TCP |
| RESTful Interface | 6041 | TCP |
| WebSocket Interface | 6041 | TCP |
| taosKeeper | 6043 | TCP |
| StatsD Format Write Interface | 6044 | TCP/UDP |
| Collectd Format Write Interface | 6045 | TCP/UDP |
| OpenTSDB Telnet Format Write Interface | 6046 | TCP |
| Collectd using OpenTSDB Telnet Format Write Interface | 6047 | TCP |
| Icinga2 using OpenTSDB Telnet Format Write Interface | 6048 | TCP |
| tcollector using OpenTSDB Telnet Format Write Interface | 6049 | TCP |
| taosX | 6050, 6055 | TCP |
| taosExplorer | 6060 | TCP |
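
If a host firewall is enabled, the listed ports must be reachable between cluster nodes and from clients. A sketch for firewalld-based systems is shown below; keep only the ports of the components you actually deploy.

```shell
# Open the default TDengine ports on a firewalld-managed host (adjust to your deployment).
sudo firewall-cmd --permanent --add-port=6030/tcp                      # native interface (taosc)
sudo firewall-cmd --permanent --add-port=6041/tcp                      # RESTful / WebSocket
sudo firewall-cmd --permanent --add-port=6043/tcp                      # taosKeeper
sudo firewall-cmd --permanent --add-port=6044/tcp --add-port=6044/udp  # StatsD format writes
sudo firewall-cmd --reload
```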

---
title: Deploying Your Cluster
slug: /operations-and-maintenance/deploy-your-cluster
---
Since TDengine was designed from the outset with a distributed architecture, it has powerful horizontal scalability to meet growing data processing demands. Therefore, TDengine supports clusters and open-sources this core capability. Users can choose among four deployment methods according to their actual environment and needs: manual deployment, Docker deployment, Kubernetes deployment, and Helm deployment.
## Manual Deployment
### Deploying taosd
taosd is the most important service component in a TDengine cluster. This section describes the steps to manually deploy a taosd cluster.
#### 1. Clear Data
If the physical nodes used to set up the cluster contain previous test data or have had another version of TDengine (such as 1.x/2.x) installed, delete it and clear all data first.
#### 2. Check Environment
Before deploying the TDengine cluster, thoroughly check the network settings of all dnodes and of the physical nodes where the applications reside. The steps are as follows:
- **Step 1:** Execute the command `hostname -f` on each physical node and confirm that all node hostnames are unique. This step can be skipped for the node where the application driver resides.
- **Step 2:** Execute the command `ping host` on each physical node, where `host` is the hostname of the other physical nodes. This step checks network connectivity between the current node and the other physical nodes. If any node cannot be pinged, immediately check the network and DNS settings. For Linux, check the `/etc/hosts` file; for Windows, check the `C:\Windows\system32\drivers\etc\hosts` file. Poor network connectivity will prevent the cluster from forming, so be sure to resolve this issue.
- **Step 3:** Repeat the above network checks on the physical nodes running the applications. If the network is poor, the application will not be able to connect to the taosd service. In that case, carefully check the DNS settings or hosts file on the physical node where the application resides to ensure they are configured correctly.
- **Step 4:** Check the ports to ensure that all hosts in the cluster can communicate over TCP on port 6030. A check sketch is shown after this list.
By following these steps, you can ensure smooth network communication between all nodes, laying a solid foundation for successfully deploying the TDengine cluster.
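The checks above can be run from any node with commands such as the following; the hostnames match the examples used later in this section, and nc is only one of several tools that can probe TCP connectivity.

```shell
# Step 1: confirm this node's FQDN (it must be unique across the cluster).
hostname -f

# Step 2: verify name resolution and reachability of another node.
ping -c 3 h2.tdengine.com

# Step 4: check that taosd's TCP port 6030 is reachable on a peer node.
nc -zv h2.tdengine.com 6030
```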
#### 3. Installation
To ensure consistency and stability within the cluster, install the same version of TDengine on all physical nodes.
#### 4. Modify Configuration
Modify the TDengine configuration file (the configuration files of all nodes need to be modified). Assuming the endpoint of the first dnode to be started is `h1.tdengine.com:6030`, the cluster-related parameters are as follows.
```shell
# firstEp is the first dnode that each dnode connects to after its initial startup
firstEp h1.tdengine.com:6030
# Must be configured to the FQDN of this dnode; if the machine has only one hostname, this line can be commented out or deleted
fqdn h1.tdengine.com
# Configure the port for this dnode, default is 6030
serverPort 6030
```
The parameters that must be modified are firstEp and fqdn. The firstEp configuration should be identical on every dnode, while fqdn must be set to the value of the dnode it is on. Other parameters need not be modified unless you are clear about why they should be changed.
For a dnode that wants to join the cluster, the cluster-related parameters listed in the table below must be set exactly the same as on the existing nodes; any mismatch may prevent the dnode from joining the cluster.
| Parameter Name   | Meaning                                                          |
|:----------------:|:----------------------------------------------------------------:|
| statusInterval   | Interval at which the dnode reports status to the mnode           |
| timezone         | Time zone                                                         |
| locale           | System locale information and encoding format                     |
| charset          | Character set encoding                                            |
| ttlChangeOnWrite | Whether the TTL expiration time changes with table modification   |
#### 5. Start
Start the first dnode, such as `h1.tdengine.com`, following the steps above. Then execute `taos` in the terminal to start the TDengine CLI and run the command `show dnodes` to view all dnodes currently in the cluster.
```shell
taos> show dnodes;
 id | endpoint             | vnodes | support_vnodes | status | create_time             | note |
=================================================================================================
  1 | h1.tdengine.com:6030 |      0 |           1024 | ready  | 2022-07-16 10:50:42.673 |      |
```
You can see that the endpoint of the dnode just started is `h1.tdengine.com:6030`, which is the firstEp of the new cluster.
#### 6. Add dnode
Following the steps above, start taosd on each physical node. Each dnode needs to set the firstEp parameter in its taos.cfg file to the endpoint of the first node of the new cluster, which in this case is `h1.tdengine.com:6030`. On the machine where the first dnode resides, run `taos` in the terminal to open the TDengine CLI, connect to the TDengine cluster, and execute the following SQL.
```shell
create dnode "h2.tdengine.com:6030"
```
This command adds the endpoint of the new dnode to the cluster's endpoint list. The `fqdn:port` must be enclosed in double quotes, otherwise an error will occur at runtime. Remember to replace the example `h2.tdengine.com:6030` with the endpoint of the new dnode. Then execute the following SQL to check whether the new node has joined successfully. If the dnode to be added is currently offline, see the "Common Issues" section at the end of this chapter for a solution.
```shell
show dnodes;
```
In the output, confirm that the fqdn and port of the listed dnode match the endpoint you just added. If they do not match, correct it to the right endpoint. By repeating the steps above, you can add new dnodes one by one to expand the cluster and improve overall performance. Following the correct process when adding new nodes helps maintain the stability and reliability of the cluster.
:::tip
- Any dnode that has already joined the cluster can serve as the firstEp for subsequently added nodes. The firstEp parameter only takes effect when a dnode joins the cluster for the first time; after joining, the dnode saves the latest endpoint list of the mnodes and no longer depends on this parameter. The firstEp parameter in the configuration file is mainly used for client connections: if no parameters are set for the TDengine CLI, it connects to the node specified by firstEp by default.
- Two dnodes that have not configured the firstEp parameter will run independently after startup. In this case, it is not possible to join one dnode to the other to form a cluster.
- TDengine does not allow merging two independent clusters into a new cluster.
:::
#### 7. Add mnode
When a TDengine cluster is created, the first dnode automatically becomes the cluster's mnode, responsible for managing and coordinating the cluster. To achieve high availability of the mnode, mnodes must be created manually on subsequent dnodes. Note that a cluster can have at most three mnodes, and only one mnode can be created on each dnode. When the number of dnodes in the cluster reaches or exceeds three, you can create additional mnodes for the existing cluster. On the first dnode, log in to TDengine through the TDengine CLI program `taos`, then execute the following SQL.
```shell
create mnode on dnode <dnodeId>
```
Be sure to replace the dnodeId in the example above with the sequence number of the newly created dnode (which can be obtained by executing the `show dnodes` command). Finally, execute the following `show mnodes` command to check whether the newly created mnode has successfully joined the cluster.
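For example, from the TDengine CLI:

```shell
show mnodes;
```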
:::tip
While building a TDengine cluster, if a new node always shows as offline after you execute the `create dnode` command to add it, follow these steps to troubleshoot.
- **Step 1:** Check whether the taosd service on the new node has started correctly. You can confirm this by checking the log files or using the `ps` command.
- **Step 2:** If the taosd service has started, check whether the new node's network connection is working and whether the firewall has been turned off. Network problems or firewall settings may prevent the node from communicating with the other nodes in the cluster.
- **Step 3:** Use the command `taos -h fqdn` to connect to the new node, then execute the `show dnodes` command. This shows the running state of the new node as an independent cluster. If the displayed list is inconsistent with what is shown on the main node, the new node has probably formed a single-node cluster on its own. To resolve this, first stop the taosd service on the new node, then clear all files in the dataDir directory specified in its taos.cfg configuration file (this deletes all data and configuration information related to that node), and finally restart the taosd service. This restores the new node to its initial state, ready to rejoin the main cluster.
:::
### Deploying taosAdapter
This section describes how to deploy taosAdapter, which provides RESTful and WebSocket access to the TDengine cluster and therefore plays a critical role in the cluster.
1. **Installation**
taosAdapter can be used once the installation of TDengine Enterprise is complete. If you want to deploy taosAdapter on different servers, TDengine Enterprise must be installed on those servers as well.
2. **Single Instance Deployment**
Deploying a single instance of taosAdapter is straightforward; for the specific commands and configuration parameters, refer to the taosAdapter section of the manual.
3. **Multi-Instance Deployment**
The main purposes of deploying multiple instances of taosAdapter are as follows:
- To increase the throughput of the cluster and prevent taosAdapter from becoming a system bottleneck.
- To improve the robustness and high availability of the cluster, so that when one instance fails for some reason, requests entering the business system can be automatically routed to other instances.
When deploying multiple instances of taosAdapter, you need to address load balancing to avoid overloading some nodes while others sit idle. During deployment, multiple single instances are deployed separately, and the deployment steps for each instance are exactly the same as for a single instance. The critical part is then configuring Nginx. Below is a validated best-practice configuration; you only need to replace the endpoints with the correct addresses for your environment. For the meaning of each parameter, refer to the official Nginx documentation.
```nginx
user root;
worker_processes auto;
error_log /var/log/nginx_error.log;
events {
use epoll;
worker_connections 1024;
}
http {
access_log off;
map $http_upgrade $connection_upgrade {
location ~* {
proxy_pass http://keeper;
proxy_read_timeout 60s;
proxy_next_upstream error http_502 http_500 non_idempotent;
}
}
location ~* {
proxy_pass http://explorer;
proxy_read_timeout 60s;
proxy_next_upstream error http_502 http_500 non_idempotent;
}
}
upstream dbserver {
server 172.16.214.202:6043 ;
server 172.16.214.203:6043 ;
}
upstream explorer {
ip_hash;
server 172.16.214.201:6060 ;
server 172.16.214.202:6060 ;
```
### Deploying taosKeeper
To use the monitoring capabilities of TDengine, taosKeeper is a required component. For monitoring, see [TDinsight](../../tdengine-reference/components/tdinsight/); for details on deploying taosKeeper, see the [taosKeeper Reference Manual](../../tdengine-reference/components/taoskeeper/).
### Deploying taosX
To use TDengine's data ingestion capabilities, you need to deploy the taosX service. For a detailed description and deployment instructions, refer to the Enterprise Edition reference manual.
### Deploying taosX-Agent
For some data sources such as Pi and OPC, network conditions and data source access restrictions may prevent taosX from accessing the data sources directly. In such cases, a proxy service, taosX-Agent, needs to be deployed. For a detailed description and deployment instructions, refer to the Enterprise Edition reference manual.
### Deploying taos-Explorer
TDengine provides the ability to manage TDengine clusters visually. To use the graphical interface, the taos-Explorer service needs to be deployed. For a detailed description and deployment instructions, refer to the [taos-Explorer Reference Manual](../../tdengine-reference/components/taosexplorer/).
## Docker Deployment
This section describes how to start the TDengine service in a Docker container and access it. You can use environment variables on the `docker run` command line or in a `docker-compose` file to control the behavior of the services in the container.
### Starting TDengine
The TDengine image starts with the HTTP service activated by default. Use the following command to create a containerized TDengine environment with the HTTP service available.
```shell
docker run -d --name tdengine \
-p 6041:6041 tdengine/tdengine
```
The detailed parameter explanations are as follows:
- `/var/lib/taos`: default directory for TDengine data files; can be modified through the configuration file.
- `/var/log/taos`: default directory for TDengine log files; can be modified through the configuration file.
The above command starts a container named `tdengine` and maps port 6041, used by the HTTP service in the container, to port 6041 on the host. The following command can verify whether the HTTP service provided by the container is available.
```shell
curl -u root:taosdata -d "show databases" localhost:6041/rest/sql
Query OK, 2 rows in database (0.033802s)
```
Within the container, the TDengine CLI and the various connectors (such as JDBC-JNI) connect to the server via the container's hostname. Accessing TDengine inside the container from outside the container is more complex; using the RESTful/WebSocket connection method is the simplest approach.
### Starting TDengine in Host Network Mode
Run the following command to start TDengine in host network mode, which allows connections to be established using the host's FQDN instead of the container's hostname.
```shell
docker run -d --name tdengine --network host tdengine/tdengine
```
This method has the same effect as starting TDengine on the host with the `systemctl` command. If the TDengine client is already installed on the host, you can access the TDengine service directly with the following command.
```shell
$ taos
taos> show dnodes;
Query OK, 1 rows in database (0.010654s)
```
### Starting TDengine with Specified Hostname and Port
Use the following command to set the `TAOS_FQDN` environment variable or the `fqdn` configuration item in `taos.cfg` so that TDengine establishes connections on the specified hostname. This approach provides greater flexibility for deploying TDengine.
```shell
docker run -d \
   -e TAOS_FQDN=tdengine \
   -p 6030:6030 \
   -p 6041-6049:6041-6049 \
tdengine/tdengine
```
First, the above command starts a TDengine service in the container that listens on the hostname `tdengine`, maps the container's port 6030 to port 6030 on the host, and maps the container's port range [6041, 6049] to the same port range on the host. If that port range is already occupied on the host, you can modify the command to specify a free port range on the host.
Second, make sure that the hostname `tdengine` is resolvable in `/etc/hosts`. The following command saves the correct configuration information to the hosts file.
```shell
echo 127.0.0.1 tdengine |sudo tee -a /etc/hosts
```
Finally, you can access the TDengine service with the TDengine CLI, using `tdengine` as the server address, as follows.
```shell
taos -h tdengine -P 6030
```
If `TAOS_FQDN` is set to the same value as the host's hostname, the effect is the same as starting TDengine in host network mode.
## Kubernetes Deployment
As a time-series database designed for cloud-native architectures, TDengine natively supports Kubernetes deployment. This section describes step by step how to create a highly available TDengine cluster for production use with YAML files, focusing on common operations for TDengine in a Kubernetes environment. It assumes that readers have a basic understanding of Kubernetes, are comfortable running common kubectl commands, and are familiar with concepts such as statefulset, service, and pvc. Readers unfamiliar with these concepts can refer to the official Kubernetes website.
To meet the high-availability requirements, the cluster must satisfy the following:
- **Three or more dnodes:** Multiple vnodes of the same vgroup in TDengine cannot be placed on the same dnode, so when creating a database with three replicas, the number of dnodes must be three or more.
- **Three mnodes:** The mnode is responsible for managing the entire cluster. TDengine starts with one mnode by default; if the dnode hosting this mnode goes offline, the entire cluster becomes unavailable.
- **Three replicas of the database:** TDengine's replica configuration is at the database level, so three replicas keep the cluster operational even if any one of the three dnodes goes offline. If two dnodes go offline, the cluster becomes unavailable because RAFT cannot complete the election. (Enterprise Edition: in disaster recovery scenarios, a node whose data files are damaged can be recovered by restarting a dnode.)
### Prerequisites
To deploy and manage a TDengine cluster with Kubernetes, the following preparations must be made.
- This article applies to Kubernetes v1.19 and above.
- This article uses the kubectl tool for installation and deployment; install the necessary software in advance.
- Kubernetes has been installed and deployed and can access or update the necessary container repositories or other services.
### Configure Service
Create a Service configuration file, `taosd-service.yaml`. The service name `metadata.name` (here "taosd") will be used in the next step. First add the ports used by TDengine, then set the label `app` (here "tdengine") in the selector. A full manifest sketch is shown after the fragment below.
```yaml
---
  selector:
    app: "tdengine"
```
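For reference, a complete taosd Service manifest might look like the sketch below, written out here from a shell heredoc so it produces the `taosd-service.yaml` file applied later; the service name and ports follow the description above, and ports for components you do not expose can be dropped.

```shell
# Sketch of a complete taosd-service.yaml; adjust the port list to your deployment.
cat > taosd-service.yaml <<'EOF'
---
apiVersion: v1
kind: Service
metadata:
  name: "taosd"
  labels:
    app: "tdengine"
spec:
  ports:
    - name: tcp6030
      protocol: "TCP"
      port: 6030
    - name: tcp6041
      protocol: "TCP"
      port: 6041
  selector:
    app: "tdengine"
EOF
```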
### Stateful Service StatefulSet
Following Kubernetes' recommendations for the various deployment types, we will use a StatefulSet as the deployment resource type for TDengine. Create the file `tdengine.yaml`, in which `replicas` sets the number of cluster nodes to 3. The node time zone is set to China (Asia/Shanghai), and each node is allocated 5 Gi of standard storage; you can modify these values according to your actual situation.
Pay special attention to the `startupProbe` configuration. After a dnode's Pod has been offline for a while and then restarts, the newly started dnode is temporarily unavailable. If the `startupProbe` configuration is too small, Kubernetes will consider the Pod abnormal and keep restarting it, so the dnode's Pod will restart repeatedly and never reach a normal state. A probe sketch is shown after the excerpt below.
```yaml
---
  template:
    metadata:
      labels:
        app: "tdengine"
    spec:
      affinity:
        podAntiAffinity:
          preferredDuringSchedulingIgnoredDuringExecution:
            - weight: 100
              podAffinityTerm:
                labelSelector:
                  matchExpressions:
                    - key: app
                      operator: In
                      values:
                        - tdengine
                topologyKey: kubernetes.io/hostname
      containers:
        - name: "tdengine"
          image: "tdengine/tdengine:3.2.3.0"
storage: "5Gi"
```
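The probe settings themselves are not shown in the excerpt above. One way to sketch a tolerant startupProbe is the patch below; the port, period, and failure threshold are assumptions to be tuned to how long a restarted dnode needs to become ready, and the same fields can equally be written directly under the container in tdengine.yaml.

```shell
# Sketch: give the tdengine container a generous startup window (values are placeholders).
kubectl patch statefulset tdengine -n tdengine-test --type='json' -p='[
  {"op": "add", "path": "/spec/template/spec/containers/0/startupProbe", "value": {
    "tcpSocket": {"port": 6030},
    "periodSeconds": 10,
    "failureThreshold": 60
  }}
]'
```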
### Deploying the TDengine Cluster with kubectl
First, create the corresponding namespace `tdengine-test` and the PVC, making sure there is enough remaining space with `storageClassName` set to `standard`. Then execute the following commands in order:
```shell
kubectl apply -f taosd-service.yaml -n tdengine-test
kubectl apply -f tdengine.yaml -n tdengine-test
```
The above configuration will create a three-node TDengine cluster, with dnodes configured automatically. You can use the `show dnodes` command to view the current nodes in the cluster:
```shell
kubectl exec -it tdengine-0 -n tdengine-test -- taos -s "show dnodes"
Query OK, 3 row(s) in set (0.001853s)
```
View the current mnode:
```shell
kubectl exec -it tdengine-1 -n tdengine-test -- taos -s "show mnodes\G"
kubectl exec -it tdengine-0 -n tdengine-test -- taos -s "create mnode on dnode 2"
kubectl exec -it tdengine-0 -n tdengine-test -- taos -s "create mnode on dnode 3"
```
View the mnodes again:
```shell
kubectl exec -it tdengine-1 -n tdengine-test -- taos -s "show mnodes\G"
Query OK, 3 row(s) in set (0.003108s)
```
### Port Forwarding
The `kubectl port-forward` feature allows applications to access the TDengine cluster running in the Kubernetes environment.
```shell
kubectl port-forward -n tdengine-test tdengine-0 6041:6041 &
```
Use the `curl` command to verify the availability of the TDengine REST API on port 6041.
```shell
curl -u root:taosdata -d "show databases" 127.0.0.1:6041/rest/sql
@ -553,13 +562,13 @@ TDengine supports cluster expansion:
kubectl scale statefulsets tdengine -n tdengine-test --replicas=4
```
In the above command, the parameter `--replicas=4` indicates that the TDengine cluster should be expanded to 4 nodes. After execution, first check the status of the Pods:
```shell
kubectl get pod -l app=tdengine -n tdengine-test -o wide
```
The output is as follows:
```text
NAME READY STATUS RESTARTS AGE IP NODE NOMINATED NODE READINESS GATES
@ -569,13 +578,13 @@ tdengine-2 1/1 Running 0 5h16m 10.244.1.224 node85 <
tdengine-3 1/1 Running 0 3m24s 10.244.2.76 node86 <none> <none>
```
At this point, the Pod status is still Running. The dnode status in the TDengine cluster can only be seen after the Pod status changes to Ready:
```shell
kubectl exec -it tdengine-3 -n tdengine-test -- taos -s "show dnodes"
```
The dnode list of the expanded four-node TDengine cluster:
```text
taos> show dnodes
@ -588,15 +597,12 @@ taos> show dnodes
Query OK, 4 row(s) in set (0.003628s)
```
### Cleaning Up the Cluster
:::warning
When deleting PVCs, pay attention to the PV `persistentVolumeReclaimPolicy`. It is recommended to set it to Delete, so that when a PVC is deleted, its PV is automatically cleaned up along with the underlying CSI storage resources. If the policy to automatically clean up PVs when PVCs are deleted is not configured, then after the PVCs are deleted, manually cleaning up the PVs may not release the corresponding CSI storage resources.
:::
To completely remove the TDengine cluster, you need to delete the `statefulset`, `svc`, and `pvc` resources separately, and finally delete the namespace.
```shell
kubectl delete statefulset -l app=tdengine -n tdengine-test
@ -605,17 +611,17 @@ kubectl delete pvc -l app=tdengine -n tdengine-test
kubectl delete namespace tdengine-test
```
### Cluster Disaster Recovery Capabilities
For high availability and reliability of TDengine in a Kubernetes environment, hardware damage and disaster recovery can be discussed on two levels:
- The disaster recovery capabilities of the underlying distributed block storage, which include multiple replicas of block storage. Popular distributed block storage solutions such as Ceph have multi-replica capabilities, extending storage replicas to different racks, cabinets, rooms, or data centers (or you can directly use the block storage services provided by public cloud vendors).
- TDengine's own disaster recovery: in TDengine Enterprise, when a dnode goes offline permanently (for example, due to physical disk damage and data loss), its work can be recovered automatically by starting a new blank dnode.
## Deploying the TDengine Cluster with Helm
Helm is the package manager for Kubernetes.
While deploying the TDengine cluster directly with Kubernetes, as in the previous section, is already simple, Helm provides even more powerful capabilities.
### Installing Helm
@ -626,31 +632,32 @@ chmod +x get_helm.sh
./get_helm.sh
```
Helm will use `kubectl` and the kubeconfig configuration to operate Kubernetes; you can refer to the Rancher installation of Kubernetes for the configuration.
### Installing the TDengine Chart
The TDengine Chart has not yet been released to the Helm repository. Currently, it can be downloaded directly from GitHub:
```shell
wget https://github.com/taosdata/TDengine-Operator/raw/3.0/helm/tdengine-3.0.2.tgz
```
Get the current Kubernetes storage classes:
```shell
kubectl get storageclass
```
In `minikube`, the default is `standard`. Then use the `helm` command to install:
```shell
helm install tdengine tdengine-3.0.2.tgz \
--set storage.className=<your storage class name> \
--set image.tag=3.2.3.0
```
In the `minikube` environment, you can set a smaller capacity to avoid exceeding the available disk space:
```shell
helm install tdengine tdengine-3.0.2.tgz \
@ -681,22 +688,22 @@ kubectl --namespace default exec $POD_NAME -- \
select * from t1;"
```
### Configuring Values
TDengine supports customization through `values.yaml`. You can use `helm show values` to get the complete list of values supported by the TDengine Chart:
```shell
helm show values tdengine-3.0.2.tgz
```
You can save the output as `values.yaml`, modify parameters in it such as the number of replicas, storage class name, capacity size, and TDengine configuration, and then install the TDengine cluster with the following command:
```shell
helm install tdengine tdengine-3.0.2.tgz -f values.yaml
```
The full list of parameters is as follows:
```yaml
# Default values for tdengine.
@ -838,7 +845,7 @@ taoscfg:
# 199: output debug, warning and error to both screen and file
# 207: output trace, debug, warning and error to both screen and file
#
# debug flag for all log type, take effect when non-zero value
#TAOS_DEBUG_FLAG: "143"
# generate core file when service crash
@ -847,8 +854,8 @@ taoscfg:
### Expansion
For expansion, refer to the explanation in the previous section; some additional operations are needed for the Helm deployment. First, retrieve the name of the StatefulSet from the deployment:
```shell
export STS_NAME=$(kubectl get statefulset \
@ -856,20 +863,20 @@ export STS_NAME=$(kubectl get statefulset \
-o jsonpath="{.items[0].metadata.name}")
```
The expansion operation is straightforward; just increase the replica count. The following command expands TDengine to three nodes:
```shell
kubectl scale --replicas 3 statefulset/$STS_NAME
```
Use the commands `show dnodes` and `show mnodes` to check whether the expansion was successful.
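For example, reusing the `$POD_NAME` variable obtained in the earlier Helm test commands (any pod of the StatefulSet works):
```shell
kubectl --namespace default exec $POD_NAME -- taos -s "show dnodes"
kubectl --namespace default exec $POD_NAME -- taos -s "show mnodes"
```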
### Cleaning Up the Cluster
With Helm management, the cleanup operation also becomes simple:
```shell
helm uninstall tdengine
```
However, Helm will not automatically remove PVCs; you need to retrieve and delete them manually.
View File
@ -1,17 +1,17 @@
---
title: Maintain Your Cluster
slug: /operations-and-maintenance/maintain-your-cluster
---
This section introduces the advanced cluster maintenance techniques provided in TDengine Enterprise, which can help TDengine clusters run more robustly and efficiently over the long term.
## Node Management
For managing cluster nodes, please refer to [Manage Nodes](../../tdengine-reference/sql-manual/manage-nodes/).
## Data Compaction
TDengine is designed for a wide range of write scenarios, but in many of them its storage can lead to data amplification or holes in data files. This not only reduces storage efficiency but can also impact query performance. To address these issues, TDengine Enterprise provides a data compaction feature, DATA COMPACT, which reorganizes stored data files, removes holes and invalid data, and improves data organization, thereby enhancing both storage and query efficiency. The data compaction feature was first released in version 3.0.3.0 and has undergone multiple iterations and optimizations since then, so it is recommended to use the latest version.
### Syntax
@ -23,87 +23,87 @@ KILL COMPACT compact_id;
### Effects
- Scans and compresses all data files of all vgroups in the specified database.
- COMPACT removes deleted data and the data of deleted tables.
- COMPACT merges multiple STT files.
- You can specify the start time of the data to be compacted using the `start with` keyword.
- You can specify the end time of the data to be compacted using the `end with` keyword.
- The COMPACT command returns the ID of the COMPACT task.
- COMPACT tasks run asynchronously in the background; you can check their progress using the SHOW COMPACTS command.
- The SHOW command returns the ID of the COMPACT task, and you can terminate a COMPACT task using the KILL COMPACT command (see the example below).
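A sketch of a typical workflow run through the taos CLI; the database name, time range, and compact ID are placeholders:
```shell
# start an asynchronous compaction of data within a time range; the statement returns a compact ID
taos -s "compact database mydb start with '2024-01-01 00:00:00' end with '2024-06-30 23:59:59'"
# check the progress of running compaction tasks
taos -s "show compacts"
# terminate a compaction task by the ID returned above (1 is a placeholder)
taos -s "kill compact 1"
```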
### Additional Notes
- COMPACT is asynchronous; after executing the COMPACT command, it returns without waiting for the COMPACT to finish. If a previous COMPACT has not completed and another COMPACT task is initiated, it waits for the previous task to finish before returning.
- COMPACT may block writes, especially in databases with `stt_trigger = 1`, but it does not block queries.
## Vgroup Leader Rebalancing
After one or more nodes in a multi-replica cluster restart due to upgrades or other reasons, the load among the dnodes may become unbalanced; in extreme cases, all vgroup leaders may be located on the same dnode. To resolve this, you can use the commands below, which were first released in version 3.0.4.0. It is recommended to use the latest version whenever possible.
```SQL
balance vgroup leader; # Rebalance the leaders of all vgroups
balance vgroup leader on <vgroup_id>; # Rebalance the leader of a specific vgroup
balance vgroup leader database <database_name>; # Rebalance the leaders of all vgroups in the specified database
```
### Functionality
This command attempts to distribute the leaders of one or all vgroups evenly across their respective replica nodes. It forces the vgroups to re-elect their leaders, changing the leader in the process, and thereby eventually achieving an even distribution of leaders.
### Notes
Vgroup elections are inherently random, so the even distribution produced by re-election is also probabilistic and may not be perfectly uniform. The side effect of this command is that it affects queries and writes: during a vgroup's re-election, from the start of the election until a new leader is elected, the vgroup can neither be written to nor queried. The election process usually completes within seconds. All vgroups are re-elected one by one.
## Restoring Data Nodes
If a data node (dnode) in the cluster loses or damages all of its data (for example, due to disk failure or directory deletion), you can restore some or all of the logical nodes on that dnode with the `restore dnode` command. This feature relies on data replication from other replicas, so it only works when the cluster has at least three dnodes and the number of replicas is three.
```SQL
restore dnode <dnode_id>; # Restore the mnode, all vnodes, and the qnode on the dnode
restore mnode on dnode <dnode_id>; # Restore the mnode on the dnode
restore vnode on dnode <dnode_id>; # Restore all vnodes on the dnode
restore qnode on dnode <dnode_id>; # Restore the qnode on the dnode
```
### Limitations
- This feature is based on recovery using the existing replication capability; it is not disaster recovery or backup recovery. Therefore, for the mnode or vnode to be restored, the prerequisite for using this command is that the other two replicas of that mnode or vnode are still functioning normally.
- This command cannot repair individual damaged or lost files in the data directory. For example, if individual files or data blocks in an mnode or vnode are damaged, it is not possible to restore only that file or data block. In this case, you can choose to completely clear the data of that mnode/vnode and then perform the restore.
## Splitting Virtual Groups
When a vgroup experiences high CPU or disk resource utilization because it has too many subtables, and additional dnodes have been added, you can use the `split vgroup` command to split that vgroup into two. After the split, the two newly created vgroups share the read and write services originally provided by the one vgroup. This command was first released in version 3.0.6.0, and it is recommended to use the latest version whenever possible.
```SQL
split vgroup <vgroup_id>
```
### Notes
- For a single-replica vgroup, the total disk space usage of historical time-series data may double after the split. Therefore, before performing this operation, ensure that the cluster has sufficient CPU and disk resources by adding dnodes, so as to avoid resource shortages.
- This command is a database-level transaction; while it is executing, other management transactions for the current database will be rejected. Other databases in the cluster are not affected.
- Read and write services remain available while the split task is running; however, a perceptible brief interruption in reads and writes may occur.
- Streams and subscriptions are not supported during the split. After the split finishes, historical WAL is cleared.
- The split operation tolerates node crashes and restarts; it does not tolerate node disk failures.
## Online Cluster Configuration Update
Starting from version 3.1.1.0, TDengine Enterprise supports online hot updates of the important dnode configuration parameter `supportVnodes`. This parameter was originally configured in the `taos.cfg` configuration file and indicates the maximum number of vnodes that the dnode can support. When a database is created, new vnodes need to be allocated; when a database is deleted, its vnodes are destroyed.
However, an online update of `supportVnodes` does not persist; after the system restarts, the maximum allowed number of vnodes is still determined by the `supportVnodes` value configured in `taos.cfg`.
If the `supportVnodes` value set by an online update or in the configuration file is less than the current actual number of vnodes on the dnode, the existing vnodes are not affected. However, whether a new database can be created successfully still depends on the effective `supportVnodes` value.
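A hedged sketch of the online update; the dnode ID and value are examples, and the exact `ALTER DNODE` syntax should be checked against the node management section of the SQL manual for your version:
```shell
# raise the maximum number of vnodes dnode 1 may host to 128 for the current run
taos -s "alter dnode 1 'supportVnodes' '128'"
# to make the change survive a restart, also update supportVnodes in taos.cfg on that dnode
```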
## Dual Replicas
Dual replicas are a special high-availability configuration for databases; this section provides special instructions for their use and maintenance. This feature was first released in version 3.3.0.0, and it is recommended to use the latest version whenever possible.
### Viewing the Status of Vgroups
You can view the status of each vgroup in a dual-replica database with the following SQL commands:
```SQL
show arbgroups;
select * from information_schema.ins_arbgroups;
@ -115,32 +115,32 @@ select * from information_schema.ins_arbgroups;
```
The `is_sync` column has two possible values:
- 0: The vgroup data is not synchronized. In this state, if one vnode in the vgroup becomes inaccessible, the other vnode cannot be designated as the `AssignedLeader`, and the vgroup cannot provide service.
- 1: The vgroup data is synchronized. In this state, if one vnode in the vgroup becomes inaccessible, the other vnode can be designated as the `AssignedLeader`, and the vgroup can continue to provide service.
The `assigned_dnode` column:
- Indicates the DnodeId of the vnode designated as the AssignedLeader.
- Displays NULL when no AssignedLeader is designated.
The `assigned_token` column:
- Indicates the token of the vnode designated as the AssignedLeader.
- Displays NULL when no AssignedLeader is designated.
### Best Practices
1. Fresh Deployment
The main value of dual replicas lies in saving storage costs while ensuring a certain level of high availability and reliability. In practice, the recommended configuration is:
- An N-node cluster (where N >= 3).
- N-1 of the dnodes are responsible for storing time-series data.
- The Nth dnode does not participate in storing or reading time-series data, that is, it holds no replicas; this can be achieved by setting its `supportVnodes` parameter to 0.
- The dnode that stores no data replicas has lower CPU/memory resource usage, so it can run on a lower-specification server.
2. Upgrading from Single Replica
Assuming there is an existing single-replica cluster with N nodes (N >= 1), you can upgrade it to a dual-replica cluster. Ensure that N >= 3 after the upgrade and that the `supportVnodes` parameter of the newly added node is set to 0. After the cluster upgrade is complete, use the `alter database replica 2` command to change the number of replicas for a specific database.
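For example, assuming a database named `power` (a placeholder), the replica change can be issued from the taos CLI:
```shell
taos -s "alter database power replica 2"
```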
View File
@ -1,5 +1,5 @@
---
title: Monitor Your Cluster
slug: /operations-and-maintenance/monitor-your-cluster
---
@ -13,172 +13,160 @@ import imgMonitor6 from '../assets/monitor-your-cluster-06.png';
import imgMonitor7 from '../assets/monitor-your-cluster-07.png';
import imgMonitor8 from '../assets/monitor-your-cluster-08.png';
To ensure stable operation of the cluster, TDengine integrates a variety of monitoring metric collection mechanisms, which are aggregated through taosKeeper. taosKeeper is responsible for receiving this data and writing it into a separate TDengine instance, which can operate independently of the monitored TDengine cluster. The two core components of TDengine, taosd (the database engine) and taosX (the data access platform), use the same monitoring architecture for runtime monitoring, but their monitoring metrics are designed differently.
As for how to obtain and use this monitoring data, users can use third-party monitoring tools such as Zabbix to retrieve the saved system monitoring data, seamlessly integrating the operational status of TDengine into their existing IT monitoring systems. Alternatively, users can use the TDinsight plugin provided by TDengine to visually display and manage this monitoring information on the Grafana platform, as shown in the figure below. This gives users flexible monitoring options to meet different operational needs.
<figure>
<Image img={imgMonitor1} alt="Managing monitoring information"/>
<figcaption>Figure 1. Managing monitoring information</figcaption>
</figure>
## Configuring taosKeeper
Since all monitoring data of TDengine is reported and stored through taosKeeper, this section first introduces the configuration of taosKeeper.
The taosKeeper configuration file is located by default at `/etc/taos/taoskeeper.toml`. For detailed configuration, see the [Reference Manual](../../tdengine-reference/components/taoskeeper/#configuration-file). The most critical configuration item is `database`, which determines which database in the target system stores the collected monitoring data.
## Monitoring taosd
### Monitoring taosd with TDinsight
To simplify the configuration work for users monitoring TDengine, TDengine provides a Grafana plugin named TDinsight. This plugin works in conjunction with taosKeeper to monitor various performance metrics of TDengine in real time.
By integrating Grafana with the TDengine data source plugin, TDinsight can read the monitoring data collected by taosKeeper. This allows users to intuitively view key metrics such as the status of the TDengine cluster, node information, read/write requests, and resource usage on the Grafana platform, achieving data visualization.
Below are detailed instructions for using TDinsight to help you make the most of this powerful tool.
#### Prerequisites
To use TDinsight successfully, the following conditions must be met:
- TDengine is installed and running normally.
- taosAdapter is installed and running normally.
- taosKeeper is installed and running normally.
- Grafana is installed and running normally; the following instructions are based on Grafana version 11.0.0.
Also, record the following information:
- The RESTful interface address of taosAdapter, such as `http://www.example.com:6041`.
- The authentication information for the TDengine cluster, including the username and password.
#### Importing the Dashboard
The TDengine data source plugin has been submitted to the Grafana official website. For instructions on installing the TDengine data source plugin and configuring the data source, please refer to [Install Grafana Plugin and Configure Data Source](../../third-party-tools/visualization/grafana/#install-grafana-plugin-and-configure-data-source). After installing the plugin and creating the data source, you can proceed to import the TDinsight dashboard.
On the Grafana "Home" -> "Dashboards" page, click the "New" -> "Import" button located in the upper right corner to enter the dashboard import page, which supports the following two import methods:
On the "Home" -> "Dashboards" page of Grafana, click the "New" -> "import" button in the upper right corner to enter the Dashboard import page, which supports the following two import methods.
- Dashboard ID: 18180.
- Dashboard URL: [https://grafana.com/grafana/dashboards/18180-tdinsight-for-3-x/](https://grafana.com/grafana/dashboards/18180-tdinsight-for-3-x/)
After filling in the Dashboard ID or Dashboard URL, click the "Load" button and follow the wizard to complete the import. Once the import succeeds, the "TDinsight for 3.x" dashboard appears on the Dashboards list page. Click it to see the various metric panels created in TDinsight, as shown in the figure below:
<figure>
<Image img={imgMonitor2} alt="TDinsight interface"/>
<figcaption>Figure 2. TDinsight interface</figcaption>
</figure>
:::note
In the "Log from" dropdown list in the upper left corner of the TDinsight interface, you can select the `log` database.
:::
### TDengine V3 Monitoring Data
The TDinsight dashboard data comes from the `log` database (the default database for storing monitoring data, which can be modified in the taosKeeper configuration file). The "TDinsight for 3.x" dashboard queries the monitoring metrics of taosd and taosAdapter.
- For taosd monitoring metrics, refer to [taosd Monitoring Metrics](../../tdengine-reference/components/taosd/#taosd-monitoring-metrics).
- For taosAdapter monitoring metrics, refer to [taosAdapter Monitoring Metrics](../../tdengine-reference/components/taosadapter/#taosadapter-monitoring-metrics).
## Monitoring taosX
taosX is the core component that provides zero-code data access capabilities in TDengine, and monitoring it is also important. Monitoring taosX is similar to monitoring TDengine: metrics collected from the service are written into a specified database through taosKeeper, then visualized and alerted on using Grafana dashboards. The monitorable objects include:
1. The taosX process.
2. All running taosx-agent processes.
3. Various connector subprocesses running on the taosX or taosx-agent side.
4. Various data writing tasks in progress.
### Prerequisites
1. taosd, taosAdapter, and taosKeeper have all been deployed and started successfully.
2. taosX service monitoring is configured correctly. For configuration details, see "Configuring taosX Monitoring" below; the service must start successfully.
   **Note**: taosX in TDengine Enterprise version 3.2.3.0 or above includes this functionality. If taosX is installed separately, it must be version 1.5.0 or above.
3. Deploy Grafana, install the TDengine Datasource plugin, and configure the data source. Refer to [Install Grafana Plugin and Configure Data Source](../../third-party-tools/visualization/grafana/#install-grafana-plugin-and-configure-data-source).
   **Note**: You need to install the Grafana plugin [TDengine Datasource v3.5.0](https://grafana.com/grafana/plugins/tdengine-datasource/) or a higher version.
### Configuring taosX Monitoring
The monitoring-related configuration in the taosX configuration file (default `/etc/taos/taosx.toml`) is as follows:
```toml
[monitor]
# FQDN of the taosKeeper service, no default value
# fqdn = "localhost"
# port of the taosKeeper service, default 6043
# port = 6043
# how often to send metrics to taosKeeper, default every 10 seconds. Only values from 1 to 10 are valid.
# interval = 10
```
Each configuration item also has a corresponding command-line option and environment variable, as explained in the following table:
| Configuration File Item | Command-Line Option | Environment Variable | Meaning | Value Range | Default Value |
| ----------------------- | ------------------- | -------------------- | ------- | ----------- | ------------- |
| fqdn | --monitor-fqdn | MONITOR_FQDN | FQDN of the taosKeeper service | | No default value; configuring fqdn enables monitoring |
| port | --monitor-port | MONITOR_PORT | Port of the taosKeeper service | | 6043 |
| interval | --monitor-interval | MONITOR_INTERVAL | Interval in seconds at which taosX sends metrics to taosKeeper | 1-10 | 10 |
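A sketch of enabling monitoring from the command line, assuming taosX is started manually in server mode; the `serve` subcommand name should be checked against your installed version:
```shell
# pass the monitoring settings as command-line options
taosx serve --monitor-fqdn localhost --monitor-port 6043 --monitor-interval 10
# or supply the same settings through environment variables
MONITOR_FQDN=localhost MONITOR_PORT=6043 MONITOR_INTERVAL=10 taosx serve
```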
### Monitoring taosX with TDinsight
"TDinsight for TaosX" is a Grafana dashboard specifically created for monitoring TaosX. You need to import this panel before use.
"TDinsight for taosX" is a Grafana dashboard specifically created for monitoring taosX. You need to import this dashboard before using it.
#### Accessing the Dashboard
1. In the Grafana interface menu, click on "Data sources," and then select the configured TDengine data source.
2. In the data source configuration interface, select the "Dashboard" tab, and then import the "TDinsight for TaosX" panel (import it for the first time). Here is a sample image:
1. In the Grafana interface menu, click "Data sources", then select the TDengine data source that has been configured.
2. In the data source configuration interface, select the "Dashboard" tab, and then import the "TDinsight for taosX" dashboard (you need to import it the first time you use it). Below is an example image:
<figure>
<Image img={imgMonitor3} alt=""/>
</figure>
Each row of the dashboard represents one monitored object or category of objects. The top row is for taosX monitoring, followed by the agent monitoring rows, and finally the monitoring of the various data writing tasks.
:::note
- If you do not see any data after opening this dashboard, you likely need to click the database list in the upper left corner (the "Log from" dropdown menu) and switch to the database where the monitoring data is stored.
- The dashboard automatically creates as many agent rows as there are agents with data in the database (as shown in the figure above).
:::
#### Monitoring Examples
1. taosX monitoring example image
<figure>
<Image img={imgMonitor4} alt=""/>
</figure>
2. Agent monitoring example image
<figure>
<Image img={imgMonitor5} alt=""/>
</figure>
3. TDengine 2 data source monitoring example image
<figure>
<Image img={imgMonitor6} alt=""/>
</figure>
:::info
The monitoring dashboard displays only some of the monitoring metrics for data writing tasks. More comprehensive metrics, with detailed explanations of each one, are available on the Explorer page.
:::
4. TDengine 3 data source monitoring example image
<figure>
<Image img={imgMonitor7} alt=""/>
</figure>
5. Other data source monitoring example image
<figure>
<Image img={imgMonitor8} alt=""/>
@ -186,4 +174,4 @@ Each configuration also has corresponding command-line options and environment v
#### Limitations
Monitoring-related configurations take effect only when taosX is running in server mode.
View File
@ -1,49 +1,49 @@
---
title: Back Up and Restore Data
slug: /operations-and-maintenance/back-up-and-restore-data
---
To prevent data loss and accidental deletion, TDengine provides comprehensive data backup, restoration, fault tolerance, and real-time remote data synchronization features to ensure the security of data storage. This section briefly describes the backup and restoration capabilities.
## Data Backup and Restoration Using taosdump
`taosdump` is an open-source tool that supports backing up data from a running TDengine cluster and restoring the backed-up data to the same or another running TDengine cluster. `taosdump` can back up a database as a logical data unit, or back up the data records within a specified time period in a database. When using `taosdump`, you can specify the directory path for the backup; if no path is specified, `taosdump` backs up the data to the current directory.
The following is an example of using `taosdump` to perform a data backup.
```shell
taosdump -h localhost -P 6030 -D dbname -o /file/path
```
After executing the above command, `taosdump` connects to the TDengine cluster at `localhost:6030`, queries all data in the database `dbname`, and backs up the data to `/file/path`.
When using `taosdump`, if the specified storage path already contains data files, `taosdump` prompts the user and exits immediately to avoid overwriting data. This means the same storage path can only be used for one backup. If you see such a prompt, proceed with caution to avoid accidental data loss.
To restore data files from a specified local path to a running TDengine cluster, run the `taosdump` command with the appropriate command-line parameters and the data file path. Below is an example of using `taosdump` to perform a data restoration.
```shell
taosdump -i /file/path -h localhost -P 6030
```
After executing the above command, `taosdump` connects to the TDengine cluster at `localhost:6030` and restores the data files from `/file/path` to the TDengine cluster.
## Data Backup and Restoration Based on TDengine Enterprise
TDengine Enterprise provides an efficient incremental backup feature, with the following process.
Step 1: Access the `taosExplorer` service through a browser. The address is usually port 6060 of the IP address where the TDengine cluster is located, such as `http://localhost:6060`.
Step 2: In the `taosExplorer` service page, go to the "System Management - Backup" page and add a new data backup task. Fill in the name of the database to be backed up and the backup storage file path in the task configuration, then start the data backup after the task is created. Three parameters can be configured on the data backup configuration page:
- Backup cycle: Required. Configures the time interval between data backup runs; you can select daily, every 7 days, or every 30 days from the dropdown menu. The backup task starts at 00:00 of the corresponding backup cycle.
- Database: Required. Configures the name of the database to be backed up (the database's `wal_retention_period` parameter must be greater than 0).
- Directory: Required. Configures the path in the environment where `taosX` is running to which the data will be backed up, such as `/root/data_backup`.
Step 3: After the data backup task is completed, find the created task in the task list on the same page and perform a one-click restoration to restore the data to TDengine.
Compared with `taosdump`, if multiple backups of the same data are performed to the same storage path, TDengine Enterprise is not only highly efficient but also performs incremental processing, so each backup task completes quickly. Because `taosdump` always performs full backups, TDengine Enterprise can significantly reduce system overhead in scenarios with large data volumes and is also more convenient.
## Common Error Troubleshooting
1. If the task fails to start and reports the following error:
@ -54,9 +54,9 @@ Caused by:
[0x000B] Unable to establish connection
```
The cause is an abnormal connection to the data source port. Check whether the FQDN of the data source is reachable and whether port 6030 is accessible.
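A few generic reachability checks can help narrow this down; the hostname is a placeholder, and `nc` may need to be installed separately:
```shell
# verify that the FQDN resolves and the host answers
ping -c 1 source-fqdn.example.com
# verify that the native connection port 6030 is reachable
nc -zv source-fqdn.example.com 6030
```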
2. If the task fails to start when using a WebSocket connection and reports the following error:
```text
Error: tmq to td task exec error
@ -67,13 +67,13 @@ Caused by:
2: failed to lookup address information: Temporary failure in name resolution
```
When using a WebSocket connection, you may encounter various types of errors; the error information appears after "Caused by". Here are some possible errors:
- "Temporary failure in name resolution": DNS resolution error; check whether the IP or FQDN is accessible.
- "IO error: Connection refused (os error 111)": Port access failure; check whether the port is correctly configured or whether it is open and accessible.
- "IO error: received corrupt message": Message parsing failure, which may be due to enabling SSL with wss, but the source port does not support it.
- "HTTP error: *": Possibly connected to the wrong `taosAdapter` port or misconfigured LSB/Nginx/Proxy.
- "WebSocket protocol error: Handshake not finished": WebSocket connection error, usually due to an incorrect configured port.
- "Temporary failure in name resolution": DNS resolution error, check if the IP or FQDN can be accessed normally.
- "IO error: Connection refused (os error 111)": Port access failure, check if the port is configured correctly or if it is open and accessible.
- "IO error: received corrupt message": Message parsing failed, possibly because SSL was enabled using wss, but the source port does not support it.
- "HTTP error: *": Possibly connected to the wrong taosAdapter port or incorrect LSB/Nginx/Proxy configuration.
- "WebSocket protocol error: Handshake not finished": WebSocket connection error, usually because the configured port is incorrect.
3. If the task fails to start and reports the following error:
@ -84,10 +84,10 @@ Caused by:
[0x038C] WAL retention period is zero
```
This is caused by an incorrect WAL configuration in the source database, which prevents it from being subscribed to.
Solution:
Modify the WAL configuration of the source database:
```sql
alter database test wal_retention_period 3600;
View File
@ -3,33 +3,33 @@ title: Fault Tolerance and Disaster Recovery
slug: /operations-and-maintenance/fault-tolerance-and-disaster-recovery
---
To prevent data loss and accidental deletion, TDengine provides comprehensive data backup, recovery, fault tolerance, and real-time remote data synchronization features to ensure the security of stored data. This section briefly describes fault tolerance and disaster recovery in TDengine.
## Fault Tolerance
TDengine uses the WAL (Write-Ahead Logging) mechanism to provide fault tolerance and ensure high data reliability. When TDengine receives a request packet from an application, it first writes the raw data packet to a database log file; the corresponding WAL is deleted only after the data has been successfully written to the database data file. This ensures that TDengine can recover data from the log file after a power outage or any other event that causes a service restart, avoiding data loss. The relevant configuration parameters are as follows:
- wal_level: WAL level. 1 means write the WAL but do not execute fsync; 2 means write the WAL and execute fsync. The default value is 1.
- wal_fsync_period: When wal_level is set to 2, the period at which fsync is executed. Setting wal_fsync_period to 0 means fsync is executed immediately after every write.
To guarantee 100% that no data is lost, set wal_level to 2 and wal_fsync_period to 0. This reduces write speed; however, if the number of write threads on the application side reaches a certain level (more than 50), write performance remains quite good, only about 30% lower than with wal_fsync_period set to 3000 ms.
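As a minimal sketch (the database name `power` is illustrative, and it is assumed these options are set per database, like the wal_retention_period example above), the settings might look as follows:
```sql
-- Strictest durability: write the WAL and fsync immediately after every write.
create database power wal_level 2 wal_fsync_period 0;
-- Alternatively, relax fsync to a 3000 ms period for higher write throughput.
alter database power wal_fsync_period 3000;
```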
## Disaster Recovery
To achieve disaster recovery, deploy two TDengine Enterprise clusters in two different data centers and use their data replication capability. Suppose there are two clusters, Cluster A and Cluster B, where Cluster A is the source cluster, handling write requests and providing query services. Cluster B consumes the new data written to Cluster A in real time and synchronizes it locally. If a disaster renders the data center hosting Cluster A unavailable, Cluster B can be activated as the primary node for data writing and querying.
The following steps describe how to easily set up a data disaster recovery system between two TDengine Enterprise clusters:
- Step 1: Create a database `db1` in Cluster A and continuously write data to it.
- Step 2: Access the taosExplorer service of Cluster A through a web browser. The address is usually port 6060 of the IP where the TDengine cluster is located, such as `http://localhost:6060`.
- Step 3: Access Cluster B and create a database `db2` with the same parameter configuration as database `db1` in Cluster A (see the sketch after this list).
- Step 4: Access the taosExplorer service of Cluster B through a web browser, find `db2` on the "Data Explorer" page, and obtain the database DSN from the "View Database Configuration" option, such as `taos+ws://root:taosdata@clusterB:6041/db2`.
- Step 5: On the "System Management - Data Synchronization" page of the taosExplorer service, add a new data synchronization task. In the task configuration, enter the source database `db1` and the DSN of the target database `db2`. After the task is created, start the data synchronization.
- Step 6: Access Cluster B and observe that data from database `db1` in Cluster A is continuously written to database `db2` in Cluster B, until the data volumes in the two clusters are roughly equal. At this point, a simple data disaster recovery system based on TDengine Enterprise has been set up.
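As a minimal sketch of Steps 1 and 3 above (the database options shown are illustrative; what matters is that `db2` uses the same parameter configuration as `db1`):
```sql
-- On Cluster A (source): create db1 and start writing data to it.
create database db1 duration 10d vgroups 4;
-- On Cluster B (target): create db2 with the same parameter configuration.
create database db2 duration 10d vgroups 4;
```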
View File
@ -3,36 +3,33 @@ title: Advanced Storage Options
slug: /operations-and-maintenance/advanced-storage-options
---
This section introduces the multi-level storage feature unique to TDengine Enterprise, which stores recent, frequently accessed (hot) data on high-speed media and old, infrequently accessed (cold) data on low-cost media, achieving the following objectives:
- **Reduce Storage Costs**: By tiering data, vast amounts of extremely cold data can be stored on inexpensive media, bringing significant economic benefits.
- **Improve Write Performance**: Each storage level supports multiple mount points, and the WAL mechanism also supports parallel writing across multiple mount points at level 0, greatly improving write performance (measured to support continuous writing of over 300 million data points per second) and achieving very high disk I/O throughput even on mechanical hard drives (measured up to 2 GB/s).
- **Easy Maintenance**: After the mount points for each storage level are configured, tasks such as data migration are performed by the system without manual intervention, and storage expansion is more flexible and convenient.
- **Transparent to SQL**: Regardless of whether the queried data spans levels, a single SQL statement returns all the data, simply and efficiently.
All storage media involved in multi-level storage are local storage devices. In addition to local storage devices, TDengine Enterprise also supports using object storage (S3) to keep the coldest batch of data on the cheapest media, further reducing storage costs while still allowing queries when necessary; where the data is stored is transparent to SQL. Support for object storage was first released in version 3.3.0.0, and using the latest version is recommended.
## Multi-Level Storage
### Configuration Method
Multi-level storage supports three levels, with up to 128 mount points configurable per level.
:::tip
Typical configuration schemes include: level 0 configured with multiple mount points, each corresponding to a single SAS hard drive; level 1 configured with multiple mount points, each corresponding to one or more SATA hard drives; level 2 configured with S3 storage or other inexpensive network storage.
:::
TDengine multi-level storage is configured as follows (in the configuration file `/etc/taos/taos.cfg`):
```shell
dataDir [path] <level> <primary>
```
- path: The folder path of the mount point.
- level: The storage level of the media, with values of 0, 1, or 2. Level 0 stores the newest data, level 1 stores the next newest data, and level 2 stores the oldest data; if omitted, the default is 0. Data flows between storage levels as follows: level 0 -> level 1 -> level 2. Multiple hard drives can be mounted at the same storage level, and data files at that level are distributed across all of its hard drives. Note that data movement between storage levels is performed automatically by the system; no user intervention is required.
- primary: Whether this is the primary mount point, with values of 0 (no) or 1 (yes); if omitted, the default is 1.
Only one primary mount point (level=0, primary=1) is allowed in the configuration. For example, the following configuration can be used:
```shell
dataDir /mnt/data1 0 1
@ -43,109 +40,141 @@ dataDir /mnt/data5 2 0
dataDir /mnt/data6 2 0
```
:::note
1. Multi-level storage does not allow cross-level configuration. Valid configurations are: level 0 only, level 0 + level 1, and level 0 + level 1 + level 2. Configuring only level 0 and level 2 without level 1 is not allowed.
2. Manually removing a mounted disk that is in use is prohibited, and mounted disks currently do not support non-local network disks.
:::
### Load Balancing
In multi-level storage there is only one primary mount point, which holds the most important metadata in the system. The main directories of all vnodes on a dnode also reside on that dnode's primary mount point, which limits the dnode's write performance to the I/O throughput of a single disk.
Starting from TDengine 3.1.0.0, if a dnode is configured with multiple level 0 mount points, the main directories of all vnodes on that dnode are distributed evenly across all level 0 mount points, allowing these mount points to share the write load.
When network I/O and other processing resources are not bottlenecks, testing with an optimized cluster configuration shows that the overall write capacity of the system scales linearly with the number of level 0 mount points; that is, as the number of level 0 mount points increases, the overall write capacity of the system increases proportionally.
### Same-Level Mount Point Selection Strategy
Generally, when TDengine needs to select a mount point from a given level to create a new data file, it uses a round-robin strategy. In practice, however, disks may differ in capacity, or have the same capacity but different amounts of data already written, which leads to an imbalance in available space; the round-robin strategy may therefore pick a disk with very little space remaining.
To address this issue, version 3.1.1.0 introduced a new configuration option, `minDiskFreeSize`. When the available space on a disk is less than or equal to this threshold, the disk is no longer selected for creating new data files. The unit of this option is bytes, and its value must be greater than 2 GB; that is, mount points with less than 2 GB of available space are skipped.
Starting from version 3.3.2.0, a new configuration option, `disable_create_new_file`, controls whether creating new files on a given mount point is prohibited. The default value is `false`, meaning new files can be created on every mount point.
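A minimal `taos.cfg` sketch for the disk-selection threshold might look as follows; the value shown (about 10 GB, expressed in bytes) is purely illustrative, and the accepted value format should be confirmed against the configuration reference:
```shell
# Illustrative only: skip mount points with 10 GB or less of free space
minDiskFreeSize 10737418240
```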
## Object Storage
This section describes how to use S3 object storage with TDengine Enterprise. The feature is implemented on top of the generic S3 SDK, with compatibility adjustments to the access parameters for the various S3 platforms, so it can access object storage services such as MinIO, Tencent Cloud COS, and Amazon S3. With appropriate parameter configuration, most cold time-series data can be stored in an S3 service.
:::note
When used together with multi-level storage, data stored on each storage medium may, according to the configured rules, be backed up to remote object storage and the local data files deleted.
:::
### Configuration Method
In the configuration file `/etc/taos/taos.cfg`, add the parameters for S3 access:
| Parameter Name | Description |
| :------------------- | :----------------------------------------------------------- |
| s3EndPoint | The COS service domain name for the user's region, supporting http and https. The bucket's region must match the endpoint, or access will be denied. |
| s3AccessKey | Colon-separated user SecretId:SecretKey. For example: AKIDsQmwsfKxTo2A6nGVXZN0UlofKn6JRRSJ:lIdoy99ygEacU7iHfogaN2Xq0yumSm1E |
| s3BucketName | Bucket name. The hyphen is followed by the AppId registered for the COS service. The AppId is unique to COS (AWS and Alibaba Cloud do not have it) and must be included as part of the bucket name, separated by a hyphen. Parameter values are string types and do not require quotes. For example: test0711-1309024725 |
| s3UploadDelaySec | How long a data file must remain unchanged before being uploaded to S3, in seconds. Minimum: 1; maximum: 2592000 (30 days); default: 60. |
| s3PageCacheSize | Number of S3 page cache pages. Minimum: 4; maximum: 1024*1024*1024; default: 4096. |
| s3MigrateIntervalSec | The trigger cycle for automatically uploading local data files to S3, in seconds. Minimum: 600; maximum: 100000; default: 3600. |
| s3MigrateEnabled | Whether to perform S3 migration automatically. The default value is 0, meaning automatic migration is disabled; it can be set to 1. |
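Putting the parameters above together, an illustrative `taos.cfg` fragment might look like the following. The endpoint is a placeholder, the access key and bucket name are the example values from the table, and `s3MigrateEnabled` is set to 1 here to turn automatic migration on:
```text
s3EndPoint   https://cos.ap-beijing.myqcloud.com
s3AccessKey  AKIDsQmwsfKxTo2A6nGVXZN0UlofKn6JRRSJ:lIdoy99ygEacU7iHfogaN2Xq0yumSm1E
s3BucketName test0711-1309024725
s3UploadDelaySec     60
s3MigrateIntervalSec 3600
s3MigrateEnabled     1
```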
### Check Configuration Parameter Availability
After configuring S3 in `taos.cfg`, you can check whether the configured S3 service is available with the `checks3` parameter of the `taosd` command:
```shell
taosd --checks3
```
If the configured S3 service is inaccessible, this command prints the corresponding error message.
### Create a Database Using S3
After the configuration is complete, you can start the TDengine cluster and create a database that uses S3, for example:
```sql
create database demo_db duration 1d s3_keeplocal 3d;
```
After time-series data is written to the database `demo_db`, data older than 3 days is automatically split into chunks and stored in S3.
By default, the mnode issues an S3 data migration check command every hour. If there is time-series data that needs to be uploaded, it is automatically split into chunks and stored in S3. You can also trigger this operation manually with the following SQL command:
```sql
s3migrate database <db_name>;
```
The detailed DB parameters are shown in the table below:
| # | Parameter | Default | Minimum | Maximum | Description |
| :--- | :----------- | :------ | :------ | :------ | :----------------------------------------------------------- |
| 1 | s3_keeplocal | 365 | 1 | 365000 | The number of days data is kept locally, i.e., how long data files are retained on local disks before they become eligible for upload to S3. Default unit: days; the units m (minutes), h (hours), and d (days) are supported. |
| 2 | s3_chunkpages | 131072 | 131072 | 1048576 | The size threshold for uploaded objects, in TSDB pages (the same unit as the tsdb_pagesize parameter); it cannot be modified. |
| 3 | s3_compact | 1 | 0 | 1 | Whether to automatically run a compact operation when a TSDB file group is first uploaded to S3. |
### Estimation of Read and Write Operations for Object Storage
The cost of using object storage services is related to the amount of data stored and the number of requests. Below, we discuss the processes of data upload and download separately.
#### Data Upload
When the TSDB time-series data exceeds the time specified by the `s3_keeplocal` parameter, the related data files will be split into multiple file blocks, each with a default size of 512 MB (`s3_chunkpages * tsdb_pagesize`). Except for the last file block, which is retained on the local file system, the rest of the file blocks are uploaded to the object storage service.
```math
Upload Count = Data File Size / (s3_chunkpages * tsdb_pagesize) - 1
```
When creating a database, you can adjust the size of each file block through the `s3_chunkpages` parameter, thereby controlling the number of uploads for each data file.
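For example, with the default `s3_chunkpages` of 131072 and a `tsdb_pagesize` of 4 KB, each file block is 512 MB, so a hypothetical 2 GB data file would be uploaded as:
```math
Upload Count = 2048 MB / (512 MB per block) - 1 = 3
```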
Other file types, such as head, stt, and sma files, are kept on the local file system to speed up queries that rely on precomputed data.
#### Data Download
During query operations, if data in object storage needs to be accessed, TSDB does not download the entire data file. Instead, it calculates the position of the required data within the file and only downloads the relevant data into the TSDB page cache, then returns the data to the query execution engine. Subsequent queries first check the page cache to see if the data has already been cached. If the data is cached, it is used directly from the cache, thus effectively reducing the number of times data is downloaded from object storage.
Adjacent multiple data pages are downloaded as a single data block from object storage to reduce the number of downloads. The size of each data page is specified by the `tsdb_pagesize` parameter when creating the database, with a default of 4 KB.
```math
Download Count = Number of Data Blocks Needed for Query - Number of Cached Data Blocks
```
The page cache is a memory cache, and data needs to be re-downloaded after a node restart. The cache uses an LRU (Least Recently Used) strategy, and when there is not enough cache space, the least recently used data will be evicted. The size of the cache can be adjusted through the `s3PageCacheSize` parameter; generally, the larger the cache, the fewer the downloads.
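For reference, with the default `s3PageCacheSize` of 4096 pages and a 4 KB page size, the page cache occupies roughly:
```math
Cache Size = 4096 pages * 4 KB = 16 MB
```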
## Azure Blob Storage
This section describes how to use Microsoft Azure Blob object storage with TDengine Enterprise. This feature is an extension of the object storage feature described in the previous section and additionally depends on the S3 gateway provided by the Flexify service. With appropriate parameter configuration, most cold time-series data can be stored in the Azure Blob service.
### Flexify Service
Flexify is an application in the Azure Marketplace that allows S3-compatible applications to store data in Azure Blob Storage through the standard S3 API. Multiple Flexify services can be used to set up multiple S3 gateways for the same Blob storage.
For deployment instructions, see the [Flexify](https://azuremarketplace.microsoft.com/en-us/marketplace/apps/flexify.azure-s3-api?tab=Overview) application page.
### Configuration Method
In the configuration file `/etc/taos/taos.cfg`, add the parameters for S3 access:
```text
s3EndPoint http://20.191.157.23,http://20.191.157.24,http://20.191.157.25
s3AccessKey FLIOMMNL0:uhRNdeZMLD4wo,ABCIOMMN:uhRNdeZMD4wog,DEFOMMNL049ba:uhRNdeZMLD4wogXd
s3BucketName td-test
```
- Multiple items can be configured for `s3EndPoint` and `s3AccessKey`, but the number of items must be the same; separate multiple items with commas. Only one item is allowed for `s3BucketName`.
- Each `{s3EndPoint, s3AccessKey}` pair corresponds to one S3 service, and a service is selected at random each time an S3 request is made.
- All S3 services are assumed to point to the same data source, and operations on the various S3 services are completely equivalent.
- If an operation fails on one S3 service, it switches to another service; if all services fail, the last generated error code is returned.
- At most 10 S3 service configurations are supported.
### Without Relying on Flexify Service
The user interface is the same as S3, but the configuration of the following three parameters is different:
| # | Parameter | Example Value | Description |
| :--- | :------------ | :----------------------------------------- | :----------------------------------------------------------- |
| 1 | s3EndPoint | `https://fd2d01c73.blob.core.windows.net` | Blob URL |
| 2 | s3AccessKey | fd2d01c73:veUy/iRBeWaI2YAerl+AStw6PPqg== | Colon-separated user accountId:accountKey |
| 3 | s3BucketName | test-container | Container name |
Here `fd2d01c73` is the account ID. The Microsoft Blob storage service supports only HTTPS, not HTTP.
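Using the example values from the table above (replace the account ID, account key, and container name with your own), the corresponding `taos.cfg` fragment would be:
```text
s3EndPoint   https://fd2d01c73.blob.core.windows.net
s3AccessKey  fd2d01c73:veUy/iRBeWaI2YAerl+AStw6PPqg==
s3BucketName test-container
```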
View File
@ -1,27 +1,27 @@
---
title: Manage Users and Permissions
slug: /operations-and-maintenance/manage-users-and-permissions
---
By default, TDengine is configured with only one user, root, who has the highest privileges. TDengine supports access control for system resources, databases, tables, views, and topics. The root user can set different access permissions for each user on different resources. This section introduces user and permission management in TDengine. User and permission management is a feature of TDengine Enterprise.
## User Management
### Creating Users
Only the root user can create users, with the syntax as follows.
```sql
create user user_name pass'password' [sysinfo {1|0}]
```
The related parameters are described as follows.
- user_name: Maximum length is 23 bytes.
- password: Maximum length is 128 bytes. Valid characters are letters, numbers, and special characters other than single quotes, double quotes, apostrophes, backslashes, and spaces; the password cannot be empty.
- sysinfo: Whether the user can view system information. 1 means the user can view it; 0 means the user cannot. System information includes server configuration information, information about various nodes such as dnodes and query nodes (qnodes), and storage-related information. The default is to allow viewing system information.
The following SQL creates a user named test with the password 123456 who can view system information.
```sql
create user test pass '123456' sysinfo 1
@ -29,13 +29,13 @@ create user test pass '123456' sysinfo 1
### Viewing Users
To view user information in the system, use the following SQL.
```sql
show users;
```
You can also query user information from the system table `information_schema.ins_users`, as shown below.
```sql
select * from information_schema.ins_users;
@ -43,7 +43,7 @@ select * from information_schema.ins_users;
### Modifying User Information
The SQL to modify user information is as follows.
```sql
alter user user_name alter_user_clause
@ -54,13 +54,13 @@ alter_user_clause: {
}
```
The related parameters are described as follows.
- pass: Modify the user's password.
- enable: Whether to enable the user. 1 means to enable this user, 0 means to disable this user.
- sysinfo: Whether the user can view system information. 1 means the user can view it; 0 means the user cannot.
The following SQL disables the user test.
```sql
alter user test enable 0
@ -68,7 +68,7 @@ alter user test enable 0
### Deleting Users
The SQL to delete a user is as follows.
```sql
drop user user_name
@ -76,16 +76,16 @@ drop user user_name
## Permission Management
Only the root user can manage users, nodes, vnodes, qnodes, snodes, and other system information, including querying, adding, deleting, and modifying them.
### Granting Permissions for Databases and Tables
In TDengine, permissions on databases and tables are divided into read and write. These permissions can be granted to a user individually or together.
- Read permission: A user with read permission can only query data in the database or table; they cannot modify or delete it. This permission suits scenarios that require access to data but no write operations, such as data analysts or report generators.
- Write permission: A user with write permission can write data to the database or table. This permission suits scenarios that require writing data, such as data collectors or data processors. A user with only write permission and no read permission can only write data; they cannot query it.
The syntax for granting a user access to databases and tables is as follows.
```sql
grant privileges on resources [with tag_filter] to user_name
@ -104,46 +104,46 @@ resources: {
}
```
The related parameters are described as follows.
- resources: The databases or tables that can be accessed. The part before the dot is the database name and the part after the dot is the table name. `dbname.tbname` means the table `tbname` in the database `dbname`, which must be a basic table or a supertable. `dbname.*` means all tables in the database `dbname`. `*.*` means all tables in all databases.
- tag_filter: Filter condition for supertables.
The above SQL can grant permissions on a single database, all databases, or a basic table or supertable under a database; it can also grant permissions on all subtables of a supertable that match a filter condition, using a combination of `dbname.tbname` and the `with` clause.
The following SQL grants read permission on the database `power` to the user `test`.
```sql
grant read on power to test
```
The following SQL grants all permissions on the supertable `meters` under the database `power` to the user `test`.
```sql
grant all on power.meters to test
```
The following SQL grants the user `test` permissions on the subtables of the supertable `meters` whose tag `groupId` equals 1.
```sql
grant all on power.meters with groupId=1 to test
```
If a user is granted write permission on a database, the user has both read and write permissions on all tables under that database. However, if a database grants only read permission, or no read permission at all, table-level grants let the user read or write specific tables. See the reference manual for the detailed permission combinations.
### Granting Permissions for Views
In TDengine, view permissions are divided into read, write, and alter. These determine a user's access to and operations on a view. The specific rules are as follows:
- The creator of a view and the root user have all permissions on it by default. That is, they can query, write to, and modify the view.
- Permissions can be granted to and revoked from other users with the `grant` and `revoke` statements; these operations can only be performed by the root user.
- View permissions must be granted and revoked separately; grants and revocations via `db.*` do not include view permissions.
- Views can be defined and used in nested form, and view permission checks are likewise performed recursively.
To make views easier to share and use, TDengine introduces the concept of the effective user of a view (the user who created the view). Authorized users can use the read and write permissions of the effective user's databases, tables, and nested views. When a view is replaced, the effective user is updated as well.
For the detailed mapping between view operations and required permissions, see the reference manual.
The syntax for granting view permissions is as follows.
```sql
grant privileges on [db_name.]view_name to user_name
@ -154,77 +154,77 @@ privileges: {
priv_type: {
read
| write
| alter
}
```
The following SQL grants read permission on the view `view_name` under the database `power` to the user `test`.
```sql
grant read on power.view_name to test
```
The following SQL grants all permissions on the view `view_name` under the database `power` to the user `test`.
```sql
grant all on power.view_name to test
```
### Granting Permissions for Message Subscriptions
Message subscription is a unique design of TDengine. To ensure the security of user subscription information, TDengine can grant permissions for message subscriptions. Before using this feature, users need to understand the following special usage rules:
- Any user with read permission on a database can create topics on it; the root user can create topics on any database (see the example after this list).
- The subscription permission for each topic can be granted independently to any user, regardless of whether that user has access to the underlying database.
- Only the root user or the creator of a topic can delete the topic.
- Only superusers, the creator of a topic, or users explicitly granted subscribe permission can subscribe to the topic. These permission settings ensure database security while giving users flexibility within a limited scope.
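As an illustrative sketch (the topic name and the column list over the standard `meters` example schema are hypothetical), a user with read permission on the database `power` could create a topic, and root could then grant subscribe permission on it:
```sql
-- Create a topic over the supertable meters in the database power.
create topic topic_meters as select ts, current, voltage from power.meters;
-- As root, grant subscribe permission on the topic to the user test.
grant subscribe on topic_meters to test;
```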
The SQL syntax for granting message subscription permissions is as follows.
```sql
grant privileges on priv_level to user_name
privileges: {
all
| priv_type [, priv_type] ...
}
priv_type: {
subscribe
}
priv_level: {
topic_name
}
```
The following SQL grants subscribe permission on the topic `topic_name` to the user `test`.
```sql
grant subscribe on topic_name to test
```
### Viewing Grants
When a company has multiple database users, the following SQL can be used to query all the grants a specific user holds.
```sql
show user privileges
```
### Revoking Grants
Because database access, data subscription, and views have different characteristics, the syntax for revoking specific grants also varies slightly. The specific revocation syntax for each type of grant object is given below.
The SQL for revoking a database access grant is as follows.
```sql
revoke privileges on priv_level [with tag_condition] from user_name
privileges: {
all
| priv_type [, priv_type] ...
}
priv_type: {
read
| write
}
priv_level: {
dbname.tbname
| dbname.*
| *.*
@ -250,31 +250,31 @@ The SQL for revoking data subscription permissions is as follows.
```sql
revoke privileges on priv_level from user_name
privileges: {
all
| priv_type [, priv_type] ...
}
priv_type: {
subscribe
}
priv_level: {
topic_name
}
```
The following SQL revokes all permissions of the user `test` on the database `power`.
```sql
revoke all on power from test
```
The following SQL revokes read permission of the user `test` on the view `view_name` in the database `power`.
```sql
revoke read on power.view_name from test
```
The following SQL revokes the subscribe permission of the user `test` on the topic `topic_name`.
```sql
revoke subscribe on topic_name from test
View File
@ -3,11 +3,11 @@ title: Advanced Security Options
slug: /operations-and-maintenance/advanced-security-options
---
In addition to traditional user and permission management, TDengine offers other security strategies, such as IP whitelisting, audit logs, and data encryption, which are unique features of TDengine Enterprise. The whitelisting feature was first released in version 3.2.0.0, audit logs in version 3.1.1.0, and database encryption in version 3.3.0.0. Using the latest version is recommended.
## IP Whitelisting
IP whitelisting is a network security technique that lets IT administrators control "who" can access systems and resources, enhancing the security of database access and preventing external malicious attacks. An IP whitelist is a list of trusted IP addresses assigned to users as unique identifiers; only these IP addresses are allowed to access the target server. Note that user permissions and the IP whitelist are managed independently. The specific methods for configuring IP whitelisting are described below.
The SQL to add an IP whitelist is as follows.
@ -23,38 +23,56 @@ SELECT TEST, ALLOWED_HOST FROM INS_USERS;
SHOW USERS;
```
The command to delete entries from the IP whitelist is as follows.
```sql
ALTER USER TEST DROP HOST HOST_NAME1
```
Notes:
- Both the open-source and enterprise versions can add whitelist entries successfully and query them, but the open-source version does not enforce any IP restrictions.
- `create user u_write pass 'taosdata1' host 'iprange1','iprange2'` can add multiple IP ranges at once; the server deduplicates them, and deduplication requires the IP ranges to be exactly identical (see the example after this list).
- 127.0.0.1 is added to the whitelist by default and can be queried in the whitelist.
- The IP addresses of cluster nodes are automatically added to the whitelist but cannot be queried.
- When taosAdapter and taosd are not on the same machine, the taosAdapter IP must be added to the taosd whitelist manually.
- In a cluster, enableWhiteList must be set identically on all nodes, either all false or all true; otherwise the cluster cannot start.
- Whitelist changes take effect within 1 s, and no more than 2 s. Each change has a minor, negligible impact on transmission performance (one extra check); after the change, the impact is negligible, and the change process affects neither the cluster nor currently connected clients (assuming their IPs are in the whitelist).
- If two IP ranges are added, e.g. 192.168.1.1/16 (call it A) and 192.168.1.1/24 (call it B), A strictly includes B, but to avoid excessive complexity A and B are not merged.
- Deletion requires an exact match: an entry added as 192.168.1.1/24 must also be deleted as 192.168.1.1/24.
- Only root has permission to add or delete IP whitelist entries for other users.
- The feature is compatible with previous versions, but rolling back from the current version to a previous version is not supported.
- x.x.x.x/32 and x.x.x.x are treated as the same IP range and are displayed as x.x.x.x.
- If the client receives 0.0.0.0/0, the whitelist is not enabled.
- If the whitelist changes, the client detects it through the heartbeat.
- A single user can have at most 2048 IP entries.
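As an illustrative example (the user name and IP ranges are hypothetical, and the `add host` form is assumed to mirror the `drop host` syntax shown above):
```sql
-- Create a user with an initial whitelist of two IP ranges.
create user u_write pass 'taosdata1' host '192.168.1.0/24','10.0.0.0/8';
-- Add another range to the existing whitelist (assumed syntax).
alter user u_write add host '172.16.0.0/16';
-- Remove a range; the value must exactly match the one that was added.
alter user u_write drop host '10.0.0.0/8';
```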
## Audit Logs
TDengine first records and manages user operations, then sends them as audit logs to taosKeeper, which in turn saves them to any TDengine cluster. Administrators can use audit logs for security monitoring and historical tracing. The audit log feature can be turned on or off simply by modifying the TDengine configuration file and restarting the service. The audit log configuration is described below.
### taosd Configuration
Audit logs are generated by the database service taosd, and the corresponding parameters must be configured in the `taos.cfg` configuration file, as shown in the table below.
| Parameter Name | Parameter Meaning |
| :---------------: | :----------------------------------------------------------: |
| audit | Whether to enable audit logs, with a default value of 0. 1 to enable, 0 to disable |
| monitorFqdn | FQDN of the server where the taosKeeper receiving the audit logs is located |
| monitorPort | Port used by the taosKeeper service receiving the audit logs |
| monitorCompaction | Whether to compress data during reporting |
| Parameter Name | Parameter Meaning |
|:-------------:|:--------------------------------------------------------:|
|audit | Whether to enable audit logs, default value is 0. 1 for enabled, 0 for disabled |
|monitorFqdn | The FQDN of the taosKeeper server that receives the audit logs |
|monitorPort | The port used by the taosKeeper service that receives the audit logs |
|monitorCompaction | Whether to compress data when reporting |
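For example, a minimal `taos.cfg` fragment that turns on audit reporting might look like the following; the FQDN is illustrative and must match your taosKeeper deployment, and 6043 is assumed here as taosKeeper's default port.

```text
# taos.cfg (fragment) -- enable audit logs and report them to taosKeeper
audit              1
monitorFqdn        keeper.example.com
monitorPort        6043
monitorCompaction  1
```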
### taosKeeper Configuration
In the `keeper.toml` configuration file of taosKeeper, configure the parameters related to audit logs, as shown in the table below.

| Parameter Name | Description |
| :------------: | :----------------------------------------------------------: |
| auditDB | The name of the database for storing audit logs, with a default value of "audit". After receiving reported audit logs, taosKeeper checks whether this database exists and creates it automatically if it does not. |
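A minimal `keeper.toml` fragment might therefore contain only the audit database name, shown here with its default value. The parameter name follows the table above, but its exact section and placement may differ between taosKeeper versions, so treat this as a sketch.

```toml
# keeper.toml (fragment) -- database used by taosKeeper to store audit logs
auditDB = "audit"
```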
### Data Format
The format of the reported audit logs is as follows:
```json
{
@ -71,7 +89,7 @@ The format of the reported audit logs is as follows:
### Table Structure
taosKeeper automatically creates a supertable in the corresponding database to store the reported audit data. The definition of this supertable is as follows:
```sql
CREATE STABLE operations(ts timestamp, details VARCHAR(64000), User VARCHAR(25), Operation VARCHAR(20), db VARCHAR(65), resource VARCHAR(193), client_add VARCHAR(25)) TAGS (clusterID VARCHAR(64));
@ -79,78 +97,78 @@ CREATE STABLE operations(ts timestamp, details VARCHAR(64000), User VARCHAR(25),
Where:
1. `db` refers to the database involved in the operation, and `resource` refers to the resource involved in the operation.
2. `User` and `Operation` are data columns that indicate which user performed what operation on the object.
3. `timestamp` is the timestamp column indicating when the operation occurred.
4. `details` contains supplementary details about the operation, which in most cases is the SQL statement executed.
5. `client_add` is the client address, including IP and port.
### Operation List
The current list of operations recorded in the audit logs, and the meaning of each field for each operation, is shown in the table below. (Note: the `user`, timestamp, and `client_add` fields have the same meaning in all operations and are therefore omitted from the table.)
| Operation | Operation Name | DB | Resource | Details |
| --------------------- | ------------------ | ------------- | -------------------- | ------------------------------------------------------------ |
| create database | createDB | db name | NULL | SQL |
| alter database | alterDB | db name | NULL | SQL |
| drop database | dropDB | db name | NULL | SQL |
| create stable | createStb | db name | stable name | SQL |
| alter stable | alterStb | db name | stable name | SQL |
| drop stable | dropStb | db name | stable name | SQL |
| create user | createUser | NULL | user created | User attribute parameters (excluding password) |
| alter user | alterUser | NULL | user modified | Password changes record the modified parameters and new values (excluding the password); other operations record the SQL |
| drop user | dropUser | NULL | user deleted | SQL |
| create topic | createTopic | topic's DB | name of the created topic | SQL |
| drop topic | dropTopic | topic's DB | name of the deleted topic | SQL |
| create dnode | createDnode | NULL | IP:Port or FQDN:Port | SQL |
| drop dnode | dropDnode | NULL | dnodeId | SQL |
| alter dnode | alterDnode | NULL | dnodeId | SQL |
| create mnode | createMnode | NULL | dnodeId | SQL |
| drop mnode | dropMnode | NULL | dnodeId | SQL |
| create qnode | createQnode | NULL | dnodeId | SQL |
| drop qnode | dropQnode | NULL | dnodeId | SQL |
| login | login | NULL | NULL | appName |
| create stream | createStream | NULL | name of the created stream | SQL |
| drop stream | dropStream | NULL | name of the deleted stream | SQL |
| grant privileges | grantPrivileges | NULL | user granted | SQL |
| revoke privileges | revokePrivileges | NULL | user whose privileges were revoked | SQL |
| compact database | compact | database name | NULL | SQL |
| balance vgroup leader | balanceVgroupLead | NULL | NULL | SQL |
| restore dnode | restoreDnode | NULL | dnodeId | SQL |
| redistribute vgroup | redistributeVgroup | NULL | vgroupId | SQL |
| balance vgroup | balanceVgroup | NULL | vgroupId | SQL |
| create table | createTable | db name | NULL | table name |
| drop table | dropTable | db name | NULL | table name |
### Viewing Audit Logs

After both taosd and taosKeeper are correctly configured and started, the operations listed in the table above are recorded and reported in real time as the system runs. Users can log in to taosExplorer and open the "System Management" → "Audit" page to view the audit logs, or query the corresponding database and tables directly in the TDengine CLI.
## Data Encryption
TDengine supports Transparent Data Encryption (TDE), which encrypts data files at rest to prevent attackers from bypassing the database and reading sensitive information directly from the file system. Database access programs are completely unaware of the encryption: applications can work with an encrypted database without any modification or recompilation. Encryption algorithms such as the national standard SM4 are supported. In transparent encryption, database key management and the scope of encryption are the two most important topics. TDengine encrypts the database key using the machine code and stores it locally rather than in a third-party key manager. If data files are copied to another machine, the machine code changes, the database key cannot be obtained, and the data files cannot be accessed. TDengine encrypts all data files, including write-ahead log files, metadata files, and time-series data files. After encryption, the data compression ratio remains unchanged, and write and query performance decrease only slightly.
### Configuring Keys

Keys can be configured in two ways: offline or online.

**Method one: offline setting.** The following command configures the key separately on each node.
```shell
taosd -y {encryptKey}
```
**Method two: online setting.** When all nodes in the cluster are online, the key can be configured with the following SQL.
```sql
create encrypt_key {encryptKey};
```
The online method requires that none of the nodes that have joined the cluster have generated a key with the offline method; otherwise the online setting fails. Setting the key online also automatically loads and uses the key.
### Creating an Encrypted Database

TDengine supports creating encrypted databases via SQL, as follows.
```sql
create database [if not exists] db_name [database_options]
@ -161,13 +179,12 @@ database_option: {
}
```
The main parameter is described as follows.

- `encrypt_algorithm`: Specifies the encryption algorithm used for the data. The default is `none`, meaning no encryption. `sm4` indicates the SM4 encryption algorithm.
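For example, the following statement would create a database encrypted with SM4. The database name `power` is illustrative, and the option spelling follows the `encrypt_algorithm` parameter described above; verify it against the full `create database` syntax for your version.

```sql
create database if not exists power encrypt_algorithm 'sm4';
```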
### Viewing Encryption Configuration

Users can query the current encryption configuration of databases through the system table `ins_databases`, with the following SQL.
```sql
select name, `encrypt_algorithm` from ins_databases;
@ -177,9 +194,9 @@ select name, `encrypt_algorithm` from ins_databases;
power | sm4 |
```
### Viewing Node Key Status

The node key status can be checked with the following SQL commands:
```sql
show encryptions;
@ -192,18 +209,18 @@ select * from information_schema.ins_encryptions;
3 | unknown |
```
`key_status` has three possible values:

- When a node has not set a key, the status column displays `unset`.
- When the key is validated and loaded successfully, the status column displays `loaded`.
- When the node is not running and the key status cannot be determined, the status column displays `unknown`.
### Updating Key Configuration

When the hardware configuration of a node changes, the key must be updated with the following command, which is the same as the offline configuration command:
```shell
taosd -y {encryptKey}
```
To update the key configuration, taosd must be stopped first, and exactly the same key must be used; in other words, the key cannot be changed after the database has been created.
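A typical update sequence on a node therefore looks like the following sketch. The placeholder `{encryptKey}` must be replaced with the original key, and the service management commands may differ on your system.

```shell
# Stop the taosd service before updating the key
sudo systemctl stop taosd
# Re-apply exactly the same key that was configured originally
taosd -y {encryptKey}
# Start the service again so the key is loaded
sudo systemctl start taosd
```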

View File

@ -53,7 +53,7 @@ It is not necessary to configure your cluster specifically for active-active mod
- The sink endpoint is the FQDN of TDengine on the secondary node.
- You can use the native connection (port 6030) or WebSocket connection (port 6041).
- You can specify one or more databases to replicate only the data contained in those databases. If you do not specify a database, all databases on the node are replicated except for `information_schema`, `performance_schema`, `log`, and `audit`.
When the command is successful, the replica ID is displayed. You can use this ID to add other databases to the replication task if necessary.
4. Run the same command on the secondary node, specifying the FQDN of TDengine on the secondary node as the source endpoint and the FQDN of TDengine on the primary node as the sink endpoint.
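As a sketch only, the pair of commands might look like the following; the `taosx replica start -f ... -t ...` form, the hostnames, and the database name `power` are assumptions to verify against your TDengine Enterprise version.

```shell
# On the primary node: replicate the power database to the secondary node
taosx replica start -f td-primary:6030 -t td-secondary:6030 power
# On the secondary node: replicate in the opposite direction
taosx replica start -f td-secondary:6030 -t td-primary:6030 power
```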

View File

@ -1,10 +1,9 @@
---
title: Operations and Maintenance
description: 'TDengine Operation and Maintenance Guide'
slug: /operations-and-maintenance
---
This chapter introduces how to plan, deploy, maintain, and monitor TDengine clusters.
```mdx-code-block
import DocCardList from '@theme/DocCardList';

View File

@ -1,24 +1,23 @@
---
title: Prometheus
description: Accessing TDengine with Prometheus
slug: /third-party-tools/data-collection/prometheus
---
import Prometheus from "./_prometheus.mdx"
Prometheus is a popular open-source monitoring and alerting system. It joined the Cloud Native Computing Foundation (CNCF) in 2016, becoming the second hosted project after Kubernetes, and it has a very active community of developers and users.

Prometheus provides `remote_write` and `remote_read` interfaces so that other database products can be used as its storage engine. To allow users in the Prometheus ecosystem to take advantage of TDengine's efficient writing and querying capabilities, TDengine also supports these two interfaces.

With appropriate configuration, Prometheus data can be stored in TDengine through the `remote_write` interface and queried through the `remote_read` interface, fully leveraging TDengine's efficient storage and query performance for time-series data as well as its cluster processing capabilities.
## Prerequisites
To write Prometheus data into TDengine, the following preparations are required:

- The TDengine cluster has been deployed and is running normally.
- taosAdapter has been installed and is running normally. For details, please refer to the [taosAdapter User Manual](../../../tdengine-reference/components/taosadapter/).
- Prometheus has been installed. For installation, please refer to the [official documentation](https://prometheus.io/docs/prometheus/latest/installation/).
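As a reference for the configuration steps below, a minimal `prometheus.yml` fragment pointing Prometheus at taosAdapter might look like the following sketch. The hostname, port, database name `prometheus_data`, and credentials are illustrative, and the URL paths should be verified against your taosAdapter version.

```yaml
# prometheus.yml (fragment) -- store and read samples via taosAdapter
remote_write:
  - url: "http://localhost:6041/prometheus/v1/remote_write/prometheus_data"
    basic_auth:
      username: root
      password: taosdata

remote_read:
  - url: "http://localhost:6041/prometheus/v1/remote_read/prometheus_data"
    basic_auth:
      username: root
      password: taosdata
    remote_timeout: 10s
    read_recent: true
```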
## Configuration Steps
@ -26,9 +25,9 @@ To write Prometheus data into TDengine, the following preparations are required:
## Verification Method
After restarting Prometheus, you can use the following examples to verify that data is written from Prometheus to TDengine and can be read correctly.
### Query Written Data Using the TDengine CLI
```text
taos> show databases;
@ -64,26 +63,26 @@ taos> select * from metrics limit 10;
Query OK, 10 row(s) in set (0.011146s)
```
### Use promql-cli to Read Data from TDengine via remote_read

Install promql-cli:
```shell
go install github.com/nalbury/promql-cli@latest
```
Query Prometheus data while the TDengine and taosAdapter services are running:
```text
ubuntu@shuduo-1804 ~ $ promql-cli --host "http://127.0.0.1:9090" "sum(up) by (job)"
JOB VALUE TIMESTAMP
prometheus 1 2022-04-20T08:05:26Z
node 1 2022-04-20T08:05:26Z
```
After stopping the taosAdapter service, query the Prometheus data again:
```text
ubuntu@shuduo-1804 ~ $ sudo systemctl stop taosadapter.service
ubuntu@shuduo-1804 ~ $ promql-cli --host "http://127.0.0.1:9090" "sum(up) by (job)"
VALUE TIMESTAMP
@ -91,6 +90,6 @@ VALUE TIMESTAMP
:::note
The subtable names generated by TDengine by default are unique ID values generated according to specific rules.
:::

View File

@ -1,23 +1,22 @@
---
title: Telegraf
description: Writing Data to TDengine Using Telegraf
slug: /third-party-tools/data-collection/telegraf
---
import Telegraf from "./_telegraf.mdx"
Telegraf is a very popular open-source metrics collection software. In data collection and platform monitoring systems, Telegraf can collect operational information from various components without the need to write custom scripts for periodic collection, reducing the difficulty of data acquisition.

By simply pointing Telegraf's output configuration to the corresponding URL of taosAdapter and modifying a few configuration items, Telegraf data can be written into TDengine. Storing Telegraf data in TDengine allows users to fully leverage TDengine's efficient storage and query performance for time-series data as well as its cluster processing capabilities.
## Prerequisites
To write Telegraf data into TDengine, the following preparations are required:

- The TDengine cluster has been deployed and is running normally.
- taosAdapter has been installed and is running normally. For details, please refer to the [taosAdapter User Manual](../../../tdengine-reference/components/taosadapter/).
- Telegraf has been installed. For installation, please refer to the [official documentation](https://docs.influxdata.com/telegraf/v1.22/install/).
- Telegraf collects system operational state data by default. By enabling [input plugins](https://docs.influxdata.com/telegraf/v1.22/plugins/), it can output data in [other formats](https://docs.influxdata.com/telegraf/v1.24/data_formats/input/) to Telegraf and then write it into TDengine.
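A sketch of the relevant `telegraf.conf` output section is shown below. The host, database name `telegraf`, and credentials are illustrative; the configuration steps below contain the authoritative settings.

```toml
# telegraf.conf (fragment) -- send metrics to TDengine through taosAdapter
[[outputs.http]]
  url = "http://127.0.0.1:6041/influxdb/v1/write?db=telegraf"
  method = "POST"
  timeout = "5s"
  username = "root"
  password = "taosdata"
  data_format = "influx"
```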
## Configuration Steps
@ -31,7 +30,7 @@ Restart the Telegraf service:
sudo systemctl restart telegraf
```
Use the TDengine CLI to verify that data is written from Telegraf to TDengine and can be read correctly:
```text
taos> show databases;
@ -73,8 +72,8 @@ Query OK, 3 row(s) in set (0.013269s)
:::note
The subtable names generated by TDengine when receiving data in InfluxDB format are unique ID values generated according to specific rules.
If users need to specify the generated table name, they can configure the `smlChildTableName` parameter in `taos.cfg`. By controlling the input data format, users can use this feature of TDengine to specify the generated table name.
For example, with `smlChildTableName=tname` configured, inserting the data `st,tname=cpu1,t1=4 c1=3 1626006833639000000` creates a table named `cpu1`. If multiple rows of data have the same `tname` but different `tag_set` values, the `tag_set` specified when the first row automatically created the table is used, and the other rows' tag sets are ignored. For more details, see the [TDengine Schemaless Writing Reference Guide](../../../developer-guide/schemaless-ingestion/).
:::

View File

@ -1,22 +1,21 @@
---
title: collectd
description: Writing Data to TDengine Using collectd
slug: /third-party-tools/data-collection/collectd
---
import CollectD from "./_collectd.mdx"
collectd is a daemon for collecting system performance metrics. It provides various storage mechanisms to store different values, and it periodically aggregates relevant statistics about the system while the system is running and storing information. This information helps identify current system performance bottlenecks and predict future system load.

By simply pointing the collectd configuration to the domain name (or IP address) and the corresponding port of the server running taosAdapter, the data collected by collectd can be written into TDengine, making full use of TDengine's efficient storage and query performance for time-series data as well as its cluster processing capabilities.
## Prerequisites
To write collectd data into TDengine, the following preparations are required:

- The TDengine cluster has been deployed and is running normally.
- taosAdapter has been installed and is running normally. For details, please refer to the [taosAdapter User Manual](../../../tdengine-reference/components/taosadapter/).
- collectd has been installed. For installation instructions, please refer to the [official documentation](https://collectd.org/download.shtml).
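A sketch of the collectd network plugin configuration is shown below. The hostname and the port 6045 (assumed here as taosAdapter's default collectd listening port) must match the values given in the configuration steps below.

```text
# /etc/collectd/collectd.conf (fragment) -- forward metrics to taosAdapter
LoadPlugin network
<Plugin network>
    Server "taosadapter.example.com" "6045"
</Plugin>
```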
## Configuration Steps
@ -24,13 +23,13 @@ To write collectd data into TDengine, several preparations are needed:
## Verification Method
Restart collectd:
```shell
sudo systemctl restart collectd
```
Use the TDengine CLI to verify that data is written from collectd to TDengine and can be read correctly:
```text
taos> show databases;
@ -39,7 +38,7 @@ taos> show databases;
information_schema |
performance_schema |
collectd |
Query OK, 3 row(s) in set (0.003266s)
taos> use collectd;
Database changed.
@ -77,6 +76,6 @@ Query OK, 10 row(s) in set (0.010348s)
:::note
The subtable names generated by TDengine by default are unique ID values generated according to specific rules.
:::

View File

@ -1,22 +1,21 @@
---
title: StatsD
description: Writing to TDengine Using StatsD
slug: /third-party-tools/data-collection/statsd
---
import StatsD from "./_statsd.mdx"
StatsD is a simple daemon for aggregating and summarizing application metrics. In recent years, it has rapidly evolved into a unified protocol for collecting application performance metrics.

By simply filling in the domain name (or IP address) and the corresponding port of the server running taosAdapter in the StatsD configuration file, StatsD data can be written into TDengine, fully utilizing TDengine's efficient storage and query performance for time-series data as well as its cluster processing capabilities.
## Prerequisites
To write StatsD data into TDengine, the following preparations are required:

- The TDengine cluster has been deployed and is running normally.
- taosAdapter has been installed and is running normally. For details, please refer to the [taosAdapter User Manual](../../../tdengine-reference/components/taosadapter/).
- StatsD has been installed. For installation instructions, please refer to the [official documentation](https://github.com/statsd/statsd).
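A sketch of a StatsD `config.js` that repeats metrics to taosAdapter is shown below. The host and the port 6044 (assumed here as taosAdapter's default StatsD listening port) must match the values given in the configuration steps below.

```text
// config.js (fragment) -- forward StatsD metrics to taosAdapter
{
  port: 8125,
  backends: ["./backends/repeater"],
  repeater: [{ host: "127.0.0.1", port: 6044 }]
}
```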
## Configuration Steps
@ -26,20 +25,20 @@ To write StatsD data into TDengine, several preparations are needed:
Run StatsD:
```text
$ node stats.js config.js &
[1] 8546
$ 20 Apr 09:54:41 - [8546] reading config file: config.js
20 Apr 09:54:41 - server is up INFO
```
Use `nc` to write test data:
```shell
echo "foo:1|c" | nc -u -w0 127.0.0.1 8125
```
Use the TDengine CLI to verify that data is written from StatsD to TDengine and can be read correctly:
```text
taos> show databases;
@ -70,6 +69,6 @@ taos>
:::note
TDengine automatically creates unique IDs for subtable names according to specific rules.
:::

View File

@ -1,22 +1,21 @@
---
title: Icinga2
description: Writing to TDengine Using Icinga2
slug: /third-party-tools/data-collection/icinga2
---
import Icinga2 from "./_icinga2.mdx"
icinga2 is an open-source host and network monitoring software that evolved from the Nagios network monitoring application. It is currently released under the GNU GPL v2 license.

By simply modifying the icinga2 configuration to point to the corresponding server and port of taosAdapter, the data collected by icinga2 can be stored in TDengine, fully utilizing TDengine's efficient storage and query performance for time-series data as well as its cluster processing capabilities.
## Prerequisites
To write icinga2 data into TDengine, the following preparations are required:

- The TDengine cluster has been deployed and is running normally.
- taosAdapter has been installed and is running normally. For details, please refer to the [taosAdapter User Manual](../../../tdengine-reference/components/taosadapter/).
- icinga2 has been installed. For installation instructions, please refer to the [official documentation](https://icinga.com/docs/icinga-2/latest/doc/02-installation/).
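A sketch of the icinga2 OpenTSDB writer configuration is shown below. The host and the port 6048 (assumed here as one of taosAdapter's default OpenTSDB telnet listening ports) must match the values given in the configuration steps below.

```text
# Enable the OpenTSDB writer feature first:  icinga2 feature enable opentsdb
# /etc/icinga2/features-enabled/opentsdb.conf (fragment)
object OpenTsdbWriter "opentsdb" {
  host = "taosadapter.example.com"
  port = 6048
}
```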
## Configuration Steps
@ -24,19 +23,19 @@ To write Icinga2 data into TDengine, several preparations are needed:
## Verification Method
Restart taosAdapter:
```shell
sudo systemctl restart taosadapter
```
Restart icinga2:
```shell
sudo systemctl restart icinga2
```
After waiting about 10 seconds, use the TDengine CLI to query TDengine and verify that the corresponding database has been created and data has been written:
```text
taos> show databases;
@ -80,6 +79,6 @@ Query OK, 22 row(s) in set (0.002317s)
:::note
The subtable names generated by TDengine by default are unique ID values generated according to specific rules.
:::

View File

@ -1,22 +1,21 @@
---
title: TCollector
description: Writing to TDengine Using TCollector
slug: /third-party-tools/data-collection/tcollector
---
import TCollector from "./_tcollector.mdx"
TCollector is part of openTSDB; it collects client logs and sends them to the database.

By simply modifying the TCollector configuration to point to the domain name (or IP address) and the corresponding port of the server running taosAdapter, the data collected by TCollector can be stored in TDengine, fully utilizing TDengine's efficient storage and query performance for time-series data as well as its cluster processing capabilities.
## Prerequisites
To write TCollector data into TDengine, the following preparations are required:

- The TDengine cluster has been deployed and is running normally.
- taosAdapter has been installed and is running normally. For details, please refer to the [taosAdapter User Manual](../../../tdengine-reference/components/taosadapter/).
- TCollector has been installed. For installation instructions, please refer to the [official documentation](http://opentsdb.net/docs/build/html/user_guide/utilities/tcollector.html#installation-of-tcollector).
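A sketch of the TCollector change is shown below. The file layout, the host, and the port 6049 (assumed here as taosAdapter's OpenTSDB-compatible listening port) are assumptions; the configuration steps below contain the authoritative settings.

```text
# collectors/etc/config.py (fragment, assumed layout) -- point TCollector
# at taosAdapter instead of an OpenTSDB server (default is port 4242)
defaults["host"] = "taosadapter.example.com"
defaults["port"] = 6049
```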
## Configuration Steps
@ -24,15 +23,15 @@ To write TCollector data into TDengine, several preparations are needed:
## Verification Method
Restart taosAdapter:
```shell
sudo systemctl restart taosadapter
```
Manually execute `sudo ./tcollector.py`.

After waiting a few seconds, use the TDengine CLI to query TDengine and verify that the corresponding database has been created and data has been written:
```text
taos> show databases;
@ -43,6 +42,7 @@ taos> show databases;
tcollector |
Query OK, 3 rows in database (0.001647s)
taos> use tcollector;
Database changed.
@ -72,6 +72,6 @@ taos> show stables;
:::note
The subtable names generated by TDengine by default are unique ID values generated according to specific rules.
:::

View File

@ -1,6 +1,5 @@
---
title: EMQX Platform
description: Writing to TDengine Using EMQX Broker
slug: /third-party-tools/data-collection/emqx-platform
---
@ -17,25 +16,25 @@ import imgStep09 from '../../assets/emqx-platform-09.png';
import imgStep10 from '../../assets/emqx-platform-10.png';
import imgStep11 from '../../assets/emqx-platform-11.png';
MQTT is a popular data transmission protocol for the Internet of Things, and [EMQX](https://github.com/emqx/emqx) is an open-source MQTT broker. Without writing any code, you can use the "Rules" feature in the EMQX Dashboard to perform simple configuration and write MQTT data directly into TDengine. EMQX supports saving data to TDengine by sending it to a web service, and the enterprise edition also provides a native TDengine driver for direct saving.
## Prerequisites
To enable EMQX to add TDengine as a data source, the following preparations are required:

- The TDengine cluster has been deployed and is running normally.
- taosAdapter has been installed and is running normally. For details, please refer to the [taosAdapter User Manual](../../../tdengine-reference/components/taosadapter/).
- If you use the simulated writing program mentioned later, install a compatible version of Node.js; v12 is recommended.
## Install and Start EMQX
Users can download the installation package for their operating system from the [EMQX official website](https://www.emqx.com/en/downloads-and-install/broker) and install it. After installation, start the EMQX service with `sudo emqx start` or `sudo systemctl start emqx`.

Note: This document is based on EMQX v4.4.5; other versions may differ in configuration interface, configuration methods, and features.
## Create Database and Table
In TDengine, create the corresponding database and table structure to receive MQTT data. Enter the TDengine CLI and execute the following SQL statements:
```sql
CREATE DATABASE test;
@ -45,19 +44,19 @@ CREATE TABLE sensor_data (ts TIMESTAMP, temperature FLOAT, humidity FLOAT, volum
## Configure EMQX Rules
Since the configuration interface differs across EMQX versions, the example below uses v4.4.5; for other versions, please refer to the corresponding official documentation.
### Log in to the EMQX Dashboard

Open `http://IP:18083` in a browser and log in to the EMQX Dashboard. The initial username is `admin` and the password is `public`.
<figure>
<Image img={imgStep01} alt=""/>
</figure>
### Create a Rule

Select "Rule" under "Rule Engine" on the left and click the "Create" button:
<figure>
<Image img={imgStep02} alt=""/>
@ -74,19 +73,19 @@ FROM
"sensor/data"
```
Here, `payload` represents the entire message body, and `sensor/data` is the message topic selected for this rule.
<figure>
<Image img={imgStep03} alt=""/>
</figure>
### Add an Action Handler
<figure>
<Image img={imgStep04} alt=""/>
</figure>
### Add a Resource
<figure>
<Image img={imgStep05} alt=""/>
@ -94,27 +93,27 @@ Here, `payload` represents the entire message body, and `sensor/data` is the mes
Select "Send Data to Web Service" and click the "Create Resource" button:
### Edit the Resource

Select "WebHook" and fill in the "Request URL" with the address where taosAdapter provides the REST service. If taosAdapter is running locally, the default address is `http://127.0.0.1:6041/rest/sql`.

Keep the other properties at their default values.
<figure>
<Image img={imgStep06} alt=""/>
</figure>
### Edit the Action

In the resource configuration, add a key/value pair for Authorization authentication. The Authorization value corresponding to the default username and password is:
```text
Basic cm9vdDp0YW9zZGF0YQ==
```
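This value is simply the Base64 encoding of `<username>:<password>`; for example, it can be generated with the following command (shown here for the default credentials):

```shell
echo -n "root:taosdata" | base64
# cm9vdDp0YW9zZGF0YQ==
```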
For more information, please refer to the [TDengine REST API documentation](../../../tdengine-reference/client-libraries/rest-api/).

In the message body, enter the rule engine replacement template:
```sql
INSERT INTO test.sensor_data VALUES(
@ -139,19 +138,19 @@ INSERT INTO test.sensor_data VALUES(
Finally, click the "Create" button at the bottom left to save the rule.
## Write a Simulated Test Program
```javascript
{{#include docs/examples/other/mock.js}}
```
Note: During testing, you can initially set CLIENT_NUM in the code to a smaller value to avoid overwhelming the hardware with a large number of concurrent clients.
<figure>
<Image img={imgStep08} alt=""/>
</figure>
## Execute the Test to Simulate Sending MQTT Data
```shell
npm install mqtt mockjs --save --registry=https://registry.npm.taobao.org
@ -164,7 +163,7 @@ node mock.js
## Verify EMQX Received Data
Refresh the rule engine page in the EMQX Dashboard to see how many records were correctly received:
<figure>
<Image img={imgStep10} alt=""/>
@ -172,10 +171,10 @@ Refresh the rule engine interface in the EMQX Dashboard to see how many records
## Verify Data Written to TDengine
Log in with the TDengine CLI and query the corresponding database and table to verify that the data has been correctly written to TDengine:
<figure>
<Image img={imgStep11} alt=""/>
</figure>
For detailed usage of EMQX, please refer to the [EMQX official documentation](https://docs.emqx.com/en/emqx/v4.4/rule/rule-engine.html).

View File

@ -1,7 +1,6 @@
---
title: HiveMQ Broker
description: Writing to TDengine Using HiveMQ Broker
slug: /third-party-tools/data-collection/hivemq-broker
---
[HiveMQ](https://www.hivemq.com/) is an MQTT broker that offers a free personal edition and an enterprise edition, primarily used for enterprise and emerging machine-to-machine (M2M) communication and internal data transfer, and it meets requirements for scalability, manageability, and security. HiveMQ provides an open-source plugin development kit. Data can be saved to TDengine using the HiveMQ extension - TDengine. For detailed usage instructions, please refer to the [HiveMQ extension - TDengine documentation](https://github.com/huskar-t/hivemq-tdengine-extension/blob/b62a26ecc164a310104df57691691b237e091c89/README.md).

View File

@ -1,7 +1,6 @@
---
sidebar_label: Kafka Connect
title: Kafka Connect
description: A Detailed Guide on Using TDengine Kafka Connector
slug: /third-party-tools/data-collection/kafka-connect
---
@ -9,18 +8,18 @@ import Image from '@theme/IdealImage';
import imgKafkaConnect from '../../assets/kafka-connect-01.png';
import imgKafkaIntegration from '../../assets/kafka-connect-02.png';
The TDengine Kafka Connector includes two plugins: the TDengine Source Connector and the TDengine Sink Connector. Users only need to provide a simple configuration file to synchronize data from a specified topic in Kafka to TDengine, or to synchronize data from a specified database in TDengine to Kafka, either in batches or in real time.
## What is Kafka Connect?
Kafka Connect is a component of [Apache Kafka](https://kafka.apache.org/) that makes it easy to connect other systems, such as databases, cloud services, and file systems, to Kafka. Data can flow from these systems into Kafka and from Kafka into these systems via Kafka Connect. Plugins that read data from other systems are called Source Connectors, and plugins that write data to other systems are called Sink Connectors. Source and Sink Connectors do not connect directly to Kafka brokers; instead, a Source Connector hands its data to Kafka Connect, and a Sink Connector receives data from Kafka Connect.
<figure>
<Image img={imgKafkaConnect} alt="Kafka Connect structure"/>
<figcaption>Figure 1. Kafka Connect structure</figcaption>
</figure>
The TDengine Source Connector is used to read data in real time from TDengine and send it to Kafka Connect. The TDengine Sink Connector is used to receive data from Kafka Connect and write it to TDengine.
<figure>
<Image img={imgKafkaIntegration} alt="Streaming integration with Kafka Connect"/>
@ -29,16 +28,16 @@ The TDengine Source Connector is used to read data in real-time from TDengine an
## Prerequisites
Here are the prerequisites for running the examples in this tutorial:

1. Linux operating system
2. Java 8 and Maven installed
3. Git, curl, and vi installed
4. TDengine installed and running. For installation details, refer to [Installation and Uninstallation](../../../get-started/)
## Install Kafka

- Execute the following commands in any directory:
```shell
curl -O https://downloads.apache.org/kafka/3.4.0/kafka_2.13-3.4.0.tgz
@ -46,18 +45,18 @@ Here are the prerequisites to run the examples in this tutorial:
ln -s /opt/kafka_2.13-3.4.0 /opt/kafka
```
- Then, add the `$KAFKA_HOME/bin` directory to your PATH.
```title=".profile"
export KAFKA_HOME=/opt/kafka
export PATH=$PATH:$KAFKA_HOME/bin
```
You can append the above script to the profile file of the current user (either `~/.profile` or `~/.bash_profile`).

## Install TDengine Connector Plugin

### Compile the Plugin
```shell
git clone --branch 3.0 https://github.com/taosdata/kafka-connect-tdengine.git
@ -66,9 +65,9 @@ mvn clean package -Dmaven.test.skip=true
unzip -d $KAFKA_HOME/components/ target/components/packages/taosdata-kafka-connect-tdengine-*.zip
```
The above script first clones the project source code and then compiles and packages it with Maven. After packaging, a zip file of the plugin is generated in the `target/components/packages/` directory. Unzip this file to the plugin installation path. The example above uses the built-in plugin installation path `$KAFKA_HOME/components/`.

### Configure the Plugin
Add the kafka-connect-tdengine plugin to the `plugin.path` in the `$KAFKA_HOME/config/connect-distributed.properties` configuration file.
@ -76,15 +75,17 @@ Add the kafka-connect-tdengine plugin to the `plugin.path` in the `$KAFKA_HOME/c
plugin.path=/usr/share/java,/opt/kafka/components
```
## Start Kafka
```shell
zookeeper-server-start.sh -daemon $KAFKA_HOME/config/zookeeper.properties
kafka-server-start.sh -daemon $KAFKA_HOME/config/server.properties
connect-distributed.sh -daemon $KAFKA_HOME/config/connect-distributed.properties
```
### Verify That Kafka Connect Started Successfully
Enter the command:
@ -92,21 +93,21 @@ Enter the command:
curl http://localhost:8083/connectors
```
If all components have started successfully, you will see the following output:
```txt
[]
```
## Using the TDengine Sink Connector

The TDengine Sink Connector is used to synchronize data from a specified topic to TDengine. Users do not need to create the database and supertables in advance. The name of the target database can be specified manually (see the configuration parameter `connection.database`) or generated according to certain rules (see the configuration parameter `connection.database.prefix`).

The TDengine Sink Connector internally uses the TDengine [schemaless write interface](../../../developer-guide/schemaless-ingestion/) to write data to TDengine and currently supports three data formats: InfluxDB Line Protocol, OpenTSDB Telnet Protocol, and OpenTSDB JSON Protocol.

The following example synchronizes data from the topic `meters` to the target database `power`, using the InfluxDB Line Protocol format.
### Add a Sink Connector Configuration File
```shell
mkdir ~/test
@ -114,7 +115,7 @@ cd ~/test
vi sink-demo.json
```
The contents of `sink-demo.json` are as follows:
```json title="sink-demo.json"
{
@ -140,16 +141,16 @@ The contents of `sink-demo.json` are as follows:
Key configuration explanations:
1. `"topics": "meters"` and `"connection.database": "power"` indicate that data from the `meters` topic will be subscribed and written into the `power` database.
2. `"db.schemaless": "line"` indicates that InfluxDB line protocol format will be used for the data.
1. `"topics": "meters"` and `"connection.database": "power"`, indicate subscribing to the data of the topic `meters` and writing it to the database `power`.
2. `"db.schemaless": "line"`, indicates using data in the InfluxDB Line Protocol format.
### Create Sink Connector Instance
### Creating a Sink Connector Instance
```shell
curl -X POST -d @sink-demo.json http://localhost:8083/connectors -H "Content-Type: application/json"
```
If the above command executes successfully, you will see the following output:
If the above command is executed successfully, the following output will be displayed:
```json
{
@ -169,16 +170,16 @@ If the above command executes successfully, you will see the following output:
"name": "TDengineSinkConnector",
"errors.tolerance": "all",
"errors.deadletterqueue.topic.name": "dead_letter_topic",
"errors.deadletterqueue.topic.replication.factor": "1"
"errors.deadletterqueue.topic.replication.factor": "1",
},
"tasks": [],
"type": "sink"
}
```
### Write Test Data
### Writing Test Data
Prepare a text file with test data as follows:
Prepare a text file with test data, content as follows:
```txt title="test-data.txt"
meters,location=California.LosAngeles,groupid=2 current=11.8,voltage=221,phase=0.28 1648432611249000000
@ -187,21 +188,19 @@ meters,location=California.LosAngeles,groupid=3 current=10.8,voltage=223,phase=0
meters,location=California.LosAngeles,groupid=3 current=11.3,voltage=221,phase=0.35 1648432611250000000
```
Use `kafka-console-producer` to add test data to the `meters` topic.
Use kafka-console-producer to add test data to the topic meters.
```shell
cat test-data.txt | kafka-console-producer.sh --broker-list localhost:9092 --topic meters
```
:::note
If the target database `power` does not exist, the TDengine Sink Connector will automatically create the database. The automatic database creation uses a time precision of nanoseconds, which requires that the timestamp of the written data is also in nanoseconds. If the timestamp precision of the written data is not in nanoseconds, an exception will be thrown.
If the target database `power` does not exist, the TDengine Sink Connector creates it automatically. The database is created with nanosecond precision, which requires that the timestamps of the written data are also in nanoseconds; if they are not, an exception will be thrown.
:::
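If you prefer not to rely on automatic database creation, you can create the target database yourself with nanosecond precision before producing the data; a minimal sketch using the TDengine CLI:

```shell
# Create the target database with nanosecond precision so that the
# nanosecond timestamps in test-data.txt are accepted.
taos -s "CREATE DATABASE IF NOT EXISTS power PRECISION 'ns';"
```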
### Verify Synchronization Success
Use the TDengine CLI to verify whether the synchronization was successful.
Use TDengine CLI to verify if synchronization was successful.
```sql
taos> use power;
@ -209,7 +208,7 @@ Database changed.
taos> select * from meters;
_ts | current | voltage | phase | groupid | location |
===============================================================================================================================================================
===============================================================================================================================================================
2022-03-28 09:56:51.249000000 | 11.800000000 | 221.000000000 | 0.280000000 | 2 | California.LosAngeles |
2022-03-28 09:56:51.250000000 | 13.400000000 | 223.000000000 | 0.290000000 | 2 | California.LosAngeles |
2022-03-28 09:56:51.249000000 | 10.800000000 | 223.000000000 | 0.290000000 | 3 | California.LosAngeles |
@ -217,23 +216,23 @@ taos> select * from meters;
Query OK, 4 row(s) in set (0.004208s)
```
If you see the above data, it indicates that the synchronization was successful. If not, please check the Kafka Connect logs. For detailed parameter descriptions, refer to [Configuration Reference](#configuration-reference).
If you see the data above, it means the synchronization is successful. If not, please check the Kafka Connect logs. For a detailed explanation of the configuration parameters, see [Configuration Reference](#configuration-reference).
## Using TDengine Source Connector
The TDengine Source Connector is used to push data from a specific TDengine database to Kafka for any time after a certain point. The implementation principle of the TDengine Source Connector is to first pull historical data in batches and then synchronize incremental data using a timed query strategy. It also monitors changes to tables and can automatically synchronize newly added tables. If Kafka Connect is restarted, it will continue synchronizing from the last interrupted position.
The TDengine Source Connector is used to push all data from a TDengine database after a certain moment to Kafka. The implementation principle of the TDengine Source Connector is to first pull historical data in batches, then synchronize incremental data using a timed query strategy. It also monitors changes to tables and can automatically synchronize newly added tables. If Kafka Connect is restarted, it will continue to synchronize from the last interrupted position.
The TDengine Source Connector converts data from TDengine tables into InfluxDB line protocol format or OpenTSDB JSON format and then writes it to Kafka.
The TDengine Source Connector converts data from TDengine tables into InfluxDB Line protocol format or OpenTSDB JSON protocol format before writing it to Kafka.
The following example program synchronizes data from the `test` database to the `tdengine-test-meters` topic.
The example program below synchronizes data from the database test to the topic tdengine-test-meters.
### Add Source Connector Configuration File
### Adding a Source Connector Configuration File
```shell
vi source-demo.json
```
Input the following content:
Enter the following content:
```json title="source-demo.json"
{
@ -262,9 +261,9 @@ Input the following content:
}
```
### Prepare Test Data
### Preparing Test Data
Prepare a SQL file to generate test data.
Prepare the SQL file to generate test data.
```sql title="prepare-source-data.sql"
DROP DATABASE IF EXISTS test;
@ -282,7 +281,7 @@ INSERT INTO d1001 USING meters TAGS('California.SanFrancisco', 2) VALUES('2018-1
d1004 USING meters TAGS('California.LosAngeles', 3) VALUES('2018-10-03 14:38:06.500',11.50000,221,0.35000);
```
Use the TDengine CLI to execute the SQL file.
Using TDengine CLI, execute the SQL file.
```shell
taos -f prepare-source-data.sql
@ -294,9 +293,9 @@ taos -f prepare-source-data.sql
curl -X POST -d @source-demo.json http://localhost:8083/connectors -H "Content-Type: application/json"
```
### View Topic Data
### Viewing Topic Data
Use the `kafka-console-consumer` command-line tool to monitor the data in the `tdengine-test-meters` topic. Initially, it will output all historical data, and after inserting two new data points into TDengine, the `kafka-console-consumer` will immediately output the newly added data. The output data will be in InfluxDB line protocol format.
Use the kafka-console-consumer command line tool to monitor the data in the topic tdengine-test-meters. Initially, it outputs all historical data. After two new records are inserted into TDengine, kafka-console-consumer immediately outputs them as well. The output is in InfluxDB line protocol format.
```shell
kafka-console-consumer.sh --bootstrap-server localhost:9092 --from-beginning --topic tdengine-test-meters
@ -311,7 +310,7 @@ meters,location="California.SanFrancisco",groupid=2i32 current=12.6f32,voltage=2
......
```
At this point, all historical data will be displayed. Switch to the TDengine CLI and insert two new data points:
At this point, all historical data will be displayed. Switch to the TDengine CLI and insert two new records:
```sql
USE test;
@ -319,19 +318,19 @@ INSERT INTO d1001 VALUES (now, 13.3, 229, 0.38);
INSERT INTO d1002 VALUES (now, 16.3, 233, 0.22);
```
Then switch back to `kafka-console-consumer`, and the command-line window should now display the two newly inserted data points.
Switch back to kafka-console-consumer; the command-line window now prints the two newly inserted records.
### Unload Plugin
### Unloading the Plugin
After testing, use the unload command to stop the loaded connector.
To view the currently active connectors:
View the currently active connectors:
```shell
curl http://localhost:8083/connectors
```
If the previous operations were followed correctly, there should be two active connectors. Use the following commands to unload them:
If you followed the previous steps, there should be two active connectors at this point. Use the following commands to unload them:
```shell
curl -X DELETE http://localhost:8083/connectors/TDengineSinkConnector
@ -340,74 +339,73 @@ curl -X DELETE http://localhost:8083/connectors/TDengineSourceConnector
### Performance Tuning
If you find the performance of synchronizing data from TDengine to Kafka does not meet expectations, you can try to improve Kafka's write throughput using the following parameters:
If the performance of synchronizing data from TDengine to Kafka does not meet expectations, you can try the following parameters to improve Kafka's write throughput.
1. Open the `KAFKA_HOME/config/producer.properties` configuration file.
2. Parameter explanations and configuration suggestions are as follows:
| **Parameter** | **Description** | **Recommended Setting** |
| --------------------- | ------------------------------------------------------------ | ----------------------- |
| producer.type | This parameter sets the message sending method. The default value is `sync`, which means synchronous sending, while `async` means asynchronous sending. Using asynchronous sending can improve message throughput. | async |
| request.required.acks | This parameter configures the number of acknowledgments the producer must wait for after sending a message. Setting it to 1 means that the producer will receive confirmation as long as the leader replica successfully writes the message, without waiting for others. | 1 |
| max.request.size | This parameter determines the maximum amount of data the producer can send in one request. Its default value is 1048576, which is 1M. If set too small, it may lead to frequent network requests and reduced throughput. If set too large, it may cause high memory usage. | 104857600 |
| batch.size | This parameter sets the batch size. The default value is 16384 (16KB). During the message sending process, the messages sent to the Kafka buffer are divided into batches. Therefore, reducing batch size helps to lower message latency while increasing batch size improves throughput. | 524288 |
| buffer.memory | This parameter sets the total memory for buffering messages waiting to be sent. A larger buffer allows the producer to accumulate more messages for batch sending, improving throughput, but it may also increase latency and memory usage. | 1073741824 |
1. Open the `$KAFKA_HOME/config/producer.properties` configuration file.
2. Parameter description and configuration suggestions are as follows:
| **Parameter** | **Description** | **Suggested Setting** |
| --------------------- | ------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- | --------------------- |
| producer.type | This parameter is used to set the message sending mode, the default value is `sync` which means synchronous sending, `async` means asynchronous sending. Using asynchronous sending can improve the throughput of message sending. | async |
| request.required.acks | This parameter is used to configure the number of acknowledgments that the producer needs to wait for after sending a message. When set to 1, it means that as long as the leader replica successfully writes the message, it will send an acknowledgment to the producer without waiting for other replicas in the cluster to write successfully. This setting can ensure a certain level of message reliability while also ensuring throughput. Since it is not necessary to wait for all replicas to write successfully, it can reduce the waiting time for producers and improve the efficiency of sending messages. | 1 |
| max.request.size | This parameter determines the maximum amount of data that the producer can send in one request. Its default value is 1048576, which is 1M. If set too small, it may lead to frequent network requests, reducing throughput. If set too large, it may lead to high memory usage, or increase the probability of request failure in poor network conditions. It is recommended to set it to 100M. | 104857600 |
| batch.size | This parameter is used to set the size of the batch, the default value is 16384, which is 16KB. During the message sending process, the messages sent to the Kafka buffer are divided into batches. Therefore, reducing the batch size helps reduce message latency, while increasing the batch size helps improve throughput. It can be reasonably configured according to the actual data volume size. It is recommended to set it to 512K. | 524288 |
| buffer.memory | This parameter is used to set the total amount of memory for the producer to buffer messages waiting to be sent. A larger buffer can allow the producer to accumulate more messages and send them in batches, improving throughput, but it can also increase latency and memory usage. It can be configured according to machine resources, and it is recommended to set it to 1G. | 1073741824 |
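As a convenience, the recommended values from the table above can be appended to the producer configuration in one step; the path assumes a default `$KAFKA_HOME` layout, and the values are only the suggestions listed here, so tune them for your workload.

```shell
# Apply the suggested producer settings from the table above.
cat >> $KAFKA_HOME/config/producer.properties <<'EOF'
producer.type=async
request.required.acks=1
max.request.size=104857600
batch.size=524288
buffer.memory=1073741824
EOF
```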
## Configuration Reference
### General Configuration
The following configuration items apply to both the TDengine Sink Connector and the TDengine Source Connector:
The following configuration items apply to both TDengine Sink Connector and TDengine Source Connector.
1. `name`: Connector name.
2. `connector.class`: Full class name of the connector, such as `com.taosdata.kafka.connect.sink.TDengineSinkConnector`.
3. `tasks.max`: Maximum number of tasks, default is 1.
4. `topics`: List of topics to synchronize, separated by commas, such as `topic1,topic2`.
5. `connection.url`: TDengine JDBC connection string, such as `jdbc:TAOS://127.0.0.1:6030`.
6. `connection.user`: TDengine username, default is `root`.
7. `connection.password`: TDengine user password, default is `taosdata`.
8. `connection.attempts`: Maximum number of connection attempts. Default is 3.
9. `connection.backoff.ms`: Time interval for retrying to create a connection when it fails, in ms. Default is 5000.
10. `data.precision`: Time precision used when using InfluxDB line protocol format. Possible values are:
- ms: milliseconds
- us: microseconds
- ns: nanoseconds
1. `name`: connector name.
1. `connector.class`: the full class name of the connector, e.g., com.taosdata.kafka.connect.sink.TDengineSinkConnector.
1. `tasks.max`: maximum number of tasks, default is 1.
1. `topics`: list of topics to synchronize, separated by commas, e.g., `topic1,topic2`.
1. `connection.url`: TDengine JDBC connection string, e.g., `jdbc:TAOS://127.0.0.1:6030`.
1. `connection.user`: TDengine username, default is root.
1. `connection.password`: TDengine user password, default is taosdata.
1. `connection.attempts`: maximum number of connection attempts. Default is 3.
1. `connection.backoff.ms`: retry interval for failed connection attempts, in milliseconds. Default is 5000.
1. `data.precision`: timestamp precision when using the InfluxDB line protocol format. Options are:
1. ms: milliseconds
1. us: microseconds
1. ns: nanoseconds
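These common parameters are part of the JSON configuration submitted when creating a connector (as in `sink-demo.json` and `source-demo.json` above). They can also be inspected on a running connector through Kafka Connect's standard REST API; for example, for the sink connector created earlier (the connector name is the one used in this walkthrough):

```shell
# Show the effective configuration of the running sink connector.
curl -s http://localhost:8083/connectors/TDengineSinkConnector/config
```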
### TDengine Sink Connector-Specific Configuration
### TDengine Sink Connector Specific Configuration
1. `connection.database`: Target database name. If the specified database does not exist, it will be created automatically. The time precision used for automatic database creation is nanoseconds. Default is null. If null, the naming rules for the target database refer to the explanation of the `connection.database.prefix` parameter.
2. `connection.database.prefix`: When `connection.database` is null, the prefix for the target database. It can include the placeholder `${topic}`. For example, `kafka_${topic}` will write to `kafka_orders` for the topic `orders`. Default is null. When null, the name of the target database is the same as the name of the topic.
3. `batch.size`: Number of records to write in batches. When the Sink Connector receives more data than this value at once, it will write in batches.
4. `max.retries`: Maximum number of retries on error. Default is 1.
5. `retry.backoff.ms`: Time interval for retrying on sending errors. Unit is milliseconds, default is 3000.
6. `db.schemaless`: Data format, possible values are:
- line: InfluxDB line protocol format
- json: OpenTSDB JSON format
- telnet: OpenTSDB Telnet line protocol format
1. `connection.database`: target database name. If the specified database does not exist, it will be automatically created with nanosecond precision. Default is null. If null, the naming rule for the target database refers to the `connection.database.prefix` parameter.
2. `connection.database.prefix`: when `connection.database` is null, the prefix for the target database. It can include the placeholder `${topic}`; for example, with `kafka_${topic}`, data from the topic 'orders' is written to the database 'kafka_orders'. Default is null. When null, the target database name is the same as the topic name.
3. `batch.size`: number of records per batch write. When the Sink Connector receives more data than this value, it will write in batches.
4. `max.retries`: maximum number of retries in case of an error. Default is 1.
5. `retry.backoff.ms`: retry interval in case of an error, in milliseconds. Default is 3000.
6. `db.schemaless`: data format, options are:
1. line: represents InfluxDB line protocol format
2. json: represents OpenTSDB JSON format
3. telnet: represents OpenTSDB Telnet line protocol format
### TDengine Source Connector-Specific Configuration
### TDengine Source Connector Specific Configuration
1. `connection.database`: Source database name, no default value.
2. `topic.prefix`: Prefix for the topic name used when importing data into Kafka. Default is an empty string "".
3. `timestamp.initial`: Starting time for data synchronization. Format is 'yyyy-MM-dd HH:mm:ss'. If not specified, it starts from the earliest record in the specified database.
4. `poll.interval.ms`: Time interval to check for newly created or deleted tables, in ms. Default is 1000.
5. `fetch.max.rows`: Maximum number of rows to retrieve from the database. Default is 100.
6. `query.interval.ms`: Time span for reading data from TDengine at one time. It needs to be reasonably configured based on the characteristics of the data in the table to avoid querying too much or too little data. It is recommended to set an optimal value through testing in specific environments. The default value is 0, meaning it retrieves all data at the current latest time.
7. `out.format`: Output format of the result set. `line` indicates the output format is InfluxDB line protocol format; `json` indicates the output format is JSON. Default is `line`.
8. `topic.per.stable`: If set to true, it indicates that one supertable corresponds to one Kafka topic. The naming rule for the topic is `<topic.prefix><topic.delimiter><connection.database><topic.delimiter><stable.name>`; if set to false, all data in the specified database goes into one Kafka topic, and the naming rule for the topic is `<topic.prefix><topic.delimiter><connection.database>`.
9. `topic.ignore.db`: Whether the topic naming rule includes the database name. True indicates the rule is `<topic.prefix><topic.delimiter><stable.name>`, while false indicates the rule is `<topic.prefix><topic.delimiter><connection.database><topic.delimiter><stable.name>`. The default is false. This configuration item does not take effect when `topic.per.stable` is set to false.
10. `topic.delimiter`: Topic name delimiter, default is `-`.
11. `read.method`: Method for reading data from TDengine, either `query` or `subscription`. Default is `subscription`.
12. `subscription.group.id`: Specifies the group id for TDengine data subscription, which is required when `read.method` is `subscription`.
13. `subscription.from`: Specifies the starting point for TDengine data subscription, either `latest` or `earliest`. Default is `latest`.
1. `connection.database`: source database name, no default value.
1. `topic.prefix`: prefix for the topic name when importing data into Kafka. Default is an empty string "".
1. `timestamp.initial`: start time for data synchronization. Format is 'yyyy-MM-dd HH:mm:ss', if not specified, it starts from the earliest record in the specified DB.
1. `poll.interval.ms`: interval for checking new or deleted tables, in milliseconds. Default is 1000.
1. `fetch.max.rows`: maximum number of rows to retrieve from the database. Default is 100.
1. `query.interval.ms`: time span of data read from TDengine in a single query. Configure it based on the characteristics of the table's data to avoid fetching too much or too little data per query; it is recommended to determine an optimal value through testing in your environment. The default is 0, which fetches all data up to the current time.
1. `out.format`: output format of the result set. `line` means output format is InfluxDB Line protocol format, `json` means output format is json. Default is line.
1. `topic.per.stable`: if set to true, indicates that one supertable corresponds to one Kafka topic, the naming rule for the topic is `<topic.prefix><topic.delimiter><connection.database><topic.delimiter><stable.name>`; if set to false, all data in the specified DB goes into one Kafka topic, the naming rule for the topic is `<topic.prefix><topic.delimiter><connection.database>`.
1. `topic.ignore.db`: whether the topic naming rule includes the database name, true means the rule is `<topic.prefix><topic.delimiter><stable.name>`, false means the rule is `<topic.prefix><topic.delimiter><connection.database><topic.delimiter><stable.name>`, default is false. This configuration item does not take effect when `topic.per.stable` is set to false.
1. `topic.delimiter`: topic name delimiter, default is `-`.
1. `read.method`: method of reading data from TDengine, query or subscription. Default is subscription.
1. `subscription.group.id`: specifies the group id for TDengine data subscription, required when `read.method` is subscription.
1. `subscription.from`: specifies the starting position for TDengine data subscription, latest or earliest. Default is latest.
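The earlier example produced the topic `tdengine-test-meters` for the supertable `meters` in database `test`, which matches the rule `<topic.prefix><topic.delimiter><connection.database><topic.delimiter><stable.name>` with a prefix of `tdengine` and the default `-` delimiter. You can list the broker's topics to confirm which names were generated (the local broker address used throughout this example is assumed):

```shell
# List topics created by the source connector (names follow the rules above).
kafka-topics.sh --bootstrap-server localhost:9092 --list | grep '^tdengine-'
```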
## Additional Notes
## Additional Information
1. For information on how to use the Kafka Connect plugin in an independently installed Kafka environment, please refer to the official documentation: [https://kafka.apache.org/documentation/#connect](https://kafka.apache.org/documentation/#connect).
1. For how to use the Kafka Connect plugin in a standalone Kafka environment, please refer to the official documentation: [https://kafka.apache.org/documentation/#connect](https://kafka.apache.org/documentation/#connect).
## Feedback
If you encounter any issues, feel free to report them in the GitHub repository for this project: [https://github.com/taosdata/kafka-connect-tdengine/issues](https://github.com/taosdata/kafka-connect-tdengine/issues).
If you encounter any issues, feel free to provide feedback in the GitHub repository of this project: [https://github.com/taosdata/kafka-connect-tdengine/issues](https://github.com/taosdata/kafka-connect-tdengine/issues).
## References

View File

@ -1,8 +1,8 @@
### Configure taosAdapter
### Configuring taosAdapter
To configure taosAdapter to receive collectd data:
Configure taosAdapter to receive collectd data as follows:
- Enable the configuration items in the taosAdapter configuration file (default location is `/etc/taos/taosadapter.toml`):
- Enable the configuration item in the taosAdapter configuration file (default location is /etc/taos/taosadapter.toml)
```
...
@ -17,17 +17,17 @@ password = "taosdata"
...
```
The default database name that taosAdapter writes to is `collectd`, which can be changed by modifying the `dbs` item in the taosAdapter configuration file to specify a different name. The `user` and `password` fields should contain the actual TDengine configuration values. After modifying the configuration file, you need to restart taosAdapter.
The default database name written by taosAdapter is `collectd`, but you can also modify the dbs item in the taosAdapter configuration file to specify a different name. Fill in user and password with the actual TDengine configuration values. After modifying the configuration file, taosAdapter needs to be restarted.
- You can also enable the collectd data reception functionality in taosAdapter using command line parameters or by setting environment variables. For specific details, please refer to the taosAdapter reference manual.
- You can also enable collectd data reception in taosAdapter using command-line parameters or environment variables. For details, please refer to the taosAdapter reference manual.
### Configure collectd
### Configuring collectd
collectd uses a plugin mechanism to write the collected monitoring data to various data storage software in multiple forms. TDengine supports both the direct collection plugin and the write_tsdb plugin.
collectd uses a plugin mechanism that can write the collected monitoring data to different data storage software in various forms. TDengine supports direct collection plugins and write_tsdb plugins.
#### Configure Direct Collection Plugin Data Reception
#### Configuring reception of direct collection plugin data
Modify the collectd configuration file (default location is `/etc/collectd/collectd.conf`) with the relevant configuration items.
Modify the related configuration items in the collectd configuration file (default location /etc/collectd/collectd.conf).
```text
LoadPlugin network
@ -36,9 +36,9 @@ LoadPlugin network
</Plugin>
```
Replace `<taosAdapter's host>` with the domain name or IP address of the server running taosAdapter. Replace `<port for collectd direct>` with the port that taosAdapter uses to receive collectd data (default is 6045).
Where \<taosAdapter's host> should be filled with the domain name or IP address of the server running taosAdapter. \<port for collectd direct> should be filled with the port used by taosAdapter to receive collectd data (default is 6045).
Example:
Example as follows:
```text
LoadPlugin network
@ -47,9 +47,9 @@ LoadPlugin network
</Plugin>
```
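After modifying `collectd.conf`, collectd must be restarted for the changes to take effect. A quick way to confirm that data is arriving on the TDengine side (the database name assumes the default `collectd` mentioned above):

```shell
# Restart collectd and confirm the target database appears in TDengine.
sudo systemctl restart collectd
taos -s "SHOW DATABASES;"
```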
#### Configure write_tsdb Plugin Data Reception
#### Configuring write_tsdb plugin data
Modify the collectd configuration file (default location is `/etc/collectd/collectd.conf`) with the relevant configuration items.
Modify the related configuration items in the collectd configuration file (default location /etc/collectd/collectd.conf).
```text
LoadPlugin write_tsdb
@ -62,9 +62,7 @@ LoadPlugin write_tsdb
</Plugin>
```
Replace `<taosAdapter's host>` with the domain name or IP address of the server running taosAdapter. Replace `<port for collectd write_tsdb plugin>` with the port that taosAdapter uses to receive data from the collectd write_tsdb plugin (default is 6047).
Example:
Where \<taosAdapter's host> should be filled with the domain name or IP address of the server running taosAdapter. \<port for collectd write_tsdb plugin> should be filled with the port used by taosAdapter to receive collectd write_tsdb plugin data (default is 6047).
```text
LoadPlugin write_tsdb

View File

@ -1,8 +1,8 @@
### Configure taosAdapter
### Configuring taosAdapter
To configure taosAdapter to receive icinga2 data:
Configure taosAdapter to receive icinga2 data as follows:
- Enable the configuration items in the taosAdapter configuration file (default location is `/etc/taos/taosadapter.toml`):
- Enable the configuration item in the taosAdapter configuration file (default location /etc/taos/taosadapter.toml)
```
...
@ -17,14 +17,14 @@ password = "taosdata"
...
```
The default database name that taosAdapter writes to is `icinga2`, which can be changed by modifying the `dbs` item in the taosAdapter configuration file to specify a different name. The `user` and `password` fields should contain the actual TDengine configuration values. After modifying the configuration, you need to restart taosAdapter.
The default database name written by taosAdapter is `icinga2`, but you can also modify the dbs item in the taosAdapter configuration file to specify a different name. Fill in user and password with the actual TDengine configuration values. taosAdapter needs to be restarted after modifications.
- You can also enable the icinga2 data reception functionality in taosAdapter using command line parameters or by setting environment variables. For specific details, please refer to the taosAdapter reference manual.
- You can also use taosAdapter command-line parameters or environment variables to enable taosAdapter to receive icinga2 data. For details, please refer to the taosAdapter reference manual.
### Configure icinga2
### Configuring icinga2
- Enable the icinga2 OpenTSDB writer (refer to the link [Icinga2 OpenTSDB Writer Documentation](https://icinga.com/docs/icinga-2/latest/doc/14-features/#opentsdb-writer)).
- Modify the configuration file `/etc/icinga2/features-enabled/opentsdb.conf`, replacing `<taosAdapter's host>` with the domain name or IP address of the server running taosAdapter, and `<port for icinga2>` with the port that taosAdapter uses to receive icinga2 data (default is 6048).
- Enable icinga2's opentsdb-writer (reference: [https://icinga.com/docs/icinga-2/latest/doc/14-features/#opentsdb-writer](https://icinga.com/docs/icinga-2/latest/doc/14-features/#opentsdb-writer))
- Modify the configuration file `/etc/icinga2/features-enabled/opentsdb.conf`, filling in \<taosAdapter's host> with the domain name or IP address of the server running taosAdapter, and \<port for icinga2> with the port taosAdapter uses to receive icinga2 data (default is 6048)
```
object OpenTsdbWriter "opentsdb" {
@ -33,7 +33,7 @@ object OpenTsdbWriter "opentsdb" {
}
```
Example configuration:
Example file:
```
object OpenTsdbWriter "opentsdb" {

View File

@ -1,18 +1,18 @@
To configure Prometheus, edit the Prometheus configuration file `prometheus.yml` (default location is `/etc/prometheus/prometheus.yml`).
Configuring Prometheus is done by editing the Prometheus configuration file `prometheus.yml` (default location `/etc/prometheus/prometheus.yml`).
### Configure Third-Party Database Address
Set the `remote_read` and `remote_write` URLs to point to the server domain name or IP address where the taosAdapter service is running, the REST service port (taosAdapter uses 6041 by default), and the database name where you want to write to TDengine. Ensure that the corresponding URLs are in the following format:
Set the `remote_read url` and `remote_write url` to point to the domain name or IP address of the server running the taosAdapter service, the REST service port (taosAdapter defaults to 6041), and the name of the database you want to write to in TDengine, ensuring the URLs are formatted as follows:
- `remote_read` url: `http://<taosAdapter's host>:<REST service port>/prometheus/v1/remote_read/<database name>`
- `remote_write` url: `http://<taosAdapter's host>:<REST service port>/prometheus/v1/remote_write/<database name>`
- remote_read url: `http://<taosAdapter's host>:<REST service port>/prometheus/v1/remote_read/<database name>`
- remote_write url: `http://<taosAdapter's host>:<REST service port>/prometheus/v1/remote_write/<database name>`
### Configure Basic Authentication
- `username`: `<TDengine's username>`
- `password`: `<TDengine's password>`
- username: \<TDengine's username>
- password: \<TDengine's password>
### Example Configuration in prometheus.yml for remote_write and remote_read
### Example configuration of remote_write and remote_read in the prometheus.yml file
```yaml
remote_write:

View File

@ -1,8 +1,8 @@
### Configure taosAdapter
To configure taosAdapter to receive StatsD data:
Configure taosAdapter to receive StatsD data as follows:
- Enable the configuration item in the taosAdapter configuration file (default location is /etc/taos/taosadapter.toml)
- Enable the configuration item in the taosAdapter configuration file (default location /etc/taos/taosadapter.toml)
```
...
@ -25,17 +25,17 @@ deleteTimings = true
...
```
Here, the default database name written by taosAdapter is `statsd`, which can also be modified in the taosAdapter configuration file under the db field to specify a different name. The user and password should be filled with the actual TDengine configuration values. After modifying the configuration file, taosAdapter needs to be restarted.
The default database name written by taosAdapter is `statsd`, but you can also modify the db item in the taosAdapter configuration file to specify a different name. Fill in the user and password with the actual TDengine configuration values. After modifying the configuration file, taosAdapter needs to be restarted.
- You can also enable the taosAdapter to receive StatsD data by using command line parameters or setting environment variables. For specific details, please refer to the taosAdapter reference manual.
- You can also use taosAdapter command line arguments or set environment variables to enable the taosAdapter to receive StatsD data. For more details, please refer to the taosAdapter reference manual.
### Configure StatsD
To use StatsD, you need to download its [source code](https://github.com/statsd/statsd). Please refer to the example file `exampleConfig.js` located in the root directory of the downloaded source code for configuration modifications. Here, \<taosAdapter's host> should be replaced with the domain name or IP address of the server running taosAdapter, and \<port for StatsD> should be the port that taosAdapter uses to receive StatsD data (default is 6044).
To use StatsD, download its [source code](https://github.com/statsd/statsd). Modify its configuration file according to the example file `exampleConfig.js` found in the root directory of the local source code download. Replace \<taosAdapter's host> with the domain name or IP address of the server running taosAdapter, and \<port for StatsD> with the port that taosAdapter uses to receive StatsD data (default is 6044).
```
Add "./backends/repeater" to the backends section.
Add { host:'<taosAdapter's host>', port: <port for StatsD>} to the repeater section.
Add to the backends section "./backends/repeater"
Add to the repeater section { host:'<taosAdapter's host>', port: <port for StatsD>}
```
Example configuration file:
@ -48,7 +48,7 @@ port: 8125
}
```
After adding the above content, start StatsD (assuming the configuration file is modified to config.js).
After adding the above content, start StatsD (assuming the configuration file is modified to config.js).
```
npm install

View File

@ -1,8 +1,8 @@
### Configure taosAdapter
### Configuring taosAdapter
To configure taosAdapter to receive TCollector data:
To configure taosAdapter to receive data from TCollector:
- Enable the configuration item in the taosAdapter configuration file (default location is /etc/taos/taosadapter.toml)
- Enable the configuration in the taosAdapter configuration file (default location /etc/taos/taosadapter.toml)
```
...
@ -17,17 +17,17 @@ password = "taosdata"
...
```
Here, the default database name written by taosAdapter is `tcollector`, which can also be modified in the taosAdapter configuration file under the dbs field to specify a different name. The user and password should be filled with the actual TDengine configuration values. After modifying the configuration file, taosAdapter needs to be restarted.
The default database name that taosAdapter writes to is `tcollector`, but you can specify a different name by modifying the dbs option in the taosAdapter configuration file. Fill in the user and password with the actual values configured in TDengine. After modifying the configuration file, taosAdapter needs to be restarted.
- You can also enable the taosAdapter to receive TCollector data by using command line parameters or setting environment variables. For specific details, please refer to the taosAdapter reference manual.
- You can also use taosAdapter command line arguments or set environment variables to enable taosAdapter to receive TCollector data. For more details, please refer to the taosAdapter reference manual.
### Configure TCollector
### Configuring TCollector
To use TCollector, you need to download its [source code](https://github.com/OpenTSDB/tcollector). The configuration items are in the source code. Note that there are significant differences between TCollector versions; here we only take the latest code from the current master branch (git commit: 37ae920) as an example.
To use TCollector, download its [source code](https://github.com/OpenTSDB/tcollector). Its configuration options are in its source code. Note: There are significant differences between different versions of TCollector; this only refers to the latest code in the current master branch (git commit: 37ae920).
Modify the corresponding contents in the `collectors/etc/config.py` and `tcollector.py` files. Change the address that originally pointed to the OpenTSDB host to the domain name or IP address of the server where taosAdapter is deployed, and modify the port to the one that taosAdapter supports for TCollector (default is 6049).
Modify the contents of `collectors/etc/config.py` and `tcollector.py`. Change the original address pointing to the OpenTSDB host to the domain name or IP address where taosAdapter is deployed, and change the port to the corresponding port supported by taosAdapter for TCollector use (default is 6049).
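Depending on the TCollector version, the destination can also be supplied on the command line instead of editing the source. Treat the flags below as an assumption and verify them against your version's `--help` output:

```shell
# Hypothetical invocation: send TCollector data directly to taosAdapter
# (assumes the version in use supports the --host/--port options).
./tcollector.py --host <taosAdapter's host> --port 6049
```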
An example of the git diff output for the modified source code:
Example of git diff output for source code modifications:
```
index e7e7a1c..ec3e23c 100644

View File

@ -1,4 +1,4 @@
In the Telegraf configuration file (default location /etc/telegraf/telegraf.conf), add the outputs.http output module configuration:
In the Telegraf configuration file (default location /etc/telegraf/telegraf.conf), add the configuration for the outputs.http output module:
```
[[outputs.http]]
@ -9,9 +9,9 @@ In the Telegraf configuration file (default location /etc/telegraf/telegraf.conf
...
```
Here, \<taosAdapter's host> should be filled with the domain name or IP address of the server running the taosAdapter service, \<REST service port> should be filled with the REST service port (default is 6041), and \<TDengine's username> and \<TDengine's password> should be filled with the actual TDengine configuration currently in use. \<database name> should be filled with the name of the database where you want to save Telegraf data in TDengine.
Where \<taosAdapter's host> should be filled with the domain name or IP address of the server running the taosAdapter service, \<REST service port> should be filled with the port of the REST service (default is 6041), \<TDengine's username> and \<TDengine's password> should be filled with the actual configuration of the running TDengine, and \<database name> should be filled with the database name where you want to save Telegraf data in TDengine.
An example is as follows:
Example as follows:
```
[[outputs.http]]

View File

@ -20,50 +20,45 @@ import imgStep11 from '../../assets/grafana-11.png';
## Overview
This document describes how to integrate TDengine as a data source with the open-source data visualization system [Grafana](https://www.grafana.com/) to achieve data visualization and monitoring alarm system construction. With the TDengine plugin, you can easily display the data from TDengine tables on Grafana dashboards without complex development work.
This document describes how to integrate the TDengine data source with the open-source data visualization system [Grafana](https://www.grafana.com/) to achieve data visualization and build a monitoring and alert system. With the TDengine plugin, you can easily display data from TDengine tables on Grafana dashboards without the need for complex development work.
## Grafana Version Requirements
Currently, TDengine supports Grafana versions 7.5 and above. It is recommended to use the latest version. Please download and install the corresponding version of Grafana based on your system environment.
TDengine currently supports Grafana version 7.5 and above. It is recommended to use the latest version. Please download and install the corresponding version of Grafana according to your system environment.
## Prerequisites
To allow Grafana to properly add TDengine as a data source, the following preparations are required.
- Grafana service has been deployed and is running normally.
:::note
Ensure that the account running Grafana has write permissions for its installation directory; otherwise, you may be unable to install plugins later.
:::
To add the TDengine data source to Grafana, the following preparations are needed.
- Grafana service has been deployed and is running normally.
**Note**: Ensure that the account starting Grafana has write permission to its installation directory; otherwise, you may not be able to install plugins later.
- TDengine cluster has been deployed and is running normally.
- taosAdapter has been installed and is running normally. For specific details, please refer to the [taosAdapter usage manual](../../../tdengine-reference/components/taosadapter/).
- taosAdapter has been installed and is running normally. For details, please refer to the [taosAdapter user manual](../../../tdengine-reference/components/taosadapter/).
Record the following information:
- REST API address of the TDengine cluster, e.g., `http://tdengine.local:6041`.
- Authentication information for the TDengine cluster, which may include a username and password.
- TDengine cluster REST API address, such as: `http://tdengine.local:6041`.
- TDengine cluster authentication information (username and password).
## Install Grafana Plugin and Configure Data Source
<Tabs defaultValue="script">
<TabItem value="gui" label="Graphical Interface Installation">
With the latest version of Grafana (8.5+), you can browse and manage plugins directly in Grafana's UI ([Plugin Management](https://grafana.com/docs/grafana/next/administration/plugin-management/#plugin-catalog)). For version 7.x, please use the **Install Script** or **Manual Installation** methods. In the Grafana management interface, navigate to **Configurations > Plugins**, search for `TDengine`, and follow the prompts to install.
Using the latest version of Grafana (8.5+), you can [browse and manage plugins](https://grafana.com/docs/grafana/next/administration/plugin-management/#plugin-catalog) in Grafana (for version 7.x, please use **Installation Script** or **Manual Installation** methods). In the Grafana management interface, directly search for `TDengine` on the **Configurations > Plugins** page and follow the prompts to install.
After installation, follow the instructions to **Create a TDengine data source** and enter the relevant TDengine configuration:
After installation, follow the instructions to **Create a TDengine data source** by adding the data source with the following TDengine configurations:
- Host: IP address and port number providing the REST service in the TDengine cluster, default is `http://localhost:6041`
- Host: IP address and port number providing REST service in the TDengine cluster, default `http://localhost:6041`
- User: TDengine username.
- Password: TDengine user password.
Click `Save & Test` to perform a test. If successful, it will indicate: `TDengine Data source is working`.
Click `Save & Test` to test, if successful, it will prompt: `TDengine Data source is working`.
</TabItem>
<TabItem value="script" label="Installation Script">
<TabItem value="script" label="Install Script">
For users using Grafana 7.x or those configuring with [Grafana Provisioning](https://grafana.com/docs/grafana/latest/administration/provisioning/), you can automatically install the plugin and add the data source provisioning configuration file using the install script on the Grafana server.
For users using Grafana version 7.x or configuring with [Grafana Provisioning](https://grafana.com/docs/grafana/latest/administration/provisioning/), you can use the installation script on the Grafana server to automatically install the plugin and add the data source Provisioning configuration file.
```sh
bash -c "$(curl -fsSL \
@ -73,15 +68,14 @@ bash -c "$(curl -fsSL \
-p taosdata
```
After installation, you will need to restart the Grafana service for the changes to take effect.
After installation, you need to restart the Grafana service for it to take effect.
Save this script and execute `./install.sh --help` to view detailed help documentation.
Save the script and execute `./install.sh --help` to view detailed help documentation.
</TabItem>
<TabItem value="manual" label="Manual Installation">
Use the [`grafana-cli` command line tool](https://grafana.com/docs/grafana/latest/administration/cli/) for plugin [installation](https://grafana.com/grafana/plugins/tdengine-datasource/?tab=installation).
Use the [`grafana-cli` command line tool](https://grafana.com/docs/grafana/latest/administration/cli/) to [install the plugin](https://grafana.com/grafana/plugins/tdengine-datasource/?tab=installation).
```bash
grafana-cli plugins install tdengine-datasource
@ -89,7 +83,7 @@ grafana-cli plugins install tdengine-datasource
sudo -u grafana grafana-cli plugins install tdengine-datasource
```
Alternatively, download the .zip file from [GitHub](https://github.com/taosdata/grafanaplugin/releases/tag/latest) or [Grafana](https://grafana.com/grafana/plugins/tdengine-datasource/?tab=installation) to your local machine and extract it to the Grafana plugin directory. Here is a command line download example:
Alternatively, download the .zip file from [GitHub](https://github.com/taosdata/grafanaplugin/releases/tag/latest) or [Grafana](https://grafana.com/grafana/plugins/tdengine-datasource/?tab=installation) to your local machine and unzip it into the Grafana plugins directory. Example command line download is as follows:
```bash
GF_VERSION=3.5.1
@ -99,33 +93,32 @@ wget https://github.com/taosdata/grafanaplugin/releases/download/v$GF_VERSION/td
wget -O tdengine-datasource-$GF_VERSION.zip https://grafana.com/api/plugins/tdengine-datasource/versions/$GF_VERSION/download
```
For example, on a CentOS 7.2 operating system, extract the plugin package to the /var/lib/grafana/plugins directory and restart Grafana.
For the CentOS 7.2 operating system, for example, unzip the plugin package into the /var/lib/grafana/plugins directory and restart Grafana.
```bash
sudo unzip tdengine-datasource-$GF_VERSION.zip -d /var/lib/grafana/plugins/
```
If Grafana is running in a Docker environment, you can use the following environment variable to automatically install the TDengine data source plugin:
If Grafana is running in a Docker environment, you can use the following environment variable to set up automatic installation of the TDengine data source plugin:
```bash
GF_INSTALL_PLUGINS=tdengine-datasource
```
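For example, a minimal sketch of starting the official Grafana image with this variable set (the container name and port mapping are arbitrary choices):

```shell
# Start Grafana in Docker and install the TDengine data source plugin on boot.
docker run -d \
  --name grafana \
  -p 3000:3000 \
  -e GF_INSTALL_PLUGINS=tdengine-datasource \
  grafana/grafana
```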
Then, users can access Grafana through [http://localhost:3000](http://localhost:3000) (username/password: admin/admin) and add the data source through the left-side `Configuration -> Data Sources`.
Afterward, users can directly access the Grafana server at `http://localhost:3000` (username/password: admin/admin) and add a data source through `Configuration -> Data Sources`.
Click `Add data source` to enter the new data source page, type TDengine into the query box, and then click `select` to configure the data source. You will be prompted to modify the relevant configuration as follows:
Click `Add data source` to enter the new data source page, type TDengine in the search box, then click `select` to enter the data source configuration page, and modify the configuration according to the prompts:
- Host: IP address and port number providing the REST service in the TDengine cluster, default is `http://localhost:6041`
- Host: IP address and port number providing REST service in the TDengine cluster, default `http://localhost:6041`
- User: TDengine username.
- Password: TDengine user password.
Click `Save & Test` to perform a test. If successful, it will indicate: `TDengine Data source is working`.
Click `Save & Test` to test, if successful, it will prompt: `TDengine Data source is working`
</TabItem>
<TabItem value="container" label="K8s/Docker container">
<TabItem value="container" label="K8s/Docker Container">
Refer to [Grafana Container Installation Instructions](https://grafana.com/docs/grafana/next/setup-grafana/installation/docker/#install-plugins-in-the-docker-container). Use the following command to start a container and automatically install the TDengine plugin:
Refer to [Grafana containerized installation instructions](https://grafana.com/docs/grafana/next/setup-grafana/installation/docker/#install-plugins-in-the-docker-container). Use the following command to start a container and automatically install the TDengine plugin:
```bash
docker run -d \
@ -135,7 +128,7 @@ docker run -d \
grafana/grafana
```
Using docker-compose, you can configure Grafana Provisioning for automated configuration and experience zero-configuration startup with TDengine + Grafana:
Using docker-compose, configure Grafana Provisioning for automated setup, and experience a zero-configuration start with TDengine + Grafana:
1. Save this file as `tdengine.yml`.
@ -155,7 +148,7 @@ Using docker-compose, you can configure Grafana Provisioning for automated confi
editable: true
```
2. Save this file as `docker-compose.yml`.
2. Save the file as `docker-compose.yml`.
```yml
version: "3.7"
@ -185,122 +178,122 @@ Using docker-compose, you can configure Grafana Provisioning for automated confi
tdengine-data:
```
3. Start TDengine + Grafana using the docker-compose command: `docker-compose up -d`.
3. Use the docker-compose command to start TDengine + Grafana: `docker-compose up -d`.
Open Grafana at [http://localhost:3000](http://localhost:3000), and you can now add dashboards.
Open Grafana at [http://localhost:3000](http://localhost:3000); you can now add Dashboards.
</TabItem>
</Tabs>
:::info
The following sections will use Grafana version 11.0.0 as an example. Other versions may have different functionalities. Please refer to the [Grafana Official Documentation](https://grafana.com/docs/grafana/latest/).
In the following text, we use Grafana v11.0.0 as an example. Other versions may have different features; please refer to [Grafana's official website](https://grafana.com/docs/grafana/latest/).
:::
## Dashboard User Guide
## Dashboard Usage Guide
This section is organized as follows:
1. Introduces the basics, including Grafana's built-in variables and custom variables, as well as TDengine's support for time series query syntax.
2. Shows how to create a dashboard using the TDengine data source in Grafana, including the unique syntax for time series queries and how to display data in groups.
3. Because the configured dashboard queries TDengine periodically to refresh the display, improper SQL writing may lead to significant performance issues. We provide performance optimization suggestions.
4. Finally, we illustrate how to import the TDinsight dashboard as an example.
1. Introduce basic knowledge, including Grafana's built-in variables and custom variables, and TDengine's special syntax support for time-series queries.
2. Explain how to use the TDengine data source in Grafana to create Dashboards, then provide the special syntax for time-series queries and how to group display data.
3. Since the configured Dashboard will periodically query TDengine to refresh the display, improper SQL writing can cause serious performance issues, so we provide performance optimization suggestions.
4. Finally, we use the TDengine monitoring panel TDinsight as an example to demonstrate how to import the Dashboards we provide.
### Grafana Built-in Variables and Custom Variables
The Variable feature in Grafana is powerful and can be used in dashboard queries, panel titles, labels, etc., to create more dynamic and interactive dashboards, improving user experience and efficiency.
The Variable feature in Grafana is very powerful and can be used in Dashboard queries, panel titles, tags, etc., to create more dynamic and interactive Dashboards, enhancing user experience and efficiency.
The main functions and characteristics of variables include:
The main functions and features of variables include:
- Dynamic Data Queries: Variables can be used in query statements, allowing users to dynamically change query conditions by selecting different variable values, thereby viewing different data views. This is useful in scenarios where data needs to be dynamically displayed based on user input.
- Dynamic data querying: Variables can be used in query statements, allowing users to dynamically change query conditions by selecting different variable values, thus viewing different data views. This is very useful for scenarios that require dynamically displaying data based on user input.
- Enhanced Reusability: By defining variables, the same configuration or query logic can be reused in multiple places without repeating the same code. This makes the maintenance and updates of the dashboard simpler and more efficient.
- Improved reusability: By defining variables, the same configuration or query logic can be reused in multiple places without having to rewrite the same code. This makes maintaining and updating Dashboards simpler and more efficient.
- Flexible Configuration Options: Variables offer various configuration options, such as predefined static value lists, dynamically queried values from data sources, regular expression filtering, etc., making the application of variables more flexible and powerful.
- Flexible configuration options: Variables offer a variety of configuration options, such as predefined static value lists, dynamically querying values from data sources, regular expression filtering, etc., making the application of variables more flexible and powerful.
Grafana provides built-in variables and custom variables, both of which can be referenced when writing SQL by using `$variableName`, where `variableName` is the name of the variable. For other referencing methods, please refer to [Reference Methods](https://grafana.com/docs/grafana/latest/dashboards/variables/variable-syntax/).
Grafana provides both built-in and custom variables, which can be referenced when writing SQL as `$variableName`, where `variableName` is the name of the variable. For other referencing methods, please refer to [Referencing Methods](https://grafana.com/docs/grafana/latest/dashboards/variables/variable-syntax/).
#### Built-in Variables
Grafana has built-in variables like `from`, `to`, and `interval`, which are derived from the Grafana plugin panel. Their meanings are as follows:
Grafana has built-in variables such as `from`, `to`, and `interval`, all derived from the Grafana plugin panel. Their meanings are as follows:
- `from`: Start time of the query range
- `to`: End time of the query range
- `interval`: Window slice interval
- `from` is the start time of the query range
- `to` is the end time of the query range
- `interval` is the window split interval
For each query, it is recommended to set the start and end times of the query range, which can effectively reduce the amount of data scanned by the TDengine server. The `interval` is the size of the window slice; in Grafana version 11, its size is calculated based on the time interval and the number of returned points.
For each query, it is recommended to set the start and end time of the query range, which can effectively reduce the amount of data scanned by the TDengine server during query execution. `interval` is the size of the window split, and in Grafana version 11, it is calculated based on the time interval and the number of returned points.
In addition to these three common variables, Grafana also provides variables like `__timezone`, `__org`, and `__user`. For more details, refer to [Built-in Variables](https://grafana.com/docs/grafana/latest/dashboards/variables/add-template-variables/#global-variables).
In addition to the three common variables mentioned above, Grafana also provides variables such as `__timezone`, `__org`, `__user`, etc. For more details, please refer to [Built-in Variables](https://grafana.com/docs/grafana/latest/dashboards/variables/add-template-variables/#global-variables).
#### Custom Variables
We can add custom variables to the dashboard. Custom variables are used just like built-in variables: they are referenced in SQL as `$variableName`.

Custom variables support multiple types, with common types including `Query`, `Constant`, `Interval`, and `Data source`.

Custom variables can also reference other custom variables; for example, one variable may represent a region, and another can reference that region's value to query the devices in that region.
##### Adding a Query Type Variable
In the dashboard configuration, select 【Variables】, then click 【New variable】:
1. In the "Name" field, enter the variable name; here we set it to `selected_groups`.
2. In the 【Select variable type】 dropdown, select "Query".
   Depending on the selected variable type, configure the corresponding options. For the "Query" type, specify the data source and the query statement used to obtain the variable values. Taking smart meters as an example, select the data source and set the SQL to `select distinct(groupid) from power.meters where groupid < 3 and ts > $from and ts < $to;`
3. After clicking the 【Run Query】 button at the bottom, you can see the variable values generated from your configuration in the "Preview of values" section.
4. Other configurations are not detailed here; after completing the configuration, click the 【Apply】 button at the bottom of the page, then click 【Save dashboard】 in the upper right corner to save.
After completing the above steps, we have successfully added a new custom variable `$selected_groups` to the dashboard, which we can later reference in the dashboard's queries.
We can also add another custom variable that references `selected_groups`; for example, add a query variable named `tbname_max_current` with the SQL `select tbname from power.meters where groupid = $selected_groups and ts > $from and ts < $to;`
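As a hedged sketch of how such a chained variable might then be used in a panel (assuming `$tbname_max_current` resolves to a single subtable name; with multiple values selected, an `in (...)` condition would be needed instead):

```sql
-- Hypothetical panel query built on the chained variable above.
select _wstart as ts, max(current) as max_current
from power.meters
where tbname = '$tbname_max_current'
  and ts > $from and ts < $to
interval($interval);
```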
##### Adding an Interval Type Variable
We can customize the time window interval to better fit business needs.
1. In the "Name" field, enter the variable name as `interval`.
2. In the 【Select variable type】 dropdown, select "Interval".
3. In 【Interval options】, enter `1s,2s,5s,10s,15s,30s,1m`.
4. Other configurations are not detailed here; after completing the configuration, click the 【Apply】 button at the bottom of the page, then click 【Save dashboard】 in the upper right corner to save.
After completing the above steps, we have successfully added a new custom variable `$interval` to the dashboard, which we can later reference in the dashboard's queries.
:::note
If a custom variable has the same name as a Grafana built-in variable, the custom variable takes precedence when referenced.
:::
### TDengine Time-Series Query Support
On top of standard SQL, TDengine also offers a series of distinctive query syntaxes tailored to time-series business scenarios, greatly simplifying the development of time-series applications.
- The `partition by` clause splits data by certain dimensions and then performs a series of calculations within each partition; in most cases it can replace `group by`.
- The `interval` clause generates windows of equal time duration.
- The `fill` clause specifies the filling mode for windows that contain no data.
- Timestamp pseudo-columns: to output the time window corresponding to each aggregation result, use timestamp pseudo-columns in the SELECT clause, such as the window start time (`_wstart`) and window end time (`_wend`).
For a detailed introduction to these features, please refer to [Distinctive Queries](../../../tdengine-reference/sql-manual/time-series-extensions/).
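As a minimal sketch that combines these clauses (assuming the `power.meters` supertable from the smart meter example and a relative one-hour time range), a downsampled, per-group query might look like this:

```sql
-- Partition by tag, downsample into 10-minute windows, fill empty windows,
-- and expose the window boundaries via the _wstart/_wend pseudo-columns.
select _wstart, _wend, groupid, avg(current) as avg_current
from power.meters
where ts > now - 1h
partition by groupid
interval(10m)
fill(null);
```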
### Creating a Dashboard
With the foundational knowledge from the previous sections, we can configure a dashboard for time-series data display based on the TDengine data source.

In the main Grafana interface, create a dashboard and click 【Add Query】 to enter the panel query page:
<figure>
<Image img={imgStep01} alt=""/>
</figure>
As shown in the image above, select the `TDengine` data source in the "Query" section and enter the corresponding SQL in the query box below. Continuing with the smart meter example, **virtual data is used here** so that the curve displays nicely.
#### Displaying Time-Series Data
Suppose we want to query the average current over a period of time, with the time window split by `$interval` and missing data in any window filled with null.
- "INPUT SQL": Enter the query statement (the result set should consist of two columns and multiple rows); here we enter `select _wstart as ts, avg(current) as current from power.meters where groupid in ($selected_groups) and ts > $from and ts < $to interval($interval) fill(null)`, where `from`, `to`, and `interval` are Grafana built-in variables and `selected_groups` is a custom variable.
- "ALIAS BY": Sets an alias for the current query.
- "GENERATE SQL": Clicking this button replaces the variables and generates the final statement to be executed.
If we set the value of the custom variable `selected_groups` at the top to 1, the query for the average current of all devices with `groupid` 1 in the `meters` supertable changes as shown in the following image:
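For illustration, the statement that 【GENERATE SQL】 might produce after variable substitution could look roughly like the following (the epoch timestamps and the 10-second interval are hypothetical values chosen for this sketch):

```sql
-- Hypothetical result of substituting $selected_groups = 1, a 10s interval,
-- and an example dashboard time range into the INPUT SQL above.
select _wstart as ts, avg(current) as current
from power.meters
where groupid in (1)
  and ts > 1733270400000
  and ts < 1733274000000
interval(10s)
fill(null);
```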
<figure>
<Image img={imgStep02} alt=""/>
@ -308,78 +301,78 @@ In the top custom variable, if we select the value of `selected_groups` as 1, th
:::note
Since the REST interface is stateless, you cannot use the `use db` statement to switch databases. In the Grafana plugin, you can specify the database in the SQL statement as `<db_name>.<table_name>`.
:::
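As a small sketch, a fully qualified query against the smart meter database used in this section would therefore be written as:

```sql
-- The database name (power) is part of the table reference,
-- since "use power" cannot be issued over the stateless REST interface.
select count(*) from power.meters where ts > $from and ts < $to;
```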
#### Displaying Grouped Time-Series Data
Suppose we want to query the average current over a period of time and display it grouped by `groupid`. We can modify the previous SQL to `select _wstart as ts, groupid, avg(current) as current from power.meters where ts > $from and ts < $to partition by groupid interval($interval) fill(null)`.
- "Group by column(s)": Enter the `group by` or `partition by` column names, separated by commas. For a `group by` or `partition by` query, set this field so that multidimensional data is displayed. Here we set it to `groupid` to display the data grouped by `groupid`.
- "Group By Format": The legend format for multidimensional data in `group by` or `partition by` scenarios. For the SQL above, setting "Group By Format" to `groupid-{{groupid}}` makes the displayed legend name the formatted group name.
After completing the settings, the data is displayed grouped by `groupid`, as shown in the following image:
<figure>
<Image img={imgStep03} alt=""/>
</figure>
For information on how to use Grafana to create the corresponding monitoring interfaces, as well as other usage details, please refer to the official [Grafana documentation](https://grafana.com/docs/).
### Performance Optimization Suggestions
- **Add a time range to all queries.** In time-series databases, omitting the time range causes a table scan and poor performance. A common SQL pattern is `select column_name from db.table where ts > $from and ts < $to;`
- For latest-status queries, we generally recommend **enabling caching when creating the database** (setting `CACHEMODEL` to last_row or both). A common SQL pattern is `select last(column_name) from db.table where ts > $from and ts < $to;`
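As a hedged sketch of the second recommendation (the database name `power` follows the smart meter example; check the TDengine SQL reference for the exact options available in your version):

```sql
-- Enable last-row/last-value caching when creating the database,
-- so that latest-status queries such as last() can be served from cache.
create database if not exists power cachemodel 'both';
select last(current) from power.meters where ts > $from and ts < $to;
```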
### Importing Dashboards
On the data source configuration page, you can import the TDinsight panel for the data source, which serves as a monitoring visualization tool for the TDengine cluster. If the TDengine server is version 3.0, please select `TDinsight for 3.x` for import. Note that TDinsight for 3.x requires taosKeeper to be running and configured.
<figure>
<Image img={imgStep04} alt=""/>
</figure>
A dashboard compatible with TDengine 2.* has been published on Grafana: [Dashboard 15167 - TDinsight](https://grafana.com/grafana/dashboards/15167).
Other panels using TDengine as the data source can be [searched for here](https://grafana.com/grafana/dashboards/?dataSource=tdengine-datasource). Below is a non-exhaustive list:
- [15146](https://grafana.com/grafana/dashboards/15146): Monitoring multiple TDengine clusters
- [15155](https://grafana.com/grafana/dashboards/15155): TDengine alert example
- [15167](https://grafana.com/grafana/dashboards/15167): TDinsight
- [16388](https://grafana.com/grafana/dashboards/16388): Display of node information collected by Telegraf
## Alert Configuration
The TDengine Grafana plugin supports alerts. Configuring alerts involves the following steps:
1. Configure contact points: Set up notification channels, including DingDing, Email, Slack, WebHook, Prometheus Alertmanager, etc.
2. Configure notification policies: Set up how alerts are routed to specific channels, as well as notification timing and repeat frequency
3. Configure alert rules: Set up detailed alert rules
3.1 Configure the alert name
3.2 Configure the query and alert trigger conditions
3.3 Configure the rule evaluation strategy
3.4 Configure labels and alert channels
3.5 Configure notification content
### Alert Configuration Interface
In Grafana 11, the alert interface has six tabs: "Alert rules", "Contact points", "Notification policies", "Silences", "Groups", and "Settings".
- "Alert rules" lists and configures alert rules.
- "Contact points" specifies notification channels, including DingDing, Email, Slack, WebHook, Prometheus Alertmanager, etc.
- "Notification policies" configures how alerts are routed to specific channels, as well as notification timing and repeat frequency.
- "Silences" configures silence periods for alerts.
- "Groups" displays triggered alerts in groups.
- "Settings" allows modifying the alert configuration via JSON.
### Configure Contact Points
This section uses email and Lark as examples for configuring contact points.
#### Configure Email Contact Point
Add the SMTP/Emailing and Alerting modules to the Grafana service configuration file (on Linux systems, this file is usually located at `/etc/grafana/grafana.ini`).
Add the following content to the configuration file:
@ -387,20 +380,18 @@ Add the following content to the configuration file:
#################################### SMTP / Emailing ##########################
[smtp]
enabled = true
host = smtp.qq.com:465 # email server used for sending notifications
user = receiver@foxmail.com
password = *********** # use the email authorization code
skip_verify = true
from_address = sender@foxmail.com
```
Then restart the Grafana service (for Linux systems, execute `systemctl restart grafana-server.service`).
In the Grafana UI, navigate to "Home" -> "Alerting" -> "Contact points" and create a new contact point.

"Name": Email Contact Point

"Integration": Select the contact type; here, choose Email, fill in the email address that receives notifications, and save the contact point.
<figure>
<Image img={imgStep05} alt=""/>
@ -408,13 +399,13 @@ In the Grafana UI, navigate to “Home” -> “Alerting” -> “Contact points
### Configure Notification Policies
After configuring contact points, you can see that a Default Policy already exists.
<figure>
<Image img={imgStep06} alt=""/>
</figure>
Click "..." on the right -> "Edit" to modify the default notification policy; a configuration window opens:
<figure>
<Image img={imgStep07} alt=""/>
@ -422,59 +413,59 @@ Click the right “...” -> “Edit” to modify the default notification polic
Then configure the following parameters:
- "Group wait": The wait time before the first alert is sent.
- "Group interval": The wait time before the next batch of new alerts is sent for the group after the first alert.
- "Repeat interval": The wait time before an alert is re-sent after it has been sent successfully.
### Configure Alert Rules
Taking the smart meter alert as an example, configuring alert rules mainly involves the alert name, the query and alert trigger conditions, the rule evaluation strategy, labels and alert channels, and the notification content.
#### Configure Alert Name
In the panel where you want to configure the alert, select "Edit" -> "Alert" -> "New alert rule".
"Enter alert rule name": For the smart meter example, enter `power meters alert`.
#### Configure Query and Alert Trigger Conditions
In the "Define query and alert condition" section, configure the alert rule.
1. Select data source: `TDengine Datasource`
2. Query statement:
```sql
select _wstart as ts, groupid, avg(current) as current from power.meters where ts > $from and ts < $to partition by groupid interval($interval) fill(null)
```
3. Set "Expression": `Threshold is above 100`
4. Click 【Set as alert condition】
5. "Preview": View the results of the configured rule
After completing the settings, the interface looks as follows:
<figure>
<Image img={imgStep08} alt=""/>
</figure>
Grafana's "Expression" supports various operations and calculations on data, divided into the following types:
1. "Reduce": Aggregates the values of a time series within the selected time range into a single value.
1.1 "Function" sets the aggregation method, supporting Min, Max, Last, Mean, Sum, and Count.
1.2 "Mode" supports the following three options:
- "Strict": If no data is queried, the data is assigned NaN.
- "Drop Non-numeric Value": Removes invalid data from the results.
- "Replace Non-numeric Value": Replaces invalid data with a fixed value.
2. "Threshold": Checks whether the time-series data meets the threshold condition; returns 0 when the condition is false and 1 when it is true. The supported conditions are:
- Is above (x > y)
- Is below (x \< y)
- Is within range (x > y1 AND x \< y2)
- Is outside range (x \< y1 AND x > y2)
3. "Math": Performs mathematical operations on the time-series data.
4. "Resample": Changes the timestamps in each time series to a consistent interval so that mathematical operations can be performed between them.
5. "Classic condition (legacy)": Allows configuring multiple logical conditions to determine whether to trigger an alert.
As shown in the screenshot above, we set the alert to trigger when the maximum value exceeds 100.
#### Configure Rule Evaluation Strategy
@ -482,27 +473,27 @@ As shown in the previous screenshot, we set it to trigger an alarm when the maxi
<Image img={imgStep09} alt=""/>
</figure>
Complete the following configurations:
- "Folder": Set the folder to which the alert rule belongs.
- "Evaluation group": Set the evaluation group for the alert rule. You can select an existing group or create a new one; for a new group you can set the group name and evaluation interval.
- "Pending period": Set how long an abnormal value must persist after the alert rule threshold is triggered before an alert fires; a reasonable setting avoids false alerts.
#### Configure Labels and Alert Channels
<figure>
<Image img={imgStep10} alt=""/>
</figure>
Complete the following configurations:
- "Labels": Add labels to the rule for searching, silencing, or routing to notification policies.
- "Contact point": Select the contact point through which notifications are sent when an alert occurs.
#### Configure Notification Content
<figure>
<Image img={imgStep11} alt=""/>
</figure>
Set the "Summary" and "Description"; when an alert is triggered, you will receive a notification.

View File

@ -3,34 +3,27 @@ title: Looker Studio
slug: /third-party-tools/analytics/looker-studio
---
Looker Studio, a powerful reporting and business intelligence tool from Google, was formerly known as Google Data Studio. At the Google Cloud Next conference in 2022, Google renamed it Looker Studio. Thanks to its rich data visualization options and diverse data connection capabilities, the tool offers a convenient report-building experience. Users can easily create data reports from preset templates to meet a variety of data analysis needs.
Thanks to its user-friendly interface and extensive ecosystem support, Looker Studio is favored by many data scientists and professionals in the field of data analysis. Whether beginners or seasoned analysts, users can quickly build attractive and practical data reports with Looker Studio, thereby gaining better insight into business trends, optimizing decision-making, and improving overall operational efficiency.
## Access
Currently, the TDengine connector, as a partner connector for Looker Studio, is available on the Looker Studio website. When users access the Data Source list in Looker Studio, they can easily find and immediately use the TDengine connector by simply entering "TDengine" in the search bar.
The TDengine connector is compatible with two types of data sources: TDengine Cloud and TDengine Server. TDengine Cloud is a fully managed IoT and industrial IoT big data cloud service platform launched by Taos Data, providing users with a one-stop solution for data storage, processing, and analysis; TDengine Server is the locally deployed version, which supports access via the public network. The following content uses TDengine Cloud as an example.
## Usage
The steps to use the TDengine connector in Looker Studio are as follows.
Step 1: After entering the details page of the TDengine connector, select TDengine Cloud from the Data Source dropdown list, then click the Next button to enter the data source configuration page. Fill in the following information on this page, then click the Connect button.
- The URL and TDengine Cloud Token, which can be obtained from the TDengine Cloud instance list.
- Database name and supertable name.
- The start and end times for querying data.
Step 2: Looker Studio automatically loads the fields and tags of the configured supertable in the TDengine database.

Step 3: Click the Explore button in the upper right corner of the page to view the data loaded from the TDengine database.

Step 4: Based on your needs, use the charts provided by Looker Studio to configure data visualization.
:::note
When using the connector for the first time, please authorize access to the TDengine connector in Looker Studio according to the prompts on the page.
:::

View File

@ -3,93 +3,77 @@ title: Microsoft Power BI
slug: /third-party-tools/analytics/power-bi
---
Power BI is a business analytics tool provided by Microsoft. By configuring the ODBC connector, Power BI can quickly access data from TDengine. Users can import tag data, raw time-series data, or time-aggregated time-series data from TDengine into Power BI to create reports or dashboards, all without writing any code.
## Prerequisites
Ensure that Power BI Desktop is installed and running (if it is not installed, please download the latest 32-bit or 64-bit version for Windows from its official site).
## Install ODBC Driver
Download the latest Windows x64 client driver from the TDengine official website and install it on the machine running Power BI. After successful installation, the TDengine driver can be seen in the "ODBC Data Sources (32-bit)" or "ODBC Data Sources (64-bit)" management tool.
## Configure ODBC Data Source
The steps to configure the ODBC data source are as follows.
Step 1: Search for and open the "ODBC Data Sources (32-bit)" or "ODBC Data Sources (64-bit)" management tool from the Windows Start menu.

Step 2: Click the "User DSN" tab → "Add" button to enter the "Create New Data Source" dialog.

Step 3: In the "Select the driver you want to install for this data source" list, choose "TDengine" and click "Finish" to enter the TDengine ODBC data source configuration page. Fill in the following required information.
- DSN: Data source name, required, for example "MyTDengine".
- Connection Type: Check the "WebSocket" checkbox.
- URL: ODBC data source URL, required, for example `http://127.0.0.1:6041`.
- Database: The database to connect to, optional, for example "test".
- Username: The username; if left blank, the default is "root".
- Password: The user password; if left blank, the default is "taosdata".
Step 4: Click the "Test Connection" button to test the connection. If successful, a message will prompt "Successfully connected to `http://127.0.0.1:6041`".

Step 5: Click the "OK" button to save the configuration and exit.
## Import TDengine Data into Power BI
The steps to import TDengine data into Power BI are as follows:
Step 1: Open Power BI and log in, then click "Home" → "Get Data" → "Other" → "ODBC" → "Connect" to add the data source.

Step 2: Select the data source name just created, for example "MyTDengine". If you need to enter SQL, click the "Advanced options" tab and enter the SQL statement in the edit box of the expanded dialog. Click the "OK" button to connect to the configured data source.

Step 3: In the "Navigator", browse the corresponding database's tables/views and load the data.

To fully leverage Power BI's advantages in analyzing data from TDengine, users need to first understand core concepts such as dimensions, measures, windowed queries, data partitioning queries, time series, and correlation, and then import data via custom SQL.
- Dimensions: Typically categorical (text) data that describes categories such as device, measurement point, and model. In TDengine's supertables, dimension information is stored in tag columns and can be quickly retrieved with SQL such as `select distinct tbname, tag1, tag2 from supertable`.
- Measures: Quantitative (numeric) fields that can be used for calculations; common calculations include sum, average, and minimum. If a measurement point collects data every second, one year produces more than 30 million records, and importing all of them into Power BI would severely impact its performance. In TDengine, users can combine data partitioning queries, windowed queries, and the related pseudo-columns to import downsampled data into Power BI; for the specific syntax, refer to the distinctive query section of the TDengine official documentation.
- Windowed queries: For example, if a temperature sensor collects data every second but the average temperature every 10 minutes is needed, a window clause can be used to obtain the required downsampled result. The corresponding SQL is similar to `select tbname, _wstart date, avg(temperature) temp from table interval(10m)`, where `_wstart` is a pseudo-column representing the start time of the time window, `10m` is the duration of the time window, and `avg(temperature)` is the aggregate value within that window.
- Data partitioning queries: If aggregate values are needed for many temperature sensors at the same time, the data can be partitioned and a series of calculations performed within each partition; the corresponding SQL is similar to `partition by part_list`. The most common use of the data partitioning clause is to split subtable data by tags in supertable queries, so that each subtable's data forms an independent time series, which facilitates statistical analysis in various time-series scenarios (see the combined sketch after this list).
- Time series: When drawing curves or aggregating data by time, a date table is usually required. The date table can be imported from an Excel file or obtained in TDengine with SQL such as `select _wstart date, count(*) cnt from test.meters where ts between A and B interval(1d) fill(0)`, where the `fill` clause specifies the filling mode for missing data and the pseudo-column `_wstart` is the date column to be obtained.
- Correlation: Describes how data is related; for example, measures and dimensions can be associated through the `tbname` column, and the date table and measures can be associated through the `date` column, which together support building visual reports.
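As a combined sketch of the windowed and data partitioning concepts above (assuming the `test.meters` supertable created by taosBenchmark in the walkthrough that follows; the date range is a hypothetical example), a custom SQL for Power BI's "Advanced options" box might look like this:

```sql
-- Downsample each smart meter's current into 10-minute windows;
-- partition by tbname so every subtable forms its own series,
-- and use an explicit time range so empty windows can be filled with 0.
select tbname, _wstart ws, avg(current) as avg_current
from test.meters
where ts >= '2024-12-01 00:00:00' and ts < '2024-12-04 00:00:00'
partition by tbname
interval(10m)
fill(0);
```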
## Smart Meter Example
TDengine employs a unique data model optimized for the storage and query performance of time-series data. This model uses the supertable as a template and creates an independent table for each device. Each table is designed with high scalability in mind, supporting up to 4,096 data columns and 128 tag columns. This design allows TDengine to efficiently handle massive amounts of time-series data while maintaining flexibility and ease of use.
Taking smart meters as an example, suppose each meter generates one record per second, which amounts to 86,400 records per day. For 1,000 smart meters, the records generated in a year occupy about 600 GB of storage. Faced with such a large volume of data, business intelligence tools such as Power BI play a crucial role in data analysis and visualization.
In Power BI, users can map the tag columns of TDengine tables to dimension columns for grouping and filtering data, while the aggregated results of the data columns can be imported as measure columns for calculating key metrics and generating reports. In this way, Power BI helps decision-makers quickly obtain the information they need, gain deeper insight into business operations, and make more informed decisions.
Follow the steps below to experience the functionality of generating time-series data reports with Power BI.
Step 1: Use TDengine's taosBenchmark to quickly generate data for 1,000 smart meters over 3 days, with a collection frequency of 1 second.

```shell
taosBenchmark -t 1000 -n 259200 -S 1000 -y
```

Step 2: Import dimension data. In Power BI, import the tag columns of the table, naming the dataset `tags`, and use the following SQL to retrieve the tag data of all smart meters under the supertable.

```sql
select distinct tbname device, groupId, location from test.meters
```

Step 3: Import measure data. In Power BI, import the average current, average voltage, and average phase of each smart meter over 1-hour time windows, naming the dataset `data`, with the following SQL.

```sql
select tbname, _wstart ws, avg(current), avg(voltage), avg(phase) from test.meters PARTITION by tbname interval(1h)
```

Step 4: Import date data. Using a 1-day time window, retrieve the time range and record count of the time-series data with the following SQL. In the Power Query editor, convert the format of the `date` column from "text" to "date".

```sql
select _wstart date, count(*) from test.meters interval(1d) having count(*)>0
```

Step 5: Establish the relationship between dimensions and measures. Open the model view and establish the relationship between the `tags` and `data` tables, setting `tbname` as the relationship column.

Step 6: Establish the relationship between the date table and measures. Open the model view and establish the relationship between the `date` dataset and `data`, using the `date` and `datatime` columns.

Step 7: Create reports. Use these data in bar charts, pie charts, and other controls.
Thanks to TDengine's superior performance in handling time-series data, users enjoy a very good experience during data import and daily scheduled data refreshes. For more information on building visual effects in Power BI, please refer to the official Power BI documentation.

View File

@ -9,33 +9,37 @@ import imgStep02 from '../../assets/seeq-02.png';
import imgStep03 from '../../assets/seeq-03.png';
import imgStep04 from '../../assets/seeq-04.png';
Seeq is advanced analytics software for manufacturing and the Industrial Internet of Things (IIoT). Seeq supports innovative features that use machine learning in process manufacturing organizations. These features enable organizations to deploy their own or third-party machine learning algorithms to the advanced analytics applications used by frontline process engineers and subject matter experts, thus extending the efforts of a single data scientist to many frontline staff.
Through the TDengine Java connector, Seeq can easily query the time-series data provided by TDengine and offer data display, analysis, forecasting, and other capabilities.
## Installing Seeq
Download the relevant software, such as Seeq Server and Seeq Data Lab, from the [Seeq official website](https://www.seeq.com/customer-download). Seeq Data Lab needs to be installed on a different server from Seeq Server and interconnected via configuration. For detailed installation and configuration instructions, refer to the [Seeq Knowledge Base](https://support.seeq.com/kb/latest/cloud/).
### TDengine Local Instance Installation Method
Please refer to the [official documentation](../../../get-started).
## Configuring Seeq to Access TDengine
1. Check the data storage location.
```shell
sudo seeq config get Folders/Data
```
2. Download the TDengine Java connector package from maven.org (the latest version is [3.2.5](https://repo1.maven.org/maven2/com/taosdata/jdbc/taos-jdbcdriver/3.2.5/taos-jdbcdriver-3.2.5-dist.jar)) and copy it to the `plugins\lib` directory under the data storage location.
3. Restart the Seeq server.
```shell
sudo seeq restart
```
4. Enter the License.
Access the server via a browser at `ip:34216` and follow the instructions to enter the license.
## Using Seeq to Analyze TDengine Time-Series Data
@ -43,7 +47,7 @@ This section demonstrates how to use Seeq software in conjunction with TDengine
### Scenario Introduction
The example scenario is a power system: users collect electricity consumption data from power station instruments every day and store it in a TDengine cluster. They now want to predict how electricity consumption will develop and purchase more equipment to support it. Electricity consumption varies with monthly orders and also changes with the seasons; for example, in this northern hemisphere city, more electricity is used in summer. We simulate data to reflect these assumptions.
### Data Schema
@ -63,13 +67,13 @@ python mockdata.py
taos -s "insert into power.goods select _wstart, _wstart + 10d, avg(goods) from power.meters interval(10d);"
```
The source code is hosted in the [GitHub repository](https://github.com/sangshuduo/td-forecasting).
## Analyzing Data with Seeq
### Configuring Data Source
Log in with a Seeq administrator account and create a new data source.
- Power
@ -251,7 +255,7 @@ Log in with a Seeq administrator account and create a new data source.
### Using Seeq Workbench
Log in to the Seeq service page and create a new Seeq Workbench. By selecting data sources from the search results and choosing different tools as needed, you can display data or make forecasts. For detailed usage instructions, refer to the [official knowledge base](https://support.seeq.com/space/KB/146440193/Seeq+Workbench).
<figure>
<Image img={imgStep02} alt=""/>
@ -259,7 +263,7 @@ Log in to the Seeq service page and create a new Seeq Workbench. You can display
### Further Data Analysis with Seeq Data Lab Server
Log in to the Seeq service page and create a new Seeq Data Lab, where you can use Python programming or other machine learning tools for more complex data mining.
```Python
from seeq import spy
@ -322,17 +326,18 @@ pd.concat([data2, predicts]).set_index("ds").plot(title = "current data with for
plt.show()
```
The output of the program will be:
<figure>
<Image img={imgStep03} alt=""/>
</figure>
## Configuring the Seeq Data Source to Connect to TDengine Cloud
Configuring the Seeq data source to connect to TDengine Cloud is not fundamentally different from connecting to a locally installed TDengine instance. Simply log in to TDengine Cloud, select "Programming - Java", and copy the JDBC URL with the token string into the DatabaseJdbcUrl value of the Seeq data source.

Note that when using TDengine Cloud, the database name must be specified in SQL commands.
### Example Configuration Using TDengine Cloud as a Data Source
```json
{
@ -391,7 +396,7 @@ Configuring the Seeq data source to connect to TDengine Cloud is not fundamental
}
```
### Example Seeq Workbench Interface with TDengine Cloud as Data Source
<figure>
<Image img={imgStep04} alt=""/>
@ -399,8 +404,8 @@ Configuring the Seeq data source to connect to TDengine Cloud is not fundamental
## Solution Summary
By integrating Seeq with TDengine, users can fully leverage TDengine's efficient storage and querying performance while benefiting from the powerful data visualization and analysis capabilities that Seeq offers.

This integration enables users to take full advantage of TDengine's high-performance time-series data storage and retrieval, ensuring efficient processing of large datasets. At the same time, Seeq provides advanced analytics features such as data visualization, anomaly detection, correlation analysis, and predictive modeling, allowing users to gain valuable insights and make data-driven decisions.

Overall, Seeq and TDengine together provide a comprehensive solution for time-series data analysis across various industries, including manufacturing, industrial IoT, and power systems. The combination of efficient data storage and advanced analytics empowers users to unlock the full potential of time-series data, drive operational improvements, and support predictive and planning analytics applications.
View File
@ -10,42 +10,43 @@ import imgStep03 from '../../assets/dbeaver-03.png';
import imgStep04 from '../../assets/dbeaver-04.png';
import imgStep05 from '../../assets/dbeaver-05.png';
DBeaver is a popular cross-platform database management tool that facilitates data management for developers, database administrators, data analysts, and other users. Starting from version 23.1.1, DBeaver has built-in support for TDengine, accommodating both standalone TDengine clusters and TDengine Cloud.
## Prerequisites
To manage TDengine with DBeaver, the following preparations are necessary:
- Install DBeaver. DBeaver supports major operating systems, including Windows, macOS, and Linux. Be sure to [download](https://dbeaver.io/download/) the installation package for the correct platform and version (23.1.1+). For detailed installation steps, refer to the [DBeaver official documentation](https://github.com/dbeaver/dbeaver/wiki/Installation).
- If using a standalone TDengine cluster, ensure that TDengine is running normally and that taosAdapter is installed and functioning correctly. For details, refer to the [taosAdapter user manual](../../../tdengine-reference/components/taosadapter/).
## Using DBeaver to Access an On-Premises TDengine
1. Launch the DBeaver application, click the button or menu item to "Connect to Database", and select TDengine from the time-series category.
<figure>
<Image img={imgStep01} alt=""/>
</figure>
2. Configure the TDengine connection by entering the host address, port number, username, and password. If TDengine is deployed on the local machine, you can simply enter the username and password; the default username is root, and the default password is taosdata. Click "Test Connection" to check whether the connection is available. If the TDengine Java connector is not installed on the local machine, DBeaver will prompt you to download and install it.
<figure>
<Image img={imgStep02} alt=""/>
</figure>
3. If the connection is successful, it will be displayed as shown below. If the connection fails, check whether the TDengine service and taosAdapter are running correctly, and verify the host address, port number, username, and password.
<figure>
<Image img={imgStep03} alt=""/>
</figure>
4. Use DBeaver to select databases and tables to browse the data from the TDengine service.
<figure>
<Image img={imgStep04} alt=""/>
</figure>
5. You can also perform operations on TDengine data by executing SQL commands.
<figure>
<Image img={imgStep05} alt=""/>
View File
@ -1,6 +1,5 @@
---
title: qStudio
description: A detailed guide to accessing TDengine data using qStudio
slug: /third-party-tools/management/qstudio
---
@ -12,48 +11,48 @@ import imgStep04 from '../../assets/qstudio-04.png';
import imgStep05 from '../../assets/qstudio-05.png';
import imgStep06 from '../../assets/qstudio-06.png';
qStudio is a free multi-platform SQL data analysis tool that allows users to easily browse tables, variables, functions, and configuration settings in a database. The latest version of qStudio has built-in support for TDengine.
## Prerequisites
To connect qStudio to TDengine, the following preparations are necessary:
- Install qStudio. qStudio supports major operating systems, including Windows, macOS, and Linux. Be sure to [download](https://www.timestored.com/qstudio/download/) the installation package for the correct platform.
- Install a TDengine instance. Ensure that TDengine is running normally and that taosAdapter is installed and functioning correctly. For details, refer to the [taosAdapter user manual](../../../tdengine-reference/components/taosadapter/).
## Using qStudio to Connect to TDengine
1. Launch the qStudio application, select "Server" from the menu and then choose "Add Server...". In the Server Type dropdown, select TDengine.
<figure>
<Image img={imgStep01} alt=""/>
</figure>
2. Configure the TDengine connection by entering the host address, port number, username, and password. If TDengine is deployed on the local machine, you can simply enter the username and password; the default username is root, and the default password is taosdata. Click "Test" to check whether the connection is available. If the TDengine Java connector is not installed on the local machine, qStudio will prompt you to download and install it.
<figure>
<Image img={imgStep02} alt=""/>
</figure>
3. If the connection is successful, it will be displayed as shown below. If the connection fails, check whether the TDengine service and taosAdapter are running correctly, and verify the host address, port number, username, and password.
<figure>
<Image img={imgStep03} alt=""/>
</figure>
4. Use qStudio to select databases and tables to browse the data from the TDengine service.
<figure>
<Image img={imgStep04} alt=""/>
</figure>
5. You can also perform operations on TDengine data by executing SQL commands.
<figure>
<Image img={imgStep05} alt=""/>
</figure>
6. qStudio supports features such as charting based on data. Please refer to the [qStudio help documentation](https://www.timestored.com/qstudio/help).
<figure>
View File

@ -3,9 +3,9 @@ title: Third-Party Tools
slug: /third-party-tools
---
TDengine's support for standard SQL commands, common database connector standards (such as JDBC), ORMs, and other popular time-series database write protocols (such as InfluxDB Line Protocol, OpenTSDB JSON, and OpenTSDB Telnet) makes it very easy to use TDengine with third-party tools.
For supported third-party tools, no coding is required; you only need to perform some simple configuration to seamlessly integrate TDengine with these tools.
```mdx-code-block
import DocCardList from '@theme/DocCardList';
View File
@ -4,465 +4,558 @@ sidebar_label: taosd
slug: /tdengine-reference/components/taosd
---
taosd is the core service of the TDengine database engine. Its configuration file is located at `/etc/taos/taos.cfg` by default, though a configuration file at a different path can be specified. This section provides a detailed introduction to the command-line parameters of taosd and the configuration parameters in its configuration file.
## Command Line Parameters
The command line parameters for taosd are as follows:
- -a `<json file>`: Specifies a JSON file containing various configuration parameters for service startup, formatted like `{"fqdn":"td1"}`. For details on the configuration parameters, please refer to the next section.
- -c `<directory>`: Specifies the directory where the configuration file is located.
- -s: Prints SDB information.
- -C: Prints configuration information.
- -e: Specifies an environment variable, formatted like `-e 'TAOS_FQDN=td1'`.
- -k: Gets the machine code.
- -dm: Enables memory scheduling.
- -V: Prints version information.
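For reference, the sketch below shows how these flags are typically combined on the command line; the configuration directory shown is the documented default and is used here only for illustration.

```shell
# Start taosd with configuration files read from /etc/taos
taosd -c /etc/taos

# Start taosd and override the FQDN through an environment variable
taosd -e 'TAOS_FQDN=td1'

# Print version information and exit
taosd -V
```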
## Configuration Parameters
:::note
After modifying configuration file parameters, you need to restart the *taosd* service or the client application for the changes to take effect.
:::
### Connection Related
| Parameter Name         | Supported Version  | Description |
| :--------------------- | :----------------- | :---------- |
| firstEp                |                    | Endpoint of the first dnode in the cluster that taosd actively connects to at startup; default value: localhost:6030 |
| secondEp               |                    | Endpoint of the second dnode in the cluster that taosd tries to connect to if firstEp is unreachable; no default value |
| fqdn                   |                    | The service address that taosd listens on; default value: the first hostname configured on the server |
| serverPort             |                    | The port that taosd listens on; default value: 6030 |
| compressMsgSize        |                    | Whether to compress RPC messages; -1: do not compress any messages; 0: compress all messages; N (N>0): only compress messages larger than N bytes; default value: -1 |
| shellActivityTimer     |                    | Interval in seconds at which the client sends heartbeats to mnode; range: 1-120; default value: 3 |
| numOfRpcSessions       |                    | Maximum number of RPC connections supported; range: 100-100000; default value: 30000 |
| numOfRpcThreads        |                    | Number of threads for receiving and sending RPC data; range: 1-50; default value: half of the CPU cores |
| numOfTaskQueueThreads  |                    | Number of threads for the client to process RPC messages; range: 4-16; default value: half of the CPU cores |
| rpcQueueMemoryAllowed  |                    | Maximum memory allowed for received RPC messages in a dnode; unit: bytes; range: 104857600-INT64_MAX; default value: 1/10 of server memory |
| resolveFQDNRetryTime   | Removed in 3.x     | Number of retries when FQDN resolution fails |
| timeToGetAvailableConn | Removed in 3.3.4.x | Maximum waiting time to get an available connection; range: 10-50000000; unit: milliseconds; default value: 500000 |
| maxShellConns          | Removed in 3.x     | Maximum number of connections allowed |
| maxRetryWaitTime       |                    | Maximum timeout for reconnection; default value: 10s |
| shareConnLimit         | Added in 3.3.4.0   | Number of requests a connection can share; range: 1-512; default value: 10 |
| readTimeout            | Added in 3.3.4.0   | Minimum timeout for a single request; range: 64-604800; unit: seconds; default value: 900 |
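As an illustration, a minimal connection-related block of `taos.cfg` might look like the following sketch; the host names are assumptions, and the remaining values are simply the documented defaults.

```text
firstEp          td1:6030
secondEp         td2:6030
fqdn             td1
serverPort       6030
compressMsgSize  -1
```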
### Monitoring Related
| Parameter Name     | Supported Version | Description |
| :----------------- | :---------------- | :---------- |
| monitor            |                   | Whether to collect and report monitoring data; 0: off; 1: on; default value: 0 |
| monitorFqdn        |                   | FQDN of the server where the taosKeeper service is located; no default value |
| monitorPort        |                   | Port number that the taosKeeper service listens on; default value: 6043 |
| monitorInterval    |                   | Time interval for recording system parameters (CPU/memory) in the monitoring database; unit: seconds; range: 1-200000; default value: 30 |
| monitorMaxLogs     |                   | Number of cached logs pending report |
| monitorComp        |                   | Whether to use compression when reporting monitoring logs |
| monitorLogProtocol |                   | Whether to print monitoring logs |
| monitorForceV2     |                   | Whether to use the V2 protocol for reporting |
| telemetryReporting |                   | Whether to upload telemetry; 0: do not upload; 1: upload; default value: 1 |
| telemetryServer    |                   | Telemetry server address |
| telemetryPort      |                   | Telemetry server port number |
| telemetryInterval  |                   | Telemetry upload interval; unit: seconds; default value: 43200 |
| crashReporting     |                   | Whether to upload crash information; 0: do not upload; 1: upload; default value: 1 |
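For example, a hedged sketch of enabling monitoring and reporting to a taosKeeper instance; the FQDN is an assumption, and the port and interval are the documented defaults.

```text
monitor          1
monitorFqdn      localhost
monitorPort      6043
monitorInterval  30
```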
### Query Related
| Parameter Name           | Supported Version | Description |
| :----------------------- | :---------------- | :---------- |
| countAlwaysReturnValue   |                   | Whether count/hyperloglog functions return a value when the input data is empty or NULL; 0: return an empty row; 1: return a value; default value: 1. When this parameter is set to 1, if the query contains an INTERVAL clause or uses TSMA, and the corresponding group or window has empty or NULL data, that group or window does not return a query result. Note that this parameter should be consistent between client and server |
| tagFilterCache           |                   | Whether to cache tag filter results |
| maxNumOfDistinctRes      |                   | Maximum number of distinct results allowed to return; default value: 100,000; maximum allowed value: 100 million |
| queryBufferSize          |                   | Not effective yet |
| queryRspPolicy           |                   | Query response strategy |
| filterScalarMode         |                   | Force scalar filter mode; 0: off; 1: on; default value: 0 |
| queryPlannerTrace        |                   | Internal parameter, whether the query plan outputs detailed logs |
| queryNodeChunkSize       |                   | Internal parameter, chunk size of the query plan |
| queryUseNodeAllocator    |                   | Internal parameter, allocation method of the query plan |
| queryMaxConcurrentTables |                   | Internal parameter, concurrency of the query plan |
| queryRsmaTolerance       |                   | Internal parameter, tolerance time for determining which level of rsma data to query; unit: milliseconds |
| enableQueryHb            |                   | Internal parameter, whether to send query heartbeat messages |
| pqSortMemThreshold       |                   | Internal parameter, memory threshold for sorting |
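As an illustration, the sketch below shows how a couple of the parameters above might appear in `taos.cfg`; the values shown are just the documented defaults.

```text
countAlwaysReturnValue  1
maxNumOfDistinctRes     100000
```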
### Region Related

| Parameter Name | Supported Version | Description |
| :------------- | :---------------- | :---------- |
| timezone       |                   | Time zone; default value: the current time zone setting dynamically obtained from the system |
| locale         |                   | System locale information and encoding format; default value: obtained from the system; if automatic retrieval fails, it must be set in the configuration file or through the API |
| charset        |                   | Character set encoding; default value: obtained from the system |
:::info
1. To address the issue of writing and querying data across multiple time zones, TDengine uses Unix timestamps to record and store timestamps. The nature of Unix timestamps ensures that, at any given moment, the generated timestamps are consistent regardless of time zone. Note that the conversion to Unix timestamps is done on the client side. To ensure that other forms of time on the client are correctly converted to Unix timestamps, the correct time zone must be set.
On Linux/macOS, the client automatically reads the time zone information set by the system. Users can also set the time zone in the configuration file in various ways. For example:
```text
timezone UTC-8
timezone GMT-8
timezone Asia/Shanghai
```
All of these are valid formats for setting the time zone to UTC+8. However, note that on Windows the format `timezone Asia/Shanghai` is not supported; it must be written as `timezone UTC-8`.
The time zone setting affects the querying and writing of SQL statements whose time content is not in Unix timestamp format (timestamp strings and the interpretation of the keyword now). For example:
```sql
SELECT count(*) FROM table_name WHERE TS<'2019-04-11 12:01:08';
```
In the UTC+8 time zone, the SQL statement is equivalent to:
```sql
SELECT count(*) FROM table_name WHERE TS<1554955268000;
```
In the UTC time zone, the SQL statement is equivalent to:
```sql
SELECT count(*) FROM table_name WHERE TS<1554984068000;
```
To avoid the uncertainty introduced by string time formats, you can use Unix timestamps directly. In addition, timestamp strings with time zones can be used in SQL statements, such as RFC 3339 timestamp strings (2013-04-12T15:52:01.123+08:00) or ISO 8601 timestamp strings (2013-04-12T15:52:01.123+0800). The conversion of these two formats to Unix timestamps is not affected by the system's local time zone.
2. TDengine provides a special field type, `nchar`, for storing wide characters in non-ASCII encodings such as Chinese, Japanese, and Korean. Data written to `nchar` fields is uniformly encoded in UCS4-LE format and sent to the server. Note that the correctness of the encoding is guaranteed by the client. Therefore, to use `nchar` fields to store non-ASCII characters such as Chinese, Japanese, or Korean correctly, the client's encoding format must be set correctly.
Characters input by the client use the current default encoding of the operating system, which is usually UTF-8 on Linux/macOS but may be GB18030 or GBK on some Chinese systems. The default encoding in a Docker environment is POSIX, and on Chinese versions of Windows it is CP936. The client must set its character set correctly, that is, to the current encoding of the operating system it runs on, so that data in `nchar` fields is correctly converted to UCS4-LE.
In Linux/macOS, the naming convention for locales is `<language>_<region>.<charset encoding>`, for example zh_CN.UTF-8, where zh stands for Chinese, CN for mainland China, and UTF-8 for the character set. The character set encoding tells the client how to correctly parse local strings. Linux/macOS can set the system's character encoding through the locale; because Windows uses a locale format that is not POSIX compliant, a separate configuration parameter, `charset`, is used to specify the character encoding on Windows. `charset` can also be used in Linux/macOS to specify the character encoding.
3. If `charset` is not specified in the configuration file, on Linux/macOS taos automatically reads the system's current locale information at startup and extracts the charset encoding format from it. If reading the locale information fails, it attempts to read the `charset` configuration; if that also fails, the startup process is interrupted.
In Linux/macOS, the locale information includes character encoding information, so after the locale is set correctly there is no need to set `charset` separately. For example:
```text
locale zh_CN.UTF-8
```
On Windows, the system cannot obtain the current encoding from the locale. If the encoding information cannot be read from the configuration file, taos defaults to CP936 as the character encoding. This is equivalent to adding the following configuration to the configuration file:
```text
charset CP936
```
If you need to adjust the character encoding, please check the encoding used by the current operating system and set it correctly in the configuration file.
In Linux/macOS, if the user sets both `locale` and `charset` and they are inconsistent, the later setting overrides the earlier one.
```text
locale zh_CN.UTF-8
charset GBK
```
The effective value of `charset` is GBK.
```text
charset GBK
locale zh_CN.UTF-8
```
The effective value of `charset` is UTF-8.
:::
### Storage Related
| Parameter Name         | Supported Version | Description |
| :--------------------- | :---------------- | :---------- |
| dataDir                |                   | Directory for data files; all data files are written to this directory; default value: /var/lib/taos |
| tempDir                |                   | Directory for temporary files generated during system operation; default value: /tmp |
| minimalDataDirGB       |                   | Minimum space to be reserved in the time-series data storage directory specified by dataDir; unit: GB; default value: 2 |
| minimalTmpDirGB        |                   | Minimum space to be reserved in the temporary file directory specified by tempDir; unit: GB; default value: 1 |
| minDiskFreeSize        | After 3.1.1.0     | When the available space on a disk is less than or equal to this threshold, the disk is no longer selected for generating new data files; unit: bytes; range: 52428800-1073741824; default value: 52428800; enterprise parameter |
| s3MigrateIntervalSec   | After 3.3.4.3     | Trigger cycle for automatic upload of local data files to S3; unit: seconds; minimum: 600; maximum: 100000; default value: 3600; enterprise parameter |
| s3MigrateEnabled       | After 3.3.4.3     | Whether to perform S3 migration automatically; default value: 0 (off); can be set to 1; enterprise parameter |
| s3Accesskey            | After 3.3.4.3     | Colon-separated user SecretId:SecretKey, for example AKIDsQmwsfKxTo2A6nGVXZN0UlofKn6JRRSJ:lIdoy99ygEacU7iHfogaN2Xq0yumSm1E; enterprise parameter |
| s3Endpoint             | After 3.3.4.3     | COS service domain name in the user's region; supports http and https; the region of the bucket must match the endpoint, otherwise it cannot be accessed; enterprise parameter |
| s3BucketName           | After 3.3.4.3     | Bucket name, followed by a hyphen and the AppId of the user's registered COS service; AppId is unique to COS (not present in AWS or Alibaba Cloud) and must be part of the bucket name, separated by a hyphen; the value is a string without quotes, for example test0711-1309024725; enterprise parameter |
| s3PageCacheSize        | After 3.3.4.3     | Number of S3 page cache pages; range: 4-1048576; unit: pages; default value: 4096; enterprise parameter |
| s3UploadDelaySec       | After 3.3.4.3     | How long a data file must remain unchanged before being uploaded to S3; range: 1-2592000 (30 days); unit: seconds; default value: 60; enterprise parameter |
| cacheLazyLoadThreshold |                   | Internal parameter, cache loading strategy |
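A minimal storage-related sketch in `taos.cfg`; the paths and sizes shown are simply the documented defaults and are illustrative only.

```text
dataDir           /var/lib/taos
tempDir           /tmp
minimalDataDirGB  2
minimalTmpDirGB   1
```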
### Cluster Related
| Parameter Name             | Supported Version | Description |
| :------------------------- | :---------------- | :---------- |
| supportVnodes              |                   | Maximum number of vnodes supported by a dnode; range: 0-4096; default value: twice the number of CPU cores + 5 |
| numOfCommitThreads         |                   | Maximum number of commit threads; range: 0-1024; default value: 4 |
| numOfMnodeReadThreads      |                   | Number of read threads for mnode; range: 0-1024; default value: one quarter of the CPU cores (not exceeding 4) |
| numOfVnodeQueryThreads     |                   | Number of query threads for vnode; range: 0-1024; default value: twice the number of CPU cores (not exceeding 16) |
| numOfVnodeFetchThreads     |                   | Number of fetch threads for vnode; range: 0-1024; default value: one quarter of the CPU cores (not exceeding 4) |
| numOfVnodeRsmaThreads      |                   | Number of rsma threads for vnode; range: 0-1024; default value: one quarter of the CPU cores (not exceeding 4) |
| numOfQnodeQueryThreads     |                   | Number of query threads for qnode; range: 0-1024; default value: twice the number of CPU cores (not exceeding 16) |
| numOfSnodeSharedThreads    |                   | Number of shared threads for snode; range: 0-1024; default value: one quarter of the CPU cores (not less than 2, not exceeding 4) |
| numOfSnodeUniqueThreads    |                   | Number of exclusive threads for snode; range: 0-1024; default value: one quarter of the CPU cores (not less than 2, not exceeding 4) |
| ratioOfVnodeStreamThreads  |                   | Ratio of vnode threads used by stream computing; range: 0.01-4; default value: 4 |
| ttlUnit                    |                   | Unit for the ttl parameter; range: 1-31572500; unit: seconds; default value: 86400 |
| ttlPushInterval            |                   | Frequency of ttl timeout checks; range: 1-100000; unit: seconds; default value: 10 |
| ttlChangeOnWrite           |                   | Whether the ttl expiration time changes with table modification; 0: no change; 1: change; default value: 0 |
| ttlBatchDropNum            |                   | Number of subtables deleted in one ttl batch; minimum value: 0; default value: 10000 |
| retentionSpeedLimitMB      |                   | Speed limit for data migration across different levels of disks; range: 0-1024; unit: MB; default value: 0, which means no limit |
| maxTsmaNum                 |                   | Maximum number of TSMAs that can be created in the cluster; range: 0-3; default value: 3 |
| tmqMaxTopicNum             |                   | Maximum number of topics that can be created for subscription; range: 1-10000; default value: 20 |
| tmqRowSize                 |                   | Maximum number of records in a subscription data block; range: 1-1000000; default value: 4096 |
| audit                      |                   | Audit feature switch; enterprise parameter |
| auditInterval              |                   | Time interval for reporting audit data; enterprise parameter |
| auditCreateTable           |                   | Whether to enable the audit feature for creating subtables; enterprise parameter |
| encryptAlgorithm           |                   | Data encryption algorithm; enterprise parameter |
| encryptScope               |                   | Encryption scope; enterprise parameter |
| enableWhiteList            |                   | Switch for the whitelist feature; enterprise parameter |
| syncLogBufferMemoryAllowed |                   | Maximum memory allowed for sync log cache messages in a dnode; unit: bytes; range: 104857600-INT64_MAX; default value: 1/10 of server memory; effective from versions 3.1.3.2/3.3.2.13 |
| syncElectInterval          |                   | Internal parameter, for debugging the synchronization module |
| syncHeartbeatInterval      |                   | Internal parameter, for debugging the synchronization module |
| syncHeartbeatTimeout       |                   | Internal parameter, for debugging the synchronization module |
| syncSnapReplMaxWaitN       |                   | Internal parameter, for debugging the synchronization module |
| arbHeartBeatIntervalSec    |                   | Internal parameter, for debugging the synchronization module |
| arbCheckSyncIntervalSec    |                   | Internal parameter, for debugging the synchronization module |
| arbSetAssignedTimeoutSec   |                   | Internal parameter, for debugging the synchronization module |
| mndSdbWriteDelta           |                   | Internal parameter, for debugging the mnode module |
| mndLogRetention            |                   | Internal parameter, for debugging the mnode module |
| skipGrant                  |                   | Internal parameter, for authorization checks |
| trimVDbIntervalSec         |                   | Internal parameter, for deleting expired data |
| ttlFlushThreshold          |                   | Internal parameter, frequency of the ttl timer |
| compactPullupInterval      |                   | Internal parameter, frequency of the data reorganization timer |
| walFsyncDataSizeLimit      |                   | Internal parameter, threshold at which WAL performs FSYNC |
| transPullupInterval        |                   | Internal parameter, retry interval for mnode to execute transactions |
| mqRebalanceInterval        |                   | Internal parameter, interval for consumer rebalancing |
| uptimeInterval             |                   | Internal parameter, for recording system uptime |
| timeseriesThreshold        |                   | Internal parameter, for usage statistics |
| udf                        |                   | Whether to start the UDF service; 0: do not start; 1: start; default value: 0 |
| udfdResFuncs               |                   | Internal parameter, for setting UDF result sets |
| udfdLdLibPath              |                   | Internal parameter, indicates the library path for loading UDF |
### Stream Computing Parameters
| Parameter Name | Supported Version | Description |
|-----------------------|----------|-|
| disableStream | | Switch to enable or disable stream computing |
| streamBufferSize | | Controls the size of the window state cache in memory, default value is 128MB |
| streamAggCnt | | Internal parameter, number of concurrent aggregation computations |
| checkpointInterval | | Internal parameter, checkpoint synchronization interval |
| concurrentCheckpoint | | Internal parameter, whether to check checkpoints concurrently |
| maxStreamBackendCache | | Internal parameter, maximum cache used by stream computing |
| streamSinkDataRate | | Internal parameter, used to control the write speed of stream computing results |
### Log Related
| Parameter Name   | Supported Version | Description |
| :--------------- | :---------------- | :---------- |
| logDir           |                   | Log file directory; operational logs are written to this directory; default value: /var/log/taos |
| minimalLogDirGB  |                   | Stops writing logs when the available space on the disk where the log directory is located is less than this value; unit: GB; default value: 1 |
| numOfLogLines    |                   | Maximum number of lines allowed in a single log file; default value: 10,000,000 |
| asyncLog         |                   | Log writing mode; 0: synchronous; 1: asynchronous; default value: 1 |
| logKeepDays      |                   | Maximum retention time for log files; unit: days; default value: 0, meaning unlimited retention; log files are not renamed and no new log file is rolled out, but the content of the log file may continue to roll depending on the size setting; when set to a value greater than 0, once the log file reaches the size limit it is renamed to taosdlog.yyy, where yyy is the timestamp of the last modification of the log file, and a new log file is rolled out |
| slowLogThreshold | 3.3.3.0 onwards   | Slow query threshold; queries taking longer than or equal to this threshold are considered slow; unit: seconds; default value: 3 |
| slowLogMaxLen    | 3.3.3.0 onwards   | Maximum length of slow query logs; range: 1-16384; default value: 4096 |
| slowLogScope     | 3.3.3.0 onwards   | Types of slow queries to record; range: ALL/QUERY/INSERT/OTHERS/NONE; default value: QUERY |
| slowLogExceptDb  | 3.3.3.0 onwards   | Specifies a database that does not report slow queries; only one database can be configured |
| debugFlag        |                   | Log switch for running logs; 131 (outputs error and warning logs), 135 (outputs error, warning, and debug logs), 143 (outputs error, warning, debug, and trace logs); default value: 131 or 135 (depending on the module) |
| tmrDebugFlag     |                   | Log switch for the timer module; range as above |
| uDebugFlag       |                   | Log switch for the utility module; range as above |
| rpcDebugFlag     |                   | Log switch for the rpc module; range as above |
| qDebugFlag       |                   | Log switch for the query module; range as above |
| dDebugFlag       |                   | Log switch for the dnode module; range as above |
| vDebugFlag       |                   | Log switch for the vnode module; range as above |
| mDebugFlag       |                   | Log switch for the mnode module; range as above |
| azDebugFlag      | 3.3.4.3 onwards   | Log switch for the S3 module; range as above |
| sDebugFlag       |                   | Log switch for the sync module; range as above |
| tsdbDebugFlag    |                   | Log switch for the tsdb module; range as above |
| tqDebugFlag      |                   | Log switch for the tq module; range as above |
| fsDebugFlag      |                   | Log switch for the fs module; range as above |
| udfDebugFlag     |                   | Log switch for the udf module; range as above |
| smaDebugFlag     |                   | Log switch for the sma module; range as above |
| idxDebugFlag     |                   | Log switch for the index module; range as above |
| tdbDebugFlag     |                   | Log switch for the tdb module; range as above |
| metaDebugFlag    |                   | Log switch for the meta module; range as above |
| stDebugFlag      |                   | Log switch for the stream module; range as above |
| sndDebugFlag     |                   | Log switch for the snode module; range as above |
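For example, a hedged sketch of a basic logging and slow-query setup in `taos.cfg`; the values shown are the documented defaults and are illustrative only.

```text
logDir            /var/log/taos
debugFlag         131
slowLogThreshold  3
slowLogScope      QUERY
```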
### Debugging Related
| Parameter Name | Supported Version | Description |
|----------------------|-------------------|-------------|
| enableCoreFile | | Whether to generate a core file when crashing, 0: do not generate, 1: generate; default value is 1 |
| configDir | | Directory where the configuration files are located |
| scriptDir | | Directory for internal test tool scripts |
| assert | | Assertion control switch, default value is 0 |
| randErrorChance | | Internal parameter, used for random failure testing |
| randErrorDivisor | | Internal parameter, used for random failure testing |
| randErrorScope | | Internal parameter, used for random failure testing |
| safetyCheckLevel | | Internal parameter, used for random failure testing |
| experimental | | Internal parameter, used for some experimental features |
| simdEnable | After 3.3.4.3 | Internal parameter, used for testing SIMD acceleration |
| AVX512Enable | After 3.3.4.3 | Internal parameter, used for testing AVX512 acceleration |
| rsyncPort | | Internal parameter, used for debugging stream computing |
| snodeAddress | | Internal parameter, used for debugging stream computing |
| checkpointBackupDir | | Internal parameter, used for restoring snode data |
| enableAuditDelete | | Internal parameter, used for testing audit functions |
| slowLogThresholdTest | | Internal parameter, used for testing slow logs |
### Compression Parameters
| Parameter Name | Supported Version | Description |
| :------------- | :---------------- | :---------- |
| fPrecision     |                   | Sets the compression precision for float type floating-point numbers; range: 0.1 ~ 0.00000001; default value: 0.00000001; float numbers smaller than this value have their mantissa truncated |
| dPrecision     |                   | Sets the compression precision for double type floating-point numbers; range: 0.1 ~ 0.0000000000000001; default value: 0.0000000000000001; double numbers smaller than this value have their mantissa truncated |
| lossyColumn    | Before 3.3.0.0    | Enables TSZ lossy compression for float and/or double types; range: float/double/none; default value: none, meaning lossy compression is off |
| ifAdtFse       |                   | When TSZ lossy compression is enabled, use the FSE algorithm instead of the HUFFMAN algorithm; the FSE algorithm is faster at compression but slightly slower at decompression; choose it if compression speed is a priority; 0: off; 1: on; default value: 0 |
| maxRange       |                   | Internal parameter, used for setting lossy compression |
| curRange       |                   | Internal parameter, used for setting lossy compression |
| compressor     |                   | Internal parameter, used for setting lossy compression |
**Additional Notes**
1. Effective in versions 3.2.0.0 through 3.3.0.0 (exclusive); once this parameter is enabled, you cannot roll back to the version prior to the upgrade.
2. The TSZ compression algorithm uses data prediction techniques, so it is better suited to data that changes in a regular pattern.
3. TSZ compression takes longer; if your server's CPU is largely idle and storage space is limited, it is a suitable choice.
4. Example: enable lossy compression for both float and double types.
```shell
lossyColumns float|double
```
5. Configuration changes require a service restart to take effect. If you see the following content in the taosd log after restarting, the configuration has taken effect:
```sql
```sql
02/22 10:49:27.607990 00002933 UTL lossyColumns float|double
```
### Other Parameters
| Parameter Name | Parameter Description |
| :--------------- | :----------------------------------------------------------- |
| enableCoreFile | Whether to generate a core file upon crash; 0: do not generate; 1: generate; default value: 1; Depending on the startup method, the directory for generated core files is as follows: 1. When started with systemctl start taosd: the core will be generated in the root directory; 2. When started manually, it will be in the directory where taosd is executed. |
| udf | Whether to start the UDF service; 0: do not start; 1: start; default value: 0 |
| ttlChangeOnWrite | Whether the ttl expiration time changes along with table modifications; 0: do not change; 1: change; default value: 0 |
| tmqMaxTopicNum | Maximum number of topics that can be established for subscription; range: 1-10000; default value: 20 |
| maxTsmaNum | Maximum number of TSMAs that can be created in the cluster; range: 0-3; default value: 3 |
## taosd Monitoring Metrics
taosd reports monitoring metrics to taosKeeper, which writes them into the monitoring database (default: `log`; the database can be changed in the taosKeeper configuration file). This section provides a detailed introduction to these monitoring metrics.
### taosd_cluster_basic Table
The `taosd_cluster_basic` table records basic cluster information.
| field | type | is_tag | comment |
| :---------------- | :-------- | :----- | :--------------------------------- |
| ts | TIMESTAMP | | timestamp |
| first_ep | VARCHAR | | Cluster first ep |
| first_ep_dnode_id | INT | | Dnode ID of the cluster's first ep |
| cluster_version | VARCHAR | | TDengine version, e.g., 3.0.4.0 |
| cluster_id | VARCHAR | TAG | cluster id |
### taosd_cluster_info Table
### taosd\_cluster\_info table
The `taosd_cluster_info` table records cluster information.
| field | type | is_tag | comment |
| :---------------------- | :-------- | :----- | :----------------------------------------------------------- |
| \_ts | TIMESTAMP | | timestamp |
| cluster_uptime | DOUBLE | | Current uptime of the master node; unit: seconds |
| dbs_total | DOUBLE | | Total number of databases |
| tbs_total | DOUBLE | | Current total number of tables in the cluster |
| stbs_total | DOUBLE | | Current total number of stable tables |
| dnodes_total | DOUBLE | | Total number of dnodes in the cluster |
| dnodes_alive | DOUBLE | | Total number of alive dnodes in the cluster |
| mnodes_total | DOUBLE | | Total number of mnodes in the cluster |
| mnodes_alive | DOUBLE | | Total number of alive mnodes in the cluster |
| vgroups_total | DOUBLE | | Total number of vgroups in the cluster |
| vgroups_alive | DOUBLE | | Total number of alive vgroups in the cluster |
| vnodes_total | DOUBLE | | Total number of vnodes in the cluster |
| vnodes_alive | DOUBLE | | Total number of alive vnodes in the cluster |
| connections_total | DOUBLE | | Total number of connections in the cluster |
| topics_total | DOUBLE | | Total number of topics in the cluster |
| streams_total | DOUBLE | | Total number of streams in the cluster |
| grants_expire_time | DOUBLE | | Authorization (license) expiration time; valid for the enterprise edition, maximum DOUBLE value for the community edition |
| grants_timeseries_used | DOUBLE | | Number of time series (measuring points) in use |
| grants_timeseries_total | DOUBLE | | Total number of time series; maximum DOUBLE value for the open source edition |
| cluster_id | VARCHAR | TAG | cluster id |
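As a reference, the sketch below shows one way to read these metrics with the taos CLI, assuming taosKeeper is running and writing to the default `log` database:
```shell
# Query recent cluster-level metrics from the default monitoring database
taos -s "SELECT _ts, dbs_total, tbs_total, dnodes_alive, vnodes_alive FROM log.taosd_cluster_info WHERE _ts > NOW - 1h;"
```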
### taosd_vgroups_info Table
The `taosd_vgroups_info` table records information about virtual node groups.
| field | type | is_tag | comment |
| :------------ | :-------- | :----- | :-------------------------------------------------- |
| \_ts | TIMESTAMP | | timestamp |
| tables_num | DOUBLE | | Number of tables in the vgroup |
| status | DOUBLE | | vgroup status, value range: unsynced = 0, ready = 1 |
| vgroup_id | VARCHAR | TAG | vgroup id |
| database_name | VARCHAR | TAG | Database name the vgroup belongs to |
| cluster_id | VARCHAR | TAG | cluster id |
### taosd_dnodes_info Table
The `taosd_dnodes_info` table records dnode information.
| field | type | is_tag | comment |
| :-------------- | :-------- | :----- | :----------------------------------------------------------- |
| \_ts | TIMESTAMP | | timestamp |
| uptime | DOUBLE | | dnode uptime; unit: seconds |
| cpu_engine | DOUBLE | | taosd CPU usage, read from `/proc/<taosd_pid>/stat` |
| cpu_system | DOUBLE | | Server CPU usage, read from `/proc/stat` |
| cpu_cores | DOUBLE | | Number of CPU cores on the server |
| mem_engine | DOUBLE | | taosd memory usage, read from `/proc/<taosd_pid>/status` |
| mem_free | DOUBLE | | Available memory on the server; unit: KB |
| mem_total | DOUBLE | | Total memory on the server; unit: KB |
| disk_used | DOUBLE | | Disk usage of the data dir mount; unit: bytes |
| disk_total | DOUBLE | | Total disk capacity of the data dir mount; unit: bytes |
| system_net_in | DOUBLE | | Network throughput, read from `/proc/net/dev`, received bytes; unit: byte/s |
| system_net_out | DOUBLE | | Network throughput, read from `/proc/net/dev`, transmitted bytes; unit: byte/s |
| io_read | DOUBLE | | I/O throughput, calculated from rchar read from `/proc/<taosd_pid>/io` and previous values; unit: byte/s |
| io_write | DOUBLE | | I/O throughput, calculated from wchar read from `/proc/<taosd_pid>/io` and previous values; unit: byte/s |
| io_read_disk | DOUBLE | | Disk I/O throughput, read from read_bytes from `/proc/<taosd_pid>/io`; unit: byte/s |
| io_write_disk | DOUBLE | | Disk I/O throughput, read from write_bytes from `/proc/<taosd_pid>/io`; unit: byte/s |
| vnodes_num | DOUBLE | | Number of vnodes on the dnode |
| masters | DOUBLE | | Number of master nodes on the dnode |
| has_mnode | DOUBLE | | Whether the dnode contains an mnode, value range: contains=1, does not contain=0 |
| has_qnode | DOUBLE | | Whether the dnode contains a qnode, value range: contains=1, does not contain=0 |
| has_snode | DOUBLE | | Whether the dnode contains an snode, value range: contains=1, does not contain=0 |
| has_bnode | DOUBLE | | Whether the dnode contains a bnode, value range: contains=1, does not contain=0 |
| error_log_count | DOUBLE | | Total number of error logs |
| info_log_count | DOUBLE | | Total number of info logs |
| debug_log_count | DOUBLE | | Total number of debug logs |
| trace_log_count | DOUBLE | | Total number of trace logs |
| dnode_id | VARCHAR | TAG | dnode id |
| dnode_ep | VARCHAR | TAG | dnode endpoint |
| cluster_id | VARCHAR | TAG | cluster id |
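For example, the per-dnode resource metrics above can be inspected with a query like the following sketch, which assumes the default `log` monitoring database:
```shell
# List CPU, memory, and disk usage reported by each dnode over the last 10 minutes
taos -s "SELECT _ts, dnode_ep, cpu_engine, mem_engine, disk_used FROM log.taosd_dnodes_info WHERE _ts > NOW - 10m;"
```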
### taosd_dnodes_status Table
The `taosd_dnodes_status` table records dnode status information.
| field | type | is_tag | comment |
| :--------- | :-------- | :----- | :-------------------------------------------- |
| \_ts | TIMESTAMP | | timestamp |
| status | DOUBLE | | dnode status; value range: ready=1, offline=0 |
| dnode_id | VARCHAR | TAG | dnode id |
| dnode_ep | VARCHAR | TAG | dnode endpoint |
| cluster_id | VARCHAR | TAG | cluster id |
### taosd_dnodes_log_dir Table
The `taosd_dnodes_log_dir` table records log directory information.
| field | type | is_tag | comment |
| :--------- | :-------- | :----- | :------------------------------------------------ |
| \_ts | TIMESTAMP | | timestamp |
| avail | DOUBLE | | Available space in the log directory; unit: bytes |
| used | DOUBLE | | Used space in the log directory; unit: bytes |
| total | DOUBLE | | Total space in the log directory; unit: bytes |
| name | VARCHAR | TAG | Log directory name, usually `/var/log/taos/` |
| dnode_id | VARCHAR | TAG | dnode id |
| dnode_ep | VARCHAR | TAG | dnode endpoint |
| cluster_id | VARCHAR | TAG | cluster id |
### taosd_dnodes_data_dir Table
The `taosd_dnodes_data_dir` table records data directory information.
| field | type | is_tag | comment |
| :--------- | :-------- | :----- | :------------------------------------------------- |
| \_ts | TIMESTAMP | | timestamp |
| avail | DOUBLE | | Available space in the data directory; unit: bytes |
| used | DOUBLE | | Used space in the data directory; unit: bytes |
| total | DOUBLE | | Total space in the data directory; unit: bytes |
| level | VARCHAR | TAG | Multi-level storage level: 0, 1, 2 |
| name | VARCHAR | TAG | Data directory, usually `/var/lib/taos` |
| dnode_id | VARCHAR | TAG | dnode id |
| dnode_ep | VARCHAR | TAG | dnode endpoint |
| cluster_id | VARCHAR | TAG | cluster id |
### taosd_mnodes_info Table
The `taosd_mnodes_info` table records mnode role information.
| field | type | is_tag | comment |
| :--------- | :-------- | :----- | :----------------------------------------------------------- |
| \_ts | TIMESTAMP | | timestamp |
| role | DOUBLE | | mnode role; value range: offline = 0, follower = 100, candidate = 101, leader = 102, error = 103, learner = 104 |
| mnode_id | VARCHAR | TAG | master node id |
| mnode_ep | VARCHAR | TAG | master node endpoint |
| cluster_id | VARCHAR | TAG | cluster id |
### taosd_vnodes_role Table
The `taosd_vnodes_role` table records virtual node role information.
| field | type | is_tag | comment |
| :------------ | :-------- | :----- | :----------------------------------------------------------- |
| \_ts | TIMESTAMP | | timestamp |
| vnode_role | DOUBLE | | vnode role; value range: offline = 0, follower = 100, candidate = 101, leader = 102, error = 103, learner = 104 |
| vgroup_id | VARCHAR | TAG | vgroup id |
| dnode_id | VARCHAR | TAG | dnode id |
| database_name | VARCHAR | TAG | Database name the vgroup belongs to |
| cluster_id | VARCHAR | TAG | cluster id |
### taosd_sql_req Table
The `taosd_sql_req` table records server-side SQL request information.
| field | type | is_tag | comment |
| :--------- | :-------- | :----- | :------------------------------------------------- |
| \_ts | TIMESTAMP | | timestamp |
| count | DOUBLE | | Number of SQL requests |
| result | VARCHAR | TAG | SQL execution result; value range: Success, Failed |
| username | VARCHAR | TAG | User name executing the SQL |
| sql_type | VARCHAR | TAG | SQL type; value range: inserted_rows |
| dnode_id | VARCHAR | TAG | dnode id |
| dnode_ep | VARCHAR | TAG | dnode endpoint |
| vgroup_id | VARCHAR | TAG | vgroup id |
| cluster_id | VARCHAR | TAG | cluster id |
### taos_sql_req Table
The `taos_sql_req` table records client-side SQL request information.
| field | type | is_tag | comment |
| :--------- | :-------- | :----- | :------------------------------------------------- |
| \_ts | TIMESTAMP | | timestamp |
| count | DOUBLE | | Number of SQL requests |
| result | VARCHAR | TAG | SQL execution result; value range: Success, Failed |
| username | VARCHAR | TAG | User name executing the SQL |
| sql_type | VARCHAR | TAG | SQL type; value range: select, insert, delete |
| cluster_id | VARCHAR | TAG | cluster id |
### taos_slow_sql Table
The `taos_slow_sql` table records client-side slow query information.
| field | type | is_tag | comment |
| :--------- | :-------- | :----- | :----------------------------------------------------------- |
| \_ts | TIMESTAMP | | timestamp |
| count | DOUBLE | | Number of SQL requests |
| result | VARCHAR | TAG | SQL execution result; value range: Success, Failed |
| username | VARCHAR | TAG | User name executing the SQL |
| duration | VARCHAR | TAG | SQL execution duration; value range: 3-10s, 10-100s, 100-1000s, 1000s- |
| cluster_id | VARCHAR | TAG | cluster id |
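As an illustration, the slow-query counts can be aggregated per duration bucket; the sketch below assumes the default `log` monitoring database:
```shell
# Summarize slow queries recorded over the last day, grouped by duration bucket
taos -s "SELECT duration, SUM(count) FROM log.taos_slow_sql WHERE _ts > NOW - 1d GROUP BY duration;"
```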
## Log Related
TDengine uses log files to record the system's operational status, helping users monitor the system's operation and troubleshoot issues. This section mainly introduces the system logs related to taosc and taosd.
TDengine's log files mainly include ordinary logs and slow logs.
1. Ordinary Log Behavior Explanation
1. Multiple client processes can run on the same machine, so client log naming follows the format taoslogX.Y, where X is the sequence number (which can be empty or a digit from 0 to 9) and Y is the suffix (0 or 1).
2. Only one server process can run on the same machine, so server log naming follows the format taosdlog.Y, where Y is the suffix (0 or 1).
The rules for determining sequence numbers and suffixes are as follows (assuming the log path is /var/log/taos/):
1. Determine the sequence number: Use 10 sequence numbers as log naming, from /var/log/taos/taoslog0.Y to /var/log/taos/taoslog9.Y, checking each sequence number to see if it is in use. The first unused sequence number will be used for the log file of that process. If all 10 sequence numbers are in use, the sequence number will be empty, i.e., /var/log/taos/taoslog.Y, and all processes will write to the same file.
2. Determine the suffix: 0 or 1. For example, if the determined sequence number is 3, the alternative log file names will be /var/log/taos/taoslog3.0 and /var/log/taos/taoslog3.1. If both files do not exist, use suffix 0; if one exists and one does not, use the existing suffix. If both exist, use the one with the most recent modification time.
3. If the log file exceeds the configured number of lines (numOfLogLines), it will switch suffixes and continue logging. For example, when /var/log/taos/taoslog3.0 is filled, it will switch to /var/log/taos/taoslog3.1 to continue logging. /var/log/taos/taoslog3.0 will be renamed with a timestamp suffix and compressed (asynchronous thread operation).
4. The parameter logKeepDays controls how many days the log files are retained. For instance, if configured to 1, logs older than one day will be checked and deleted when new logs are compressed. This is not based on natural days.
In addition to recording ordinary logs, SQL statements that exceed the configured execution time will be recorded in the slow logs. Slow log files are primarily used to analyze system performance and troubleshoot performance issues.
2. Slow Log Behavior Explanation
   1. Slow logs are recorded in the local slow log file and, simultaneously, sent to taosKeeper via taosAdapter for structured storage (the monitor switch must be enabled).
2. Slow log file storage rules are:
1. A slow log file is generated for each day; if there are no slow logs on that day, there will be no file for that day.
2. The file name is taosSlowLog.yyyy-mm-dd (e.g., taosSlowLog.2024-08-02), and the log storage path is specified by the logDir configuration.
3. Logs from multiple clients will be stored in the same taosSlowLog.yyyy-mm-dd file in the specified log path.
4. Slow log files are not automatically deleted and are not compressed.
5. They share the same three parameters as ordinary log files: logDir, minimalLogDirGB, and asyncLog. The other two parameters, numOfLogLines and logKeepDays, do not apply to slow logs.
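For example, assuming the default logDir of /var/log/taos, today's slow log file can be inspected as follows:
```shell
# View today's slow log file; it exists only if slow queries were recorded today
cat /var/log/taos/taosSlowLog.$(date +%F)
```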

View File

@ -4,46 +4,114 @@ sidebar_label: taosc
slug: /tdengine-reference/components/taosc
---
The TDengine client driver provides all the APIs needed for application programming and plays an important role in distributed computing across the entire cluster. The behavior of the client driver can be globally controlled not only by API and its specific parameters but also through configuration files. This section lists the configuration parameters available for the TDengine client.
## Configuration Parameters
### Connection Related
|Parameter Name|Supported Version|Description|
|----------------------|----------|-------------|
|firstEp | |The endpoint of the first dnode in the cluster that the client actively connects to at startup; default value: hostname:6030; if the server's hostname cannot be obtained, it falls back to localhost|
|secondEp | |If firstEp cannot be connected at startup, the client tries to connect to the endpoint of the second dnode in the cluster; no default value|
|compressMsgSize | |Whether to compress RPC messages; -1: no messages are compressed; 0: all messages are compressed; N (N>0): only messages larger than N bytes are compressed; default value -1|
|shellActivityTimer | |The duration in seconds for the client to send heartbeats to mnode, range 1-120, default value 3|
|numOfRpcSessions | |Maximum number of connections supported by RPC, range 100-100000, default value 30000|
|numOfRpcThreads | |Number of threads for RPC to send and receive data, range 1-50, default value is half of the CPU cores|
|numOfTaskQueueThreads | |Number of threads for the client to handle RPC messages, range 4-16, default value is half of the CPU cores|
|timeToGetAvailableConn| Removed after 3.3.4.* |Maximum wait time to obtain an available connection; range: 10-50000000; unit: milliseconds; default value: 500000|
|useAdapter | |Internal parameter, whether to use taosadapter, affects CSV file import|
|shareConnLimit |Added in 3.3.4.0|Internal parameter, the number of queries a link can share, range 1-256, default value 10|
|readTimeout |Added in 3.3.4.0|Internal parameter, minimum timeout, range 64-604800, in seconds, default value 900|
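For illustration, a minimal client-side configuration might only set the connection endpoints; the host names below are placeholders and the file path is the default installation location:
```shell
# Hypothetical client-side taos.cfg excerpt; parameter names come from the table above, values are examples only
sudo tee -a /etc/taos/taos.cfg <<'EOF'
firstEp         tdengine-node1:6030
secondEp        tdengine-node2:6030
compressMsgSize 1024
EOF
```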
### Query Related
|Parameter Name|Supported Version|Description|
|---------------------------------|---------|-|
|countAlwaysReturnValue | |Whether the count/hyperloglog function returns a value when the input data is empty or NULL; 0: returns an empty row, 1: returns; default value 1; when this parameter is set to 1, if the query contains an INTERVAL clause or the query uses TSMA, and the corresponding group or window has empty or NULL data, the corresponding group or window will not return a query result; note that this parameter should be consistent between client and server|
|keepColumnName | |Automatically sets the alias to the column name (excluding the function name) when querying with Last, First, LastRow functions without specifying an alias, thus the order by clause will automatically refer to the column corresponding to the function; 1: automatically sets the alias to the column name (excluding the function name), 0: does not automatically set an alias; default value: 0|
|multiResultFunctionStarReturnTags|After 3.3.3.0|When querying a supertable, whether last(\*)/last_row(\*)/first(\*) returns tag columns; when querying basic tables, subtables, it is not affected by this parameter; 0: does not return tag columns, 1: returns tag columns; default value: 0; when this parameter is set to 0, last(\*)/last_row(\*)/first(\*) only returns the ordinary columns of the supertable; when set to 1, it returns both the ordinary columns and tag columns of the supertable|
|metaCacheMaxSize | |Specifies the maximum size of metadata cache for a single client, in MB; default value -1, meaning unlimited|
|maxTsmaCalcDelay | |The allowable delay for tsma calculation by the client during query, range 600s - 86400s, i.e., 10 minutes - 1 day; default value: 600 seconds|
|tsmaDataDeleteMark | |The retention time for intermediate results of historical data calculated by TSMA, in milliseconds; range >= 3600000, i.e., at least 1h; default value: 86400000, i.e., 1d |
|queryPolicy | |Execution strategy for query statements, 1: only use vnode, do not use qnode; 2: subtasks without scan operators are executed on qnode, subtasks with scan operators are executed on vnode; 3: vnode only runs scan operators, all other operators are executed on qnode; default value: 1|
|queryTableNotExistAsEmpty | |Whether to return an empty result set when the queried table does not exist; false: returns an error; true: returns an empty result set; default value false|
|querySmaOptimize | |Optimization strategy for sma index, 0: do not use sma index, always query from original data; 1: use sma index, directly query from pre-calculated results for eligible statements; default value: 0|
|queryPlannerTrace | |Internal parameter, whether the query plan outputs detailed logs|
|queryNodeChunkSize | |Internal parameter, chunk size of the query plan|
|queryUseNodeAllocator | |Internal parameter, allocation method of the query plan|
|queryMaxConcurrentTables | |Internal parameter, concurrency number of the query plan|
|enableQueryHb | |Internal parameter, whether to send query heartbeat messages|
|minSlidingTime | |Internal parameter, minimum allowable value for sliding|
|minIntervalTime | |Internal parameter, minimum allowable value for interval|
### Writing Related
| Parameter Name | Supported Version | Description |
|---------------------------------|-------------------|-------------|
| smlChildTableName | | Key for custom child table name in schemaless, no default value |
| smlAutoChildTableNameDelimiter | | Delimiter between schemaless tags, concatenated as the child table name, no default value |
| smlTagName | | Default tag name when schemaless tag is empty, default value "_tag_null" |
| smlTsDefaultName | | Configuration for setting the time column name in schemaless auto table creation, default value "_ts" |
| smlDot2Underline | | Converts dots in supertable names to underscores in schemaless |
| maxInsertBatchRows | | Internal parameter, maximum number of rows per batch insert |
### Region Related
| Parameter Name | Supported Version | Description |
|----------------|-------------------|-------------|
| timezone | | Time zone; defaults to dynamically obtaining the current system time zone setting |
| locale | | System locale and encoding format, defaults to system settings |
| charset | | Character set encoding, defaults to system settings |
### Storage Related
| Parameter Name | Supported Version | Description |
|-----------------|-------------------|-------------|
| tempDir | | Specifies the directory for generating temporary files during operation, default on Linux platform is /tmp |
| minimalTmpDirGB | | Minimum space required to be reserved in the directory specified by tempDir, in GB, default value: 1 |
### Log Related
| Parameter Name | Supported Version | Description |
|------------------|-------------------|-------------|
| logDir | | Log file directory, operational logs will be written to this directory, default value: /var/log/taos |
| minimalLogDirGB | | Stops writing logs when the disk space available in the log directory is less than this value, in GB, default value: 1 |
| numOfLogLines | | Maximum number of lines allowed in a single log file, default value: 10,000,000 |
| asyncLog | | Log writing mode, 0: synchronous, 1: asynchronous, default value: 1 |
| logKeepDays | | Maximum retention time for log files, in days; default value: 0, meaning unlimited retention. When set to 0, log files are not renamed and no new log file is rolled out, although the file contents may still roll over based on the log file size setting. When set to a value greater than 0, the log file is renamed to taoslogx.yyy, where yyy is the timestamp of the file's last modification, and a new log file is rolled out |
| debugFlag | | Log switch for running logs, 131 (output error and warning logs), 135 (output error, warning, and debug logs), 143 (output error, warning, debug, and trace logs); default value 131 or 135 (depending on the module) |
| tmrDebugFlag | | Log switch for the timer module, value range as above |
| uDebugFlag | | Log switch for the utility module, value range as above |
| rpcDebugFlag | | Log switch for the rpc module, value range as above |
| jniDebugFlag | | Log switch for the jni module, value range as above |
| qDebugFlag | | Log switch for the query module, value range as above |
| cDebugFlag | | Log switch for the client module, value range as above |
| simDebugFlag | | Internal parameter, log switch for the test tool, value range as above |
| tqClientDebugFlag| After 3.3.4.3 | Log switch for the client module, value range as above |
### Debugging Related
| Parameter Name | Supported Version | Description |
|------------------|-------------------|-------------|
| crashReporting | | Whether to upload crash to telemetry, 0: do not upload, 1: upload; default value: 1 |
| enableCoreFile | | Whether to generate a core file when crashing, 0: do not generate, 1: generate; default value: 1 |
| assert | | Assertion control switch, default value: 0 |
| configDir | | Directory for configuration files |
| scriptDir | | Internal parameter, directory for test cases |
| randErrorChance | After 3.3.3.0 | Internal parameter, used for random failure testing |
| randErrorDivisor | After 3.3.3.0 | Internal parameter, used for random failure testing |
| randErrorScope | After 3.3.3.0 | Internal parameter, used for random failure testing |
| safetyCheckLevel | After 3.3.3.0 | Internal parameter, used for random failure testing |
| simdEnable | After 3.3.4.3 | Internal parameter, used for testing SIMD acceleration |
| AVX512Enable | After 3.3.4.3 | Internal parameter, used for testing AVX512 acceleration |
### SHELL Related
|Parameter Name|Supported Version|Description|
|-----------------|----------|-|
|enableScience | |Whether to enable scientific notation for displaying floating numbers; 0: do not enable, 1: enable; default value: 1|
## API
Please refer to [Client Libraries](../../client-libraries/)

View File

@ -12,18 +12,18 @@ import StatsD from "../../10-third-party/01-collection/_statsd.mdx"
import Icinga2 from "../../10-third-party/01-collection/_icinga2.mdx"
import TCollector from "../../10-third-party/01-collection/_tcollector.mdx"
taosAdapter is a supporting tool for TDengine, acting as a bridge and adapter between the TDengine cluster and applications. It provides an easy-to-use and efficient way to ingest data directly from data collection agents such as Telegraf, StatsD, collectd, etc. It also offers InfluxDB/OpenTSDB-compatible data ingestion interfaces, allowing InfluxDB/OpenTSDB applications to be seamlessly ported to TDengine.
taosAdapter offers the following features:
- RESTful interface
- InfluxDB v1 write interface compatibility
- OpenTSDB JSON and telnet format writing compatibility
- Seamless connection to Telegraf
- Seamless connection to collectd
- Seamless connection to StatsD
- Support for Prometheus remote_read and remote_write
- Retrieve the VGroup ID of the table's virtual node group (VGroup)
## taosAdapter Architecture Diagram
@ -32,31 +32,32 @@ taosAdapter offers the following features:
<figcaption>Figure 1. taosAdapter architecture</figcaption>
</figure>
## taosAdapter Deployment Method
### Installing taosAdapter
taosAdapter is part of the TDengine server software. If you are using the TDengine server, no additional steps are needed to install taosAdapter. If you need to deploy taosAdapter separately on a server outside the TDengine server, you should install the full TDengine on that server to install taosAdapter. If you need to compile taosAdapter from the source code, you can refer to the documentation on [Building taosAdapter](https://github.com/taosdata/taosadapter/blob/3.0/BUILD.md).
### Starting/Stopping taosAdapter
On Linux systems, the taosAdapter service is managed by systemd by default. Use the command `systemctl start taosadapter` to start the taosAdapter service. Use the command `systemctl stop taosadapter` to stop the taosAdapter service.
### Removing taosAdapter
Use the command `rmtaos` to remove the TDengine server software, including taosAdapter.
### Upgrading taosAdapter
taosAdapter and the TDengine server need to be the same version. Please upgrade taosAdapter by upgrading the TDengine server. The taosAdapter deployed separately from taosd must be upgraded by upgrading the TDengine server it is installed on.
## taosAdapter Parameter List
taosAdapter supports configuration through command-line parameters, environment variables, and configuration files. The default configuration file is `/etc/taos/taosadapter.toml`.
Command-line parameters take precedence over environment variables, which take precedence over configuration files. The command-line usage is `arg=val`, for example, `taosadapter -p=30000 --debug=true`. The detailed list is as follows:
```text
Usage of taosAdapter:
--collectd.db string collectd db name. Env "TAOS_ADAPTER_COLLECTD_DB" (default "collectd")
--collectd.enable enable collectd. Env "TAOS_ADAPTER_COLLECTD_ENABLE" (default true)
@ -117,7 +118,7 @@ Usage of taosAdapter:
--opentsdb_telnet.flushInterval duration opentsdb_telnet flush interval (0s means not valid) . Env "TAOS_ADAPTER_OPENTSDB_TELNET_FLUSH_INTERVAL"
--opentsdb_telnet.maxTCPConnections int max tcp connections. Env "TAOS_ADAPTER_OPENTSDB_TELNET_MAX_TCP_CONNECTIONS" (default 250)
--opentsdb_telnet.password string opentsdb_telnet password. Env "TAOS_ADAPTER_OPENTSDB_TELNET_PASSWORD" (default "taosdata")
--opentsdb_telnet.ports ints opentsdb_telnet tcp port. Env "TAOS_ADAPTER_OPENTSDB_TELNET_PORTS" (default [6046,6047,6048,6049])
--opentsdb_telnet.tcpKeepAlive enable tcp keep alive. Env "TAOS_ADAPTER_OPENTSDB_TELNET_TCP_KEEP_ALIVE"
--opentsdb_telnet.ttl int opentsdb_telnet data ttl. Env "TAOS_ADAPTER_OPENTSDB_TELNET_TTL"
--opentsdb_telnet.user string opentsdb_telnet user. Env "TAOS_ADAPTER_OPENTSDB_TELNET_USER" (default "root")
@ -156,8 +157,8 @@ Usage of taosAdapter:
-V, --version Print the version and exit
```
:::note
When making interface calls using a browser, please set the following CORS parameters according to your actual situation:
```text
AllowAllOrigins
@ -168,76 +169,69 @@ AllowCredentials
AllowWebSockets
```
If you are not making interface calls through a browser, there is no need to worry about these configurations.
For details on the CORS protocol, please refer to: [https://www.w3.org/wiki/CORS_Enabled](https://www.w3.org/wiki/CORS_Enabled) or [https://developer.mozilla.org/docs/Web/HTTP/CORS](https://developer.mozilla.org/docs/Web/HTTP/CORS).
:::
See the example configuration file at [example/config/taosadapter.toml](https://github.com/taosdata/taosadapter/blob/3.0/example/config/taosadapter.toml).
### Connection Pool Parameters Description
When using the RESTful API, the system will manage TDengine connections through a connection pool. The connection pool can be configured with the following parameters:
- **`pool.maxConnect`**: The maximum number of connections allowed in the connection pool; default value is twice the number of CPU cores. It is recommended to keep the default setting.
- **`pool.maxIdle`**: The maximum number of idle connections allowed in the connection pool; defaults to the same as `pool.maxConnect`. It is recommended to keep the default setting.
- **`pool.idleTimeout`**: The idle timeout for connections; defaults to never timing out. It is recommended to keep the default setting.
- **`pool.waitTimeout`**: The timeout for obtaining connections from the connection pool; defaults to 60 seconds. If a connection cannot be obtained within the timeout period, an HTTP status code of 503 will be returned. This parameter has been available since version 3.3.3.0.
- **`pool.maxWait`**: The upper limit on the number of requests waiting to obtain connections in the connection pool; default value is 0, indicating no limit. When the number of queued requests exceeds this value, new requests will return an HTTP status code of 503. This parameter has been available since version 3.3.3.0.
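A minimal sketch of tuning these values, assuming the pool.* parameters map to a `[pool]` section as in the example configuration file linked above (the values are illustrative only):
```shell
# Append illustrative pool settings and restart taosAdapter; adjust the path and values for your deployment
sudo tee -a /etc/taos/taosadapter.toml <<'EOF'
[pool]
maxConnect = 32
maxIdle    = 32
EOF
sudo systemctl restart taosadapter
```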
## Feature List
- RESTful interface
[RESTful API](../../client-libraries/rest-api/)
- InfluxDB v1 write interface compatibility
[https://docs.influxdata.com/influxdb/v2.0/reference/api/influxdb-1x/write/](https://docs.influxdata.com/influxdb/v2.0/reference/api/influxdb-1x/write/)
- OpenTSDB JSON and telnet format writing compatibility
- [http://opentsdb.net/docs/build/html/api_http/put.html](http://opentsdb.net/docs/build/html/api_http/put.html)
- [http://opentsdb.net/docs/build/html/api_telnet/put.html](http://opentsdb.net/docs/build/html/api_telnet/put.html)
- Seamless connection to collectd.
collectd is a system statistics collection daemon; visit [https://collectd.org/](https://collectd.org/) for more information.
- Seamless connection to StatsD.
StatsD is a simple yet powerful statistics aggregator daemon. Visit [https://github.com/statsd/statsd](https://github.com/statsd/statsd) for more information.
- Seamless connection to icinga2.
icinga2 is software for collecting check results metrics and performance data. Visit [https://icinga.com/docs/icinga-2/latest/doc/14-features/#opentsdb-writer](https://icinga.com/docs/icinga-2/latest/doc/14-features/#opentsdb-writer) for more information.
- Seamless connection to tcollector.
TCollector is a client process that collects data from local collectors and pushes it to OpenTSDB. Visit [http://opentsdb.net/docs/build/html/user_guide/utilities/tcollector.html](http://opentsdb.net/docs/build/html/user_guide/utilities/tcollector.html) for more information.
- Seamless connection to node_exporter.
node_exporter is an exporter for machine metrics exposed by the *NIX kernel. Visit [https://github.com/prometheus/node_exporter](https://github.com/prometheus/node_exporter) for more information.
- Support for Prometheus remote_read and remote_write.
remote_read and remote_write are cluster solutions for separating data read and write in Prometheus. Visit [https://prometheus.io/blog/2019/10/10/remote-read-meets-streaming/#remote-apis](https://prometheus.io/blog/2019/10/10/remote-read-meets-streaming/#remote-apis) for more information.
- Retrieve the VGroup ID of the table's virtual node group (VGroup).
## Interfaces
### TDengine RESTful Interface
You can use any client that supports the HTTP protocol to write data to TDengine or query data from TDengine by accessing the RESTful interface URL `http://<fqdn>:6041/rest/sql`. For details, please refer to the [REST API documentation](../../client-libraries/rest-api/).
### InfluxDB
You can use any client that supports the HTTP protocol to write data in InfluxDB-compatible format to TDengine by accessing the RESTful interface URL `http://<fqdn>:6041/influxdb/v1/write`.

The following InfluxDB parameters are supported:
- `db` specifies the database name used by TDengine
- `precision` is the time precision used by TDengine
- `u` is the TDengine username
- `p` is the TDengine password
- `ttl` is the lifespan of automatically created subtables, determined by the TTL parameter of the first record written to the subtable; it cannot be updated. For more information, please refer to the TTL parameter in the [table creation documentation](../../sql-manual/manage-tables/).
:::note
Currently, the token authentication method of InfluxDB is not supported; only Basic authentication and query parameter authentication are supported.
Example: `curl --request POST http://127.0.0.1:6041/influxdb/v1/write?db=test --user "root:taosdata" --data-binary "measurement,host=host1 field1=2i,field2=2.0 1577836800000000000"`
:::
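For illustration, a write that combines several of these parameters might look like the following sketch; the database name, precision, and ttl values are placeholders rather than recommendations:

```shell
# writes one row to database "test" with millisecond precision (timestamp is in ms)
# and passes ttl=30 for the auto-created subtable; semantics follow the table TTL parameter
curl --request POST \
  "http://127.0.0.1:6041/influxdb/v1/write?db=test&precision=ms&ttl=30" \
  --user "root:taosdata" \
  --data-binary "measurement,host=host1 field1=2i,field2=2.0 1577836800000"
```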
### OpenTSDB
You can use any client that supports the HTTP protocol to write data in OpenTSDB-compatible format to TDengine by accessing the RESTful interface URL `http://<fqdn>:6041/<APIEndPoint>`. The supported endpoints are as follows:
```text
/opentsdb/v1/put/json/<db>
@ -262,121 +256,121 @@ You can use any client that supports the HTTP protocol to access the RESTful int
### node_exporter
An exporter used by Prometheus that exposes hardware and operating system metrics from \*NIX kernels.

- Enable the taosAdapter configuration `node_exporter.enable` (a configuration sketch follows this list)
- Set the relevant configuration for node_exporter
- Restart taosAdapter
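A minimal sketch of the first and last steps, assuming the default configuration path `/etc/taos/taosadapter.toml` and a `[node_exporter]` section; only the documented `node_exporter.enable` switch is shown, any further keys depend on your setup:

```shell
# append the documented enable switch to the taosAdapter configuration (assumed default path)
cat >> /etc/taos/taosadapter.toml <<'EOF'
[node_exporter]
enable = true
EOF
# restart taosAdapter so the new configuration takes effect
systemctl restart taosadapter
```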
### prometheus
<Prometheus />
### Get the VGroup ID of a Table

You can access the HTTP interface `http://<fqdn>:6041/rest/vgid?db=<db>&table=<table>` to get the VGroup ID of a table.
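For example, with curl and Basic authentication (the database and table names below are placeholders, and the credentials are the defaults used elsewhere in this document):

```shell
# query the VGroup ID of table "d0" in database "power"; names are placeholders
curl -u root:taosdata "http://localhost:6041/rest/vgid?db=power&table=d0"
```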
## Memory Usage Optimization Methods
taosAdapter monitors its memory usage during runtime and adjusts its behavior through two thresholds. Valid values are integers from -1 to 100, representing a percentage of the system's physical memory.
- pauseQueryMemoryThreshold
- pauseAllMemoryThreshold
When pauseQueryMemoryThreshold is exceeded, taosAdapter stops processing query requests.

HTTP response:
- code 503
- body "query memory exceeds threshold"
When pauseAllMemoryThreshold is exceeded, taosAdapter stops processing all write and query requests.

HTTP response:
- code 503
- body "memory exceeds threshold"
When memory usage falls back below the thresholds, the corresponding functionality is resumed.
Status check interface `http://<fqdn>:6041/-/ping`
- Returns `code 200` when normal
- With no parameters: returns `code 503` if memory exceeds pauseAllMemoryThreshold
- With the request parameter `action=query`: returns `code 503` if memory exceeds either pauseQueryMemoryThreshold or pauseAllMemoryThreshold
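For example, a load-balancer health check against this interface could look like the following; the host and port are placeholders:

```shell
# plain liveness check: returns 503 once pauseAllMemoryThreshold is exceeded
curl -i "http://localhost:6041/-/ping"
# stricter check used before routing queries: 503 once either threshold is exceeded
curl -i "http://localhost:6041/-/ping?action=query"
```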
Corresponding configuration parameters:
```text
monitor.collectDuration Monitoring interval Environment variable "TAOS_MONITOR_COLLECT_DURATION" (default value 3s)
monitor.incgroup Whether running in a cgroup (set to true when running in a container) Environment variable "TAOS_MONITOR_INCGROUP"
monitor.pauseAllMemoryThreshold Memory threshold for pausing inserts and queries Environment variable "TAOS_MONITOR_PAUSE_ALL_MEMORY_THRESHOLD" (default value 80)
monitor.pauseQueryMemoryThreshold Memory threshold for pausing queries Environment variable "TAOS_MONITOR_PAUSE_QUERY_MEMORY_THRESHOLD" (default value 70)
```
You can adjust these values according to the specific project scenario and operational strategy; it is recommended to use operations monitoring software to track system memory status in real time. Load balancers can also use this interface to check the running status of taosAdapter.
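For instance, the thresholds can be overridden through the environment variables listed above; the values below are only an illustration, not a recommendation:

```shell
# illustrative values only; when taosAdapter is managed by systemd, place these in the
# service environment (e.g. a systemd override) instead of an interactive shell
TAOS_MONITOR_PAUSE_QUERY_MEMORY_THRESHOLD=60 \
TAOS_MONITOR_PAUSE_ALL_MEMORY_THRESHOLD=70 \
TAOS_MONITOR_INCGROUP=true \
taosadapter
```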
## taosAdapter Monitoring Metrics
taosAdapter collects monitoring metrics related to REST/WebSocket requests and reports them to taosKeeper, which writes them into the monitoring database (by default the `log` database; this can be changed in the taosKeeper configuration file). The monitoring metrics are described in detail below.
### adapter_requests Table

The `adapter_requests` table records taosAdapter monitoring data.
| field | type | is_tag | comment |
| :--------------- | :----------- | :----- | :---------------------------------------- |
| ts | TIMESTAMP | | timestamp |
| total | INT UNSIGNED | | total number of requests |
| query | INT UNSIGNED | | number of query requests |
| write | INT UNSIGNED | | number of write requests |
| other | INT UNSIGNED | | number of other requests |
| in_process | INT UNSIGNED | | number of requests in process |
| success | INT UNSIGNED | | number of successful requests |
| fail | INT UNSIGNED | | number of failed requests |
| query_success | INT UNSIGNED | | number of successful query requests |
| query_fail | INT UNSIGNED | | number of failed query requests |
| write_success | INT UNSIGNED | | number of successful write requests |
| write_fail | INT UNSIGNED | | number of failed write requests |
| other_success | INT UNSIGNED | | number of successful other requests |
| other_fail | INT UNSIGNED | | number of failed other requests |
| query_in_process | INT UNSIGNED | | number of query requests in process |
| write_in_process | INT UNSIGNED | | number of write requests in process |
| endpoint | VARCHAR | | request endpoint |
| req_type | NCHAR | tag | request type: 0 for REST, 1 for WebSocket |
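As an example, the recent failure counts per endpoint could be inspected with the TDengine CLI, assuming the default `log` monitoring database (adjust if taosKeeper writes elsewhere):

```shell
# show totals and failure counters for the last hour from the monitoring database
taos -s "SELECT ts, endpoint, total, fail, query_fail, write_fail FROM log.adapter_requests WHERE ts > NOW - 1h"
```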
## Result Return Limit

taosAdapter controls the number of rows returned through the parameter `restfulRowLimit`; -1 means no limit, and the default is no limit.
This parameter controls the results returned by the following interfaces:
- `http://<fqdn>:6041/rest/sql`
- `http://<fqdn>:6041/prometheus/v1/remote_read/:db`
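A sketch of raising the limit; it assumes the command-line flag mirrors the parameter name `restfulRowLimit`, which is an assumption here rather than a documented spelling:

```shell
# assumption: the CLI flag mirrors the configuration key restfulRowLimit
taosadapter --restfulRowLimit=100000
```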
## Configure HTTP Return Codes

taosAdapter uses the parameter `httpCodeServerError` to set whether to return a non-200 HTTP status code when the C interface returns an error. When set to true, it returns different HTTP status codes depending on the error code returned by C. For details, see [HTTP Response Codes](../../client-libraries/rest-api/#http-response-codes).
## Configure Automatic DB Creation for Schemaless Writes

Starting from version 3.0.4.0, taosAdapter provides the parameter `smlAutoCreateDB` to control whether to automatically create a DB when writing via the schemaless protocol. The default value is false, meaning the DB is not created automatically and must be created manually by the user before performing schemaless writes.
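For instance, with `smlAutoCreateDB` enabled, a schemaless write to a database that does not yet exist can succeed; with the default setting it fails until the database is created manually. The values below are placeholders:

```shell
# with smlAutoCreateDB=true the database "smltest" is created automatically;
# with the default (false) this write fails until "CREATE DATABASE smltest" has been run
curl --request POST "http://127.0.0.1:6041/influxdb/v1/write?db=smltest" \
  --user "root:taosdata" \
  --data-binary "measurement,host=host1 field1=2i,field2=2.0 1577836800000000000"
```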
## Troubleshooting
You can check the running status of taosAdapter with the command `systemctl status taosadapter`.

You can also adjust the verbosity of taosAdapter's log output by setting the `--logLevel` parameter or the environment variable `TAOS_ADAPTER_LOG_LEVEL`. Valid values are: panic, fatal, error, warn, warning, info, debug, and trace.
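For example, to check the service and temporarily run taosAdapter in the foreground with more verbose logs (assuming `taosadapter` is on the PATH):

```shell
# check the service status
systemctl status taosadapter
# run in the foreground with debug-level logging via the documented environment variable
TAOS_ADAPTER_LOG_LEVEL=debug taosadapter
```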
## How to Migrate from Older Versions of TDengine to taosAdapter

In TDengine server version 2.2.x.x or earlier, the taosd process contained an embedded HTTP service. As mentioned earlier, taosAdapter is standalone software managed by systemd, with its own process. There are some differences in configuration parameters and behavior between the two, as shown in the table below:
| **#** | **embedded httpd** | **taosAdapter** | **comment** |
| ----- | ------------------- | ------------------------------------------------------ | ------------------------------------------------------------ |
| 1 | httpEnableRecordSql | --logLevel=debug | |
| 2 | httpMaxThreads | n/a | taosAdapter automatically manages the thread pool; this parameter is not needed. |
| 3 | telegrafUseFieldNum | Refer to the taosAdapter telegraf configuration method | |
| 4 | restfulRowLimit | restfulRowLimit | The embedded httpd defaults to output 10,240 rows of data, with a maximum allowable value of 102,400. taosAdapter also provides restfulRowLimit but defaults to no limit. You can configure it according to your actual scenario needs. |
| 5 | httpDebugFlag | Not applicable | httpdDebugFlag does not apply to taosAdapter. |
| 6 | httpDBNameMandatory | Not applicable | taosAdapter requires the database name to be specified in the URL. |

View File

@ -7,29 +7,29 @@ slug: /tdengine-reference/components/taosx
import Image from '@theme/IdealImage';
import imgTdx from '../../assets/taosx-01.png';
taosX is a core component of TDengine Enterprise that provides zero-code data access capabilities. taosX supports two running modes: service mode and command-line mode. This section describes how to use taosX in both ways. To use taosX, you must first install the TDengine Enterprise package.
## Command-Line Mode

### Command-Line Format

The command-line argument format for taosX is as follows:
```shell
taosx -f <from-DSN> -t <to-DSN> <other parameters>
```
The command-line arguments for taosX are divided into three main parts:

- `-f` specifies the data source, i.e., the Source DSN
- `-t` specifies the write target, i.e., the Sink DSN
- Other parameters
In the following parameter descriptions and examples, `<content>` is a placeholder that must be replaced with actual values when used.
### DSN (Data Source Name)
In command-line mode, taosX uses a DSN to represent a data source (source or destination). A typical DSN looks like this:
```bash
# url-like
@ -37,48 +37,52 @@ In the command-line mode, taosX uses DSN to represent a data source (source or d
|------|------------|---|-----------|-----------|------|------|----------|-----------------------|
|driver| protocol | | username | password | host | port | object | params |
// url example
tmq+ws://root:taosdata@localhost:6030/db1?timeout=never
```
The parts in `[]` are optional.
1. Different drivers have different parameters. The driver can be one of the following:
   - taos: Use the query interface to retrieve data from TDengine.
   - tmq: Enable data subscription to retrieve data from TDengine.
   - local: Data backup or recovery.
   - pi: Enable the pi-connector to retrieve data from a PI database.
   - opc: Enable the opc-connector to retrieve data from an OPC server.
   - mqtt: Enable the mqtt-connector to retrieve data from an MQTT broker.
   - kafka: Enable the Kafka connector to subscribe to messages from Kafka topics.
   - influxdb: Enable the influxdb connector to retrieve data from InfluxDB.
   - csv: Parse data from a CSV file.
2. +protocol includes the following options:
   - +ws: Used when the driver is taos or tmq, indicating that data is retrieved via REST. Without +ws, a native connection is used, which requires taosx to be installed on the server.
   - +ua: Used when the driver is opc, indicating that the OPC server is OPC UA.
   - +da: Used when the driver is opc, indicating that the OPC server is OPC DA.
3. host:port represents the address and port of the data source.
4. object represents the specific data source, which can be a TDengine database, supertable, or table, the path of a local backup file, or a database on the corresponding data source server.
5. username and password are the username and password of the data source.
6. params represents the parameters of the DSN.
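Putting these pieces together, a few illustrative DSNs built from the rules above; hosts, credentials, and database names are placeholders:

```shell
# query interface over REST (WebSocket), reading database db1
taos+ws://root:taosdata@localhost:6041/db1
# data subscription over a native connection (requires taosx installed on the server)
tmq://root:taosdata@localhost:6030/db1?timeout=never
# local TDengine instance as the write target, database db2
taos:///db2
```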
### Other Parameters
1. `--jobs <number>` specifies the number of concurrent tasks; it is only supported for tmq tasks.
2. `-v` specifies the log level of taosx: `-v` enables info-level logs, `-vv` corresponds to debug, and `-vvv` corresponds to trace.
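For example, a hypothetical subscription-based synchronization run with four concurrent jobs and debug-level logs might look like this; the DSNs are placeholders:

```shell
# --jobs applies to tmq tasks; -vv enables debug-level logs
taosx run -f 'tmq://root:taosdata@localhost:6030/db1?timeout=never' \
          -t 'taos:///db2' --jobs 4 -vv
```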
### Usage Examples
#### Import and Export User and Permission Information

Export usernames, passwords, permissions, and whitelist information from cluster A to cluster B:
```shell
taosx privileges -f "taos://root:taosdata@hostA:6030" \
-t "taos+ws://root:password@hostB:6041"
```
Export usernames, passwords, permissions, and whitelist information from cluster A to a JSON file:
```shell
taosx privileges -f "taos+ws://root:taosdata@localhost:6041" \
@ -91,111 +95,106 @@ Restore from the exported JSON file to the local machine:
taosx privileges -i ./user-pass-privileges-backup.json -t "taos:///"
```
Available parameters:
| Parameter | Description                                                        |
| --------- | ------------------------------------------------------------------ |
| -u        | Includes user basic information (password, whether enabled, etc.)  |
| -p        | Includes permission information                                     |
| -w        | Includes whitelist information                                      |
When the `-u`/`-p` parameters are applied, only the specified information is included; without any of these parameters, all information (username, password, permissions, and whitelist) is included.

The `-w` parameter cannot be used alone; it is only effective when combined with `-u` (using `-u` alone will not include the whitelist).
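For instance, a sketch of migrating only the user accounts and their whitelists between the two clusters from the earlier example (recall that `-w` must be combined with `-u`):

```shell
# copies user basic information plus whitelists, but not permissions
taosx privileges -f "taos://root:taosdata@hostA:6030" \
  -t "taos+ws://root:password@hostB:6041" -u -w
```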
#### Migrating Data from Older Versions

1. Synchronize historical data

Synchronize the entire database:
```shell
taosx run -f 'taos://root:taosdata@localhost:6030/db1' -t 'taos:///db2' -v
```
Synchronize a specified supertable:
```shell
taosx run \
-f 'taos://root:taosdata@localhost:6030/db1?stables=meters' \
-t 'taos:///db2' -v
```
Synchronize subtables or basic tables. Subtables of a supertable can be specified as `{stable}.{table}`, or a table name can be given directly as `{table}`.
```shell
taosx run \
-f 'taos://root:taosdata@localhost:6030/db1?tables=meters.d0,d1,table1' \
-t 'taos:///db2' -v
```
2. Synchronize data within a specified time range (using the RFC3339 time format; note the timezone):
```shell
taosx run -f 'taos:///db1?start=2022-10-10T00:00:00Z' -t 'taos:///db2' -v
```
3. Continuous synchronization. `restro` specifies synchronizing data from the last 5 minutes and then continuing to synchronize new data; in the example the source is checked every 1s, and `excursion` allows 500ms of delay or out-of-order data.
```shell
taosx run \
-f 'taos:///db1?mode=realtime&restro=5m&interval=1s&excursion=500ms' \
-t 'taos:///db2' -v
```
4. Synchronize historical data + real-time data:
```shell
taosx run -f 'taos:///db1?mode=all' -t 'taos:///db2' -v
```
5. Configure data synchronization through `--transform` or `-T` (only supported for synchronization between versions 2.6 and 3.0, and within 3.0), which performs operations on table names and table fields during the process. It cannot yet be set through Explorer. The configuration options are described below:
1. AddTag, to add a tag to a table. Example setting: `-T add-tag:<tag1>=<value1>`.
2. Table renaming:
2.1 Renaming scope:
2.1.1 RenameTable: Rename all matching tables.
2.1.2 RenameChildTable: Rename all matching subtables.
2.1.3 RenameSuperTable: Rename all matching supertables.
2.2 Renaming methods:
2.2.1 Prefix: Add a prefix.
2.2.2 Suffix: Add a suffix.
2.2.3 Template: Template method.
2.2.4 ReplaceWithRegex: Regular expression replacement. Added in taosx 1.1.0.

Renaming configuration format: `<Table Scope>:<Renaming Method>:<Renaming Value>`

Usage examples:

1. Add the prefix `<prefix>` to all tables: `--transform rename-table:prefix:<prefix>`
2. Replace prefix1 with prefix2 for matching subtables. In the following example, `<>` is no longer a regular expression placeholder:

```shell
-T rename-child-table:replace_with_regex:^prefix1(?<old>)::prefix2_$old
```
Example explanation: `^prefix1(?<old>)` is a regular expression that matches table names starting with `prefix1` and captures the suffix as `old`; `prefix2_$old` then replaces the prefix with `prefix2` while keeping `old`. Note: the two parts are separated by the special delimiter `::`, so the regular expression itself must not contain this character.
For more complex replacement needs, please refer to [https://docs.rs/regex/latest/regex/#example-replacement-with-named-capture-groups](https://docs.rs/regex/latest/regex/#example-replacement-with-named-capture-groups) or consult the taosx developers.
3. Use a CSV mapping file to rename tables. The following example uses the `map.csv` file to rename tables:

`-T rename-child-table:map:@./map.csv`
The format of the CSV file `./map.csv` is as follows:
```csv
name1,newname1
name2,newname2
```
Note that when migrating between incompatible versions on the two ends and using a native connection, you must specify `libraryPath` in the DSN, for example: `taos:///db1?libraryPath=./libtaos.so`.
#### Import Data from CSV Files

The basic usage is as follows:
```shell
taosx run -f csv:./meters/meters.csv.gz \
@ -203,7 +202,7 @@ taosx run -f csv:./meters/meters.csv.gz \
-t taos:///csv1 -qq
```
Taking electricity meter data as an example, the CSV file is as follows:
```csv
tbname,ts,current,voltage,phase,groupid,location
@ -216,7 +215,7 @@ d4,2017-07-14T10:40:00.005+08:00,-2.718924,6,-0.886308,7,California.LosAngles
d4,2017-07-14T10:40:00.006+08:00,-2.740636,10,-0.893545,7,California.LosAngles
```
The `--parser` option is used to set the import parameters. An example is as follows:
```json
{
@ -237,38 +236,38 @@ The `--parser` option is used to set the database parameters. The example is as
}
```
This imports data from `./meters/meters.csv.gz` (a gzip-compressed CSV file) into the supertable `meters`, inserting each row into the table named by `${tbname}`, using the `tbname` column of the CSV content as the table name (i.e., `.model.name` in the JSON parser).
## Service Mode
This section describes how to deploy `taosX` in service mode. When taosX runs in service mode, its functions are accessed through the graphical interface of taosExplorer.
### Configuration
`taosX` can be configured through a configuration file. On Linux, the default configuration file path is `/etc/taos/taosx.toml`; on Windows, it is `C:\\TDengine\\cfg\\taosx.toml`. It includes the following configuration items:
- `plugins_home`: The directory where external data source connectors are located.
- `data_dir`: The directory where data files are stored.
- `instanceId`: The instance ID of the current taosX service. If multiple taosX instances are started on the same machine, ensure that their instance IDs are unique.
- `logs_home`: The directory where log files are stored. The log file prefix for `taosX` is `taosx.log`; external data sources have their own log file prefixes. Deprecated; please use `log.path` instead.
- `log_level`: The log level; available levels are `error`, `warn`, `info`, `debug`, and `trace`. The default is `info`. Deprecated; please use `log.level` instead.
- `log_keep_days`: The maximum number of days to keep logs; `taosX` logs are split into separate files by day. Deprecated; please use `log.keepDays` instead.
- `jobs`: The maximum number of threads per runtime. In service mode, the total number of threads is `jobs*2`; the default thread count is `current server cores*2`.
- `serve.listen`: The listening address of the `taosX` REST API; the default is `0.0.0.0:6050`.
- `serve.database_url`: The address of the `taosX` database, in the format `sqlite:<path>`.
- `serve.request_timeout`: The global API timeout.
- `monitor.fqdn`: The FQDN of the `taosKeeper` service; no default value. If left empty, monitoring is disabled.
- `monitor.port`: The port of the `taosKeeper` service; the default is `6043`.
- `monitor.interval`: How often metrics are sent to `taosKeeper`; the default is every 10 seconds. Only values from 1 to 10 are valid.
- `log.path`: The directory where log files are stored.
- `log.level`: The log level; available values are "error", "warn", "info", "debug", and "trace".
- `log.compress`: Whether to compress log files after rotation.
- `log.rotationCount`: The maximum number of files to retain in the log directory; older files beyond this number are deleted.
- `log.rotationSize`: The file size (in bytes) that triggers log rotation; when a log file exceeds this size, a new file is created and new logs are written to it.
- `log.reservedDiskSize`: The threshold (in bytes) for stopping log writes; when the remaining space on the disk where logs are stored falls to this size, log writing stops.
- `log.keepDays`: The number of days to keep log files; older log files beyond this number of days are deleted.
- `log.watching`: Whether to watch for changes to the `log.loggers` configuration in the log configuration and attempt to reload it.
- `log.loggers`: Specifies the log output level of individual modules, in the format `"modname" = "level"`. It is also compatible with the tracing library syntax and can be specified as `modname[span{field=value}]=level`, where `level` is the log level.
As shown below:
@ -359,13 +358,13 @@ As shown below:
### Start
On Linux, `taosX` can be started with the systemd command:
```shell
systemctl start taosx
```
On Windows, find the `taosX` service in the system management tool "Services" and start it, or execute the following command in a command-line tool (cmd.exe or PowerShell) to start it:
```shell
sc.exe start taosx
@ -373,200 +372,207 @@ sc.exe start taosx
### Troubleshooting
1. Modify the log level of `taosX`

The default log level of `taosX` is `info`. To specify a different level, modify the configuration file or use the following command-line parameters:

- `error`: `taosx serve -qq`
- `debug`: `taosx serve -q`
- `info`: `taosx serve -v`
- `debug`: `taosx serve -vv`
- `trace`: `taosx serve -vvv`

To specify command-line parameters when `taosX` runs as a service, please refer to the configuration.

2. View `taosX` logs

You can view the log files or use the `journalctl` command to check the logs of `taosX`.

On Linux, the command to view logs with `journalctl` is as follows:

```bash
journalctl -u taosx [-f]
```
## taosX Monitoring Metrics
taosX reports monitoring metrics to taosKeeper, which writes them into the monitoring database (by default the `log` database; this can be changed in the taosKeeper configuration file). The monitoring metrics are described in detail below.
### taosX Service
| Field | Description |
| -------------------------- | ------------------------------------------------------------ |
| sys_cpu_cores | Number of CPU cores in the system |
| sys_total_memory | Total memory in the system, in bytes |
| sys_used_memory | Used memory in the system, in bytes |
| sys_available_memory | Available memory in the system, in bytes |
| process_uptime | Running time of taosX, in seconds |
| process_id | Process ID of taosX |
| running_tasks | Number of tasks currently being executed by taosX |
| completed_tasks | Number of tasks completed by the taosX process within a monitoring cycle (e.g., 10s) |
| failed_tasks | Number of tasks that failed within a monitoring cycle (e.g., 10s) |
| process_cpu_percent | CPU percentage used by the taosX process, in % |
| process_memory_percent | Memory percentage used by the taosX process, in % |
| process_disk_read_bytes | Average number of bytes read from the disk by the taosX process within a monitoring cycle (e.g., 10s), in bytes/s |
| process_disk_written_bytes | Average number of bytes written to the disk by the taosX process within a monitoring cycle (e.g., 10s), in bytes/s |
### Agent
| Field | Description |
| -------------------------- | ------------------------------------------------------------ |
| sys_cpu_cores | Number of CPU cores in the system |
| sys_total_memory | Total memory in the system, in bytes |
| sys_used_memory | Used memory in the system, in bytes |
| sys_available_memory | Available memory in the system, in bytes |
| process_uptime | Running time of the agent, in seconds |
| process_id | Process ID of the agent |
| process_cpu_percent | CPU percentage used by the agent process, in % |
| process_memory_percent | Memory percentage used by the agent process, in % |
| process_disk_read_bytes | Average number of bytes read from the disk by the agent process within a monitoring cycle (e.g., 10s), in bytes/s |
| process_disk_written_bytes | Average number of bytes written to the disk by the agent process within a monitoring cycle (e.g., 10s), in bytes/s |
### Connector
| Field | Description |
| ---------------------------- | ---------------------------------------------------------------------------------------- |
| process_id | Connector process id |
| process_uptime | Process uptime, in seconds |
| process_cpu_percent | CPU percentage used by the process, in % |
| process_memory_percent | Memory percentage used by the process, in % |
| process_disk_read_bytes | Average number of bytes read from disk by the connector process in a monitoring period (e.g., 10s), in bytes/s |
| process_disk_written_bytes | Average number of bytes written to disk by the connector process in a monitoring period (e.g., 10s), in bytes/s |
### taosX General Data Source Task
| Field | Description |
| -------------------- | ------------------------------------------------------------ |
| total_execute_time | Cumulative running time of the task, in milliseconds |
| total_written_rows | Total number of rows successfully written to TDengine (including duplicate records) |
| total_written_points | Cumulative number of successfully written points (equal to the number of rows multiplied by the number of columns in the data block) |
| start_time | Task start time (reset on each restart of the task) |
| written_rows | Total number of rows successfully written to TDengine in this task run (including duplicate records) |
| written_points | Total number of successfully written points in this run (equal to the number of rows multiplied by the number of columns in the data block) |
| execute_time | Running time of this task run, in seconds |
### taosX TDengine V2 Task
| Field | Description |
| --------------------- | ------------------------------------------------------------ |
| read_concurrency | Number of workers reading data concurrently from the data source, equal to the number of workers writing to TDengine concurrently |
| total_stables | Total number of supertables to be migrated |
| total_updated_tags | Total number of tags updated |
| total_created_tables | Total number of subtables created |
| total_tables | Total number of subtables to be migrated |
| total_finished_tables | Total number of subtables completed for data migration (may be greater than the actual value if the task is interrupted and restarted) |
| total_success_blocks | Total number of data blocks successfully written |
| finished_tables | Total number of subtables migrated in this run |
| success_blocks | Total number of data blocks successfully written in this run |
| created_tables | Total number of subtables created in this run |
| updated_tags | Total number of tags updated in this run |
### taosX TDengine V3 Task
| Field | Description |
| ---------------------- | ------------------------------------------------------------ |
| total_messages | Total number of messages received through TMQ |
| total_messages_of_meta | Total number of Meta-type messages received through TMQ |
| total_messages_of_data | Total number of Data and MetaData-type messages received through TMQ |
| total_write_raw_fails | Total number of failures in writing raw meta |
| total_success_blocks | Total number of data blocks successfully written |
| topics | Number of topics subscribed through TMQ |
| consumers | Number of TMQ consumers |
| messages | Total number of messages received in this run through TMQ |
| messages_of_meta | Total number of Meta-type messages received in this run through TMQ |
| messages_of_data | Total number of Data and MetaData-type messages received in this run through TMQ |
| write_raw_fails | Total number of failures in writing raw meta in this run |
| success_blocks | Total number of data blocks successfully written in this run |
### taosX Other Data Source Tasks
These data sources include: InfluxDB, OpenTSDB, OPC UA, OPC DA, PI, CSV, MQTT, AVEVA Historian, and Kafka.
| Field | Description |
| ----------------------- | ------------------------------------------------------------ |
| total_received_batches | Total batches of data received through IPC Stream |
| total_processed_batches | Total number of batches processed |
| total_processed_rows | Total number of rows processed (equal to the sum of data rows contained in each batch) |
| total_inserted_sqls | Total number of INSERT SQL executed |
| total_failed_sqls | Total number of failed INSERT SQL executions |
| total_created_stables | Total number of supertables created (may be greater than the actual value) |
| total_created_tables | Total number of attempts to create subtables (may be greater than the actual value) |
| total_failed_rows | Total number of rows that failed to write |
| total_failed_points | Total number of points that failed to write |
| total_written_blocks | Total number of successfully written raw blocks |
| total_failed_blocks | Total number of failed raw blocks written |
| received_batches | Total batches of data received in this task run |
| processed_batches | Total number of batches processed in this task run |
| processed_rows | Total number of rows processed in this task run (equal to the sum of data rows contained in each batch) |
| received_records | Total number of records received in this task run |
| inserted_sqls | Total number of INSERT SQL executed in this task run |
| failed_sqls | Total number of failed INSERT SQL executions in this task run |
| created_stables | Total number of attempts to create supertables in this task run |
| created_tables | Total number of attempts to create subtables in this task run |
| failed_rows | Total number of rows that failed to write in this task run |
| failed_points | Total number of points that failed to write in this task run |
| written_blocks | Total number of successfully written raw blocks in this task run |
| failed_blocks | Total number of failed raw blocks written in this task run |
### Kafka Data Source Related Metrics
| Field | Description |
| ----------------------------- | ------------------------------------------------------------ |
| kafka_consumers | Number of Kafka consumers in this task run |
| kafka_total_partitions | Total number of partitions in the Kafka topic |
| kafka_consuming_partitions | Number of partitions currently being consumed in this task run |
| kafka_consumed_messages | Number of messages already consumed in this task run |
| total_kafka_consumed_messages | Total number of messages consumed so far |
## taosX Data Parsing Plugin
When connecting to Kafka/MQTT message middleware, the raw data must be parsed. If JSON/regex and other pattern parsers cannot meet the parsing requirements, and UDT (custom parsing scripts) cannot meet the performance requirements, you can create a custom data parsing plugin.
### Plugin Overview
The taosX Parser plugin is a dynamic library, developed in C or Rust, that is compatible with the C ABI. The library must implement the agreed API and be compiled so that it runs correctly in the taosX operating environment. It is then copied to a designated location so that taosX can load it at runtime and call it during the parsing stage of data processing.
### Plugin Deployment
After completing plugin development, make sure the compilation environment is compatible with the target runtime environment, then copy the compiled plugin dynamic library to the plugin directory. After taosX starts, it initializes and loads the plugin the first time the system uses it. You can check whether it has loaded successfully on the Kafka or MQTT data access configuration page in Explorer.
The plugin directory reuses the plugins configuration in the `taosx.toml` configuration file, with `/parsers` appended as the plugin installation path. The default value in a UNIX environment is `/usr/local/taos/plugins/parsers`, and on Windows it is `C:\TDengine\plugins\parsers`.
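For example, on Linux the deployment step is typically just a copy into that directory; the library file name `libmyparser.so` below is only an illustration:

```shell
# Copy the compiled parser plugin into the taosX plugin directory (default UNIX path).
cp libmyparser.so /usr/local/taos/plugins/parsers/
```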
### Plugin API Description
#### 1. Get Plugin Name
Get the plugin name, which is used for display on the frontend.
**Function Signature**: `const char* parser_name()`
**Return Value**: String.
#### 2. Get Plugin Version
Get the plugin version, which is useful for troubleshooting.
**Function Signature**: `const char* parser_version()`
**Return Value**: String.
#### 3. Configure the Parser

Parse a string parameter into a configuration object, for internal plugin use only.
**Function Signature**: `parser_resp_t parser_new(char* ctx, uint32_t len);`
`char* ctx`: User-defined configuration string.

`uint32_t len`: The binary length of the string (excluding `\0`).
**Return Value**:
```c
struct parser_resp_t {
  int   e;  // 0 if success.
  void* p;  // the parser object on success.
};
```

When creation is successful, e = 0 and p is the parser object.
#### 4. Parse Data

Parse the input payload and return the result in JSON format ([u8]). The returned JSON is fully decoded by the default JSON parser (the root array and all objects are expanded).

**Function Signature**:
```c
const char* parser_mutate(
  void* parser,
  const uint8_t* in_ptr, uint32_t in_len,
  const void* uint8_t* out_ptr, uint32_t* out_len
);
```
`void* parser`: Pointer to the parser object generated by `parser_new`;

`const uint8_t* in_ptr`: Pointer to the input payload;

`uint32_t in_len`: Byte length of the input payload (excluding `\0`);

`const void* uint8_t* out_ptr`: Pointer to the output JSON string (excluding `\0`). When out_ptr points to NULL, the output is empty;

`uint32_t* out_len`: Byte length of the output JSON string.
**Return Value**: When the call is successful, the return value is NULL.
#### 5. Free the Parser
Release the memory of the parser object.
**Function Signature**: `void parser_free(void* parser);`
`void* parser`: Pointer to the object generated by `parser_new`.
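Putting the five functions together, the following is a minimal C sketch of a parser plugin, not a definitive implementation. The exported names follow the API above; the `const uint8_t**` typing of the output parameter is one reading of the `const void* uint8_t* out_ptr` notation, and the pass-through behavior and file name are illustrative only — a real plugin must transform its payload into JSON.

```c
// minimal_parser.c -- illustrative skeleton of a taosX parser plugin (C ABI).
// Build sketch: gcc -shared -fPIC -o libminimal_parser.so minimal_parser.c
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

typedef struct parser_resp_t {
  int   e;  // 0 if success.
  void* p;  // the parser object on success.
} parser_resp_t;

// Trivial parser object; a real plugin would keep its parsed configuration here.
typedef struct {
  char* cfg;
} minimal_parser_t;

const char* parser_name() { return "minimal-passthrough"; }

const char* parser_version() { return "0.1.0"; }

parser_resp_t parser_new(char* ctx, uint32_t len) {
  parser_resp_t resp;
  minimal_parser_t* p = malloc(sizeof(minimal_parser_t));
  p->cfg = strndup(ctx, len);  // keep a copy of the user configuration string
  resp.e = 0;                  // 0 means the parser was created successfully
  resp.p = p;
  return resp;
}

const char* parser_mutate(void* parser, const uint8_t* in_ptr, uint32_t in_len,
                          const uint8_t** out_ptr, uint32_t* out_len) {
  // A real plugin would transform the payload into JSON here; this sketch
  // assumes the payload is already valid JSON and returns it unchanged.
  (void)parser;
  *out_ptr = in_ptr;
  *out_len = in_len;
  return NULL;  // NULL means success
}

void parser_free(void* parser) {
  minimal_parser_t* p = (minimal_parser_t*)parser;
  if (p) {
    free(p->cfg);
    free(p);
  }
}
```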
---
sidebar_label: taosX Agent
slug: /tdengine-reference/components/taosx-agent
---
This section explains how to deploy the `Agent` (for `taosX`). Before using it, you need to install the TDengine Enterprise installation package. The taosX-Agent is used in certain data access scenarios, such as Pi, OPC UA, and OPC DA, where access to the data source is restricted or the network environment is special. It can be deployed close to the data source, or even on the same server as the data source, and is responsible for reading data from the source and sending it to taosX.
## Configuration
The default configuration file for the `Agent` is located at `/etc/taos/agent.toml` and contains the following configuration items:
- `endpoint`: Required, the GRPC service address of `taosX`.
- `token`: Required, the token generated when creating the `Agent` in `Explorer`.
- `instanceId`: The instance ID of the current taosx-agent service. If multiple taosx-agent instances are started on the same machine, their instance IDs must be unique.
- `compression`: Optional, `true` or `false`, default `false`. When set to `true`, data compression is enabled for communication between the `Agent` and `taosX`.
- `log_level`: Optional, the log level, default `info`. Like `taosX`, it supports five levels: `error`, `warn`, `info`, `debug`, and `trace`. Deprecated; use `log.level` instead.
- `log_keep_days`: Optional, the number of days to keep logs, default `30`. Deprecated; use `log.keepDays` instead.
- `log.path`: The directory where log files are stored.
- `log.level`: Log level; valid values are "error", "warn", "info", "debug", and "trace".
- `log.compress`: Whether to compress log files after rotation.
- `log.rotationCount`: The maximum number of log files to keep in the log directory; older files beyond this limit are deleted.
- `log.rotationSize`: The file size (in bytes) that triggers log rotation; when a log file exceeds this size, a new file is created and new logs are written to it.
- `log.reservedDiskSize`: The threshold of remaining disk space (in bytes) at which log writing stops.
- `log.keepDays`: The number of days log files are retained; older log files are deleted.
As shown below:
```toml
# ... (only the tail of the sample file is shown)
#keepDays = 30
```
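Since only the tail of the bundled sample file appears above, here is a minimal illustrative `agent.toml`; every value is a placeholder to be replaced with your own deployment details, and the exact endpoint format follows the prompt shown in `Explorer`:

```toml
# GRPC service address of taosX (placeholder).
endpoint = "<taosx-grpc-endpoint>"
# Token generated when the Agent was created in Explorer (placeholder).
token = "<your-agent-token>"
# Unique instance ID of this taosx-agent service.
instanceId = 48
# Enable data compression between Agent and taosX.
compression = true

[log]
# Directory where log files are stored.
path = "/var/log/taos"
level = "info"
keepDays = 30
```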
You do not need to puzzle over how to set up the configuration file. Read and follow the prompts in `Explorer` when creating the `Agent`, where you can view, modify, and check the configuration file.
## Start
On Linux systems, the `Agent` can be started with the Systemd command:
```bash
systemctl start taosx-agent
```
On Windows systems, find the taosx-agent service in the system management tool "Services" and start it.
## Troubleshooting
You can view the log files or use the `journalctl` command to check the logs of the `Agent`.
The command to view logs with `journalctl` on Linux is as follows:
```bash
journalctl -u taosx-agent [-f]
```
---
slug: /tdengine-reference/components/taoskeeper
---
import Tabs from "@theme/Tabs";
import TabItem from "@theme/TabItem";
taosKeeper is a monitoring metrics export tool for TDengine 3.0 that can obtain the operational status of TDengine with just a few simple configurations. taosKeeper uses the TDengine RESTful interface, so there is no need to install the TDengine client.
## Installation
taosKeeper can be installed in two ways:

- taosKeeper is installed automatically with the official TDengine installation package. For details, see [TDengine Installation](../../../get-started/).
- You can also compile and install taosKeeper separately. For details, see the [taosKeeper](https://github.com/taosdata/taoskeeper) repository.
## Configuration
taosKeeper runs in the operating system terminal and supports three configuration methods: command line parameters, environment variables, and configuration files. The priority order is: command line parameters, then environment variables, then configuration file parameters. In general, we recommend using a configuration file.
### Command Line Parameters and Environment Variables
For descriptions of the command line parameters and environment variables, refer to the output of the command `taoskeeper --help`. Here is an example:
```shell
Usage of taoskeeper v3.3.3.0:
-R, --RotationInterval string interval for refresh metrics, such as "300ms", Valid time units are "ns", "us" (or "µs"), "ms", "s", "m", "h". Env "TAOS_KEEPER_ROTATION_INTERVAL" (default "15s")
-c, --config string config path default /etc/taos/taoskeeper.toml
--drop string run taoskeeper in command mode, only support old_taosd_metric_stables.
--environment.incgroup whether running in cgroup. Env "TAOS_KEEPER_ENVIRONMENT_INCGROUP"
--fromTime string parameter of transfer, example: 2020-01-01T00:00:00+08:00 (default "2020-01-01T00:00:00+08:00")
--gopoolsize int coroutine size. Env "TAOS_KEEPER_POOL_SIZE" (default 50000)
-h, --help Print this help message and exit
--instanceId int instance ID. Env "TAOS_KEEPER_INSTANCE_ID" (default 64)
--log.compress whether to compress old log. Env "TAOS_KEEPER_LOG_COMPRESS"
--log.keepDays uint log retention days, must be a positive integer. Env "TAOS_KEEPER_LOG_KEEP_DAYS" (default 30)
--log.level string log level (trace debug info warning error). Env "TAOS_KEEPER_LOG_LEVEL" (default "info")
--log.path string log path. Env "TAOS_KEEPER_LOG_PATH" (default "/var/log/taos")
--log.reservedDiskSize string reserved disk size for log dir (KB MB GB), must be a positive integer. Env "TAOS_KEEPER_LOG_RESERVED_DISK_SIZE" (default "1GB")
--log.rotationCount uint log rotation count. Env "TAOS_KEEPER_LOG_ROTATION_COUNT" (default 5)
--log.rotationSize string log rotation size(KB MB GB), must be a positive integer. Env "TAOS_KEEPER_LOG_ROTATION_SIZE" (default "1GB")
--log.rotationTime duration deprecated: log rotation time always 24 hours. Env "TAOS_KEEPER_LOG_ROTATION_TIME" (default 24h0m0s)
--logLevel string log level (trace debug info warning error). Env "TAOS_KEEPER_LOG_LEVEL" (default "info")
--metrics.database.name string database for storing metrics data. Env "TAOS_KEEPER_METRICS_DATABASE" (default "log")
--metrics.database.options.buffer int database option buffer for audit database. Env "TAOS_KEEPER_METRICS_BUFFER" (default 64)
--metrics.database.options.cachemodel string database option cachemodel for audit database. Env "TAOS_KEEPER_METRICS_CACHEMODEL" (default "both")
--metrics.database.options.keep int database option buffer for audit database. Env "TAOS_KEEPER_METRICS_KEEP" (default 90)
--metrics.database.options.vgroups int database option vgroups for audit database. Env "TAOS_KEEPER_METRICS_VGROUPS" (default 1)
--metrics.prefix string prefix in metrics names. Env "TAOS_KEEPER_METRICS_PREFIX"
--metrics.tables stringArray export some tables that are not supertable, multiple values split with white space. Env "TAOS_KEEPER_METRICS_TABLES"
-P, --port int http port. Env "TAOS_KEEPER_PORT" (default 6043)
--tdengine.host string TDengine server's ip. Env "TAOS_KEEPER_TDENGINE_HOST" (default "127.0.0.1")
--tdengine.password string TDengine server's password. Env "TAOS_KEEPER_TDENGINE_PASSWORD" (default "taosdata")
--tdengine.port int TDengine REST server(taosAdapter)'s port. Env "TAOS_KEEPER_TDENGINE_PORT" (default 6041)
--tdengine.username string TDengine server's username. Env "TAOS_KEEPER_TDENGINE_USERNAME" (default "root")
--tdengine.usessl TDengine server use ssl or not. Env "TAOS_KEEPER_TDENGINE_USESSL"
--transfer string run taoskeeper in command mode, only support old_taosd_metric. transfer old metrics data to new tables and exit
-V, --version Print the version and exit
```
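As a quick illustration of the priority order described above (command line parameters override environment variables, which override the configuration file), the following hypothetical invocation uses a placeholder host name:

```shell
# The environment variable sets one value for the taosAdapter host ...
export TAOS_KEEPER_TDENGINE_HOST=tdengine-a
# ... but the command line parameter has higher priority, so tdengine-b is what taosKeeper actually uses.
taoskeeper --tdengine.host tdengine-b -c /etc/taos/taoskeeper.toml
```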
### Configuration File
taosKeeper supports specifying a configuration file with the command `taoskeeper -c <keeper config file>`.

If no configuration file is specified, taosKeeper uses the default configuration file located at `/etc/taos/taoskeeper.toml`.

If no configuration file is specified and `/etc/taos/taoskeeper.toml` does not exist, the default configuration is used.
**Here is an example configuration file:**
```toml
# Start with debug middleware for gin
debug = false
# The ID of the currently running taoskeeper instance, default is 64.
instanceId = 64
# Listening port, default is 6043.
port = 6043
# Go pool size
gopoolsize = 50000

# Interval for metrics
RotationInterval = "15s"
RotationInterval = "15s"
[tdengine]
password = "taosdata"
usessl = false
[metrics]
# Metrics prefix in metrics names.
prefix = "taos"
# Export some tables that are not supertables.
tables = []
# Database for storing metrics data.
[metrics.database]
name = "log"
# Database options for the database storing metrics data.
[metrics.database.options]
vgroups = 1
buffer = 64
keep = 90
cachemodel = "both"
[environment]
incgroup = false
[log]
# The directory where log files are stored.
# path = "/var/log/taos"
level = "info"
# Number of log file rotations before deletion.
rotationCount = 30
# The number of days to retain log files.
keepDays = 30
# The maximum size of a log file before rotation.
rotationSize = "1GB"
# If set to true, log files will be compressed.
compress = false
# Minimum disk space to reserve. Log files will not be written if disk space falls below this limit.
reservedDiskSize = "1GB"
```
## Start
**Before running taosKeeper, ensure that the TDengine cluster and taosAdapter are running correctly.** In addition, monitoring services must be enabled in TDengine: at a minimum, `monitor` and `monitorFqdn` must be configured in the TDengine configuration file `taos.cfg`.
```shell
monitor 1
monitorFqdn localhost # FQDN of the taoskeeper service
```
For more details on TDengine monitoring configuration, see [Monitor Your Cluster](../../../operations-and-maintenance/monitor-your-cluster/).
<Tabs>
<TabItem label="Linux" value="linux">
After installation, use the `systemctl` command to start the taoskeeper service process.
```bash
systemctl start taoskeeper
```

Check if the service is working properly:

```bash
systemctl status taoskeeper
```
If the service process is active, the status command will display the following information:
```text
Active: active (running)
```
If the background service process is stopped, the status command will display the following information:
```text
Active: inactive (dead)
```
The following `systemctl` commands can help you manage the taoskeeper service:
- Start the service process: `systemctl start taoskeeper`
- Stop the service process: `systemctl stop taoskeeper`
- Restart the service process: `systemctl restart taoskeeper`
- Check the service status: `systemctl status taoskeeper`
:::info
- The `systemctl` command requires _root_ permissions to run. If you are not a _root_ user, add `sudo` before the command.
- If the system does not support `systemd`, you can manually run `/usr/local/taos/bin/taoskeeper` to start the taoskeeper service.
- Troubleshooting: If the service is abnormal, check the logs for more information. Log files are stored by default in `/var/log/taos`.
:::
</TabItem>
<TabItem label="macOS" value="macOS">
After installation, you can run `sudo launchctl start com.tdengine.taoskeeper` to start the taoskeeper service process.
The following `launchctl` commands are used to manage the taoskeeper service:
- Start the service process: `sudo launchctl start com.tdengine.taoskeeper`
- Stop the service process: `sudo launchctl stop com.tdengine.taoskeeper`
- Check the service status: `sudo launchctl list | grep taoskeeper`
:::info
- The `launchctl` command requires administrator privileges to manage `com.tdengine.taoskeeper`; always add `sudo` before it.
- The first column returned by `sudo launchctl list | grep taoskeeper` is the PID of the `taoskeeper` program. If it shows `-`, the taoskeeper service is not running.
- Troubleshooting: If the service is abnormal, check the logs for more information. Log files are stored by default in `/var/log/taos`.
:::
</TabItem>
</Tabs>
## Health Check
You can access the taosKeeper `check_health` interface to determine whether the service is alive. If the service is normal, it returns an HTTP 200 status code:
```shell
curl -i http://127.0.0.1:6043/check_health
```
Response:
```text
HTTP/1.1 200 OK
Content-Length: 21
```
## Data Collection and Monitoring
As a tool for exporting TDengine monitoring metrics, taosKeeper can record the monitoring data generated by TDengine in a specified database (the default is the `log` database). This monitoring data can then be used to configure TDengine monitoring.
### Viewing Monitoring Data
You can view the supertables in the `log` database; each supertable corresponds to a set of monitoring metrics, which are not described further here.
```shell
taos> use log;
taos> show stables;
taosd_dnodes_data_dirs |
taosd_dnodes_log_dirs |
Query OK, 14 row(s) in set (0.006542s)
```
You can view the most recent report record of a supertable, for example:
```shell
taos> select last_row(*) from taosd_dnodes_info;
Query OK, 1 row(s) in set (0.003168s)
```
### Configuring Monitoring with TDInsight
After monitoring data has been collected, you can use TDInsight to configure monitoring for TDengine. For details, see the [TDinsight Reference Manual](../tdinsight/).
## Integrating Prometheus
taosKeeper provides a `/metrics` endpoint that returns monitoring data in Prometheus format. Prometheus can scrape this endpoint to monitor TDengine.
### Exporting Monitoring Metrics
The following `curl` command shows the data format returned by the `/metrics` endpoint:
```shell
curl http://127.0.0.1:6043/metrics
taos_cluster_info_first_ep{cluster_id="554014120921134497",value="tdengine:6030"} 1
taos_cluster_info_first_ep_dnode_id{cluster_id="554014120921134497"} 1
```
### Monitoring Metrics Details
#### taosd Cluster
##### Supported Labels for Monitoring Information

- `cluster_id`: Cluster ID

##### Related Metrics and Their Meanings
| Metric Name | Type | Meaning |
| ------------------------------------ | ------- | ------------------------------------------------------------------ |
| taos_cluster_info_connections_total | counter | Total number of connections |
| taos_cluster_info_dbs_total | counter | Total number of databases |
| taos_cluster_info_dnodes_alive | counter | Number of alive dnodes |
| taos_cluster_info_dnodes_total | counter | Total number of dnodes |
| taos_cluster_info_first_ep | gauge | First endpoint; the label value indicates the endpoint |
| taos_cluster_info_first_ep_dnode_id | counter | dnode ID of the first endpoint |
| taos_cluster_info_master_uptime | gauge | Uptime of the master node (in days) |
| taos_cluster_info_mnodes_alive | counter | Number of alive mnodes |
| taos_cluster_info_mnodes_total | counter | Total number of mnodes |
| taos_cluster_info_stbs_total | counter | Total number of supertables |
| taos_cluster_info_streams_total | counter | Total number of streams |
| taos_cluster_info_tbs_total | counter | Total number of tables |
| taos_cluster_info_topics_total | counter | Total number of topics |
| taos_cluster_info_version | gauge | Version information; the label value indicates the version number |
| taos_cluster_info_vgroups_alive | counter | Number of alive virtual groups |
| taos_cluster_info_vgroups_total | counter | Total number of virtual groups |
| taos_cluster_info_vnodes_alive | counter | Number of alive virtual nodes |
| taos_cluster_info_vnodes_total | counter | Total number of virtual nodes |
| taos_grants_info_expire_time | counter | Remaining time until the cluster authorization expires (in seconds) |
| taos_grants_info_timeseries_total | counter | Total number of time series allowed by the cluster authorization |
| taos_grants_info_timeseries_used | counter | Number of time series currently used by the cluster |
#### dnode
##### Supported Labels for Monitoring Information

- `cluster_id`: Cluster ID
- `dnode_ep`: dnode endpoint
- `dnode_id`: dnode ID

##### Related Metrics and Their Meanings
| Metric Name | Type | Meaning |
| ------------------------------- | ------- | ---------------------------------------------------------------------------------------- |
| taos_d_info_status | gauge | dnode status; the label value indicates the status: ready (normal), offline (down), unknown (unknown) |
| taos_dnodes_info_cpu_cores | gauge | Number of CPU cores |
| taos_dnodes_info_cpu_engine | gauge | CPU percentage used by this dnode's process (range: 0~100) |
| taos_dnodes_info_cpu_system | gauge | CPU percentage used by the system on the node where this dnode resides (range: 0~100) |
| taos_dnodes_info_disk_engine | counter | Disk capacity used by this dnode's process (in bytes) |
| taos_dnodes_info_disk_total | counter | Total disk capacity of the node where this dnode resides (in bytes) |
| taos_dnodes_info_disk_used | counter | Disk capacity used on the node where this dnode resides (in bytes) |
| taos_dnodes_info_has_mnode | counter | Whether there is an mnode |
| taos_dnodes_info_has_qnode | counter | Whether there is a qnode |
| taos_dnodes_info_has_snode | counter | Whether there is an snode |
| taos_dnodes_info_io_read | gauge | I/O read rate of the node where this dnode resides (in bytes/s) |
| taos_dnodes_info_io_read_disk | gauge | Disk I/O read rate of the node where this dnode resides (in bytes/s) |
| taos_dnodes_info_io_write | gauge | I/O write rate of the node where this dnode resides (in bytes/s) |
| taos_dnodes_info_io_write_disk | gauge | Disk I/O write rate of the node where this dnode resides (in bytes/s) |
| taos_dnodes_info_masters | counter | Number of master nodes |
| taos_dnodes_info_mem_engine | counter | Memory used by this dnode's process (in KB) |
| taos_dnodes_info_mem_system | counter | Memory used by the system on the node where this dnode resides (in KB) |
| taos_dnodes_info_mem_total | counter | Total memory of the node where this dnode resides (in KB) |
| taos_dnodes_info_net_in | gauge | Network inbound rate of the node where this dnode resides (in bytes/s) |
| taos_dnodes_info_net_out | gauge | Network outbound rate of the node where this dnode resides (in bytes/s) |
| taos_dnodes_info_uptime | gauge | Uptime of this dnode (in seconds) |
| taos_dnodes_info_vnodes_num | counter | Number of vnodes on the node where this dnode resides |
#### Data Directory
##### Supported Labels for Monitoring Information

- `cluster_id`: Cluster ID
- `dnode_ep`: dnode endpoint
- `dnode_id`: dnode ID
- `data_dir_name`: Data directory name
- `data_dir_level`: Data directory level

##### Related Metrics and Their Meanings
| Metric Name | Type | Meaning |
| --------------------------------- | ----- | -------------------------- |
| taos_taosd_dnodes_data_dirs_avail | gauge | Available space (in bytes) |
| taos_taosd_dnodes_data_dirs_total | gauge | Total space (in bytes) |
| taos_taosd_dnodes_data_dirs_used | gauge | Used space (in bytes) |
#### Log Directory
##### Supported Labels for Monitoring Information

- `cluster_id`: Cluster ID
- `dnode_ep`: dnode endpoint
- `dnode_id`: dnode ID
- `log_dir_name`: Log directory name

##### Related Metrics and Their Meanings
| Metric Name | Type | Meaning |
| -------------------------------- | ----- | -------------------------- |
| taos_taosd_dnodes_log_dirs_avail | gauge | Available space (in bytes) |
| taos_taosd_dnodes_log_dirs_total | gauge | Total space (in bytes) |
| taos_taosd_dnodes_log_dirs_used | gauge | Used space (in bytes) |
#### Log Count
##### Supported Labels for Monitoring Information

- `cluster_id`: Cluster ID
- `dnode_ep`: dnode endpoint
- `dnode_id`: dnode ID

##### Related Metrics and Their Meanings
| Metric Name | Type | Meaning |
| ---------------------- | ------- | -------------------- |
| taos_log_summary_debug | counter | Number of debug logs |
| taos_log_summary_error | counter | Number of error logs |
| taos_log_summary_info | counter | Number of info logs |
| taos_log_summary_trace | counter | Number of trace logs |
#### taosadapter
##### Supported Labels for Monitoring Information

- `endpoint`: Endpoint
- `req_type`: Request type, where 0 indicates REST and 1 indicates WebSocket

##### Related Metrics and Their Meanings
| Metric Name | Type | Meaning |
| --------------------------------------- | ------- | -------------------------------------------- |
| taos_adapter_requests_fail | counter | Number of failed requests |
| taos_adapter_requests_in_process | counter | Number of requests in process |
| taos_adapter_requests_other | counter | Number of other types of requests |
| taos_adapter_requests_other_fail | counter | Number of failed other types of requests |
| taos_adapter_requests_other_success | counter | Number of successful other types of requests |
| taos_adapter_requests_query | counter | Number of query requests |
| taos_adapter_requests_query_fail | counter | Number of failed query requests |
| taos_adapter_requests_query_in_process | counter | Number of query requests in process |
| taos_adapter_requests_query_success | counter | Number of successful query requests |
| taos_adapter_requests_success | counter | Number of successful requests |
| taos_adapter_requests_total | counter | Total number of requests |
| taos_adapter_requests_write | counter | Number of write requests |
| taos_adapter_requests_write_fail | counter | Number of failed write requests |
| taos_adapter_requests_write_in_process | counter | Number of write requests in process |
| taos_adapter_requests_write_success | counter | Number of successful write requests |
#### taoskeeper
##### Supported Labels for Monitoring Information

- `identify`: Node endpoint

##### Related Metrics and Their Meanings
| Metric Name | Type | Meaning |
| ----------------------- | ----- | ------------------------------------- |
| taos_keeper_monitor_cpu | gauge | taoskeeper CPU usage (range: 0~1) |
| taos_keeper_monitor_mem | gauge | taoskeeper memory usage (range: 0~1) |
#### Other taosd Cluster Monitoring Items
##### taos_m_info_role
- **Labels**:
  - `cluster_id`: Cluster ID
  - `mnode_ep`: mnode endpoint
  - `mnode_id`: mnode ID
  - `value`: Role value (the status of this mnode; range: offline, follower, candidate, leader, error, learner)
- **Type**: gauge
- **Meaning**: mnode role
##### taos_taos_sql_req_count
- **Labels**:
  - `cluster_id`: Cluster ID
  - `result`: Request result (range: Success, Failed)
  - `sql_type`: SQL type (range: select, insert, inserted_rows, delete)
  - `username`: Username
- **Type**: gauge
- **Meaning**: Number of SQL requests
##### taos_taosd_sql_req_count
- **Labels**:
  - `cluster_id`: Cluster ID
  - `dnode_ep`: dnode endpoint
  - `dnode_id`: dnode ID
  - `result`: Request result (range: Success, Failed)
  - `sql_type`: SQL type (range: select, insert, inserted_rows, delete)
  - `username`: Username
  - `vgroup_id`: Virtual group ID
- **Type**: gauge
- **Meaning**: Number of SQL requests
##### taos_taosd_vgroups_info_status
- **Labels**:
  - `cluster_id`: Cluster ID
  - `database_name`: Database name
  - `vgroup_id`: Virtual group ID
- **Type**: gauge
- **Meaning**: Virtual group status. 0 means unsynced (no leader has been elected); 1 means ready.
##### taos_taosd_vgroups_info_tables_num
- **Labels**:
  - `cluster_id`: Cluster ID
  - `database_name`: Database name
  - `vgroup_id`: Virtual group ID
- **Type**: gauge
- **Meaning**: Number of tables in the virtual group
##### taos_taosd_vnodes_info_role
- **Labels**:
  - `cluster_id`: Cluster ID
  - `database_name`: Database name
  - `dnode_id`: dnode ID
  - `value`: Role value (range: offline, follower, candidate, leader, error, learner)
  - `vgroup_id`: Virtual group ID
- **Type**: gauge
- **Meaning**: Virtual node role
### Scrape Configuration

Prometheus provides the `scrape_configs` configuration to scrape monitoring data from an endpoint. Usually you only need to change the `targets` in `static_configs` to the endpoint address of taoskeeper. For more configuration information, see the [Prometheus Configuration Documentation](https://prometheus.io/docs/prometheus/latest/configuration/configuration/#scrape_config).
```text
# A scrape configuration containing exactly one endpoint to scrape:
scrape_configs:
```
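Only the beginning of the sample scrape configuration survives above. A minimal complete job for taosKeeper might look like the following sketch; the job name and target address are placeholders, and taosKeeper listens on port 6043 by default:

```yaml
# Minimal illustrative Prometheus job for taosKeeper; adjust the target to your taoskeeper endpoint.
scrape_configs:
  - job_name: "taoskeeper"
    static_configs:
      - targets: ["localhost:6043"]
```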
### Dashboard
We provide the `TaosKeeper Prometheus Dashboard for 3.x`, which offers a monitoring dashboard similar to TDinsight.

In the Grafana Dashboard menu, click `import`, enter the dashboard ID `18587`, and click the `Load` button to import the `TaosKeeper Prometheus Dashboard for 3.x`.
## taosKeeper Monitoring Metrics
taosKeeper also writes the monitoring data it collects into a monitoring database, the `log` database by default, which can be changed in the taoskeeper configuration file.

### keeper_monitor Table

The `keeper_monitor` table records taosKeeper monitoring data.
| field | type | is_tag | comment |
| :------- | :-------- | :----- | :------------------- |
| ts | TIMESTAMP | | timestamp |
| cpu | DOUBLE | | CPU usage |
| mem | DOUBLE | | Memory usage |
| identify | NCHAR | TAG | Identity information |
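For example, you can check the most recent taosKeeper resource usage record directly from the monitoring database (assuming the default `log` database is used):

```shell
taos> select last_row(*) from log.keeper_monitor;
```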
---
sidebar_label: taosExplorer
slug: /tdengine-reference/components/taosexplorer
---
taosExplorer is a web service that provides users with a visual management and interaction tool for TDengine instances. Although it is not open source, it is offered free of charge with the open-source installation package. This section mainly discusses its installation and deployment. Its features are based on a simple, easy-to-use graphical interface that you can try directly; if needed, you can also refer to the related content in the advanced features and operations guides. To ensure the best experience when accessing taosExplorer, use Chrome 79 or above, or Edge 79 or above.
## Installation
taosExplorer does not require separate installation. Starting from TDengine 3.3.0.0, it is released together with the TDengine installation package. After installation, you will see the `taos-explorer` service. If you compile the TDengine source code yourself following the steps on GitHub, the resulting installation package does not include taosExplorer.
## Configuration
Before starting taosExplorer, make sure the contents of the configuration file are correct.
```TOML
# This is an automatically generated configuration file for Explorer in [TOML](https://toml.io/) format.
# ... (most of the generated sample file is omitted here)
cors = true
# keepDays = 30
```
**Explanation:**
- `port`: The port to which the taosExplorer service binds.
- `addr`: The IPv4 address to which the taosExplorer service binds, default `0.0.0.0`. To provide external service, configure it to an address other than `localhost`.
- `ipv6`: The IPv6 address to which the taosExplorer service binds; by default no IPv6 address is bound.
- `instanceId`: The instance ID of the current explorer service. If multiple explorer instances are started on the same machine, their instance IDs must be unique.
- `log_level`: Log level; valid values are "error", "warn", "info", "debug", "trace". This parameter is deprecated; use `log.level` instead.
- `cluster`: The taosAdapter address of the TDengine cluster.
- `cluster_native`: The native connection address of the TDengine cluster, disabled by default.
- `x_api`: The gRPC address of taosX.
- `grpc`: The gRPC address used by the taosX agent to connect to taosX.
- `cors`: CORS configuration switch, default `false`. When set to `true`, cross-origin access is allowed.
- `ssl.certificate`: SSL certificate (the HTTPS service is enabled only if both the certificate and certificate_key parameters are set).
- `ssl.certificate_key`: SSL certificate key.
- `log.path`: The directory where log files are stored.
- `log.level`: Log level; valid values are "error", "warn", "info", "debug", "trace".
- `log.compress`: Whether to compress log files after rotation.
- `log.rotationCount`: The maximum number of log files to keep in the log directory; older files beyond this limit are deleted.
- `log.rotationSize`: The file size (in bytes) that triggers log rotation; when a log file exceeds this size, a new file is created and new logs are written to it.
- `log.reservedDiskSize`: The threshold of remaining disk space (in bytes) at which log writing stops.
- `log.keepDays`: The number of days log files are retained; older log files are deleted.
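Since only fragments of the generated sample file are shown above, the following is a minimal illustrative configuration covering the most common items; every value is a placeholder to be adapted to your deployment:

```toml
# Port and address the taosExplorer service listens on.
port = 6060
addr = "0.0.0.0"
# taosAdapter address of the TDengine cluster (placeholder).
cluster = "http://localhost:6041"
# Allow cross-origin access, for example when the service sits behind a proxy such as Nginx.
cors = true

[log]
# Directory where log files are stored.
path = "/var/log/taos"
level = "info"
keepDays = 30
```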
## Start and Stop
Then start taosExplorer. You can run `taos-explorer` directly from the command line or use systemctl:
```bash
systemctl start taos-explorer # Linux
sc.exe start taos-explorer # Windows
```
Correspondingly, use the following commands to stop it:
```shell
systemctl stop taos-explorer # Linux
sc.exe stop taos-explorer # Windows
```
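
After starting the service, you can optionally verify that it is up. The snippet below is a minimal sketch, assuming a Linux host with systemd and the default port `6060` described above; adjust the port if you changed it in the configuration.

```bash
# Check the systemd unit status (Linux)
systemctl status taos-explorer

# Confirm the HTTP endpoint answers on the default port 6060
curl -sf http://localhost:6060 > /dev/null && echo "taosExplorer is reachable"
```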
## Troubleshooting
1. If you encounter the error "This site can't be reached" when opening the Explorer site in a browser, log in to the machine where taosExplorer is located via the command line and check the service status with `systemctl status taos-explorer`. If the returned status is `inactive`, start the service with `systemctl start taos-explorer`.
2. To obtain detailed logs for taosExplorer, use the command `journalctl -u taos-explorer`.
3. When using Nginx or other tools for forwarding, pay attention to CORS settings or set `cors = true` in the configuration file.
Here is an example of CORS settings in an Nginx configuration file:
```conf
http {
add_header 'Access-Control-Allow-Credentials' 'true';
add_header 'Access-Control-Allow-Methods' 'GET, POST, OPTIONS';
add_header 'Access-Control-Allow-Headers' 'DNT,X-CustomHeader,Keep-Alive,User-Agent,X-Requested-With,If-Modified-Since,Cache-Control,Content-Type';
add_header 'Access-Control-Max-Age' 86400;
add_header 'Content-Type' 'text/plain charset=UTF-8';
add_header 'Content-Length' 0;
return 204; break;
}
if ($request_method = 'POST') {
add_header 'Access-Control-Allow-Origin' '*';
add_header 'Access-Control-Allow-Credentials' 'true';
add_header 'Access-Control-Allow-Methods' 'GET, POST, OPTIONS';
add_header 'Access-Control-Allow-Headers' 'DNT,X-CustomHeader,Keep-Alive,User-Agent,X-Requested-With,If-Modified-Since,Cache-Control,Content-Type';
}
proxy_set_header Host $host:$server_port;
proxy_set_header X-Real-IP $remote_addr;
#proxy_http_version 1.1;
proxy_read_timeout 60s;
proxy_next_upstream error http_502 http_500 non_idempotent;
server 192.168.1.68:6060 ;
}
}
```
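
To confirm that the CORS headers are actually returned through the proxy, a quick preflight check with curl can help. This is only an illustrative sketch; it assumes the proxy (or taosExplorer itself with `cors = true`) is listening on `localhost:6060`, and `http://example.com` stands in for your real front-end origin.

```bash
# Send a CORS preflight request and print only the Access-Control-* response headers
curl -s -o /dev/null -D - \
  -X OPTIONS \
  -H "Origin: http://example.com" \
  -H "Access-Control-Request-Method: POST" \
  http://localhost:6060 | grep -i "access-control"
```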
## Registration and Login
Once installed, open a browser and access the taos-explorer service, by default at `http://ip:6060`. If you have not registered yet, first go to the registration page. Enter your mobile number to receive a verification code; after entering the correct code, registration succeeds.
When logging in, use your database username and password. For first-time use, the default username is `root` and the password is `taosdata`. After a successful login, you will enter the `Data Browser` page, where you can use management functions such as viewing databases, creating databases, and creating supertables and subtables.
Other feature pages, such as `Data Writing - Data Source`, are exclusive to the enterprise edition. You can click to view them for a brief preview, but they cannot be actually used.
---
title: TDinsight Reference
sidebar_label: TDinsight
slug: /tdengine-reference/components/tdinsight
---
import imgStep05 from '../../assets/tdinsight-05.png';
import imgStep06 from '../../assets/tdinsight-06.png';
import imgStep07 from '../../assets/tdinsight-07.png';
TDinsight is a solution for monitoring TDengine using Grafana.
TDengine periodically writes server information such as CPU, memory, disk space, bandwidth, request count, disk read/write speed, and slow queries into a specified database through taosKeeper. With Grafana and the TDengine data source plugin, TDinsight visualizes cluster status, node information, insert and query requests, and resource usage, providing developers with convenient real-time monitoring of the TDengine cluster's operational status. This document guides users through installing the TDengine data source plugin and deploying the TDinsight visualization dashboard.
## Prerequisites
First, check the following services:
- TDengine has been installed and is running correctly. This dashboard requires TDengine 3.0.0.0 or above, with monitoring reporting enabled. For detailed configuration, refer to: [TDengine Monitoring Configuration](../taosd/#monitoring-related).
- taosAdapter has been installed and is running correctly. For details, refer to: [taosAdapter Reference Manual](../taosadapter).
- taosKeeper has been installed and is running correctly. For details, refer to: [taosKeeper Reference Manual](../taoskeeper).
- Grafana has been installed and is running correctly. We recommend using the latest version of Grafana; TDinsight supports Grafana 7.5 and above.
:::info
The following descriptions use Grafana v11.0.0 as an example; other versions may differ. Please refer to the [Grafana official documentation](https://grafana.com/docs/grafana/latest/).
:::
Then, record the following information:
- The REST API address of the taosAdapter cluster, for example `http://localhost:6041`.
- Authentication information for the taosAdapter cluster (username and password).
- The name of the database where taosKeeper records monitoring metrics.
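
Before moving on, you may want to confirm that the recorded REST address and credentials actually work, and that the taosKeeper database is visible. The following is a sketch assuming the default address `http://localhost:6041` and the default `root`/`taosdata` account; replace them with your own values.

```bash
# List databases through taosAdapter's REST API; the taosKeeper monitoring
# database (often named log) should appear in the result
curl -u root:taosdata -d "show databases;" http://localhost:6041/rest/sql
```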
## Install the TDengine Data Source Plugin and Configure the Data Source
For steps on installing the Grafana TDengine data source plugin and configuring the data source, refer to: [Integrating with Grafana](../../../third-party-tools/visualization/grafana/#install-grafana-plugin-and-configure-data-source).
## Import the TDinsightV3 Dashboard
On the TDengine data source configuration page, click the "Dashboards" tab, then click "import" to import the "TDengine for 3.x" dashboard. After a successful import, you can access this dashboard; in the "Log from" option at the top left, select the database set in taosKeeper for recording monitoring metrics to view the monitoring results.
## TDinsightV3 Dashboard Details
The TDinsight dashboard is designed to provide information on the usage and status of TDengine-related resources, such as dnodes, mnodes, vnodes, and databases. It is mainly divided into Cluster Status, DNodes Overview, MNodes Overview, Request Statistics, Database Statistics, DNode Resource Usage, and taosAdapter monitoring information. Each section is described in detail below.
### Cluster Status
This section includes the current information and status of the cluster.
<figure>
<Image img={imgStep01} alt=""/>
</figure>
**Metrics Details (from top to bottom, left to right):**
- **First EP**: The current `firstEp` setting in the TDengine cluster.
- **Version**: TDengine server version (master mnode).
- **Expire Time**: Expiration time of the enterprise edition.
- **Used Measuring Points**: Number of measuring points used in the enterprise edition.
- **Databases**: Number of databases.
- **Connections**: Current number of connections.
- **DNodes/MNodes/VGroups/VNodes**: Total number and alive count of each resource.
- **DNodes/MNodes/VGroups/VNodes Alive Percent**: The ratio of alive to total resources for each type; alert rules are enabled and trigger when the resource survival rate (average healthy resource ratio over one minute) falls below 100%.
- **Measuring Points Used**: Usage of measuring points with alert rules enabled (no data for the community edition; healthy by default).
### DNodes Overview
This section includes basic information about the cluster's dnodes.
<Image img={imgStep02} alt=""/>
</figure>
**Metrics Details:**
- **DNodes Status**: A simple table view of `show dnodes`.
- **DNodes Number**: Changes in the number of DNodes.
### MNodes Overview
This section includes basic information about the cluster's mnodes.
<figure>
<Image img={imgStep03} alt=""/>
</figure>
**Metrics Details:**
1. **MNodes Status**: A simple table view of `show mnodes`.
2. **MNodes Number**: Changes in the number of MNodes, similar to `DNodes Number`.
### Request Statistics
This section includes statistical metrics for SQL execution in the cluster.
<Image img={imgStep04} alt=""/>
</figure>
**Metrics Details:**
1. **Select Request**: Number of select requests.
2. **Delete Request**: Number of delete requests.
3. **Insert Request**: Number of insert requests.
4. **Inserted Rows**: Actual number of inserted rows.
5. **Slow Sql**: Number of slow queries, which can be filtered by duration at the top.
### Table Statistics
This section includes statistical metrics for tables in the cluster.
<Image img={imgStep05} alt=""/>
</figure>
**Metrics Details:**
1. **STables**: Number of supertables.
2. **Total Tables**: Total number of all tables.
3. **Tables**: Time series graph of the number of basic tables over time.
4. **Tables Number For Each VGroups**: Number of tables contained in each VGroup.
### DNode Resource Usage
This section shows the resource usage of all data nodes in the cluster, with each data node displayed as a row.
<figure>
<Image img={imgStep06} alt=""/>
</figure>
**Metrics Details (from top to bottom, left to right):**
1. **Uptime**: Time elapsed since the creation of the dnode.
2. **Has MNodes?**: Whether the current dnode is an mnode.
3. **CPU Cores**: Number of CPU cores.
4. **VNodes Number**: Current number of VNodes on the dnode.
5. **VNodes Masters**: Number of vnodes in the master role.
6. **Current CPU Usage of taosd**: CPU usage of the taosd process.
7. **Current Memory Usage of taosd**: Memory usage of the taosd process.
8. **Max Disk Used**: Maximum disk usage across all data directories of taosd.
9. **CPU Usage**: CPU usage of the process and the system.
10. **RAM Usage**: Time series view of RAM usage.
11. **Disk Used**: Disk usage of each level under multi-level storage (default is level0).
12. **Disk IO**: Disk IO rate.
13. **Net IO**: Total network IO rate, excluding local network traffic.
### taosAdapter Monitoring
This section includes detailed statistics of taosAdapter REST and WebSocket requests.
<figure>
<Image img={imgStep07} alt=""/>
</figure>
**Metrics Details:**
1. **Total**: Total number of requests.
2. **Successful**: Total number of successful requests.
3. **Failed**: Total number of failed requests.
4. **Queries**: Total number of queries.
5. **Writes**: Total number of writes.
6. **Other**: Total number of other requests.
There are also detailed line charts for the above categories.
## Upgrade
You can upgrade using any of the following three methods:
- Through the graphical interface: if a new version is available, click "update" on the "TDengine Datasource" plugin page.
- Follow the manual installation steps to install the new Grafana plugin and dashboard yourself.
- Run the `TDinsight.sh` script again to upgrade to the latest Grafana plugin and TDinsight dashboard.
## Uninstall
For different installation methods, when uninstalling:
- Through the graphical interface: click "Uninstall" on the "TDengine Datasource" plugin page.
- For TDinsight installed via the `TDinsight.sh` script, use `TDinsight.sh -R` to clean up the related resources.
- For manually installed TDinsight, to uninstall completely, clean up the following:
1. TDinsight Dashboard in Grafana.
2. Data Source in Grafana.
3. Delete the `tdengine-datasource` plugin from the plugin installation directory.
## Appendix
### TDinsight.sh Detailed Explanation
Below is a detailed explanation of the usage of TDinsight.sh:
```text
Usage:
Install and configure TDinsight dashboard in Grafana on Ubuntu 18.04/20.04 system.
-e, --tdinsight-editable If the provisioning dashboard could be editable. [default: false]
```
Most command-line options can achieve the same effect through environment variables.
| Short Option | Long Option                | Environment Variable         | Description                                                              |
|--------------|----------------------------|------------------------------|--------------------------------------------------------------------------|
| -v           | --plugin-version           | TDENGINE_PLUGIN_VERSION      | TDengine datasource plugin version, default is latest.                  |
| -P           | --grafana-provisioning-dir | GF_PROVISIONING_DIR          | Grafana provisioning directory, default is `/etc/grafana/provisioning/`. |
| -G           | --grafana-plugins-dir      | GF_PLUGINS_DIR               | Grafana plugins directory, default is `/var/lib/grafana/plugins`.       |
| -O           | --grafana-org-id           | GF_ORG_ID                    | Grafana organization ID, default is 1.                                  |
| -n           | --tdengine-ds-name         | TDENGINE_DS_NAME             | TDengine datasource name, default is TDengine.                          |
| -a           | --tdengine-api             | TDENGINE_API                 | TDengine REST API endpoint, default is `http://127.0.0.1:6041`.         |
| -u           | --tdengine-user            | TDENGINE_USER                | TDengine username. [default: root]                                      |
| -p           | --tdengine-password        | TDENGINE_PASSWORD            | TDengine password. [default: taosdata]                                  |
| -i           | --tdinsight-uid            | TDINSIGHT_DASHBOARD_UID      | TDinsight dashboard `uid`. [default: tdinsight]                         |
| -t           | --tdinsight-title          | TDINSIGHT_DASHBOARD_TITLE    | TDinsight dashboard title. [default: TDinsight]                         |
| -e           | --tdinsight-editable       | TDINSIGHT_DASHBOARD_EDITABLE | Whether the provisioned dashboard is editable. [default: false]         |
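
As an illustration of the environment-variable form, the following sketch is equivalent to passing `-a`, `-u`, and `-p` on the command line; the host, user, and password values are the same placeholders used in the example further below.

```bash
# Configure TDinsight.sh through environment variables instead of command-line options
export TDENGINE_API="http://tdengine:6041"
export TDENGINE_USER="root1"
export TDENGINE_PASSWORD="pass5ord"
./TDinsight.sh
```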
:::note
The new version of the plugin uses the Grafana unified alerting feature; the `-E` option is no longer supported.
:::
Assume you start the TDengine database on the host `tdengine`, with the HTTP API port `6041`, user `root1`, and password `pass5ord`. Execute the script:
```bash
./TDinsight.sh -a http://tdengine:6041 -u root1 -p pass5ord
```
If you need to monitor multiple TDengine clusters, you need to set up a separate TDinsight dashboard for each. Setting up a non-default TDinsight requires some changes: the `-n`, `-i`, and `-t` options must be changed to non-default names, and if you use the built-in SMS alerting feature, `-N` and `-L` should also be modified.
```bash
sudo ./TDinsight.sh -n TDengine-Env1 -a http://another:6041 -u root -p taosdata -i tdinsight-env1 -t 'TDinsight Env1'
```
Please note that the provisioned data source, notification channels, and dashboard cannot be changed from the frontend. You should either run this script again to update the configuration, or manually change the configuration files in the `/etc/grafana/provisioning` directory (this is Grafana's default directory; change it with the `-P` option if needed).
In particular, when using Grafana Cloud or other organizations, `-O` can be used to set the organization ID, `-G` can specify the Grafana plugin installation directory, and the `-e` parameter sets the dashboard as editable.
---
title: Components
description: TDengine Product Components Reference Manual
slug: /tdengine-reference/components
---
This section provides a detailed description of the main product components of TDengine, including their functionality, command-line parameters, configuration parameters, and more.
```mdx-code-block
import DocCardList from '@theme/DocCardList';
```
sidebar_label: TDengine CLI
slug: /tdengine-reference/tools/tdengine-cli
---
The TDengine command-line interface (hereinafter referred to as the TDengine CLI) is the simplest and most commonly used tool for users to operate and interact with TDengine instances. It requires the TDengine Server or TDengine Client package to be installed before use.
## Startup
To enter the TDengine CLI, simply execute `taos` in the terminal.
```shell
taos
```
If the connection to the service is successful, a welcome message and version information will be printed; otherwise, an error message will be printed.
The TDengine CLI prompt is as follows:
```shell
taos>
```
After entering the TDengine CLI, you can execute various SQL statements, including inserts, queries, and various management commands.
To exit the TDengine CLI, execute `q`, `quit`, or `exit` and press Enter.

```shell
taos> quit
```

## Execute SQL Scripts

In the TDengine CLI, you can run multiple SQL commands from a script file using the `source` command.
```sql
taos> source <filename>;
```
## Online Modification of Display Character Width
You can adjust the character display width in the TDengine CLI using the following command:
```sql
taos> SET MAX_BINARY_DISPLAY_WIDTH <nn>;
```
If the displayed content ends with `...`, the content has been truncated. You can modify the display character width with this command to show the full content.
## Command-Line Parameters
You can change the behavior of the TDengine CLI by configuring command-line parameters. Below are some commonly used ones:
- -h HOST: The FQDN of the server running the TDengine service; defaults to connecting to the local service.
- -P PORT: The port number used by the server.
- -u USER: The username to use when connecting.
- -p PASSWORD: The password to use when connecting to the server.
- -?, --help: Print all command-line parameters.
There are many other parameters:
- -a AUTHSTR: Authorization information for connecting to the server.
- -A: Calculate authorization information from the username and password.
- -B: Set BI tool display mode; once set, all output follows the format of BI tools.
- -c CONFIGDIR: Specify the configuration file directory. On Linux the default is `/etc/taos`, and the default configuration file in this directory is `taos.cfg`.
- -C: Print the configuration parameters of `taos.cfg` in the directory specified by -c.
- -d DATABASE: Specify the database to use when connecting to the server.
- -E dsn: Use a WebSocket DSN to connect to cloud services or servers providing WebSocket connections.
- -f FILE: Execute an SQL script file in non-interactive mode. Each SQL statement in the file must occupy one line.
- -k: Test the server's running status; 0: unavailable, 1: network ok, 2: service ok, 3: service degraded, 4: exiting.
- -l PKTLEN: Packet size used for network testing.
- -n NETROLE: Scope of the network connection test; default is `client`, options are `client` and `server`.
- -N PKTNUM: Number of packets used for network testing.
- -r: Output time columns as unsigned 64-bit integers (i.e., uint64_t in C).
- -R: Connect to the server using RESTful mode.
- -s COMMAND: SQL command to execute in non-interactive mode.
- -t: Test the server's startup status; statuses are the same as for -k.
- -w DISPLAYWIDTH: Client column display width.
- -z TIMEZONE: Specify the timezone; defaults to the local timezone.
- -V: Print the current version number.
Example:
```shell
taos -h h1.taos.com -s "use db; show tables;"
```
## Configuration File
You can also control the behavior of the TDengine CLI through parameters set in the configuration file. For available configuration parameters, refer to [Client Configuration](../../components/taosc).
## Error Code Table

Starting from TDengine version 3.3.4.8, the TDengine CLI returns specific error codes in error messages. Users can look up the specific causes and solutions on the error code page of the TDengine official website; see: [Error Code Reference](../../error-codes/).
## TDengine CLI TAB Key Completion
- Pressing TAB on an empty command line lists all commands supported by the TDengine CLI.
- Pressing TAB after a space displays the first of all possible command words at this position; pressing TAB again switches to the next one.
- Pressing TAB after a partial string searches all command words matching that prefix and displays the first one; pressing TAB again switches to the next one.
- Entering a backslash `\` followed by TAB auto-completes to the vertical display mode command word `\G;`.
## TDengine CLI Tips
- Use the up and down arrow keys to view previously entered commands.
- Use the `alter user` command in the TDengine CLI to change a user's password; the default password is `taosdata`.
- Press Ctrl+C to abort a query in progress.
- Execute `RESET QUERY CACHE` to clear the local table schema cache.
- Batch execute SQL statements: store a series of TDengine CLI commands (ending with a semicolon, each SQL statement on a new line) in a file, then execute `source <file-name>` in the TDengine CLI to automatically run all SQL statements in that file (see the sketch after this list).
- Type `q`, `quit`, or `exit` and press Enter to exit the TDengine CLI.
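
As a small sketch of the batch-execution workflow, the snippet below writes a few statements to a temporary file and runs them non-interactively with the `-f` flag described above; the database name `demo` and the file path are only illustrative.

```bash
# Create a simple SQL script (one statement per line) and execute it in one go
cat > /tmp/demo.sql <<'EOF'
CREATE DATABASE IF NOT EXISTS demo;
USE demo;
SHOW TABLES;
EOF

taos -f /tmp/demo.sql
```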
## Exporting Query Results to Files with TDengine CLI
- You can use the symbol `>>` to export query results to a file. The syntax is: SQL query statement >> 'output file name'. If no path is specified, the output goes to the current directory. For example, `select * from d0 >> '/root/d0.csv';` exports the query results to `/root/d0.csv`.
## Importing Data from a File into a Table with TDengine CLI
- You can use `insert into table_name file 'input file name'` to import the data file exported in the previous step back into the specified table. For example, `insert into d0 file '/root/d0.csv';` imports all the data exported above back into table `d0`.
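
For scripted use, the same import statement can be run non-interactively with `-s`. This is a sketch only; it assumes the table `d0` lives in a hypothetical database named `mydb` and that `/root/d0.csv` was produced by the export example above.

```bash
# Import the previously exported CSV back into table d0 without entering the interactive CLI
taos -s "USE mydb; INSERT INTO d0 FILE '/root/d0.csv';"
```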
sidebar_label: taosdump
slug: /tdengine-reference/tools/taosdump
---
taosdump is a tool that supports backing up data from a running TDengine cluster and restoring the backed-up data to the same or another running TDengine cluster.
taosdump can back up data using databases, supertables, or basic tables as logical data units, and can also back up data records within a specified time period from databases, supertables, and basic tables. You can specify the directory path for the backup; if not specified, taosdump backs up data to the current directory by default.
If the specified location already contains data files, taosdump will prompt the user and exit immediately to prevent data from being overwritten. This means the same path can only be used for one backup. If you see such a prompt, please proceed with caution.
taosdump is a logical backup tool; it should not be used to back up any raw data, environment settings, hardware information, server configuration, or cluster topology. taosdump uses [Apache AVRO](https://avro.apache.org/) as the data file format to store backup data.
## Installation
There are two ways to install taosdump:

- Install TDengine; taosdump is included in the installation package.
- Compile taos-tools separately and install it. For details, please refer to the [taos-tools](https://github.com/taosdata/taos-tools) repository.
## Common Use Cases
### Backing Up Data with taosdump
1. Back up all databases: use the `-A` or `--all-databases` parameter.
2. Back up multiple specified databases: use the `-D db1,db2,...` parameter (see the example after this list).
3. Back up certain supertables or basic tables in a specified database: use the `dbname stbname1 stbname2 tbname1 tbname2 ...` parameter. Note that the first parameter in this input sequence must be the database name, only one database is supported, and the second and subsequent parameters are the names of supertables or basic tables in that database, separated by spaces.
4. Back up the system `log` database: a TDengine cluster usually contains a system database named `log`, which holds data for TDengine's own operation. By default, taosdump does not back up the `log` database. If there is a specific need to back it up, use the `-a` or `--allow-sys` command-line parameter.
5. "Loose" mode backup: starting from version 1.4.1, taosdump provides the `-n` and `-L` parameters for backing up data without escape characters and in "loose" mode, which can reduce backup time and storage space when table names, column names, and tag names do not use escape characters. If you are unsure whether the conditions for using `-n` and `-L` are met, use the default parameters for a "strict" mode backup. For details about escape characters, refer to the [official documentation](../../sql-manual/escape-characters/).
:::tip
- Starting from version 1.4.1, taosdump provides the `-I` parameter for parsing AVRO file schema and data; specifying the `-s` parameter parses only the schema.
- Starting from version 1.4.2, backups use the batch size specified by the `-B` parameter, with a default value of 16384. If you encounter "Error actual dump .. batch .." due to insufficient network speed or disk performance in some environments, try adjusting the `-B` parameter to a smaller value.
- taosdump's export does not support resuming after interruption, so if the process terminates unexpectedly, the correct way to handle it is to delete all related files that have been exported or generated.
- taosdump's import supports resuming after interruption, but when the process restarts you may receive some "table already exists" prompts, which can be ignored.
:::
### Restoring Data with taosdump
Restore data files from a specified path: use the `-i` parameter followed by the data file path. As mentioned earlier, the same directory should not be used to back up different data sets, nor should the same data set be backed up multiple times to the same path; otherwise, the backup data will be overwritten or backed up multiple times.
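
A minimal restore sketch, assuming the backup directory created in the example above and a reachable default TDengine instance:

```bash
# Restore everything found in the backup directory into the running cluster
taosdump -i /data/taosdump_backup
```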
:::tip
taosdump internally uses the TDengine stmt binding API to write restored data, currently using a batch size of 16384 per write. If the backup data contains many columns, this may cause a "WAL size exceeds limit" error; in that case, try adjusting the `-B` parameter to a smaller value.
:::
## Detailed Command-Line Parameters List

Below is the detailed list of command-line parameters for taosdump:
```text
Usage: taosdump [OPTION...] dbname [tbname ...]
-a, --allow-sys Allow to dump system database
-A, --all-databases Dump all databases.
-D, --databases=DATABASES Dump inputted databases. Use comma to separate
databases' names.
-e, --escape-character Use escaped character for database name
-N, --without-property Dump database without its properties.
-s, --schemaonly Only dump tables' schema.
restore, please adjust the value to a smaller one
and try. The workable value is related to the
length of the row and type of table schema.
-I, --inspect Inspect avro file content and print on screen
-L, --loose-mode Using loose mode if the table name and column name
use letter and number only. Default is NOT.
-n, --no-escape No escape char '`'. Default is using it.
-Q, --dot-replace Replace dot character with underline character in
the table name.(Version 2.5.3)
-T, --thread-num=THREAD_NUM Number of threads for dump in file. Default is
8.
-C, --cloud=CLOUD_DSN Specify a DSN to access TDengine cloud service
-R, --restful Use RESTful interface to connect to TDengine
-t, --timeout=SECONDS The timeout seconds for websocket to interact.
-g, --debug Print debug info.
-?, --help Give this help list
```
sidebar_label: taosBenchmark
slug: /tdengine-reference/tools/taosbenchmark
---
taosBenchmark (formerly known as taosdemo) is a tool for testing the performance of TDengine. taosBenchmark can test the performance of TDengine's insert, query, and subscription functions. It can simulate massive data generated by a large number of devices and flexibly control the number and types of databases, supertables, tag columns, data columns, and subtables, the amount of data per subtable, the data insertion interval, the number of worker threads in taosBenchmark, whether and how to insert out-of-order data, and more. To accommodate the usage habits of past users, the installation package provides taosdemo as a soft link to taosBenchmark.
## Installation
There are two ways to install taosBenchmark:

- taosBenchmark is automatically installed with the official TDengine installation package. For details, please refer to [TDengine Installation](../../../get-started/).
- Compile and install taos-tools separately. For details, please refer to the [taos-tools](https://github.com/taosdata/taos-tools) repository.

## Running

### Configuration and Running Modes
taosBenchmark must be executed from the operating system's terminal. It supports two mutually exclusive configuration methods: [command-line parameters](#command-line-parameter-details) and [JSON configuration files](#configuration-file-parameter-details). When using a configuration file, the only command-line parameter that may be specified is `-f <json file>`; when controlling taosBenchmark's behavior with command-line parameters, the `-f` parameter cannot be used and all settings must be given on the command line. In addition, taosBenchmark offers a special running mode that requires no parameters at all.

taosBenchmark supports comprehensive performance testing for TDengine, covering three categories of functionality: writing, querying, and subscribing. These three functions are mutually exclusive; each run of taosBenchmark can test only one of them. Note that the type of function to be tested cannot be configured via the command line; command-line configuration can only test write performance. To test TDengine's query and subscription performance, you must use the configuration file method and specify the function type with the `filetype` parameter in the configuration file.

**Before running taosBenchmark, ensure that the TDengine cluster is running correctly.**

### Running Without Command-Line Parameters
Execute the following command to quickly experience taosBenchmark running a write performance test against TDengine with the default configuration.
```bash
taosBenchmark
```
When run without parameters, taosBenchmark connects by default to the TDengine cluster specified in `/etc/taos`, creates a database named `test` in TDengine, creates a supertable named `meters` under the `test` database, creates 10,000 subtables under the supertable, and inserts 10,000 records into each subtable. Note that if a `test` database already exists, this command deletes the existing database first and creates a new `test` database.
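
As a quick sanity check after the default run, you can count the rows that were written. This is a sketch assuming the TDengine CLI is installed on the same host; with the defaults above, the count should be 10,000 subtables × 10,000 rows = 100,000,000.

```bash
# Verify the data generated by the default taosBenchmark run
taos -s "SELECT COUNT(*) FROM test.meters;"
```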
### Running with Command-Line Configuration Parameters

When using command-line parameters to control taosBenchmark's behavior, the `-f <json file>` parameter cannot be used; all configuration parameters must be specified on the command line. Below is an example of using command-line mode to test taosBenchmark's write performance.
```bash
taosBenchmark -I stmt -n 200 -t 100
```
The above command creates a database named `test`, creates a supertable named `meters` within it, creates 100 subtables under the supertable, and inserts 200 records into each subtable using parameter binding.
### Running with a Configuration File

The taosBenchmark installation package includes example configuration files, located in `<install_directory>/examples/taosbenchmark-json`. Use the following command to run taosBenchmark and control its behavior through a configuration file.
```bash
taosBenchmark -f <json file>
```
**Below are a few examples of configuration files:**
#### Insert Scenario JSON Configuration File Example
<details>
<summary>insert.json</summary>
</details>
#### Query Scenario JSON Configuration File Example
<details>
<summary>query.json</summary>
</details>
#### Subscription Scenario JSON Configuration File Example
<details>
<summary>tmq.json</summary>
</details>
## Command-Line Parameter Details
- **-f/--file \<json file>** :
The JSON configuration file to use, which specifies all parameters. This parameter cannot be used simultaneously with other command-line parameters. No default value.
The JSON configuration file to use, specifying all parameters. This parameter cannot be used simultaneously with other command line parameters. There is no default value.
- **-c/--config-dir \<dir>** :
The directory where the TDengine cluster configuration file is located, with a default path of `/etc/taos`.
The directory where the TDengine cluster configuration files are located, default path is /etc/taos.
- **-h/--host \<host>** :
Specifies the FQDN of the TDengine server to connect to, with a default value of `localhost`.
Specifies the FQDN of the TDengine server to connect to, default value is localhost.
- **-P/--port \<port>** :
The port number of the TDengine server to connect to, with a default value of `6030`.
- **-I/--interface \<insertMode>** :
The insert mode. Optional values are `taosc`, `rest`, `stmt`, `sml`, and `sml-rest`, corresponding to regular insert, RESTful interface insert, parameter binding interface insert, schemaless interface insert, and RESTful schemaless interface insert (provided by taosAdapter). The default value is `taosc`.
- **-u/--user \<user>** :
The username used to connect to the TDengine server, with a default of `root`.
- **-U/--supplement-insert** :
Write data without first creating the database and tables; this is off by default.
- **-p/--password \<passwd>** :
The password used to connect to the TDengine server, with a default value of `taosdata`.
- **-o/--output \<file>** :
The output file path for results, with a default value of `./output.txt`.
- **-T/--thread \<threadNum>** :
The number of threads for inserting data, with a default of `8`.
- **-B/--interlace-rows \<rowNum>** :
Enables interleaved insert mode and specifies the number of rows to insert into each subtable at a time. In interleaved insert mode, rows will be inserted into each subtable sequentially as specified until all data for all subtables has been inserted. The default value is `0`, meaning that data insertion will proceed to the next subtable only after completing data insertion into one subtable.
- **-i/--insert-interval \<timeInterval>** :
Specifies the insertion interval for the interleaved insert mode in milliseconds, with a default value of `0`. This only takes effect if `-B/--interlace-rows` is greater than `0`, meaning that the data insertion thread will wait for the specified time interval after inserting records into each subtable before proceeding to the next round of inserts.
- **-r/--rec-per-req \<rowNum>** :
The number of rows written to TDengine per request, with a default value of `30000`.
- **-t/--tables \<tableNum>** :
Specifies the number of subtables, with a default of `10000`.
- **-S/--timestampstep \<stepLength>** :
The timestamp step length for inserting data into each subtable, in milliseconds, with a default value of `1`.
- **-n/--records \<recordNum>** :
The number of records to insert into each subtable, with a default value of `10000`.
- **-d/--database \<dbName>** :
The name of the database to be used, with a default value of `test`.
- **-b/--data-type \<colType>** :
The data types of the columns in the supertable. If not specified, the default is three columns with types FLOAT, INT, FLOAT.
- **-l/--columns \<colNum>** :
The total number of data columns in the supertable. If both this parameter and `-b/--data-type` are set, the resulting number of columns will be the greater of the two. If this parameter specifies a number greater than that specified by `-b/--data-type`, the unspecified column types default to INT. For example: `-l 5 -b float,double`, will yield columns `FLOAT, DOUBLE, INT, INT, INT`. If `columns` specifies a number less than or equal to that specified by `-b/--data-type`, the result will be the columns and types specified by `-b/--data-type`. For example: `-l 3 -b float,double,float,bigint` will yield columns `FLOAT, DOUBLE, FLOAT, BIGINT`.
- **-L/--partial-col-num \<colNum>** :
Specifies which columns will have data written to them, while the other columns will have NULL values. By default, all columns will have data written.
- **-A/--tag-type \<tagType>** :
The data types of the tag columns in the supertable. Both nchar and binary types can specify lengths. For example:
```shell
taosBenchmark -A INT,DOUBLE,NCHAR,BINARY(16)
```
If the tag types are not specified, the default is two tags with types INT and BINARY(16). Note: In some shells, such as bash, the parentheses "()" need to be escaped, so the command should be written as:

```shell
taosBenchmark -A INT,DOUBLE,NCHAR,BINARY\(16\)
```
- **-w/--binwidth \<length>** :
The default length for nchar and binary types, with a default value of `64`.
- **-m/--table-prefix \<tablePrefix>** :
The prefix for subtable names, with a default value of `"d"`.
- **-E/--escape-character** :
A switch to specify whether to use escape characters in the names of supertables and subtables. The default is not to use them.
- **-C/--chinese** :
A switch to specify whether nchar and binary types should use Unicode Chinese characters. The default is not to use.
- **-N/--normal-table** :
A switch to specify that only basic tables should be created, without supertables. The default is false. This can only be used when the insert mode is taosc, stmt, or rest.
- **-M/--random** :
A switch to generate random values for the data being inserted. The default is false. If this parameter is configured, random values will be generated for the data to be inserted. For numeric types of tag/data columns, the values will be random within the range of that type. For NCHAR and BINARY types, the values will be random strings within the specified length range.
- **-x/--aggr-func** :
A switch indicating that an aggregate function should be queried after insertion. The default is false.
- **-y/--answer-yes** :
A switch requiring user confirmation to continue after a prompt. The default is false.
- **-O/--disorder \<Percentage>** :
Specifies the percentage probability of out-of-order data, ranging from [0,50]. The default is `0`, meaning no out-of-order data.
- **-R/--disorder-range \<timeRange>** :
Specifies the timestamp rollback range for out-of-order data. The generated out-of-order timestamps will be the expected timestamps minus a random value within this range. This is only effective if the out-of-order data percentage specified by `-O/--disorder` is greater than 0.
- **-F/--prepare_rand \<Num>** :
The number of unique values in the generated random data. If set to `1`, all data will be the same. The default is `10000`.
- **-a/--replica \<replicaNum>** :
Specifies the number of replicas when creating the database, with a default of `1`.
- **-k/--keep-trying \<NUMBER>** : The number of times to retry after a failure, with no retries by default. This requires version v3.0.9 or above.
- **-z/--trying-interval \<NUMBER>** : The interval time for retries after a failure, in milliseconds, and only effective when `-k` specifies retries. This requires version v3.0.9 or above.
- **-v/--vgroups \<NUMBER>** :
Specifies the number of vgroups when creating the database, effective only for TDengine v3.0+.
- **-V/--version** :
Displays version information and exits. This cannot be mixed with other parameters.
- **-?/--help** :
Displays help information and exits. This cannot be mixed with other parameters.
## Configuration File Parameter Details
### General Configuration Parameters
The parameters listed in this section apply to all function modes.
- **filetype** : The function to be tested, optional values are `insert`, `query`, and `subscribe`, corresponding to insert, query, and subscription functions, respectively. Only one of these can be specified in each configuration file.
- **cfgdir** : The directory where the TDengine client configuration file is located, with a default path of `/etc/taos`.
- **host** : Specifies the FQDN of the TDengine server to connect to, with a default value of `localhost`.
- **port** : The port number of the TDengine server to connect to, with a default value of `6030`.
- **user** : The username used to connect to the TDengine server, with a default of `root`.
- **password** : The password used to connect to the TDengine server, with a default value of `taosdata`.
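For orientation, these general parameters map onto the top level of the JSON configuration file roughly as in the minimal sketch below; the connection values shown are placeholders, not recommendations.

```json
{
  "filetype": "insert",
  "cfgdir": "/etc/taos",
  "host": "localhost",
  "port": 6030,
  "user": "root",
  "password": "taosdata"
}
```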
### Insert Scenario Configuration Parameters

In the insert scenario, `filetype` must be set to `insert`. This parameter and other general parameters are detailed in [General Configuration Parameters](#general-configuration-parameters).

- **keep_trying** : The number of retries after a failure, with no retries by default. This requires version v3.0.9 or above.
- **trying_interval** : The interval between retries after a failure, in milliseconds, only effective when `keep_trying` specifies retries. This requires version v3.0.9 or above.
- **childtable_from** and **childtable_to** : Specify the range of subtables to write to; the interval is half-open, `[childtable_from, childtable_to)`.
- **continue_if_fail** : Allows the user to define the behavior after a failure, as illustrated in the sketch below.
  - `"continue_if_fail": "no"`: `taosBenchmark` automatically exits on failure (default behavior).
  - `"continue_if_fail": "yes"`: `taosBenchmark` warns the user and continues writing on failure.
  - `"continue_if_fail": "smart"`: if the subtable does not exist, `taosBenchmark` creates it and continues writing.
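The following minimal sketch shows how these scenario-level options might be combined; placing them at the top level of the configuration, and the specific values, are illustrative assumptions rather than recommendations.

```json
{
  "filetype": "insert",
  "keep_trying": 3,
  "trying_interval": 1000,
  "continue_if_fail": "no"
}
```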
#### Database Related Configuration Parameters
Configuration parameters related to creating databases are set in the `dbinfo` section of the JSON configuration file. Some specific parameters are as follows. Other parameters correspond to the parameters specified in the TDengine `create database` command. For details, refer to [../../taos-sql/database].
- **name** : The name of the database.
- **drop** : Whether to drop the database before inserting, with optional values of "yes" or "no"; "no" means the existing database is kept. The default is to drop it.
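For example, a `dbinfo` section might look like the following sketch (shown here wrapped in a bare object for readability; in a real configuration it sits inside the database definition, and the `precision` entry merely stands in for any other `create database` option):

```json
{
  "dbinfo": {
    "name": "test",
    "drop": "yes",
    "precision": "ms"
  }
}
```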
#### Stream Processing Configuration Parameters

Configuration parameters related to creating stream processing are set in the `stream` section of the JSON configuration file, with the following specific parameters.
- **stream_name** : The name of the stream processing, which is required.
- **stream_stb** : The name of the corresponding supertable for the stream processing, which is required.
- **stream_sql** : The SQL statement for the stream processing, which is required.
- **trigger_mode** : The trigger mode for the stream processing; optional.
- **watermark** : The watermark for the stream processing; optional.
- **drop** : Whether to create the stream processing, with optional values of "yes" or "no"; if set to "no", it will not be created.
#### Supertable Related Configuration Parameters
Configuration parameters related to creating supertables are set in the `super_tables` section of the JSON configuration file, with specific parameters as follows.
- **name**: Supertable name, must be configured, no default value.
- **child_table_exists**: Whether the child table already exists, default is "no", options are "yes" or "no".
- **child_table_count**: Number of child tables, default is 10.
- **child_table_prefix**: Prefix for child table names, mandatory, no default value.
- **escape_character**: Whether the supertable and child table names contain escape characters, default is "no", options are "yes" or "no".
- **auto_create_table**: Effective only when insert_mode is taosc, rest, or stmt and child_table_exists is "no". "yes" means taosBenchmark automatically creates non-existent tables during data insertion; "no" means all tables are created in advance before insertion.
- **batch_create_tbl_num**: Number of tables created per batch during child table creation, default is 10. Note: the actual number of batches may not match this value; if the generated SQL statement exceeds the maximum supported length, it is automatically truncated and executed, and creation continues.
- **data_source**: Source of the data, default is randomly generated by taosBenchmark; can be configured as "rand" or "sample". For "sample", the file specified by the sample_file parameter is used.
- **insert_mode**: Insertion mode, options include taosc, rest, stmt, sml, sml-rest, corresponding to normal writing, restful interface writing, parameter binding interface writing, schemaless interface writing, restful schemaless interface writing (provided by taosAdapter). Default is taosc.
- **non_stop_mode**: Specifies whether to continue writing, if "yes" then insert_rows is ineffective, writing stops only when Ctrl + C stops the program. Default is "no", i.e., stop after writing a specified number of records. Note: Even in continuous writing mode, insert_rows must still be configured as a non-zero positive integer.
- **line_protocol** : Use line protocol to insert data, effective only when insert_mode is sml or sml-rest, options include line, telnet, json.
- **tcp_transfer** : Communication protocol in telnet mode, effective only when insert_mode is sml-rest and line_protocol is telnet. If not configured, the default is the http protocol.
- **insert_rows** : The number of records inserted per subtable, default is 0.
- **childtable_offset** : Effective only when child_table_exists is yes, specifies the offset when getting the subtable list from the supertable, i.e., starting from which subtable.
- **childtable_limit** : Effective only when child_table_exists is yes, specifies the limit when getting the subtable list from the supertable.
- **interlace_rows** : Enables interlaced insertion mode and specifies the number of rows to insert into each subtable at a time. Interlaced insertion mode means inserting the number of rows specified by this parameter into each subtable in turn and repeating this process until all subtable data is inserted. The default value is 0, i.e., data is inserted into one subtable before moving to the next subtable.
- **insert_interval** : Specifies the insertion interval for interlaced insertion mode, in ms, default value is 0. Only effective when `-B/--interlace-rows` is greater than 0. It means that the data insertion thread will wait for the time interval specified by this value after inserting interlaced records for each subtable before proceeding to the next round of writing.
- **partial_col_num** : If this value is a positive number n, then only the first n columns are written to, effective only when insert_mode is taosc or rest; if n is 0, all columns are written to.
- **disorder_ratio** : Specifies the percentage probability of out-of-order data, its value range is [0,50]. Default is 0, i.e., no out-of-order data.
- **disorder_range** : Specifies the timestamp rollback range for out-of-order data. The generated out-of-order timestamp is the timestamp that should be used under non-out-of-order conditions minus a random value within this range. Only effective when `-O/--disorder` specifies a disorder data percentage greater than 0.
- **timestamp_step** : The timestamp step for inserting data into each subtable, unit consistent with the database's `precision`, default value is 1.
- **start_timestamp** : The starting timestamp for each subtable, default value is now.
- **sample_format** : The type of sample data file, currently only supports "csv".
- **sample_file** : Specifies a csv format file as the data source, effective only when data_source is sample. If the number of data rows in the csv file is less than or equal to prepared_rand, then the csv file data will be read in a loop until it matches prepared_rand; otherwise, only prepared_rand number of rows will be read. Thus, the final number of data rows generated is the smaller of the two.
- **use_sample_ts** : Effective only when data_source is sample; indicates whether the csv file specified by sample_file contains a timestamp as its first column, default is no. If set to yes, the first column of the csv file is used as the timestamp. Since timestamps cannot repeat within the same subtable, the amount of data generated depends on the number of data rows in the csv file, and insert_rows is then ignored.
- **tags_file** : Effective only when insert_mode is taosc or rest. The final tag values are related to childtable_count: if the number of tag data rows in the csv file is less than the given number of subtables, the csv file data is read in a loop until the number of subtables specified by childtable_count is generated; otherwise only childtable_count rows of tag data are read. The final number of subtables generated is therefore the smaller of the two.
- **primary_key** : Specifies whether the supertable has a composite primary key; values are 1 and 0. The composite primary key column can only be the second column of the supertable. After enabling composite primary key generation, make sure the second column has a data type valid for composite primary keys, otherwise an error occurs.
- **repeat_ts_min** : Numeric type. When the composite primary key is enabled, specifies the minimum number of records generated with the same timestamp; the number of records with the same timestamp is a random value in the range [repeat_ts_min, repeat_ts_max]. When the minimum equals the maximum, the number is fixed.
- **repeat_ts_max** : Numeric type. When the composite primary key is enabled, specifies the maximum number of records generated with the same timestamp.
- **sqls** : Array of strings. Specifies the SQL statements to execute after the supertable is successfully created; table names in the SQL must be prefixed with the database name, otherwise an "unspecified database" error occurs.
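Putting a few of the frequently used options together, a `super_tables` entry might look like the following sketch. All names and sizes are illustrative, only parameters described in this section are used, the `columns` list is explained in the subsections below, and the surrounding database object is omitted for brevity.

```json
{
  "super_tables": [
    {
      "name": "meters",
      "child_table_exists": "no",
      "child_table_count": 1000,
      "child_table_prefix": "d",
      "insert_mode": "taosc",
      "insert_rows": 10000,
      "timestamp_step": 1000,
      "start_timestamp": "2024-01-01 00:00:00.000",
      "columns": [
        { "type": "float", "name": "current" },
        { "type": "int", "name": "voltage" }
      ]
    }
  ]
}
```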
#### tsma Configuration Parameters
Specify the configuration parameters for tsma in `super_tables` under `tsmas`, with the following specific parameters:
- **name**: Specifies the name of the tsma, mandatory.
- **function**: Specifies the function of the tsma, mandatory.
- **interval**: Specifies the time interval for the tsma, mandatory.
- **sliding**: Specifies the window time shift for the tsma, mandatory.
- **custom**: Specifies custom configuration appended at the end of the tsma creation statement, optional.
- **start_when_inserted**: Specifies after how many rows have been inserted the tsma is created; optional, default is 0.
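An illustrative `tsmas` fragment inside a supertable definition might look like the sketch below; the function, interval, and sliding values are assumptions and must follow TDengine's TSMA syntax.

```json
{
  "tsmas": [
    {
      "name": "tsma1",
      "function": "avg(current)",
      "interval": "60s",
      "sliding": "30s",
      "start_when_inserted": 0
    }
  ]
}
```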
#### Tag and Data Column Configuration Parameters
Specify the configuration parameters for tag and data columns in `super_tables` under `columns` and `tag`.
- **type**: Specifies the column type, refer to the data types supported by TDengine.
Note: The JSON data type is special, it can only be used for tags, and when using JSON type as a tag, there must be only this one tag. In this case, count and len represent the number of key-value pairs in the JSON tag and the length of each KV pair's value respectively, with value defaulting to string.
- **len**: Specifies the length of the data type, effective for NCHAR, BINARY, and JSON data types. For other data types, a value of 0 means the column is always written as NULL; any other value is ignored.
- **count**: Specifies the number of times this type of column appears consecutively, for example, "count": 4096 can generate 4096 columns of the specified type.
- **name**: The name of the column, if used with count, for example "name": "current", "count":3, then the names of the 3 columns are current, current_2, current_3 respectively.
- **min**: The minimum value for the data type of the column/tag. Generated values will be greater than or equal to the minimum value.
- **max**: The maximum value for the data type of the column/tag. Generated values will be less than the maximum value.
- **scalingFactor**: Floating-point precision enhancement factor, only effective when the data type is float or double; valid values are positive integers from 1 to 1000000. It is used to enhance the precision of generated floating-point numbers, especially when the min or max values are small. This attribute enhances the precision after the decimal point by powers of 10: a scalingFactor of 10 means 1 extra decimal place of precision, 100 means 2 places, and so on.
- **fun**: This column data is filled with functions, currently only supports sin and cos functions, input parameters are converted from timestamps to angle values, conversion formula: angle x = input time column ts value % 360. Also supports coefficient adjustment, random fluctuation factor adjustment, displayed in a fixed format expression, such as fun="10*sin(x)+100*random(5)", x represents the angle, ranging from 0 ~ 360 degrees, the increment step is consistent with the time column step. 10 represents the multiplication coefficient, 100 represents the addition or subtraction coefficient, 5 represents the fluctuation amplitude within a 5% random range. Currently supports int, bigint, float, double four data types. Note: The expression is in a fixed pattern and cannot be reversed.
- **values**: The value domain for nchar/binary column/tag, will randomly select from the values.
- **sma**: Adds this column to SMA, value is "yes" or "no", default is "no".
- **encode**: String type, specifies the first-level encoding algorithm for this column in two-level compression; see creating supertables for details.
- **compress**: String type, specifies the second-level compression algorithm for this column in two-level compression; see creating supertables for details.
- **level**: String type, specifies the compression level of the second-level compression algorithm for this column in two-level compression; see creating supertables for details.
- **gen**: String type, specifies the method of generating data for this column, if not specified it is random, if specified as "order", it will increase sequentially by natural numbers.
- **fillNull**: String type, specifies whether this column randomly inserts NULL values, can be specified as "true" or "false", only effective when generate_row_rule is 2.
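To make the options above concrete, here is a minimal, illustrative `columns` fragment inside a supertable definition; the column names, lengths, and value ranges are arbitrary examples, and the type names are lowercase as required by the data type reference table at the end of this section.

```json
{
  "columns": [
    { "type": "float", "name": "current", "min": 8, "max": 12 },
    { "type": "int", "name": "voltage", "min": 215, "max": 225 },
    { "type": "binary", "name": "location", "len": 24, "values": ["California.SanFrancisco", "California.LosAngeles"] }
  ]
}
```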
#### Insertion Behavior Configuration Parameters
- **thread_count**: The number of threads for inserting data, default is 8.
- **thread_bind_vgroup**: Whether the vgroup is bound to the writing thread during writing; binding can improve write speed. Values are "yes" or "no", default is "no". Setting it to "no" keeps the same behavior as before. When set to "yes", if thread_count equals the number of vgroups in the database, thread_count is automatically adjusted to the number of vgroups; if thread_count is less than the number of vgroups, the number of writing threads is not adjusted, and each thread writes one vgroup after another, maintaining the rule that a vgroup can only be written by one thread at a time.
- **create_table_thread_count** : The number of threads for creating tables, default is 8.
- **connection_pool_size** : The number of pre-established connections with the TDengine server. If not configured, it defaults to the specified number of threads.
- **result_file** : The path to the result output file, default is ./output.txt.
- **confirm_parameter_prompt** : A toggle parameter that requires user confirmation after a prompt to continue. The default value is false.
- **interlace_rows** : Enables interleaved insert mode and specifies the number of rows to insert into each subtable at a time. Interleaved insert mode means inserting the specified number of rows into each subtable in sequence and repeating this process until all subtable data has been inserted. The default value is 0, meaning data is inserted into one subtable completely before moving to the next. This parameter can also be configured in `super_tables`; if configured, the setting in `super_tables` takes higher priority and overrides the global setting.
- **insert_interval** : Specifies the insertion interval for interleaved insert mode in milliseconds, with a default value of 0. This only takes effect when `interlace_rows` is greater than 0: the data insertion thread waits for this interval after inserting interleaved records into each subtable before proceeding to the next round of writing. This parameter can also be configured in `super_tables`; if configured, the setting in `super_tables` takes higher priority and overrides the global setting.
- **num_of_records_per_req** : The number of data rows written to TDengine per request, with a default value of 30000. If set too high, the TDengine client driver returns an error, and this parameter needs to be reduced to complete the writes.
- **prepare_rand** : The number of unique values in the generated random data. If set to 1, all data are the same. The default value is 10000.
- **pre_load_tb_meta** : Whether to pre-load the metadata of subtables, with values of "yes" or "no". When there is a large number of subtables, enabling this option can improve write speed.
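The sketch below gathers several of these insertion-behavior settings at the top level of an insert configuration; their top-level placement and the values shown are illustrative assumptions.

```json
{
  "filetype": "insert",
  "thread_count": 8,
  "create_table_thread_count": 8,
  "result_file": "./output.txt",
  "confirm_parameter_prompt": "no",
  "num_of_records_per_req": 30000,
  "prepare_rand": 10000,
  "interlace_rows": 0,
  "insert_interval": 0
}
```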
### Query Scenario Configuration Parameters

In the query scenario, `filetype` must be set to `query`. The parameter `query_times` specifies the number of times to execute the query.

The query scenario can control the execution of slow queries through the `kill_slow_query_threshold` and `kill_slow_query_interval` parameters: the threshold controls whether queries exceeding the specified execution time (`exec_usec`) are killed by `taosBenchmark`, measured in seconds; the interval controls the sleep duration, in seconds, used to avoid continuous slow-query checks consuming CPU resources.

Other general parameters can be found in [General Configuration Parameters](#general-configuration-parameters).
#### Configuration Parameters for Executing Specified Query Statements
The configuration parameters for querying specified tables (including supertables, subtables, or basic tables) are set in the `specified_table_query` section.
- **query_interval** : The interval for querying, measured in seconds, with a default value of 0.
- **threads** : The number of threads executing the query SQL, default is 1.
- **sqls** :
  - **sql**: The SQL command to execute, required.
  - **result**: The file in which to save the query results; if not specified, results are not saved.
#### Configuration Parameters for Querying Supertables
The configuration parameters for querying supertables are set in the `super_table_query` section.
- **stblname** : The name of the supertable to query, required.
- **query_interval** : The interval for querying, measured in seconds, with a default value of 0.
- **threads** : The number of threads executing the query SQL, default is 1.
- **sqls** :
  - **sql** : The SQL command to execute, required. For supertable queries, keep "xxxx" in the SQL command; the program automatically replaces it with the names of all subtables in the supertable.
  - **result** : The file in which to save the query results; if not specified, results are not saved.
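A minimal query configuration combining the two subsections above might look like the following sketch; the connection values, SQL text, and thread counts are illustrative assumptions.

```json
{
  "filetype": "query",
  "cfgdir": "/etc/taos",
  "host": "localhost",
  "port": 6030,
  "user": "root",
  "password": "taosdata",
  "query_times": 2,
  "specified_table_query": {
    "query_interval": 1,
    "threads": 3,
    "sqls": [
      { "sql": "select count(*) from test.meters", "result": "./query_res0.txt" }
    ]
  },
  "super_table_query": {
    "stblname": "meters",
    "query_interval": 1,
    "threads": 3,
    "sqls": [
      { "sql": "select last_row(ts) from xxxx", "result": "./query_res1.txt" }
    ]
  }
}
```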
### Subscription Scenario Configuration Parameters

In the subscription scenario, `filetype` must be set to `subscribe`. This parameter and other general parameters are detailed in [General Configuration Parameters](#general-configuration-parameters).
#### Configuration Parameters for Executing Specified Subscription Statements
The configuration parameters for subscribing to specified tables (including supertables, subtables, or basic tables) are set in the `specified_table_query` section.
- **threads/concurrent** : The number of threads executing the SQL, default is 1.
- **sqls** :
  - **sql** : The SQL command to execute, required.
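Based only on the parameters above, a subscription configuration might look like the following sketch; whether additional top-level keys are required is not covered here, and the SQL text is illustrative.

```json
{
  "filetype": "subscribe",
  "specified_table_query": {
    "threads": 1,
    "sqls": [
      { "sql": "select * from test.meters" }
    ]
  }
}
```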
#### Configuration File Data Type Reference Table
| # | **Engine** | **taosBenchmark** |
| ---- | :---------------: | :---------------: |
| 1 | TIMESTAMP | timestamp |
| 2 | INT | int |
| 3 | INT UNSIGNED | uint |
| 4 | BIGINT | bigint |
| 5 | BIGINT UNSIGNED | ubigint |
| 6 | FLOAT | float |
| 7 | DOUBLE | double |
| 8 | BINARY | binary |
| 9 | SMALLINT | smallint |
| 10 | SMALLINT UNSIGNED | usmallint |
| 11 | TINYINT | tinyint |
| 12 | TINYINT UNSIGNED | utinyint |
| 13 | BOOL | bool |
| 14 | NCHAR | nchar |
| 15 | VARCHAR | varchar |
| 16 | VARBINARY | varbinary |
| 17 | GEOMETRY | geometry |
| 18 | JSON | json |
Note: Data types in the `taosBenchmark` configuration file must be in lowercase to be recognized.

View File

@ -4,7 +4,7 @@ description: TDengine Tools
slug: /tdengine-reference/tools
---
This section provides detailed descriptions of the main tools in TDengine, including their functions and usage.
```mdx-code-block
import DocCardList from '@theme/DocCardList';

View File

@ -1,20 +1,19 @@
---
title: Data Types
description: 'TDengine supported data types: Timestamp, Float, JSON type, etc.'
slug: /tdengine-reference/sql-manual/data-types
---
## Timestamp
In TDengine, the most important aspect is the timestamp. You need to specify the timestamp when creating and inserting records, as well as when querying historical records. The rules for timestamps are as follows:
- The timestamp format is `YYYY-MM-DD HH:mm:ss.MS`, with a default resolution of milliseconds. For example: `2017-08-12 18:25:58.128`.
- The internal function NOW represents the current time of the client.
- When inserting records, if the timestamp is NOW, the current time of the client submitting the record is used.
- Epoch Time: The timestamp can also be a long integer representing the number of milliseconds since UTC time 1970-01-01 00:00:00. Accordingly, if the time precision of the database is set to "microseconds", the long integer timestamp corresponds to the number of microseconds since UTC time 1970-01-01 00:00:00; the logic is the same for nanoseconds.
- You can add or subtract time, for example, NOW-2h indicates that the query time is moved back 2 hours (the last 2 hours). The time unit after the number can be b (nanoseconds), u (microseconds), a (milliseconds), s (seconds), m (minutes), h (hours), d (days), w (weeks). For example, `SELECT * FROM t1 WHERE ts > NOW-2w AND ts <= NOW-1w` indicates querying data from exactly one week two weeks ago. When specifying the time window (Interval) for down sampling, time units can also be n (natural month) and y (natural year).
The default timestamp precision in TDengine is milliseconds, but it also supports microseconds and nanoseconds by passing the `PRECISION` parameter when `CREATE DATABASE` is executed.
```sql
CREATE DATABASE db_name PRECISION 'ns';
@ -22,64 +21,64 @@ CREATE DATABASE db_name PRECISION 'ns';
## Data Types
In TDengine, the following data types can be used in the data model of a table.
| # | **Type** | **Bytes** | **Description** |
| --- | :---------------: | --------- | ------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- |
| 1 | TIMESTAMP | 8 | Timestamp. The default precision is milliseconds, and it can support microseconds and nanoseconds, detailed explanation can be found in the previous section. |
| 2 | INT | 4 | Integer, range [-2^31, 2^31-1] |
| 3 | INT UNSIGNED | 4 | Unsigned integer, range [0, 2^32-1] |
| 4 | BIGINT | 8 | Long integer, range [-2^63, 2^63-1] |
| 5 | BIGINT UNSIGNED | 8 | Long integer, range [0, 2^64-1] |
| 6 | FLOAT | 4 | Floating-point number, significant digits 6-7, range [-3.4E38, 3.4E38] |
| 7 | DOUBLE | 8 | Double precision floating-point number, significant digits 15-16, range [-1.7E308, 1.7E308] |
| 8 | BINARY | Custom | Records single-byte strings, recommended for handling ASCII visible characters only; multi-byte characters such as Chinese must use NCHAR. |
| 9 | SMALLINT | 2 | Short integer, range [-32768, 32767] |
| 10 | SMALLINT UNSIGNED | 2 | Unsigned short integer, range [0, 65535] |
| 11 | TINYINT | 1 | Single-byte integer, range [-128, 127] |
| 12 | TINYINT UNSIGNED | 1 | Unsigned single-byte integer, range [0, 255] |
| 13 | BOOL | 1 | Boolean type, {true, false} |
| 14 | NCHAR | Custom | Records strings containing multi-byte characters, such as Chinese characters. Each NCHAR character occupies 4 bytes of storage space. Strings should be enclosed in single quotes, and single quotes within strings should be escaped with `\'`. The size must be specified when using NCHAR; for example, NCHAR(10) indicates that this column can store up to 10 NCHAR characters. An error will occur if the user string length exceeds the declared length. |
| 15 | JSON | | JSON data type, only tags can be in JSON format |
| 16 | VARCHAR | Custom | Alias for BINARY |
| 17 | GEOMETRY | Custom | Geometry type, supported starting from version 3.1.0.0 |
| 18 | VARBINARY | Custom | Variable-length binary data, supported starting from version 3.1.1.0 |
:::note
- The length of each row in a table cannot exceed 48KB (in version 3.0.5.0 and later, this limit is 64KB). (Note: Each BINARY/NCHAR/GEOMETRY/VARBINARY type column will also occupy an additional 2 bytes of storage space).
- Although the BINARY type supports byte-based binary characters at the storage level, the way different programming languages handle binary data may not guarantee consistency. Therefore, it is recommended to only store ASCII visible characters in the BINARY type and avoid storing invisible characters. Multi-byte data, such as Chinese characters, should be stored using the NCHAR type. If you forcefully use the BINARY type to store Chinese characters, it may sometimes read and write correctly, but it lacks character set information, making it prone to data garbling or even corruption.
- Theoretically, the BINARY type can be up to 16,374 bytes long (65,517 bytes for data columns and 16,382 bytes for tag columns starting from version 3.0.5.0). The BINARY type only supports string input, and the strings must be enclosed in single quotes. You must specify the size, for example, BINARY(20) defines the maximum length of a single-byte character string as 20 characters, occupying a total of 20 bytes of space. If the user string exceeds 20 bytes, an error will occur. For single quotes within the string, use the escape character backslash plus single quote, i.e., `\'`.
- The GEOMETRY type data column has a maximum length of 65,517 bytes, and the maximum length for tag columns is 16,382 bytes. It supports 2D POINT, LINESTRING, and POLYGON subtype data. The length calculation is as follows:
| # | **Syntax** | **Min Length** | **Max Length** | **Increment per Coordinate Set** |
| --- | ------------------------------------ | -------------- | -------------- | -------------------------------- |
| 1 | POINT(1.0 1.0) | 21 | 21 | None |
| 2 | LINESTRING(1.0 1.0, 2.0 2.0) | 9+2*16 | 9+4094*16 | +16 |
| 3 | POLYGON((1.0 1.0, 2.0 2.0, 1.0 1.0)) | 13+3*16 | 13+4094*16 | +16 |
- In SQL statements, the numerical type will be determined based on the presence of a decimal point or scientific notation. Therefore, be cautious of type overflow when using values. For example, 9999999999999999999 will be considered an overflow exceeding the upper limit of long integers, while 9999999999999999999.0 will be considered a valid floating-point number.
- VARBINARY is a data type for storing binary data, with a maximum length of 65,517 bytes and a maximum length for tag columns of 16,382 bytes. It can be written using SQL or schemaless methods (it needs to be converted to a string starting with \x for writing), or it can be written using the stmt method (binary can be used directly). Displayed in hexadecimal format starting with \x.
:::
## Constants
TDengine supports multiple types of constants, detailed in the table below:
| # | **Syntax** | **Type** | **Description** |
| --- | :-----------------------------------------------: | --------- | ------------------------------------------------------------------------------------------------------------------------------------------------------------- |
| 1 | [\{+ \| -}]123 | BIGINT | The literal type of integer values is BIGINT. If the user input exceeds the range of BIGINT, TDengine truncates the value according to BIGINT. |
| 2 | 123.45 | DOUBLE | The literal type of floating-point values is DOUBLE. TDengine determines whether the numerical type is integer or floating-point based on the presence of a decimal point or scientific notation. |
| 3 | 1.2E3 | DOUBLE | The literal type of scientific notation is DOUBLE. |
| 4 | 'abc' | BINARY | Content enclosed in single quotes is a string literal of type BINARY, and the size of BINARY is the actual number of characters. For single quotes within the string, use the escape character backslash plus single quote, i.e., `\'`. |
| 5 | "abc" | BINARY | Content enclosed in double quotes is a string literal of type BINARY, and the size of BINARY is the actual number of characters. For double quotes within the string, use the escape character backslash plus double quote, i.e., `\"`. |
| 6 | TIMESTAMP \{'literal' \| "literal"} | TIMESTAMP | The TIMESTAMP keyword indicates that the following string literal needs to be interpreted as TIMESTAMP type. The string must meet the format YYYY-MM-DD HH:mm:ss.MS, with the time resolution as the current database's time resolution. |
| 7 | \{TRUE \| FALSE} | BOOL | Boolean type literal. |
| 8 | \{'' \| "" \| '\t' \| "\t" \| ' ' \| " " \| NULL } | -- | Null value literal. Can be used for any type. |
| # | **Syntax** | **Type** | **Description** |
| ---- | :------------------------------------------------: | --------- | ------------------------------------------------------------ |
| 1 | [\{+ \| -}]123 | BIGINT | The literal type of integer values is always BIGINT. If the user input exceeds the range of BIGINT, TDengine truncates the value as BIGINT. |
| 2 | 123.45 | DOUBLE | The literal type of floating-point values is always DOUBLE. TDengine determines whether the value is an integer or floating point based on the presence of a decimal point or the use of scientific notation. |
| 3 | 1.2E3 | DOUBLE | The literal type for scientific notation is DOUBLE. |
| 4 | 'abc' | BINARY | Content enclosed in single quotes is a string literal, whose type is BINARY. The size of BINARY is the actual number of characters. For single quotes within the string, they can be represented by the escape character backslash followed by a single quote, i.e., `\'`. |
| 5    | "abc"                                              | BINARY    | Content enclosed in double quotes is a string literal, whose type is BINARY. The size of BINARY is the actual number of characters. For double quotes within the string, they can be represented by the escape character backslash followed by a double quote, i.e., `\"`. |
| 6 | TIMESTAMP \{'literal' \| "literal"} | TIMESTAMP | The TIMESTAMP keyword indicates that the following string literal should be interpreted as a TIMESTAMP type. The string must meet the YYYY-MM-DD HH:mm:ss.MS format, with the time resolution being that of the current database. |
| 7 | \{TRUE \| FALSE} | BOOL | Boolean type literal. |
| 8 | \{'' \| "" \| '\t' \| "\t" \| ' ' \| " " \| NULL } | -- | Null value literal. Can be used for any type. |
:::note
- TDengine determines whether the numerical type is integer or floating-point based on the presence of a decimal point or scientific notation. Therefore, be cautious of type overflow when using values. For example, 9999999999999999999 will be considered an overflow exceeding the upper limit of long integers, while 9999999999999999999.0 will be considered a valid floating-point number.
- TDengine determines whether a numeric type is an integer or a floating-point based on the presence of a decimal point or the use of scientific notation. Therefore, be aware of potential type overflow when using it. For example, 9999999999999999999 is considered to exceed the upper boundary of a long integer and will overflow, while 9999999999999999999.0 is considered a valid floating-point number.
:::

View File

@ -1,6 +1,5 @@
---
title: Manage Databases
description: "Create, delete databases, view and modify database parameters"
title: Databases
slug: /tdengine-reference/sql-manual/manage-databases
---
@ -8,17 +7,17 @@ slug: /tdengine-reference/sql-manual/manage-databases
```sql
CREATE DATABASE [IF NOT EXISTS] db_name [database_options]
database_options:
database_option ...
database_option: {
VGROUPS value
| PRECISION {'ms' | 'us' | 'ns'}
| REPLICA value
| BUFFER value
| PAGES value
| PAGESIZE value
| PAGESIZE value
| CACHEMODEL {'none' | 'last_row' | 'last_value' | 'both'}
| CACHESIZE value
| COMP {0 | 1 | 2}
@ -26,10 +25,12 @@ database_option: {
| MAXROWS value
| MINROWS value
| KEEP value
| KEEP_TIME_OFFSET value
| STT_TRIGGER value
| SINGLE_STABLE {0 | 1}
| TABLE_PREFIX value
| TABLE_SUFFIX value
| DNODES value
| TSDB_PAGESIZE value
| WAL_LEVEL {1 | 2}
| WAL_FSYNC_PERIOD value
@ -40,50 +41,52 @@ database_option: {
### Parameter Description
- VGROUPS: The initial number of vgroups in the database.
- PRECISION: The timestamp precision of the database. 'ms' means milliseconds, 'us' means microseconds, 'ns' means nanoseconds, with a default of milliseconds.
- REPLICA: Indicates the number of replicas for the database, with values of 1, 2, or 3, defaulting to 1; 2 is only available in the enterprise version 3.3.0.0 and later. In a cluster, the number of replicas must be less than or equal to the number of DNODEs, and there are the following restrictions:
- Operations such as SPLIT VGROUP or REDISTRIBUTE VGROUP are currently not supported for dual-replica databases.
- A single-replica database can be changed to a dual-replica database, but not the other way around, and a three-replica database cannot be changed to a dual-replica.
- BUFFER: The size of the memory pool for a VNODE write, measured in MB, defaulting to 256, with a minimum of 3 and a maximum of 16384.
- PAGES: The number of cache pages for the metadata storage engine in a VNODE, defaulting to 256, with a minimum of 64. The metadata storage of a VNODE occupies PAGESIZE * PAGES, which is 1MB of memory by default.
- PAGESIZE: The page size of the metadata storage engine in a VNODE, measured in KB, defaulting to 4 KB. The range is from 1 to 16384, i.e., from 1 KB to 16 MB.
- CACHEMODEL: Indicates whether to cache the recent data of the subtables in memory. The default is none.
- VGROUPS: The number of initial vgroups in the database.
- PRECISION: The timestamp precision of the database. ms for milliseconds, us for microseconds, ns for nanoseconds, default is ms.
- REPLICA: Indicates the number of database replicas, which can be 1, 2, or 3, default is 1; 2 is only available in the enterprise version 3.3.0.0 and later. In a cluster, the number of replicas must be less than or equal to the number of DNODEs. The following restrictions apply:
- Operations such as SPLIT VGROUP or REDISTRIBUTE VGROUP are not supported for databases with double replicas.
- A single-replica database can be changed to a double-replica database, but changing from double replicas to other numbers of replicas, or from three replicas to double replicas is not supported.
- BUFFER: The size of the memory pool for writing into a VNODE, in MB, default is 256, minimum is 3, maximum is 16384.
- PAGES: The number of cache pages in a VNODE's metadata storage engine, default is 256, minimum 64. A VNODE's metadata storage occupies PAGESIZE * PAGES, which by default is 1MB of memory.
- PAGESIZE: The page size of a VNODE's metadata storage engine, in KB, default is 4 KB. Range is 1 to 16384, i.e., 1 KB to 16 MB.
- CACHEMODEL: Indicates whether to cache the latest data of subtables in memory. Default is none.
- none: Indicates no caching.
- last_row: Indicates caching of the most recent row of the subtable. This significantly improves the performance of the LAST_ROW function.
- last_value: Indicates caching of the most recent non-NULL value for each column in the subtable. This significantly improves the performance of the LAST function under normal conditions (without special influences such as WHERE, ORDER BY, GROUP BY, INTERVAL).
- both: Indicates that both the caching of the most recent row and column functions are enabled.
Note: Switching CacheModel values back and forth may lead to inaccurate results for last/last_row queries, so please proceed with caution. It is recommended to keep it enabled.
- CACHESIZE: Indicates the memory size used for caching the recent data of the subtables in each vnode. The default is 1, with a range of [1, 65536], measured in MB.
- COMP: Indicates the database file compression flag. The default value is 2, with a range of [0, 2].
- last_row: Indicates caching the latest row of data of subtables. This will significantly improve the performance of the LAST_ROW function.
- last_value: Indicates caching the latest non-NULL value of each column of subtables. This will significantly improve the performance of the LAST function in the absence of special influences such as WHERE, ORDER BY, GROUP BY, or INTERVAL.
- both: Indicates that caching of both the latest row and the latest column values is enabled.
Note: Switching CacheModel values back and forth may cause inaccurate results for last/last_row queries, please operate with caution. It is recommended to keep it turned on.
- CACHESIZE: The size of memory used for caching the latest data of subtables in each vnode. Default is 1, range is [1, 65536], in MB.
- COMP: Indicates the compression flag for database files, default value is 2, range is [0, 2].
- 0: Indicates no compression.
- 1: Indicates one-stage compression.
- 1: Indicates first-stage compression.
- 2: Indicates two-stage compression.
- DURATION: The time span for storing data in the data file. You can use units, such as DURATION 100h, DURATION 10d, etc., supporting m (minutes), h (hours), and d (days). If no time unit is specified, the default unit is days; for example, DURATION 50 indicates 50 days.
- MAXROWS: The maximum number of records in a file block, defaulting to 4096 records.
- MINROWS: The minimum number of records in a file block, defaulting to 100 records.
- KEEP: Indicates the number of days to retain the data file. The default value is 3650, with a range of [1, 365000], and it must be at least three times the DURATION parameter value. The database will automatically delete data that exceeds the KEEP time. KEEP can also use units, such as KEEP 100h, KEEP 10d, etc., supporting m (minutes), h (hours), and d (days). It can also be specified without a unit, such as KEEP 50, where the default unit is days. The enterprise version supports [tiered storage](../../../operations-and-maintenance/advanced-storage-options/) functionality, allowing multiple retention times (separated by commas, with a maximum of 3, satisfying keep 0 \<= keep 1 \<= keep 2, e.g., KEEP 100h,100d,3650d); the community version does not support multi-level storage (even if multiple retention times are configured, they will not take effect, and KEEP will take the maximum retention time).
- STT_TRIGGER: Indicates the number of on-disk files that trigger file merging. The default is 1, with a range of 1 to 16. For high-frequency scenarios with few tables, it is recommended to use the default configuration or a smaller value; for low-frequency scenarios with many tables, it is recommended to configure a larger value.
- SINGLE_STABLE: Indicates whether only one supertable can be created in this database, suitable for cases where there are very many columns in the supertable.
- DURATION: The time span for storing data in data files. Units can be specified, such as DURATION 100h, DURATION 10d, etc.; m (minutes), h (hours), and d (days) are supported. If no time unit is given, the default unit is days, e.g., DURATION 50 means 50 days.
- MAXROWS: The maximum number of records in a file block, default is 4096.
- MINROWS: The minimum number of records in a file block, default is 100.
- KEEP: Indicates the number of days data files are kept, default value is 3650, range [1, 365000], and must be greater than or equal to 3 times the DURATION parameter value. The database automatically deletes data that has been kept for longer than the KEEP value to free up storage space. KEEP also accepts units, such as KEEP 100h, KEEP 10d, etc., supporting m (minutes), h (hours), and d (days). It can also be written without a unit, like KEEP 50, where the default unit is days. The enterprise edition supports the multi-tier storage feature, so multiple retention times can be set (separated by commas, up to 3, satisfying keep 0 \<= keep 1 \<= keep 2, such as KEEP 100h,100d,3650d); the community edition does not support multi-tier storage (even if multiple retention times are configured, they do not take effect, and KEEP takes the longest retention time).
- KEEP_TIME_OFFSET: Effective from version 3.2.0.0. The delay, in hours, before deleting or migrating data that has been kept for longer than the KEEP value; the default value is 0. After a data file exceeds the KEEP time, the deletion or migration operation is not executed immediately but waits the additional interval specified by this parameter, so that these operations can avoid peak business hours.
- STT_TRIGGER: The number of on-disk files that triggers a file merge. The open-source edition is fixed at 1; the enterprise edition can be set from 1 to 16. For scenarios with few tables and high-frequency writes, the default is recommended; for scenarios with many tables and low-frequency writes, a larger value is recommended.
- SINGLE_STABLE: Indicates whether only one supertable can be created in this database, used in cases where the supertable has a very large number of columns.
- 0: Indicates that multiple supertables can be created.
- 1: Indicates that only one supertable can be created.
- TABLE_PREFIX: When positive, it ignores the specified prefix length in the table name when deciding which vgroup to allocate a table to; when negative, it only uses the specified prefix length in the table name. For example, assuming the table name is "v30001", when TSDB_PREFIX = 2, it uses "0001" to determine which vgroup to allocate; when TSDB_PREFIX = -2, it uses "v3" to determine which vgroup to allocate.
- TABLE_SUFFIX: When positive, it ignores the specified suffix length in the table name when deciding which vgroup to allocate a table to; when negative, it only uses the specified suffix length in the table name. For example, assuming the table name is "v30001", when TSDB_SUFFIX = 2, it uses "v300" to determine which vgroup to allocate; when TSDB_SUFFIX = -2, it uses "01" to determine which vgroup to allocate.
- TSDB_PAGESIZE: The page size of the time series data storage engine in a VNODE, measured in KB, defaulting to 4 KB. The range is from 1 to 16384, i.e., from 1 KB to 16 MB.
- WAL_LEVEL: WAL level, defaulting to 1.
- 1: Write WAL but do not perform fsync.
- TABLE_PREFIX: When positive, the specified number of leading characters of the table name is ignored when deciding which vgroup to allocate a table to; when negative, only that many leading characters of the table name are used. For example, for a table named "v30001", TABLE_PREFIX = 2 means "0001" is used to decide the vgroup, while TABLE_PREFIX = -2 means "v3" is used.
- TABLE_SUFFIX: When positive, the specified number of trailing characters of the table name is ignored when deciding which vgroup to allocate a table to; when negative, only that many trailing characters of the table name are used. For example, for a table named "v30001", TABLE_SUFFIX = 2 means "v300" is used to decide the vgroup, while TABLE_SUFFIX = -2 means "01" is used.
- TSDB_PAGESIZE: The page size of a VNODE's time-series data storage engine, in KB, default is 4 KB. Range is 1 to 16384, i.e., 1 KB to 16 MB.
- DNODES: Specifies the list of DNODEs where the VNODE is located, such as '1,2,3', separated by commas and without spaces between characters, only supported in the enterprise version.
- WAL_LEVEL: WAL level, default is 1.
- 1: Write WAL, but do not perform fsync.
- 2: Write WAL and perform fsync.
- WAL_FSYNC_PERIOD: When the WAL_LEVEL parameter is set to 2, it is used to set the write-back period. The default is 3000 milliseconds, with a minimum of 0 (indicating immediate write-back on each write) and a maximum of 180000 (i.e., three minutes).
- WAL_RETENTION_PERIOD: To facilitate data subscription consumption, the maximum retention duration policy for WAL log files is to be kept. The WAL log cleanup is not affected by the subscription client consumption status. The unit is seconds. The default is 3600, meaning that the most recent 3600 seconds of data in the WAL will be retained. Please modify this parameter to an appropriate value according to the data subscription needs.
- WAL_RETENTION_SIZE: To facilitate data subscription consumption, the maximum cumulative size policy for WAL log files is to be kept. The unit is KB. The default is 0, indicating that the cumulative size has no upper limit.
- WAL_FSYNC_PERIOD: When the WAL_LEVEL parameter is set to 2, it is used to set the disk writing period. Default is 3000, in milliseconds. Minimum is 0, meaning immediate disk writing upon each write; maximum is 180000, i.e., three minutes.
- WAL_RETENTION_PERIOD: The maximum additional retention time of WAL log files, kept to support data subscription. WAL cleanup is not affected by the consumption status of subscription clients. Unit: seconds. Default is 3600, meaning the most recent 3600 seconds of WAL data are retained; adjust this parameter to an appropriate value according to your data subscription needs.
- WAL_RETENTION_SIZE: The maximum additional cumulative size of WAL log files, kept to support data subscription. Unit: KB. Default is 0, meaning there is no upper limit on the cumulative size.
### Example of Creating a Database
### Database Creation Example
```sql
create database if not exists db vgroups 10 buffer 10
```
The above example creates a database named db with 10 vgroups, with each vnode allocated 10MB of write cache.
The above example creates a database named db with 10 vgroups, where each vnode is allocated 10MB of write buffer.
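For reference, here is a slightly fuller sketch that combines several of the options described above; the database name and all values are illustrative, not recommendations.

```sql
-- Illustrative only: 10 vgroups, 3 replicas, 30-day file spans, one year of retention,
-- millisecond precision, and last_row caching enabled
CREATE DATABASE IF NOT EXISTS power
  VGROUPS 10
  REPLICA 3
  DURATION 30d
  KEEP 365d
  PRECISION 'ms'
  CACHEMODEL 'last_row'
  WAL_RETENTION_PERIOD 3600;
```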
### Using the Database
@ -91,7 +94,7 @@ The above example creates a database named db with 10 vgroups, with each vnode a
USE db_name;
```
Use/switch the database (invalid in REST connection mode).
Use/switch database (not valid in REST connection mode).
## Delete Database
@ -99,7 +102,7 @@ Use/switch the database (invalid in REST connection mode).
DROP DATABASE [IF EXISTS] db_name
```
Delete the database. All data tables contained in the specified database will be deleted, and all vgroups of that database will also be completely destroyed. Please use with caution!
Deletes the database. All tables contained in the Database will be deleted, and all vgroups of that database will also be destroyed, so use with caution!
## Modify Database Parameters
@ -121,49 +124,50 @@ alter_database_option: {
| KEEP value
| WAL_RETENTION_PERIOD value
| WAL_RETENTION_SIZE value
| MINROWS value
}
```
### Modify CACHESIZE
Modifying database parameters is straightforward; however, determining whether to modify and how to modify can be challenging. This section describes how to assess whether the database's cachesize is sufficient.
The command to modify database parameters is simple, but the difficulty lies in determining whether a modification is needed and how to modify it. This section describes how to judge whether the cachesize is sufficient.
1. How to check cachesize?
1. How to view cachesize?
You can check the specific values of cachesize by executing `select * from information_schema.ins_databases;`.
You can view the current cachesize value by executing `select * from information_schema.ins_databases;`.
2. How to check cacheload?
2. How to view cacheload?
You can view cacheload by executing `show <db_name>.vgroups;`.
You can view cacheload by executing `show <db_name>.vgroups;`.
3. Assessing whether cachesize is sufficient
3. Determine if cachesize is sufficient
If cacheload is very close to cachesize, then cachesize may be too small. If cacheload is significantly less than cachesize, then cachesize is sufficient. You can use this principle to determine whether a modification is necessary. The specific modified value can be determined based on the available system memory.
If cacheload is very close to cachesize, then cachesize may be too small. If cacheload is significantly less than cachesize, then cachesize is sufficient. You can decide whether to modify cachesize based on this principle. The new value can be chosen according to the available system memory, for example by doubling it or increasing it several times; a short sketch follows this list.
4. STT_TRIGGER
4. stt_trigger
Please stop database writes before modifying the stt_trigger parameter.
Please stop database writing before modifying the stt_trigger parameter.
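A minimal sketch of the checks above, assuming a database named `db`; the target value of 64 MB is purely illustrative.

```sql
-- 1. Check the configured cachesize (in MB) of each database
SELECT name, cachesize FROM information_schema.ins_databases;

-- 2. Check the actual cacheload of each vgroup in the database
SHOW db.vgroups;

-- 3. If cacheload is close to cachesize, enlarge cachesize (value is illustrative)
ALTER DATABASE db CACHESIZE 64;
```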
:::note
Other parameters are not supported for modification in version 3.0.0.0.
Other parameters are not supported for modification in version 3.0.0.0
:::
## View Databases
## View Database
### View All Databases in the System
### View all databases in the system
```sql
SHOW DATABASES;
```
### Show the Creation Statement of a Database
### Display a database's creation statement
```sql
SHOW CREATE DATABASE db_name \G;
```
Commonly used for database migration. For an existing database, it returns its creation statement; executing this statement in another cluster will create a Database with identical settings.
Commonly used for database migration. For an existing database, it returns its creation statement; executing this statement in another cluster will result in a Database with the exact same settings.
### View Database Parameters
@ -171,7 +175,7 @@ Commonly used for database migration. For an existing database, it returns its c
SELECT * FROM INFORMATION_SCHEMA.INS_DATABASES WHERE NAME='db_name' \G;
```
This will list the configuration parameters of the specified database, with each row showing a single parameter.
Lists the configuration parameters of the specified database, displaying one parameter per line.
## Delete Expired Data
@ -179,33 +183,33 @@ This will list the configuration parameters of the specified database, with each
TRIM DATABASE db_name;
```
Delete expired data and reorganize data based on the multi-level storage configuration.
Deletes expired data and reorganizes data according to the multi-level storage configuration.
## Flush In-Memory Data
## Flush Memory Data to Disk
```sql
FLUSH DATABASE db_name;
```
Flush data in memory. Executing this command before shutting down a node can avoid data playback after a restart, speeding up the startup process.
Flushes data in memory to disk. Executing this command before shutting down a node can avoid data replay after restart, speeding up the startup process.
## Adjust the Distribution of VNODEs in a VGROUP
## Adjust the Distribution of VNODEs in VGROUP
```sql
REDISTRIBUTE VGROUP vgroup_no DNODE dnode_id1 [DNODE dnode_id2] [DNODE dnode_id3]
```
Adjust the distribution of vnodes in the vgroup according to the given dnode list. Since the maximum number of replicas is 3, a maximum of 3 dnodes can be input.
Adjusts the distribution of vnodes in a vgroup according to the given list of dnodes. Since the maximum number of replicas is 3, a maximum of 3 dnodes can be entered.
## Automatically Adjust the Distribution of VNODEs in a VGROUP
## Automatically Adjust the Distribution of VNODEs in VGROUP
```sql
BALANCE VGROUP
```
Automatically adjusts the distribution of vnodes in all vgroups in the cluster, performing data load balancing at the vnode level.
Automatically adjusts the distribution of vnodes in all vgroups of the cluster, equivalent to performing data load balancing at the vnode level for the cluster.
## View Database Work Status
## Check Database Working Status
```sql
SHOW db_name.ALIVE;
@ -226,4 +230,4 @@ SHOW db_name.disk_info;
```
View the compression ratio and disk usage of the database db_name
This command is essentially equivalent to `select sum(data1 + data2 + data3)/sum(raw_data), sum(data1 + data2 + data3) from information_schema.ins_disk_usage where db_name="dbname"`

View File

@ -1,12 +1,11 @@
---
title: Manage Tables
description: Various management operations on tables
title: Tables
slug: /tdengine-reference/sql-manual/manage-tables
---
## Create Table
The `CREATE TABLE` statement is used to create basic tables and subtables based on supertables as templates.
The `CREATE TABLE` statement is used to create basic tables and subtables using a supertable as a template.
```sql
CREATE TABLE [IF NOT EXISTS] [db_name.]tb_name (create_definition [, create_definition] ...) [table_options]
@ -36,23 +35,24 @@ table_option: {
| SMA(col_name [, col_name] ...)
| TTL value
}
```
**Usage Instructions**
**Usage Notes**
1. The naming rules for table (column) names refer to [Naming Rules](../names/).
2. The maximum length of a table name is 192.
3. The first field of a table must be TIMESTAMP, which is automatically set as the primary key by the system.
4. In addition to the timestamp primary key column, a second column can also be designated as an additional primary key using the PRIMARY KEY keyword. The second column designated as the primary key must be of integer or string type (varchar).
5. The length of each row in a table cannot exceed 48KB (64KB starting from version 3.0.5.0); (Note: Each BINARY/NCHAR/GEOMETRY type column will also occupy an additional 2 bytes of storage space).
6. For the data types BINARY/NCHAR/GEOMETRY, the maximum byte size must be specified, such as BINARY(20), indicating 20 bytes.
7. For the usage of `ENCODE` and `COMPRESS`, please refer to [Column Compression](../manage-data-compression/).
1. For table (column) naming conventions, see [Naming Rules](../names/).
2. The maximum length for table names is 192 characters.
3. The first field of the table must be TIMESTAMP, and the system automatically sets it as the primary key.
4. In addition to the timestamp primary key column, a second column can be designated as an additional primary key column using the PRIMARY KEY keyword. The second column designated as a primary key must be of integer or string type (VARCHAR).
5. The maximum row length of a table cannot exceed 48KB (from version 3.0.5.0 onwards, 64KB); (Note: Each VARCHAR/NCHAR/GEOMETRY type column will also occupy an additional 2 bytes of storage space).
6. When using data types VARCHAR/NCHAR/GEOMETRY, specify the maximum number of bytes, e.g., VARCHAR(20) indicates 20 bytes.
7. For the use of `ENCODE` and `COMPRESS`, please refer to [Column Compression](../manage-data-compression/)
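A minimal sketch of a basic table that follows these notes, with a VARCHAR column and a second column designated as an additional primary key; the table and column names are hypothetical.

```sql
-- ts is the mandatory TIMESTAMP primary key; sn is an additional primary key column
CREATE TABLE IF NOT EXISTS device_events (
  ts  TIMESTAMP,
  sn  VARCHAR(20) PRIMARY KEY,
  val DOUBLE
);
```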
**Parameter Description**
1. COMMENT: Table comment. Applicable to supertables, subtables, and basic tables. The maximum length is 1024 bytes.
2. SMA: Small Materialized Aggregates, provides a custom pre-computation feature based on data blocks. The pre-computation types include MAX, MIN, and SUM. Applicable to supertables/basic tables.
3. TTL: Time to Live, is a parameter used to specify the lifecycle of the table. If this parameter is specified when creating the table, TDengine will automatically delete the table after its existence exceeds the specified TTL. This TTL time is approximate, and the system does not guarantee deletion at the specified time, but it ensures that such a mechanism exists and that it will eventually be deleted. TTL is measured in days, with a range of [0, 2147483647], defaulting to 0, which means no limit; the expiration time is the table creation time plus the TTL time. TTL is not related to the KEEP parameter of the database; if KEEP is smaller than TTL, the data may be deleted before the table is deleted.
1. COMMENT: Table comment. Can be used for supertables, subtables, and basic tables. The maximum length is 1024 bytes.
2. SMA: Small Materialized Aggregates, provides custom pre-computation based on data blocks. Pre-computation types include MAX, MIN, and SUM. Available for supertables/basic tables.
3. TTL: Time to Live, a parameter used by users to specify the lifespan of a table. If this parameter is specified when creating a table, TDengine automatically deletes the table after its existence exceeds the specified TTL time. This TTL time is approximate, the system does not guarantee deletion at the exact time but ensures that such a mechanism exists and will eventually delete it. TTL is measured in days, with a range of [0, 2147483647], defaulting to 0, meaning no limit, with the expiration time being the table creation time plus TTL time. TTL is not associated with the database KEEP parameter; if KEEP is smaller than TTL, data may be deleted before the table is removed.
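For example, a hypothetical staging table that becomes eligible for automatic deletion about 30 days after it is created:

```sql
-- TTL is measured in days; the comment is limited to 1024 bytes
CREATE TABLE IF NOT EXISTS tmp_import (ts TIMESTAMP, val DOUBLE)
  TTL 30 COMMENT 'staging table for a one-off import';
```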
## Create Subtable
@ -68,17 +68,17 @@ CREATE TABLE [IF NOT EXISTS] tb_name USING stb_name TAGS (tag_value1, ...);
CREATE TABLE [IF NOT EXISTS] tb_name USING stb_name (tag_name1, ...) TAGS (tag_value1, ...);
```
This creates a data table using the specified supertable as a template and can also specify values for some tag columns (unspecified tag columns will be set to NULL).
Using the specified supertable as a template, you can create a table while specifying values for only some of the tag columns (tag columns that are not specified are set to NULL).
### Batch Create Subtables
### Batch creation of subtables
```sql
CREATE TABLE [IF NOT EXISTS] tb_name1 USING stb_name TAGS (tag_value1, ...) [IF NOT EXISTS] tb_name2 USING stb_name TAGS (tag_value2, ...) ...;
```
The batch table creation method requires that the data tables be based on a supertable as a template. Within the limits of SQL statement length, it is recommended to control the number of tables in a single statement to be between 1000 and 3000 to achieve optimal table creation speed.
The batch table creation method requires that the tables must use a supertable as a template. Under the premise of not exceeding the SQL statement length limit, it is recommended to control the number of tables created in a single statement between 1000 and 3000 to achieve an ideal table creation speed.
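A hypothetical batch-creation statement, assuming a supertable `meters` whose tags are (location, group_id):

```sql
-- Creates three subtables of the hypothetical supertable meters in a single statement
CREATE TABLE IF NOT EXISTS d1001 USING meters TAGS ('California.SanFrancisco', 2)
             IF NOT EXISTS d1002 USING meters TAGS ('California.SanFrancisco', 3)
             IF NOT EXISTS d1003 USING meters TAGS ('California.LosAngeles', 2);
```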
### Use CSV to Batch Create Subtables
### Using CSV to batch create subtables
```sql
CREATE TABLE [IF NOT EXISTS] USING [db_name.]stb_name (field1_name [, field2_name] ....) FILE csv_file_path;
@ -86,11 +86,11 @@ CREATE TABLE [IF NOT EXISTS] USING [db_name.]stb_name (field1_name [, field2_nam
**Parameter Description**
1. The FILE syntax indicates that the data comes from a CSV file (comma-separated, with each value enclosed in single quotes). The CSV file does not require a header. The CSV file should contain only the table name and tag values. For inserting data, please refer to the data writing chapter.
2. The specified stb_name must already exist for creating the subtable.
3. The order of the field_name list must match the order of the columns in the CSV file. The list cannot contain duplicates and must include `tbname`, which can include zero or more of the already defined tag columns in the supertable. Tag values not included in the list will be set to NULL.
1. FILE syntax indicates that the data comes from a CSV file (comma-separated, with each value enclosed in single quotes), and the CSV file does not need a header. The CSV file should only contain the table name and tag values. If you need to insert data, please refer to the 'Data Writing' section.
2. Create subtables for the specified stb_name, which must already exist.
3. The order of the field_name list must be consistent with the order of the columns in the CSV file. The list must not contain duplicates and must include `tbname`, and it may contain zero or more tag columns already defined in the supertable. Tag values not included in the list will be set to NULL.
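A sketch under these rules, again assuming a hypothetical supertable `meters` with tags (location, group_id) and a CSV file at /tmp/subtables.csv:

```sql
-- /tmp/subtables.csv (no header); the first value maps to tbname, the rest to the listed tags:
--   'd2001','California.SanFrancisco',2
--   'd2002','California.LosAngeles',3
CREATE TABLE IF NOT EXISTS USING meters (tbname, location, group_id) FILE '/tmp/subtables.csv';
```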
## Modify Basic Table
## Modify basic tables
```sql
ALTER TABLE [db_name.]tb_name alter_table_clause
@ -110,47 +110,48 @@ alter_table_option: {
TTL value
| COMMENT 'string_value'
}
```
**Usage Instructions**
The following modifications can be performed on basic tables:
The following modifications can be made to basic tables:
1. ADD COLUMN: Add a column.
2. DROP COLUMN: Delete a column.
3. MODIFY COLUMN: Modify the column definition. If the data column type is variable-length, this command can be used to modify its width, but it can only be increased, not decreased.
3. MODIFY COLUMN: Modify the column definition. If the data column type is variable length, this command can be used to increase its width, but not decrease it.
4. RENAME COLUMN: Change the column name.
5. The primary key column of a basic table cannot be modified, nor can primary key columns be added or deleted using ADD/DROP COLUMN.
5. The primary key columns of basic tables cannot be modified, nor can they be added or removed through ADD/DROP COLUMN.
**Parameter Description**
1. COMMENT: Table comment. Applicable to supertables, subtables, and basic tables. The maximum length is 1024 bytes.
2. TTL: Time to Live, is a parameter used to specify the lifecycle of the table. If this parameter is specified when creating the table, TDengine will automatically delete the table after its existence exceeds the specified TTL. This TTL time is approximate, and the system does not guarantee deletion at the specified time, but it ensures that such a mechanism exists and that it will eventually be deleted. TTL is measured in days, with a range of [0, 2147483647], defaulting to 0, which means no limit; the expiration time is the table creation time plus the TTL time. TTL is not related to the KEEP parameter of the database; if KEEP is smaller than TTL, the data may be deleted before the table is deleted.
1. COMMENT: Table comment. Can be used for supertables, subtables, and basic tables. The maximum length is 1024 bytes.
2. TTL: Time to Live, a parameter used by users to specify the lifespan of a table. If this parameter is specified when creating a table, TDengine automatically deletes the table after its existence exceeds the specified TTL time. This TTL time is approximate, and the system does not guarantee that it will definitely delete the table at that time, but only ensures that there is such a mechanism and it will eventually be deleted. The TTL unit is days, with a range of [0, 2147483647], defaulting to 0, meaning no limit, and the expiration time is the table creation time plus the TTL time. TTL is not related to the database KEEP parameter. If KEEP is smaller than TTL, data may already be deleted before the table is deleted.
### Add Column
### Add column
```sql
ALTER TABLE tb_name ADD COLUMN field_name data_type;
```
### Drop Column
### Delete column
```sql
ALTER TABLE tb_name DROP COLUMN field_name;
```
### Modify Column Width
### Modify column width
```sql
ALTER TABLE tb_name MODIFY COLUMN field_name data_type(length);
```
### Rename Column
### Change column name
```sql
ALTER TABLE tb_name RENAME COLUMN old_col_name new_col_name
```
### Modify Table Lifecycle
### Modify table lifespan
```sql
ALTER TABLE tb_name TTL value
@ -169,7 +170,7 @@ ALTER TABLE [db_name.]tb_name alter_table_clause
alter_table_clause: {
alter_table_options
| SET TAG tag_name = new_tag_value,tag_name2=new_tag2_value...
| SET tag tag_name = new_tag_value,tag_name2=new_tag2_value...
}
alter_table_options:
@ -181,22 +182,22 @@ alter_table_option: {
}
```
**Usage Instructions**
**Usage Notes**
1. For modifications to subtable columns and tags, except for changing tag values, all modifications must be made through the supertable.
1. Modifications to columns and tags of subtables, except for changing tag values, must be done through the supertable.
**Parameter Description**
1. COMMENT: Table comment. Applicable to supertables, subtables, and basic tables. The maximum length is 1024 bytes.
2. TTL: Time to Live, is a parameter used to specify the lifecycle of the table. If this parameter is specified when creating the table, TDengine will automatically delete the table after its existence exceeds the specified TTL. This TTL time is approximate, and the system does not guarantee deletion at the specified time, but it ensures that such a mechanism exists and that it will eventually be deleted. TTL is measured in days, with a range of [0, 2147483647], defaulting to 0, which means no limit; the expiration time is the table creation time plus the TTL time. TTL is not related to the KEEP parameter of the database; if KEEP is smaller than TTL, the data may be deleted before the table is deleted.
1. COMMENT: Table comment. Can be used for supertables, subtables, and regular tables. The maximum length is 1024 bytes.
2. TTL: Time to Live, a parameter used by users to specify the lifespan of a table. If this parameter is specified when creating a table, TDengine automatically deletes the table after its existence exceeds the time specified by TTL. This TTL time is approximate; the system does not guarantee that it will delete the table exactly at that time, but it ensures that there is such a mechanism and it will eventually delete the table. TTL is measured in days, with a range of [0, 2147483647], default is 0, meaning no limit, and the expiration time is the table creation time plus TTL time. TTL is not related to the database KEEP parameter; if KEEP is smaller than TTL, data might be deleted before the table is.
### Modify Subtable Tag Value
```sql
ALTER TABLE tb_name SET TAG tag_name=new_tag_value;
ALTER TABLE tb_name SET tag tag_name=new_tag_value;
```
### Modify Table Lifecycle
### Modify Table Lifespan
```sql
ALTER TABLE tb_name TTL value
@ -208,24 +209,24 @@ ALTER TABLE tb_name TTL value
ALTER TABLE tb_name COMMENT 'string_value'
```
## Drop Table
## Delete Table
You can drop one or more basic tables or subtables in a single SQL statement.
You can delete one or more regular tables or subtables in a single SQL statement.
```sql
DROP TABLE [IF EXISTS] [db_name.]tb_name [, [IF EXISTS] [db_name.]tb_name] ...
```
**Note**: Dropping a table does not immediately free the disk space occupied by that table; instead, it marks the data of that table as deleted, and these data will not appear in queries. However, the release of disk space will be delayed until the system automatically or manually performs data compaction.
**Note**: Deleting a table does not immediately free up the disk space occupied by the table. Instead, the table's data is marked as deleted; it will not appear in queries, but the disk space is not released until the data is reorganized, either automatically by the system or manually by the user (see the sketch below).
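If the space needs to be reclaimed sooner, reorganization can be triggered manually; a minimal sketch, assuming the COMPACT DATABASE command of TDengine 3.0 and a database named `db`:

```sql
-- Manually triggers data reorganization (compaction) for the whole database
COMPACT DATABASE db;
```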
## View Table Information
### Show All Tables
The following SQL statement can list all table names in the current database.
The following SQL statement can list all the table names in the current database.
```sql
SHOW TABLES [LIKE tb_name_wildchar];
SHOW TABLES [LIKE tb_name_wildcard];
```
### Show Table Creation Statement
@ -234,7 +235,7 @@ SHOW TABLES [LIKE tb_name_wildchar];
SHOW CREATE TABLE tb_name;
```
Commonly used for database migration. For an existing table, it returns its creation statement; executing this statement in another cluster will create a table with the same structure.
Commonly used for database migration. For an existing table, it returns its creation statement; executing this statement in another cluster will produce a table with the exact same structure.
### Get Table Structure Information

View File

@ -1,6 +1,5 @@
---
title: Manage Supertables
description: Various management operations on supertables
title: Supertables
slug: /tdengine-reference/sql-manual/manage-supertables
---
@ -24,43 +23,43 @@ table_option: {
}
```
**Usage Instructions**
**Instructions**
1. The maximum number of columns in a supertable is 4096, including tag columns; the minimum number is 3, which includes one timestamp primary key, one tag column, and one data column.
2. In addition to the timestamp primary key column, a second column can also be designated as an additional primary key using the PRIMARY KEY keyword. The second column designated as the primary key must be of integer or string type (varchar).
3. The `TAGS` syntax specifies the tag columns for the supertable, which must adhere to the following conventions:
- The TIMESTAMP data type in a tag column requires a provided value when writing data and does not support arithmetic operations, such as NOW + 10s.
- Tag column names cannot be the same as other column names.
- Tag column names cannot be reserved keywords.
- A supertable allows a maximum of 128 tag columns and requires at least 1, with a total length not exceeding 16 KB.
4. For the usage of `ENCODE` and `COMPRESS`, please refer to [Column Compression](../manage-data-compression/).
5. For parameter descriptions in table_options, please refer to [Table Creation SQL Description](../manage-tables/).
1. The maximum number of columns in a supertable is 4096, including tag columns, with a minimum of 3 columns: a timestamp primary key, one tag column, and one data column.
2. Besides the timestamp primary key column, a second column can be designated as an additional primary key using the PRIMARY KEY keyword. This second primary key column must be of integer or string type (varchar).
3. TAGS syntax specifies the label columns of the supertable, which must adhere to the following conventions:
- The TIMESTAMP column in TAGS requires a given value when writing data and does not support arithmetic operations, such as expressions like NOW + 10s.
- TAGS column names cannot be the same as other column names.
- TAGS column names cannot be reserved keywords.
- TAGS can have up to 128 columns and must have at least 1, with a total length not exceeding 16 KB.
4. For the use of `ENCODE` and `COMPRESS`, please refer to [Column Compression](../manage-data-compression/)
5. For explanations of parameters in table_option, please refer to [Table SQL Description](../manage-tables/)
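A minimal supertable sketch that satisfies these notes; the names and sizes are hypothetical.

```sql
-- At least one TIMESTAMP primary key, one data column, and one tag column are required
CREATE STABLE IF NOT EXISTS meters (
  ts      TIMESTAMP,
  current FLOAT,
  voltage INT
) TAGS (
  location VARCHAR(64),
  group_id INT
);
```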
## View Supertable
## View Supertables
### Show All Supertables in the Current Database
### Show all supertable information in the current database
```sql
SHOW STABLES [LIKE tb_name_wildcard];
```
View all supertables within the database.
View all supertables in the database.
### Show Creation Statement of a Supertable
### Show the creation statement of a supertable
```sql
SHOW CREATE STABLE stb_name;
```
Commonly used for database migration. For an existing supertable, it returns its creation statement; executing this statement in another cluster will create a supertable with the same structure.
Commonly used for database migration. For an existing supertable, it returns its creation statement; executing this statement in another cluster will create a supertable with the exact same structure.
### Get Structure Information of a Supertable
### Get the structure information of a supertable
```sql
DESCRIBE [db_name.]stb_name;
```
### Get Tag Information of All Subtables in a Supertable
### Get the tag information of all subtables in a supertable
```sql
SHOW TABLE TAGS FROM table_name [FROM db_name];
@ -77,9 +76,9 @@ taos> SHOW TABLE TAGS FROM st1;
Query OK, 3 rows in database (0.004455s)
```
The first column of the returned result set is the subtable name, and the subsequent columns are the tag columns.
The first column of the result set is the subtable name, followed by columns for tags.
If you already know the tag column names, you can use the following statement to obtain the values of the specified tag columns.
If you already know the name of the tag column, you can use the following statement to get the value of a specific tag column.
```text
taos> SELECT DISTINCT TBNAME, id FROM st1;
@ -91,9 +90,9 @@ taos> SELECT DISTINCT TBNAME, id FROM st1;
Query OK, 3 rows in database (0.002891s)
```
Note that the DISTINCT in the SELECT statement and TBNAME are both essential; TDengine will optimize the statement based on these to ensure that tag values are returned correctly and quickly, regardless of whether there is no data or the data volume is very large.
It should be noted that both DISTINCT and TBNAME in the SELECT statement are essential. TDengine optimizes the statement based on them, allowing it to return tag values correctly and quickly, whether there is no data or an abundance of data.
### Get Tag Information of a Specific Subtable
### Retrieve tag information for a specific subtable
```text
taos> SHOW TAGS FROM st1s1;
@ -104,7 +103,7 @@ taos> SHOW TAGS FROM st1s1;
Query OK, 2 rows in database (0.003684s)
```
Similarly, you can also use a SELECT statement to query the values of specified tag columns.
Similarly, you can also use the SELECT statement to query the values of specified tag columns.
```text
taos> SELECT DISTINCT TBNAME, id, loc FROM st1s1;
@ -114,19 +113,15 @@ taos> SELECT DISTINCT TBNAME, id, loc FROM st1s1;
Query OK, 1 rows in database (0.001884s)
```
## Drop Supertable
## Delete Supertable
```sql
DROP STABLE [IF EXISTS] [db_name.]stb_name
```
Dropping an STABLE will automatically delete all subtables created through the STABLE and all data within those subtables.
Deleting an STable automatically removes the subtables created through the STable and all data within those subtables.
:::note
Dropping a supertable does not immediately release the disk space occupied by that table; instead, it marks the data of that table as deleted, and these data will not appear in queries. However, the release of disk space will be delayed until the system automatically or manually performs data compaction.
:::
**Note**: Deleting a supertable does not immediately free up the disk space it occupies. Instead, the data of the table is marked as deleted. These data will not appear in queries, but the release of disk space will be delayed until the system automatically or the user manually reorganizes the data.
## Modify Supertable
@ -138,10 +133,10 @@ alter_table_clause: {
| ADD COLUMN col_name column_type
| DROP COLUMN col_name
| MODIFY COLUMN col_name column_type
| ADD TAG tag_name tag_type
| DROP TAG tag_name
| MODIFY TAG tag_name tag_type
| RENAME TAG old_tag_name new_tag_name
| ADD tag tag_name tag_type
| DROP tag tag_name
| MODIFY tag tag_name tag_type
| RENAME tag old_tag_name new_tag_name
}
alter_table_options:
@ -150,20 +145,21 @@ alter_table_options:
alter_table_option: {
COMMENT 'string_value'
}
```
**Usage Instructions**
Modifying the structure of a supertable will take effect on all its subtables. It is not possible to modify the table structure for a specific subtable. Changes to tag structures must be issued on the supertable, and TDengine will automatically apply these changes to all subtables of that supertable.
Modifying the structure of a supertable affects all its subtables. It is not possible to modify the table structure for a specific subtable. Modifications to the tag structure need to be issued to the supertable, and TDengine will automatically apply them to all subtables of this supertable.
- ADD COLUMN: Add a column.
- DROP COLUMN: Remove a column.
- MODIFY COLUMN: Change the width of a column; the data column type must be nchar or binary, and this instruction can be used to increase its width only, not decrease it.
- ADD TAG: Add a tag to the supertable.
- DROP TAG: Remove a tag from the supertable. When a tag is removed from the supertable, all subtables of that supertable will automatically delete that tag as well.
- MODIFY TAG: Change the width of a tag in the supertable. The tag type can only be nchar or binary, and this instruction can be used to increase its width only, not decrease it.
- RENAME TAG: Change the name of a tag in the supertable. When a tag name is changed in the supertable, all subtables of that supertable will automatically update that tag name as well.
- Similar to basic tables, the primary key column of a supertable cannot be modified, nor can primary key columns be added or deleted using ADD/DROP COLUMN.
- DROP COLUMN: Delete a column.
- MODIFY COLUMN: Modify the width of a column. The data column type must be nchar or binary, and this command can be used to increase the width, but not decrease it.
- ADD tag: Add a tag to the supertable.
- DROP tag: Remove a tag from the supertable. After a tag is removed from a supertable, it is automatically deleted from all its subtables.
- MODIFY tag: Modify the width of a tag in the supertable. The tag type can only be nchar or binary, and this command can be used to increase the width, but not decrease it.
- RENAME tag: Change the name of a tag in the supertable. After a tag name is changed in a supertable, all its subtables automatically update to the new tag name.
- Like basic tables, the primary key columns of a supertable cannot be modified, nor can primary key columns be added or removed through ADD/DROP COLUMN.
### Add Column
@ -171,7 +167,7 @@ Modifying the structure of a supertable will take effect on all its subtables. I
ALTER STABLE stb_name ADD COLUMN col_name column_type;
```
### Drop Column
### Delete Column
```sql
ALTER STABLE stb_name DROP COLUMN col_name;
@ -183,48 +179,47 @@ ALTER STABLE stb_name DROP COLUMN col_name;
ALTER STABLE stb_name MODIFY COLUMN col_name data_type(length);
```
If the data column type is variable-length (BINARY or NCHAR), this instruction can be used to modify its width (only to increase, not decrease).
If the data column type is variable length (BINARY or NCHAR), this command can be used to modify its width (can only increase, not decrease).
### Add Tag
```sql
ALTER STABLE stb_name ADD TAG tag_name tag_type;
ALTER STABLE stb_name ADD tag tag_name tag_type;
```
Add a new tag to the supertable and specify the type of the new tag. The total number of tags cannot exceed 128, with a total length not exceeding 16KB.
Add a new tag to an STable and specify the type of the new tag. The total number of tags cannot exceed 128, and the total length cannot exceed 16KB.
### Drop Tag
### Delete Tag
```sql
ALTER STABLE stb_name DROP TAG tag_name;
ALTER STABLE stb_name DROP tag tag_name;
```
Remove a tag from the supertable. When a tag is removed from the supertable, all subtables of that supertable will automatically delete that tag.
Delete a tag from a supertable; after a tag is deleted from a supertable, all subtables under that supertable will automatically delete that tag.
### Rename Tag
```sql
ALTER STABLE stb_name RENAME TAG old_tag_name new_tag_name;
ALTER STABLE stb_name RENAME tag old_tag_name new_tag_name;
```
Change the name of a tag in the supertable. When a tag name is changed in the supertable, all subtables of that supertable will automatically update that tag name as well.
Change the name of a tag in a supertable; after a tag name is changed in a supertable, all subtables under that supertable will automatically update that tag name.
### Modify Tag Width
### Modify Tag Column Width
```sql
ALTER STABLE stb_name MODIFY TAG tag_name data_type(length);
ALTER STABLE stb_name MODIFY tag tag_name data_type(length);
```
If the tag type is variable-length (BINARY or NCHAR), this instruction can be used to modify its width (only to increase, not decrease). (Added in version 2.1.3.0)
If the tag type is variable length (BINARY or NCHAR), this command can be used to modify its width (can only increase, not decrease). (Added in version 2.1.3.0)
### Supertable Query
You can use the SELECT statement to perform projection and aggregation queries on supertables. In the WHERE clause, you can filter and select tags and columns.
Using the SELECT statement, you can perform projection and aggregation queries on a supertable. In the WHERE clause, you can filter and select based on tags and columns.
If no ORDER BY clause is added in the supertable query, the returned order is to first return all data of one subtable, and then return all data of the next subtable, so the returned data is unordered. If an ORDER BY clause is added, the data will be returned strictly in the order specified by the ORDER BY clause.
If the supertable query statement does not include ORDER BY, the return order is to return all data from one subtable first, then all data from the next subtable, so the returned data is unordered. If an ORDER BY clause is added, the data will be returned strictly in the order specified by the ORDER BY clause.
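For example, a hedged sketch against a hypothetical supertable `meters` with a `location` tag and a `current` data column:

```sql
-- Filter on a tag and a data column, then force a global ordering by timestamp
SELECT tbname, ts, current
FROM meters
WHERE location = 'California.SanFrancisco' AND current > 10
ORDER BY ts DESC
LIMIT 100;
```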
:::note
Except for updating the tag values, which operate on subtables, all other tag operations (adding tags, dropping tags, etc.) can only act on the STABLE and cannot operate on individual subtables. After adding a tag to the STABLE, all tables established based on that STABLE will automatically have the new tag, and all newly added tags will have their default values set to NULL.
Except for updating the value of tags, which is done on subtables, all other tag operations (adding tags, deleting tags, etc.) can only be applied to STables and not to individual subtables. After adding tags to an STable, all tables built on that STable will automatically have a new tag added, and the default value for all new tags is NULL.
:::

View File

@ -1,12 +1,11 @@
---
title: Insert Data
description: Detailed syntax for writing data
title: Data Ingestion
slug: /tdengine-reference/sql-manual/insert-data
---
## Insertion Syntax
## Writing Syntax
Data insertion supports two syntax types: normal syntax and supertable syntax. In normal syntax, the table name immediately following `INSERT INTO` is either a subtable name or a basic table name. In supertable syntax, the table name immediately following `INSERT INTO` is the supertable name.
There are two syntaxes supported for writing records: normal syntax and supertable syntax. Under normal syntax, the table name immediately following INSERT INTO is either a subtable name or a regular table name. Under supertable syntax, the table name immediately following INSERT INTO is a supertable name.
### Normal Syntax
@ -29,65 +28,64 @@ INSERT INTO tb_name [(field1_name, ...)] subquery
```sql
INSERT INTO
stb1_name [(field1_name, ...)]
stb1_name [(field1_name, ...)]
VALUES (field1_value, ...) [(field1_value2, ...) ...] | FILE csv_file_path
[stb2_name [(field1_name, ...)]
[stb2_name [(field1_name, ...)]
VALUES (field1_value, ...) [(field1_value2, ...) ...] | FILE csv_file_path
...];
```
**About Timestamps**
1. TDengine requires that the inserted data must have timestamps. Please note the following points regarding the timestamps used for data insertion:
1. TDengine requires that inserted data must have timestamps. Pay attention to the following points regarding the timestamps:
2. Different formats of timestamps will have different precision impacts. The string format of timestamps is not affected by the time precision settings of the DATABASE; however, the long integer format of timestamps is influenced by the time precision settings of the DATABASE. For example, the UNIX seconds for the timestamp "2021-07-13 16:16:48" is 1626164208. Therefore, under millisecond precision, it should be written as 1626164208000, under microsecond precision it should be written as 1626164208000000, and under nanosecond precision, it should be written as 1626164208000000000.
2. Different timestamp formats can affect precision differently. String format timestamps are not affected by the precision setting of the DATABASE they belong to; however, long integer format timestamps are affected by the DATABASE's precision setting. For example, the UNIX seconds for the timestamp "2021-07-13 16:16:48" is 1626164208. Therefore, it needs to be written as 1626164208000 in millisecond precision, 1626164208000000 in microsecond precision, and 1626164208000000000 in nanosecond precision.
3. When inserting multiple rows of data, do not set the values of the timestamp column to NOW. Otherwise, multiple records in the statement will have the same timestamp, which may result in overlaps that prevent all data rows from being saved correctly. This is because the NOW function will be parsed as the client execution time of the SQL statement, and multiple occurrences of NOW in the same statement will be replaced with the exact same timestamp value.
The earliest allowed timestamp for inserted records is relative to the current server time minus the configured KEEP value (the number of days data is retained, which can be specified when creating the database; the default value is 3650 days). The latest allowed timestamp for inserted records depends on the DATABASE's PRECISION value (timestamp precision, which can be specified when creating the database; ms indicates milliseconds, us indicates microseconds, ns indicates nanoseconds, with the default being milliseconds): if it is milliseconds or microseconds, the value is UTC 00:00:00.000 on January 1, 2970, plus 1000 years; if it is nanoseconds, the value is UTC 00:00:00.000000000 on January 1, 2262, plus 292 years.
3. When inserting multiple rows of data at once, do not set the value of the first column's timestamp to NOW for all rows. This will cause multiple records in the statement to use the same timestamp, potentially leading to data overwriting and not all rows being correctly saved. This happens because the NOW function is resolved to the client execution time of the SQL statement, and multiple NOW markers in the same statement will be replaced with the exact same timestamp value.
The oldest record timestamp allowed for insertion is relative to the current server time, minus the configured KEEP value (the number of days data is retained, which can be specified when creating the database, default is 3650 days). The newest record timestamp allowed for insertion depends on the database's PRECISION value (timestamp precision, which can be specified when creating the database, ms for milliseconds, us for microseconds, ns for nanoseconds, default is milliseconds): if it is milliseconds or microseconds, the value is January 1, 1970, 00:00:00.000 UTC plus 1000 years, i.e., January 1, 2970, 00:00:00.000 UTC; if it is nanoseconds, the value is January 1, 1970, 00:00:00.000000000 UTC plus 292 years, i.e., January 1, 2262, 00:00:00.000000000 UTC.
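As an illustration of point 2, the same kind of write can be expressed with either timestamp format; the table `d1001` and its (current, voltage, phase) columns are hypothetical, and the database is assumed to use millisecond precision.

```sql
-- String-format timestamp: unaffected by the database precision setting
INSERT INTO d1001 VALUES ('2021-07-13 16:16:48.000', 10.3, 219, 0.31);
-- Long-integer timestamp: must match the database precision (milliseconds here)
INSERT INTO d1001 VALUES (1626164209000, 10.5, 218, 0.33);
```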
**Syntax Explanation**
1. You can specify the columns to insert values into; columns that are not specified are automatically filled with NULL by the database.
2. The VALUES syntax specifies the row or rows of data to be inserted.
3. The FILE syntax indicates that the data comes from a CSV file (comma-separated, with each value enclosed in single quotes); the CSV file does not require a header. If you only need to create subtables, refer to the 'Table' section.
4. Both `INSERT ... VALUES` and `INSERT ... FILE` statements can insert data into multiple tables in a single INSERT statement.
5. INSERT statements are fully parsed before execution, which prevents situations where a data error occurs but the table is still created. For example, the following statement neither writes data nor creates the table:
```sql
INSERT INTO d1001 USING meters TAGS('Beijing.Chaoyang', 2) VALUES('a');
```
6. When inserting data into multiple subtables, some writes may still fail while others succeed. This is because the subtables may be distributed across different VNODEs: the client fully parses the INSERT statement and sends the data to each involved VNODE, and each VNODE performs its write independently. If one VNODE fails to write (for example, because of a network problem or disk failure), writes to the other VNODEs are not affected.
7. The primary key column value must be specified and cannot be NULL.
**Normal Syntax Explanation**
1. The USING clause enables automatic table creation. If you are unsure whether a table exists when writing data, you can use this syntax to create the table during the write; if the table already exists, no new table is created. Automatic table creation requires a supertable as the template and the tag values of the data table to be specified. It is also possible to specify only some of the tag columns; unspecified tag columns are set to NULL.
2. You can use the `INSERT ... subquery` statement to insert data from TDengine into a specified table. The subquery can be any query statement. This syntax can only be used with subtables and basic tables and does not support automatic table creation.
**Supertable Syntax Explanation**
1. The tbname column must be specified in the field_name list; otherwise, an error occurs. The tbname column holds the subtable name and is of string type. Its characters do not need to be escaped and must not include the dot character '.'.
2. The field_name list supports tag columns. If the subtable already exists, specifying tag values does not change the existing tag values; if the subtable does not exist, it is created with the specified tag values. If no tag columns are specified, all tag values are set to NULL.
3. Parameter binding for writes is not supported.
## Inserting a Record
Specify the name of an already created subtable and provide one or more rows of data with the VALUES keyword to write them to the database. For example, the following statement writes a single record:
```sql
INSERT INTO d1001 VALUES (NOW, 10.2, 219, 0.32);
```
## Inserting Multiple Records
Alternatively, you can write two records with the following statement:
INSERT INTO d1001 VALUES ('2021-07-13 14:06:32.272', 10.2, 219, 0.32) (1626164208000, 10.15, 217, 0.33);
```
## Specifying Columns for Insertion
When inserting records into a subtable, whether one row or multiple rows, you can map the data to specific columns. Columns not mentioned in the SQL statement are automatically filled with NULL. The primary key (timestamp) cannot be NULL. For example:
```sql
INSERT INTO d1001 (ts, current, phase) VALUES ('2021-07-13 14:06:33.196', 10.27, 0.31);
```
## Inserting Records into Multiple Tables
You can insert one or more records into multiple tables in a single statement, and you can also specify columns while doing so. For example:
```sql
INSERT INTO d1001 VALUES ('2021-07-13 14:06:34.630', 10.2, 219, 0.32) ('2021-07-13 14:06:35.779', 10.15, 217, 0.33)
d1002 (ts, current, phase) VALUES ('2021-07-13 14:06:34.255', 10.27, 0.31);
```
## Automatic Table Creation During Record Insertion
If you are unsure whether a table exists when writing data, you can use the automatic table creation syntax to create the table during the write; if the table already exists, no new table is created. Automatic table creation requires a supertable as the template and the tag values of the data table to be specified. For example:
```sql
INSERT INTO d21001 USING meters TAGS ('California.SanFrancisco', 2) VALUES ('2021-07-13 14:06:32.272', 10.2, 219, 0.32);
```
You can also specify only some of the tag column values during automatic table creation; unspecified tag columns are set to NULL. For example:
```sql
INSERT INTO d21001 USING meters (groupId) TAGS (2) VALUES ('2021-07-13 14:06:33.196', 10.15, 217, 0.33);
```
The automatic table creation syntax also supports inserting records into multiple tables in a single statement. For example:
```sql
INSERT INTO d21001 USING meters TAGS ('California.SanFrancisco', 2) VALUES ('2021-07-13 14:06:34.630', 10.2, 219, 0.32) ('2021-07-13 14:06:35.779', 10.15, 217, 0.33)
d21003 USING meters (groupId) TAGS (2) (ts, current, phase) VALUES ('2021-07-13 14:06:34.255', 10.27, 0.31);
```
## Inserting Data Records from a File
In addition to using the VALUES keyword to insert one or more rows of data, you can also place the data to be written in a CSV file (comma-separated, with timestamp and string values enclosed in single quotes) for the SQL statement to read. The CSV file does not require a header. For example, if the content of the file `/tmp/csvfile.csv` is:
```csv
'2021-07-13 14:07:34.630', 10.2, 219, 0.32
'2021-07-13 14:07:35.779', 10.15, 217, 0.33
```
Then the data in this file can be written to the subtable with the following statement:
```sql
INSERT INTO d1001 FILE '/tmp/csvfile.csv';
```
## Inserting Data Records from a File and Automatically Creating Tables
```sql
INSERT INTO d21001 USING meters TAGS ('California.SanFrancisco', 2) FILE '/tmp/csvfile.csv';
```
You can also insert records into multiple tables in a single statement with automatic table creation. For example:
```sql
INSERT INTO d21001 USING meters TAGS ('California.SanFrancisco', 2) FILE '/tmp/csvfile_21001.csv'
d21002 USING meters (groupId) TAGS (2) FILE '/tmp/csvfile_21002.csv';
```
## Inserting Data into a Supertable and Automatically Creating Subtables

When tables are created automatically, the subtable names are specified through the tbname column.
```sql
INSERT INTO meters(tbname, location, groupId, ts, current, voltage, phase)
VALUES ('d31001', 'California.SanFrancisco', 2, '2021-07-13 14:06:34.630', 10.2, 219, 0.32)
('d31001', 'California.SanFrancisco', 2, '2021-07-13 14:06:35.779', 10.15, 217, 0.33)
('d31002', NULL, 2, '2021-07-13 14:06:34.255', 10.15, 217, 0.33)
```
## Inserting Data into a Supertable from a CSV File and Automatically Creating Subtables
Create subtables for the supertable based on the contents of the CSV file, and populate the corresponding columns and tag values:
```sql
INSERT INTO meters(tbname, location, groupId, ts, current, voltage, phase)
FILE '/tmp/csvfile_21002.csv'
```


---
title: Data Querying
slug: /tdengine-reference/sql-manual/query-data
---
```sql
SELECT [hints] [DISTINCT] [TAGS] select_list
[interp_clause]
[window_clause]
[group_by_clause]
[order_by_clause]
[SLIMIT limit_val [SOFFSET offset_val]]
[LIMIT limit_val [OFFSET offset_val]]
[>> export_file]
partition_by_expr:
group_by_clause:
GROUP BY group_by_expr [, group_by_expr] ... HAVING condition
group_by_expr:
{expr | position | c_alias}
order_by_clause:
ORDER BY order_expr [, order_expr] ...
order_expr:
```
## Hints
Hints are a mechanism for users to control query optimization for an individual statement. A hint that does not apply to the current query is ignored automatically. The details are as follows:
- Hints syntax starts with `/*+` and ends with `*/`; spaces may appear before and after.
- Hints syntax can only follow the SELECT keyword.
- Each hints comment can contain multiple hints, separated by spaces. When hints conflict or are identical, the first one takes effect.
- If an error occurs in one hint, the valid hints before it remain effective, and the erroneous hint and all subsequent hints are ignored.
- hint_param_list is the parameter list of each hint and varies with the hint type.
The currently supported hints are listed below:
| **Hint** | **Parameters** | **Description** | **Applicable Scope** |
| :-----------: | -------------- | -------------------------- | -----------------------------|
| BATCH_SCAN | None | Use batch table reading | Supertable JOIN statements |
| NO_BATCH_SCAN | None | Use sequential table reading | Supertable JOIN statements |
| SORT_FOR_GROUP | None | Use sorting for grouping; conflicts with PARTITION_FIRST | When the partition by list contains ordinary columns |
| PARTITION_FIRST | None | Use PARTITION to compute groups before aggregation; conflicts with SORT_FOR_GROUP | When the partition by list contains ordinary columns |
| PARA_TABLES_SORT | None | When sorting supertable data by timestamp, use only memory and no temporary disk space. With many subtables or long rows this can consume a lot of memory and may cause OOM | When sorting supertable data by timestamp |
| SMALLDATA_TS_SORT | None | When sorting supertable data by timestamp, if the queried column length is 256 bytes or more but the number of rows is small, this hint can improve performance | When sorting supertable data by timestamp |
| SKIP_TSMA | None | Explicitly disable TSMA query optimization | Queries with aggregate functions |
Examples:
```sql
SELECT /*+ BATCH_SCAN() */ a.ts FROM stable1 a, stable2 b WHERE a.tag0 = b.tag0 AND a.ts = b.ts;
SELECT /*+ SORT_FOR_GROUP() */ COUNT(*), c1 FROM stable1 PARTITION BY c1;
SELECT /*+ PARTITION_FIRST() */ COUNT(*), c1 FROM stable1 PARTITION BY c1;
SELECT /*+ PARA_TABLES_SORT() */ * FROM stable1 ORDER BY ts;
SELECT /*+ SMALLDATA_TS_SORT() */ * FROM stable1 ORDER BY ts;
```
## Lists
A query statement can specify some or all columns as the return results. Both data columns and tag columns can appear in the list.
### Wildcards
The wildcard `*` can be used to refer to all columns. For basic tables and subtables, the result contains only ordinary columns; for supertables, tag columns are also included.
```sql
SELECT * FROM d1001;
```
The wildcard supports a table name prefix; the following two SQL statements both return all columns:
```sql
SELECT * FROM d1001;
SELECT d1001.* FROM d1001;
```
In JOIN queries, a prefixed `*` and an unprefixed `*` return different results: `*` alone returns all columns of all tables (excluding tags), while a table-prefixed wildcard returns only the columns of that table.
```sql
SELECT * FROM d1001, d1003 WHERE d1001.ts = d1003.ts;
SELECT d1001.* FROM d1001, d1003 WHERE d1001.ts = d1003.ts;
```
In the above statements, the first returns all columns of both d1001 and d1003, while the second returns only the columns of d1001.
Some SQL functions support wildcard arguments, with a difference in behavior: `count(*)` returns only one column, while `first`, `last`, and `last_row` return all columns.
### Tag Columns
In queries on supertables and subtables, you can specify tag columns, and the tag values are returned along with the data of the ordinary columns.
```sql
SELECT location, groupid, current FROM d1001 LIMIT 2;
```
### Aliases
The naming rules for aliases are the same as for columns; UTF-8 encoded Chinese aliases are supported.
### Result Deduplication
The `DISTINCT` keyword deduplicates one or more columns in the result set; the columns can be tag columns or data columns.
Deduplicating tag columns:
```sql
SELECT DISTINCT tag_name [, tag_name ...] FROM stb_name;
```
Deduplicating data columns:
```sql
SELECT DISTINCT col_name [, col_name ...] FROM tb_name;
```
:::info
1. The configuration parameter `maxNumOfDistinctRes` in the cfg file limits the number of rows that DISTINCT can output. The minimum value is 100000, the maximum is 100000000, and the default is 10000000. If the actual result exceeds this limit, only a portion within the limit is output.
2. Due to the inherent precision mechanism of floating-point numbers, using DISTINCT on FLOAT and DOUBLE columns does not guarantee completely unique output values in all cases.
:::
### Tag Queries
When only tag columns are queried, the `TAGS` keyword specifies that the tag columns of all subtables are returned, with exactly one row of tag values per subtable.
Return the tag columns of all subtables:
```sql
SELECT TAGS tag_name [, tag_name ...] FROM stb_name
```
### Result Set Column Names
In the `SELECT` clause, if no column names are specified for the result set, the result set's column names default to the expression names in the `SELECT` clause. You can also use `AS` to rename the columns of the result set. For example:
```sql
taos> SELECT ts, ts AS primary_key_ts FROM d1001;
```
However, `first(*)`, `last(*)`, and `last_row(*)` do not support renaming individual columns.
### Pseudo Columns
**Pseudo columns**: Pseudo columns behave like ordinary data columns but are not actually stored in the table. They can be queried but cannot be inserted, updated, or deleted. A pseudo column is somewhat like a function without parameters. The available pseudo columns are listed below:
**TBNAME**
`TBNAME` can be regarded as a special tag of a supertable, representing the name of a subtable.
Retrieve all subtable names and related tag information from a supertable:
```sql
SELECT TAGS TBNAME, location FROM meters;
```
It is recommended to query the tag information of a supertable's subtables through the INS_TAGS system table under INFORMATION_SCHEMA. For example, to retrieve all subtable names and tag values of the supertable meters:
```mysql
SELECT table_name, tag_name, tag_type, tag_value FROM information_schema.ins_tags WHERE stable_name='meters';
```
Count the number of subtables under a supertable:
```mysql
SELECT COUNT(*) FROM (SELECT DISTINCT TBNAME FROM meters);
```
Both of the above queries only support filtering conditions on tags (TAGS) in the WHERE clause.
**\_QSTART/\_QEND**
\_qstart and \_qend represent the query time range specified by the primary key timestamp conditions in the WHERE clause. If the WHERE clause contains no valid primary key timestamp condition, the time range is [-2^63, 2^63-1].
\_qstart and \_qend cannot be used in the WHERE clause.
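A minimal sketch of how these pseudo columns might be used, assuming the meters supertable from the earlier examples; the WHERE clause supplies the primary key time range that \_qstart and \_qend report back:

```sql
-- _qstart and _qend echo the time range given by the WHERE clause on ts.
SELECT _qstart, _qend, ts, current FROM meters
WHERE ts >= '2021-07-13 00:00:00' AND ts < '2021-07-14 00:00:00';
```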
**\_WSTART/\_WEND/\_WDURATION**
The \_wstart, \_wend, and \_wduration pseudo columns represent the window start timestamp, the window end timestamp, and the window duration, respectively.

These three pseudo columns can only be used in time window queries and must appear after the window clause.
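As an illustrative sketch (reusing the meters supertable assumed throughout this page), a 10-minute interval query that returns the window boundaries alongside an aggregate:

```sql
-- Window pseudo columns appear in the SELECT list; the INTERVAL clause defines the windows.
SELECT _wstart, _wend, _wduration, AVG(current) FROM meters
WHERE ts >= '2021-07-13 00:00:00' AND ts < '2021-07-14 00:00:00'
INTERVAL(10m);
```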
**\_c0/\_ROWTS**
In TDengine, the first column of every table must be of timestamp type and serves as the primary key. The pseudo columns \_rowts and \_c0 both represent the value of this column. Compared with referring to the actual primary key column, the pseudo columns are more flexible and semantically clearer; for example, they can be used with functions such as max and min.
```sql
SELECT _rowts, MAX(current) FROM meters;
```
**\_IROWTS**
The \_irowts pseudo column can only be used with the interp function, to return the timestamps corresponding to the interpolation results of interp.
```sql
SELECT _irowts, interp(current) FROM meters RANGE('2020-01-01 10:00:00', '2020-01-01 10:30:00') EVERY(1s) FILL(linear);
```
**\_IROWTS\_ORIGIN**

The `_irowts_origin` pseudo column can only be used with the interp function. It is not supported in stream computing and applies only to the FILL types PREV, NEXT, and NEAR. It returns the timestamp of the original row used by interp for filling; if there is no data within the range, it returns NULL.
```sql
SELECT _irowts_origin, interp(current) FROM meters RANGE('2020-01-01 10:00:00', '2020-01-01 10:30:00') EVERY(1s) FILL(PREV);
```
## Query Objects
The FROM keyword can be followed by a list of tables (or supertables) or by the result of a subquery.

If the current database is not specified, a table can be qualified with its database name, for example `power.d1001`, to access tables across databases.
TDengine supports INNER JOIN based on the timestamp primary key, with the following rules:
1. Both the table-list form after FROM and the explicit JOIN clause form are supported (both forms are shown in the sketch after this list).
2. For basic tables and subtables, the ON condition must contain, and only contain, equality conditions on the timestamp primary key.
3. For supertables, in addition to the equality condition on the timestamp primary key, the ON condition must also contain equality conditions on tag columns that correspond one-to-one; OR conditions are not supported.
4. The tables involved in a JOIN must be of the same type, i.e., all supertables, all subtables, or all basic tables.
5. Both sides of a JOIN support subqueries.
6. JOIN cannot be mixed with the FILL clause.
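The following minimal sketch shows the two equivalent forms from rule 1, assuming two subtables d1001 and d1002 of the meters supertable used elsewhere on this page:

```sql
-- Table-list form: the timestamp equality in WHERE drives the INNER JOIN.
SELECT a.ts, a.current, b.current FROM d1001 a, d1002 b WHERE a.ts = b.ts;

-- Explicit JOIN clause form with the same semantics.
SELECT a.ts, a.current, b.current FROM d1001 a JOIN d1002 b ON a.ts = b.ts;
```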
## GROUP BY
If a GROUP BY clause is specified in a statement, the SELECT list can only contain the following expressions:
1. Constants
2. Aggregate functions
3. Expressions identical to the expressions after GROUP BY
4. Expressions containing the above expressions
The GROUP BY clause groups rows according to the values of the expressions after GROUP BY and returns one summary row per group.

The GROUP BY clause can group by any column of a table or view by specifying its name; the column does not need to appear in the SELECT list.

The GROUP BY clause can use positional syntax: the position is a positive integer starting from 1, indicating the nth expression in the SELECT list to group by.

The GROUP BY clause can use result set column names, indicating grouping by the corresponding expression in the SELECT list.

When positional syntax or result set column names are used in the GROUP BY clause, the corresponding expressions in the SELECT list cannot be aggregate functions.

This clause groups rows but does not guarantee the order of the result set. To sort the groups, use the ORDER BY clause.
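A brief illustration of grouping by column name and by position, assuming the meters supertable with its location tag:

```sql
-- Group by the location tag, referenced by name and then by its position in the SELECT list.
SELECT location, AVG(current) FROM meters GROUP BY location;
SELECT location, AVG(current) FROM meters GROUP BY 1;
```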
## PARTITION BY
The PARTITION BY clause is a distinctive syntax introduced in TDengine 3.0. It partitions data according to part_list, so that various calculations can be performed within each partition.

PARTITION BY is similar in meaning to GROUP BY: both group data by a specified list and then perform calculations. The difference is that PARTITION BY does not impose the restrictions of the GROUP BY clause on the SELECT list; any operation is allowed within a group (constants, aggregates, scalars, expressions, and so on). PARTITION BY is therefore fully compatible with GROUP BY, and anywhere a GROUP BY clause is used it can be replaced with PARTITION BY. Note that for non-aggregate queries the results of the two may differ.

Because PARTITION BY does not require returning a single aggregated row per group, it also supports various window operations after partitioning, and all window operations that require grouping can only use the PARTITION BY clause.
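As a minimal sketch of a window operation after partitioning (something GROUP BY cannot express), assuming the meters supertable:

```sql
-- Per-location 10-minute averages; the window clause follows the PARTITION BY clause.
SELECT location, _wstart, AVG(current) FROM meters PARTITION BY location INTERVAL(10m);
```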
See also [TDengine Distinctive Queries](../time-series-extensions/).
## ORDER BY
The ORDER BY clause sorts the result set. If ORDER BY is not specified, the order of the result set is not guaranteed to be the same across multiple executions of the same query.

ORDER BY accepts positional syntax: the position is a positive integer starting from 1, indicating which expression in the SELECT list to sort by.

ASC indicates ascending order, and DESC indicates descending order.

The NULLS syntax specifies where NULL values appear in the sorted output. NULLS LAST is the default for ascending order, and NULLS FIRST is the default for descending order.
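A small illustrative example combining DESC with an explicit NULLS position, using the d1001 subtable assumed earlier:

```sql
-- Sort by current in descending order, placing NULL values first.
SELECT ts, current FROM d1001 ORDER BY current DESC NULLS FIRST;
```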
## LIMIT
LIMIT controls the number of output rows, and OFFSET specifies from which row output starts. LIMIT/OFFSET is applied after ORDER BY. LIMIT 5 OFFSET 2 can be abbreviated as LIMIT 2, 5; both output rows 3 through 7.

When a PARTITION BY/GROUP BY clause is present, LIMIT controls the output within each partition or group, not the total output of the result set.
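For illustration, the two equivalent spellings mentioned above, applied to the d1001 subtable:

```sql
-- Both statements skip the first two rows and return the next five (rows 3 through 7).
SELECT * FROM d1001 ORDER BY ts LIMIT 5 OFFSET 2;
SELECT * FROM d1001 ORDER BY ts LIMIT 2, 5;
```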
## SLIMIT
SLIMIT is used with a PARTITION BY/GROUP BY clause to control the number of output partitions (groups). SLIMIT 5 SOFFSET 2 can be abbreviated as SLIMIT 2, 5; both output partitions 3 through 7.
Note that if there is an ORDER BY clause, only one slice is output.
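A sketch of SLIMIT together with PARTITION BY, again assuming the meters supertable grouped by its location tag:

```sql
-- Skip the first two location partitions and return the next five.
SELECT location, AVG(current) FROM meters PARTITION BY location SLIMIT 2, 5;
```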
## Special Features
Some special query functions can be executed without a FROM clause.
### Get Current Database
The following command returns the current database with DATABASE(). If no default database was specified at login and the `USE` command has not been used to switch databases, it returns NULL.
```sql
SELECT DATABASE();
```
### Get Server Status
This server status check statement returns a number (for example, 1) if the server is running normally, and an error code if it is not. This SQL syntax is compatible with connection pools checking TDengine's status and with third-party tools checking the status of database servers, and it avoids connection loss in connection pools caused by using an incorrect heartbeat check SQL statement.
```sql
SELECT SERVER_STATUS();
```

## Regular Expression Filtering

```sql
WHERE (column|tbname) match/MATCH/nmatch/NMATCH _regex_
```
### Regular Expression Specifications
Ensure that the regular expressions used comply with the POSIX standard; see [Regular Expressions](https://pubs.opengroup.org/onlinepubs/9699919799/basedefs/V1_chap09.html) for details.
### Usage Limitations
Regular expression filtering can only be applied to table names (i.e., tbname filtering) and to BINARY/NCHAR type values.
The length of the regular expression string cannot exceed 128 bytes. The maximum allowed length can be set and adjusted with the client configuration parameter `maxRegexStringLen`, which requires a client restart to take effect.
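As an illustrative sketch of tbname filtering with a POSIX regular expression, assuming the meters supertable and its d1001-style subtable names:

```sql
-- Return rows from subtables whose names start with 'd10'.
SELECT * FROM meters WHERE tbname MATCH '^d10.*';
```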
## CASE Expression
### Syntax

```sql
CASE value WHEN compare_value THEN result [WHEN compare_value THEN result ...] [ELSE result] END
CASE WHEN condition THEN result [WHEN condition THEN result ...] [ELSE result] END
```
### Description
TDengine lets users express IF ... THEN ... ELSE logic in SQL statements through the CASE expression.
The first CASE syntax returns the result of the first branch whose value equals compare_value. If no compare_value matches, it returns the result after ELSE; if there is no ELSE part, it returns NULL.
The second syntax returns the result of the first branch whose condition is true. If no condition matches, it returns the result after ELSE; if there is no ELSE part, it returns NULL.
The return type of a CASE expression is the result type of its first WHEN THEN branch; the result types of the remaining WHEN THEN branches and of the ELSE part must be convertible to it, otherwise TDengine reports an error.
### Example
A device has three status codes; to display its status:
```sql
SELECT CASE dev_status WHEN 1 THEN 'Running' WHEN 2 THEN 'Warning' WHEN 3 THEN 'Downtime' ELSE 'Unknown' END FROM dev_table;
```
To compute the average voltage of smart meters, treating voltages below 200 or above 250 as erroneous readings and correcting the value to 220:
```sql
SELECT AVG(CASE WHEN voltage < 200 OR voltage > 250 THEN 220 ELSE voltage END) FROM meters;
```
## JOIN Clause
Before version 3.3.0.0, TDengine only supported INNER JOIN. Starting from version 3.3.0.0, TDengine supports a wider range of JOIN types, including traditional database joins such as LEFT JOIN, RIGHT JOIN, FULL JOIN, SEMI JOIN, and ANTI-SEMI JOIN, as well as time-series specific joins such as ASOF JOIN and WINDOW JOIN. JOIN operations are supported between subtables, basic tables, supertables, and subqueries.
### Example
JOIN operation between basic tables:
```sql
SELECT *
```

LEFT JOIN operation between supertables:

```sql
SELECT *
FROM temp_stable t1 LEFT JOIN temp_stable t2
ON t1.ts = t2.ts AND t1.deviceid = t2.deviceid AND t1.status = 0;
```
LEFT ASOF JOIN operation between a subtable and a supertable:
```sql
SELECT *
FROM temp_ctable t1 LEFT ASOF JOIN temp_stable t2
ON t1.ts = t2.ts AND t1.deviceid = t2.deviceid;
```
For more information on JOIN operations, see [TDengine Join Queries](../join-queries/).
## Nested Queries
"Nesting Queries," also known as "Subqueries," means that in a single SQL statement, the results of the "inner query" can be used as computation objects in the "outer query."
"Nested queries," also known as "subqueries," mean that in a single SQL statement, the result of the "inner query" can be used as the computation object for the "outer query."
Starting from version 2.2.0.0, TDengine's query engine supports non-correlated subqueries in the FROM clause ("non-correlated" means the subquery does not use parameters of the parent query). That is, in the tb_name_list position of an ordinary SELECT statement, an independent SELECT statement (enclosed in parentheses) can be used instead, so a complete nested query looks like:
```sql
SELECT ... FROM (SELECT ... FROM ...) ...;
```
:::info
- The result of the inner query serves as a "virtual table" for the outer query. It is recommended to alias this virtual table so the outer query can reference it easily.
- The outer query can reference the inner query's columns or pseudo columns directly by name or as `column_name`.
- Both inner and outer queries support JOINs between basic tables and supertables. The result of the inner query can also participate in JOIN operations with data subtables.
- The functional capabilities of the inner query are the same as those of non-nested queries.
- An ORDER BY clause in the inner query generally has no meaning; it is recommended to avoid it to prevent unnecessary resource consumption.
- Compared with non-nested queries, the outer query has the following limitations:
  - Calculation functions:
    - If the inner query's result does not provide timestamps, functions that implicitly depend on timestamps will not work correctly in the outer query, for example: INTERP, DERIVATIVE, IRATE, LAST_ROW, FIRST, LAST, TWA, STATEDURATION, TAIL, UNIQUE.
    - If the inner query's result is not ordered by timestamp, functions that depend on time-ordered data will not work correctly in the outer query, for example: LEASTSQUARES, ELAPSED, INTERP, DERIVATIVE, IRATE, TWA, DIFF, STATECOUNT, STATEDURATION, CSUM, MAVG, TAIL, UNIQUE.
    - Functions that require two scans of the data will not work correctly in the outer query, for example: PERCENTILE.
:::
## UNION ALL Clause

```sql
SELECT ...
UNION ALL SELECT ...
[UNION ALL SELECT ...]
```
TDengine supports the UNION ALL operator. If multiple SELECT clauses return result sets with exactly the same structure (column names, column types, number of columns, and order), they can be combined into one result set with UNION ALL. Currently only UNION ALL is supported, which means duplicates are not removed during the merge. A single SQL statement supports at most 100 UNION ALL clauses.
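A small sketch combining two of the subtables assumed earlier on this page, which have identical schemas:

```sql
-- Merge rows from d1001 and d1002 without deduplication.
SELECT ts, current FROM d1001
UNION ALL
SELECT ts, current FROM d1002;
```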
## SQL Examples
For the following examples, the table tb1 is created with the statement:
```sql
CREATE TABLE tb1 (ts TIMESTAMP, col1 INT, col2 FLOAT, col3 BINARY(50));
```
Query all records from tb1 for the past hour:
```sql
SELECT * FROM tb1 WHERE ts >= NOW - 1h;
```
Query records from tb1 in the time range from 2018-06-01 08:00:00.000 to 2018-06-02 08:00:00.000 where the string in col3 ends with 'nny', with the results sorted by timestamp in descending order:
```sql
SELECT * FROM tb1 WHERE ts > '2018-06-01 08:00:00.000' AND ts <= '2018-06-02 08:00:00.000' AND col3 LIKE '%nny' ORDER BY ts DESC;
```
Query the sum of col1 and col2, aliased as complex, where the timestamp is greater than 2018-06-01 08:00:00.000 and col2 is greater than 1.2, skipping the first 5 matching records and outputting the next 10:
```sql
SELECT (col1 + col2) AS 'complex' FROM tb1 WHERE ts > '2018-06-01 08:00:00.000' AND col2 > 1.2 LIMIT 10 OFFSET 5;
```
Count the records from the past 10 minutes where col2 is greater than 3.14, and output the result to the file `/home/testoutput.csv`:
```sql
SELECT COUNT(*) FROM tb1 WHERE ts >= NOW - 10m AND col2 > 3.14 >> /home/testoutput.csv;
```


---
title: Tag Indices
slug: /tdengine-reference/sql-manual/manage-tag-indices
---
This section explains TDengine's indexing mechanism. Before version 3.0.3.0 (exclusive), an index is created by default on the first tag column, and indexes cannot be added dynamically to other columns. Starting from version 3.0.3.0, indexes can be added dynamically to other tag columns. The automatically created index on the first tag column is enabled by default in queries and cannot be changed by the user. Proper use of indexes can effectively improve query performance.
## Syntax
The syntax for creating an index is as follows:
```sql
CREATE INDEX index_name ON tbl_name (tagColName)
```
Here `index_name` is the name of the index, `tbl_name` is the name of the supertable, and `tagColName` is the name of the tag column on which the index is created. There is no restriction on the type of `tagColName`: an index can be created on a tag column of any type.
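For illustration, a hypothetical index on the groupId tag of the meters supertable used elsewhere in this manual:

```sql
-- idx_groupid is an assumed index name; groupId is a tag column of meters.
CREATE INDEX idx_groupid ON meters (groupId);
```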
The syntax for dropping an index is as follows:
```sql
DROP INDEX index_name
```
Here `index_name` is the name of an existing index. If the index does not exist, the command fails but has no other effect on the system.
To view the indexes that already exist in the system:
```sql
SELECT * FROM information_schema.INS_INDEXES
```
You can also add filter conditions to the above query to narrow the search scope.
Alternatively, use the SHOW command to view the indexes on a specified table:
```sql
SHOW INDEXES FROM tbl_name [FROM db_name];
SHOW INDEXES FROM [db_name.]tbl_name;
```
## Usage Instructions
1. Proper use of indexes can improve data filtering efficiency. Currently supported filtering operators include `=`, `>`, `>=`, `<`, `<=`. If these operators are used in the query filter conditions, indexes can significantly enhance query efficiency. However, if other operators are used, the indexes will have no effect, and the query efficiency will remain unchanged. More operators will be gradually added in the future.
1. Proper use of indexes can improve the efficiency of data filtering. Currently supported filtering operators include `=`, `>`, `>=`, `<`, `<=`. If these operators are used in the query filtering conditions, the index can significantly improve query efficiency. However, if other operators are used in the query filtering conditions, the index does not work, and there is no change in query efficiency. More operators will be added gradually.
2. An index can only be created for one tag column; if an attempt is made to create a duplicate index, an error will occur.
2. Only one index can be created for a tag column, and an error will be reported if an index is created repeatedly.
3. Indexes can only be created one at a time for a single tag column; multiple tag columns cannot have indexes created simultaneously.
3. Only one index can be created for a tag column at a time; it is not possible to create indexes for multiple tags simultaneously.
4. The names of all types of indexes in the system must be unique.
4. Regardless of the type of index, its name must be unique throughout the system.
5. There is no limit on the number of indexes. However, each additional index adds metadata to the system, and too many indexes can reduce the efficiency of metadata access and thus degrade overall system performance. Avoid adding unnecessary indexes.
6. Indexes cannot be created on basic tables or subtables.
7. Creating an index on a tag column with few unique values is not recommended, as the benefit is minimal.
8. A newly created supertable automatically generates a randomly named index on its first tag column, following the naming rule: the name of tag0 plus 23 bytes. This index can be viewed in the system table and dropped as needed; it behaves the same as indexes on other tag columns.
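The following sketch, referenced from item 1, assumes the `meters` supertable with an `int` tag column `groupid` that has an index on it; the `=` filter on the tag column is one the index can accelerate (names are illustrative):

```sql
-- Tag filter using a supported operator (=); hypothetical schema
SELECT COUNT(*) FROM meters WHERE groupid = 1;
```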

View File

@ -1,12 +1,11 @@
---
description: "Delete data records from specified tables or supertables."
title: Delete Data
slug: /tdengine-reference/sql-manual/delete-data
---
Deleting data is a feature provided by TDengine that allows users to delete data records from specified tables or supertables within a specified time range, making it convenient to clean up abnormal data generated by device failures and other causes.
**Note:** Deleting data does not immediately free the disk space occupied by the table. Instead, the affected rows are marked as deleted and no longer appear in queries; the disk space is not released until the system automatically, or the user manually, reorganizes the data.
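As a sketch of the manual reorganization path, recent TDengine 3.x releases provide a `COMPACT DATABASE` statement; treat the exact syntax and availability as an assumption and confirm against the SQL manual for your version:

```sql
-- Assumed syntax; asks TDengine to reorganize database `power` and reclaim space from deleted rows
COMPACT DATABASE power;
```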
**Syntax:**
```sql
DELETE FROM [ db_name. ] tb_name [WHERE condition];
```
**Function:** Delete data records from specified tables or supertables.
**Parameters:**
- `db_name`: Optional. Specifies the database in which the target table resides; if not specified, the current database is used. (See the sketch after this list.)
- `tb_name`: Required. Specifies the name of the table from which data is to be deleted; it can be a basic table, a subtable, or a supertable.
- `condition`: Optional. Specifies the filter condition for deleting data. If no condition is specified, all data in the table is deleted, so use this with caution. Note that the WHERE condition supports filtering only on the first column, which is the timestamp column.
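A sketch combining the parameters above, assuming a database `power` that contains a subtable `d1001` (both names are illustrative):

```sql
-- Hypothetical database and subtable; the WHERE clause may only filter on the timestamp column
DELETE FROM power.d1001 WHERE ts < '2021-10-01 00:00:00';
```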
**Special Note:**
Once data is deleted, it cannot be recovered, so use this command with caution. To make sure the data you are about to delete is indeed the intended data, it is recommended to first run a `SELECT` statement with the same `WHERE` condition to view the rows that would be deleted, and to execute the `DELETE` command only after confirming they are correct.
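For example, a preview of the deletion performed in the example below:

```sql
-- Counts the rows that the DELETE statement in the example below would remove
SELECT COUNT(*) FROM meters WHERE ts < '2021-10-01 10:40:00.100';
```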
**Example:**
`meters` is a supertable and `groupid` is an `int` type tag column. To delete all data in the `meters` table with a timestamp earlier than 2021-10-01 10:40:00.100, the SQL is as follows:
```sql
DELETE FROM meters WHERE ts < '2021-10-01 10:40:00.100';
```
After execution, the result is displayed as:
```text
Deleted 102000 row(s) from 1020 table(s) (0.421950s)
```
This indicates that a total of 102,000 rows of data were deleted from 1,020 subtables.

Some files were not shown because too many files have changed in this diff.