Merge branch 'main' into fix/liaohj

This commit is contained in:
Haojun Liao 2024-11-27 15:44:53 +08:00
commit 5e49268cd8
817 changed files with 33022 additions and 29276 deletions

.github/pull_request_template.md vendored Normal file
View File

@@ -0,0 +1,11 @@
# Description
Please briefly describe the code changes in this pull request.
# Checklist
Please check the items in the checklist if applicable.
- [ ] Is the user manual updated?
- [ ] Are the test cases passed and automated?
- [ ] Is there no significant decrease in test coverage?

View File

@@ -5,7 +5,7 @@ node {
}
file_zh_changed = ''
file_en_changed = ''
file_no_doc_changed = ''
file_no_doc_changed = '1'
def abortPreviousBuilds() {
def currentJobName = env.JOB_NAME
def currentBuildNumber = env.BUILD_NUMBER.toInteger()
@@ -69,7 +69,7 @@ def check_docs(){
file_no_doc_changed = sh (
script: '''
cd ${WKC}
git --no-pager diff --name-only FETCH_HEAD `git merge-base FETCH_HEAD ${CHANGE_TARGET}`|grep -v "^docs/en/"|grep -v "^docs/zh/"|grep -v "*.md" || :
git --no-pager diff --name-only FETCH_HEAD `git merge-base FETCH_HEAD ${CHANGE_TARGET}`|grep -v "^docs/en/"|grep -v "^docs/zh/"|grep -v ".md$" || :
''',
returnStdout: true
).trim()
@@ -355,7 +355,7 @@ def pre_test_build_win() {
bat '''
cd %WIN_COMMUNITY_ROOT%/tests/ci
pip3 install taospy==2.7.16
pip3 install taos-ws-py==0.3.3
pip3 install taos-ws-py==0.3.5
xcopy /e/y/i/f %WIN_INTERNAL_ROOT%\\debug\\build\\lib\\taos.dll C:\\Windows\\System32
'''
return 1
@@ -450,8 +450,8 @@ pipeline {
stage('run test') {
when {
allOf {
not { expression { file_no_doc_changed == '' }}
expression {
file_no_doc_changed != '' && env.CHANGE_TARGET != 'docs-cloud'
}
}
parallel {
@@ -655,4 +655,4 @@ pipeline {
)
}
}
}
}

View File

@@ -23,6 +23,16 @@
English | [简体中文](README-CN.md) | [TDengine Cloud](https://cloud.tdengine.com) | [Learn more about TSDB](https://tdengine.com/tsdb/)
# Table of Contents
1. [What is TDengine?](#what-is-tdengine)
1. [Documentation](#documentation)
1. [Building](#building)
1. [Installing](#installing)
1. [Try TDengine](#try-tdengine)
1. [Developing with TDengine](#developing-with-tdengine)
1. [Contribute to TDengine](#contribute-to-tdengine)
1. [Join the TDengine Community](#join-the-tdengine-community)
# What is TDengine
TDengine is an open source, high-performance, cloud native [time-series database](https://tdengine.com/tsdb/) optimized for Internet of Things (IoT), Connected Cars, and Industrial IoT. It enables efficient, real-time data ingestion, processing, and monitoring of TB and even PB scale data per day, generated by billions of sensors and data collectors. TDengine differentiates itself from other time-series databases with the following advantages:

View File

@@ -1,32 +1,19 @@
---
title: TDengine Documentation
sidebar_label: Documentation Home
description: This website contains the user manuals for TDengine, an open-source, cloud-native time-series database optimized for IoT, Connected Cars, and Industrial IoT.
slug: /
---
TDengine is an [open-source](https://tdengine.com/tdengine/open-source-time-series-database/), [cloud-native](https://tdengine.com/tdengine/cloud-native-time-series-database/) [time-series database](https://tdengine.com/tsdb/) optimized for the Internet of Things (IoT), Connected Cars, and Industrial IoT. It enables efficient, real-time data ingestion, processing, and monitoring of TB and even PB scale data per day, generated by billions of sensors and data collectors. This document is the TDengine user manual. It introduces the basic as well as novel concepts in TDengine and covers installation, features, SQL, APIs, operation, maintenance, kernel design, and other topics in detail. It's written mainly for architects, developers, and system administrators.
TDengine™ is a time-series database purpose-built for Industry 4.0 and Industrial IoT. It enables real-time ingestion, storage, analysis, and distribution of petabytes of data per day, generated by billions of sensors and data collectors. TDengine's mission is to make time-series data accessible, valuable, and affordable for everyone — from independent developers and startups to industry stalwarts and multinationals.
To get an overview of TDengine, such as a feature list, benchmarks, and competitive advantages, please browse through the [Introduction](./intro) section.
This website contains the user documentation for TDengine:
TDengine greatly improves the efficiency of data ingestion, querying, and storage by exploiting the characteristics of time series data, introducing the novel concepts of "one table for one data collection point" and "super table", and designing an innovative storage engine. To understand the new concepts in TDengine and make full use of the features and capabilities of TDengine, please read [Concepts](./concept) thoroughly.
- If you are new to time-series data, you can get a quick understanding of the field from ["What Is a Time-Series Database?"](https://tdengine.com/what-is-a-time-series-database/) and [other articles](https://tdengine.com/time-series-database/) on our official website.
- If you would like to install TDengine and experience its features for yourself, see the [Get Started](get-started/) section for instructions.
- System architects are advised to review the [Basic Features](basic-features/) and [Advanced Features](advanced-features/) sections to decide whether TDengine's capabilities can meet their needs, as well as [Inside TDengine](inside-tdengine/) for a more in-depth look at TDengine's design.
- Software developers can consult the [Developer's Guide](developer-guide/) for information about creating applications that interoperate with TDengine and writing user-defined functions that run within TDengine.
- Database administrators will find valuable information in [Operations and Maintenance](operations-and-maintenance/) and [TDengine Reference](tdengine-reference/) to assist in managing, maintaining, and monitoring their TDengine deployments.
If you are a developer, please read the [Developer Guide](./develop) carefully. This section introduces the database connection, data modeling, data ingestion, query, continuous query, cache, data subscription, user-defined functions, and other functionality in detail. Sample code is provided for a variety of programming languages. In most cases, you can just copy and paste the sample code, and make a few changes to accommodate your application, and it will work.
We live in the era of big data, and scale-up is unable to meet the growing needs of the business. Any modern data system must have the ability to scale out, and clustering has become an indispensable feature of big data systems. Not only did the TDengine team develop the cluster feature, but it also decided to open-source this important feature. To learn how to deploy, manage, and maintain a TDengine cluster, please refer to [Cluster Deployment](./operation/deployment).
TDengine uses ubiquitous SQL as its query language, which greatly reduces learning costs and migration costs. In addition to the standard SQL, TDengine has extensions to better support time series data analysis. These extensions include functions such as roll-up, interpolation, and time-weighted average, among many others. The [SQL Reference](./reference/taos-sql) chapter describes the SQL syntax in detail and lists the various supported commands and functions.
If you are a system administrator who cares about installation, upgrade, fault tolerance, disaster recovery, data import, data export, system configuration, how to monitor whether TDengine is running healthily, and how to improve system performance, please refer to, and thoroughly read the [Administration](./operation) section.
If you want to know more about TDengine tools and the REST API, please see the [Reference](./reference) chapter.
For information about connecting to TDengine with different programming languages, see [Client Libraries](./reference/connectors).
If you are very interested in the internal design of TDengine, please read the chapter [Inside TDengine](./tdinternal), which introduces the cluster design, data partitioning, sharding, writing, and reading processes in detail. If you want to study TDengine code or even contribute code, please read this chapter carefully.
For a more general introduction to time-series databases, please read through [a series of articles](https://tdengine.com/tsdb/). To learn more about TDengine's competitive advantages, please read through [a series of blogs](https://tdengine.com/tdengine/).
TDengine is an open-source database, and we would love for you to be a part of TDengine. If you find any errors in the documentation or see parts where more clarity or elaboration is needed, please click "Edit this page" at the bottom of each page to edit it directly.
TDengine, including this documentation, is an open-source project, and we welcome contributions from the community. If you find any errors or unclear descriptions, click **Edit this document** at the bottom of the page to submit your corrections. To view the source code, visit our [GitHub repository](https://github.com/taosdata/tdengine).
Together, we make a difference!

View File

@@ -1,182 +0,0 @@
---
title: Concepts
description: This document describes the basic concepts of TDengine, including the supertable.
---
In order to explain the basic concepts and provide some sample code, the TDengine documentation uses smart meters as a typical time series use case. We assume the following: 1. Each smart meter collects three metrics, i.e. current, voltage, and phase; 2. There are multiple smart meters; 3. Each meter has static attributes like location and group ID. Based on this, the collected data will look similar to the following table:
<div className="center-table">
<table>
<thead>
<tr>
<th rowSpan="2">Device ID</th>
<th rowSpan="2">Timestamp</th>
<th colSpan="3">Collected Metrics</th>
<th colSpan="2">Tags</th>
</tr>
<tr>
<th>current</th>
<th>voltage</th>
<th>phase</th>
<th>location</th>
<th>groupid</th>
</tr>
</thead>
<tbody>
<tr>
<td>d1001</td>
<td>1538548685000</td>
<td>10.3</td>
<td>219</td>
<td>0.31</td>
<td>California.SanFrancisco</td>
<td>2</td>
</tr>
<tr>
<td>d1002</td>
<td>1538548684000</td>
<td>10.2</td>
<td>220</td>
<td>0.23</td>
<td>California.SanFrancisco</td>
<td>3</td>
</tr>
<tr>
<td>d1003</td>
<td>1538548686500</td>
<td>11.5</td>
<td>221</td>
<td>0.35</td>
<td>California.LosAngeles</td>
<td>3</td>
</tr>
<tr>
<td>d1004</td>
<td>1538548685500</td>
<td>13.4</td>
<td>223</td>
<td>0.29</td>
<td>California.LosAngeles</td>
<td>2</td>
</tr>
<tr>
<td>d1001</td>
<td>1538548695000</td>
<td>12.6</td>
<td>218</td>
<td>0.33</td>
<td>California.SanFrancisco</td>
<td>2</td>
</tr>
<tr>
<td>d1004</td>
<td>1538548696600</td>
<td>11.8</td>
<td>221</td>
<td>0.28</td>
<td>California.LosAngeles</td>
<td>2</td>
</tr>
<tr>
<td>d1002</td>
<td>1538548696650</td>
<td>10.3</td>
<td>218</td>
<td>0.25</td>
<td>California.SanFrancisco</td>
<td>3</td>
</tr>
<tr>
<td>d1001</td>
<td>1538548696800</td>
<td>12.3</td>
<td>221</td>
<td>0.31</td>
<td>California.SanFrancisco</td>
<td>2</td>
</tr>
</tbody>
</table>
<a href="#model_table1">Table 1: Smart meter example data</a>
</div>
Each row contains the device ID, timestamp, collected metrics (`current`, `voltage`, `phase` as above), and static tags (`location` and `groupid` in Table 1) associated with the devices. Each smart meter generates a row (measurement) at a pre-defined time interval or when triggered by an external event. The device produces a sequence of measurements with associated timestamps.
## Metric
A metric refers to a physical quantity collected by sensors, equipment, or other types of data collection devices, such as current, voltage, temperature, pressure, or GPS position, which changes with time; the data type can be integer, float, Boolean, or string. As time goes by, the amount of collected metric data stored increases. In the smart meters example, current, voltage and phase are the metrics.
## Label/Tag
A label/tag refers to a static property of sensors, equipment, or other types of data collection devices, which does not change with time, such as device model, color, or the fixed location of the device. Tags can be of any data type. Although static, TDengine allows users to add, delete, or update tag values at any time. Unlike the collected metric data, the amount of tag data stored does not change over time. In the meters example, `location` and `groupid` are the tags.
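For example, a tag value on an existing table can be updated with SQL at any time. The following is a minimal sketch, assuming the table `d1001` from Table 1 already exists with a `location` tag:
```sql
-- Illustrative sketch: update a static tag value on an existing table.
-- Assumes d1001 was created with a `location` tag, as in the smart meters example.
ALTER TABLE d1001 SET TAG location = 'California.SanJose';
```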
## Data Collection Point
Data Collection Point (DCP) refers to hardware or software that collects metrics based on preset time periods or triggered by events. A data collection point can collect one or multiple metrics, but these metrics are collected at the same time and have the same timestamp. For some complex equipment, there are often multiple data collection points, and the sampling rate of each collection point may be different, and fully independent. For example, for a car, there could be a data collection point to collect GPS position metrics, a data collection point to collect engine status metrics, and a data collection point to collect the environment metrics inside the car. So in this example the car would have three data collection points. In the smart meters example, d1001, d1002, d1003, and d1004 are the data collection points.
## Table
Since time-series data is most likely to be structured data, TDengine adopts the traditional relational database model to process it, which keeps the learning curve short. You need to create a database, create tables, then insert data points and execute queries to explore the data.
To make full use of time-series data characteristics, TDengine adopts a strategy of "**One Table for One Data Collection Point**". TDengine requires the user to create a table for each data collection point (DCP) to store collected time-series data. For example, if there are over 10 million smart meters, it means 10 million tables should be created. For the table above, 4 tables should be created for devices d1001, d1002, d1003, and d1004 to store the data collected. This design has several benefits:
1. Since the metric data from different DCPs is fully independent, the data source of each DCP is unique, and a table has only one writer. In this way, data points can be written in a lock-free manner, and the writing speed can be greatly improved.
2. For a DCP, the metric data it generates is ordered by timestamp, so the write operation can be implemented by simple appending, which further greatly improves the data writing speed.
3. The metric data from a DCP is continuously stored, block by block. If you read data for a period of time, it can greatly reduce random read operations and improve read and query performance by orders of magnitude.
4. Inside a data block for a DCP, columnar storage is used, and different compression algorithms are used for different data types. Metrics generally don't vary as significantly between themselves over a time range as compared to other metrics, which allows for a higher compression rate.
If the metric data of multiple DCPs were written into a single table, as in traditional designs, then due to uncontrollable network delays the order in which data from different DCPs arrives at the server could not be guaranteed, write operations would have to be protected by locks, and the metric data from one DCP could not be guaranteed to be stored together continuously. **One table for one data collection point ensures the best possible insert and query performance for a single data collection point.**
TDengine suggests using the DCP ID as the table name (like d1001 in the above table). Each DCP may collect one or multiple metrics (like `current`, `voltage`, and `phase` above). Each metric has a corresponding column in the table. The data type for a column can be int, float, string, and others. In addition, the first column in the table must be a timestamp. TDengine uses the timestamp as the index and does not build an index on any stored metrics. Column-wise storage is used.
Complex devices, such as connected cars, may have multiple DCPs. In this case, multiple tables are created for a single device, one table per DCP.
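As a minimal sketch of this design, a standalone table for meter d1001 might look as follows (the column types are illustrative assumptions; the supertable-based approach described in the next sections is the recommended way to create such tables):
```sql
-- Illustrative sketch: one table per data collection point.
-- The first column must be the timestamp; the remaining columns hold the metrics.
CREATE TABLE d1001 (ts TIMESTAMP, current FLOAT, voltage INT, phase FLOAT);
```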
## Super Table (STable)
The design of one table for one data collection point requires a huge number of tables, which is difficult to manage. Furthermore, applications often need to perform aggregation operations across DCPs, which makes aggregation complicated. To support aggregation over multiple tables efficiently, TDengine introduces the STable (Super Table) concept.
STable is a template for a type of data collection point. A STable contains a set of data collection points (tables) that have the same schema or data structure, but with different static attributes (tags). To describe a STable, in addition to defining the table structure of the metrics, it is also necessary to define the schema of its tags. The data type of tags can be int, float, string, and there can be multiple tags, which can be added, deleted, or modified afterward. If the whole system has N different types of data collection points, N STables need to be established.
In the design of TDengine, **a table is used to represent a specific data collection point, and STable is used to represent a set of data collection points of the same type**. In the smart meters example, we can create a super table named `meters`.
## Subtable
When creating a table for a specific data collection point, the user can use a STable as a template and specify the tag values of this specific DCP to create it. **The table created by using a STable as the template is called a subtable** in TDengine. The differences between a regular table and a subtable are:
1. A subtable is a table; all SQL commands that can be applied to a regular table can be applied to a subtable.
2. A subtable is a table with extensions: it has static tags (labels), and these tags can be added, deleted, and updated after it is created, whereas a regular table does not have tags.
3. A subtable belongs to only one STable, but a STable may have many subtables. Regular tables do not belong to a STable.
4. A regular table cannot be converted into a subtable, and vice versa.
The relationship between a STable and the subtables created based on this STable is as follows:
1. A STable contains multiple subtables with the same metric schema but with different tag values.
2. The schema of metrics or labels cannot be adjusted through subtables; it can only be changed via the STable. Changes to the schema of a STable take effect immediately for all associated subtables.
3. STable defines only one template and does not store any data or label information by itself. Therefore, data cannot be written to a STable, only to subtables.
Queries can be executed on both a table (subtable) and a STable. For a query on a STable, TDengine treats the data in all its subtables as a whole data set for processing. TDengine first finds the subtables that meet the tag filter conditions, then scans the time-series data of these subtables to perform aggregation operations; this reduces the number of data sets to be scanned, which in turn greatly improves the performance of data aggregation across multiple DCPs. In essence, querying a supertable is a very efficient aggregate query on multiple DCPs of the same type.
In TDengine, it is recommended to use a subtable instead of a regular table for a DCP. In the smart meters example, we can create subtables like d1001, d1002, d1003, and d1004 under super table `meters`.
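The following sketch puts these concepts together for the smart meters example; the data types and tag lengths are illustrative assumptions:
```sql
-- Supertable: a template that defines both the metric schema and the tag schema.
CREATE STABLE meters (ts TIMESTAMP, current FLOAT, voltage INT, phase FLOAT)
    TAGS (location VARCHAR(64), groupid INT);

-- Subtables: one per data collection point, created from the template with tag values.
CREATE TABLE d1001 USING meters TAGS ('California.SanFrancisco', 2);
CREATE TABLE d1003 USING meters TAGS ('California.LosAngeles', 3);

-- Aggregate across all subtables whose tags match the filter.
SELECT AVG(voltage) FROM meters WHERE location = 'California.SanFrancisco';
```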
To better understand the data model using metrics, tags, super table and subtable, please refer to the diagram below which demonstrates the data model of the smart meters example.
<figure>
![Meters Data Model Diagram](./supertable.webp)
<center><figcaption>Figure 1. Meters Data Model Diagram</figcaption></center>
</figure>
## Database
A database is a collection of tables. TDengine allows a running instance to have multiple databases, and each database can be configured with different storage policies. The [characteristics of time-series data](https://tdengine.com/tsdb/characteristics-of-time-series-data/) from different data collection points may be different. Characteristics include collection frequency and retention policy, among others, and they determine how you create and configure the database. For example, days to keep, number of replicas, data block size, whether data updates are allowed, and other configurable parameters are determined by the characteristics of your data and your business requirements. In order for TDengine to work with maximum efficiency in various scenarios, TDengine recommends that STables with different data characteristics be created in different databases.
In a database, there can be one or more STables, but a STable belongs to only one database. All tables owned by a STable are stored in only one database.
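For example, a database with its own retention and replication settings might be created as shown in the following sketch; the database name and parameter values are illustrative assumptions:
```sql
-- Illustrative sketch: a database with its own retention and replication settings.
CREATE DATABASE power KEEP 365 DURATION 10 REPLICA 1;
USE power;
```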
## FQDN & End Point
FQDN (Fully Qualified Domain Name) is the full domain name of a specific computer or host on the Internet. FQDN consists of two parts: hostname and domain name. For example, the FQDN of a mail server might be mail.tdengine.com. The hostname is mail, and the host is located in the domain name tdengine.com. DNS (Domain Name System) is responsible for translating FQDN into IP. For systems without DNS, it can be solved by configuring the hosts file.
Each node of a TDengine cluster is uniquely identified by an End Point, which consists of an FQDN and a Port, such as h1.tdengine.com:6030. In this way, when the IP changes, we can still use the FQDN to dynamically find the node without changing any configuration of the cluster. In addition, FQDN is used to facilitate unified access to the same cluster from the Intranet and the Internet.
TDengine does not recommend using an IP address to access the cluster. FQDN is recommended for cluster management.
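For example, endpoints are used when adding nodes to a cluster, as in the following sketch; the host name is a hypothetical example:
```sql
-- Illustrative sketch: nodes are identified by their endpoint (FQDN:port).
CREATE DNODE "h2.tdengine.com:6030";
SHOW DNODES;
```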

View File

@@ -1,132 +0,0 @@
---
title: Introduction
description: This document introduces the major features, competitive advantages, typical use cases, and benchmarks of TDengine.
toc_max_heading_level: 2
---
TDengine is a big data platform designed and optimized for IoT (Internet of Things) and the Industrial Internet. It can safely and effectively converge, store, process, and distribute the high volumes of data (TB or even PB) generated every day by a large number of devices and data acquisition units, monitor and alert on business operation status in real time, and provide real-time business insight. The core component of TDengine is TDengine OSS, a high-performance, open-source, cloud-native, and simplified time-series database.
This section introduces the major features, competitive advantages, typical use-cases and benchmarks to help you get a high level overview of TDengine.
## Major Features of TDengine OSS
The major features are listed below:
1. Insert data
- Supports [using SQL to insert](../develop/insert-data/sql-writing).
- Supports [schemaless writing](../reference/schemaless/) just like NoSQL databases. It also supports standard protocols like [InfluxDB Line](../develop/insert-data/influxdb-line), [OpenTSDB Telnet](../develop/insert-data/opentsdb-telnet), [OpenTSDB JSON ](../develop/insert-data/opentsdb-json) among others.
- Supports seamless integration with third-party tools like [Telegraf](../third-party/telegraf/), [Prometheus](../third-party/prometheus/), [collectd](../third-party/collectd/), [StatsD](../third-party/statsd/), [TCollector](../third-party/tcollector/), [EMQX](../third-party/emq-broker), [HiveMQ](../third-party/hive-mq-broker), and [Icinga2](../third-party/icinga2/); these tools can write data into TDengine with simple configuration and without a single line of code.
2. Query data
- Supports standard [SQL](../reference/taos-sql/), including nested query.
- Supports [time series specific functions](../reference/taos-sql/function/#time-series-extensions) and [time series specific queries](../reference/taos-sql/distinguished), like downsampling, interpolation, cumulated sum, time weighted average, state window, session window and many others; see the query sketch after this list.
- Supports [User Defined Functions (UDF)](../reference/taos-sql/udf).
3. [Caching](../develop/cache/): TDengine always saves the last data point in cache, so Redis is not needed for time-series data processing.
4. [Stream Processing](../develop/stream/): Not only is continuous query supported, but TDengine also supports event-driven stream processing, so Flink or Spark is not needed for time-series data processing.
5. [Data Subscription](../develop/tmq/): Applications can subscribe to a table or a set of tables. The API is the same as Kafka's, but you can specify filter conditions.
6. Visualization
- Supports seamless integration with [Grafana](../third-party/grafana/).
- Supports seamless integration with [Google Data Studio](../third-party/google-data-studio/).
7. Cluster
- Supports [cluster](../operation/deployment/) with the capability of increasing processing power by adding more nodes.
- Supports [deployment on Kubernetes](../operation/deployment).
- Supports high availability via data replication.
8. Administration
- Provides [monitoring](../operation/monitor) on running instances of TDengine.
- Provides many ways to [import](../operation/import) and [export](../operation/export) data.
9. Tools
- Provides an interactive [Command Line Interface (CLI)](../reference/components/taos-shell) for management, maintenance and ad-hoc queries.
- Provides a tool [taosBenchmark](../reference/components/taosbenchmark/) for testing the performance of TDengine.
10. Programming
- Provides [client libraries](../reference/connectors/) for [C/C++](../reference/connectors/cpp), [Java](../reference/connectors/java), [Python](../reference/connectors/python), [Go](../reference/connectors/go), [Rust](../reference/connectors/rust), [Node.js](../reference/connectors/node) and other programming languages.
- Provides a [REST API](../reference/connectors/rest-api).
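The following sketch illustrates a few of the time-series extensions mentioned above; it assumes the `test.meters` data set created by taosBenchmark, as described in the Get Started section:
```sql
-- Illustrative sketch: 1-minute downsampling per table with average and
-- time-weighted average, on the taosBenchmark sample data.
SELECT _wstart, AVG(current), TWA(current)
FROM test.meters
PARTITION BY tbname
INTERVAL(1m);
```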
For more details on features, please read through the entire documentation.
## Competitive Advantages
By making full use of [characteristics of time series data](https://tdengine.com/characteristics-of-time-series-data/), TDengine differentiates itself from other time series databases with the following advantages.
- **[High-Performance](https://tdengine.com/high-performance/)**: TDengine is the only time-series database to solve the high cardinality issue to support billions of data collection points while outperforming other time-series databases for data ingestion, querying and data compression.
- **[Simplified Solution](https://tdengine.com/comprehensive-industrial-data-solution/)**: Through built-in caching, stream processing and data subscription features, TDengine provides a simplified solution for time-series data processing. It reduces system design complexity and operation costs significantly.
- **[Cloud Native](https://tdengine.com/cloud-native/)**: Through native distributed design, sharding and partitioning, separation of compute and storage, RAFT, support for Kubernetes deployment and full observability, TDengine is a cloud native Time-series Database and can be deployed on public, private or hybrid clouds.
- **[Ease of Use](https://tdengine.com/easy-to-use/)**: For administrators, TDengine significantly reduces the effort to deploy and maintain. For developers, it provides a simple interface, simplified solution and seamless integrations for third party tools. For data users, it gives easy data access.
- **[Easy Data Analytics](https://tdengine.com/simplifying-time-series-analysis-for-data-scientists/)**: Through super tables, storage and compute separation, data partitioning by time interval, pre-computation and other means, TDengine makes it easy to explore, format, and get access to data in a highly efficient way.
- **[Open Source](https://tdengine.com/open-source/)**: TDengine's core modules, including cluster feature, are all available under open source licenses. It has gathered over 22k stars on GitHub. There is an active developer community, and over 400k running instances worldwide.
With TDengine, the total cost of ownership of your time-series data platform can be greatly reduced.
1. With its superior performance, the computing and storage resources are reduced significantly.
2. With SQL support, it can be seamlessly integrated with many third party tools, and learning costs/migration costs are reduced significantly.
3. With its simplified solution and nearly zero management, the operation and maintenance costs are reduced significantly.
## Technical Ecosystem
This is how TDengine would be situated in a typical time-series data processing platform:
<figure>
![TDengine Database Technical Ecosystem ](eco_system.webp)
<center><figcaption>Figure 1. TDengine Technical Ecosystem</figcaption></center>
</figure>
On the left-hand side, there are data collection agents like OPC-UA, MQTT, Telegraf and Kafka. On the right-hand side, visualization/BI tools, HMI, Python/R, and IoT Apps can be connected. TDengine itself provides an interactive command-line interface and a web interface for management and maintenance.
## Typical Use Cases
As a high-performance, scalable, SQL-supporting time-series database, TDengine's typical use cases include, but are not limited to, IoT, Industrial Internet, Connected Vehicles, IT operation and maintenance, energy, financial markets, and other fields. TDengine is a purpose-built database optimized for the characteristics of time series data. As such, it cannot be used to process data from web crawlers, social media, e-commerce, ERP, CRM, and so on. More generally, TDengine is not a suitable storage engine for non-time-series data. This section provides a more detailed analysis of the applicable scenarios.
### Characteristics and Requirements of Data Sources
| **Data Source Characteristics and Requirements** | **Not Applicable** | **Might Be Applicable** | **Very Applicable** | **Description** |
| ------------------------------------------------ | ------------------ | ----------------------- | ------------------- | :------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------ |
| A massive amount of total data | | | √ | TDengine provides excellent scale-out functions in terms of capacity, and has a storage structure with matching high compression ratio to achieve the best storage efficiency in the industry. |
| Data input velocity is extremely high | | | √ | TDengine's performance is much higher than that of other similar products. It can continuously process larger amounts of input data in the same hardware environment, and provides a performance evaluation tool that can easily run in the user environment. |
| A huge number of data sources | | | √ | TDengine is optimized specifically for a huge number of data sources. It is especially suitable for efficiently ingesting, writing and querying data from billions of data sources. |
### System Architecture Requirements
| **System Architecture Requirements** | **Not Applicable** | **Might Be Applicable** | **Very Applicable** | **Description** |
| ----------------------------------------- | ------------------ | ----------------------- | ------------------- | ------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- |
| A simple and reliable system architecture | | | √ | TDengine's system architecture is very simple and reliable, with its own message queue, cache, stream computing, monitoring and other functions. There is no need to integrate any additional third-party products. |
| Fault-tolerance and high-reliability | | | √ | TDengine has cluster functions to automatically provide high-reliability and high-availability functions such as fault tolerance and disaster recovery. |
| Standardization support | | | √ | TDengine supports standard SQL and provides SQL extensions for time-series data analysis. |
### System Function Requirements
| **System Function Requirements** | **Not Applicable** | **Might Be Applicable** | **Very Applicable** | **Description** |
| -------------------------------------------- | ------------------ | ----------------------- | ------------------- | ---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- |
| Complete data processing algorithms built-in | | √ | | While TDengine implements various general data processing algorithms, industry specific algorithms and special types of processing will need to be implemented at the application level. |
| A large number of crosstab queries | | √ | | This type of processing is better handled by general purpose relational database systems but TDengine can work in concert with relational database systems to provide more complete solutions. |
### System Performance Requirements
| **System Performance Requirements** | **Not Applicable** | **Might Be Applicable** | **Very Applicable** | **Description** |
| ------------------------------------------------- | ------------------ | ----------------------- | ------------------- | --------------------------------------------------------------------------------------------------------------------------- |
| Very large total processing capacity | | | √ | TDengine's cluster functions can easily improve processing capacity via multi-server coordination. |
| Extremely high-speed data processing | | | √ | TDengine's storage and data processing are optimized for IoT, and can process data many times faster than similar products. |
| Extremely fast processing of high resolution data | | | √ | TDengine has achieved the same or better performance than other relational and NoSQL data processing systems. |
### System Maintenance Requirements
| **System Maintenance Requirements** | **Not Applicable** | **Might Be Applicable** | **Very Applicable** | **Description** |
| --------------------------------------- | ------------------ | ----------------------- | ------------------- | ------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------ |
| Native high-reliability | | | √ | TDengine has a very robust, reliable and easily configurable system architecture to simplify routine operation. Human errors and accidents are eliminated to the greatest extent, with a streamlined experience for operators. |
| Minimize learning and maintenance costs | | | √ | In addition to being easily configurable, standard SQL support and the TDengine CLI for ad hoc queries makes maintenance simpler, allows reuse and reduces learning costs. |
| Abundant talent supply | √ | | | Given the above, and given the extensive training and professional services provided by TDengine, it is easy to migrate from existing solutions or create a new and lasting solution based on TDengine. |
## Comparison with other databases
- [TDengine vs. InfluxDB](https://tdengine.com/tsdb-comparison-influxdb-vs-tdengine/)
- [TDengine vs. TimescaleDB](https://tdengine.com/tsdb-comparison-timescaledb-vs-tdengine/)
## Products
For information about our paid offerings, see:
- [TDengine Enterprise](https://tdengine.com/enterprise/)
- [TDengine Cloud](https://cloud.tdengine.com)

View File

@@ -1,137 +1,127 @@
---
title: Quick Install on Docker
sidebar_label: Docker
description: This document describes how to install TDengine in a Docker container and perform queries and inserts.
sidebar_label: Deploy in Docker
title: Get Started with TDengine Using Docker
description: Quickly experience TDengine's efficient insertion and querying using Docker
slug: /get-started/deploy-in-docker
---
This document describes how to install TDengine in a Docker container and perform queries and inserts.
You can install TDengine in a Docker container and perform some basic tests to verify its performance.
- The easiest way to explore TDengine is through [TDengine Cloud](https://cloud.tdengine.com).
- To get started with TDengine in a non-containerized environment, see [Quick Install from Package](../../get-started/package).
- If you want to view the source code, build TDengine yourself, or contribute to the project, see the [TDengine GitHub repository](https://github.com/taosdata/TDengine).
To install TDengine on your local machine instead of in a container, see [Get Started with TDengine Using an Installation Package](../deploy-from-package/).
## Run TDengine
## Before You Begin
If Docker is already installed on your computer, pull the latest TDengine Docker container image:
- Install Docker. For more information, see the [Docker website](https://www.docker.com/).
- Ensure that the network ports required by TDengine are not currently in use. For more information, see [Network Port Requirements](../../operations-and-maintenance/system-requirements/#network-port-requirements).
```shell
docker pull tdengine/tdengine:latest
```
## Procedure
Or the container image of specific version:
1. Pull the latest TDengine image:
```shell
docker pull tdengine/tdengine:3.0.1.4
```
```bash
docker pull tdengine/tdengine:latest
```
And then run the following command:
:::note
You can also pull a specific version of the image. For example:
```shell
docker run -d -p 6030:6030 -p 6041:6041 -p 6043-6060:6043-6060 -p 6043-6060:6043-6060/udp tdengine/tdengine
```
```bash
docker pull tdengine/tdengine:3.3.0.0
```
:::
Note that TDengine Server 3.0 uses TCP port 6030. Port 6041 is used by taosAdapter for the REST API service. Ports 6043 through 6049 are used by taosAdapter for other connections. You can open these ports as needed.
2. Start a container with the following command:
If you need to persist data to a specific directory on your local machine, please run the following command:
```shell
docker run -d -v ~/data/taos/dnode/data:/var/lib/taos \
-v ~/data/taos/dnode/log:/var/log/taos \
-p 6030:6030 -p 6041:6041 -p 6043-6060:6043-6060 -p 6043-6060:6043-6060/udp tdengine/tdengine
```
:::note
```bash
docker run -d -p 6030:6030 -p 6041:6041 -p 6043-6060:6043-6060 -p 6043-6060:6043-6060/udp tdengine/tdengine
```
- /var/lib/taos: TDengine's default data file directory. The location can be changed via the configuration file. You can also replace ~/data/taos/dnode/data with any empty local data directory.
- /var/log/taos: TDengine's default log file directory. The location can be changed via the configuration file. You can also replace ~/data/taos/dnode/log with any empty local log directory.
To persist data to your local machine, use the following command:
:::
```bash
docker run -d -v <local-data-directory>:/var/lib/taos -v <local-log-directory>:/var/log/taos -p 6030:6030 -p 6041:6041 -p 6043-6060:6043-6060 -p 6043-6060:6043-6060/udp tdengine/tdengine
```
3. Verify that the container is running properly:
```bash
docker ps
```
4. Enter the container and open a shell:
```bash
docker exec -it <container-name> bash
```
You can now work with TDengine inside your container. For example, you can run the `taos` command to open the TDengine command-line interface.
## What to Do Next
### Test Data Ingestion
Your TDengine installation includes taosBenchmark, a tool specifically designed to test TDengine's performance. taosBenchmark can simulate data generated by many devices with a wide range of configuration options so that you can perform tests on sample data similar to your real-world use cases. For more information about taosBenchmark, see [taosBenchmark](../../tdengine-reference/tools/taosbenchmark/).
Perform the following steps to use taosBenchmark to test TDengine's ingestion performance in your container:
1. In a shell inside your container, run taosBenchmark with the default settings:
```bash
taosBenchmark -y
```
taosBenchmark automatically creates the `test` database and the `meters` supertable inside that database. This supertable contains 10,000 subtables, named `d0` to `d9999`, with each subtable containing 10,000 records. Each record includes the following four metrics:
- `ts` (timestamp), ranging from `2017-07-14 10:40:00 000` to `2017-07-14 10:40:09 999`
- `current`
- `voltage`
- `phase`
Each subtable also has the following two tags:
- `groupId`, ranging from `1` to `10`
- `location`, indicating a city and state such as `California.Campbell` or `California.Cupertino`
When the ingestion process is finished, taosBenchmark outputs the time taken to ingest the specified sample data. From this, you can estimate how TDengine would perform on your system in a production environment.
### Test Data Querying
After inserting data with taosBenchmark as described above, you can use the TDengine CLI to test TDengine's query performance in your container:
Run the following command to ensure that your container is running:
1. Start the TDengine CLI:
```shell
docker ps
```
```bash
taos
```
Enter the container and open the `bash` shell:
2. Query the total number of records in the `meters` supertable:
```shell
docker exec -it <container name> bash
```
```sql
SELECT COUNT(*) FROM test.meters;
```
You can now access TDengine or run other Linux commands.
3. Query the average, maximum, and minimum values of 100 million records:
Note: For information about installing Docker, see the [official documentation](https://docs.docker.com/get-docker/).
```sql
SELECT AVG(current), MAX(voltage), MIN(phase) FROM test.meters;
```
## TDengine Command Line Interface
4. Query the total number of records where the value of the `location` tag is `California.SanFrancisco`:
On the container, run the following command to open the TDengine CLI:
```sql
SELECT COUNT(*) FROM test.meters WHERE location = "California.SanFrancisco";
```
```
$ taos
5. Query the average, maximum, and minimum values of all records where the value of the `groupId` tag is `10`:
taos>
```sql
SELECT AVG(current), MAX(voltage), MIN(phase) FROM test.meters WHERE groupId = 10;
```
```
6. Calculate the average, maximum, and minimum values for the `d1001` table every 10 seconds:
## TDengine Graphical User Interface
Starting with TDengine 3.3.0.0, a new component called `taos-explorer` is included in the TDengine Docker image. You can use it to manage the databases, super tables, child tables, and data in your TDengine system. Some features are only available in TDengine Enterprise Edition; please contact the TDengine sales team if you need them.
To use taos-explorer in the container, access the host port mapped from container port 6060. For example, if the host name is abc.com and the mapped host port is 6060, access `http://abc.com:6060`. taos-explorer uses port 6060 by default inside the container. The default username and password for the TDengine Database Management System are "root/taosdata".
## Test data insert performance
After your TDengine Server is running normally, you can run the taosBenchmark utility to test its performance:
Start TDengine service and execute `taosBenchmark` (formerly named `taosdemo`) in a terminal.
```bash
taosBenchmark
```
This command creates the `meters` supertable in the `test` database. In the `meters` supertable, it then creates 10,000 subtables named `d0` to `d9999`. Each table has 10,000 rows and each row has four columns: `ts`, `current`, `voltage`, and `phase`. The timestamps of the data in these columns range from 2017-07-14 10:40:00 000 to 2017-07-14 10:40:09 999. Each table is randomly assigned a `groupId` tag from 1 to 10 and a `location` tag of either `California.Campbell`, `California.Cupertino`, `California.LosAngeles`, `California.MountainView`, `California.PaloAlto`, `California.SanDiego`, `California.SanFrancisco`, `California.SanJose`, `California.SantaClara` or `California.Sunnyvale`.
The `taosBenchmark` command creates a deployment with 100 million data points that you can use for testing purposes. The time required to create the deployment depends on your hardware. On most modern servers, the deployment is created in ten to twenty seconds.
You can customize the test deployment that taosBenchmark creates by specifying command-line parameters. For information about command-line parameters, run the `taosBenchmark --help` command. For more information about taosBenchmark, see [taosBenchmark](../../reference/components/taosbenchmark).
## Test data query performance
After using `taosBenchmark` to create your test deployment, you can run queries in the TDengine CLI to test its performance:
From the TDengine CLI (taos) query the number of rows in the `meters` supertable:
```sql
SELECT COUNT(*) FROM test.meters;
```
Query the average, maximum, and minimum values of all 100 million rows of data:
```sql
SELECT AVG(current), MAX(voltage), MIN(phase) FROM test.meters;
```
Query the number of rows whose `location` tag is `California.SanFrancisco`:
```sql
SELECT COUNT(*) FROM test.meters WHERE location = "California.SanFrancisco";
```
Query the average, maximum, and minimum values of all rows whose `groupId` tag is `10`:
```sql
SELECT AVG(current), MAX(voltage), MIN(phase) FROM test.meters WHERE groupId = 10;
```
Query the average, maximum, and minimum values for table `d10` in 10 second intervals:
```sql
SELECT FIRST(ts), AVG(current), MAX(voltage), MIN(phase) FROM test.d10 INTERVAL(10s);
```
In the query above, you are selecting the first timestamp (`ts`) in each interval; another way to do this is with `_wstart`, which gives the start of the time window. For more information about windowed queries, see [Time-Series Extensions](../../reference/taos-sql/distinguished/).
## Additional Information
For more information about deploying TDengine in a Docker environment, see [Deploying TDengine with Docker](../../operation/deployment/#docker).
```sql
SELECT _wstart, AVG(current), MAX(voltage), MIN(phase) FROM test.d1001 INTERVAL(10s);
```

View File

@@ -1,326 +1,247 @@
---
title: Quick Install from Package
sidebar_label: Package
description: This document describes how to install TDengine on Linux, Windows, and macOS and perform queries and inserts.
sidebar_label: Deploy from Package
title: Get Started with TDengine Using an Installation Package
description: Quick experience with TDengine using the installation package
slug: /get-started/deploy-from-package
---
import Tabs from "@theme/Tabs";
import TabItem from "@theme/TabItem";
import PkgListV3 from "/components/PkgListV3";
This document describes how to install TDengine on Linux/Windows/macOS and perform queries and inserts.
You can install TDengine on a local machine and perform some basic tests to verify its performance. The TDengine OSS server can be installed on Linux and macOS, and the TDengine OSS client can be installed on Linux, macOS, and Windows.
- The easiest way to explore TDengine is through [TDengine Cloud](https://cloud.tdengine.com).
- To get started with TDengine on Docker, see [Quick Install on Docker](../../get-started/docker).
- If you want to view the source code, build TDengine yourself, or contribute to the project, see the [TDengine GitHub repository](https://github.com/taosdata/TDengine).
To install TDengine in a Docker container instead of on your machine, see [Get Started with TDengine in Docker](../deploy-in-docker/).
The full package of TDengine includes the TDengine Server (`taosd`), TDengine Client (`taosc`), taosAdapter for connecting with third-party systems and providing a RESTful interface, a command-line interface (CLI, taos), and some tools. Note that taosAdapter supports Linux only. In addition to client libraries for multiple languages, TDengine also provides a [REST API](../../reference/connectors/rest-api) through [taosAdapter](../../reference/components/taosadapter).
## Before You Begin
The standard server installation package includes `taos`, `taosd`, `taosAdapter`, `taosBenchmark`, and sample code. You can also download the Lite package that includes only `taosd` and the C/C++ client library.
- Verify that your machine meets the minimum system requirements for TDengine. For more information, see [Supported Platforms](../../tdengine-reference/supported-platforms/) and [System Requirements](../../operations-and-maintenance/system-requirements/).
- **(Windows only)** Verify that the latest version of the Microsoft Visual C++ Redistributable is installed on your machine. To download the redistributable package, see [Microsoft Visual C++ Redistributable latest supported downloads](https://learn.microsoft.com/en-us/cpp/windows/latest-supported-vc-redist?view=msvc-170).
TDengine OSS is released as Deb and RPM packages. The Deb package can be installed on Debian, Ubuntu, and derivative systems. The RPM package can be installed on CentOS, RHEL, SUSE, and derivative systems. A .tar.gz package is also provided for enterprise customers, and you can install TDengine over `apt-get` as well. The .tar.gz package includes `taosdump` and the TDinsight installation script. If you want to use these utilities with the Deb or RPM package, download and install taosTools separately. TDengine can also be installed on x64 Windows and x64/m1 macOS.
## Procedure
## Operating environment requirements
In the Linux system, the minimum requirements for the operating environment are as follows:
The TDengine OSS installation package is provided for Linux users in .deb, .rpm, and .tar.gz format and can also be installed via APT from our repository. Installation packages are also provided for macOS (client and server) and Windows (client only).
Linux kernel version - 3.10.0-1160.83.1.el7.x86_64;
1. Select the appropriate package for your machine and follow the steps to install TDengine.
glibc version - 2.17;
<Tabs>
<TabItem label=".deb" value="debinst">
If you compile and install from cloned source code, you must also meet the following requirements:
1. Download the .deb installation package:
<PkgListV3 type={6}/>
2. Run the following command to install TDengine:
cmake version - 3.26.4 or above;
```bash
sudo dpkg -i TDengine-server-<version>-Linux-x64.deb
```
gcc version - 9.3.1 or above;
Replace `<version>` with the version of the package that you downloaded.
## Installation
</TabItem>
**Note**
<TabItem label=".rpm" value="rpminst">
Since TDengine 3.0.6.0, we no longer provide a standalone taosTools package for download. However, all the tools included in the taosTools package can be found in the TDengine-server package.
1. Download the .rpm installation package:
<PkgListV3 type={5}/>
2. Run the following command to install TDengine:
<Tabs>
<TabItem label=".deb" value="debinst">
```bash
sudo rpm -ivh TDengine-server-<version>-Linux-x64.rpm
```
Replace `<version>` with the version of the package that you downloaded.
1. Download the Deb installation package.
<PkgListV3 type={6}/>
2. In the directory where the package is located, use `dpkg` to install the package:
</TabItem>
> Please replace `<version>` with the corresponding version of the package downloaded
<TabItem label=".tar.gz" value="tarinst">
```bash
sudo dpkg -i TDengine-server-<version>-Linux-x64.deb
```
1. Download the desired .tar.gz package from the following list:
<PkgListV3 type={0}/>
2. Run the following command to decompress the package:
</TabItem>
```bash
tar -zxvf TDengine-server-<version>-Linux-x64.tar.gz
```
Replace `<version>` with the version of the package that you downloaded.
3. In the directory where you decompressed the package, run the following command to install TDengine:
<TabItem label=".rpm" value="rpminst">
```bash
sudo ./install.sh
```
1. Download the .rpm installation package.
<PkgListV3 type={5}/>
2. In the directory where the package is located, use rpm to install the package:
:::note
The `install.sh` script requires you to enter configuration information in the terminal. For a non-interactive installation, run `./install.sh -e no`. You can run `./install.sh -h` for detailed information about all parameters.
:::
> Please replace `<version>` with the corresponding version of the package downloaded
</TabItem>
```bash
sudo rpm -ivh TDengine-server-<version>-Linux-x64.rpm
```
<TabItem label="APT" value="apt-get">
</TabItem>
1. Configure the package repository:
<TabItem label=".tar.gz" value="tarinst">
```bash
wget -qO - http://repos.taosdata.com/tdengine.key | sudo apt-key add -
echo "deb [arch=amd64] http://repos.taosdata.com/tdengine-stable stable main" | sudo tee /etc/apt/sources.list.d/tdengine-stable.list
```
1. Download the .tar.gz installation package.
<PkgListV3 type={0}/>
2. In the directory where the package is located, use `tar` to decompress the package:
2. Update the list of available packages and install TDengine.
> Please replace `<version>` with the corresponding version of the package downloaded
```bash
sudo apt-get update
apt-cache policy tdengine
sudo apt-get install tdengine
```
```bash
tar -zxvf TDengine-server-<version>-Linux-x64.tar.gz
```
</TabItem>
In the directory to which the package was decompressed, run `install.sh`:
<TabItem label="Windows" value="windows">
```bash
sudo ./install.sh
```
:::note
:::info
Users will be prompted to enter some configuration information while install.sh is executing. Interactive mode can be disabled by executing `./install.sh -e no`. Running `./install.sh -h` shows all parameters with detailed explanations.
:::
This procedure installs the TDengine OSS client on Windows. The TDengine OSS server does not support Windows.
</TabItem>
:::
<TabItem value="apt-get" label="apt-get">
You can use `apt-get` to install TDengine from the official package repository.
1. Download the Windows installation package:
<PkgListV3 type={3}/>
2. Run the installation package to install TDengine.
**Configure the package repository**
</TabItem>
```bash
wget -qO - http://repos.taosdata.com/tdengine.key | sudo apt-key add -
echo "deb [arch=amd64] http://repos.taosdata.com/tdengine-stable stable main" | sudo tee /etc/apt/sources.list.d/tdengine-stable.list
```
<TabItem label="macOS" value="macos">
You can install beta versions by configuring the following repository:
1. Download the desired installation package from the following list:
<PkgListV3 type={7}/>
2. Run the installation package to install TDengine.
```bash
wget -qO - http://repos.taosdata.com/tdengine.key | sudo apt-key add -
echo "deb [arch=amd64] http://repos.taosdata.com/tdengine-beta beta main" | sudo tee /etc/apt/sources.list.d/tdengine-beta.list
```
:::note
If the installation is blocked, right-click on the package and choose **Open**.
:::
**Install TDengine with `apt-get`**
</TabItem>
</Tabs>
```bash
sudo apt-get update
apt-cache policy tdengine
sudo apt-get install tdengine
```
2. When installing the first node and prompted with `Enter FQDN:`, you do not need to input anything. Only when installing the second or subsequent nodes do you need to input the FQDN of any available node in the existing cluster to join the new node to the cluster. Alternatively, you can configure it in the new node's configuration file before starting.
:::tip
This installation method is supported only for Debian and Ubuntu.
:::
</TabItem>
<TabItem label="Windows" value="windows">
3. Select your operating system and follow the steps to start TDengine services.
**Note**
- TDengine only supports Windows Server 2016/2019 and Windows 10/11 on the Windows platform.
- Since TDengine 3.1.0.0, we only provide a client package for Windows. If you need to run the TDengine server on Windows, please contact the TDengine sales team to upgrade to TDengine Enterprise.
- To run on Windows, the Microsoft Visual C++ Runtime library is required. If the Microsoft Visual C++ Runtime Library is missing on your platform, you can download and install it from [VC Runtime Library](https://learn.microsoft.com/en-us/cpp/windows/latest-supported-vc-redist?view=msvc-170).
<Tabs>
<TabItem label="Linux" value="linux">
Follow the steps below:
Run the following command to start all TDengine services:
1. Download the Windows installation package.
<PkgListV3 type={3}/>
2. Run the downloaded package to install TDengine.
Note: From version 3.0.1.7, only the TDengine client package can be downloaded for the Windows platform. If you want to run TDengine servers on Windows, please contact our sales team to upgrade to TDengine Enterprise.
```bash
sudo start-all.sh
```
Alternatively, you can manage specific TDengine services through systemd:
```bash
sudo systemctl start taosd
sudo systemctl start taosadapter
sudo systemctl start taoskeeper
sudo systemctl start taos-explorer
```
:::note
</TabItem>
<TabItem label="macOS" value="macos">
If your machine does not support systemd, you can manually run the TDengine services located in the `/usr/local/taos/bin` directory.
1. Download the macOS installation package.
<PkgListV3 type={7}/>
2. Run the downloaded package to install TDengine. If the installation is blocked, you can right-click or ctrl-click on the installation package and select `Open`.
:::
</TabItem>
</Tabs>
</TabItem>
:::info
For information about other TDengine releases, see [Release History](../../releases/tdengine).
:::
<TabItem label="macOS" value="macos">
Run the following command to start all TDengine services:
:::note
On the first node in your TDengine cluster, leave the `Enter FQDN:` prompt blank and press **Enter**. On subsequent nodes, you can enter the endpoint of the first dnode in the cluster. You can also configure this setting after you have finished installing TDengine.
```bash
sudo start-all.sh
```
:::
Alternatively, you can manage specific TDengine services with the `launchctl` command:
## Quick Launch
```bash
sudo launchctl start com.tdengine.taosd
sudo launchctl start com.tdengine.taosadapter
sudo launchctl start com.tdengine.taoskeeper
sudo launchctl start com.tdengine.taos-explorer
```
<Tabs>
<TabItem label="Linux" value="linux">
</TabItem>
</Tabs>
You can now work with TDengine on your local machine. For example, you can run the `taos` command to open the TDengine command-line interface.
After the installation is complete, run the following command to start the TDengine service:
## What to Do Next
```bash
systemctl start taosd
systemctl start taosadapter
systemctl start taoskeeper
systemctl start taos-explorer
```
### Test Data Ingestion
Alternatively, you can run a script to start all of the above services together:
Your TDengine installation includes taosBenchmark, a tool specifically designed to test TDengine's performance. taosBenchmark can simulate data generated by many devices with a wide range of configuration options so that you can perform tests on sample data similar to your real-world use cases. For more information about taosBenchmark, see [taosBenchmark](../../tdengine-reference/tools/taosbenchmark/).
```bash
start-all.sh
```
Perform the following steps to use taosBenchmark to test TDengine's ingestion performance on your machine:
You can also use systemctl to stop or restart a specific service or check its status, as shown below with `taosd` as an example:
1. Run taosBenchmark with the default settings:
```bash
systemctl start taosd
systemctl stop taosd
systemctl restart taosd
systemctl status taosd
```
```bash
taosBenchmark -y
```
:::info
taosBenchmark automatically creates the `test` database and the `meters` supertable inside that database. This supertable contains 10,000 subtables, named `d0` to `d9999`, with each subtable containing 10,000 records. Each record includes the following four metrics:
- The `systemctl` command requires _root_ privileges. If you are not logged in as the _root_ user, use the `sudo` command.
- The `systemctl stop taosd` command does not instantly stop TDengine Server. The server is stopped only after all data in memory is flushed to disk. The time required depends on the cache size.
- If your system does not include `systemd`, you can run `/usr/local/taos/bin/taosd` to start TDengine manually.
- `ts` (timestamp), ranging from `2017-07-14 10:40:00 000` to `2017-07-14 10:40:09 999`
- `current`
- `voltage`
- `phase`
:::
Each subtable also has the following two tags:
</TabItem>
- `groupId`, ranging from `1` to `10`
- `location`, indicating a city and state such as `California.Campbell` or `California.Cupertino`
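If you want to inspect the schema that taosBenchmark generated, a quick check from the TDengine CLI is to describe the supertable (output omitted here and may vary by version):
```sql
DESCRIBE test.meters;
```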
<TabItem label="Windows" value="windows">
When the ingestion process is finished, taosBenchmark outputs the time taken to ingest the specified sample data. From this, you can estimate how TDengine would perform on your system in a production environment.
After the installation is complete, run `sc start taosd` or run `C:\TDengine\taosd.exe` with administrator privileges to start TDengine Server. Run `sc start taosadapter` or run `C:\TDengine\taosadapter.exe` with administrator privileges to start taosAdapter, which provides the HTTP/REST service.
### Test Data Querying
</TabItem>
After inserting data with taosBenchmark as described above, you can use the TDengine CLI to test TDengine's query performance on your machine:
<TabItem label="macOS" value="macos">
1. Start the TDengine CLI:
After the installation is complete, double-click /Applications/TDengine to start the program, or use the `sudo launchctl start` commands shown below to start the TDengine services.
```bash
taos
```
```bash
sudo launchctl start com.tdengine.taosd
sudo launchctl start com.tdengine.taosadapter
sudo launchctl start com.tdengine.taoskeeper
sudo launchctl start com.tdengine.taos-explorer
```
2. Query the total number of records in the `meters` supertable:
Alternatively, you can run a script to start all of the above services together:
```bash
start-all.sh
```
```sql
SELECT COUNT(*) FROM test.meters;
```
The following `launchctl` commands can help you manage TDengine services, using the `taosd` service as an example:
3. Query the average, maximum, and minimum values of 100 million records:
```bash
sudo launchctl start com.tdengine.taosd
sudo launchctl stop com.tdengine.taosd
sudo launchctl list | grep taosd
sudo launchctl print system/com.tdengine.taosd
```
```sql
SELECT AVG(current), MAX(voltage), MIN(phase) FROM test.meters;
```
:::info
- Please use `sudo` to run `launchctl` to manage _com.tdengine.taosd_ with administrator privileges.
- The administrator privilege is required for service management to enhance security.
- Troubleshooting:
- The first column returned by the command `launchctl list | grep taosd` is the PID of the program. If it's `-`, that means the TDengine service is not running.
- If the service is abnormal, check the `launchd.log` file in the system log or the `taosdlog` files in the `/var/log/taos` directory for more information.
4. Query the total number of records where the value of the `location` tag is `California.SanFrancisco`:
:::
```sql
SELECT COUNT(*) FROM test.meters WHERE location = "California.SanFrancisco";
```
5. Query the average, maximum, and minimum values of all records where the value of the `groupId` tag is `10`:
</TabItem>
</Tabs>
```sql
SELECT AVG(current), MAX(voltage), MIN(phase) FROM test.meters WHERE groupId = 10;
```
6. Calculate the average, maximum, and minimum values for the `d1001` table every 10 seconds:
## TDengine Command Line Interface
You can use the TDengine CLI to monitor your TDengine deployment and execute ad hoc queries. To open the CLI, run `taos` (Linux/macOS) or `taos.exe` (Windows) in a terminal. The TDengine CLI prompt looks like the following:
```cmd
taos>
```
Using the TDengine CLI, you can create and delete databases and tables and run all types of queries. Each SQL command must end with a semicolon (;). For example:
```sql
CREATE DATABASE demo;
USE demo;
CREATE TABLE t (ts TIMESTAMP, speed INT);
INSERT INTO t VALUES ('2019-07-15 00:00:00', 10);
INSERT INTO t VALUES ('2019-07-15 01:00:00', 20);
SELECT * FROM t;
ts | speed |
========================================
2019-07-15 00:00:00.000 | 10 |
2019-07-15 01:00:00.000 | 20 |
Query OK, 2 row(s) in set (0.003128s)
```
You can also monitor the deployment status, add and remove user accounts, and manage running instances. You can run the TDengine CLI on the server or on any client machine. For more information, see [TDengine CLI](../../reference/components/taos-shell/).
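For example, a minimal sketch of managing user accounts from the CLI; the `ops_user` account name and password are only illustrative, and the exact password policy may depend on your TDengine version and configuration:
```sql
CREATE USER ops_user PASS 'Abcd1234';
SHOW USERS;
DROP USER ops_user;
```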
## TDengine Graphic User Interface
Starting with TDengine 3.3.0.0, a new component called `taos-explorer` is included in the TDengine Docker image. You can use it to manage the databases, supertables, subtables, and data in your TDengine system. Some features are only available in TDengine Enterprise; contact the TDengine sales team if you need them.
To use taos-explorer in the container, access the host port mapped from container port 6060. Assuming the host name is abc.com and the port used on the host is 6060, you would access `http://abc.com:6060`. taos-explorer uses port 6060 by default in the container. When you use it for the first time, you need to register with your enterprise email; you can then log in with your username and password.
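As a sketch, when starting the container from the `tdengine/tdengine` image you could publish container port 6060 (taos-explorer) along with 6041 (taosAdapter REST) to the host; adjust the host-side ports and container name to your environment:
```bash
docker run -d --name tdengine \
  -p 6041:6041 \
  -p 6060:6060 \
  tdengine/tdengine
```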
## Test data insert performance
After your TDengine Server is running normally, you can run the taosBenchmark utility to test its performance:
Start the TDengine service and execute `taosBenchmark` (formerly named `taosdemo`) in a terminal.
```bash
taosBenchmark
```
This command creates the `meters` supertable in the `test` database. In the `meters` supertable, it then creates 10,000 subtables named `d0` to `d9999`. Each table has 10,000 rows and each row has four columns: `ts`, `current`, `voltage`, and `phase`. The timestamps of the data in these columns range from 2017-07-14 10:40:00 000 to 2017-07-14 10:40:09 999. Each table is randomly assigned a `groupId` tag from 1 to 10 and a `location` tag of either `California.Campbell`, `California.Cupertino`, `California.LosAngeles`, `California.MountainView`, `California.PaloAlto`, `California.SanDiego`, `California.SanFrancisco`, `California.SanJose`, `California.SantaClara` or `California.Sunnyvale`.
The `taosBenchmark` command creates a deployment with 100 million data points that you can use for testing purposes. The time required to create the deployment depends on your hardware. On most modern servers, the deployment is created in ten to twenty seconds.
You can customize the test deployment that taosBenchmark creates by specifying command-line parameters. For information about command-line parameters, run the `taosBenchmark --help` command. For more information about taosBenchmark, see [taosBenchmark](../../reference/components/taosbenchmark).
## Test data query performance
After using `taosBenchmark` to create your test deployment, you can run queries in the TDengine CLI to test its performance:
From the TDengine CLI (taos), query the number of rows in the `meters` supertable:
```sql
SELECT COUNT(*) FROM test.meters;
```
Query the average, maximum, and minimum values of all 100 million rows of data:
```sql
SELECT AVG(current), MAX(voltage), MIN(phase) FROM test.meters;
```
Query the number of rows whose `location` tag is `California.SanFrancisco`:
```sql
SELECT COUNT(*) FROM test.meters WHERE location = "California.SanFrancisco";
```
Query the average, maximum, and minimum values of all rows whose `groupId` tag is `10`:
```sql
SELECT AVG(current), MAX(voltage), MIN(phase) FROM test.meters WHERE groupId = 10;
```
Query the average, maximum, and minimum values for table `d10` in 10 second intervals:
```sql
SELECT FIRST(ts), AVG(current), MAX(voltage), MIN(phase) FROM test.d10 INTERVAL(10s);
```
In the query above, you are selecting the first timestamp (ts) in the interval; another way to select this would be `_wstart`, which gives the start of the time window. For more information about windowed queries, see [Time-Series Extensions](../../reference/taos-sql/distinguished/).
```sql
SELECT _wstart, AVG(current), MAX(voltage), MIN(phase) FROM test.d1001 INTERVAL(10s);
```

@ -0,0 +1,42 @@
---
sidebar_label: Use TDengine Cloud
title: Get Started with TDengine Cloud
slug: /get-started/use-tdengine-cloud
---
TDengine Cloud is a fully managed cloud service for industrial big data. It delivers all features of TDengine Enterprise as a cloud-native solution in Amazon Web Services, Microsoft Azure, or Google Cloud Platform.
You can register for a TDengine Cloud account for free and automatically obtain a one-month free trial to test TDengine Cloud for yourself.
## Procedure
1. Register for a TDengine Cloud account.
1. In a web browser, open the [TDengine Cloud](https://cloud.tdengine.com) website.
2. In the **Sign up** section, enter your name and company email address.
3. Click **Get Confirmation Code**. A confirmation email is sent to your email address.
4. Copy the 6-digit confirmation code from the email and paste it into the **Confirmation Code** field.
5. Click **Sign in TDengine Cloud**.
6. On the page displayed, enter your name, company, country of residence, and phone number.
7. Specify a password and click **Continue**.
2. Determine whether you want to use any public databases and click **Next**.
The TDengine DB Mart includes several public databases that you can use for testing purposes. To enable access to a public database in your account, select the toggle. You can modify these settings after the account creation process is finished.
3. Create an organization.
1. Enter a name for your organization in TDengine Cloud. This name must be unique.
2. Specify whether to enable single sign-on (SSO).
- Select **Public** to use GitHub, Microsoft, or Google SSO.
- Select **Azure AD** to use Microsoft Entra ID. Enter the Azure domain, client ID, and client secret as prompted.
3. Click **Next**.
4. Create your first instance.
1. Select a cloud and region from the drop-down lists.
2. Enter a name for your instance.
3. Specify whether to enable high availability.
4. Specify whether to create a sample database.
5. Click **Select Plan** and select your desired price plan.
6. Click **Create**.
Your instance is created according to your specifications and you can begin to use TDengine Cloud. For more information, see the [TDengine Cloud documentation](/cloud).

@ -1,26 +0,0 @@
You can use `apt-get` to install TDengine from the official package repository.
**Configure the package repository**
```
wget -qO - http://repos.taosdata.com/tdengine.key | sudo apt-key add -
echo "deb [arch=amd64] http://repos.taosdata.com/tdengine-stable stable main" | sudo tee /etc/apt/sources.list.d/tdengine-stable.list
```
You can install beta versions by configuring the following package repository:
```
echo "deb [arch=amd64] http://repos.taosdata.com/tdengine-beta beta main" | sudo tee /etc/apt/sources.list.d/tdengine-beta.list
```
**Install TDengine with `apt-get`**
```
sudo apt-get update
apt-cache policy tdengine
sudo apt-get install tdengine
```
:::tip
This installation method is supported only for Debian and Ubuntu.
::::

@ -1,17 +0,0 @@
import PkgList from "/components/PkgList";
TDengine is easy to download and install.
The standard server installation package includes `taos`, `taosd`, `taosAdapter`, `taosBenchmark`, and sample code. You can also download a lite package that includes only `taosd` and the C/C++ client library.
You can download the TDengine installation package in .rpm, .deb, or .tar.gz format. The .tar.tz package includes `taosdump` and the TDinsight installation script. If you want to use these utilities with the .deb or .rpm package, download and install taosTools separately.
Between official releases, beta versions may be released that contain new features. Do not use beta versions for production or testing environments. Select the installation package appropriate for your system.
<PkgList type={0}/>
For information about installing TDengine, see [Install and Uninstall](../operation/pkg-install).
For information about TDengine releases, see [All Downloads](https://tdengine.com/all-downloads)
and [Release Notes](https://github.com/taosdata/TDengine/releases).

@ -1,7 +0,0 @@
<svg xmlns="http://www.w3.org/2000/svg" viewBox="-0.5 -1 32 32" width="50" height="50">
<g fill="#5865f2">
<path
d="M26.0015 6.9529C24.0021 6.03845 21.8787 5.37198 19.6623 5C19.3833 5.48048 19.0733 6.13144 18.8563 6.64292C16.4989 6.30193 14.1585 6.30193 11.8336 6.64292C11.6166 6.13144 11.2911 5.48048 11.0276 5C8.79575 5.37198 6.67235 6.03845 4.6869 6.9529C0.672601 12.8736 -0.41235 18.6548 0.130124 24.3585C2.79599 26.2959 5.36889 27.4739 7.89682 28.2489C8.51679 27.4119 9.07477 26.5129 9.55525 25.5675C8.64079 25.2265 7.77283 24.808 6.93587 24.312C7.15286 24.1571 7.36986 23.9866 7.57135 23.8161C12.6241 26.1255 18.0969 26.1255 23.0876 23.8161C23.3046 23.9866 23.5061 24.1571 23.7231 24.312C22.8861 24.808 22.0182 25.2265 21.1037 25.5675C21.5842 26.5129 22.1422 27.4119 22.7621 28.2489C25.2885 27.4739 27.8769 26.2959 30.5288 24.3585C31.1952 17.7559 29.4733 12.0212 26.0015 6.9529ZM10.2527 20.8402C8.73376 20.8402 7.49382 19.4608 7.49382 17.7714C7.49382 16.082 8.70276 14.7025 10.2527 14.7025C11.7871 14.7025 13.0425 16.082 13.0115 17.7714C13.0115 19.4608 11.7871 20.8402 10.2527 20.8402ZM20.4373 20.8402C18.9183 20.8402 17.6768 19.4608 17.6768 17.7714C17.6768 16.082 18.8873 14.7025 20.4373 14.7025C21.9717 14.7025 23.2271 16.082 23.1961 17.7714C23.1961 19.4608 21.9872 20.8402 20.4373 20.8402Z"
></path>
</g>
</svg>

@ -1,6 +0,0 @@
<svg xmlns="http://www.w3.org/2000/svg" viewBox="-1 -2 18 18" width="50" height="50">
<path
fill="#000"
d="M8 0C3.58 0 0 3.58 0 8c0 3.54 2.29 6.53 5.47 7.59.4.07.55-.17.55-.38 0-.19-.01-.82-.01-1.49-2.01.37-2.53-.49-2.69-.94-.09-.23-.48-.94-.82-1.13-.28-.15-.68-.52-.01-.53.63-.01 1.08.58 1.23.82.72 1.21 1.87.87 2.33.66.07-.52.28-.87.51-1.07-1.78-.2-3.64-.89-3.64-3.95 0-.87.31-1.59.82-2.15-.08-.2-.36-1.02.08-2.12 0 0 .67-.21 2.2.82.64-.18 1.32-.27 2-.27.68 0 1.36.09 2 .27 1.53-1.04 2.2-.82 2.2-.82.44 1.1.16 1.92.08 2.12.51.56.82 1.27.82 2.15 0 3.07-1.87 3.75-3.65 3.95.29.25.54.73.54 1.48 0 1.07-.01 1.93-.01 2.2 0 .21.15.46.55.38A8.013 8.013 0 0016 8c0-4.42-3.58-8-8-8z"
></path>
</svg>

@ -1,43 +1,10 @@
---
title: Get Started
description: This document describes how to install TDengine on various platforms.
slug: /get-started
---
import GitHubSVG from './github.svg'
import DiscordSVG from './discord.svg'
import TwitterSVG from './twitter.svg'
import YouTubeSVG from './youtube.svg'
import LinkedInSVG from './linkedin.svg'
import StackOverflowSVG from './stackoverflow.svg'
This section describes how to set up a TDengine environment quickly using Docker or installation packages and experience its capabilities.
You can install and run TDengine on Linux/Windows/macOS machines as well as Docker containers. You can also deploy TDengine as a managed service with TDengine Cloud.
The full package of TDengine includes the TDengine Server (`taosd`), TDengine Client (`taosc`), taosAdapter for connecting with third-party systems and providing a RESTful interface, a command-line interface, and some tools. In addition to client libraries for multiple languages, TDengine also provides a [RESTful interface](../reference/connectors/rest-api) through [taosAdapter](../reference/components/taosadapter).
```mdx-code-block
import DocCardList from '@theme/DocCardList';
import {useCurrentSidebarCategory} from '@docusaurus/theme-common';
<DocCardList items={useCurrentSidebarCategory().items}/>
```
## Join TDengine Community
<table width="100%">
<tr align="center" style={{border:0}}>
<td width="16%" style={{border:0}}><a href="https://github.com/taosdata/TDengine" target="_blank"><GitHubSVG /></a></td>
<td width="16%" style={{border:0}}><a href="https://discord.com/invite/VZdSuUg4pS" target="_blank"><DiscordSVG /></a></td>
<td width="16%" style={{border:0}}><a href="https://twitter.com/TDengineDB" target="_blank"><TwitterSVG /></a></td>
<td width="16%" style={{border:0}}><a href="https://www.youtube.com/@tdengine" target="_blank"><YouTubeSVG /></a></td>
<td width="16%" style={{border:0}}><a href="https://www.linkedin.com/company/tdengine" target="_blank"><LinkedInSVG /></a></td>
<td width="16%" style={{border:0}}><a href="https://stackoverflow.com/questions/tagged/tdengine" target="_blank"><StackOverflowSVG /></a></td>
</tr>
<tr align="center" style={{border:0,backgroundColor:'transparent'}}>
<td width="16%" style={{border:0,padding:0}}><a href="https://github.com/taosdata/TDengine" target="_blank">Star GitHub</a></td>
<td width="16%" style={{border:0,padding:0}}><a href="https://discord.com/invite/VZdSuUg4pS" target="_blank">Join Discord</a></td>
<td width="16%" style={{border:0,padding:0}}><a href="https://twitter.com/TDengineDB" target="_blank">Follow Twitter</a></td>
<td width="16%" style={{border:0,padding:0}}><a href="https://www.youtube.com/@tdengine" target="_blank">Subscribe YouTube</a></td>
<td width="16%" style={{border:0,padding:0}}><a href="https://www.linkedin.com/company/tdengine" target="_blank">Follow LinkedIn</a></td>
<td width="16%" style={{border:0,padding:0}}><a href="https://stackoverflow.com/questions/tagged/tdengine" target="_blank">Ask StackOverflow</a></td>
</tr>
</table>
- To deploy TDengine in a container, see [Get Started with TDengine Using Docker](deploy-in-docker/).
- To install TDengine on a local server, see [Get Started with TDengine Using an Installation Package](deploy-from-package/).
- To use TDengine as a fully managed cloud service instead of deploying on your own, see [Get Started with TDengine Cloud](use-tdengine-cloud/).

@ -1,6 +0,0 @@
<svg xmlns="http://www.w3.org/2000/svg" viewBox="0 -2 24 24" width="50" height="50">
<path
fill="rgb(10, 102, 194)"
d="M20.5 2h-17A1.5 1.5 0 002 3.5v17A1.5 1.5 0 003.5 22h17a1.5 1.5 0 001.5-1.5v-17A1.5 1.5 0 0020.5 2zM8 19H5v-9h3zM6.5 8.25A1.75 1.75 0 118.3 6.5a1.78 1.78 0 01-1.8 1.75zM19 19h-3v-4.74c0-1.42-.6-1.93-1.38-1.93A1.74 1.74 0 0013 14.19a.66.66 0 000 .14V19h-3v-9h2.9v1.3a3.11 3.11 0 012.7-1.4c1.55 0 3.36.86 3.36 3.66z"
></path>
</svg>

@ -1,7 +0,0 @@
<svg xmlns="http://www.w3.org/2000/svg" viewBox="-8 0 48 48" width="50" height="50">
<path d="M26 41v-9h4v13H0V32h4v9h22z" fill="#BCBBBB" />
<path
d="M23 34l.8-3-16.1-3.3L7 31l16 3zM9.2 23.2l15 7 1.4-3-15-7-1.4 3zm4.2-7.4L26 26.4l2.1-2.5-12.7-10.6-2.1 2.5zM21.5 8l-2.7 2 9.9 13.3 2.7-2L21.5 8zM7 38h16v-3H7v3z"
fill="#F48024"
/>
</svg>

@ -1,7 +0,0 @@
<svg xmlns="http://www.w3.org/2000/svg" viewBox="0 -2 24 24" width="50" height="50">
<g fill="rgb(29, 155, 240)">
<path
d="M23.643 4.937c-.835.37-1.732.62-2.675.733.962-.576 1.7-1.49 2.048-2.578-.9.534-1.897.922-2.958 1.13-.85-.904-2.06-1.47-3.4-1.47-2.572 0-4.658 2.086-4.658 4.66 0 .364.042.718.12 1.06-3.873-.195-7.304-2.05-9.602-4.868-.4.69-.63 1.49-.63 2.342 0 1.616.823 3.043 2.072 3.878-.764-.025-1.482-.234-2.11-.583v.06c0 2.257 1.605 4.14 3.737 4.568-.392.106-.803.162-1.227.162-.3 0-.593-.028-.877-.082.593 1.85 2.313 3.198 4.352 3.234-1.595 1.25-3.604 1.995-5.786 1.995-.376 0-.747-.022-1.112-.065 2.062 1.323 4.51 2.093 7.14 2.093 8.57 0 13.255-7.098 13.255-13.254 0-.2-.005-.402-.014-.602.91-.658 1.7-1.477 2.323-2.41z"
></path>
</g>
</svg>

@ -1,11 +0,0 @@
<svg xmlns="http://www.w3.org/2000/svg" viewBox="-2 -8 32 32" width="50" height="50">
<g>
<g>
<path
d="M27.9727 3.12324C27.6435 1.89323 26.6768 0.926623 25.4468 0.597366C23.2197 2.24288e-07 14.285 0 14.285 0C14.285 0 5.35042 2.24288e-07 3.12323 0.597366C1.89323 0.926623 0.926623 1.89323 0.597366 3.12324C2.24288e-07 5.35042 0 10 0 10C0 10 2.24288e-07 14.6496 0.597366 16.8768C0.926623 18.1068 1.89323 19.0734 3.12323 19.4026C5.35042 20 14.285 20 14.285 20C14.285 20 23.2197 20 25.4468 19.4026C26.6768 19.0734 27.6435 18.1068 27.9727 16.8768C28.5701 14.6496 28.5701 10 28.5701 10C28.5701 10 28.5677 5.35042 27.9727 3.12324Z"
fill="#FF0000"
></path>
<path d="M11.4253 14.2854L18.8477 10.0004L11.4253 5.71533V14.2854Z" fill="white"></path>
</g>
</g>
</svg>

@ -0,0 +1,185 @@
---
sidebar_label: Data Model
title: Understand the TDengine Data Model
slug: /basic-features/data-model
---
import Image from '@theme/IdealImage';
import dataModel from '../assets/data-model-01.png';
This document describes the data model and provides definitions of terms and concepts used in TDengine.
The TDengine data model is illustrated in the following figure.
<figure>
<Image img={dataModel} alt="Data Model Diagram"/>
<figcaption>Figure 1. The TDengine data model</figcaption>
</figure>
## Terminology
### Metric
A metric is a measurement obtained from a data collection point. With smart meters, for example, current, voltage, and phase are typical metrics.
### Tag
A tag is a static attribute associated with a data collection point and that does not typically change over time, such as device model, color, or location. With smart meters, for example, location and group ID are typical tags.
### Data Collection Point
A data collection point (DCP) is a hardware or software device responsible for collecting metrics at a predetermined time interval or upon a specific event trigger. A DCP can collect one or more metrics simultaneously, but all metrics from each DCP share the same timestamp.
Complex devices often have multiple DCPs, each with its own collection cycle, operating independently of each other. For example, in a car, one DCP may collect GPS data while a second monitors the engine status and a third monitors the interior environment.
### Table
A table in TDengine consists of rows of data and columns defining the type of data, like in the traditional relational database model. However, TDengine stores the data from each DCP in a separate table. This is known as the "one table per DCP" model and is a unique feature of TDengine.
Note that for complex devices like cars, which have multiple DCPs, this means that multiple tables are created for a single device.
Typically, the name of a DCP is stored as the name of the table, not as a separate tag. The `tbname` pseudocolumn is used for filtering by table name. Each metric collected by the DCP and each tag associated with it is represented as a column in the table, and each column has a defined data type. The first column in the table must be the timestamp, which is used to build an index.
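For example, a query that uses the `tbname` pseudocolumn to restrict results to a single DCP might look like the following sketch, using the `meters` supertable and `d1001` subtable defined later in this document:
```sql
SELECT tbname, ts, current, voltage
FROM meters
WHERE tbname = 'd1001'
LIMIT 5;
```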
TDengine includes two types of tables: subtables, which are created within a supertable, and basic tables, which are independent of supertables.
### Basic Table
A basic table cannot contain tags and does not belong to any supertable. The functionality of a basic table is similar to that of a table in a relational database management system.
### Supertable
A supertable is a data structure that groups together a specific type of DCP into a logical unified table. The tables created within a supertable are known as subtables. All subtables within a supertable have the same schema. The supertable is a unique concept in TDengine that simplifies table management and facilitates aggregation across DCPs.
Each supertable contains at least one timestamp column, at least one metric column, and at least one tag column. Tags in a supertable can be added, modified, or deleted at any time without creating new time series. Note that data is not stored within a supertable, but within the subtables in that supertable.
With smart meters, for example, one supertable would be created for all smart meters. Within that supertable, one subtable is created for each smart meter. For more information, see [TDengine Concepts: Supertable](https://tdengine.com/tdengine-concepts-supertable/).
### Subtable
A subtable is created within a supertable and inherits the schema of the supertable. The schema of a subtable cannot be modified; any modifications to the supertable schema affect all subtables that it contains.
### Database
A database defines storage policies for its supertables and tables. Each database can contain one or more supertables, but each supertable or table belongs to only one database.
A single TDengine deployment can contain multiple databases with different policies. You can create multiple databases to achieve finer-grained data management and optimization.
### Timestamp
Timestamps are a complex but essential part of time-series data management. TDengine stores timestamps in Unix time, which represents the number of milliseconds elapsed since the Unix epoch of January 1, 1970 at 00:00 UTC. However, when an application queries data in TDengine, the TDengine client automatically converts the timestamp to the local time zone of the application.
- When TDengine ingests a timestamp in RFC 3339 format, for example `2018-10-03T14:38:05.000+08:00`, the time zone specified in the timestamp is used to convert the timestamp to Unix time.
- When TDengine ingests a timestamp that does not contain time zone information, the local time zone of the application is used to convert the timestamp to Unix time.
## Sample Data
In this documentation, a smart meters scenario is used as sample data. The smart meters in this scenario collect three metrics, current, voltage, and phase; and two tags, location and group ID. The device ID of each smart meter is used as its table name.
An example of data collected by these smart meters is shown in the following table.
| Device ID | Timestamp | Current | Voltage | Phase | Location | Group ID |
| :-------: | :-----------: | :-----: | :-----: | :---: | :---------------------: | :------: |
| d1001 | 1538548685000 | 10.3 | 219 | 0.31 | California.SanFrancisco | 2 |
| d1002 | 1538548684000 | 10.2 | 220 | 0.23 | California.SanFrancisco | 3 |
| d1003 | 1538548686500 | 11.5 | 221 | 0.35 | California.LosAngeles | 3 |
| d1004 | 1538548685500 | 13.4 | 223 | 0.29 | California.LosAngeles | 2 |
| d1001 | 1538548695000 | 12.6 | 218 | 0.33 | California.SanFrancisco | 2 |
| d1004 | 1538548696600 | 11.8 | 221 | 0.28 | California.LosAngeles | 2 |
| d1002 | 1538548696650 | 10.3 | 218 | 0.25 | California.SanFrancisco | 3 |
| d1001 | 1538548696800 | 12.3 | 221 | 0.31 | California.SanFrancisco | 2 |
## Data Management
This section describes how to create databases, supertables, and tables to store your data in TDengine.
### Create a Database
You use the `CREATE DATABASE` statement to create a database:
```sql
CREATE DATABASE power PRECISION 'ms' KEEP 3650 DURATION 10 BUFFER 16;
```
The name of the database created is `power` and its parameters are explained as follows:
- `PRECISION 'ms'`: The time-series data in this database uses millisecond-precision timestamps.
- `KEEP 3650`: The data in this database is retained for 3650 days. Any data older than 3650 days is automatically deleted.
- `DURATION 10`: Each data file contains 10 days of data.
- `BUFFER 16`: A 16 MB memory buffer is used for data ingestion.
For a list of all database parameters, see [Manage Databases](../../tdengine-reference/sql-manual/manage-databases/).
You use the `USE` statement to set a current database:
```sql
USE power;
```
This SQL statement switches the current database to `power`, meaning that subsequent statements are performed within the `power` database.
### Create a Supertable
You use the `CREATE STABLE` statement to create a supertable.
```sql
CREATE STABLE meters (
ts TIMESTAMP,
current FLOAT,
voltage INT,
phase FLOAT
) TAGS (
location VARCHAR(64),
group_id INT
);
```
The name of the supertable created is `meters` and the parameters following the name define the columns in the supertable. Each column is defined as a name and a data type. The first group of columns are metrics and the second group, following the `TAGS` keyword, are tags.
:::note
- The first column must be of type `TIMESTAMP`.
- Metric columns and tag columns cannot have the same name.
:::
### Create a Subtable
You use the `CREATE TABLE` statement with the `USING` keyword to create a subtable:
```sql
CREATE TABLE d1001
USING meters (
location,
group_id
) TAGS (
"California.SanFrancisco",
2
);
```
The name of the subtable created is `d1001` and it is created within the `meters` supertable. The `location` and `group_id` tag columns are used in this subtable, and their values are set to `California.SanFrancisco` and `2`, respectively.
Note that when creating a subtable, you can specify values for all or a subset of tag columns in the target supertable. However, these tag columns must already exist within the supertable.
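For example, the following sketch creates a subtable and sets only the `group_id` tag; any unspecified tag column, such as `location` here, is left as NULL:
```sql
CREATE TABLE d1002
USING meters (
    group_id
) TAGS (
    3
);
```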
### Create a Basic Table
You use the `CREATE TABLE` statement to create a basic table.
```sql
CREATE TABLE d1003(
ts TIMESTAMP,
current FLOAT,
voltage INT,
phase FLOAT,
location VARCHAR(64),
group_id INT
);
```
The name of the basic table is `d1003` and it includes the columns `ts`, `current`, `voltage`, `phase`, `location`, and `group_id`. Note that this table is not associated with any supertable and its metric and tag columns are not separate.
## Multi-Column Model vs. Single-Column Model
Typically, each supertable in TDengine contains multiple columns, one for each metric and one for each tag. However, in certain scenarios, it can be preferable to create supertables that contain only one column.
For example, when the types of metrics collected by a DCP frequently change, the standard multi-column model would require frequent modifications to the schema of the supertable. In this situation, creating one supertable per metric may offer improved performance.
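As an illustration of the single-column model for the smart meters scenario, you could create one supertable per metric; the supertable name `meters_current` below is purely illustrative:
```sql
CREATE STABLE meters_current (
    ts TIMESTAMP,
    val FLOAT
) TAGS (
    location VARCHAR(64),
    group_id INT
);
```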

@ -0,0 +1,127 @@
---
sidebar_label: Data Ingestion
title: Ingest, Update, and Delete Data
slug: /basic-features/data-ingestion
---
This document describes how to insert, update, and delete data using SQL. The databases and tables used as examples in this document are defined in [Sample Data](../data-model/#sample-data).
TDengine can also ingest data from various data collection tools. For more information, see [Integrate with Data Collection Tools](../../third-party-tools/data-collection/).
## Ingest Data
You use the `INSERT` statement to ingest data into TDengine. You can ingest one or more records into one or more tables.
### Insert a Record
The following SQL statement inserts one record into the `d1001` subtable:
```sql
INSERT INTO d1001 (ts, current, voltage, phase) VALUES ("2018-10-03 14:38:05", 10.3, 219, 0.31);
```
In this example, a smart meter with device ID `d1001` collected data on October 3, 2018 at 14:38:05. The data collected indicated a current of 10.3 A, voltage of 219 V, and phase of 0.31. The SQL statement provided inserts data into the `ts`, `current`, `voltage`, and `phase` columns of subtable `d1001` with the values `2018-10-03 14:38:05`, `10.3`, `219`, and `0.31`, respectively.
Note that when inserting data into every column of a subtable at once, you can omit the column list. The following SQL statement therefore achieves the same result:
```sql
INSERT INTO d1001 VALUES ("2018-10-03 14:38:05", 10.3, 219, 0.31);
```
Also note that timestamps can be inserted in Unix time if desired:
```sql
INSERT INTO d1001 VALUES (1538548685000, 10.3, 219, 0.31);
```
### Insert Multiple Records
The following SQL statement inserts multiple records into the `d1001` subtable:
```sql
INSERT INTO d1001 VALUES
("2018-10-03 14:38:05", 10.2, 220, 0.23),
("2018-10-03 14:38:15", 12.6, 218, 0.33),
("2018-10-03 14:38:25", 12.3, 221, 0.31);
```
This method can be useful in scenarios where a data collection point (DCP) collects data faster than it reports data. In this example, the smart meter with device ID `d1001` collects data every 10 seconds but reports data every 30 seconds, meaning that three records need to be inserted every 30 seconds.
### Insert into Multiple Tables
The following SQL statement inserts three records each into the `d1001`, `d1002`, and `d1003` subtables:
```sql
INSERT INTO d1001 VALUES
("2018-10-03 14:38:05", 10.2, 220, 0.23),
("2018-10-03 14:38:15", 12.6, 218, 0.33),
("2018-10-03 14:38:25", 12.3, 221, 0.31)
d1002 VALUES
("2018-10-03 14:38:04", 10.2, 220, 0.23),
("2018-10-03 14:38:14", 10.3, 218, 0.25),
("2018-10-03 14:38:24", 10.1, 220, 0.22)
d1003 VALUES
("2018-10-03 14:38:06", 11.5, 221, 0.35),
("2018-10-03 14:38:16", 10.4, 220, 0.36),
("2018-10-03 14:38:26", 10.3, 220, 0.33);
```
### Insert into Specific Columns
The following SQL statement inserts a record containing only the `ts`, `voltage`, and `phase` columns into the `d1004` subtable:
```sql
INSERT INTO d1004 (ts, voltage, phase) VALUES ("2018-10-04 14:38:06", 223, 0.29);
```
A `NULL` value is written to any columns not included in the `INSERT` statement. Note that the timestamp column cannot be omitted and cannot be null.
### Create Subtable on Insert
It is not necessary to create subtables in advance. You can use the `INSERT` statement with the `USING` keyword to create subtables automatically:
```sql
INSERT INTO d1002 USING meters TAGS ("California.SanFrancisco", 2) VALUES (now, 10.2, 219, 0.32);
```
If the subtable `d1002` already exists, the specified metrics are inserted into the subtable. If the subtable does not exist, it is created using the `meters` supertable with the specified tag values, and the specified metrics are then inserted into it. This can be useful when creating subtables programmatically for new DCPs.
### Insert via Supertable
The following statement inserts a record into the `d1001` subtable via the `meters` supertable.
```sql
INSERT INTO meters (tbname, ts, current, voltage, phase, location, group_id) VALUES ("d1001", "2018-10-03 14:38:05", 10.2, 220, 0.23, "California.SanFrancisco", 2);
```
Note that the data is not stored in the supertable itself, but in the subtable specified as the value of the `tbname` column.
## Update Data
You can update existing metric data by writing a new record with the same timestamp as the record that you want to replace:
```sql
INSERT INTO d1001 (ts, current) VALUES ("2018-10-03 14:38:05", 22);
```
This SQL statement updates the value of the `current` column at the specified time to `22`.
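To confirm the update, you can query the row back by its timestamp, for example:
```sql
SELECT ts, current FROM d1001 WHERE ts = '2018-10-03 14:38:05';
```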
## Delete Data
TDengine automatically deletes expired data based on the retention period configured for your database. However, if necessary, you can manually delete data from a table.
:::warning
Deleted data cannot be recovered. Exercise caution when deleting data.
Before deleting data, run a `SELECT` statement with the same `WHERE` condition to query the data that you want to delete. Confirm that you want to delete all data returned by the `SELECT` statement, and only then run the `DELETE` statement.
:::
The following SQL statement deletes all data from supertable `meters` whose timestamp is earlier than 2021-10-01 10:40:00.100.
```sql
DELETE FROM meters WHERE ts < '2021-10-01 10:40:00.100';
```
Note that when deleting data, you can filter only on the timestamp column. Other filtering conditions are not supported.
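Following the workflow recommended above, a sketch of checking the affected rows before deleting them with the same condition might look like this:
```sql
-- Inspect how many rows match the condition before deleting them
SELECT COUNT(*) FROM meters WHERE ts < '2021-10-01 10:40:00.100';

-- Only after confirming the result, run the DELETE with the same condition
DELETE FROM meters WHERE ts < '2021-10-01 10:40:00.100';
```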

@ -0,0 +1,637 @@
---
sidebar_label: Data Querying
title: Query Data
slug: /basic-features/data-querying
---
import Image from '@theme/IdealImage';
import windowModel from '../assets/data-querying-01.png';
import slidingWindow from '../assets/data-querying-02.png';
import sessionWindow from '../assets/data-querying-03.png';
import eventWindow from '../assets/data-querying-04.png';
This document describes how to query data that is stored in TDengine. You can use the following taosBenchmark command to generate the sample data used in this document.
```shell
taosBenchmark --start-timestamp=1600000000000 --tables=100 --records=10000000 --time-step=10000
```
This command creates the `test` database in TDengine and then creates the `meters` supertable containing a timestamp column; current, voltage, and phase metrics; and group ID and location tags. The supertable contains 100 subtables, each having 10 million data records. The timestamps of these records start from September 13, 2020, 12:26:40 UTC (1600000000000 Unix time) and increase by 10 seconds per record.
## Basic Queries
You use the `SELECT` statement to query data in TDengine. To specify filtering conditions, you use the `WHERE` clause.
```sql
SELECT * FROM meters
WHERE voltage > 10
ORDER BY ts DESC
LIMIT 5;
```
This SQL statement queries all records in the `meters` supertable whose voltage is greater than 10. It then orders them by timestamp in descending order (latest timestamp first) and limits the output to the first five records.
The results of the query are similar to the following:
```text
ts | current | voltage | phase | groupid | location |
=================================================================================================================================
2023-11-15 06:13:10.000 | 11.2467804 | 245 | 149.5000000 | 10 | California.MountainView |
2023-11-15 06:13:10.000 | 11.2467804 | 245 | 149.5000000 | 5 | California.Sunnyvale |
2023-11-15 06:13:10.000 | 11.2467804 | 245 | 149.5000000 | 4 | California.Cupertino |
2023-11-15 06:13:10.000 | 11.2467804 | 245 | 149.5000000 | 3 | California.Sunnyvale |
2023-11-15 06:13:10.000 | 11.2467804 | 245 | 149.5000000 | 8 | California.SanDiego |
```
## Aggregation Queries
You use the `GROUP BY` clause to perform aggregation queries. This clause groups data and returns a summary row for each group. You can group data by any column in the target table or view. It is not necessary that the columns in the `GROUP BY` clause be included in the `SELECT` list.
```sql
SELECT groupid, AVG(voltage)
FROM meters
WHERE ts >= "2022-01-01T00:00:00+00:00"
AND ts < "2023-01-01T00:00:00+00:00"
GROUP BY groupid;
```
This SQL statement queries the `meters` supertable for records whose timestamp is between January 1, 2022 at midnight UTC inclusive and January 1, 2023 at midnight UTC exclusive. It groups the results by the value of the `groupid` tag and calculates the average of the `voltage` metric for each group.
The results of the query are similar to the following:
```text
groupid | avg(voltage) |
==========================================
8 | 244.093189053779810 |
5 | 244.093189053779810 |
1 | 244.093189053779810 |
7 | 244.093189053779810 |
9 | 244.093189053779810 |
6 | 244.093189053779810 |
4 | 244.093189053779810 |
10 | 244.093189053779810 |
2 | 244.093189053779810 |
3 | 244.093189053779810 |
```
:::note
In a query with the `GROUP BY` clause, the `SELECT` list can include only the following expressions:
- Constants
- Aggregate functions
- The same expressions as those in the `GROUP BY` clause
- Expressions that include the preceding expressions
The `GROUP BY` clause does not order the result set in any specific way when aggregating data. To obtain an ordered result set, use the `ORDER BY` clause after the `GROUP BY` clause.
:::
For information about the aggregation functions that TDengine supports, see [Aggregation Functions](../../tdengine-reference/sql-manual/functions/#aggregate-functions).
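For example, to obtain an ordered result set as described in the note above, you can append an `ORDER BY` clause to the previous aggregation query:
```sql
SELECT groupid, AVG(voltage) AS avg_voltage
FROM meters
WHERE ts >= "2022-01-01T00:00:00+00:00"
  AND ts < "2023-01-01T00:00:00+00:00"
GROUP BY groupid
ORDER BY groupid;
```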
## Partitioned Queries
You use the `PARTITION BY` clause to partition data on a certain dimension and then perform calculations within each partition. You can partition data by any scalar expression, including columns, constants, scalar functions, or combinations of these.
```sql
SELECT location, AVG(voltage)
FROM meters
PARTITION BY location;
```
This SQL statement queries the `meters` supertable for all records. It partitions the results by the value of the `location` tag and calculates the average of the `voltage` metric for each partition.
The results of the query are similar to the following:
```text
location | avg(voltage) |
=========================================================
California.SantaClara | 244.093199999999996 |
California.SanFrancisco | 244.093199999999996 |
California.SanJose | 244.093199999999996 |
California.LosAngles | 244.093199999999996 |
California.SanDiego | 244.093199999999996 |
California.Sunnyvale | 244.093199999999996 |
California.PaloAlto | 244.093199999999996 |
California.Cupertino | 244.093199999999996 |
California.MountainView | 244.093199999999996 |
California.Campbell | 244.093199999999996 |
```
:::note
The `PARTITION BY` clause can be used before a `GROUP BY` clause or `WINDOW` clause. In this case, the subsequent clauses take effect on each partition.
:::
## Windowed Queries
Windowed queries partition the dataset by a window and perform aggregations on the data within each window. The following windows are supported:
- Time window
- State window
- Session window
- Event window
- Count window
The logic for windowing is shown in the following figure.
<figure>
<Image img={windowModel} alt="Windowing description"/>
<figcaption>Figure 1. Windowing logic</figcaption>
</figure>
:::note
The following conditions apply to all windowed queries:
1. You cannot use a `GROUP BY` clause with a window clause.
2. If you use a `PARTITION BY` clause in a windowed query, the `PARTITION BY` clause must occur before the window clause.
3. The expressions in the `SELECT` list for a windowed query can include only the following:
- Constants
- Pseudocolumns (`_wstart`, `_wend`, and `_wduration`)
- Aggregate functions, including selection functions and time-series-specific functions that determine the number of rows output.
This means that windowed queries cannot include the timestamp column in the `SELECT` list. Instead, use the `_wstart` and `_wend` pseudocolumns to indicate the start and end time of the window, the `_wduration` pseudocolumn to indicate the duration of the window, and the `_qstart` and `_qend` pseudocolumns to indicate the start and end time of the query.
When using these pseudocolumns, note the following:
- The start and end time of the window are inclusive.
- The window duration is expressed in the time precision configured for the database.
:::
### Time Windows
You use the `INTERVAL` clause to create a time window. In this clause, you specify the size of the time window and an optional offset.
```sql
SELECT tbname, _wstart, _wend, AVG(voltage)
FROM meters
WHERE ts >= "2022-01-01T00:00:00+00:00"
AND ts < "2022-01-01T00:05:00+00:00"
PARTITION BY tbname
INTERVAL(1m, 5s)
SLIMIT 2;
```
This SQL statement queries the `meters` supertable for records whose timestamp is between January 1, 2022 at midnight UTC inclusive and 00:05 UTC exclusive. It partitions the results by table name in 1 minute windows with a 5 second offset and calculates the average of the `voltage` metric for each window. The `SLIMIT` clause then limits the results to only two partitions.
The results of this query are similar to the following:
```text
tbname | _wstart | _wend | avg(voltage) |
=================================================================================================================
d26 | 2022-01-01 07:59:05.000 | 2022-01-01 08:00:05.000 | 244.000000000000000 |
d26 | 2022-01-01 08:00:05.000 | 2022-01-01 08:01:05.000 | 244.166666666666657 |
d26 | 2022-01-01 08:01:05.000 | 2022-01-01 08:02:05.000 | 241.333333333333343 |
d26 | 2022-01-01 08:02:05.000 | 2022-01-01 08:03:05.000 | 245.166666666666657 |
d26 | 2022-01-01 08:03:05.000 | 2022-01-01 08:04:05.000 | 237.500000000000000 |
d26 | 2022-01-01 08:04:05.000 | 2022-01-01 08:05:05.000 | 240.800000000000011 |
d2 | 2022-01-01 07:59:05.000 | 2022-01-01 08:00:05.000 | 244.000000000000000 |
d2 | 2022-01-01 08:00:05.000 | 2022-01-01 08:01:05.000 | 244.166666666666657 |
d2 | 2022-01-01 08:01:05.000 | 2022-01-01 08:02:05.000 | 241.333333333333343 |
d2 | 2022-01-01 08:02:05.000 | 2022-01-01 08:03:05.000 | 245.166666666666657 |
d2 | 2022-01-01 08:03:05.000 | 2022-01-01 08:04:05.000 | 237.500000000000000 |
d2 | 2022-01-01 08:04:05.000 | 2022-01-01 08:05:05.000 | 240.800000000000011 |
```
By default, the `INTERVAL` clause creates a tumbling window in which time intervals do not overlap. You can add the `SLIDING` clause after the `INTERVAL` clause to create a sliding window.
```sql
SELECT tbname, _wstart, AVG(voltage)
FROM meters
WHERE ts >= "2022-01-01T00:00:00+00:00"
AND ts < "2022-01-01T00:05:00+00:00"
PARTITION BY tbname
INTERVAL(1m) SLIDING(30s)
SLIMIT 1;
```
This SQL statement queries the `meters` supertable for records whose timestamp is between January 1, 2022 at midnight UTC inclusive and 00:05 UTC exclusive. It partitions the results by table name in 1 minute windows with a sliding time of 30 seconds and calculates the average of the `voltage` metric for each window. The `SLIMIT` clause then limits the results to only one partition.
The results of this query are similar to the following:
```text
tbname | _wstart | avg(voltage) |
=======================================================================================
d0 | 2022-01-01 07:59:30.000 | 245.666666666666657 |
d0 | 2022-01-01 08:00:00.000 | 242.500000000000000 |
d0 | 2022-01-01 08:00:30.000 | 243.833333333333343 |
d0 | 2022-01-01 08:01:00.000 | 243.666666666666657 |
d0 | 2022-01-01 08:01:30.000 | 240.166666666666657 |
d0 | 2022-01-01 08:02:00.000 | 242.166666666666657 |
d0 | 2022-01-01 08:02:30.000 | 244.500000000000000 |
d0 | 2022-01-01 08:03:00.000 | 240.500000000000000 |
d0 | 2022-01-01 08:03:30.000 | 239.333333333333343 |
d0 | 2022-01-01 08:04:00.000 | 240.666666666666657 |
d0 | 2022-01-01 08:04:30.000 | 237.666666666666657 |
```
The relationship between the time window and sliding time in a windowed query is described in the following figure.
<figure>
<Image img={slidingWindow} alt="Sliding window logic"/>
<figcaption>Figure 2. Sliding window logic</figcaption>
</figure>
:::note
The following conditions apply to time windows:
- The offset, if used, cannot be larger than the time window.
- The sliding time, if used, cannot be larger than the time window. If the sliding time is equal to the time window, a tumbling window is created instead of a sliding window.
- The minimum size of the time window is 10 milliseconds.
- The following units of time are supported:
- `b` (nanoseconds)
- `u` (microseconds)
- `a` (milliseconds)
- `s` (seconds)
- `m` (minutes)
- `h` (hours)
- `d` (days)
- `w` (weeks)
- `n` (months)
- `y` (years).
:::
:::tip
For optimal performance, ensure that the local time zone is the same on your TDengine client and server when you use time windows. Time zone conversion may cause performance deterioration.
:::
You can use the `FILL` clause to specify how to handle missing data within a time window. The following fill modes are supported:
- `FILL(NONE)`: makes no changes. This mode is used by default.
- `FILL(NULL)`: replaces missing data with null values.
- `FILL(VALUE, <val>)`: replaces missing data with a specified value. Note that the actual value is determined by the data type of the column. For example, the `FILL(VALUE, 1.23)` clause on a column of type `INT` will replace missing data with the value `1`.
- `FILL(PREV)`: replaces missing data with the previous non-null value.
- `FILL(NEXT)`: replaces missing data with the next non-null value.
- `FILL(LINEAR)`: replaces missing data by interpolating between the nearest non-null values.
Note that if there is no data within the entire query time range, the specified fill mode does not take effect and no changes are made to the data. You can use the following fill modes to forcibly replace data:
- `FILL(NULL_F)`: forcibly replaces missing data with null values.
- `FILL(VALUE_F, <val>)`: forcibly replaces missing data with the specified value.
:::note
- In a stream with the `INTERVAL` clause, `FILL(NULL_F)` and `FILL(VALUE_F)` do not take effect. It is not possible to forcibly replace missing data.
- In an `INTERP` clause, `FILL(NULL)` and `FILL(VALUE)` always forcibly replace missing data. It is not necessary to use `FILL(NULL_F)` and `FILL(VALUE_F)`.
:::
An example of the `FILL` clause is shown as follows:
```sql
SELECT tbname, _wstart, _wend, AVG(voltage)
FROM meters
WHERE ts >= "2022-01-01T00:00:00+00:00"
AND ts < "2022-01-01T00:05:00+00:00"
PARTITION BY tbname
INTERVAL(1m) FILL(prev)
SLIMIT 2;
```
This SQL statement queries the `meters` supertable for records whose timestamp is between January 1, 2022 at midnight UTC inclusive and 00:05 UTC exclusive. It partitions the results by table name in 1 minute windows, replaces any missing data with the previous non-null value, and calculates the average of the `voltage` metric for each window. The `SLIMIT` clause then limits the results to two partitions.
The results of this query are similar to the following:
```text
tbname | _wstart | _wend | avg(voltage) |
=================================================================================================================
d0 | 2022-01-01 08:00:00.000 | 2022-01-01 08:01:00.000 | 242.500000000000000 |
d0 | 2022-01-01 08:01:00.000 | 2022-01-01 08:02:00.000 | 243.666666666666657 |
d0 | 2022-01-01 08:02:00.000 | 2022-01-01 08:03:00.000 | 242.166666666666657 |
d0 | 2022-01-01 08:03:00.000 | 2022-01-01 08:04:00.000 | 240.500000000000000 |
d0 | 2022-01-01 08:04:00.000 | 2022-01-01 08:05:00.000 | 240.666666666666657 |
d13 | 2022-01-01 08:00:00.000 | 2022-01-01 08:01:00.000 | 242.500000000000000 |
d13 | 2022-01-01 08:01:00.000 | 2022-01-01 08:02:00.000 | 243.666666666666657 |
d13 | 2022-01-01 08:02:00.000 | 2022-01-01 08:03:00.000 | 242.166666666666657 |
d13 | 2022-01-01 08:03:00.000 | 2022-01-01 08:04:00.000 | 240.500000000000000 |
d13 | 2022-01-01 08:04:00.000 | 2022-01-01 08:05:00.000 | 240.666666666666657 |
```
:::tip
1. Ensure that you specify a time range when using a `FILL` clause. Otherwise, a large amount of data may be filled.
2. TDengine does not return more than 10 million rows with interpolated data in a single query.
:::
### State Windows
You use the `STATE_WINDOW` clause to create a state window, with the `CASE` expression defining a condition that triggers the start of a state and another condition that triggers its end. You can use integers or strings to represent this state. The state window opens when the start condition is met by a record and closes when the end condition is met by a subsequent record.
```sql
SELECT tbname, _wstart, _wend, _wduration, CASE WHEN voltage >= 205 AND voltage <= 235 THEN 1 ELSE 0 END status
FROM meters
WHERE ts >= "2022-01-01T00:00:00+00:00"
AND ts < "2022-01-01T00:05:00+00:00"
PARTITION BY tbname
STATE_WINDOW(
CASE WHEN voltage >= 205 AND voltage <= 235 THEN 1 ELSE 0 END
)
SLIMIT 10;
```
This SQL statement queries the `meters` supertable for records whose timestamp is between January 1, 2022 at midnight UTC inclusive and 00:05 UTC exclusive. It partitions the results by table name and sets the state value based on the voltage. Voltage between 205 and 235 inclusive returns a state value of 1 (normal), and voltage outside that range returns a state value of 0 (abnormal). The `SLIMIT` clause then limits the results to ten partitions.
The results of this query are similar to the following:
```text
tbname | _wstart | _wend | _wduration | status |
=====================================================================================================================================
d26 | 2022-01-01 08:00:00.000 | 2022-01-01 08:00:20.000 | 20000 | 0 |
d26 | 2022-01-01 08:00:30.000 | 2022-01-01 08:00:30.000 | 0 | 1 |
d26 | 2022-01-01 08:00:40.000 | 2022-01-01 08:01:40.000 | 60000 | 0 |
d26 | 2022-01-01 08:01:50.000 | 2022-01-01 08:01:50.000 | 0 | 1 |
d26 | 2022-01-01 08:02:00.000 | 2022-01-01 08:02:00.000 | 0 | 0 |
d26 | 2022-01-01 08:02:10.000 | 2022-01-01 08:02:10.000 | 0 | 1 |
d26 | 2022-01-01 08:02:20.000 | 2022-01-01 08:03:00.000 | 40000 | 0 |
d26 | 2022-01-01 08:03:10.000 | 2022-01-01 08:03:10.000 | 0 | 1 |
d26 | 2022-01-01 08:03:20.000 | 2022-01-01 08:03:20.000 | 0 | 0 |
d26 | 2022-01-01 08:03:30.000 | 2022-01-01 08:03:30.000 | 0 | 1 |
```
### Session Windows
You use the `SESSION` clause to create a session window. Sessions are based on the value of the primary timestamp column. The session is considered to be closed when a new record's timestamp exceeds a specified interval from the previous record.
```sql
SELECT tbname, _wstart, _wend, _wduration, COUNT(*)
FROM meters
WHERE ts >= "2022-01-01T00:00:00+00:00"
AND ts < "2022-01-01T00:10:00+00:00"
PARTITION BY tbname
SESSION(ts, 10m)
SLIMIT 10;
```
This SQL statement queries the `meters` supertable for records whose timestamp is between January 1, 2022 at midnight UTC inclusive and 00:10 UTC exclusive. It partitions the results by table name in 10-minute sessions. The `SLIMIT` clause then limits the results to ten partitions.
The results of this query are similar to the following:
```text
tbname | _wstart | _wend | _wduration | count(*) |
=====================================================================================================================================
d76 | 2021-12-31 16:00:00.000 | 2021-12-31 16:09:50.000 | 590000 | 60 |
d47 | 2021-12-31 16:00:00.000 | 2021-12-31 16:09:50.000 | 590000 | 60 |
d37 | 2021-12-31 16:00:00.000 | 2021-12-31 16:09:50.000 | 590000 | 60 |
d87 | 2021-12-31 16:00:00.000 | 2021-12-31 16:09:50.000 | 590000 | 60 |
d64 | 2021-12-31 16:00:00.000 | 2021-12-31 16:09:50.000 | 590000 | 60 |
d35 | 2021-12-31 16:00:00.000 | 2021-12-31 16:09:50.000 | 590000 | 60 |
d83 | 2021-12-31 16:00:00.000 | 2021-12-31 16:09:50.000 | 590000 | 60 |
d51 | 2021-12-31 16:00:00.000 | 2021-12-31 16:09:50.000 | 590000 | 60 |
d63 | 2021-12-31 16:00:00.000 | 2021-12-31 16:09:50.000 | 590000 | 60 |
d0 | 2021-12-31 16:00:00.000 | 2021-12-31 16:09:50.000 | 590000 | 60 |
Query OK, 10 row(s) in set (0.043489s)
```
In the following figure, a 12-second session window has been configured. The first three records are in the first session, and the second three records are in the second session.
<figure>
<Image img={sessionWindow} alt="Session window example"/>
<figcaption>Figure 3. Session window example</figcaption>
</figure>
The difference between the timestamps of the first and second records is 10 seconds, less than the 12-second window defined in this example. Therefore the session remains open. However, the difference between the timestamps of the third and fourth records is 40 seconds. Therefore the session is closed after the third record, and a new session is opened when the fourth record is ingested.
### Event Windows
You use the `EVENT_WINDOW` clause to create an event window, with the `START WITH` expression defining a condition that opens the window and the `END WITH` expression defining a condition that closes the window. Both conditions can be any expression supported by TDengine, and they can involve different columns.
```sql
SELECT tbname, _wstart, _wend, _wduration, COUNT(*)
FROM meters
WHERE ts >= "2022-01-01T00:00:00+00:00"
AND ts < "2022-01-01T00:10:00+00:00"
PARTITION BY tbname
EVENT_WINDOW START WITH voltage >= 10 END WITH voltage < 20
LIMIT 10;
```
This SQL statement queries the `meters` supertable for records whose timestamp is between January 1, 2022 at midnight UTC inclusive and 00:10 UTC exclusive. It partitions the results by table name and creates event windows based on the voltage. A window opens when voltage is greater than or equal to 10 and closes when voltage is less than 20. The `LIMIT` clause then limits the output to ten rows.
The results of this query are similar to the following:
```text
tbname | _wstart | _wend | _wduration | count(*) |
=====================================================================================================================================
d0 | 2021-12-31 16:00:00.000 | 2021-12-31 16:00:00.000 | 0 | 1 |
d0 | 2021-12-31 16:00:30.000 | 2021-12-31 16:00:30.000 | 0 | 1 |
d0 | 2021-12-31 16:00:40.000 | 2021-12-31 16:00:40.000 | 0 | 1 |
d0 | 2021-12-31 16:01:20.000 | 2021-12-31 16:01:20.000 | 0 | 1 |
d0 | 2021-12-31 16:02:20.000 | 2021-12-31 16:02:20.000 | 0 | 1 |
d0 | 2021-12-31 16:02:30.000 | 2021-12-31 16:02:30.000 | 0 | 1 |
d0 | 2021-12-31 16:03:10.000 | 2021-12-31 16:03:10.000 | 0 | 1 |
d0 | 2021-12-31 16:03:30.000 | 2021-12-31 16:03:30.000 | 0 | 1 |
d0 | 2021-12-31 16:03:40.000 | 2021-12-31 16:03:40.000 | 0 | 1 |
d0 | 2021-12-31 16:03:50.000 | 2021-12-31 16:03:50.000 | 0 | 1 |
Query OK, 10 row(s) in set (0.034127s)
```
The following figure describes how records trigger event windows.
<figure>
<Image img={eventWindow} alt="Event window example"/>
<figcaption>Figure 4. Event window example</figcaption>
</figure>
:::note
- An event window can consist of a single record. If a record satisfies the start and end conditions for the window while no other window is open, the record forms a window.
- Results are generated only when the event window closes. If the conditions that close the window are never met, the window is not created.
- If an event window query is performed on a supertable, all data from the supertable is consolidated into a single timeline. Event windows are opened and closed based on whether the consolidated data meets the specified conditions for the window.
- If an event window query is performed on the results of a subquery, the subquery results must be output in a valid timestamp order and contain valid timestamp columns.
:::
### Count Windows
You use the `COUNT_WINDOW` clause to create a count window. This window is defined as a fixed number of records. Data records are sorted by timestamp and divided into windows. If the total number of rows is not divisible by the specified number of records, the last window will have fewer records.
```sql
SELECT _wstart, _wend, COUNT(*)
FROM meters
WHERE ts >= "2022-01-01T00:00:00+00:00" AND ts < "2022-01-01T00:30:00+00:00"
COUNT_WINDOW(10);
```
This SQL statement queries the `meters` supertable for records whose timestamp is between January 1, 2022 at midnight UTC inclusive and 00:30 UTC exclusive. The data is then grouped into windows of 10 records each.
The results of this query are similar to the following:
```text
_wstart | _wend | count(*) |
============================================================================
2022-01-01 08:00:00.000 | 2022-01-01 08:00:00.000 | 10 |
2022-01-01 08:00:00.000 | 2022-01-01 08:00:00.000 | 10 |
2022-01-01 08:00:00.000 | 2022-01-01 08:00:00.000 | 10 |
2022-01-01 08:00:00.000 | 2022-01-01 08:00:00.000 | 10 |
2022-01-01 08:00:00.000 | 2022-01-01 08:00:00.000 | 10 |
2022-01-01 08:00:00.000 | 2022-01-01 08:00:00.000 | 10 |
2022-01-01 08:00:00.000 | 2022-01-01 08:00:00.000 | 10 |
2022-01-01 08:00:00.000 | 2022-01-01 08:00:00.000 | 10 |
2022-01-01 08:00:00.000 | 2022-01-01 08:00:00.000 | 10 |
2022-01-01 08:00:00.000 | 2022-01-01 08:00:00.000 | 10 |
```
You can create a sliding count window by specifying two arguments in the `COUNT_WINDOW` clause. For example, `COUNT_WINDOW(10, 2)` creates count windows of 10 records in which each window slides forward by 2 records, so consecutive windows overlap.
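The following statement is a sketch adapted from the query above; the two-argument `COUNT_WINDOW` form is the only change, and the added `AVG(current)` column is illustrative:
```sql
SELECT _wstart, _wend, COUNT(*), AVG(current)
FROM meters
WHERE ts >= "2022-01-01T00:00:00+00:00" AND ts < "2022-01-01T00:30:00+00:00"
COUNT_WINDOW(10, 2);
```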
## Nested Queries
A nested query is a query whose results depend on the results of another query, known as an inner query. The inner query does not depend on the parameters of the outer query.
```sql
SELECT MAX(voltage), *
FROM (
SELECT tbname, LAST_ROW(ts), voltage, current, phase, groupid, location
FROM meters
PARTITION BY tbname
)
GROUP BY groupid;
```
In this SQL statement, the inner query retrieves the last row of each subtable in the `meters` supertable and partitions the results by table name. The outer query then groups the results by the `groupid` tag and returns the maximum voltage for each group.
:::note
- The inner query results are treated as a virtual table for the outer query. You can assign an alias to this virtual table for easier referencing, as shown in the sketch after this note.
- The outer query can reference columns and pseudocolumns from the inner query using column names or expressions.
- Both the inner and outer queries support table joins. The results of the inner query can also participate in table joins.
- The inner query supports the same features as non-nested queries. However, the `ORDER BY` clause typically does not take effect in the inner query.
- The following conditions apply to outer queries:
- If the inner query does not return a timestamp, the outer query cannot include functions that implicitly rely on the timestamp, such as `INTERP`, `DERIVATIVE`, `IRATE`, `LAST_ROW`, `FIRST`, `LAST`, `TWA`, `STATEDURATION`, `TAIL`, and `UNIQUE`.
- If the inner query result is not ordered by timestamp, the outer query cannot include functions that require ordered data, such as `LEASTSQUARES`, `ELAPSED`, `INTERP`, `DERIVATIVE`, `IRATE`, `TWA`, `DIFF`, `STATECOUNT`, `STATEDURATION`, `CSUM`, `MAVG`, `TAIL`, and `UNIQUE`.
- The outer query cannot include functions that require two-pass scanning, such as `PERCENTILE`.
:::
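For example, the following sketch assigns the alias `t` to the inner query and filters on one of its columns in the outer query; the alias and the `last_voltage` column name are illustrative:
```sql
SELECT t.tbname, t.last_voltage
FROM (
    SELECT tbname, LAST(voltage) AS last_voltage
    FROM meters
    PARTITION BY tbname
) t
WHERE t.last_voltage > 220;
```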
## UNION Clause
You use the `UNION` clause to combine the results of multiple queries whose results have the same structure. The column names, types, counts, and order must be identical among queries combined with a `UNION` clause.
```sql
(SELECT tbname, * FROM d1 LIMIT 1)
UNION ALL
(SELECT tbname, * FROM d11 LIMIT 2)
UNION ALL
(SELECT tbname, * FROM d21 LIMIT 3);
```
In this SQL statement, one record is queried from subtable `d1`, two from `d11`, and three from `d21`. These records are combined into a single result set.
The results of this query are similar to the following:
```text
tbname | ts | current | voltage | phase |
=======================================================================================================================
d21 | 2020-09-13 20:26:40.000 | 11.7680807 | 255 | 146.0000000 |
d21 | 2020-09-13 20:26:50.000 | 14.2392311 | 244 | 148.0000000 |
d21 | 2020-09-13 20:27:00.000 | 10.3999424 | 239 | 149.5000000 |
d11 | 2020-09-13 20:26:40.000 | 11.7680807 | 255 | 146.0000000 |
d11 | 2020-09-13 20:26:50.000 | 14.2392311 | 244 | 148.0000000 |
d1 | 2020-09-13 20:26:40.000 | 11.7680807 | 255 | 146.0000000 |
```
:::note
A SQL statement can contain a maximum of 100 `UNION` clauses.
:::
## Join Queries
### Join Concepts
1. Driving Table
In a join query, the role of the driving table depends on the type of join used: in Left Join queries, the left table is the driving table, while in Right Join queries, the right table is the driving table.
2. Join Condition
In TDengine, the join condition is the specified criteria used to perform the join. For all join queries (except As Of Join and Window Join), you must specify a join condition, typically following the `ON` clause. In As Of Join, conditions in the `WHERE` clause can also be considered join conditions, while Window Join uses the `window_offset` as the join condition.
For all joins except As Of Join, you must explicitly define a join condition. As Of Join defines an implicit condition by default, so if the default condition meets your needs, you can omit an explicit join condition.
In joins other than As Of Join and Window Join, you can include multiple join conditions beyond the primary one. These secondary conditions must have an `AND` relationship with the primary condition, but they don't need to have an `AND` relationship with each other. They can include primary key columns, tag columns, normal columns, constants, scalar functions, or any combination of logical expressions.
The following SQL queries for smart meters contain valid join conditions:
```sql
SELECT a.* FROM meters a LEFT JOIN meters b ON a.ts = b.ts AND a.ts > '2023-10-18 10:00:00.000';
SELECT a.* FROM meters a LEFT JOIN meters b ON a.ts = b.ts AND (a.ts > '2023-10-18 10:00:00.000' OR a.ts < '2023-10-17 10:00:00.000');
SELECT a.* FROM meters a LEFT JOIN meters b ON TIMETRUNCATE(a.ts, 1s) = TIMETRUNCATE(b.ts, 1s) AND (a.ts + 1s > '2023-10-18 10:00:00.000' OR a.groupId > 0);
SELECT a.* FROM meters a LEFT ASOF JOIN meters b ON TIMETRUNCATE(a.ts, 1s) < TIMETRUNCATE(b.ts, 1s) AND a.groupId = b.groupId;
```
3. Primary Join Condition
As a time-series database, TDengine join queries revolve around the primary key column. Therefore, for all join queries except As Of Join and Window Join, the join condition must include the primary key column as an equality condition. The first appearance of the primary key column in an equality join condition is considered the primary join condition. As Of Join allows non-equality conditions in the primary join condition, while Window Join specifies the primary condition via the `window_offset`.
Except for Window Join, TDengine supports the usage of the `TIMETRUNCATE` function on the primary join condition, such as `ON TIMETRUNCATE(a.ts, 1s) = TIMETRUNCATE(b.ts, 1s)`. Beyond that, other functions and scalar operations are not supported for the primary join condition.
4. Grouping Conditions
As Of Join and Window Join support grouping input data before applying the join. Each group is then joined separately, and the output does not include the group information. In As Of Join and Window Join, any equality conditions that appear after the `ON` clause (except for the As Of Join primary join condition) are considered grouping conditions.
5. Primary Key Timeline
In TDengine, every table must have a primary key timestamp column, which is the primary key timeline used for time-related calculations. The `subquery` or `join` result must also clearly identify which column is the primary key timeline for further calculations. In a `subquery`, the first ordered primary key column (or a related pseudo-column, such as `_wstart`, `_wend`) is considered the primary key timeline of the output table. For join results, the primary key timeline follows these rules:
- In Left/Right Join, the primary key column from the driving table (subquery) becomes the primary key timeline for subsequent queries. In Window Join, both left and right tables are ordered, so the primary key timeline can be selected from either table, with priority given to the table's own primary key column.
- In Inner Join, either table's primary key column can be selected as the primary key timeline. If there is a grouping condition (an equality condition on a tag column that is `AND`-related to the primary join condition), the primary key timeline cannot be generated.
- Full Join does not generate a valid primary key timeline, meaning time-dependent operations cannot be performed.
### Syntax Explanation
In the following sections, we explain the Left Join and Right Join families using a unified approach. In the explanations for the Outer, Semi, Anti-Semi, As Of, and Window join types, we use "Left/Right" to cover both Left Join and Right Join. In these descriptions:
- "Left/Right" table refers to the left table for Left Join, and the right table for Right Join.
- "Right/Left" table refers to the right table for Left Join, and the left table for Right Join.
### Join Functions
The table below lists the different types of joins supported in TDengine, along with their definitions.
| Join Type | Definition |
| :-----------------------: | :----------------------------------------------------------: |
| Inner Join | An inner join returns only the data where both the left and right tables match the join condition, essentially returning the intersection of the two tables that meet the condition. |
| Left/Right Outer Join | A left/right outer join includes both the rows where the left and right tables match the join condition and the rows from the left/right table that do not match it. |
| Left/Right Semi Join | A left/right semi-join typically expresses the meaning of `IN` or `EXISTS`. Data is returned from the left/right table only when at least one match is found in the right/left table based on the join condition. |
| Left/Right Anti-Semi Join | A left/right anti-semi join is the inverse of a semi-join. It typically expresses the meaning of `NOT IN` or `NOT EXISTS`. Data is returned from the left/right table only when no match is found in the right/left table. |
| Left/Right As Of Join | A left/right approximate join, where the join does not require an exact match. As Of Join matches based on the nearest timestamp within the specified criteria. |
| Left/Right Window Join | A left/right window join matches data based on a sliding window. Each row in the left/right table constructs a window based on its timestamp and a specified window size, and matches with the right/left table. |
| Full Outer Join | A full outer join includes data from both tables, whether or not a match exists, returning the union of both tables' datasets. |
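As an illustrative sketch (not taken verbatim from the reference manual), the following Left As Of Join matches each row of `a` with the nearest row of `b` whose timestamp is not later than it, while the equality condition on the `groupId` tag acts as a grouping condition:
```sql
SELECT a.ts, a.current, b.voltage
FROM meters a
LEFT ASOF JOIN meters b
    ON a.ts >= b.ts AND a.groupId = b.groupId;
```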
### Constraints and Limitations
1. Input Timeline Limitation
In TDengine, all join operations require the input data to contain a valid primary key timeline. For table queries, this requirement is typically met. However, for subqueries, the output data must include a valid primary key timeline.
2. Join Condition Limitation
The join condition limitations include the following:
- Except for As Of Join and Window Join, all other join operations must have a primary key column as the primary join condition.
- The primary join condition must have an `AND` relationship with other join conditions.
- The primary key column in the primary join condition supports only the `TIMETRUNCATE` function; no other functions or scalar operations are allowed. Secondary join conditions have no such restrictions.
3. Grouping Condition Limitation
Grouping condition limitations include:
- Grouping conditions are restricted to tag columns or normal columns, excluding the primary key column.
- Only equality conditions are supported for grouping.
- Multiple grouping conditions are allowed, and they must have an `AND` relationship.
4. Query Result Order Limitation
The limitations on query result ordering include:
- For basic table queries and subqueries without grouping or ordering, results are ordered by the primary key column of the driving table.
- For supertable queries, Full Joins, or queries with grouping conditions but no ordering, results do not have a fixed order. If ordering is required, use an `ORDER BY` clause. If a function depends on an ordered timeline, the lack of a valid ordered timeline prevents it from working properly.

14
docs/en/05-basic/index.md Normal file
View File

@ -0,0 +1,14 @@
---
title: Basic Features
description: 'TDengine Basic Features'
slug: /basic-features
---
This section describes the basic features of TDengine, including how to manage databases and tables, how to ingest data, and how to query data stored in TDengine.
```mdx-code-block
import DocCardList from '@theme/DocCardList';
import {useCurrentSidebarCategory} from '@docusaurus/theme-common';
<DocCardList items={useCurrentSidebarCategory().items}/>
```

View File

@ -0,0 +1,42 @@
---
title: Data Subscription
slug: /advanced-features/data-subscription
---
TDengine provides Kafka-like publish/subscribe data subscription as a built-in component. You create topics in TDengine using SQL statements, and your applications can subscribe to your topics as consumers.
TDengine's message queue provides an ACK (Acknowledgment) mechanism to help ensure at-least-once consumption even in complex environments involving crashes and restarts.
To achieve the above functionality, TDengine automatically creates indexes for the Write-Ahead Logging (WAL) files to support fast random access and provides a flexible and configurable file switching and retention mechanism. Users can specify the retention time and size of the WAL files according to their needs. Through these methods, the WAL is transformed into a persistent storage engine that preserves the order of event arrival. For queries created in the form of topics, TDengine will read data from the WAL. During consumption, TDengine reads data directly from the WAL based on the current consumption progress and uses a unified query engine to perform filtering, transformation, and other operations before pushing the data to consumers.
## Topics
A topic can be a query, a supertable, or a database. You can filter by tag, table name, column, or expression and perform scalar operations. Note that data aggregation and time windows are not supported. The data granularity is determined by the SQL statement that defines the topic, and data filtering and preprocessing are automatically handled by TDengine.
For more information about topics, see [Create a Topic](../../tdengine-reference/sql-manual/manage-topics-and-consumer-groups/#create-a-topic).
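For example, the following sketch creates one topic from a filtered query on the `meters` supertable and another from an entire database; the topic names are illustrative:
```sql
-- Topic based on a filtered query
CREATE TOPIC IF NOT EXISTS topic_high_voltage AS
    SELECT ts, current, voltage FROM power.meters WHERE voltage > 220;

-- Topic based on an entire database
CREATE TOPIC IF NOT EXISTS topic_power_db AS DATABASE power;
```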
## Consumers and Consumer Groups
Consumers that subscribe to a topic receive the latest data in real time. A single consumer can subscribe to multiple topics. If the topic corresponds to a supertable or database, the data may be distributed across multiple different nodes or data shards.
You can also create consumer groups to enable multithreaded, distributed data consumption. The consumers in a consumer group share consumption progress, while consumers in different consumer groups do not share consumption progress even if they consume the same topic.
Consumers and consumer groups are created in your applications, not in TDengine. For more information, see [Manage Consumers](../../developer-guide/manage-consumers/). To delete consumer groups from TDengine or view existing consumer groups, see [Manage Consumer Groups](../../tdengine-reference/sql-manual/manage-topics-and-consumer-groups/#manage-consumer-groups).
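As a sketch, consumer groups can be inspected and removed with SQL statements such as the following; see the reference linked above for exact options:
```sql
-- List active consumers and their subscriptions
SHOW CONSUMERS;
SHOW SUBSCRIPTIONS;

-- Remove a consumer group from a topic (names are illustrative)
DROP CONSUMER GROUP IF EXISTS my_group ON topic_high_voltage;
```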
## Replay
You can replay data streams in the order of their actual write times. To replay a data stream, specify a time range in the query statement to control the start and end times for the replay.
For example, assume the following three records have been written to the database. During replay, the first record is returned immediately, the second record is returned 5 seconds later, and the third record is returned 3 seconds after the second record.
```text
2023/09/22 00:00:00.000
2023/09/22 00:00:05.000
2023/09/22 00:00:08.000
```
The following conditions apply to replay:
- Replay supports query topics only. You cannot use replay with supertable or database topics.
- Replay does not support progress saving.
- Replay precision may be delayed by several dozen milliseconds due to data processing.

View File

@ -0,0 +1,123 @@
---
title: Caching
slug: /advanced-features/caching
---
TDengine includes caching as a built-in component. This includes write caching, read caching, metadata caching, and file system caching.
## Write Cache
TDengine uses a time-driven cache management strategy that prioritizes caching the most recently ingested data. When the size of the data stored in cache reaches a preset threshold, the earliest data in cache is written to disk.
You can optimize database performance for your use case by specifying the number of vgroups in the database and the size of the write cache allocated to each vnode.
For example, the following SQL statement creates a database with 10 vgroups, with each vnode having a 256 MB write cache.
```sql
CREATE DATABASE power VGROUPS 10 BUFFER 256 CACHEMODEL 'none' PAGES 128 PAGESIZE 16;
```
Generally, a larger cache size results in improved performance. However, beyond a certain point, further increasing the cache size has no significant effect on performance.
## Read Cache
You can configure TDengine databases to cache the most recent data of each subtable, allowing for faster queries. To do so, you specify a cache model for your database by setting the `CACHEMODEL` parameter to one of the following values:
- `none`: The read cache is disabled.
- `last_row`: The most recent row of data from each subtable is cached. The `LAST_ROW()` function will then retrieve this data from cache.
- `last_value`: The most recent non-null value for each column of each subtable is cached. The `LAST()` function will then retrieve this data from cache.
- `both`: The most recent row of each subtable and the most recent non-null value of each column of each subtable are cached. This simultaneously activates the behavior of both the `last_row` and `last_value` cache models.
You can also configure the memory size for each vnode by specifying a value for the `CACHESIZE` parameter. This parameter can be set from 1 MB to 65536 MB. The default value is 1 MB.
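For example, the following statements are a sketch based on the `power` database used elsewhere in this documentation; they enable the `both` cache model and allocate 16 MB of read cache per vnode:
```sql
-- At database creation time
CREATE DATABASE power CACHEMODEL 'both' CACHESIZE 16;

-- Or on an existing database
ALTER DATABASE power CACHEMODEL 'both' CACHESIZE 16;
```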
## Metadata Cache
Each vnode caches metadata that it has previously accessed. The size of this metadata cache is determined by the `PAGES` and `PAGESIZE` parameters of the database. For example, the following SQL statement creates a database whose vnodes have a metadata cache of 128 pages with each page being 16 KB:
```sql
CREATE DATABASE power PAGES 128 PAGESIZE 16;
```
## File System Cache
For reliability purposes, TDengine records changes in a write-ahead log (WAL) file before any data is written to the data storage layer. The `fsync` function is then called to write the data from the WAL to disk. You can control when the `fsync` function is called for a database by specifying the `WAL_LEVEL` and `WAL_FSYNC_PERIOD` parameters.
- `WAL_LEVEL`:
- Specify `1` to wait for the operating system to call `fsync`. In this configuration, TDengine does not call `fsync` itself.
- Specify `2` for TDengine to call `fsync` at a certain interval, specified by the WAL_FSYNC_PERIOD parameter.
The default value is `1`.
- `WAL_FSYNC_PERIOD`:
- Specify `0` to call `fsync` every time data is written to the WAL.
- Specify a value between `1` and `180000` milliseconds to call `fsync` each time this interval has elapsed.
Note that this parameter takes effect only when `WAL_LEVEL` is set to `2`.
The following SQL statement creates a database in which data in the WAL is written to disk every 3000 milliseconds:
```sql
CREATE DATABASE power WAL_LEVEL 2 WAL_FSYNC_PERIOD 3000;
```
The default configuration of `WAL_LEVEL 1` delivers the highest performance. In use cases where data reliability is a higher priority than performance, you can set `WAL_LEVEL` to `2`.
## Example: Enhancing Query Performance with Read Caching
This example demonstrates the performance improvements delivered by read caching. The sample data from [Data Querying](../../basic-features/data-querying/) is used in this section. This data is generated by the following command:
```shell
taosBenchmark -d power -Q --start-timestamp=1600000000000 --tables=10000 --records=10000 --time-step=10000 -y
```
Note that read caching is disabled by default on the sample database generated by taosBenchmark.
1. To establish a performance baseline, run the following SQL statements:
```text
taos> SELECT LAST(ts, current) FROM meters;
last(ts) | last(current) |
=================================================
2020-09-15 00:13:10.000 | 1.1294620 |
Query OK, 1 row(s) in set (0.353815s)
taos> SELECT LAST_ROW(ts, current) FROM meters;
last_row(ts) | last_row(current) |
=================================================
2020-09-15 00:13:10.000 | 1.1294620 |
Query OK, 1 row(s) in set (0.344070s)
```
These return the most recent non-null value and the most recent row from any subtable in the `meters` supertable. It can be seen that these queries return in 353 and 344 milliseconds, respectively.
2. Enable read caching on the database:
```text
taos> ALTER DATABASE power CACHEMODEL 'both';
Query OK, 0 row(s) affected (0.046092s)
taos> SHOW CREATE DATABASE power\G;
*************************** 1.row ***************************
Database: power
Create Database: CREATE DATABASE `power` BUFFER 256 CACHESIZE 1 CACHEMODEL 'both' COMP 2 DURATION 14400m WAL_FSYNC_PERIOD 3000 MAXROWS 4096 MINROWS 100 STT_TRIGGER 2 KEEP 5256000m,5256000m,5256000m PAGES 256 PAGESIZE 4 PRECISION 'ms' REPLICA 1 WAL_LEVEL 1 VGROUPS 10 SINGLE_STABLE 0 TABLE_PREFIX 0 TABLE_SUFFIX 0 TSDB_PAGESIZE 4 WAL_RETENTION_PERIOD 3600 WAL_RETENTION_SIZE 0 KEEP_TIME_OFFSET 0
Query OK, 1 row(s) in set (0.000282s)
```
3. Run the two queries from Step 1 again:
```text
taos> SELECT LAST(ts, current) FROM meters;
last(ts) | last(current) |
=================================================
2020-09-15 00:13:10.000 | 1.1294620 |
Query OK, 1 row(s) in set (0.044021s)
taos> SELECT LAST_ROW(ts, current) FROM meters;
last_row(ts) | last_row(current) |
=================================================
2020-09-15 00:13:10.000 | 1.1294620 |
Query OK, 1 row(s) in set (0.046682s)
```
It can be seen that these queries now return in 44 and 47 milliseconds, respectively. This indicates that read caching on this system produces an approximately 8-fold improvement in query performance.

View File

@ -0,0 +1,157 @@
---
title: Stream Processing
slug: /advanced-features/stream-processing
---
import Image from '@theme/IdealImage';
import watermarkImg from '../assets/stream-processing-01.png';
TDengine includes stream processing as a built-in component. You define real-time stream transformations by using SQL statements. Data written to the source table of the stream is then automatically processed in the specified manner and written to the target supertable based on the specified trigger mode. This provides a lightweight alternative to complex stream processing systems while delivering results in milliseconds even under high-throughput conditions.
Streams can include data filtering, scalar functions (including UDFs), and windowing. The source table of a stream can be a supertable, subtable, or basic table, but the target must be a supertable. You can use the `PARTITION BY` clause to partition data by table name or tag, and each partition is written to a different subtable in the target supertable.
Streams can aggregate data from supertables distributed across multiple nodes and can handle out-of-order data ingestion. You can specify a tolerance for out-of-order data by using a watermark and decide whether to discard or recompute such data with the `IGNORE EXPIRED` option.
## Managing Streams
For information about creating and managing streams, see [Manage Streams](../../tdengine-reference/sql-manual/manage-streams/).
## Partitioning in Streams
You can use the `PARTITION BY` clause with the `tbname` pseudocolumn, tag columns, regular columns, or expressions to perform partitioned computations in a stream. Each partition has its own independent timeline and time window, and data is aggregated separately and written to different subtables in the target supertable. In a stream without a `PARTITION BY` clause, all data is written to the same subtable.
A group ID is automatically generated for each partition. By default, the subtables created by a stream are named with this group ID. You can use the `SUBTABLE` clause to generate custom names for the subtable for each partition. For example:
```sql
CREATE STREAM avg_vol_s INTO avg_vol SUBTABLE(CONCAT('new-', tname)) AS SELECT _wstart, count(*), avg(voltage) FROM meters PARTITION BY tbname tname INTERVAL(1m);
```
This statement creates subtables using the naming convention `new-<subtable-name>_<supertable-name>_<group-id>`.
:::info[Version Info]
Prior to TDengine 3.2.3.0, the supertable name and group ID were not appended to the name defined in the `SUBTABLE` clause. Therefore the naming convention in this example would be `new-<subtable-name>` in earlier versions.
:::
:::note
- `tname` is an alias of `tbname` for use in expressions within the `SUBTABLE` clause.
- Subtable names that exceed the table name limit of 192 bytes are truncated.
- If the generated subtable name is not unique within the database, it will fail to be created and data will not be written to it.
:::
## Handling Historical Data
By default, a stream processes only data ingested after the stream is created. If you want a stream to process pre-existing data, you can specify the `FILL_HISTORY 1` parameter. This parameter enables streams to process data ingested at any time before, during, or after the creation of the stream.
For example, the following SQL statement creates a stream that counts the number of records generated by all smart meters every 10 seconds, including all historical data:
```sql
CREATE STREAM IF NOT EXISTS count_history_s FILL_HISTORY 1 INTO count_history AS SELECT COUNT(*) FROM power.meters INTERVAL(10s);
```
You can also specify a time range. For example, the following SQL statement processes records after January 30, 2020:
```sql
CREATE STREAM IF NOT EXISTS count_history_s FILL_HISTORY 1 INTO count_history AS SELECT COUNT(*) FROM power.meters WHERE ts > '2020-01-30' INTERVAL(10s);
```
The following statement processes records between January 30, 2020 and January 1, 2023. Note that you can specify an end time in the future.
```sql
CREATE STREAM IF NOT EXISTS count_history_s FILL_HISTORY 1 INTO count_history AS SELECT COUNT(*) FROM power.meters WHERE ts > '2020-01-30' AND ts < '2023-01-01' INTERVAL(10s);
```
:::note
A stream can process a maximum of 20 million records. Exceeding this limit will cause an error.
:::
## Trigger Modes
You use the `TRIGGER` directive to specify when stream processing occurs for windowed computations:
1. `AT_ONCE`: Triggered immediately upon ingestion.
2. `WINDOW_CLOSE`: Triggered when the window closes, with optional watermark.
3. `MAX_DELAY time`: Triggered when the specified time elapses or the window closes, whichever is earlier.
The default value is `WINDOW_CLOSE`.
Note that non-windowed computations are processed in real time.
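For example, the following sketch creates a stream whose results are pushed at most 5 seconds after data arrives, even if the window has not yet closed; the stream and target supertable names are illustrative:
```sql
CREATE STREAM IF NOT EXISTS avg_current_s
TRIGGER MAX_DELAY 5s
INTO avg_current AS
    SELECT _wstart, AVG(current) FROM power.meters INTERVAL(1m);
```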
## Watermark
The time at which a window closes is determined by the event time, which is the primary key (timestamp) of the record ingested. This prevents problems caused by discrepancies between client and server times and addresses challenges such as out-of-order data ingestion.
You can specify a watermark to define the upper threshold of out-of-order data in your stream. The default value is 0, indicating that out-of-order data is not processed.
When data is ingested, the window closure time <math><mi>T</mi></math> is calculated as <math><mrow><mi>T</mi><mo>=</mo><mi>latest event time</mi><mo>-</mo><mi>watermark</mi></mrow></math>. All windows whose end time is earlier than <math><mi>T</mi></math> are then closed. This process is described in the following figure.
<figure>
<Image img={watermarkImg} alt="Window closure in stream processing"/>
<figcaption>Figure 1. Window closure diagram</figcaption>
</figure>
In the diagram, the vertical axis represents time, while the dots on the horizontal axis represent the received data points.
1. At time <math><msub><mi>T</mi><mn>1</mn></msub></math>, the 7th data point arrives. The calculated time falls within the second window, so the second window does not close.
2. At time <math><msub><mi>T</mi><mn>2</mn></msub></math>, the 6th and 8th data points are delayed. Since the latest event has not changed, <math><mi>T</mi></math> also remains unchanged, and the out-of-order data in the second window is processed.
3. At time <math><msub><mi>T</mi><mn>3</mn></msub></math>, the 10th data point arrives, and <math><mi>T</mi></math> moves past the closure time of the second window, which is then closed, allowing the out-of-order data to be correctly processed.
:::note
For streams whose trigger mode is `WINDOW_CLOSE` or `MAX_DELAY`, window closure triggers computation. However, streams in `AT_ONCE` mode compute results immediately upon data ingestion regardless of window closure.
:::
## Handling Expired Data
Data that is ingested into a closed window is considered to be expired. You can specify the `IGNORE EXPIRED` parameter to determine how to handle expired data:
1. `IGNORE EXPIRED 0`: Recalculate the latest results taking expired data into account.
2. `IGNORE EXPIRED 1`: Ignore expired data.
The default value is `IGNORE EXPIRED 1`.
:::note
Ensure that an appropriate watermark has been set regardless of how you choose to handle expired data.
:::
## Handling Updated Data
You can specify the `IGNORE UPDATE` parameter to determine how to handle data that is updated after ingestion:
1. `IGNORE UPDATE 0`: Check for updates and recompute results accordingly.
2. `IGNORE UPDATE 1`: Do not check for updates.
The default value is `IGNORE UPDATE 0`.
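The following sketch combines the watermark described earlier with explicit handling of expired and updated data; the stream and target supertable names are illustrative:
```sql
CREATE STREAM IF NOT EXISTS max_voltage_s
TRIGGER WINDOW_CLOSE
WATERMARK 10s
IGNORE EXPIRED 1
IGNORE UPDATE 0
INTO max_voltage AS
    SELECT _wstart, MAX(voltage) FROM power.meters INTERVAL(30s);
```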
## Writing to an Existing Supertable
Generally, the results of stream processing are stored in new supertables. If it is necessary to write results to an existing supertable, ensure that the columns in the supertable correspond exactly to the results of the subquery in your stream.
When writing to an existing supertable, note the following:
1. If the data types of the columns in the subquery results do not match those of the target supertable, the system will automatically convert them to the types specified in the supertable. If the length of the resultant data exceeds 4096 bytes, an error will occur.
2. If the number and position of the columns in the subquery results do not match those of the target supertable, you must explicitly specify the relationships between columns, as shown in the sketch after this list.
3. Multiple streams cannot write to the same target supertable.
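For example, assuming an existing supertable `meters_summary` with columns `(ts TIMESTAMP, meter_count BIGINT)`, the following sketch maps the subquery results to those columns explicitly; all names are illustrative, and tag handling is omitted here (see the next section for customizing tag values):
```sql
CREATE STREAM IF NOT EXISTS meters_count_s
INTO meters_summary (ts, meter_count) AS
    SELECT _wstart, COUNT(*) FROM power.meters INTERVAL(1m);
```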
## Customizing Tag Values for Target Tables
You can specify custom tag values for the subtable corresponding to each partition. The syntax is described as follows:
```sql
CREATE STREAM output_tag TRIGGER AT_ONCE INTO output_tag_s TAGS(alias_tag varchar(100)) AS SELECT _wstart, COUNT(*) FROM power.meters PARTITION BY CONCAT("tag-", tbname) AS alias_tag INTERVAL(10s);
```
In the `PARTITION BY` clause, an alias `alias_tag` is defined for `CONCAT("tag-", tbname)`, corresponding to the custom tag name of the supertable `output_tag_s`. In this example, the tag of the newly created subtables for the stream will have the prefix `tag-` concatenated with the original table name as the tag value.
When defining custom tag values, note the following:
1. If the data types of the defined tags do not match those of the target supertable, the system will automatically convert them to the types specified in the supertable. If the length of the resultant data exceeds 4096 bytes, an error will occur.
2. If the number and position of the defined tags do not match those of the target supertable, you must explicitly specify the relationships between the defined tags and the tag columns in the target supertable.

View File

@ -0,0 +1,56 @@
---
title: EdgeCloud Orchestration
slug: /advanced-features/edge-cloud-orchestration
---
import Image from '@theme/IdealImage';
import edgeCloud from '../assets/edge-cloud-orchestration-01.png';
## Overview of Edge-Cloud Orchestration
In the context of the Industrial Internet, edge devices are primarily used to process local data, and decision-makers cannot form a global understanding of the entire system based solely on the information collected from edge devices. In practice, edge devices need to report data to a cloud computing platform (either public or private), where data aggregation and information fusion occur, allowing decision-makers to gain a comprehensive insight into the data. This architecture of edge-cloud orchestration has gradually become an essential pillar supporting the development of the Industrial Internet.
Edge devices mainly monitor and alert on specific data points from the production line, such as real-time production data from a workshop, and then synchronize this production data to a cloud-based big data platform. The requirement for real-time processing is high on the edge, but the volume of data may not be large; typically, a production workshop may have a few thousand to tens of thousands of monitoring points. In contrast, the central side often has sufficient computing resources to aggregate edge data for analysis.
To achieve this operation, the database or data storage layer must ensure that data can be reported hierarchically and selectively. In some scenarios, the overall data volume is very large, necessitating selective reporting. For example, raw records collected once every second on the edge may be downsampled to once every minute when reported to the central side. This downsampling significantly reduces the data volume while still retaining key information for long-term analysis and forecasting.
In the traditional industrial data collection process, data is collected from Programmable Logic Controllers (PLC) and then enters a historian (an industrial real-time database), which supports business applications. Such systems typically adopt a master-slave architecture that is difficult to scale horizontally and heavily relies on the Windows ecosystem, resulting in a relatively closed environment.
## TDengine's Solution
TDengine Enterprise is committed to providing powerful edge-cloud orchestration capabilities, featuring the following significant characteristics:
- **Efficient Data Synchronization**: Supports synchronization efficiency of millions of data points per second, ensuring rapid and stable data transmission between the edge and the cloud.
- **Multi-Data Source Integration**: Compatible with various external data sources, such as AVEVA PI System, OPC-UA, OPC-DA, and MQTT, achieving broad data access and integration.
- **Flexible Configuration of Synchronization Rules**: Provides configurable synchronization rules, allowing users to customize data synchronization strategies and methods based on actual needs.
- **Resume Transmission and Re-Subscription**: Supports resume transmission and re-subscription functionalities, ensuring continuity and integrity of data synchronization during network instability or interruptions.
- **Historical Data Migration**: Supports the migration of historical data, enabling users to seamlessly transfer historical data to a new system during upgrades or system changes.
TDengine's data subscription feature offers significant flexibility for subscribers, allowing users to configure subscription objects as needed. Users can subscribe to a database, a supertable, or even a query statement with filter conditions. This allows users to achieve selective data synchronization, transferring only the relevant data (including offline and out-of-order data) from one cluster to another to meet various complex data demands.
The following diagram illustrates the implementation of the edge-cloud orchestration architecture in TDengine Enterprise using a specific example of a production workshop. In the workshop, real-time data generated by equipment is stored in TDengine deployed on the edge. The TDengine deployed at the branch factory subscribes to data from the workshop's TDengine. To better meet business needs, data analysts can set subscription rules, such as downsampling data or only synchronizing data that exceeds a specified threshold. Similarly, TDengine deployed at the group level subscribes to data from various branch factories, achieving data aggregation at the group level for further analysis and processing.
<figure>
<Image img={edgeCloud} alt="Edge-cloud orchestration diagram"/>
<figcaption>Edge-cloud orchestration diagram</figcaption>
</figure>
This implementation approach has several advantages:
- Requires no coding; only simple configurations are needed on the edge and cloud sides.
- Significantly increases the automation level of cross-region data synchronization, reducing error rates.
- Data does not need to be cached, minimizing batch transmissions and avoiding bandwidth congestion during peak flow.
- Data is synchronized through a subscription method, which is configurable, simple, flexible, and real-time.
- Both edge and cloud use TDengine, ensuring a unified data model that reduces the difficulty of data governance.
A common pain point faced by manufacturing enterprises is data synchronization. Many companies currently use offline methods to synchronize data, but TDengine Enterprise enables real-time data synchronization with configurable rules. This approach can prevent resource waste and bandwidth congestion risks caused by periodically transmitting large volumes of data.
## Advantages of Edge-Cloud Orchestration
The state of IT and OT (Operational Technology) infrastructure varies greatly across traditional industries. Compared to the Internet sector, most enterprises significantly lag in their investments in digitization. Many enterprises still use outdated systems to process data, and these systems often operate independently, leading to so-called data silos.
In this context, injecting new vitality into traditional industries with AI requires first integrating the dispersed systems and their collected data, breaking the limitations of data silos. However, this process is challenging, as it involves multiple systems and various Industrial Internet protocols, making data aggregation far more than a simple merging task. It requires cleaning, processing, and handling data from different sources to integrate it into a unified platform.
When all data is aggregated into a single system, the efficiency of accessing and processing data will be significantly improved. Enterprises will be able to respond more quickly to real-time data and resolve issues more effectively. Employees both inside and outside the enterprise can also collaborate efficiently, enhancing overall operational efficiency.
Moreover, once data is aggregated, advanced third-party AI analysis tools can be utilized for better anomaly monitoring, real-time alerts, and more accurate predictions regarding capacity, costs, and equipment maintenance. This will enable decision-makers to better grasp the overall macro situation, providing strong support for enterprise development and facilitating the digital transformation and intelligent upgrade of traditional industries.

View File

@ -0,0 +1,54 @@
---
title: TDengine 2.x
slug: /advanced-features/data-connectors/tdengine-2
---
import Image from '@theme/IdealImage';
import imgStep1 from '../../assets/tdengine-2-01.png';
import imgStep2 from '../../assets/tdengine-2-02.png';
import imgStep3 from '../../assets/tdengine-2-03.png';
import imgStep4 from '../../assets/tdengine-2-04.png';
This section explains how to create a data migration task through the Explorer interface to migrate data from the TDengine 2.x to the current cluster.
## Feature Overview
`taosX` uses SQL queries to retrieve data from the source cluster and writes the query results to the target database. Specifically, `taosX` treats a subtable's data for a specific time period as the basic unit of the query, and the data to be migrated is written to the target database in batches.
`taosX` supports three migration modes:
1. **history** mode: Migrates data within a specified time range. If no time range is specified, it migrates all data up to the task creation time. Once the migration is complete, the task stops.
2. **realtime** mode: Synchronizes data from the task creation time onward. The task will continue running unless manually stopped.
3. **both** mode: Executes history mode first, then switches to realtime mode.
In each migration mode, you can specify whether to migrate the table structure. If "always" is selected, the table structure will be synced to the target database before migrating the data. If there are many subtables, this process may take a while. If you are sure that the target database already has the same table schema as the source database, it is recommended to select "none" to save time.
During task execution, progress is saved to disk, so if a task is paused and restarted, or automatically recovers from an error, it will not restart from the beginning.
For more detailed information, we recommend reading the description of each form field on the task creation page.
## Steps
First, click the "Data Ingestion" menu on the left, then click the "Add Data Source" button on the right.
<figure>
<Image img={imgStep1} alt="Add data source"/>
<figcaption>Figure 1. Add a data source</figcaption>
</figure>
Next, enter the task name, such as "migrate-test", and select the type "TDengine2". At this point, the form will switch to the dedicated TDengine2 migration form, which contains many options, each with a detailed description, as shown in the images below.
<figure>
<Image img={imgStep2} alt="Add data source"/>
<figcaption>Figure 2. Add a data source</figcaption>
</figure>
<figure>
<Image img={imgStep3} alt="Add data source"/>
<figcaption>Figure 3. Add a data source</figcaption>
</figure>
<figure>
<Image img={imgStep4} alt="Add data source"/>
<figcaption>Figure 4. Add a data source</figcaption>
</figure>
After clicking the "Submit" button to submit the task, return to the "Data Source" task list page, where you can monitor the task's execution status.

View File

@ -0,0 +1,110 @@
---
title: TDengine 3.x
slug: /advanced-features/data-connectors/tdengine-3
---
import Image from '@theme/IdealImage';
import imgStep1 from '../../assets/tdengine-3-01.png';
import imgStep2 from '../../assets/tdengine-3-02.png';
import imgStep3 from '../../assets/tdengine-3-03.png';
import imgStep4 from '../../assets/tdengine-3-04.png';
import imgStep5 from '../../assets/tdengine-3-05.png';
import imgStep6 from '../../assets/tdengine-3-06.png';
import imgStep7 from '../../assets/tdengine-3-07.png';
import imgStep8 from '../../assets/tdengine-3-08.png';
import imgStep9 from '../../assets/tdengine-3-09.png';
This document explains how to use the Explorer interface to subscribe to data from another cluster into the current one.
## Preparation
Create the necessary subscription topic on the source cluster. You can subscribe to the entire database, a supertable, or a subtable. In this example, we will demonstrate subscribing to a database named `test`.
### Step 1: Access the Data Subscription page
Open the Explorer interface for the source cluster, click on the "Data Subscription" menu on the left, and then click on "Add New Topic."
<figure>
<Image img={imgStep1} alt=""/>
</figure>
### Step 2: Add a New Topic
Enter the topic name and select the database you want to subscribe to.
<figure>
<Image img={imgStep2} alt=""/>
</figure>
### Step 3: Copy the Topic's DSN
Click the "Create" button, go back to the topic list, and copy the topic's **DSN** for later use.
<figure>
<Image img={imgStep3} alt=""/>
</figure>
## Create a Subscription Task
### Step 1: Go to the "Add Data Source" page
1. Click the "Data Ingestion" menu on the left.
2. Click "Add Data Source."
<figure>
<Image img={imgStep4} alt=""/>
</figure>
### Step 2: Enter Data Source Information
1. Enter the task name.
2. Select the task type "TDengine3."
3. Choose the target database.
4. Paste the DSN copied from the preparation step into the **Topic DSN** field. For example: `tmq+ws://root:taosdata@localhost:6041/topic`
5. After completing the above steps, click the "Connectivity Check" button to test connectivity with the source.
<figure>
<Image img={imgStep5} alt=""/>
</figure>
### Step 3: Configure Subscription Settings and Submit the Task
1. Choose the subscription starting point. You can configure it to start from the earliest or latest data, with the default being the earliest.
2. Set the timeout. Supported units include ms (milliseconds), s (seconds), m (minutes), h (hours), d (days), M (months), y (years).
3. Set the subscription group ID. The subscription group ID is an arbitrary string used to identify a subscription group, with a maximum length of 192 characters. Subscribers within the same group share consumption progress. If not specified, a randomly generated group ID will be used.
4. Set the client ID. The client ID is an arbitrary string used to identify the client, with a maximum length of 192 characters.
5. Synchronize data that has already been written to disk. If enabled, it will synchronize data that has already been flushed to the TSDB storage file (i.e., not in the WAL). If disabled, it will only synchronize data that has not yet been flushed (i.e., still in the WAL).
6. Synchronize table deletion operations. If enabled, it will synchronize table deletion operations to the target database.
7. Synchronize data deletion operations. If enabled, it will synchronize data deletion operations to the target database.
8. Compression. Enable WebSocket compression to reduce network bandwidth usage.
9. Click the "Submit" button to submit the task.
<figure>
<Image img={imgStep6} alt=""/>
</figure>
## Monitoring Task Progress
After submitting the task, return to the data source page to view the task status. The task will first be added to the execution queue and will start running shortly after.
<figure>
<Image img={imgStep7} alt=""/>
</figure>
Click the "View" button to monitor dynamic statistical information about the task.
<figure>
<Image img={imgStep8} alt=""/>
</figure>
You can also click the collapse button on the left to expand the task's activity information. If the task encounters any issues, detailed explanations will be provided here.
<figure>
<Image img={imgStep9} alt=""/>
</figure>
## Advanced Usage
1. The FROM DSN supports multiple Topics, separated by commas. For example: `tmq+ws://root:taosdata@localhost:6041/topic1,topic2,topic3`
2. In the FROM DSN, you can also use database names, supertable names, or subtable names in place of the Topic names. For example: `tmq+ws://root:taosdata@localhost:6041/db1,db2,db3`. In this case, it is not necessary to create Topics in advance; `taosX` will automatically recognize the use of database names and create the database subscription Topics in the source cluster.
3. The FROM DSN supports the `group.id` parameter to explicitly specify the group ID for the subscription. If not specified, a randomly generated group ID will be used.

View File

@ -0,0 +1,201 @@
---
title: AVEVA PI System
sidebar_label: PI System
slug: /advanced-features/data-connectors/pi-system
---
import Image from '@theme/IdealImage';
import imgStep1 from '../../assets/pi-system-01.png';
import imgStep2 from '../../assets/pi-system-02.png';
import imgStep3 from '../../assets/pi-system-03.png';
import imgStep4 from '../../assets/pi-system-04.png';
This section explains how to create a task through the Explorer interface to migrate data from PI System to TDengine.
## Overview
PI System is a suite of software products for data collection, retrieval, analysis, transmission, and visualization. It can serve as the infrastructure for enterprise-level systems that manage real-time data and events. The `taosX` PI System connector plugin can extract both real-time and historical data from PI System.
From a data timeliness perspective, PI System data source tasks are divided into two types: **real-time tasks** and **backfill tasks**. In the task type dropdown list, these two types are labeled: **PI** and **PI backfill**.
From a data model perspective, PI System data source tasks are divided into **single-column model** tasks and **multi-column model** tasks:
1. **Single-column model** tasks map a PI Point to a TDengine table.
2. **Multi-column model** tasks map a PI AF element to a TDengine table.
For the type of connected data source, PI System data source tasks are divided into **Archive Server** data sources and **AF Server** data sources. For **Archive Server** data sources, only the **single-column model** can be used. For **AF Server** data sources, both **single-column model** and **multi-column model** can be selected.
Users configure data mapping rules from PI System to TDengine via a CSV file, referred to as the **model configuration file**:
1. For tasks using the AF Server's single-column model, `taosX` automatically identifies which attributes of an element reference PI Point data, mapping a PI Point attribute to a table.
2. For tasks using the AF Server's multi-column model, one element corresponds to one table. By default, `taosX` maps PI Point attributes to TDengine metric columns and other attributes to TDengine tag columns.
## Creating a Task
### Add Data Source
In the "Data Ingestion" page, click the **+Add Data Source** button to go to the add data source page.
<figure>
<Image img={imgStep1} alt=""/>
</figure>
### Basic Configuration
Enter a task name in the **Name** field, such as: "test."
Select **PI** or **PI backfill** from the **Type** dropdown list.
If the `taosX` service runs on the same server as the PI system or can connect directly to it (requires PI AF SDK), an agent is not necessary. Otherwise, configure an agent: select a specified agent from the dropdown, or click the **+Create New Agent** button on the right to create a new agent, following the prompts to configure it. `taosX` or its agent must be deployed on a host that can directly connect to the PI System.
Select a target database from the **Target Database** dropdown list, or click the **+Create Database** button on the right to create a new one.
<figure>
<Image img={imgStep2} alt=""/>
</figure>
### Connection Configuration
The PI System connector supports two connection methods:
1. **PI Data Archive Only**: Does not use the AF model. In this mode, fill in the **PI Service Name** (server address, typically the hostname).
<figure>
<Image img={imgStep3} alt=""/>
</figure>
2. **PI Data Archive and Asset Framework (AF) Server**: Uses the AF SDK. In this mode, in addition to configuring the service name, you also need to configure the PI System (AF Server) name (hostname) and the AF Database name.
<figure>
<Image img={imgStep4} alt=""/>
</figure>
Click the **Connectivity Check** button to check if the data source is available.
### Data Model Configuration
This section has two tabs corresponding to the single-column model configuration and the multi-column model configuration. If this is your first time configuring it, whether you choose the single-column model or the multi-column model, be sure to click the "Download Default Configuration" button. This will generate the default **model configuration file**, which will also be downloaded to your local machine, where you can view or edit it. After editing, you can re-upload it to override the default configuration.
If you want to synchronize all points or all elements of a template, the default configuration is sufficient. If you want to filter specific naming patterns of points or element templates, you need to fill in the filter conditions before clicking "Download Default Configuration."
#### Multi-Column Model Configuration File
Below is an example of a multi-column model configuration file. This configuration file includes configurations for two supertables: one for the `metertemplate` table, whose data comes from the `MeterTemplate` element, and another for the `farm` table, whose data comes from the `Farm` element.
```csv
SuperTable,metertemplate
SubTable,${element_name}_${element_id}
Template,MeterTemplate
Filter,
ts,KEY,TIMESTAMP,$ts
voltage,COLUMN,DOUBLE,$voltage
voltage_status,COLUMN,INT,$voltage_status
current,COLUMN,DOUBLE,$current
current_status,COLUMN,INT,$current_status
element_id,TAG,VARCHAR(100),$element_id
element_name,TAG,VARCHAR(100),$element_name
path,TAG,VARCHAR(100),$path
categories,TAG,VARCHAR(100),$categories
SuperTable,farm
SubTable,${element_name}_${element_id}
Template,Farm
Filter,
ts,KEY,TIMESTAMP,$ts
wind_speed,COLUMN,FLOAT,$wind_speed
wind_speed_status,COLUMN,INT,$wind_speed_status
power_production,COLUMN,FLOAT,$power_production
power_production_status,COLUMN,INT,$power_production_status
lost_power,COLUMN,FLOAT,$lost_power
lost_power_status,COLUMN,INT,$lost_power_status
farm_lifetime_production__weekly_,COLUMN,FLOAT,$farm_lifetime_production__weekly_
farm_lifetime_production__weekly__status,COLUMN,INT,$farm_lifetime_production__weekly__status
farm_lifetime_production__hourly_,COLUMN,FLOAT,$farm_lifetime_production__hourly_
farm_lifetime_production__hourly__status,COLUMN,INT,$farm_lifetime_production__hourly__status
element_id,TAG,VARCHAR(100),$element_id
element_name,TAG,VARCHAR(100),$element_name
path,TAG,VARCHAR(100),$path
categories,TAG,VARCHAR(100),$categories
```
A multi-column model configuration file consists of one or more supertable definitions. Each supertable configuration includes:
1. The mapping between supertables and templates.
2. The mapping between attributes and TDengine metric columns.
3. The mapping between attributes and TDengine tag columns.
4. Source data filtering conditions.
5. A mapping rule for each column, whether it is a metric column or a tag column. For details, see [Zero-Code Third-Party Data Integration](../), "Data Extraction, Filtering, and Transformation."
#### Single-Column Model Configuration File
Below is an example of a single-column model configuration file.
```csv
SuperTable,volt_float32
SubTable,${point_name}
Filter,
ts,KEY,TIMESTAMP,$ts
value,COLUMN,FLOAT,$value
status,COLUMN,INT,$status
path,TAG,VARCHAR(200),$path
point_name,TAG,VARCHAR(100),$point_name
ptclassname,TAG,VARCHAR(100),$ptclassname
sourcetag,TAG,VARCHAR(100),$sourcetag
tag,TAG,VARCHAR(100),$tag
descriptor,TAG,VARCHAR(100),$descriptor
exdesc,TAG,VARCHAR(100),$exdesc
engunits,TAG,VARCHAR(100),$engunits
pointsource,TAG,VARCHAR(100),$pointsource
step,TAG,VARCHAR(100),$step
future,TAG,VARCHAR(100),$future
element_paths,TAG,VARCHAR(512),`$element_paths.replace("\\", ".")`
SuperTable,milliampere_float32
SubTable,${point_name}
Filter,
ts,KEY,TIMESTAMP,$ts
value,COLUMN,FLOAT,$value
status,COLUMN,INT,$status
path,TAG,VARCHAR(200),$path
point_name,TAG,VARCHAR(100),$point_name
ptclassname,TAG,VARCHAR(100),$ptclassname
sourcetag,TAG,VARCHAR(100),$sourcetag
tag,TAG,VARCHAR(100),$tag
descriptor,TAG,VARCHAR(100),$descriptor
exdesc,TAG,VARCHAR(100),$exdesc
engunits,TAG,VARCHAR(100),$engunits
pointsource,TAG,VARCHAR(100),$pointsource
step,TAG,VARCHAR(100),$step
future,TAG,VARCHAR(100),$future
element_paths,TAG,VARCHAR(512),`$element_paths.replace("\\", ".")`
Meter_1000004_Voltage,POINT,volt_float32
Meter_1000004_Current,POINT,milliampere_float32
Meter_1000001_Voltage,POINT,volt_float32
Meter_1000001_Current,POINT,milliampere_float32
Meter_1000474_Voltage,POINT,volt_float32
Meter_1000474_Current,POINT,milliampere_float32
```
A single-column model configuration file is divided into two parts. The first part is similar to the multi-column model configuration file and consists of several supertable definitions. The second part is the point list, which configures the mapping between points and supertables. The default configuration maps points with the same UOM and data type to the same supertable.
### Backfill Configuration
1. For PI tasks, a "restart compensation time" can be configured. If the task is interrupted unexpectedly, this parameter allows `taosX` to automatically backfill the data for the specified period after the task restarts.
2. For PI backfill tasks, you must configure the start and end times for the backfill.
### Advanced Options
The advanced options differ for different task types. The common advanced options are:
1. Connector log level.
2. Batch size for querying and sending data.
3. Maximum delay for a single read.
For **multi-column real-time tasks**, there are the following toggle options:
1. Sync new elements. If enabled, the PI connector will monitor new elements in the template. Without restarting the task, it can automatically synchronize the data for new elements.
2. Sync static attribute changes. If enabled, the PI connector will sync changes to all static attributes (non-PI Point attributes). This means that if a static attribute of an element is modified in the PI AF Server, the corresponding tag value in the TDengine table will also be modified.
3. Sync delete element operations. If enabled, the PI connector will listen for element deletion events in the configured template and sync the deletion of the corresponding subtable in TDengine.
4. Sync delete historical data operations. If enabled, when time-series data of an element is deleted in PI for a certain time, the corresponding column values in TDengine for that time are set to null.
5. Sync historical data modifications. If enabled, when historical time-series data of an element is modified in PI, the corresponding data in TDengine is updated accordingly.

View File

@ -0,0 +1,241 @@
---
title: OPC UA
slug: /advanced-features/data-connectors/opc-ua
---
import Image from '@theme/IdealImage';
import imgStep1 from '../../assets/opc-ua-01.png';
import imgStep2 from '../../assets/opc-ua-02.png';
import imgStep3 from '../../assets/opc-ua-03.png';
import imgStep4 from '../../assets/opc-ua-04.png';
import imgStep5 from '../../assets/opc-ua-05.png';
import imgStep6 from '../../assets/opc-ua-06.png';
import imgStep7 from '../../assets/opc-ua-07.png';
import imgStep8 from '../../assets/opc-ua-08.png';
import imgStep9 from '../../assets/opc-ua-09.png';
This section explains how to create a data migration task through the Explorer interface, syncing data from an OPC-UA server to the current TDengine cluster.
## Overview
OPC is one of the interoperability standards for securely and reliably exchanging data in industrial automation and other industries.
OPC-UA is the next-generation standard of the classic OPC specification. It is a platform-independent, service-oriented architecture specification that integrates all the features of the existing OPC Classic specification and provides a path to migrate to a more secure and scalable solution.
TDengine can efficiently read data from the OPC-UA server and write it to TDengine to achieve real-time data ingestion.
## Creating a Task
### 1. Add Data Source
On the Data Ingestion page, click the **+Add Data Source** button to go to the Add Data Source page.
<figure>
<Image img={imgStep1} alt=""/>
</figure>
### 2. Configure Basic Information
Enter a task name in the **Name** field. For example, for a task monitoring environmental temperature and humidity, you might name it **environment-monitoring**.
Select **OPC-UA** from the **Type** dropdown list.
The agent is optional. If needed, you can select a designated agent from the dropdown or click the **+Create New Agent** button on the right.
Select a target database from the **Target Database** dropdown list, or click the **+Create Database** button on the right.
<figure>
<Image img={imgStep2} alt=""/>
</figure>
### 3. Configure Connection Information
In the **Connection Configuration** section, fill in the **OPC-UA Server Address**, such as: `127.0.0.1:5000`, and configure the data transmission security mode. There are three security modes to choose from:
1. None: Data is transmitted in plaintext.
2. Sign: Digital signatures are used to verify the communication data, ensuring data integrity.
3. SignAndEncrypt: Digital signatures are used to verify the communication data, and encryption algorithms are applied to encrypt the data, ensuring data integrity, authenticity, and confidentiality.
If you select Sign or SignAndEncrypt as the security mode, you must select a valid security policy. The security policy defines how encryption and verification mechanisms in the security mode are implemented, including the encryption algorithm used, key length, digital certificates, etc. The available security policies are:
1. None: Only selectable when the security mode is None.
2. Basic128Rsa15: Uses the RSA algorithm with a 128-bit key to sign or encrypt communication data.
3. Basic256: Uses the AES algorithm with a 256-bit key to sign or encrypt communication data.
4. Basic256Sha256: Uses the AES algorithm with a 256-bit key, and encrypts the digital signature using the SHA-256 algorithm.
5. Aes128Sha256RsaOaep: Uses the AES-128 algorithm to encrypt and decrypt communication data, encrypts the digital signature using the SHA-256 algorithm, and uses the RSA algorithm in OAEP mode to encrypt and decrypt the symmetric communication keys.
6. Aes256Sha256RsaPss: Uses the AES-256 algorithm to encrypt and decrypt communication data, encrypts the digital signature using the SHA-256 algorithm, and uses the RSA algorithm in PSS mode to encrypt and decrypt the symmetric communication keys.
<figure>
<Image img={imgStep3} alt=""/>
</figure>
### 4. Choose Authentication Method
As shown in the image below, switch the tab to choose different authentication methods. The available authentication methods are:
1. Anonymous
2. Username
3. Certificate access: This can be the same as the security communication certificate or a different one.
<figure>
<Image img={imgStep4} alt=""/>
</figure>
After configuring the connection properties and authentication method, click the **Connectivity Check** button to check if the data source is available. If you are using a security communication certificate or authentication certificate, the certificate must be trusted by the OPC UA server; otherwise, it will fail the check.
### 5. Configure Data Points Set
You can choose to use a CSV file template or **Select All Data Points** for the **Data Points Set**.
#### 5.1. Upload CSV Configuration File
You can download an empty CSV template, fill in the data point information based on the template, and then upload the CSV configuration file to configure data points. Alternatively, you can download the data points that match the configured filter conditions, exported in the format specified by the CSV template.
The CSV file must follow these rules:
1. File Encoding
The uploaded CSV file must be encoded in one of the following formats:
(1) UTF-8 with BOM
(2) UTF-8 (i.e., UTF-8 without BOM)
2. Header Configuration Rules
The header is the first row of the CSV file. The rules are as follows:
(1) The following columns can be configured in the CSV header:
| No. | Column Name | Description | Required | Default Behavior |
| ---- | ----------------------- | ------------------------------------------------------------ | -------- | ------------------------------------------------------------ |
| 1 | point_id | The id of the data point on the OPC UA server | Yes | None |
| 2 | stable | The supertable in TDengine corresponding to the data point | Yes | None |
| 3 | tbname | The subtable in TDengine corresponding to the data point | Yes | None |
| 4 | enable | Whether to collect data for this point | No | Uses a default value of `1` as the enable value |
| 5 | value_col | The column name in TDengine where the collected value of the data point is stored | No | Uses a default value of `val` as the value_col value |
| 6 | value_transform | The transformation function executed on the collected value in taosX | No | No transformation will be applied |
| 7 | type | The data type of the collected value | No | The original type of the collected value will be used as the data type in TDengine |
| 8 | quality_col | The column name in TDengine where the quality of the collected value is stored | No | No quality column will be added in TDengine |
| 9 | ts_col | The timestamp column in TDengine where the original timestamp of the data point is stored | No | If both ts_col and received_ts_col are non-empty, the former is used as the timestamp column. If one of the two is empty, the non-empty column is used. If both are empty, the original timestamp of the data point is used as the timestamp. |
| 10 | received_ts_col | The timestamp column in TDengine where the received timestamp of the data point is stored | No | Same as above |
| 11 | ts_transform | The transformation function applied to the data point's timestamp in taosX | No | No transformation will be applied to the original timestamp of the data point |
| 12 | received_ts_transform | The transformation function applied to the received timestamp of the data point in taosX | No | Same as above |
| 13 | tag::VARCHAR(200)::name | The Tag column in TDengine corresponding to the data point. `tag` is a reserved keyword that indicates the column is a tag column. `VARCHAR(200)` indicates the tag's type. `name` is the actual name of the tag. | No | If 1 or more tag columns are configured, the specified tag columns are used. If no tag columns are configured and the supertable exists in TDengine, the supertable's tags are used. If not, default tags are added: `point_id` and `point_name`. |
(2) The CSV header must not have duplicate columns.
(3) Columns like `tag::VARCHAR(200)::name` can be configured multiple times, corresponding to multiple tags in TDengine. However, tag names must not be duplicated.
(4) The order of columns in the CSV header does not affect the CSV file validation rules.
(5) Columns not listed in the table can also be configured, such as serial numbers. These columns will be automatically ignored.
3. Row Configuration Rules
Each row in the CSV file configures an OPC data point. The row rules are as follows:
(1) The columns in the row must correspond to the columns in the header:
| No. | Header Column | Value Type | Value Range | Required | Default Value |
| ---- | ----------------------- | ---------- | ------------------------------------------------------------ | -------- | ---------------------------------------- |
| 1 | point_id | String | A string like `ns=3;i=1005`, which must conform to the OPC UA ID specification, i.e., must contain ns and id parts | Yes | |
| 2 | enable | int | 0: Do not collect data for this point. The subtable corresponding to the data point will be deleted from TDengine before the OPC DataIn task starts. 1: Collect data for this point. The subtable will not be deleted. | No | 1 |
| 3 | stable | String | Any string that conforms to TDengine supertable naming conventions. Special characters such as `.` will be replaced with underscores. If `{type}` is present: - If `type` is non-empty, it will be replaced by the value of `type`. - If `type` is empty, the original type of the collected value will be used. | Yes | |
| 4 | tbname | String | Any string that conforms to TDengine subtable naming conventions. Special characters such as `.` will be replaced with underscores. For OPC UA: - If `{ns}` is present, it will be replaced with the ns part of the point_id. - If `{id}` is present, it will be replaced with the id part of the point_id. | Yes | |
| 5 | value_col | String | A column name that conforms to TDengine naming conventions | No | val |
| 6 | value_transform | String | A calculation expression that conforms to the Rhai engine, such as `(val + 10) / 1000 * 2.0` or `log(val) + 10` | No | None |
| 7 | type | String | Supported types include: `b/bool/i8/tinyint/i16/smallint/i32/int/i64/bigint/u8/tinyint unsigned/u16/smallint unsigned/u32/int unsigned/u64/bigint unsigned/f32/float/f64/double/timestamp/timestamp(ms)/timestamp(us)/timestamp(ns)/json` | No | The original type of the collected value |
| 8 | quality_col | String | A column name that conforms to TDengine naming conventions | No | None |
| 9 | ts_col | String | A column name that conforms to TDengine naming conventions | No | ts |
| 10 | received_ts_col | String | A column name that conforms to TDengine naming conventions | No | rts |
| 11 | ts_transform | String | Supports +, -, *, /, % operators, e.g., `ts / 1000 * 1000` to set the last 3 digits of a timestamp in ms to 0; `ts + 8 * 3600 * 1000` to add 8 hours to a timestamp in ms precision; `ts - 8 * 3600 * 1000` to subtract 8 hours from a timestamp in ms precision. | No | None |
| 12 | received_ts_transform | String | None | No | None |
| 13 | tag::VARCHAR(200)::name | String | The value in the tag. When the tag type is VARCHAR, it can be in Chinese | No | NULL |
(2) point_id must be unique across the entire DataIn task. In an OPC DataIn task, a data point can only be written to one subtable in TDengine. If you need to write a data point to multiple subtables, you must create multiple OPC DataIn tasks.
(3) If point_id is different but tbname is the same, value_col must be different. This configuration allows data from multiple data points of different data types to be written to different columns in the same subtable. This approach corresponds to the use case of "wide tables for OPC data ingestion into TDengine."
4. Other Rules
(1) If the number of columns in the header and the row is inconsistent, validation fails, and the user is prompted with the row number that does not meet the requirements.
(2) The header must be in the first row and cannot be empty.
(3) At least one data point is required.
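To make the rules above concrete, the following is a minimal sketch of a data points CSV for an OPC UA task. All point IDs, supertable and subtable names, and tag values here are hypothetical and serve only to illustrate the header and row format; adjust them to your own server and schema.

```csv
point_id,stable,tbname,enable,value_col,type,quality_col,ts_col,tag::VARCHAR(64)::location
ns=3;i=1005,opcua_float,point_{ns}_{id},1,val,f32,quality,ts,workshop_a
ns=3;i=1006,opcua_float,point_{ns}_{id},1,val,f32,quality,ts,workshop_b
```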
#### 5.2. Select Data Points
You can filter data points by configuring **Root Node ID**, **Namespace**, **Regular Expression Match**, and other conditions.
Specify the supertable and subtable where the data will be written by configuring **Supertable Name** and **Table Name**.
Configure the **Primary Key Column**: select `origin_ts` to use the original timestamp of the OPC data point as the primary key in TDengine, or select `received_ts` to use the received timestamp as the primary key in TDengine. You can also configure the **Primary Key Alias** to specify the name of the timestamp column in TDengine.
<figure>
<Image img={imgStep5} alt=""/>
</figure>
### 6. Collection Configuration
In the collection configuration, configure the collection mode, collection interval, collection timeout, and other options for the current task.
<figure>
<Image img={imgStep6} alt=""/>
</figure>
As shown in the image above:
- **Collection Mode**: You can use the `subscribe` or `observe` mode.
- `subscribe`: Subscription mode. Data is reported when there is a change and written to TDengine.
- `observe`: The latest value of the data point is polled at the `Collection Interval` and written to TDengine.
- **Collection Interval**: The default is 10 seconds. The collection interval is the time between the end of one data collection and the start of the next; each time it elapses, the latest value of the data point is polled and written to TDengine. This is only configurable when the **Collection Mode** is set to `observe`.
- **Collection Timeout**: If no data is returned from the OPC server within the specified time when reading data points, the read operation will fail. The default is 10 seconds. This is only configurable when the **Collection Mode** is set to `observe`.
When **Data Points Set** is configured using the **Select Data Points** method, you can configure **Data Point Update Mode** and **Data Point Update Interval** in the collection configuration to enable dynamic data point updates. **Dynamic Data Point Update** means that during the task's execution, if the OPC Server adds or deletes data points, data points that meet the criteria will be automatically added to the current task without restarting the OPC task.
- Data Point Update Mode: You can choose `None`, `Append`, or `Update`.
- None: Dynamic data point updates are not enabled.
- Append: Dynamic data point updates are enabled, but only new data points are added.
- Update: Dynamic data point updates are enabled, and data points can be added or removed.
- Data Point Update Interval: This is effective when the **Data Point Update Mode** is set to `Append` or `Update`. The unit is seconds, the default is 600, the minimum value is 60, and the maximum value is 2147483647.
### 7. Advanced Options
<figure>
<Image img={imgStep7} alt=""/>
</figure>
As shown in the image above, advanced options can be configured to further optimize performance, logging, and more.
The **Log Level** is set to `info` by default. The available options are `error`, `warn`, `info`, `debug`, and `trace`.
The **Max Write Concurrency** option sets the maximum concurrency limit for writing to taosX. The default value is 0, which means the concurrency is determined automatically.
The **Batch Size** option sets the batch size for each write operation, that is, the maximum number of messages sent at once.
The **Batch Delay** option sets the maximum delay for a single send operation (in seconds). When this delay expires, any buffered data is sent immediately, even if the **Batch Size** has not been reached.
In the **Save Raw Data** option, choose whether to save the raw data. The default is No.
When saving raw data, the following two parameters become effective.
The **Max Retention Days** option sets the maximum retention days for raw data.
The **Raw Data Storage Directory** option sets the path for storing raw data. If an agent is used, the storage path refers to the path on the agent's server; otherwise, it is the path on the taosX server. The path can use the `$DATA_DIR` placeholder and `:id` as part of the path.
- On Linux platforms, `$DATA_DIR` is `/var/lib/taos/taosx`. By default, the storage path is `/var/lib/taos/taosx/tasks/<task_id>/rawdata`.
- On Windows platforms, `$DATA_DIR` is `C:\TDengine\data\taosx`. By default, the storage path is `C:\TDengine\data\taosx\tasks\<task_id>\rawdata`.
### 8. Task Completion
Click the **Submit** button to complete the OPC UA to TDengine data synchronization task. Return to the **Data Sources List** page to view the task's execution status.
## Adding Data Points
While the task is running, click **Edit**, then click the **Add Data Points** button to append data points to the CSV file.
In the pop-up form, fill in the data point information.
Click the **Confirm** button to complete the addition of data points.

View File

@ -0,0 +1,213 @@
---
title: OPC DA
slug: /advanced-features/data-connectors/opc-da
---
import Image from '@theme/IdealImage';
import imgStep1 from '../../assets/opc-da-01.png';
import imgStep2 from '../../assets/opc-da-02.png';
import imgStep3 from '../../assets/opc-da-03.png';
import imgStep4 from '../../assets/opc-da-04.png';
import imgStep5 from '../../assets/opc-da-05.png';
import imgStep6 from '../../assets/opc-da-06.png';
import imgStep7 from '../../assets/opc-da-07.png';
import imgStep8 from '../../assets/opc-da-08.png';
This section explains how to create a data migration task through the Explorer interface, syncing data from an OPC-DA server to the current TDengine cluster.
## Overview
OPC is one of the interoperability standards for securely and reliably exchanging data in industrial automation and other industries.
OPC DA (Data Access) is a classic COM-based specification that is only applicable to Windows. Although OPC DA is not the most modern or efficient data communication standard, it is widely used because some older devices only support OPC DA.
TDengine can efficiently read data from the OPC-DA server and write it to TDengine to achieve real-time data ingestion.
## Creating a Task
### 1. Add Data Source
On the Data Ingestion page, click the **+Add Data Source** button to go to the Add Data Source page.
<figure>
<Image img={imgStep1} alt=""/>
</figure>
### 2. Configure Basic Information
Enter a task name in the **Name** field. For example, for a task monitoring environmental temperature and humidity, you might name it **environment-monitoring**.
Select **OPC-DA** from the **Type** dropdown list.
If the taosX service is running on the same server as the OPC-DA server, the agent is not required. Otherwise, you need to configure an agent: select a designated agent from the dropdown or click the **+Create New Agent** button to create a new agent, and follow the prompts to configure the agent.
Select a target database from the **Target Database** dropdown list, or click the **+Create Database** button on the right.
<figure>
<Image img={imgStep2} alt=""/>
</figure>
### 3. Configure Connection Information
In the **Connection Configuration** section, fill in the **OPC-DA Server Address**, such as `127.0.0.1/Matrikon.OPC.Simulation.1`, and configure the authentication method.
Click the **Connectivity Check** button to check if the data source is available.
<figure>
<Image img={imgStep3} alt=""/>
</figure>
### 4. Configure Data Points Set
You can choose to use a CSV file template or **Select All Data Points** for the **Data Points Set**.
#### 4.1. Upload CSV Configuration File
You can download an empty CSV template, fill in the data point information based on the template, and then upload the CSV configuration file to configure data points. Alternatively, you can download the data points that match the configured filter conditions, exported in the format specified by the CSV template.
The CSV file must follow these rules:
1. File Encoding
The uploaded CSV file must be encoded in one of the following formats:
(1) UTF-8 with BOM
(2) UTF-8 (i.e., UTF-8 without BOM)
2. Header Configuration Rules
The header is the first row of the CSV file. The rules are as follows:
(1) The following columns can be configured in the CSV header:
| No. | Column Name | Description | Required | Default Behavior |
| ---- | ----------------------- | ------------------------------------------------------------ | -------- | ------------------------------------------------------------ |
| 1 | tag_name | The id of the data point on the OPC DA server | Yes | None |
| 2 | stable | The supertable in TDengine corresponding to the data point | Yes | None |
| 3 | tbname | The subtable in TDengine corresponding to the data point | Yes | None |
| 4 | enable | Whether to collect data for this point | No | Uses a default value of `1` as the enable value |
| 5 | value_col | The column name in TDengine where the collected value of the data point is stored | No | Uses a default value of `val` as the value_col value |
| 6 | value_transform | The transformation function executed on the collected value in taosX | No | No transformation will be applied |
| 7 | type | The data type of the collected value | No | The original type of the collected value will be used as the data type in TDengine |
| 8 | quality_col | The column name in TDengine where the quality of the collected value is stored | No | No quality column will be added in TDengine |
| 9 | ts_col | The timestamp column in TDengine where the original timestamp of the data point is stored | No | If both ts_col and received_ts_col are non-empty, the former is used as the timestamp column. If one of the two is empty, the non-empty column is used. If both are empty, the original timestamp of the data point is used as the timestamp. |
| 10 | received_ts_col | The timestamp column in TDengine where the received timestamp of the data point is stored | No | Same as above |
| 11 | ts_transform | The transformation function applied to the data point's timestamp in taosX | No | No transformation will be applied to the original timestamp of the data point |
| 12 | received_ts_transform | The transformation function applied to the received timestamp of the data point in taosX | No | Same as above |
| 13 | tag::VARCHAR(200)::name | The Tag column in TDengine corresponding to the data point. `tag` is a reserved keyword that indicates the column is a tag column. `VARCHAR(200)` indicates the tag's type. `name` is the actual name of the tag. | No | If 1 or more tag columns are configured, the specified tag columns are used. If no tag columns are configured and the supertable exists in TDengine, the supertable's tags are used. If not, default tags are added: `point_id` and `point_name`. |
(2) The CSV header must not have duplicate columns.
(3) Columns like `tag::VARCHAR(200)::name` can be configured multiple times, corresponding to multiple tags in TDengine. However, tag names must not be duplicated.
(4) The order of columns in the CSV header does not affect the CSV file validation rules.
(5) Columns not listed in the table can also be configured, such as serial numbers. These columns will be automatically ignored.
3. Row Configuration Rules
Each row in the CSV file configures an OPC data point. The row rules are as follows:
(1) The columns in the row must correspond to the columns in the header:
| No. | Header Column | Value Type | Value Range | Required | Default Value |
| ---- | ----------------------- | ---------- | ------------------------------------------------------------ | -------- | ---------------------------------------- |
| 1 | tag_name | String | A string like `root.parent.temperature`, which must conform to the OPC DA ID specification | Yes | |
| 2 | enable | int | 0: Do not collect data for this point. The subtable corresponding to the data point will be deleted from TDengine before the OPC DataIn task starts. 1: Collect data for this point. The subtable will not be deleted. | No | 1 |
| 3 | stable | String | Any string that conforms to TDengine supertable naming conventions. Special characters such as `.` will be replaced with underscores. If `{type}` is present: - If `type` is non-empty, it will be replaced by the value of `type`. - If `type` is empty, the original type of the collected value will be used. | Yes | |
| 4 | tbname | String | Any string that conforms to TDengine subtable naming conventions. Special characters such as `.` will be replaced with underscores. If `{tag_name}` is present, it will be replaced with the tag_name. | Yes | |
| 5 | value_col | String | A column name that conforms to TDengine naming conventions | No | val |
| 6 | value_transform | String | A calculation expression that conforms to the Rhai engine, such as `(val + 10) / 1000 * 2.0` or `log(val) + 10` | No | None |
| 7 | type | String | Supported types include: `b/bool/i8/tinyint/i16/smallint/i32/int/i64/bigint/u8/tinyint unsigned/u16/smallint unsigned/u32/int unsigned/u64/bigint unsigned/f32/float/f64/double/timestamp/timestamp(ms)/timestamp(us)/timestamp(ns)/json` | No | The original type of the collected value |
| 8 | quality_col | String | A column name that conforms to TDengine naming conventions | No | None |
| 9 | ts_col | String | A column name that conforms to TDengine naming conventions | No | ts |
| 10 | received_ts_col | String | A column name that conforms to TDengine naming conventions | No | rts |
| 11 | ts_transform | String | Supports +, -, *, /, % operators, e.g., `ts / 1000 * 1000` to set the last 3 digits of a timestamp in ms to 0; `ts + 8 * 3600 * 1000` to add 8 hours to a timestamp in ms precision; `ts - 8 * 3600 * 1000` to subtract 8 hours from a timestamp in ms precision. | No | None |
| 12 | received_ts_transform | String | None | No | None |
| 13 | tag::VARCHAR(200)::name | String | The value in the tag. When the tag type is VARCHAR, it can be in Chinese | No | NULL |
(2) tag_name must be unique across the entire DataIn task. In an OPC DataIn task, a data point can only be written to one subtable in TDengine. If you need to write a data point to multiple subtables, you must create multiple OPC DataIn tasks.
(3) If tag_name is different but tbname is the same, value_col must be different. This configuration allows data from multiple data points of different data types to be written to different columns in the same subtable. This approach corresponds to the use case of "wide tables for OPC data ingestion into TDengine."
4. Other Rules
(1) If the number of columns in the header and the row is inconsistent, validation fails, and the user is prompted with the row number that does not meet the requirements.
(2) The header must be in the first row and cannot be empty.
(3) At least one data point is required.
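Similarly, here is a minimal sketch of a data points CSV for an OPC DA task. The tag names, supertable and subtable names, and tag values below are hypothetical and only illustrate the format described above.

```csv
tag_name,stable,tbname,enable,value_col,type,quality_col,ts_col,tag::VARCHAR(64)::location
root.device1.temperature,opcda_float,da_{tag_name},1,val,f32,quality,ts,workshop_a
root.device1.pressure,opcda_float,da_{tag_name},1,val,f32,quality,ts,workshop_a
```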
#### 4.2. Select Data Points
You can filter data points by configuring **Root Node ID** and **Regular Expression Match** as filtering conditions.
Specify the supertable and subtable where the data will be written by configuring **Supertable Name** and **Table Name**.
Configure the **Primary Key Column**: select `origin_ts` to use the original timestamp of the OPC data point as the primary key in TDengine, or select `received_ts` to use the received timestamp as the primary key in TDengine. You can also configure the **Primary Key Alias** to specify the name of the timestamp column in TDengine.
<figure>
<Image img={imgStep4} alt=""/>
</figure>
### 5. Collection Configuration
In the collection configuration, configure the collection interval, connection timeout, and collection timeout options for the current task.
<figure>
<Image img={imgStep5} alt=""/>
</figure>
As shown in the image above:
- **Connection Timeout**: Configure the timeout for connecting to the OPC server. The default is 10 seconds.
- **Collection Timeout**: If no data is returned from the OPC server within the specified time when reading data points, the read operation will fail. The default is 10 seconds.
- **Collection Interval**: The default is 10 seconds. The collection interval is the time between the end of one data collection and the start of the next; each time it elapses, the latest value of the data point is polled and written to TDengine.
When **Data Points Set** is configured using the **Select Data Points** method, you can configure **Data Point Update Mode** and **Data Point Update Interval** in the collection configuration to enable dynamic data point updates. **Dynamic Data Point Update** means that during the task's execution, if the OPC Server adds or deletes data points, data points that meet the criteria will be automatically added to the current task without restarting the OPC task.
- Data Point Update Mode: You can choose `None`, `Append`, or `Update`.
- None: Dynamic data point updates are not enabled.
- Append: Dynamic data point updates are enabled, but only new data points are added.
- Update: Dynamic data point updates are enabled, and data points can be added or removed.
- Data Point Update Interval: This is effective when the **Data Point Update Mode** is set to `Append` or `Update`. The unit is seconds, the default is 600, the minimum value is 60, and the maximum value is 2147483647.
### 6. Advanced Options
<figure>
<Image img={imgStep6} alt=""/>
</figure>
As shown in the image above, advanced options can be configured to further optimize performance, logging, and more.
The **Log Level** is set to `info` by default. The available options are `error`, `warn`, `info`, `debug`, and `trace`.
The **Max Write Concurrency** option sets the maximum concurrency limit for writing to taosX. The default value is 0, which means the concurrency is determined automatically.
The **Batch Size** option sets the batch size for each write operation, that is, the maximum number of messages sent at once.
The **Batch Delay** option sets the maximum delay for a single send operation (in seconds). When this delay expires, any buffered data is sent immediately, even if the **Batch Size** has not been reached.
In the **Save Raw Data** option, choose whether to save the raw data. The default is No.
When saving raw data, the following two parameters become effective.
The **Max Retention Days** option sets the maximum retention days for raw data.
The **Raw Data Storage Directory** option sets the path for storing raw data. If an agent is used, the storage path refers to the path on the agent's server; otherwise, it is the path on the taosX server. The path can use the `$DATA_DIR` placeholder and `:id` as part of the path.
- On Linux platforms, `$DATA_DIR` is `/var/lib/taos/taosx`. By default, the storage path is `/var/lib/taos/taosx/tasks/<task_id>/rawdata`.
- On Windows platforms, `$DATA_DIR` is `C:\TDengine\data\taosx`. By default, the storage path is `C:\TDengine\data\taosx\tasks\<task_id>\rawdata`.
### 7. Task Completion
Click the **Submit** button to complete the OPC DA to TDengine data synchronization task. Return to the **Data Sources List** page to view the task's execution status.
## Adding Data Points
While the task is running, click **Edit**, then click the **Add Data Points** button to append data points to the CSV file.
In the pop-up form, fill in the data point information.
Click the **Confirm** button to complete the addition of data points.

View File

@ -0,0 +1,203 @@
---
title: MQTT
slug: /advanced-features/data-connectors/mqtt
---
import Image from '@theme/IdealImage';
import imgStep01 from '../../assets/mqtt-01.png';
import imgStep02 from '../../assets/mqtt-02.png';
import imgStep03 from '../../assets/mqtt-03.png';
import imgStep04 from '../../assets/mqtt-04.png';
import imgStep05 from '../../assets/mqtt-05.png';
import imgStep06 from '../../assets/mqtt-06.png';
import imgStep07 from '../../assets/mqtt-07.png';
import imgStep08 from '../../assets/mqtt-08.png';
import imgStep09 from '../../assets/mqtt-09.png';
import imgStep10 from '../../assets/mqtt-10.png';
import imgStep11 from '../../assets/mqtt-11.png';
import imgStep12 from '../../assets/mqtt-12.png';
import imgStep13 from '../../assets/mqtt-13.png';
import imgStep14 from '../../assets/mqtt-14.png';
This section explains how to create a data migration task through the Explorer interface to migrate data from MQTT to the current TDengine cluster.
## Overview
MQTT stands for Message Queuing Telemetry Transport. It is a lightweight messaging protocol, easy to implement and use.
TDengine can use the MQTT connector to subscribe to data from an MQTT broker and write it to TDengine to enable real-time data ingestion.
## Creating a Task
### 1. Add Data Source
On the Data Ingestion page, click the **+Add Data Source** button to go to the Add Data Source page.
<figure>
<Image img={imgStep01} alt=""/>
</figure>
### 2. Configure Basic Information
Enter the task name in the **Name** field, such as "test_mqtt".
Select **MQTT** from the **Type** dropdown list.
The agent is optional. If needed, you can select an agent from the dropdown list, or click the **+Create New Agent** button.
Select a target database from the **Target Database** dropdown list, or click the **+Create Database** button on the right.
<figure>
<Image img={imgStep02} alt=""/>
</figure>
### 3. Configure Connection and Authentication Information
In the **MQTT Address** field, enter the address of the MQTT broker, for example: `192.168.1.42`.
In the **MQTT Port** field, enter the port of the MQTT broker, for example: `1883`.
In the **User** field, enter the username for the MQTT broker.
In the **Password** field, enter the password for the MQTT broker.
<figure>
<Image img={imgStep03} alt=""/>
</figure>
### 4. Configure SSL Certificates
If the MQTT broker uses SSL certificates, upload the certificate file in the **SSL Certificate** field.
<figure>
<Image img={imgStep04} alt=""/>
</figure>
### 5. Configure Collection Information
In the **Collection Configuration** section, enter the relevant parameters for the collection task.
Select the MQTT protocol version from the **MQTT Protocol** dropdown list. There are three options: `3.1`, `3.1.1`, and `5.0`. The default is 3.1.
In the **Client ID** field, enter the client identifier. This will generate a client ID with the `taosx` prefix (for example, if you enter "foo", the generated client ID will be `taosxfoo`). If the switch at the end is enabled, the task ID will be appended after `taosx` before the entered identifier (the generated client ID will be like `taosx100foo`). All client IDs connected to the same MQTT address must be unique.
In the **Keep Alive** field, enter the keep-alive interval. This is the time negotiated between the client and the broker for detecting whether the client is active: if the broker receives no messages from the client within this interval, it assumes the client has disconnected and closes the connection.
In the **Clean Session** field, choose whether to clean the session. The default value is true.
In the **Subscription Topics and QoS Configuration** field, enter the Topic names to consume, using the following format: `topic1::0,topic2::1`.
Click the **Connectivity Check** button to test if the data source is available.
<figure>
<Image img={imgStep05} alt=""/>
</figure>
### 6. Configure MQTT Payload Parsing
In the **MQTT Payload Parsing** section, enter the configuration parameters related to parsing the Payload.
taosX can use a JSON extractor to parse the data and allows users to specify the data model in the database, including specifying table names, supertable names, setting regular columns, and tag columns.
#### 6.1 Parsing
There are three ways to obtain sample data:
Click the **Retrieve from Server** button to get sample data from MQTT.
Click the **File Upload** button to upload a CSV file and get sample data.
Enter sample data from the MQTT message body in the **Message Body** field.
JSON data can be a `JSONObject` or a `JSONArray`. The following data can be parsed by the JSON parser:
```json
{"id": 1, "message": "hello-world"}
{"id": 2, "message": "hello-world"}
```
or
```json
[{"id": 1, "message": "hello-world"},{"id": 2, "message": "hello-world"}]
```
The parsing result is shown below:
<figure>
<Image img={imgStep06} alt=""/>
</figure>
Click the **Magnifier Icon** to preview the parsing result.
<figure>
<Image img={imgStep07} alt=""/>
</figure>
#### 6.2 Field Splitting
In the **Extract or Split from Column** section, enter the fields to extract or split from the message body. For example, to split the `message` field into `message_0` and `message_1`, select the `split` extractor, enter `-` as the separator, and `2` as the number.
<figure>
<Image img={imgStep08} alt=""/>
</figure>
Click **Delete** to remove the current extraction rule.
Click **Add more** to add more extraction rules.
Click the **Magnifier Icon** to preview the extraction/split results.
<figure>
<Image img={imgStep09} alt=""/>
</figure>
#### 6.3 Data Filtering
In the **Filter** section, enter filtering conditions. For example, entering `id != 1` will filter out data where the `id` is equal to `1`, and only data with `id` not equal to 1 will be written to TDengine.
<figure>
<Image img={imgStep10} alt=""/>
</figure>
Click **Delete** to remove the current filter rule.
Click the **Magnifier Icon** to preview the filtering results.
<figure>
<Image img={imgStep11} alt=""/>
</figure>
#### 6.4 Table Mapping
In the **Target Supertable** dropdown list, select a target supertable, or click the **Create Supertable** button to create a new one.
In the **Mapping** section, enter the mapping rule for the target table's name, for example `t_{id}`. Fill in the mapping rules according to your needs; mappings support setting default values.
<figure>
<Image img={imgStep12} alt=""/>
</figure>
Click **Preview** to view the mapping results.
<figure>
<Image img={imgStep13} alt=""/>
</figure>
### 7. Advanced Options
In the **Log Level** dropdown list, select the log level. There are five options: `TRACE`, `DEBUG`, `INFO`, `WARN`, and `ERROR`. The default is `INFO`.
When saving raw data, the following two parameters are enabled:
**Max Retention Days:** Set the maximum number of days to retain raw data.
**Raw Data Storage Directory:** Set the path for storing raw data. If an agent is used, this path refers to the server where the agent is located; otherwise, it refers to the server where taosX is running. The path can use the `$DATA_DIR` placeholder and `:id` as part of the path.
<figure>
<Image img={imgStep14} alt=""/>
</figure>
### 8. Completion
Click the **Submit** button to complete the creation of the MQTT to TDengine data synchronization task. Go back to the **Data Source List** page to monitor the task's execution status.

View File

@ -0,0 +1,266 @@
---
title: Apache Kafka
sidebar_label: Kafka
slug: /advanced-features/data-connectors/kafka
---
import Image from '@theme/IdealImage';
import imgStep01 from '../../assets/kafka-01.png';
import imgStep02 from '../../assets/kafka-02.png';
import imgStep03 from '../../assets/kafka-03.png';
import imgStep04 from '../../assets/kafka-04.png';
import imgStep05 from '../../assets/kafka-05.png';
import imgStep06 from '../../assets/kafka-06.png';
import imgStep07 from '../../assets/kafka-07.png';
import imgStep08 from '../../assets/kafka-08.png';
import imgStep09 from '../../assets/kafka-09.png';
import imgStep10 from '../../assets/kafka-10.png';
import imgStep11 from '../../assets/kafka-11.png';
import imgStep12 from '../../assets/kafka-12.png';
import imgStep13 from '../../assets/kafka-13.png';
import imgStep14 from '../../assets/kafka-14.png';
import imgStep15 from '../../assets/kafka-15.png';
import imgStep16 from '../../assets/kafka-16.png';
import imgStep17 from '../../assets/kafka-17.png';
import imgStep18 from '../../assets/kafka-18.png';
This section explains how to create a data migration task through the Explorer interface to migrate data from Kafka to the current TDengine cluster.
## Overview
Apache Kafka is an open-source distributed streaming platform for stream processing, real-time data pipelines, and large-scale data integration.
TDengine can efficiently read data from Kafka and write it into TDengine, enabling historical data migration or real-time data ingestion.
## Creating a Task
### 1. Add Data Source
On the Data Ingestion page, click the **+Add Data Source** button to go to the Add Data Source page.
<figure>
<Image img={imgStep01} alt=""/>
</figure>
### 2. Configure Basic Information
Enter the task name in the **Name** field, such as "test_kafka".
Select **Kafka** from the **Type** dropdown list.
The agent is optional. If needed, you can select an agent from the dropdown list, or click the **+Create New Agent** button.
Select a target database from the **Target Database** dropdown list, or click the **+Create Database** button on the right.
<figure>
<Image img={imgStep02} alt=""/>
</figure>
### 3. Configure Connection Information
Enter **bootstrap-server**, for example: `192.168.1.92`.
Enter **Port**, for example: `9092`.
For multiple broker addresses, add more pairs of bootstrap-server and ports by clicking the **Add Broker** button at the bottom right of the connection configuration section.
<figure>
<Image img={imgStep03} alt=""/>
</figure>
### 4. Configure SASL Authentication Mechanism
If the server has SASL authentication enabled, configure SASL and select the appropriate authentication mechanism. Currently, PLAIN/SCRAM-SHA-256/GSSAPI are supported.
#### 4.1. PLAIN Authentication
Select the `PLAIN` authentication mechanism and enter the username and password:
<figure>
<Image img={imgStep04} alt=""/>
</figure>
#### 4.2. SCRAM (SCRAM-SHA-256) Authentication
Select the `SCRAM-SHA-256` authentication mechanism and enter the username and password:
<figure>
<Image img={imgStep05} alt=""/>
</figure>
#### 4.3. GSSAPI Authentication
Select `GSSAPI`, which uses the [RDkafka client](https://github.com/confluentinc/librdkafka) to call GSSAPI for Kerberos authentication:
<figure>
<Image img={imgStep06} alt=""/>
</figure>
You will need to provide:
- Kerberos service name, typically `kafka`.
- Kerberos principal (authentication username), such as `kafkaclient`.
- Kerberos initialization command (optional).
- Kerberos keytab, a file that you must upload.
These details must be provided by the Kafka administrator.
You must also set up [Kerberos](https://web.mit.edu/kerberos/) authentication on your server. Install it using the following commands:
- On Ubuntu: `apt install krb5-user`
- On CentOS: `yum install krb5-workstation`
After configuring, you can use the [kcat](https://github.com/edenhill/kcat) tool to validate the Kafka topic consumption:
```bash
kcat <topic> \
-b <kafka-server:port> \
-G kcat \
-X security.protocol=SASL_PLAINTEXT \
-X sasl.mechanism=GSSAPI \
-X sasl.kerberos.keytab=</path/to/kafkaclient.keytab> \
-X sasl.kerberos.principal=<kafkaclient> \
-X sasl.kerberos.service.name=kafka
```
If you get the error: "Server xxxx not found in kerberos database," ensure that the domain name for the Kafka node is configured properly and set `rdns = true` in the Kerberos client configuration file (`/etc/krb5.conf`).
### 5. Configure SSL Certificate
If SSL encryption authentication is enabled on the server, enable SSL here and configure the relevant details.
<figure>
<Image img={imgStep07} alt=""/>
</figure>
### 6. Configure Collection Information
In the **Collection Configuration** section, fill in the relevant parameters for the collection task.
Enter the **Timeout**. If Kafka does not provide any data within the timeout period, the data collection task will exit. The default is 0 ms, meaning it will wait indefinitely until data is available or an error occurs.
Enter the **Topic** name to consume. Multiple topics can be configured, separated by commas (e.g., `tp1,tp2`).
In the **Client ID** field, enter the client identifier. This will generate a client ID with the `taosx` prefix (e.g., if you enter "foo", the client ID will be `taosxfoo`). If you enable the switch at the end, the task ID will be appended to `taosx` before the entered identifier (e.g., `taosx100foo`). You should note that when using multiple taosX instances to subscribe to the same Topic for load balancing, you must provide a consistent Client ID to achieve the balancing effect.
In the **Consumer Group ID** field, enter the consumer group identifier. This will generate a consumer group ID with the `taosx` prefix (e.g., if you enter "foo", the consumer group ID will be `taosxfoo`). If you enable the switch at the end, the task ID will be appended to `taosx` before the entered identifier (e.g., `taosx100foo`).
From the **Offset** dropdown list, choose the offset from which to start consuming data. There are three options: `Earliest`, `Latest`, and `ByTime(ms)`. The default is `Earliest`.
- Earliest: Requests the earliest offset.
- Latest: Requests the latest offset.
- ByTime(ms): Requests the offset corresponding to the specified timestamp, in milliseconds.
In the **Max Duration for Fetching Data** field, set the maximum time to wait for data when the available data is insufficient (in milliseconds). The default is 100 ms.
Click the **Check Connectivity** button to check if the data source is available.
<figure>
<Image img={imgStep08} alt=""/>
</figure>
### 7. Configure Payload Parsing
In the **Payload Parsing** section, fill in the relevant parameters for payload parsing.
#### 7.1 Parsing
There are three ways to obtain sample data:
Click the **Fetch from Server** button to retrieve sample data from Kafka.
Click the **Upload File** button to upload a CSV file to obtain sample data.
In the **Message Body** field, enter a sample of the Kafka message body.
JSON data supports both `JSONObject` and `JSONArray`. Use the JSON parser to parse the following data:
```json
{"id": 1, "message": "hello-world"}
{"id": 2, "message": "hello-world"}
```
or
```json
[{"id": 1, "message": "hello-world"},{"id": 2, "message": "hello-world"}]
```
The parsed result is as follows:
<figure>
<Image img={imgStep09} alt=""/>
</figure>
Click the **Magnifying Glass Icon** to preview the parsed result.
<figure>
<Image img={imgStep10} alt=""/>
</figure>
#### 7.2 Field Splitting
In the **Extract or Split from Column** field, enter the fields to be extracted or split from the message body. For example, to split the `message` field into `message_0` and `message_1`, select the `split` extractor, set `separator` to `-`, and set `number` to `2`.
Click **Add** to add more extraction rules.
Click **Delete** to remove the current extraction rule.
<figure>
<Image img={imgStep11} alt=""/>
</figure>
Click the **Magnifying Glass Icon** to preview the extracted/split result.
<figure>
<Image img={imgStep12} alt=""/>
</figure>
#### 7.3 Data Filtering
In the **Filter** section, enter filtering conditions. For example, if you enter `id != 1`, only data where `id` is not equal to `1` will be written into TDengine.
Click **Add** to add more filtering rules.
Click **Delete** to remove the current filtering rule.
<figure>
<Image img={imgStep13} alt=""/>
</figure>
Click the **Magnifying Glass Icon** to preview the filtered result.
<figure>
<Image img={imgStep14} alt=""/>
</figure>
#### 7.4 Table Mapping
From the **Target Supertable** dropdown list, select a target supertable, or click the **Create Supertable** button on the right.
In the **Mapping** field, enter the name of the subtable in the target supertable, such as `t_{id}`. Fill in the mapping rules as needed; mappings support setting default values.
<figure>
<Image img={imgStep15} alt=""/>
</figure>
Click **Preview** to view the mapping result.
<figure>
<Image img={imgStep16} alt=""/>
</figure>
### 8. Configure Advanced Options
The **Advanced Options** section is collapsed by default. Click the `>` on the right to expand it, as shown below:
<figure>
<Image img={imgStep17} alt=""/>
</figure>
<figure>
<Image img={imgStep18} alt=""/>
</figure>
### 9. Complete the Creation
Click the **Submit** button to complete the Kafka to TDengine data synchronization task. Go back to the **Data Sources** page to view the task's status.

View File

@ -0,0 +1,119 @@
---
title: InfluxDB
slug: /advanced-features/data-connectors/influxdb
---
import Image from '@theme/IdealImage';
import imgStep01 from '../../assets/influxdb-01.png';
import imgStep02 from '../../assets/influxdb-02.png';
import imgStep03 from '../../assets/influxdb-03.png';
import imgStep04 from '../../assets/influxdb-04.png';
import imgStep05 from '../../assets/influxdb-05.png';
import imgStep06 from '../../assets/influxdb-06.png';
import imgStep07 from '../../assets/influxdb-07.png';
import imgStep08 from '../../assets/influxdb-08.png';
import imgStep09 from '../../assets/influxdb-09.png';
import imgStep10 from '../../assets/influxdb-10.png';
This section explains how to create a data migration task through the Explorer interface to migrate data from InfluxDB to the current TDengine cluster.
## Function Overview
InfluxDB is a popular open-source time series database optimized for handling large amounts of time series data. TDengine can efficiently read data from InfluxDB via the InfluxDB connector and write it into TDengine to achieve historical data migration or real-time data synchronization.
During task execution, progress information is saved to disk, so if the task is paused and restarted, or it recovers automatically from an error, the task will not start from the beginning. More options can be found by reading the descriptions of each form field on the task creation page.
## Creating a Task
### 1. Add a Data Source
Click the **+Add Data Source** button in the top left of the data writing page to enter the Add Data Source page, as shown below:
<figure>
<Image img={imgStep01} alt=""/>
</figure>
### 2. Configure Basic Information
In the **Name** field, enter a task name, such as *`test_influxdb_01`*.
Select *`InfluxDB`* from the **Type** dropdown box, as shown below (the fields on the page will change after selection).
The **Agent** field is optional. If needed, you can select a specified agent from the dropdown box, or click the **+Create New Agent** button on the right to create a new agent.
The **Target Database** is required. Since InfluxDB stores data in various time precisions such as seconds, milliseconds, microseconds, and nanoseconds, you need to select a *`nanosecond-precision database`*. You can also click the **+Create Database** button on the right to create a new database.
<figure>
<Image img={imgStep02} alt=""/>
</figure>
### 3. Configure Connection Information
In the **Connection Configuration** area, fill in the *`connection information of the source InfluxDB database`*, as shown below:
<figure>
<Image img={imgStep03} alt=""/>
</figure>
### 4. Configure Authentication Information
In the **Authentication** section, there are two tabs, *`1.x version`* and *`2.x version`*, as different versions of InfluxDB require different authentication parameters, and the APIs differ significantly. Please select based on your actual situation:
*`1.x version`*
**Version**: Select the version of the source InfluxDB database from the dropdown.
**User**: Enter the user for the source InfluxDB database, and the user must have read access in the organization.
**Password**: Enter the password for the above user in the source InfluxDB database.
<figure>
<Image img={imgStep04} alt=""/>
</figure>
*`2.x version`*
**Version**: Select the version of the source InfluxDB database from the dropdown.
**Organization ID**: Enter the organization ID of the source InfluxDB database, which is a string composed of hexadecimal characters, not the organization name. This can be obtained from the Organization->About page of the InfluxDB console.
**Token**: Enter the access token for the source InfluxDB database, which must have read access in the organization.
**Add Database Retention Policy**: This is a *`Yes/No`* toggle item. InfluxQL requires a combination of the database and retention policy (DBRP) to query data. Some 2.x versions and the InfluxDB Cloud version require manually adding this mapping. Turning on this switch allows the connector to automatically add this during task execution.
<figure>
<Image img={imgStep05} alt=""/>
</figure>
Below the **Authentication** area, there is a **Connectivity Check** button. Users can click this button to check whether the information entered above can correctly retrieve data from the source InfluxDB database. The check results are shown below:
**Failure**
<figure>
<Image img={imgStep06} alt=""/>
</figure>
**Success**
<figure>
<Image img={imgStep07} alt=""/>
</figure>
### 5. Configure Task Information
**Bucket**: In InfluxDB, a bucket is a namespace for storing data. Each task must specify a bucket. Users need to click the **Get Schema** button on the right to fetch the data structure information of the current source InfluxDB database and then select from the dropdown box, as shown below:
<figure>
<Image img={imgStep08} alt=""/>
</figure>
**Measurements**: This is optional. Users can select one or more measurements to synchronize. If not specified, all will be synchronized.
**Start Time**: This refers to the start time of the data in the source InfluxDB database. The time zone of the start time uses the time zone selected in the explorer. This field is required.
**End Time**: This refers to the end time of the data in the source InfluxDB database. If the end time is not specified, the synchronization of the latest data will continue; if the end time is specified, synchronization will only occur up to that point. The time zone of the end time uses the time zone selected in the explorer. This field is optional.
**Time Range per Read (minutes)**: This defines the maximum time range for a single read from the source InfluxDB database. This is an important parameter that users need to decide based on server performance and data storage density. If the range is too small, the synchronization task will execute slowly. If the range is too large, it may cause system failures in the InfluxDB database due to high memory usage.
**Delay (seconds)**: This is an integer between 1 and 30. To eliminate the impact of out-of-order data, TDengine always waits for the time specified here before reading the data.
### 6. Configure Advanced Options
The **Advanced Options** section is collapsed by default. Click the `>` on the right to expand it, as shown below:
<figure>
<Image img={imgStep09} alt=""/>
</figure>
<figure>
<Image img={imgStep10} alt=""/>
</figure>
### 7. Completion
Click the **Submit** button to complete the creation of the InfluxDB to TDengine data synchronization task. Go back to the **Data Sources List** page to view the execution status of the task.

---
title: OpenTSDB
slug: /advanced-features/data-connectors/opentsdb
---
import Image from '@theme/IdealImage';
import imgStep01 from '../../assets/opentsdb-01.png';
import imgStep02 from '../../assets/opentsdb-02.png';
import imgStep03 from '../../assets/opentsdb-03.png';
import imgStep04 from '../../assets/opentsdb-04.png';
import imgStep05 from '../../assets/opentsdb-05.png';
import imgStep06 from '../../assets/opentsdb-06.png';
import imgStep07 from '../../assets/opentsdb-07.png';
import imgStep08 from '../../assets/opentsdb-08.png';
This section explains how to create a data migration task through the Explorer interface to migrate data from OpenTSDB to the current TDengine cluster.
## Function Overview
OpenTSDB is a real-time monitoring information collection and display platform built on top of the HBase system. TDengine can efficiently read data from OpenTSDB via the OpenTSDB connector and write it into TDengine to achieve historical data migration or real-time data synchronization.
During task execution, progress information is saved to disk, so if the task is paused and restarted, or it recovers automatically from an error, the task will not start from the beginning. More options can be found by reading the descriptions of each form field on the task creation page.
## Creating a Task
### 1. Add a Data Source
Click the **+Add Data Source** button in the top left of the data writing page to enter the Add Data Source page, as shown below:
<figure>
<Image img={imgStep01} alt=""/>
</figure>
### 2. Configure Basic Information
In the **Name** field, enter a task name, such as *`test_opentsdb_01`*.
Select *`OpenTSDB`* from the **Type** dropdown box, as shown below (the fields on the page will change after selection).
The **Agent** field is optional. If needed, you can select a specified agent from the dropdown box, or click the **+Create New Agent** button on the right to create a new agent.
The **Target Database** is required. Since OpenTSDB stores data with a time precision of milliseconds, you need to select a *`millisecond-precision database`*. You can also click the **+Create Database** button on the right to create a new database.
<figure>
<Image img={imgStep02} alt=""/>
</figure>
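If you prefer to create the target database yourself instead of using the **+Create Database** button, a minimal sketch, assuming the database name `opentsdb_data`, is:
```sql
-- Millisecond precision matches the time precision of OpenTSDB data.
CREATE DATABASE IF NOT EXISTS opentsdb_data PRECISION 'ms';
```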
### 3. Configure Connection Information
In the **Connection Configuration** area, fill in the *`connection information of the source OpenTSDB database`*, as shown below:
<figure>
<Image img={imgStep03} alt=""/>
</figure>
Below the **Connection Configuration** area, there is a **Connectivity Check** button. Users can click this button to check whether the information entered above can correctly retrieve data from the source OpenTSDB database. The check results are shown below:
**Failure**
<figure>
<Image img={imgStep04} alt=""/>
</figure>
**Success**
<figure>
<Image img={imgStep05} alt=""/>
</figure>
### 4. Configure Task Information
**Metrics**: These are the physical quantities stored in the OpenTSDB database. Users can specify multiple metrics to synchronize; if not specified, all data in the database will be synchronized. If users specify metrics, they need to click the **Get Metrics** button on the right to fetch all metric information from the current source OpenTSDB database and then select from the dropdown box, as shown below:
<figure>
<Image img={imgStep06} alt=""/>
</figure>
**Start Time**: This refers to the start time of the data in the source OpenTSDB database. The time zone of the start time uses the time zone selected in the explorer. This field is required.
**End Time**: This refers to the end time of the data in the source OpenTSDB database. If the end time is not specified, synchronization of the latest data will continue; if the end time is specified, synchronization will only occur up to that point. The time zone of the end time uses the time zone selected in the explorer. This field is optional.
**Time Range per Read (minutes)**: This defines the maximum time range for a single read from the source OpenTSDB database. This is an important parameter that users need to decide based on server performance and data storage density. If the range is too small, the synchronization task will execute slowly. If the range is too large, it may cause system failures in the OpenTSDB database due to high memory usage.
**Delay (seconds)**: This is an integer between 1 and 30. To eliminate the impact of out-of-order data, TDengine always waits for the time specified here before reading the data.
### 5. Configure Advanced Options
The **Advanced Options** section is collapsed by default. Click the `>` on the right to expand it, as shown below:
<figure>
<Image img={imgStep07} alt=""/>
</figure>
<figure>
<Image img={imgStep08} alt=""/>
</figure>
### 6. Completion
Click the **Submit** button to complete the creation of the OpenTSDB to TDengine data synchronization task. Go back to the **Data Sources List** page to view the execution status of the task.

---
title: CSV File
slug: /advanced-features/data-connectors/csv-file
---
import Image from '@theme/IdealImage';
import imgStep01 from '../../assets/csv-file-01.png';
import imgStep02 from '../../assets/csv-file-02.png';
import imgStep03 from '../../assets/csv-file-03.png';
import imgStep04 from '../../assets/csv-file-04.png';
import imgStep05 from '../../assets/csv-file-05.png';
import imgStep06 from '../../assets/csv-file-06.png';
import imgStep07 from '../../assets/csv-file-07.png';
import imgStep10 from '../../assets/csv-file-10.png';
import imgStep11 from '../../assets/csv-file-11.png';
This section explains how to create a data migration task through the Explorer interface to migrate data from CSV to the current TDengine cluster.
## Function Overview
Import one or more CSV files into TDengine.
## Creating a Task
### 1. Add a Data Source
Click the **+Add Data Source** button on the data writing page to enter the Add Data Source page.
<figure>
<Image img={imgStep01} alt=""/>
</figure>
### 2. Configure Basic Information
In the **Name** field, enter a task name, such as: "test_csv";
Select **CSV** from the **Type** dropdown list.
In the **Target Database** dropdown list, select a target database, or click the **+Create Database** button on the right.
<figure>
<Image img={imgStep02} alt=""/>
</figure>
### 3. Configure CSV Options
In the **Contains Header** section, toggle to enable or disable; if it contains a header, the first row will be treated as column information.
In the **Ignore First N Rows** section, enter N to ignore the first N rows of the CSV file.
In the **Field Separator** section, select the separator between CSV fields; the default is ",".
In the **Field Quotation Character** section, select the character used to surround field content when a CSV field contains separators or newline characters, so that the entire field is correctly identified; the default is `"`.
In the **Comment Prefix Character** section, select the character; if any line in the CSV file begins with this character, that line will be ignored; the default is "#".
<figure>
<Image img={imgStep03} alt=""/>
</figure>
### 4. Configure CSV File Parsing
Upload the CSV file locally, for example: test-json.csv; this sample CSV file will then be used to configure extraction and filtering conditions.
#### 4.1 Parsing
After clicking **Select File**, choose test-json.csv, then click **Parse** to preview the identified columns.
<figure>
<Image img={imgStep04} alt=""/>
</figure>
**Preview Parsing Results**
<figure>
<Image img={imgStep05} alt=""/>
</figure>
#### 4.2 Field Splitting
In the **Extract or Split from Columns** section, enter the fields to extract or split from the message body. For example, split the message field into `text_0` and `text_1` using the split extractor; enter `-` as the separator and `2` for the number.
Click **Delete** to remove the current extraction rule.
Click **Add** to add more extraction rules.
<figure>
<Image img={imgStep06} alt=""/>
</figure>
Click the **Magnifying Glass Icon** to preview the extraction or splitting results.
<figure>
<Image img={imgStep07} alt=""/>
</figure>
<!-- In the **Filtering** section, enter filtering conditions, such as: `id != 1`, so that only data where id is not 1 will be written to TDengine.
Click **Delete** to remove the current filtering rule.
![csv-08.png](../../assets/csv-file-08.png)
Click the **Magnifying Glass Icon** to preview the filtering results.
![csv-09.png](../../assets/csv-file-09.png) -->
#### 4.3 Table Mapping
In the **Target Supertable** dropdown list, select a target supertable, or click the **Create Supertable** button on the right.
In the **Mapping** section, fill in the subtable name in the target supertable, for example: `t_${groupid}`.
Click **Preview** to see the mapping results.
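For reference, a target supertable matching the split columns above and the `t_${groupid}` subtable name could be sketched as follows; the supertable name, column types, and the `groupid` tag are assumptions that should be adapted to your own CSV columns:
```sql
-- Hypothetical supertable for the parsed CSV rows: `ts` holds the timestamp,
-- `text_0` and `text_1` hold the two parts split from the message field,
-- and the `groupid` tag is used in the subtable name t_${groupid}.
CREATE STABLE IF NOT EXISTS test_csv (
  ts     TIMESTAMP,
  text_0 NCHAR(64),
  text_1 NCHAR(64)
) TAGS (
  groupid INT
);
```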
### 5. Completion
Click the **Submit** button to complete the creation of the CSV to TDengine data synchronization task. Go back to the **Data Sources List** page to view the execution status of the task.

---
title: AVEVA Historian
slug: /advanced-features/data-connectors/aveva-historian
---
import Image from '@theme/IdealImage';
import imgStep01 from '../../assets/aveva-historian-01.png';
import imgStep02 from '../../assets/aveva-historian-02.png';
import imgStep03 from '../../assets/aveva-historian-03.png';
import imgStep04 from '../../assets/aveva-historian-04.png';
import imgStep05 from '../../assets/aveva-historian-05.png';
import imgStep06 from '../../assets/aveva-historian-06.png';
import imgStep07 from '../../assets/aveva-historian-07.png';
import imgStep08 from '../../assets/aveva-historian-08.png';
This section explains how to create data migration/data synchronization tasks through the Explorer interface to migrate/synchronize data from AVEVA Historian to the current TDengine cluster.
## Function Overview
AVEVA Historian is industrial big data analysis software, formerly known as Wonderware Historian. It captures and stores high-fidelity industrial big data, unlocking its potential to improve operations.
TDengine can efficiently read data from AVEVA Historian and write it to TDengine for historical data migration or real-time data synchronization.
## Creating a Task
### 1. Add a Data Source
Click the **+Add Data Source** button on the data writing page to enter the Add Data Source page.
<figure>
<Image img={imgStep01} alt=""/>
</figure>
### 2. Configure Basic Information
In the **Name** field, enter a task name, such as: "test_avevaHistorian";
Select **AVEVA Historian** from the **Type** dropdown list.
The **Agent** field is optional; if needed, you can select a specified agent from the dropdown, or click the **+Create New Agent** button on the right.
In the **Target Database** dropdown list, select a target database, or click the **+Create Database** button on the right.
<figure>
<Image img={imgStep02} alt=""/>
</figure>
### 3. Configure Connection Information
In the **Connection Configuration** area, fill in the **Server Address** and **Server Port**.
In the **Authentication** area, fill in the **Username** and **Password**.
Click the **Connectivity Check** button to check if the data source is available.
<figure>
<Image img={imgStep03} alt=""/>
</figure>
### 4. Configure Data Collection Information
In the **Collection Configuration** area, fill in the parameters related to the collection task.
#### 4.1. Migrate Data
To perform data migration, configure the following parameters:
Select **migrate** from the **Collection Mode** dropdown list.
In the **Tags** field, enter the list of tags to migrate, separated by commas (,).
In the **Tag Group Size** field, specify the size of the tag group.
In the **Task Start Time** field, enter the start time for the data migration task.
In the **Task End Time** field, enter the end time for the data migration task.
In the **Query Time Window** field, specify a time interval; the data migration task will segment the time window according to this interval.
<figure>
<Image img={imgStep04} alt=""/>
</figure>
#### 4.2. Synchronize Data from the History Table
To synchronize data from the **Runtime.dbo.History** table to TDengine, configure the following parameters:
Select **synchronize** from the **Collection Mode** dropdown list.
In the **Table** field, select **Runtime.dbo.History**.
In the **Tags** field, enter the list of tags to migrate, separated by commas (,).
In the **Tag Group Size** field, specify the size of the tag group.
In the **Task Start Time** field, enter the start time for the data migration task.
In the **Query Time Window** field, specify a time interval; the historical data part will segment according to this time interval.
In the **Real-Time Synchronization Interval** field, specify a time interval for polling real-time data.
In the **Out-of-Order Time Limit** field, specify a time interval; data that arrives later than this interval may be lost during real-time synchronization.
<figure>
<Image img={imgStep05} alt=""/>
</figure>
#### 4.3. Synchronize Data from the Live Table
To synchronize data from the **Runtime.dbo.Live** table to TDengine, configure the following parameters:
Select **synchronize** from the **Collection Mode** dropdown list.
In the **Table** field, select **Runtime.dbo.Live**.
In the **Tags** field, enter the list of tags to migrate, separated by commas (,).
In the **Real-Time Synchronization Interval** field, specify a time interval for polling real-time data.
<figure>
<Image img={imgStep06} alt=""/>
</figure>
### 5. Configure Data Mapping
In the **Data Mapping** area, fill in the parameters related to data mapping.
Click the **Retrieve from Server** button to get sample data from the AVEVA Historian server.
In the **Extract or Split from Columns** section, fill in the fields to extract or split from the message body. For example, split the `vValue` field into `vValue_0` and `vValue_1` using the split extractor, specifying `,` as the separator and `2` for the number.
In the **Filtering** section, enter filtering conditions; for example, entering `Value > 0` means that only data where Value is greater than 0 will be written to TDengine.
In the **Mapping** section, select the supertable to map to TDengine, and specify the columns to map to the supertable.
Click **Preview** to view the mapping results.
<figure>
<Image img={imgStep07} alt=""/>
</figure>
### 6. Configure Advanced Options
In the **Advanced Options** area, fill in the parameters related to advanced options.
In the **Maximum Read Concurrency** field, set the maximum read concurrency. The default value is 0 (auto), which configures the concurrency automatically.
In the **Batch Size** field, set the batch size for each write, that is, the maximum number of messages sent at one time.
In the **Save Raw Data** section, choose whether to save the raw data. The default is no.
When saving raw data, the following two parameters take effect.
In the **Maximum Retention Days** field, set the maximum retention days for the raw data.
In the **Raw Data Storage Directory** field, set the path to save the raw data.
<figure>
<Image img={imgStep08} alt=""/>
</figure>
### 7. Completion
Click the **Submit** button to complete the task creation. After submitting the task, return to the **Data Writing** page to check the task status.

---
title: MySQL
slug: /advanced-features/data-connectors/mysql
---
import Image from '@theme/IdealImage';
import imgStep01 from '../../assets/mysql-01.png';
import imgStep02 from '../../assets/mysql-02.png';
import imgStep03 from '../../assets/mysql-03.png';
import imgStep04 from '../../assets/mysql-04.png';
import imgStep05 from '../../assets/mysql-05.png';
import imgStep06 from '../../assets/mysql-06.png';
import imgStep07 from '../../assets/mysql-07.png';
import imgStep08 from '../../assets/mysql-08.png';
This section explains how to create data migration tasks through the Explorer interface to migrate data from MySQL to the current TDengine cluster.
## Function Overview
MySQL is one of the most popular relational databases. Many systems have used or are currently using MySQL databases to store data reported by IoT and industrial internet devices. However, as the number of devices connected to these systems continues to grow and user demands for real-time data feedback increase, MySQL can no longer meet business needs. Starting from TDengine Enterprise Edition 3.3.0.0, TDengine can efficiently read data from MySQL and write it to TDengine for historical data migration or real-time data synchronization, addressing the technical pain points faced by businesses.
## Creating a Task
### 1. Add a Data Source
Click the **+Add Data Source** button in the upper left corner of the data writing page to enter the Add Data Source page, as shown below:
<figure>
<Image img={imgStep01} alt=""/>
</figure>
### 2. Configure Basic Information
In the **Name** field, enter a task name, such as *`test_mysql_01`*.
Select *`MySQL`* from the **Type** dropdown list, as shown below (the fields on the page will change after selection).
The **Agent** field is optional; if needed, you can select a specified agent from the dropdown or click the **+Create New Agent** button on the right to create a new agent.
The **Target Database** field is required; you can first click the **+Create Database** button on the right to create a new database.
<figure>
<Image img={imgStep02} alt=""/>
</figure>
### 3. Configure Connection Information
In the **Connection Configuration** area, fill in the *`source MySQL database connection information`*, as shown below:
<figure>
<Image img={imgStep03} alt=""/>
</figure>
### 4. Configure Authentication Information
In the **User** field, enter the user for the source MySQL database; this user must have read permissions in the organization.
In the **Password** field, enter the login password for the user in the source MySQL database.
<figure>
<Image img={imgStep04} alt=""/>
</figure>
### 5. Configure Connection Options
In the **Character Set** field, set the character set for the connection. The default is utf8mb4, which is supported since MySQL 5.5.3; if you need to connect to an older version, it is recommended to change this to utf8. Optional values include utf8, utf8mb4, utf16, utf32, gbk, big5, latin1, and ascii.
In the **SSL Mode** field, set whether, and with what priority, to negotiate a secure SSL TCP/IP connection with the server. The default value is PREFERRED. Optional values are DISABLED, PREFERRED, and REQUIRED.
<figure>
<Image img={imgStep05} alt=""/>
</figure>
Then click the **Check Connectivity** button; users can click this button to check if the information filled in above can successfully retrieve data from the source MySQL database.
### 6. Configure SQL Query
The **Subtable Fields** setting determines how data is split into subtables. It is a `select distinct` SQL statement that queries unique combinations of the specified fields, which typically correspond to tags in the transform section.
This setting is mainly intended to prevent out-of-order data during migration and must be used together with the **SQL Template**; otherwise, it will not achieve the expected effect. Usage example:
1. Fill in the subtable fields with the statement `select distinct col_name1, col_name2 from table`, indicating that the fields col_name1 and col_name2 from the source table will be used to split the subtables of the target supertable.
2. In the **SQL Template**, add placeholders for the subtable fields, such as `${col_name1} and ${col_name2}` in `select * from table where ts >= ${start} and ts < ${end} and ${col_name1} and ${col_name2}`.
3. In the **transform** section, configure the tag mappings for `col_name1` and `col_name2`.
The **SQL Template** is an SQL statement template used for querying. The SQL statement must include time range conditions, and the start and end times must appear in pairs. The time range defined in the SQL statement template consists of a column representing time from the source database and the placeholders defined below.
SQL uses different placeholders to represent different time format requirements, specifically the following placeholder formats:
1. `${start}` and `${end}`: Represent RFC3339 formatted timestamps, e.g., 2024-03-14T08:00:00+0800
2. `${start_no_tz}` and `${end_no_tz}`: Represent RFC3339 strings without timezone: 2024-03-14T08:00:00
3. `${start_date}` and `${end_date}`: Represent only the date, e.g., 2024-03-14
To avoid out-of-order data during migration, add a sorting condition to the query statement, such as `order by ts asc`.
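Putting these pieces together, a sketch for a hypothetical source table `meters` with a timestamp column `ts` and a subtable field `location` might fill in the two form fields as follows (the names are assumptions to be adapted to your own schema):
```sql
-- Subtable Fields: one distinct combination per target subtable.
SELECT DISTINCT location FROM meters;

-- SQL Template: paired time-range placeholders, the subtable-field
-- placeholder, and a sorting condition to keep the result in time order.
SELECT *
FROM meters
WHERE ts >= ${start_no_tz}
  AND ts < ${end_no_tz}
  AND ${location}
ORDER BY ts ASC
```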
**Start Time** is the starting time for migrating data; this is a required field.
**End Time** is the end time for migrating data and can be left blank. If set, the migration task will complete automatically when it reaches the end time; if left blank, it will continuously synchronize real-time data, and the task will not automatically stop.
**Query Interval** is the time interval for segmenting queries. The default is 1 day. To avoid querying an excessive amount of data, a sub-task for data synchronization will query data by time segments according to the query interval.
**Delay Duration** is an integer between 1 and 30; to avoid losing late-arriving data in real-time synchronization scenarios, each synchronization task reads only data older than this delay duration.
<figure>
<Image img={imgStep06} alt=""/>
</figure>
### 7. Configure Data Mapping
In the **Data Mapping** area, fill in the parameters related to data mapping.
Click the **Retrieve from Server** button to get sample data from the MySQL server.
In the **Extract or Split from Columns** section, fill in the fields to extract or split from the message body. For example, split the `vValue` field into `vValue_0` and `vValue_1` using the split extractor, specifying `,` as the separator and `2` for the number.
In the **Filtering** section, enter filtering conditions; for example, entering `Value > 0` means that only data where Value is greater than 0 will be written to TDengine.
In the **Mapping** section, select the supertable to map to TDengine and specify the columns to map to the supertable.
Click **Preview** to view the mapping results.
<figure>
<Image img={imgStep07} alt=""/>
</figure>
### 8. Configure Advanced Options
The **Advanced Options** area is folded by default; click the `>` button on the right to expand, as shown below:
**Maximum Read Concurrency** limits the number of connections to the data source or the number of reading threads. Modify this parameter when the default parameters do not meet your needs or when you need to adjust resource usage.
**Batch Size** is the maximum number of messages or rows sent at one time. The default is 10,000.
<figure>
<Image img={imgStep08} alt=""/>
</figure>
### 9. Completion
Click the **Submit** button to complete the creation of the data synchronization task from MySQL to TDengine. Return to the **Data Source List** page to view the task execution status.

---
title: PostgreSQL
slug: /advanced-features/data-connectors/postgresql
---
import Image from '@theme/IdealImage';
import imgStep01 from '../../assets/postgresql-01.png';
import imgStep02 from '../../assets/postgresql-02.png';
import imgStep03 from '../../assets/postgresql-03.png';
import imgStep04 from '../../assets/postgresql-04.png';
import imgStep05 from '../../assets/postgresql-05.png';
import imgStep06 from '../../assets/postgresql-06.png';
import imgStep07 from '../../assets/postgresql-07.png';
import imgStep08 from '../../assets/postgresql-08.png';
This section explains how to create data migration tasks through the Explorer interface to migrate data from PostgreSQL to the current TDengine cluster.
## Function Overview
PostgreSQL is a powerful open-source client/server relational database management system that has many features found in large commercial RDBMSs, including transactions, subqueries, triggers, views, foreign key referential integrity, and complex locking capabilities.
TDengine can efficiently read data from PostgreSQL and write it to TDengine for historical data migration or real-time data synchronization.
## Creating a Task
### 1. Add a Data Source
Click the **+Add Data Source** button in the upper left corner of the data writing page to enter the Add Data Source page, as shown below:
<figure>
<Image img={imgStep01} alt=""/>
</figure>
### 2. Configure Basic Information
In the **Name** field, enter a task name, such as *`test_postgres_01`*.
Select *`PostgreSQL`* from the **Type** dropdown list, as shown below (the fields on the page will change after selection).
The **Agent** field is optional; if needed, you can select a specified agent from the dropdown or click the **+Create New Agent** button on the right to create a new agent.
The **Target Database** field is required; you can first click the **+Create Database** button on the right to create a new database.
<figure>
<Image img={imgStep02} alt=""/>
</figure>
### 3. Configure Connection Information
In the **Connection Configuration** area, fill in the *`source PostgreSQL database connection information`*, as shown below:
<figure>
<Image img={imgStep03} alt=""/>
</figure>
### 4. Configure Authentication Information
In the **User** field, enter the user for the source PostgreSQL database; this user must have read permissions in the organization.
In the **Password** field, enter the login password for the user in the source PostgreSQL database.
<figure>
<Image img={imgStep04} alt=""/>
</figure>
### 5. Configure Connection Options
In the **Application Name** field, set the application name to identify the connecting application.
In the **SSL Mode** field, set whether, and with what priority, to negotiate a secure SSL TCP/IP connection with the server. The default value is PREFER. Optional values are DISABLE, ALLOW, PREFER, and REQUIRE.
<figure>
<Image img={imgStep05} alt=""/>
</figure>
Then click the **Check Connectivity** button; users can click this button to check if the information filled in above can successfully retrieve data from the source PostgreSQL database.
### 6. Configure SQL Query
The **Subtable Fields** setting determines how data is split into subtables. It is a `select distinct` SQL statement that queries unique combinations of the specified fields, which typically correspond to tags in the transform section.
This setting is mainly intended to prevent out-of-order data during migration and must be used together with the **SQL Template**; otherwise, it will not achieve the expected effect. Usage example:
1. Fill in the subtable fields with the statement `select distinct col_name1, col_name2 from table`, indicating that the fields col_name1 and col_name2 from the source table will be used to split the subtables of the target supertable.
2. In the **SQL Template**, add placeholders for the subtable fields, such as `${col_name1} and ${col_name2}` in `select * from table where ts >= ${start} and ts < ${end} and ${col_name1} and ${col_name2}`.
3. In the **transform** section, configure the tag mappings for `col_name1` and `col_name2`.
The **SQL Template** is an SQL statement template used for querying. The SQL statement must include time range conditions, and the start and end times must appear in pairs. The time range defined in the SQL statement template consists of a column representing time from the source database and the placeholders defined below.
SQL uses different placeholders to represent different time format requirements, specifically the following placeholder formats:
1. `${start}` and `${end}`: Represent RFC3339 formatted timestamps, e.g., 2024-03-14T08:00:00+0800
2. `${start_no_tz}` and `${end_no_tz}`: Represent RFC3339 strings without timezone: 2024-03-14T08:00:00
3. `${start_date}` and `${end_date}`: Represent only the date, e.g., 2024-03-14
To avoid out-of-order data during migration, add a sorting condition to the query statement, such as `order by ts asc`.
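As an illustration only, for a hypothetical PostgreSQL table `meters` with a `timestamptz` column `ts` and a subtable field `location`, the two form fields could be filled in as follows:
```sql
-- Subtable Fields
SELECT DISTINCT location FROM meters;

-- SQL Template: the RFC3339 placeholders suit a timezone-aware column,
-- and ORDER BY keeps the migrated data in time order.
SELECT *
FROM meters
WHERE ts >= ${start}
  AND ts < ${end}
  AND ${location}
ORDER BY ts ASC
```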
**Start Time** is the starting time for migrating data; this is a required field.
**End Time** is the end time for migrating data and can be left blank. If set, the migration task will complete automatically when it reaches the end time; if left blank, it will continuously synchronize real-time data, and the task will not automatically stop.
**Query Interval** is the time interval for segmenting queries. The default is 1 day. To avoid querying an excessive amount of data, a sub-task for data synchronization will query data by time segments according to the query interval.
**Delay Duration** is an integer between 1 and 30; to avoid losing late-arriving data in real-time synchronization scenarios, each synchronization task reads only data older than this delay duration.
<figure>
<Image img={imgStep06} alt=""/>
</figure>
### 7. Configure Data Mapping
In the **Data Mapping** area, fill in the parameters related to data mapping.
Click the **Retrieve from Server** button to get sample data from the PostgreSQL server.
In the **Extract or Split from Columns** section, fill in the fields to extract or split from the message body. For example, split the `vValue` field into `vValue_0` and `vValue_1` using the split extractor, specifying `,` as the separator and `2` for the number.
In the **Filtering** section, enter filtering conditions; for example, entering `Value > 0` means that only data where Value is greater than 0 will be written to TDengine.
In the **Mapping** section, select the supertable to map to TDengine and specify the columns to map to the supertable.
Click **Preview** to view the mapping results.
<figure>
<Image img={imgStep07} alt=""/>
</figure>
### 8. Configure Advanced Options
The **Advanced Options** area is folded by default; click the `>` button on the right to expand, as shown below:
**Maximum Read Concurrency** limits the number of connections to the data source or the number of reading threads. Modify this parameter when the default parameters do not meet your needs or when you need to adjust resource usage.
**Batch Size** is the maximum number of messages or rows sent at one time. The default is 10,000.
<figure>
<Image img={imgStep08} alt=""/>
</figure>
### 9. Completion
Click the **Submit** button to complete the creation of the data synchronization task from PostgreSQL to TDengine. Return to the **Data Source List** page to view the task execution status.

---
title: Oracle Database
slug: /advanced-features/data-connectors/oracle-database
---
import Image from '@theme/IdealImage';
import imgStep01 from '../../assets/oracle-database-01.png';
import imgStep02 from '../../assets/oracle-database-02.png';
import imgStep03 from '../../assets/oracle-database-03.png';
import imgStep04 from '../../assets/oracle-database-04.png';
import imgStep05 from '../../assets/oracle-database-05.png';
import imgStep06 from '../../assets/oracle-database-06.png';
import imgStep07 from '../../assets/oracle-database-07.png';
This section explains how to create data migration tasks through the Explorer interface to migrate data from Oracle to the current TDengine cluster.
## Function Overview
The Oracle database system is one of the most popular relational database management systems in the world, known for its good portability, ease of use, and powerful features, suitable for various large, medium, and small computing environments. It is an efficient, reliable database solution capable of handling high throughput.
TDengine can efficiently read data from Oracle and write it to TDengine for historical data migration or real-time data synchronization.
## Creating a Task
### 1. Add a Data Source
Click the **+Add Data Source** button in the upper left corner of the data writing page to enter the Add Data Source page, as shown below:
<figure>
<Image img={imgStep01} alt=""/>
</figure>
### 2. Configure Basic Information
In the **Name** field, enter a task name, such as *`test_oracle_01`*.
Select *`Oracle`* from the **Type** dropdown list, as shown below (the fields on the page will change after selection).
The **Agent** field is optional; if needed, you can select a specified agent from the dropdown or click the **+Create New Agent** button on the right to create a new agent.
The **Target Database** field is required; you can first click the **+Create Database** button on the right to create a new database.
<figure>
<Image img={imgStep02} alt=""/>
</figure>
### 3. Configure Connection Information
In the **Connection Configuration** area, fill in the *`source Oracle database connection information`*, as shown below:
<figure>
<Image img={imgStep03} alt=""/>
</figure>
### 4. Configure Authentication Information
In the **User** field, enter the user for the source Oracle database; this user must have read permissions in the organization.
In the **Password** field, enter the login password for the user in the source Oracle database.
<figure>
<Image img={imgStep04} alt=""/>
</figure>
Then click the **Check Connectivity** button; users can click this button to check if the information filled in above can successfully retrieve data from the source Oracle database.
### 5. Configure SQL Query
The **Subtable Fields** setting determines how data is split into subtables. It is a `select distinct` SQL statement that queries unique combinations of the specified fields, which typically correspond to tags in the transform section.
This setting is mainly intended to prevent out-of-order data during migration and must be used together with the **SQL Template**; otherwise, it will not achieve the expected effect. Usage example:
1. Fill in the subtable fields with the statement `select distinct col_name1, col_name2 from table`, indicating that the fields col_name1 and col_name2 from the source table will be used to split the subtables of the target supertable.
2. In the **SQL Template**, add placeholders for the subtable fields, such as `${col_name1} and ${col_name2}` in `select * from table where ts >= ${start} and ts < ${end} and ${col_name1} and ${col_name2}`.
3. In the **transform** section, configure the tag mappings for `col_name1` and `col_name2`.
The **SQL Template** is an SQL statement template used for querying. The SQL statement must include time range conditions, and the start and end times must appear in pairs. The time range defined in the SQL statement template consists of a column representing time from the source database and the placeholders defined below.
SQL uses different placeholders to represent different time format requirements, specifically the following placeholder formats:
1. `${start}` and `${end}`: Represent RFC3339 formatted timestamps, e.g., 2024-03-14T08:00:00+0800
2. `${start_no_tz}` and `${end_no_tz}`: Represent RFC3339 strings without timezone: 2024-03-14T08:00:00
3. `${start_date}` and `${end_date}`: Represent only the date; however, Oracle does not have a pure date type, so it will include zero hours, minutes, and seconds, e.g., 2024-03-14 00:00:00. Therefore, care must be taken when using `date <= ${end_date}`; it should not include data from the day of 2024-03-14.
To avoid out-of-order data during migration, add a sorting condition to the query statement, such as `order by ts asc`.
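For example (a sketch only; the table `meters` and columns `dt` and `location` are hypothetical), because `${end_date}` expands to midnight of the end day, a half-open range avoids silently skipping the rest of that day:
```sql
-- ${end_date} becomes e.g. 2024-03-14 00:00:00, so use a strict "<"
-- comparison rather than "<=" to cover each day exactly once.
SELECT *
FROM meters
WHERE dt >= ${start_date}
  AND dt < ${end_date}
  AND ${location}
ORDER BY dt ASC
```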
**Start Time** is the starting time for migrating data; this is a required field.
**End Time** is the end time for migrating data and can be left blank. If set, the migration task will complete automatically when it reaches the end time; if left blank, it will continuously synchronize real-time data, and the task will not automatically stop.
**Query Interval** is the time interval for segmenting queries. The default is 1 day. To avoid querying an excessive amount of data, a sub-task for data synchronization will query data by time segments according to the query interval.
**Delay Duration** is an integer between 1 and 30; to avoid losing late-arriving data in real-time synchronization scenarios, each synchronization task reads only data older than this delay duration.
<figure>
<Image img={imgStep05} alt=""/>
</figure>
### 6. Configure Data Mapping
In the **Data Mapping** area, fill in the parameters related to data mapping.
Click the **Retrieve from Server** button to get sample data from the Oracle server.
In the **Extract or Split from Columns** section, fill in the fields to extract or split from the message body. For example, split the `vValue` field into `vValue_0` and `vValue_1` using the split extractor, specifying `,` as the separator and `2` for the number.
In the **Filtering** section, enter filtering conditions; for example, entering `Value > 0` means that only data where Value is greater than 0 will be written to TDengine.
In the **Mapping** section, select the supertable to map to TDengine and specify the columns to map to the supertable.
Click **Preview** to view the mapping results.
<figure>
<Image img={imgStep06} alt=""/>
</figure>
### 7. Configure Advanced Options
The **Advanced Options** area is folded by default; click the `>` button on the right to expand, as shown below:
**Maximum Read Concurrency** limits the number of connections to the data source or the number of reading threads. Modify this parameter when the default parameters do not meet your needs or when you need to adjust resource usage.
**Batch Size** is the maximum number of messages or rows sent at one time. The default is 10,000.
<figure>
<Image img={imgStep07} alt=""/>
</figure>
### 8. Completion
Click the **Submit** button to complete the creation of the data synchronization task from Oracle to TDengine. Return to the **Data Source List** page to view the task execution status.

---
title: Microsoft SQL Server
sidebar_label: SQL Server
slug: /advanced-features/data-connectors/sql-server
---
import Image from '@theme/IdealImage';
import imgStep01 from '../../assets/sql-server-01.png';
import imgStep02 from '../../assets/sql-server-02.png';
import imgStep03 from '../../assets/sql-server-03.png';
import imgStep04 from '../../assets/sql-server-04.png';
import imgStep05 from '../../assets/sql-server-05.png';
import imgStep06 from '../../assets/sql-server-06.png';
import imgStep07 from '../../assets/sql-server-07.png';
import imgStep08 from '../../assets/sql-server-08.png';
This section explains how to create data migration tasks through the Explorer interface to migrate data from Microsoft SQL Server to the current TDengine cluster.
## Function Overview
Microsoft SQL Server is one of the most popular relational databases. Many systems have used or are currently using Microsoft SQL Server databases to store data reported by IoT and industrial IoT devices. However, as the number of devices connected to the system increases and users' demands for real-time data feedback grow, Microsoft SQL Server can no longer meet business needs. Starting from TDengine Enterprise Edition 3.3.2.0, TDengine can efficiently read data from Microsoft SQL Server and write it to TDengine for historical data migration or real-time data synchronization, addressing the technical challenges faced by businesses.
## Creating a Task
### 1. Add a Data Source
Click the **+Add Data Source** button in the upper left corner of the data writing page to enter the Add Data Source page, as shown below:
<figure>
<Image img={imgStep01} alt=""/>
</figure>
### 2. Configure Basic Information
In the **Name** field, enter a task name, such as *`test_mssql_01`*.
Select *`Microsoft SQL Server`* from the **Type** dropdown list, as shown below (the fields on the page will change after selection).
The **Agent** field is optional; if needed, you can select a specified agent from the dropdown or click the **+Create New Agent** button on the right to create a new agent.
The **Target Database** field is required; you can first click the **+Create Database** button on the right to create a new database.
<figure>
<Image img={imgStep02} alt=""/>
</figure>
### 3. Configure Connection Information
In the **Connection Configuration** area, fill in the *`source Microsoft SQL Server database connection information`*, as shown below:
<figure>
<Image img={imgStep03} alt=""/>
</figure>
### 4. Configure Authentication Information
In the **User** field, enter the user for the source Microsoft SQL Server database; this user must have read permissions in the organization.
In the **Password** field, enter the login password for the user in the source Microsoft SQL Server database.
<figure>
<Image img={imgStep04} alt=""/>
</figure>
### 5. Configure Connection Options
In the **Instance Name** field, set the Microsoft SQL Server instance name (the instance name defined in SQL Browser, only available on Windows platforms; if specified, the port will be replaced by the value returned from SQL Browser).
In the **Application Name** field, set the application name used to identify the connecting application.
In the **Encryption** field, set whether to use an encrypted connection. The default value is Off. The options are Off, On, NotSupported, and Required.
In the **Trust Server Certificate** field, set whether to trust the server certificate; if enabled, the server certificate will not be validated and will be accepted as is (if trust is enabled, the **Trust Certificate CA** field below will be hidden).
In the **Trust Certificate CA** field, set whether to trust the server's certificate CA. If a CA file is uploaded, the server certificate will be verified against the provided CA certificate in addition to the system trust store.
<figure>
<Image img={imgStep05} alt=""/>
</figure>
Then click the **Check Connectivity** button; users can click this button to check if the information filled in above can successfully retrieve data from the source Microsoft SQL Server database.
### 6. Configure SQL Query
The **Subtable Fields** setting determines how data is split into subtables. It is a `select distinct` SQL statement that queries unique combinations of the specified fields, which typically correspond to tags in the transform section.
This setting is mainly intended to prevent out-of-order data during migration and must be used together with the **SQL Template**; otherwise, it will not achieve the expected effect. Usage example:
1. Fill in the subtable fields with the statement `select distinct col_name1, col_name2 from table`, indicating that the fields col_name1 and col_name2 from the source table will be used to split the subtables of the target supertable.
2. In the **SQL Template**, add placeholders for the subtable fields, such as `${col_name1} and ${col_name2}` in `select * from table where ts >= ${start} and ts < ${end} and ${col_name1} and ${col_name2}`.
3. In the **transform** section, configure the tag mappings for `col_name1` and `col_name2`.
The **SQL Template** is an SQL statement template used for querying. The SQL statement must include time range conditions, and the start and end times must appear in pairs. The time range defined in the SQL statement template consists of a column representing time from the source database and the placeholders defined below.
SQL uses different placeholders to represent different time format requirements, specifically the following placeholder formats:
1. `${start}` and `${end}`: Represent RFC3339 formatted timestamps, e.g., 2024-03-14T08:00:00+0800
2. `${start_no_tz}` and `${end_no_tz}`: Represent RFC3339 strings without timezone: 2024-03-14T08:00:00
3. `${start_date}` and `${end_date}`: Represent only the date; however, only `datetime2` and `datetimeoffset` support using start/end queries, while `datetime` and `smalldatetime` can only use start_no_tz/end_no_tz queries, and `timestamp` cannot be used as a query condition.
To avoid out-of-order data during migration, add a sorting condition to the query statement, such as `order by ts asc`.
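As an illustration (the table `dbo.meters` and column `ts` are hypothetical), choose the placeholder pair that matches the column type:
```sql
-- If ts is datetime2 or datetimeoffset, the ${start}/${end} placeholders work:
SELECT *
FROM dbo.meters
WHERE ts >= ${start} AND ts < ${end} AND ${location}
ORDER BY ts ASC

-- If ts is datetime or smalldatetime, use the timezone-free pair instead:
-- WHERE ts >= ${start_no_tz} AND ts < ${end_no_tz} AND ${location}
```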
**Start Time** is the starting time for migrating data; this is a required field.
**End Time** is the end time for migrating data and can be left blank. If set, the migration task will complete automatically when it reaches the end time; if left blank, it will continuously synchronize real-time data, and the task will not automatically stop.
**Query Interval** is the time interval for segmenting queries. The default is 1 day. To avoid querying an excessive amount of data, a sub-task for data synchronization will query data by time segments according to the query interval.
**Delay Duration** is an integer between 1 and 30; to avoid losing late-arriving data in real-time synchronization scenarios, each synchronization task reads only data older than this delay duration.
<figure>
<Image img={imgStep06} alt=""/>
</figure>
### 7. Configure Data Mapping
In the **Data Mapping** area, fill in the parameters related to data mapping.
Click the **Retrieve from Server** button to get sample data from the Microsoft SQL Server.
In the **Extract or Split from Columns** section, fill in the fields to extract or split from the message body. For example, split the `vValue` field into `vValue_0` and `vValue_1` using the split extractor, specifying `,` as the separator and `2` for the number.
In the **Filtering** section, enter filtering conditions; for example, entering `Value > 0` means that only data where Value is greater than 0 will be written to TDengine.
In the **Mapping** section, select the supertable to map to TDengine and specify the columns to map to the supertable.
Click **Preview** to view the mapping results.
<figure>
<Image img={imgStep07} alt=""/>
</figure>
### 8. Configure Advanced Options
The **Advanced Options** area is folded by default; click the `>` button on the right to expand, as shown below:
**Maximum Read Concurrency** limits the number of connections to the data source or the number of reading threads. Modify this parameter when the default parameters do not meet your needs or when you need to adjust resource usage.
**Batch Size** is the maximum number of messages or rows sent at one time. The default is 10,000.
<figure>
<Image img={imgStep08} alt=""/>
</figure>
### 9. Completion
Click the **Submit** button to complete the creation of the data synchronization task from Microsoft SQL Server to TDengine. Return to the **Data Source List** page to view the task execution status.

---
title: MongoDB
slug: /advanced-features/data-connectors/mongodb
---
import Image from '@theme/IdealImage';
import imgStep01 from '../../assets/mongodb-01.png';
import imgStep02 from '../../assets/mongodb-02.png';
import imgStep03 from '../../assets/mongodb-03.png';
import imgStep04 from '../../assets/mongodb-04.png';
import imgStep05 from '../../assets/mongodb-05.png';
import imgStep06 from '../../assets/mongodb-06.png';
import imgStep07 from '../../assets/mongodb-07.png';
import imgStep08 from '../../assets/mongodb-08.png';
This section explains how to create data migration tasks through the Explorer interface to migrate data from MongoDB to the current TDengine cluster.
## Function Overview
MongoDB is a document-oriented database that sits between relational and non-relational databases and is widely used in fields such as content management systems, mobile applications, and the Internet of Things. Starting from TDengine Enterprise Edition 3.3.3.0, TDengine can efficiently read data from MongoDB and write it to TDengine for historical data migration or real-time data synchronization, addressing the technical challenges faced by businesses.
## Creating a Task
### 1. Add a Data Source
Click the **+Add Data Source** button in the upper right corner of the data writing page to enter the Add Data Source page, as shown below:
<figure>
<Image img={imgStep01} alt=""/>
</figure>
### 2. Configure Basic Information
In the **Name** field, enter a task name, such as `test_mongodb_01`.
Select `MongoDB` from the **Type** dropdown list, as shown below (the fields on the page will change after selection).
The **Agent** field is optional; if needed, you can select a specified agent from the dropdown or click the **+Create New Agent** button on the right to create a new agent.
The **Target Database** field is required; you can select a specified database from the dropdown or click the **+Create Database** button on the right to create a new database.
<figure>
<Image img={imgStep02} alt=""/>
</figure>
### 3. Configure Connection Information
In the **Connection Configuration** area, fill in the *`source MongoDB database connection information`*, as shown below:
<figure>
<Image img={imgStep03} alt=""/>
</figure>
### 4. Configure Authentication Information
In the **User** field, enter the user for the source MongoDB database; this user must have read permissions in the MongoDB system.
In the **Password** field, enter the login password for the user in the source MongoDB database.
In the **Authentication Database** field, enter the database in MongoDB that stores user information, which defaults to admin.
<figure>
<Image img={imgStep04} alt=""/>
</figure>
### 5. Configure Connection Options
In the **Application Name** field, set the application name used to identify the connecting application.
In the **SSL Certificate** field, set whether to use an encrypted connection, which is off by default. If enabled, you need to upload the following two files:
1. **CA File**: Upload the SSL encrypted certificate authorization file.
2. **Certificate File**: Upload the SSL encrypted certificate file.
<figure>
<Image img={imgStep05} alt=""/>
</figure>
Then click the **Check Connectivity** button; users can click this button to check if the information filled in above can successfully retrieve data from the source MongoDB database.
### 6. Configure Data Query
In the **Database** field, specify the source database in MongoDB, and you can use placeholders for dynamic configuration, such as `database_${Y}`. See the table below for the available placeholders.
In the **Collection** field, specify the collection in MongoDB, and you can also use placeholders for dynamic configuration, such as `collection_${md}`. See the table below for the available placeholders.
| Placeholder | Description | Example Data |
| :---------: | :----------------------------------------------------------: | :----------: |
| Y | Complete year in Gregorian calendar, zero-padded 4-digit integer | 2024 |
| y | Last two digits of the Gregorian year (year modulo 100), zero-padded 2-digit integer | 24 |
| M | Integer month (1 - 12) | 1 |
| m | Integer month (01 - 12) | 01 |
| B | Full name of the month in English | January |
| b | Abbreviation of the month in English (3 letters) | Jan |
| D | Numeric representation of the date (1 - 31) | 1 |
| d | Numeric representation of the date (01 - 31) | 01 |
| J | Day of the year (1 - 366) | 1 |
| j | Day of the year (001 - 366) | 001 |
| F | Equivalent to `${Y}-${m}-${d}` | 2024-01-01 |
The **Subtable Fields** setting determines how data is split into subtables; the fields typically correspond to tags in the transform section. Separate multiple fields with commas, e.g., `col_name1,col_name2`.
This setting is mainly intended to prevent out-of-order data during migration and must be used together with the **Query Template**; otherwise, it will not achieve the expected effect. Usage example:
1. Configure two subtable fields `col_name1,col_name2`.
2. In the **Query Template**, add placeholders for the subtable fields, for example, `{"ddate":{"$gte":${start_datetime},"$lt":${end_datetime}}, ${col_name1}, ${col_name2}}` where `${col_name1}` and `${col_name2}` are the placeholders.
3. In the **transform** section, configure the tag mappings for `col_name1` and `col_name2`.
The **Query Template** is used for querying data. It must be in JSON format and must include time range conditions, with start and end times appearing in pairs. The defined time range in the template is composed of a column representing time from the source database and the placeholders defined below.
Using different placeholders represents different time format requirements, specifically the following placeholder formats:
1. `${start_datetime}` and `${end_datetime}`: Correspond to filtering based on backend datetime type fields, e.g., `{"ddate":{"$gte":${start_datetime},"$lt":${end_datetime}}}` will be converted to `{"ddate":{"$gte":{"$date":"2024-06-01T00:00:00+00:00"},"$lt":{"$date":"2024-07-01T00:00:00"}}}`
2. `${start_timestamp}` and `${end_timestamp}`: Correspond to filtering based on backend timestamp type fields, e.g., `{"ttime":{"$gte":${start_timestamp},"$lt":${end_timestamp}}}` will be converted to `{"ttime":{"$gte":{"$timestamp":{"t":123,"i":456}},"$lt":{"$timestamp":{"t":123,"i":456}}}}`
In the **Query Sorting** field, specify sorting conditions for executing the query in JSON format. It must comply with MongoDB's sorting condition format. Example usages:
1. `{"createtime":1}`: Returns MongoDB query results sorted by `createtime` in ascending order.
2. `{"createdate":1, "createtime":1}`: Returns MongoDB query results sorted by `createdate` in ascending order, followed by `createtime` in ascending order.
**Start Time** is the starting time for migrating data; this is a required field.
**End Time** is the end time for migrating data and can be left blank. If set, the migration task will complete automatically when it reaches the end time; if left blank, it will continuously synchronize real-time data, and the task will not automatically stop.
**Query Interval** is the time interval for segmenting queries. The default is 1 day. To avoid querying an excessive amount of data, a sub-task for data synchronization will query data by time segments according to the query interval.
**Delay Duration** is an integer between 1 and 30; to avoid losing late-arriving data in real-time synchronization scenarios, each synchronization task reads only data older than this delay duration.
<figure>
<Image img={imgStep06} alt=""/>
</figure>
### 7. Configure Data Mapping
In the **Payload Transformation** area, fill in the parameters related to data mapping.
Click the **Retrieve from Server** button to get sample data from the MongoDB server.
In the **Parsing** section, choose from JSON/Regex/UDT parsing rules for the raw message body; after configuration, click the **Preview** button on the right to view the parsing results.
In the **Extract or Split from Columns** section, fill in the fields to extract or split from the message body. For example, split the `vValue` field into `vValue_0` and `vValue_1` using the split extractor, specifying `,` as the separator and `2` for the number. After configuration, click the **Preview** button on the right to view the transformation results.
In the **Filtering** section, enter filtering conditions; for example, entering `Value > 0` means that only data where Value is greater than 0 will be written to TDengine. After configuration, click the **Preview** button on the right to view the filtering results.
In the **Mapping** section, select the supertable to map to TDengine and specify the columns to map to the supertable. After configuration, click the **Preview** button on the right to view the mapping results.
<figure>
<Image img={imgStep07} alt=""/>
</figure>
### 8. Configure Advanced Options
The **Advanced Options** area is folded by default; click the `>` button on the right to expand, as shown below:
**Maximum Read Concurrency** limits the number of connections to the data source or the number of reading threads. Modify this parameter when the default parameters do not meet your needs or when you need to adjust resource usage.
**Batch Size** is the maximum number of messages or rows sent at one time. The default is 10,000.
<figure>
<Image img={imgStep08} alt=""/>
</figure>
### 9. Completion
Click the **Submit** button to complete the creation of the data synchronization task from MongoDB to TDengine. Return to the **Data Source List** page to view the task execution status.

View File

@ -0,0 +1,314 @@
---
title: Data Connectors
slug: /advanced-features/data-connectors
---
import Image from '@theme/IdealImage';
import imgZeroCode from '../../assets/data-connectors-01.png';
import imgSampleData from '../../assets/data-connectors-02.png';
import imgJsonParsing from '../../assets/data-connectors-03.png';
import imgRegexParsing from '../../assets/data-connectors-04.png';
import imgResults from '../../assets/data-connectors-05.png';
import imgSplit from '../../assets/data-connectors-06.png';
## Overview
TDengine Enterprise is equipped with a powerful visual data management tool—taosExplorer. With taosExplorer, users can easily configure tasks in their browsers to seamlessly import data from various sources into TDengine with zero code. During the import process, TDengine automatically extracts, filters, and transforms data to ensure its quality. This zero-code data source access approach has successfully transformed TDengine into an outstanding time-series big data aggregation platform. Users do not need to deploy additional ETL tools, significantly simplifying the overall architecture design and improving data processing efficiency.
The following figure illustrates the system architecture of the zero-code access platform.
<figure>
<Image img={imgZeroCode} alt="Zero-code access platform"/>
<figcaption>Figure 1. Zero-code access platform</figcaption>
</figure>
## Supported Data Sources
Currently, TDengine supports the following data sources:
1. Aveva PI System: An industrial data management and analysis platform, formerly known as OSIsoft PI System, which can collect, integrate, analyze, and visualize industrial data in real-time, helping enterprises achieve intelligent decision-making and refined management.
2. Aveva Historian: An industrial big data analysis software, formerly known as Wonderware Historian, designed for industrial environments to store, manage, and analyze real-time and historical data from various industrial devices and sensors.
3. OPC DA/UA: OPC stands for Open Platform Communications, an open and standardized communication protocol used for data exchange between automation devices from different vendors. It was initially developed by Microsoft to address interoperability issues among different devices in the industrial control field. The OPC protocol was first released in 1996 as OPC DA (Data Access), primarily for real-time data collection and control. In 2006, the OPC Foundation released the OPC UA (Unified Architecture) standard, a service-oriented, object-oriented protocol with greater flexibility and scalability, which has become the mainstream version of the OPC protocol.
4. MQTT: Short for Message Queuing Telemetry Transport, a lightweight communication protocol based on a publish/subscribe model, designed for low-overhead, low-bandwidth instant messaging, widely used in IoT, small devices, mobile applications, and other fields.
5. Kafka: An open-source stream processing platform developed by the Apache Software Foundation, primarily used for processing real-time data and providing a unified, high-throughput, low-latency messaging system. It features high speed, scalability, persistence, and a distributed design, allowing it to handle hundreds of thousands of read and write operations per second, supporting thousands of clients while maintaining data reliability and availability.
6. OpenTSDB: A distributed and scalable time-series database based on HBase. It is mainly used to store, index, and provide metrics data collected from large-scale clusters (including network devices, operating systems, applications, etc.), making it easier to access and visualize this data.
7. CSV: Short for Comma Separated Values, a plain text file format that uses commas to separate values, typically used in spreadsheet or database software.
8. TDengine 2: Refers to instances of TDengine running version 2.x.
9. TDengine 3: Refers to instances of TDengine running version 3.x.
10. Relational databases such as MySQL, PostgreSQL, and Oracle.
## Data Extraction, Filtering, and Transformation
Since there can be multiple data sources, the physical units, naming conventions, and time zones may vary. To address this issue, TDengine has built-in ETL capabilities to parse and extract the necessary data from data packets and perform filtering and transformation to ensure the quality of the written data and provide a unified naming space. The specific functionalities are as follows:
1. Parsing: Use JSON Path or regular expressions to parse fields from raw messages.
2. Extracting or Splitting from Columns: Use split or regular expressions to extract multiple fields from a raw field.
3. Filtering: Only messages with a true expression value will be written to TDengine.
4. Transformation: Establish a conversion and mapping relationship between the parsed fields and TDengine supertable fields.
Below are detailed explanations of the data transformation rules.
### Parsing
This step is only required for unstructured data sources. Currently, MQTT and Kafka data sources use the provided rules to parse unstructured data and initially obtain structured data that can be represented as row and column data described by fields. In the explorer, you need to provide sample data and parsing rules to preview the structured data presented in a table format.
#### Sample Data
<figure>
<Image img={imgZeroCode} alt="Sample data"/>
<figcaption>Figure 2. Sample data</figcaption>
</figure>
As shown in the figure, the textarea input box contains the sample data, which can be obtained in three ways:
1. Directly inputting sample data into the textarea.
2. Clicking the **Retrieve from Server** button on the right retrieves sample data from the configured server and appends it to the sample data textarea.
3. Uploading a file to append its content to the sample data textarea.
Each sample data entry ends with a newline character.
#### Parsing
Parsing involves converting unstructured strings into structured data through parsing rules. The current parsing rules for message bodies support JSON, Regex, and UDT.
##### JSON Parsing
JSON parsing supports JSONObject or JSONArray. The following JSON sample data can automatically parse the fields: `groupid`, `voltage`, `current`, `ts`, `inuse`, `location`.
```json
{"groupid": 170001, "voltage": "221V", "current": 12.3, "ts": "2023-12-18T22:12:00", "inuse": true, "location": "beijing.chaoyang.datun"}
{"groupid": 170001, "voltage": "220V", "current": 12.2, "ts": "2023-12-18T22:12:02", "inuse": true, "location": "beijing.chaoyang.datun"}
{"groupid": 170001, "voltage": "216V", "current": 12.5, "ts": "2023-12-18T22:12:04", "inuse": false, "location": "beijing.chaoyang.datun"}
```
Or
```json
[{"groupid": 170001, "voltage": "221V", "current": 12.3, "ts": "2023-12-18T22:12:00", "inuse": true, "location": "beijing.chaoyang.datun"},
{"groupid": 170001, "voltage": "220V", "current": 12.2, "ts": "2023-12-18T22:12:02", "inuse": true, "location": "beijing.chaoyang.datun"},
{"groupid": 170001, "voltage": "216V", "current": 12.5, "ts": "2023-12-18T22:12:04", "inuse": false, "location": "beijing.chaoyang.datun"}]
```
Subsequent examples will illustrate with JSONObject as an example.
The following nested JSON structure can automatically parse the fields `groupid`, `data_voltage`, `data_current`, `ts`, `inuse`, `location_0_province`, `location_0_city`, `location_0_street`, and you can also choose which fields to parse and set aliases.
```json
{"groupid": 170001, "data": { "voltage": "221V", "current": 12.3 }, "ts": "2023-12-18T22:12:00", "inuse": true, "location": [{"province": "beijing", "city":"chaoyang", "street": "datun"}]}
```
<figure>
<Image img={imgJsonParsing} alt="JSON parsing"/>
<figcaption>Figure 3. JSON parsing</figcaption>
</figure>
##### Regex Regular Expression
You can use **named capture groups** in regular expressions to extract multiple fields from any string (text) field. As shown in the figure, this extracts the access IP, timestamp, and accessed URL from the nginx log.
```re
(?<ip>\b(?:[0-9]{1,3}\.){3}[0-9]{1,3}\b)\s-\s-\s\[(?<ts>\d{2}/\w{3}/\d{4}:\d{2}:\d{2}:\d{2}\s\+\d{4})\]\s"(?<method>[A-Z]+)\s(?<url>[^\s"]+).*(?<status>\d{3})\s(?<length>\d+)
```
<figure>
<Image img={imgRegexParsing} alt="Regex parsing"/>
<figcaption>Figure 4. Regex parsing</figcaption>
</figure>
##### UDT Custom Parsing Script
Custom Rhai syntax scripts can be used to parse input data (refer to `https://rhai.rs/book/`). The script currently only supports raw JSON data.
**Input**: The script can use the parameter `data`, which is the Object Map obtained by parsing the raw JSON data.
**Output**: The output data must be an array.
For example, consider data that reports three-phase voltage values which need to be written into three subtables; this data must first be parsed.
```json
{
"ts": "2024-06-27 18:00:00",
"voltage": "220.1,220.3,221.1",
"dev_id": "8208891"
}
```
You can use the following script to extract the three voltage data.
```rhai
let v3 = data["voltage"].split(",");
[
#{"ts": data["ts"], "val": v3[0], "dev_id": data["dev_id"]},
#{"ts": data["ts"], "val": v3[1], "dev_id": data["dev_id"]},
#{"ts": data["ts"], "val": v3[2], "dev_id": data["dev_id"]}
]
```
The final parsed result is as follows:
<figure>
<Image img={imgResults} alt="Parsed results"/>
<figcaption>Figure 5. Parsed results</figcaption>
</figure>
### Extracting or Splitting
The parsed data may not meet the requirements of the target table. For instance, the raw data collected from the smart meter is as follows (in JSON format):
```json
{"groupid": 170001, "voltage": "221V", "current": 12.3, "ts": "2023-12-18T22:12:00", "inuse": true, "location": "beijing.chaoyang.datun"}
{"groupid": 170001, "voltage": "220V", "current": 12.2, "ts": "2023-12-18T22:12:02", "inuse": true, "location": "beijing.chaoyang.datun"}
{"groupid": 170001, "voltage": "216V", "current": 12.5, "ts": "2023-12-18T22:12:04", "inuse": false, "location": "beijing.chaoyang.datun"}
```
The voltage parsed by the JSON rules is expressed as a string with a unit. The goal is ultimately to store the voltage and current values as integers for statistical analysis, which requires further splitting of the voltage field; in addition, the date should be split into separate date and time fields for storage.
You can use the split rule on the source field `ts` to split it into date and time, and use regex to extract the voltage value and unit from the `voltage` field. The split rule requires setting the **delimiter** and **number of splits**, and the naming convention for the split fields is `{original_field_name}_{order_number}`, while the Regex rule is the same as in the parsing process, using **named capture groups** to name the extracted fields.
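As an illustration of the regex rule described above, a pattern with named capture groups along the following lines could separate the numeric value from the unit in the `voltage` field; the group names are illustrative and can be chosen freely.
```re
(?<voltage_value>[0-9]+)(?<voltage_unit>[a-zA-Z]+)
```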
### Filtering
The filtering function allows you to set filtering conditions so that only rows of data meeting the conditions will be written to the target table. The result of the filtering condition expression must be of boolean type. Before writing filtering conditions, you must determine the type of the parsed fields, and based on the type, you can use judgment functions and comparison operators (`>`, `>=`, `<=`, `<`, `==`, `!=`) for judgment.
#### Field Types and Conversion
Only by clearly defining the type of each parsed field can you use the correct syntax for data filtering.
Fields parsed using JSON rules automatically set types based on their attribute values:
1. bool type: `"inuse": true`
2. int type: `"voltage": 220`
3. float type: `"current" : 12.2`
4. String type: `"location": "MX001"`
Data parsed using regex rules are all of string type.
Data extracted or split using split and regex rules are of string type.
If the extracted data type does not match the expected type, you can perform type conversion. Common type conversions involve converting strings to numeric types. The supported conversion functions are as follows:
|Function|From type|To type|e.g.|
|:----|:----|:----|:----|
| parse_int | string | int | parse_int("56") // Resulting integer 56 |
| parse_float | string | float | parse_float("12.3") // Resulting float 12.3 |
#### Judgment Expressions
Different data types have their respective ways of writing judgment expressions.
##### BOOL Type
You can use variables or the operator `!`. For example, for the field `"inuse": true`, you can write the following expressions:
> 1. inuse
> 2. !inuse
##### Numeric Types (int/float)
Numeric types support the comparison operators `==`, `!=`, `>`, `>=`, `<`, `<=`.
##### String Types
Use comparison operators to compare strings.
String Functions
|Function|Description|e.g.|
|:----|:----|:----|
| is_empty | returns true if the string is empty | s.is_empty() |
| contains | checks if a certain character or sub-string occurs in the string | s.contains("substring") |
| starts_with | returns true if the string starts with a certain string | s.starts_with("prefix") |
| ends_with | returns true if the string ends with a certain string | s.ends_with("suffix") |
| len | returns the number of characters (not the number of bytes) in the string; must be used with a comparison operator | s.len == 5 // determines whether the string length is 5; len is an attribute that returns int, unlike the first four functions, which directly return bool |
##### Composite Expressions
Multiple judgment expressions can be combined using logical operators (&&, ||, !).
For example, the following expression retrieves the data from smart meters installed in Beijing with a voltage greater than 200.
> location.starts_with("beijing") && voltage > 200
### Mapping
Mapping refers to matching the **source fields** parsed, extracted, and split to the **target table fields**. It can be done directly or calculated through some rules before mapping to the target table.
#### Selecting Target Supertable
After selecting the target supertable, all tags and columns of the supertable will be loaded.
Source fields automatically use mapping rules to map to the target supertable's tags and columns based on their names.
For example, there is preview data after parsing, extracting, and splitting as follows:
#### Mapping Rules
The supported mapping rules are shown in the table below:
|rule|description|
|:----|:----|
| mapping | Direct mapping, requires selecting the mapping source field.|
| value | Constant; you can input string constants or numeric constants, and the constant value will be directly stored.|
| generator | Generator; currently only the timestamp generator is supported, which stores the current time.|
| join | String concatenator, can specify connection characters to concatenate multiple source fields.|
| format | **String formatting tool**, fill in the formatting string. For example, if there are three source fields year, month, day representing year, month, and day respectively, and you want to store the date in yyyy-MM-dd format, you can provide the formatting string as `${year}-${month}-${day}`. Here `${}` serves as a placeholder, which can be a source field or a function processing a string-type field.|
| sum | Select multiple numeric fields for addition.|
| expr | **Numeric operation expression**, can perform more complex function processing and mathematical operations on numeric fields.|
##### Supported String Processing Functions in Format
|Function|description|e.g.|
|:----|:----|:----|
| pad(len, pad_chars) | pads the string with a character or a string to at least a specified length | "1.2".pad(5, '0') // Resulting "1.200" |
|trim|trims the string of whitespace at the beginning and end|" abc ee ".trim() // Resulting "abc ee"|
|sub_string(start_pos, len)|extracts a sub-string; two parameters:<br />1. start position, counting from end if < 0<br />2. (optional) number of characters to extract, none if ≤ 0, to end if omitted|"012345678".sub_string(5) // "5678"<br />"012345678".sub_string(5, 2) // "56"<br />"012345678".sub_string(-2) // "78"|
|replace(substring, replacement)|replaces a sub-string with another|"012345678".replace("012", "abc") // "abc345678"|
##### expr Numeric Calculation Expressions
Basic mathematical operations support addition `+`, subtraction `-`, multiplication `*`, and division `/`.
For example, if the data source collects values in degrees, and the target database wants to store the temperature value in Fahrenheit, then the temperature data needs to be converted.
The source field parsed is `temperature`, and the expression `temperature * 1.8 + 32` should be used.
Numeric expressions also support mathematical functions, and the available mathematical functions are shown in the table below:
|Function|description|e.g.|
|:----|:----|:----|
|sin, cos, tan, sinh, cosh|Trigonometry|a.sin() |
|asin, acos, atan, asinh, acosh|arc-trigonometry|a.asin()|
|sqrt|Square root|a.sqrt() // 4.sqrt() == 2|
|exp|Exponential|a.exp()|
|ln, log|Logarithmic|a.ln() // e.ln() == 1<br />a.log() // 10.log() == 1|
|floor, ceiling, round, int, fraction|rounding|a.floor() // (4.2).floor() == 4<br />a.ceiling() // (4.2).ceiling() == 5<br />a.round() // (4.2).round() == 4<br />a.int() // (4.2).int() == 4<br />a.fraction() // (4.2).fraction() == 0.2|
#### Subtable Name Mapping
The subtable name is a string type, and you can define the subtable name using the string formatting format expression in the mapping rules.
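For example, assuming the parsed data contains the `groupid` field from the earlier sample, a format expression such as the following (the naming is illustrative) could be used to generate one subtable per group:
```text
meters_${groupid}
```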
## Task Creation
Taking the MQTT data source as an example, this section describes how to create an MQTT-type task to consume data from the MQTT Broker and write it into TDengine.
1. Log in to taosExplorer, then click on "Data Writing" in the left navigation bar to enter the task list page.
2. On the task list page, click "+ Add Data Source" to enter the task creation page.
3. After entering the task name, select the type as MQTT, and then you can create a new agent or select an existing agent.
4. Enter the IP address and port number of the MQTT broker, for example: 192.168.1.100:1883.
5. Configure authentication and SSL encryption:
- If the MQTT broker has user authentication enabled, enter the username and password of the MQTT broker in the authentication section.
- If the MQTT broker has SSL encryption enabled, you can turn on the SSL certificate switch on the page and upload the CA certificate, as well as the client certificate and private key files.
6. In the "Acquisition Configuration" section, you can choose the version of the MQTT protocol, currently supporting versions 3.1, 3.1.1, and 5.0. When configuring the Client ID, note that if you create multiple tasks for the same MQTT broker, the Client IDs must be different; otherwise, it will cause Client ID conflicts, preventing the tasks from running properly. When configuring topics and QoS, you need to use the format `<topic name>::<QoS>`, where two colons separate the subscribed topic from QoS, with QoS values being 0, 1, or 2, representing at most once, at least once, and exactly once, respectively. After completing the above configuration, you can click the "Check Connectivity" button to check the configuration. If the connectivity check fails, please modify it according to the specific error prompts returned on the page.
7. During the synchronization of data from the MQTT broker, taosX also supports extracting, filtering, and mapping fields in the message body. In the text box below "Payload Transformation", you can directly input sample messages or upload files. In the future, it will also support directly retrieving sample messages from the configured server.
8. Currently, there are two ways to extract fields from the message body: JSON and regular expressions. For simple key/value format JSON data, you can directly click the extract button to display the parsed field names. For complex JSON data, you can use JSON Path to extract the fields of interest. When using regular expressions to extract fields, ensure the correctness of the regular expressions.
9. After the fields in the message body are parsed, you can set filtering rules based on the parsed field names. Only data meeting the filtering rules will be written to TDengine; otherwise, the message will be ignored. For example, you can configure the filtering rule as voltage > 200, meaning only data with voltage greater than 200V will be synchronized to TDengine.
10. Finally, after configuring the mapping rules between the fields in the message body and those in the supertable, you can submit the task. Besides basic mapping, you can also convert the values of fields in the message, for example, you can use expressions (expr) to calculate power from the voltage and current in the original message body before writing them into TDengine.
11. Once the task is submitted, you will be automatically returned to the task list page. If the submission is successful, the task status will switch to "Running." If the submission fails, you can check the task's activity log to find the error cause.
12. For running tasks, clicking the metrics view button allows you to see detailed running metrics for the task. The pop-up window is divided into two tabs, displaying the accumulated metrics from multiple runs of the task and the metrics for the current run. These metrics will automatically refresh every two seconds.
## Task Management
On the task list page, you can also start, stop, view, delete, copy, and perform other operations on tasks, as well as check the running status of each task, including the number of records written, traffic, etc.
```mdx-code-block
import DocCardList from '@theme/DocCardList';
import {useCurrentSidebarCategory} from '@docusaurus/theme-common';
<DocCardList items={useCurrentSidebarCategory().items}/>
```

View File

@ -0,0 +1,14 @@
---
title: Advanced Features
description: 'TDengine Advanced Features'
slug: /advanced-features
---
This section describes the advanced features of TDengine, including data subscription, caching, and stream processing; edgecloud orchestration; and connectors for various data sources.
```mdx-code-block
import DocCardList from '@theme/DocCardList';
import {useCurrentSidebarCategory} from '@docusaurus/theme-common';
<DocCardList items={useCurrentSidebarCategory().items}/>
```

View File

@ -0,0 +1,741 @@
---
title: Connecting to TDengine
slug: /developer-guide/connecting-to-tdengine
---
import Tabs from "@theme/Tabs";
import TabItem from "@theme/TabItem";
import Image from '@theme/IdealImage';
import imgConnect from '../assets/connecting-to-tdengine-01.png';
import ConnJava from "./_connect_java.mdx";
import ConnGo from "./_connect_go.mdx";
import ConnRust from "./_connect_rust.mdx";
import ConnNode from "./_connect_node.mdx";
import ConnPythonNative from "./_connect_python.mdx";
import ConnCSNative from "./_connect_cs.mdx";
import ConnC from "./_connect_c.mdx";
import InstallOnLinux from "../14-reference/05-connector/_linux_install.mdx";
import InstallOnWindows from "../14-reference/05-connector/_windows_install.mdx";
import InstallOnMacOS from "../14-reference/05-connector/_macos_install.mdx";
import VerifyLinux from "../14-reference/05-connector/_verify_linux.mdx";
import VerifyMacOS from "../14-reference/05-connector/_verify_macos.mdx";
import VerifyWindows from "../14-reference/05-connector/_verify_windows.mdx";
TDengine provides a rich set of application development interfaces. To facilitate users in quickly developing their applications, TDengine supports various programming language connectors, including official connectors for C/C++, Java, Python, Go, Node.js, C#, Rust, Lua (community-contributed), and PHP (community-contributed). These connectors support connecting to TDengine clusters using native interfaces (taosc) and REST interfaces (not supported by some languages). Community developers have also contributed several unofficial connectors, such as the ADO.NET connector, Lua connector, and PHP connector. Additionally, TDengine can directly call the REST API provided by taosadapter for data writing and querying operations.
## Connection Methods
TDengine provides three methods for establishing connections:
1. Directly connect to the server program taosd using the client driver taosc; this method is referred to as "native connection."
2. Establish a connection to taosd via the REST API provided by the taosAdapter component; this method is referred to as "REST connection."
3. Establish a connection to taosd via the WebSocket API provided by the taosAdapter component; this method is referred to as "WebSocket connection."
<figure>
<Image img={imgConnect} alt="Connecting to TDengine"/>
<figcaption>Figure 1. Connecting to TDengine</figcaption>
</figure>
Regardless of the method used to establish a connection, the connectors provide similar API operations for databases and can execute SQL statements. The only difference lies in how the connection is initialized, and users should not notice any difference in usage. For various connection methods and language connector support, refer to: [Feature Support](../../tdengine-reference/client-libraries/#feature-support).
Key differences include:
1. For the native connection, it is necessary to ensure that the client driver taosc and the TDengine server version are compatible.
2. With the REST connection, users do not need to install the client driver taosc, which offers cross-platform ease of use; however, features such as data subscription and binary data types are not available. In addition, REST connections have the lowest performance among the three connection methods. The REST API is stateless, so when using REST connections, you must specify the database name for tables and supertables in SQL.
3. For the WebSocket connection, users also do not need to install the client driver taosc.
4. To connect to cloud service instances, users must use REST or WebSocket connections.
**It is recommended to use WebSocket connections.**
## Install the Client Driver taosc
If you choose the native connection and the application is not running on the same server as TDengine, you need to install the client driver first; otherwise, this step can be skipped. To avoid incompatibility between the client driver and server, please use the same version.
### Installation Steps
<Tabs defaultValue="linux" groupId="os">
<TabItem value="linux" label="Linux">
<InstallOnLinux />
</TabItem>
<TabItem value="windows" label="Windows">
<InstallOnWindows />
</TabItem>
<TabItem value="macos" label="macOS">
<InstallOnMacOS />
</TabItem>
</Tabs>
### Installation Verification
After the installation and configuration are complete, and ensuring that the TDengine service is running normally, you can execute the TDengine command-line program taos included in the installation package to log in.
<Tabs defaultValue="linux" groupId="os">
<TabItem value="linux" label="Linux">
<VerifyLinux />
</TabItem>
<TabItem value="windows" label="Windows">
<VerifyWindows />
</TabItem>
<TabItem value="macos" label="macOS">
<VerifyMacOS />
</TabItem>
</Tabs>
## Install Connectors
<Tabs defaultValue="java" groupId="lang">
<TabItem label="Java" value="java">
If you are using Maven to manage the project, simply add the following dependency to the pom.xml.
```xml
<dependency>
<groupId>com.taosdata.jdbc</groupId>
<artifactId>taos-jdbcdriver</artifactId>
<version>3.4.0</version>
</dependency>
```
</TabItem>
<TabItem label="Python" value="python">
- **Installation Prerequisites**
- Install Python. The latest version of the taospy package requires Python 3.6.2+. Earlier versions require Python 3.7+. The taos-ws-py package requires Python 3.7+. If Python is not already installed on your system, refer to the [Python BeginnersGuide](https://wiki.python.org/moin/BeginnersGuide/Download) for installation.
- Install [pip](https://pypi.org/project/pip/). Most Python installation packages come with the pip tool. If not, refer to the [pip documentation](https://pip.pypa.io/en/stable/installation/) for installation.
- If you are using the native connection, you also need to [install the client driver](#install-the-client-driver-taosc). The client software includes the TDengine client dynamic link library (libtaos.so or taos.dll) and the TDengine CLI.
- **Install using pip**
- Uninstall old versions
If you previously installed an older version of the Python connector, please uninstall it first.
```shell
pip3 uninstall taos taospy
pip3 uninstall taos taos-ws-py
```
- Install `taospy`
- Latest version
```shell
pip3 install taospy
```
- Install a specific version
```shell
pip3 install taospy==2.3.0
```
- Install from GitHub
```shell
pip3 install git+https://github.com/taosdata/taos-connector-python.git
```
:::note This installation package is for the native connector.
:::
- Install `taos-ws-py`
```shell
pip3 install taos-ws-py
```
:::note This installation package is for the WebSocket connector.
:::
- Install both `taospy` and `taos-ws-py`
```shell
pip3 install taospy[ws]
```
- **Installation Verification**
<Tabs defaultValue="rest">
<TabItem value="native" label="Native Connection">
For the native connection, verify that both the client driver and the Python connector are correctly installed. If you can successfully import the `taos` module, it indicates that the client driver and Python connector have been correctly installed. You can type the following in the Python interactive shell:
```python
import taos
```
</TabItem>
<TabItem value="rest" label="REST Connection">
For the REST connection, you only need to verify whether you can successfully import the `taosrest` module. You can type the following in the Python interactive shell:
```python
import taosrest
```
</TabItem>
<TabItem value="ws" label="WebSocket Connection">
For the WebSocket connection, you only need to verify whether you can successfully import the `taosws` module. You can type the following in the Python interactive shell:
```python
import taosws
```
</TabItem>
</Tabs>
</TabItem>
<TabItem label="Go" value="go">
Edit `go.mod` to add the `driver-go` dependency.
```go-mod title=go.mod
module goexample
go 1.17
require github.com/taosdata/driver-go/v3 latest
```
:::note
driver-go uses cgo to wrap the taosc API. cgo requires GCC to compile C source code. Therefore, ensure that your system has GCC installed.
:::
</TabItem>
<TabItem label="Rust" value="rust">
Edit `Cargo.toml` to add the `taos` dependency.
```toml title=Cargo.toml
[dependencies]
taos = { version = "*"}
```
:::info
The Rust connector differentiates between connection methods through different features. By default, it supports both native and WebSocket connections. If you only need to establish a WebSocket connection, you can set the `ws` feature:
```toml
taos = { version = "*", default-features = false, features = ["ws"] }
```
:::
</TabItem>
<TabItem label="Node.js" value="node">
- **Installation Prerequisites**
- Install the Node.js development environment, using version 14 or higher. [Download link](https://nodejs.org/en/download/)
- **Installation**
- Install the Node.js connector using npm
```shell
npm install @tdengine/websocket
```
:::note Node.js currently only supports WebSocket connections.
:::
- **Installation Verification**
- Create an installation verification directory, for example: `~/tdengine-test`, and download the [nodejsChecker.js source code](https://github.com/taosdata/TDengine/tree/main/docs/examples/node/websocketexample/nodejsChecker.js) from GitHub to your local machine.
- Execute the following command in the terminal.
```bash
npm init -y
npm install @tdengine/websocket
node nodejsChecker.js
```
- After executing the above steps, the command line will output the results of connecting to the TDengine instance and performing a simple insert and query.
</TabItem>
<TabItem label="C#" value="csharp">
Add the reference for [TDengine.Connector](https://www.nuget.org/packages/TDengine.Connector/) in the project configuration file:
```xml title=csharp.csproj
<Project Sdk="Microsoft.NET.Sdk">
<PropertyGroup>
<OutputType>Exe</OutputType>
<TargetFramework>net6.0</TargetFramework>
<ImplicitUsings>enable</ImplicitUsings>
<Nullable>enable</Nullable>
<StartupObject>TDengineExample.AsyncQueryExample</StartupObject>
</PropertyGroup>
<ItemGroup>
<PackageReference Include="TDengine.Connector" Version="3.1.0" />
</ItemGroup>
</Project>
```
You can also add it using the dotnet command:
```shell
dotnet add package TDengine.Connector
```
:::note
The following example code is based on dotnet 6.0; if you are using other versions, you may need to make appropriate adjustments.
:::
</TabItem>
<TabItem label="C" value="c">
If you have already installed the TDengine server software or the TDengine client driver taosc, then the C connector is already installed, and no additional action is required.
</TabItem>
<TabItem label="REST API" value="rest">
Using the REST API to access TDengine does not require the installation of any drivers or connectors.
</TabItem>
</Tabs>
## Establishing Connections
Before executing this step, ensure that there is a running and accessible TDengine instance, and that the server's FQDN is configured correctly. The following example code assumes that TDengine is installed on the local machine, with the FQDN (default localhost) and serverPort (default 6030) using default configurations.
### Connection Parameters
There are many configuration items for the connection. Before establishing the connection, we can introduce the parameters used by each language connector to establish a connection.
<Tabs defaultValue="java" groupId="lang">
<TabItem label="Java" value="java">
The parameters for establishing a connection with the Java connector include URL and Properties.
The standard format for TDengine's JDBC URL is: `jdbc:[TAOS|TAOS-RS]://[host_name]:[port]/[database_name]?[user={user}|&password={password}|&charset={charset}|&cfgdir={config_dir}|&locale={locale}|&timezone={timezone}|&batchfetch={batchfetch}]`
For detailed parameter descriptions of URL and Properties, and how to use them, refer to [URL Specification](../../tdengine-reference/client-libraries/java/#url-specification).
**Note**: Adding the `batchfetch` parameter and setting it to true in REST connections will enable the WebSocket connection.
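For illustration, a REST/WebSocket JDBC URL following the format above might look like the sketch below; the host, port, database, and credentials are assumptions for a local default installation.
```text
jdbc:TAOS-RS://localhost:6041/power?user=root&password=taosdata&batchfetch=true
```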
</TabItem>
<TabItem label="Python" value="python">
The Python connector uses the `connect()` method to establish a connection. The specific descriptions of the connection parameters are as follows:
- url: The URL of the `taosAdapter` REST service. The default is port `6041` on `localhost`.
- user: The TDengine username. The default is `root`.
- password: The TDengine user password. The default is `taosdata`.
- timeout: The HTTP request timeout, in seconds. The default is `socket._GLOBAL_DEFAULT_TIMEOUT`, which usually does not need to be configured.
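A minimal sketch of using these parameters with the REST connection is shown below; the URL and credentials are assumptions for a local taosAdapter with default settings.
```python
import taosrest

# Assumed local taosAdapter endpoint and default credentials
conn = taosrest.connect(
    url="http://localhost:6041",
    user="root",
    password="taosdata",
    timeout=30,
)
```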
</TabItem>
<TabItem label="Go" value="go">
The data source name has a general format, such as [PEAR DB](http://pear.php.net/manual/en/package.database.db.intro-dsn.php), but without a type prefix (the brackets indicate that it is optional):
``` text
[username[:password]@][protocol[(address)]]/[dbname][?param1=value1&...&paramN=valueN]
```
The complete form of the DSN:
```text
username:password@protocol(address)/dbname?param=value
```
Supported DSN parameters include:
For native connections:
- `cfg`: Specifies the taos.cfg directory.
- `cgoThread`: Specifies the number of cgo threads to execute concurrently, defaulting to the number of system cores.
- `cgoAsyncHandlerPoolSize`: Specifies the size of the async function handler, defaulting to 10,000.
For REST connections:
- `disableCompression`: Whether to accept compressed data; default is true (does not accept compressed data). If using gzip compression for transmission, set to false.
- `readBufferSize`: The size of the read data buffer, defaulting to 4K (4096). This value can be increased for larger query results.
- `token`: The token used when connecting to cloud services.
- `skipVerify`: Whether to skip certificate verification; default is false (does not skip verification). Set to true if connecting to an insecure service.
For WebSocket connections:
- `enableCompression`: Whether to send compressed data; default is false (does not send compressed data). Set to true if using compression.
- `readTimeout`: The read timeout for data, defaulting to 5m.
- `writeTimeout`: The write timeout for data, defaulting to 10s.
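As an illustration of the general DSN format above, a WebSocket DSN might look like the following sketch; the protocol identifier, address, database name, and credentials are assumptions.
```text
root:taosdata@ws(localhost:6041)/power?readTimeout=5m&writeTimeout=10s
```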
</TabItem>
<TabItem label="Rust" value="rust">
The Rust connector uses DSN to create connections. The basic structure of the DSN description string is as follows:
```text
<driver>[+<protocol>]://[[<username>:<password>@]<host>:<port>][/<database>][?<p1>=<v1>[&<p2>=<v2>]]
|------|------------|---|-----------|-----------|------|------|------------|-----------------------|
|driver| protocol | | username | password | host | port | database | params |
```
For detailed DSN explanations and usage, refer to [Connection Functionality](../../tdengine-reference/client-libraries/rust/#connection-functionality).
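For illustration, a DSN following the structure above for a WebSocket connection might look like the sketch below; the address, database name, and credentials are assumptions.
```text
taos+ws://root:taosdata@localhost:6041/power
```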
</TabItem>
<TabItem label="Node.js" value="node">
The Node.js connector uses DSN to create connections. The basic structure of the DSN description string is as follows:
```text
[+<protocol>]://[[<username>:<password>@]<host>:<port>][/<database>][?<p1>=<v1>[&<p2>=<v2>]]
|------------|---|-----------|-----------|------|------|------------|-----------------------|
| protocol | | username | password | host | port | database | params |
```
- **protocol**: Establish a connection using the WebSocket protocol. For example, `ws://localhost:6041`.
- **username/password**: The database username and password.
- **host/port**: The host address and port number. For example, `localhost:6041`.
- **database**: The database name.
- **params**: Other parameters, such as token.
- Complete DSN example:
```js
ws://root:taosdata@localhost:6041
```
</TabItem>
<TabItem label="C#" value="csharp">
The ConnectionStringBuilder sets connection parameters using a key-value approach, where the key is the parameter name and the value is the parameter value, separated by semicolons `;`.
For example:
```csharp
"protocol=WebSocket;host=127.0.0.1;port=6041;useSSL=false"
```
Supported parameters include:
- `host`: The address of the TDengine instance.
- `port`: The port of the TDengine instance.
- `username`: The username for the connection.
- `password`: The password for the connection.
- `protocol`: The connection protocol, with optional values of Native or WebSocket, defaulting to Native.
- `db`: The connected database.
- `timezone`: The timezone, defaulting to the local timezone.
- `connTimeout`: The connection timeout, defaulting to 1 minute.
WebSocket connections also support the following parameters:
- `readTimeout`: The read timeout, defaulting to 5 minutes.
- `writeTimeout`: The send timeout, defaulting to 10 seconds.
- `token`: The token for connecting to TDengine cloud.
- `useSSL`: Whether to use SSL for the connection, defaulting to false.
- `enableCompression`: Whether to enable WebSocket compression, defaulting to false.
- `autoReconnect`: Whether to automatically reconnect, defaulting to false.
- `reconnectRetryCount`: The number of reconnection attempts, defaulting to 3.
- `reconnectIntervalMs`: The reconnection interval in milliseconds, defaulting to 2000.
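Combining several of the parameters listed above, a WebSocket connection string might look like the following sketch; the host and credentials are assumptions.
```csharp
"protocol=WebSocket;host=127.0.0.1;port=6041;username=root;password=taosdata;useSSL=false;enableCompression=true"
```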
</TabItem>
<TabItem label="C" value="c">
**WebSocket Connection**
The C/C++ language connector uses the `ws_connect()` function to establish a connection to the TDengine database. Its parameter is a DSN description string with the following basic structure:
```text
<driver>[+<protocol>]://[[<username>:<password>@]<host>:<port>][/<database>][?<p1>=<v1>[&<p2>=<v2>]]
|------|------------|---|-----------|-----------|------|------|------------|-----------------------|
|driver| protocol | | username | password | host | port | database | params |
```
For detailed DSN explanations and usage, refer to [DSN](../../tdengine-reference/client-libraries/cpp/#dsn).
**Native Connection**
The C/C++ language connector uses the `taos_connect()` function to establish a connection to the TDengine database. The detailed parameter descriptions are as follows:
- `host`: The hostname or IP address of the database server to connect to. If it is a local database, you can use `"localhost"`.
- `user`: The username used to log in to the database.
- `passwd`: The password corresponding to the username.
- `db`: The default database name to select when connecting. If not specified, you can pass `NULL` or an empty string.
- `port`: The port number that the database server listens on. The default port number is `6030`.
There is also the `taos_connect_auth()` function for establishing a connection to the TDengine database using an MD5-encrypted password. This function works the same as `taos_connect`, but the difference lies in how the password is handled; `taos_connect_auth` requires the MD5 hash of the password.
</TabItem>
<TabItem label="REST API" value="rest">
When accessing TDengine via the REST API, the application directly establishes an HTTP connection with taosAdapter. It is recommended to use a connection pool to manage connections.
For specific parameters used in the REST API, refer to: [HTTP Request Format](../../tdengine-reference/client-libraries/rest-api/#http-request-format).
</TabItem>
</Tabs>
### WebSocket Connection
Below are code samples for establishing a WebSocket connection using each language connector. They demonstrate how to connect to the TDengine database using the WebSocket connection method and set some parameters for the connection. The entire process mainly involves establishing the database connection and handling exceptions.
<Tabs defaultValue="java" groupId="lang">
<TabItem label="Java" value="java">
```java
{{#include docs/examples/java/src/main/java/com/taos/example/WSConnectExample.java:main}}
```
</TabItem>
<TabItem label="Python" value="python">
```python
{{#include docs/examples/python/connect_websocket_examples.py:connect}}
```
</TabItem>
<TabItem label="Go" value="go">
```go
{{#include docs/examples/go/connect/wsexample/main.go}}
```
</TabItem>
<TabItem label="Rust" value="rust">
```rust
{{#include docs/examples/rust/restexample/examples/connect.rs}}
```
</TabItem>
<TabItem label="Node.js" value="node">
```js
{{#include docs/examples/node/websocketexample/sql_example.js:createConnect}}
```
</TabItem>
<TabItem label="C#" value="csharp">
```csharp
{{#include docs/examples/csharp/wsConnect/Program.cs:main}}
```
</TabItem>
<TabItem label="C" value="c">
```c
{{#include docs/examples/c-ws/connect_example.c}}
```
</TabItem>
<TabItem label="REST API" value="rest">
Not supported
</TabItem>
</Tabs>
### Native Connection
Below are code samples for establishing a native connection using each language connector. They demonstrate how to connect to the TDengine database using the native connection method and set some parameters for the connection. The entire process mainly involves establishing the database connection and handling exceptions.
<Tabs defaultValue="java" groupId="lang">
<TabItem label="Java" value="java">
```java
{{#include docs/examples/java/src/main/java/com/taos/example/JNIConnectExample.java:main}}
```
</TabItem>
<TabItem label="Python" value="python">
<ConnPythonNative />
</TabItem>
<TabItem label="Go" value="go">
```go
{{#include docs/examples/go/connect/cgoexample/main.go}}
```
</TabItem>
<TabItem label="Rust" value="rust">
```rust
{{#include docs/examples/rust/nativeexample/examples/connect.rs}}
```
</TabItem>
<TabItem label="Node.js" value="node">
Not supported
</TabItem>
<TabItem label="C#" value="csharp">
```csharp
{{#include docs/examples/csharp/connect/Program.cs:main}}
```
</TabItem>
<TabItem label="C" value="c">
<ConnC />
</TabItem>
<TabItem label="REST API" value="rest">
Not supported
</TabItem>
</Tabs>
### REST Connection
Below are code samples for establishing a REST connection using each language connector. They demonstrate how to connect to the TDengine database using the REST connection method. The entire process mainly involves establishing the database connection and handling exceptions.
<Tabs defaultValue="java" groupId="lang">
<TabItem label="Java" value="java">
```java
{{#include docs/examples/java/src/main/java/com/taos/example/RESTConnectExample.java:main}}
```
</TabItem>
<TabItem label="Python" value="python">
```python
{{#include docs/examples/python/connect_rest_example.py:connect}}
```
</TabItem>
<TabItem label="Go" value="go">
```go
{{#include docs/examples/go/connect/restexample/main.go}}
```
</TabItem>
<TabItem label="Rust" value="rust">
Not supported
</TabItem>
<TabItem label="Node.js" value="node">
Not supported
</TabItem>
<TabItem label="C#" value="csharp">
Not supported
</TabItem>
<TabItem label="C" value="c">
Not supported
</TabItem>
<TabItem label="REST API" value="rest">
Using the REST API to access TDengine allows the application to independently establish HTTP connections.
</TabItem>
</Tabs>
:::tip
If the connection fails, in most cases, it is due to incorrect FQDN or firewall configuration. For detailed troubleshooting methods, refer to [Frequently Asked Questions](../../frequently-asked-questions/) under "If I encounter the error Unable to establish connection, what should I do?"
:::
## Connection Pool
Some connectors provide connection pools or can work with existing connection pool components. Using a connection pool allows applications to quickly obtain available connections from the pool, avoiding the overhead of creating and destroying connections for each operation. This not only reduces resource consumption but also improves response speed. Additionally, connection pools support connection management, such as limiting the maximum number of connections and checking connection validity, ensuring efficient and reliable use of connections. We **recommend using connection pools to manage connections**.
Below are code samples for connection pool support in various language connectors.
<Tabs defaultValue="java" groupId="lang">
<TabItem label="Java" value="java">
**HikariCP**
Usage example:
```java
{{#include docs/examples/java/src/main/java/com/taos/example/HikariDemo.java:connection_pool}}
```
> After obtaining a connection via HikariDataSource.getConnection(), you need to call the close() method after use; it does not actually close the connection, but returns it to the pool.
> For more issues related to HikariCP usage, refer to the [official documentation](https://github.com/brettwooldridge/HikariCP).
**Druid**
Usage example:
```java
{{#include docs/examples/java/src/main/java/com/taos/example/DruidDemo.java:connection_pool}}
```
> For more issues related to Druid usage, refer to the [official documentation](https://github.com/alibaba/druid).
</TabItem>
<TabItem label="Python" value="python">
<ConnPythonNative />
</TabItem>
<TabItem label="Go" value="go">
Using `sql.Open`, the created connection already implements a connection pool. You can set connection pool parameters via the API, as shown below:
```go
{{#include docs/examples/go/connect/connpool/main.go:pool}}
```
</TabItem>
<TabItem label="Rust" value="rust">
In complex applications, it is recommended to enable the connection pool. The [taos] connection pool, by default (asynchronous mode), uses [deadpool] for implementation.
Here is how to generate a connection pool with default parameters.
```rust
let pool: Pool<TaosBuilder> = TaosBuilder::from_dsn("taos:///").unwrap().pool().unwrap();
```
You can also use the pool constructor to set connection pool parameters:
```rust
let pool: Pool<TaosBuilder> = Pool::builder(Manager::from_dsn(self.dsn.clone()).unwrap().0)
.max_size(88) // Maximum number of connections
.build()
.unwrap();
```
In the application code, use `pool.get()?` to obtain a connection object from [Taos].
```rust
let taos = pool.get()?;
```
</TabItem>
</Tabs>

View File

@ -0,0 +1,406 @@
---
title: Running SQL Statements
sidebar_label: Running SQL Statements
slug: /developer-guide/running-sql-statements
---
import Tabs from "@theme/Tabs";
import TabItem from "@theme/TabItem";
TDengine provides comprehensive support for SQL, allowing users to perform data queries, inserts, and deletions using familiar SQL syntax. TDengine's SQL also supports database and table management operations, such as creating, modifying, and deleting databases and tables. TDengine extends standard SQL by introducing features specific to time-series data processing, such as aggregation queries, downsampling, and interpolation queries, to accommodate the characteristics of time-series data. These extensions enable users to handle time-series data more efficiently and conduct complex data analysis and processing. For specific supported SQL syntax, please refer to [TDengine SQL](../../tdengine-reference/sql-manual/).
Below is an introduction to how to use various language connectors to execute SQL commands for creating databases, creating tables, inserting data, and querying data.
:::note
REST connection: Each programming language's connector encapsulates the connection using `HTTP` requests, supporting data writing and querying operations. Developers still access `TDengine` through the interfaces provided by the connector.
REST API: Directly calls the REST API interface provided by `taosadapter` to perform data writing and querying operations. Code examples demonstrate using the `curl` command.
:::
## Create Database and Table
Using a smart meter as an example, below demonstrates how to execute SQL commands using various language connectors to create a database named `power` and then set `power` as the default database. Next, it creates a supertable named `meters`, with columns including timestamp, current, voltage, phase, and tags for group ID and location.
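For reference, the statements executed by the connector examples below are equivalent to the following SQL, which mirrors the REST example at the end of this subsection.
```sql
CREATE DATABASE IF NOT EXISTS power;
CREATE STABLE IF NOT EXISTS power.meters (ts TIMESTAMP, current FLOAT, voltage INT, phase FLOAT) TAGS (groupId INT, location BINARY(24));
```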
<Tabs defaultValue="java" groupId="lang">
<TabItem value="java" label="Java">
```java
{{#include docs/examples/java/src/main/java/com/taos/example/JdbcCreatDBDemo.java:create_db_and_table}}
```
</TabItem>
<TabItem label="Python" value="python">
```python title="Websocket Connection"
{{#include docs/examples/python/create_db_ws.py}}
```
```python title="Native Connection"
{{#include docs/examples/python/create_db_native.py}}
```
```python title="REST Connection"
{{#include docs/examples/python/create_db_rest.py}}
```
</TabItem>
<TabItem label="Go" value="go">
```go
{{#include docs/examples/go/sqlquery/main.go:create_db_and_table}}
```
</TabItem>
<TabItem label="Rust" value="rust">
```rust
{{#include docs/examples/rust/nativeexample/examples/createdb.rs:create_db_and_table}}
```
</TabItem>
<TabItem label="Node.js" value="node">
```js
{{#include docs/examples/node/websocketexample/sql_example.js:create_db_and_table}}
```
</TabItem>
<TabItem label="C#" value="csharp">
```csharp
{{#include docs/examples/csharp/wsInsert/Program.cs:create_db_and_table}}
```
</TabItem>
<TabItem label="C" value="c">
```c title="Websocket Connection"
{{#include docs/examples/c-ws/create_db_demo.c:create_db_and_table}}
```
```c title="Native Connection"
{{#include docs/examples/c/create_db_demo.c:create_db_and_table}}
```
</TabItem>
<TabItem label="REST API" value="rest">
Create Database
```bash
curl --location -uroot:taosdata 'http://127.0.0.1:6041/rest/sql' \
--data 'CREATE DATABASE IF NOT EXISTS power'
```
Create Table, specifying the database as `power` in the URL
```bash
curl --location -uroot:taosdata 'http://127.0.0.1:6041/rest/sql/power' \
--data 'CREATE STABLE IF NOT EXISTS meters (ts TIMESTAMP, current FLOAT, voltage INT, phase FLOAT) TAGS (groupId INT, location BINARY(24))'
```
</TabItem>
</Tabs>
:::note
It is recommended to construct SQL statements using the `<dbName>.<tableName>` format; using the `USE DBName` approach in the application is not recommended.
:::
## Insert Data
Using a smart meter as an example, below demonstrates how to execute SQL to insert data into the `meters` supertable in the `power` database. The example uses TDengine's automatic table creation SQL syntax to write 3 data entries into the `d1001` subtable and 1 data entry into the `d1002` subtable, and then prints the actual number of inserted data entries.
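For reference, the insert statement used by the examples below is equivalent to the following SQL, which mirrors the REST example at the end of this subsection.
```sql
INSERT INTO
    power.d1001 USING power.meters TAGS(2, 'California.SanFrancisco')
        VALUES (NOW + 1a, 10.30000, 219, 0.31000)
               (NOW + 2a, 12.60000, 218, 0.33000)
               (NOW + 3a, 12.30000, 221, 0.31000)
    power.d1002 USING power.meters TAGS(3, 'California.SanFrancisco')
        VALUES (NOW + 1a, 10.30000, 218, 0.25000);
```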
<Tabs defaultValue="java" groupId="lang">
<TabItem value="java" label="Java">
```java
{{#include docs/examples/java/src/main/java/com/taos/example/JdbcInsertDataDemo.java:insert_data}}
```
:::note
NOW is an internal function that defaults to the current time of the client's computer. NOW + 1s means the client's current time plus 1 second; the number after represents the time unit: a (milliseconds), s (seconds), m (minutes), h (hours), d (days), w (weeks), n (months), y (years).
:::
</TabItem>
<TabItem label="Python" value="python">
```python title="Websocket Connection"
{{#include docs/examples/python/insert_ws.py}}
```
```python title="Native Connection"
{{#include docs/examples/python/insert_native.py}}
```
```python title="REST Connection"
{{#include docs/examples/python/insert_rest.py}}
```
</TabItem>
<TabItem label="Go" value="go">
```go
{{#include docs/examples/go/sqlquery/main.go:insert_data}}
```
</TabItem>
<TabItem label="Rust" value="rust">
```rust
{{#include docs/examples/rust/nativeexample/examples/insert.rs:insert_data}}
```
</TabItem>
<TabItem label="Node.js" value="node">
```js
{{#include docs/examples/node/websocketexample/sql_example.js:insertData}}
```
</TabItem>
<TabItem label="C#" value="csharp">
```csharp
{{#include docs/examples/csharp/wsInsert/Program.cs:insert_data}}
```
</TabItem>
<TabItem label="C" value="c">
```c title="Websocket Connection"
{{#include docs/examples/c-ws/insert_data_demo.c:insert_data}}
```
```c title="Native Connection"
{{#include docs/examples/c/insert_data_demo.c:insert_data}}
```
:::note
NOW is an internal function that defaults to the current time of the client's computer. NOW + 1s means the client's current time plus 1 second; the number after represents the time unit: a (milliseconds), s (seconds), m (minutes), h (hours), d (days), w (weeks), n (months), y (years).
:::
</TabItem>
<TabItem label="REST API" value="rest">
Write Data
```bash
curl --location -uroot:taosdata 'http://127.0.0.1:6041/rest/sql' \
--data 'INSERT INTO power.d1001 USING power.meters TAGS(2,'\''California.SanFrancisco'\'') VALUES (NOW + 1a, 10.30000, 219, 0.31000) (NOW + 2a, 12.60000, 218, 0.33000) (NOW + 3a, 12.30000, 221, 0.31000) power.d1002 USING power.meters TAGS(3, '\''California.SanFrancisco'\'') VALUES (NOW + 1a, 10.30000, 218, 0.25000)'
```
</TabItem>
</Tabs>
## Query Data
Using a smart meter as an example, below demonstrates how to execute SQL using various language connectors to query data, retrieving up to 100 rows from the `meters` supertable in the `power` database and printing the results line by line.
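For reference, the query used by the examples below is equivalent to the following SQL, which mirrors the REST example at the end of this subsection.
```sql
SELECT ts, current, location FROM power.meters LIMIT 100;
```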
<Tabs defaultValue="java" groupId="lang">
<TabItem label="Java" value="java">
```java
{{#include docs/examples/java/src/main/java/com/taos/example/JdbcQueryDemo.java:query_data}}
```
:::note
Query operations are consistent with relational databases. When accessing return field content using indexes, start from 1; it is recommended to use field names for retrieval.
:::
</TabItem>
<TabItem label="Python" value="python">
```python title="Websocket Connection"
{{#include docs/examples/python/query_ws.py}}
```
```python title="Native Connection"
{{#include docs/examples/python/query_native.py}}
```
```python title="REST Connection"
{{#include docs/examples/python/query_rest.py}}
```
</TabItem>
<TabItem label="Go" value="go">
```go
{{#include docs/examples/go/sqlquery/main.go:select_data}}
```
</TabItem>
<TabItem label="Rust" value="rust">
```rust
{{#include docs/examples/rust/nativeexample/examples/query.rs:query_data}}
```
The Rust connector also supports using **serde** for deserialization to obtain results as structured data:
```rust
{{#include docs/examples/rust/nativeexample/examples/query.rs:query_data_2}}
```
</TabItem>
<TabItem label="Node.js" value="node">
```js
{{#include docs/examples/node/websocketexample/sql_example.js:queryData}}
```
</TabItem>
<TabItem label="C#" value="csharp">
```csharp
{{#include docs/examples/csharp/wsInsert/Program.cs:select_data}}
```
</TabItem>
<TabItem label="C" value="c">
```c title="Websocket Connection"
{{#include docs/examples/c-ws/query_data_demo.c:query_data}}
```
```c title="Native Connection"
{{#include docs/examples/c/query_data_demo.c:query_data}}
```
</TabItem>
<TabItem label="REST API" value="rest">
Query Data
```bash
curl --location -uroot:taosdata 'http://127.0.0.1:6041/rest/sql' \
--data 'SELECT ts, current, location FROM power.meters limit 100'
```
</TabItem>
</Tabs>
## Execute SQL with reqId
reqId can be used for request tracing. It acts similarly to traceId in distributed systems. A request may need to go through multiple services or modules to complete. reqId is used to identify and associate all related operations for this request, making it easier to trace and analyze the complete path of the request.
Benefits of using reqId include:
- Request tracing: By associating the same reqId with all related operations of a request, you can trace the complete path of the request within the system.
- Performance analysis: Analyzing a request's reqId allows you to understand the processing time across various services and modules, helping to identify performance bottlenecks.
- Fault diagnosis: When a request fails, you can find out where the issue occurred by examining the reqId associated with that request.
If users do not set a reqId, the connector will randomly generate one internally, but it is recommended to set it explicitly for better association with user requests.
Below are code samples for setting reqId while executing SQL with various language connectors.
<Tabs defaultValue="java" groupId="lang">
<TabItem label="Java" value="java">
```java
{{#include docs/examples/java/src/main/java/com/taos/example/JdbcReqIdDemo.java:with_reqid}}
```
</TabItem>
<TabItem label="Python" value="python">
```python title="Websocket Connection"
{{#include docs/examples/python/reqid_ws.py}}
```
```python title="Native Connection"
{{#include docs/examples/python/reqid_native.py}}
```
```python title="REST Connection"
{{#include docs/examples/python/reqid_rest.py}}
```
</TabItem>
<TabItem label="Go" value="go">
```go
{{#include docs/examples/go/queryreqid/main.go:query_id}}
```
</TabItem>
<TabItem label="Rust" value="rust">
```rust
{{#include docs/examples/rust/nativeexample/examples/query.rs:query_with_req_id}}
```
</TabItem>
<TabItem label="Node.js" value="node">
```js
{{#include docs/examples/node/websocketexample/sql_example.js:sqlWithReqid}}
```
</TabItem>
<TabItem label="C#" value="csharp">
```csharp
{{#include docs/examples/csharp/wsInsert/Program.cs:query_id}}
```
</TabItem>
<TabItem label="C" value="c">
```c "Websocket Connection"
{{#include docs/examples/c-ws/with_reqid_demo.c:with_reqid}}
```
```c "Native Connection"
{{#include docs/examples/c/with_reqid_demo.c:with_reqid}}
```
</TabItem>
<TabItem label="REST API" value="rest">
Query Data, specifying reqId as 3
```bash
curl --location -uroot:taosdata 'http://127.0.0.1:6041/rest/sql?req_id=3' \
--data 'SELECT ts, current, location FROM power.meters limit 1'
```
</TabItem>
</Tabs>
@ -0,0 +1,362 @@
---
title: Ingesting Data in Schemaless Mode
sidebar_label: Schemaless Ingestion
slug: /developer-guide/schemaless-ingestion
---
import Tabs from "@theme/Tabs";
import TabItem from "@theme/TabItem";
In Internet of Things (IoT) applications, it is often necessary to collect a large number of data points to achieve various functionalities such as automated management, business analysis, and device monitoring. However, due to reasons like version upgrades of application logic and adjustments in the hardware of devices, the data collection items may change frequently. To address this challenge, TDengine provides a schemaless writing mode aimed at simplifying the data recording process.
By using schemaless writing, users do not need to pre-create supertables or subtables, as TDengine automatically creates the corresponding storage structure based on the actual data written. Additionally, when necessary, schemaless writing can also automatically add required data columns or tag columns to ensure that the data written by users can be stored correctly.
It is worth noting that supertables and their corresponding subtables created through schemaless writing are functionally indistinguishable from those created directly via SQL. Users can still use SQL to write data into these tables directly. However, the table names generated through schemaless writing are derived from the tag values according to a fixed mapping rule, which may result in names that are not very readable.
**When using schemaless writing, tables are automatically created, and there is no need to manually create tables. Manual table creation may result in unknown errors.**
## Schemaless Writing Line Protocol
TDengine's schemaless writing line protocol is compatible with InfluxDB's line protocol, OpenTSDB's telnet line protocol, and OpenTSDB's JSON format protocol. For the standard writing protocols of InfluxDB and OpenTSDB, please refer to their respective official documentation.
The following introduces the protocol based on InfluxDB's line protocol, along with the extensions made by TDengine. This protocol allows users to control (supertable) schemas in a more granular way. A string can express a data row, and multiple rows can be passed to the writing API at once as multiple strings. The format is specified as follows.
```text
measurement,tag_set field_set timestamp
```
The parameters are described as follows:
- measurement is the table name, separated from tag_set by a comma.
- tag_set has the format `<tag_key>=<tag_value>,<tag_key>=<tag_value>`, indicating the tag column data; key-value pairs are separated by commas, and tag_set is separated from field_set by a space.
- field_set has the format `<field_key>=<field_value>,<field_key>=<field_value>`, indicating the ordinary column data; key-value pairs are also separated by commas, and field_set is separated from the timestamp by a space.
- timestamp is the primary key timestamp corresponding to this data row.
- Schemaless writing does not support writing data to tables with a second primary key column.
All data in tag_set is automatically converted to the nchar data type and does not require double quotes.
In the schemaless writing data row protocol, each data item in field_set needs to describe its own data type with specific requirements as follows:
- If surrounded by double quotes, it indicates varchar type, e.g., "abc".
- If surrounded by double quotes and prefixed with L or l, it indicates nchar type, e.g., L" error message ".
- If surrounded by double quotes and prefixed with G or g, it indicates geometry type, e.g., G"Point(4.343 89.342)".
- If surrounded by double quotes and prefixed with B or b, it indicates varbinary type, where the quoted string can start with \x for hexadecimal or be a regular string, e.g., B"\x98f46e" and B"hello".
- Characters such as spaces, equals signs (=), commas (,), double quotes ("), and backslashes (\) must be escaped with a preceding backslash (\) (all symbols are ASCII half-width characters). The per-domain escaping rules for the schemaless writing protocol are shown in the following table.
| **No.** | **Domain** | **Characters to Escape** |
| -------- | -------- | ---------------- |
| 1 | Supertable Name | Comma, space |
| 2 | Tag Name | Comma, equals sign, space |
| 3 | Tag Value | Comma, equals sign, space |
| 4 | Column Name | Comma, equals sign, space |
| 5 | Column Value | Double quotes, backslash |
If two consecutive backslashes are used, the first backslash acts as an escape character; if only one backslash is used, it does not need to be escaped. The backslash escape rules for the schemaless writing protocol are shown in the following table.
| **No.** | **Backslash** | **Escaped as** |
| -------- | ------------ | ---------- |
| 1 | \ | \ |
| 2 | \\\\ | \ |
| 3 | \\\\\\ | \\\\ |
| 4 | \\\\\\\\ | \\\\ |
| 5 | \\\\\\\\\\ | \\\\\\ |
| 6 | \\\\\\\\\\\\ | \\\\\\ |
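For illustration, here is a small hypothetical Python helper (not part of any TDengine API) that applies the tag-value escaping rules from the domain table above before a value is placed into a line-protocol string:
```python
def escape_tag_value(value: str) -> str:
    """Escape a tag value for the InfluxDB-style line protocol.

    Per the domain table above, commas, equals signs, and spaces in a
    tag value must be preceded by a backslash; literal backslashes are
    governed by the separate backslash rules and are left untouched here.
    """
    for ch in (",", "=", " "):
        value = value.replace(ch, "\\" + ch)
    return value

# Prints: California.San\ Francisco
print(escape_tag_value("California.San Francisco"))
```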
Numeric types are distinguished by suffixes. The suffix-to-type mapping rules for the schemaless writing protocol are shown in the following table.
| **No.** | **Suffix** | **Mapped Type** | **Size (bytes)** |
| -------- | ----------- | ----------------------------- | -------------- |
| 1 | None or f64 | double | 8 |
| 2 | f32 | float | 4 |
| 3 | i8/u8 | TinyInt/UTinyInt | 1 |
| 4 | i16/u16 | SmallInt/USmallInt | 2 |
| 5 | i32/u32 | Int/UInt | 4 |
| 6 | i64/i/u64/u | BigInt/BigInt/UBigInt/UBigInt | 8 |
- t, T, true, True, TRUE, f, F, false, False will be treated directly as BOOL type.
For example, the following data row represents: writing to a subtable under the supertable named `st`, with tags t1 as "3" (NCHAR), t2 as "4" (NCHAR), and t3 as "t3" (NCHAR), writing column c1 as 3 (BIGINT), column c2 as false (BOOL), column c3 as "passit" (BINARY), and column c4 as 4 (DOUBLE), with a primary key timestamp of 1626006833639000000.
```json
st,t1=3,t2=4,t3=t3 c1=3i64,c3="passit",c2=false,c4=4f64 1626006833639000000
```
It is important to note that using the wrong case for a data type suffix, or specifying an incorrect data type, will produce an error message and cause the data write to fail.
TDengine provides idempotency guarantees for data writing, meaning you can safely call the write API repeatedly for data that previously failed to be written. However, it does not provide atomicity guarantees for writing multiple rows of data. This means that during a batch write, some rows may be written successfully while others fail.
## Schemaless Writing Processing Rules
Schemaless writing processes row data according to the following principles:
1. The following rules are used to generate subtable names: first, the measurement name and the tag keys and values are combined into the following string.
```json
"measurement,tag_key1=tag_value1,tag_key2=tag_value2"
```
- Note that tag_key1, tag_key2 here are not in the original order entered by the user, but are sorted alphabetically by tag name. Therefore, tag_key1 is not necessarily the first tag entered in the line protocol.
After sorting, the MD5 hash value "md5_val" of the string is calculated, and the table name is generated as "t_md5_val". The prefix "t_" is fixed, and every table generated by this mapping rule has this prefix (a sketch of this mapping is shown after this list).
- If you do not want to use the automatically generated table name, there are two ways to specify subtable names (the first one has a higher priority).
1. Specify it by configuring the `smlAutoChildTableNameDelimiter` parameter in `taos.cfg` (the delimiter cannot be `@`, `#`, space, carriage return, newline, or tab).
1. For example: if configured `smlAutoChildTableNameDelimiter=-` and the data inserted is `st,t0=cpu1,t1=4 c1=3 1626006833639000000`, the created table name will be `cpu1-4`.
2. Specify it by configuring the `smlChildTableName` parameter in `taos.cfg`.
1. For example: if `smlChildTableName=tname` is configured and the data inserted is `st,tname=cpu1,t1=4 c1=3 1626006833639000000`, the created table name will be `cpu1`. Note that if multiple rows of data have the same `tname` but different tag sets, only the tag set of the first row is used when the table is automatically created; the tag sets of the other rows are ignored.
2. If the supertable obtained from parsing the row protocol does not exist, it will be created (it is not recommended to manually create supertables; otherwise, data insertion may be abnormal).
3. If the subtable obtained from parsing the row protocol does not exist, Schemaless will create the subtable according to the names determined in steps 1 or 2.
4. If the tag columns or ordinary columns specified in the data row do not exist, the corresponding tag columns or ordinary columns will be added to the supertable (only additions are allowed).
5. If some tag columns or ordinary columns exist in the supertable but are not specified with values in a data row, those columns will be set to NULL in that row.
6. For BINARY or NCHAR columns, if the provided value's length exceeds the column type limit, the allowable character length of that column will be automatically increased (only increments are allowed) to ensure complete data storage.
7. Any errors encountered during the entire process will interrupt the writing process and return an error code.
8. To improve writing efficiency, it is assumed by default that the order of field_set is the same for all rows of the same supertable (the first row contains all fields, and subsequent rows follow that order). If the order differs, the parameter `smlDataFormat` must be configured as false; otherwise, the data will be written in the assumed order and the data stored in the database will be incorrect. Starting from version 3.0.3.0, order consistency is detected automatically and this configuration is deprecated.
9. Since SQL table names do not support dots (.), schemaless writing also handles dots (.). If the table name generated by schemaless writing contains a dot (.), it will be automatically replaced with an underscore (_). If a subtable name is specified manually, dots (.) in the name will also be converted to underscores (_).
10. The `taos.cfg` configuration file has added the `smlTsDefaultName` configuration (value as a string), which only takes effect on the client side. After configuration, the timestamp column name of the automatically created supertable can be set through this configuration. If not configured, it defaults to `_ts`.
11. The names of supertables or subtables created through schemaless writing are case-sensitive.
12. Schemaless writing still adheres to TDengine's underlying limitations on data structures, such as the total length of each row of data not exceeding 48KB (64KB from version 3.0.5.0), and the total length of tag values not exceeding 16KB.
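As referenced in rule 1 above, the default subtable name is derived from the measurement name and the alphabetically sorted tags. The following minimal Python sketch illustrates the idea only; the exact normalization performed by the server (for example, escape handling or the dot-to-underscore replacement in rule 9) may differ, so it should not be relied on to predict real table names.
```python
import hashlib

def default_subtable_name(measurement: str, tags: dict) -> str:
    # Rule 1: tags are sorted alphabetically by tag name, not by input order.
    sorted_tags = ",".join(f"{k}={v}" for k, v in sorted(tags.items()))
    key = f"{measurement},{sorted_tags}"
    md5_val = hashlib.md5(key.encode("utf-8")).hexdigest()
    # The fixed "t_" prefix marks every table generated by this mapping.
    return "t_" + md5_val

# For the line: st,t1=3,t2=4,t3=t3 c1=3i64 1626006833639000000
print(default_subtable_name("st", {"t1": "3", "t2": "4", "t3": "t3"}))
```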
## Time Resolution Identification
Schemaless writing supports three line protocol modes, as shown in the table below:
| **No.** | **Value** | **Description** |
| -------- | ------------------- | ----------------------------- |
| 1 | SML_LINE_PROTOCOL | InfluxDB Line Protocol |
| 2 | SML_TELNET_PROTOCOL | OpenTSDB Telnet Protocol |
| 3 | SML_JSON_PROTOCOL | JSON Format Protocol |
In SML_LINE_PROTOCOL parsing mode, users need to specify the time resolution of the input timestamps. Available time resolutions are as follows:
| **No.** | **Time Resolution Definition** | **Meaning** |
| -------- | --------------------------------- | -------------- |
| 1 | TSDB_SML_TIMESTAMP_NOT_CONFIGURED | Undefined (Invalid) |
| 2 | TSDB_SML_TIMESTAMP_HOURS | Hours |
| 3 | TSDB_SML_TIMESTAMP_MINUTES | Minutes |
| 4 | TSDB_SML_TIMESTAMP_SECONDS | Seconds |
| 5 | TSDB_SML_TIMESTAMP_MILLI_SECONDS | Milliseconds |
| 6 | TSDB_SML_TIMESTAMP_MICRO_SECONDS | Microseconds |
| 7 | TSDB_SML_TIMESTAMP_NANO_SECONDS | Nanoseconds |
In SML_TELNET_PROTOCOL and SML_JSON_PROTOCOL modes, the time precision is determined by the length of the timestamps (this is the same as the standard operation method for OpenTSDB), and the user-specified time resolution will be ignored.
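As a rough illustration of the length-based rule (an assumption modeled on the OpenTSDB convention of 10-digit second and 13-digit millisecond timestamps, not an actual TDengine API):
```python
def infer_opentsdb_precision(ts: int) -> str:
    # Sketch of length-based precision detection for TELNET/JSON payloads;
    # the exact ranges accepted by the parser may differ from this simplification.
    digits = len(str(abs(ts)))
    if digits <= 10:
        return "seconds"
    if digits == 13:
        return "milliseconds"
    raise ValueError(f"unsupported timestamp length: {digits} digits")

print(infer_opentsdb_precision(1626006833))     # seconds
print(infer_opentsdb_precision(1626006833639))  # milliseconds
```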
## Data Schema Mapping Rules
Data from the InfluxDB line protocol is mapped to a schema-based structure, where the measurement maps to the supertable name, the tag names in tag_set map to the tag names in the schema, and the names in field_set map to column names. For example, consider the following data:
```json
st,t1=3,t2=4,t3=t3 c1=3i64,c3="passit",c2=false,c4=4f64 1626006833639000000
```
This data row generates a supertable named `st`, which includes three nchar-type tags: `t1`, `t2`, `t3`, and five data columns: `ts` (timestamp), `c1` (bigint), `c3` (binary), `c2` (bool), `c4` (double). It maps to the following SQL statement:
```json
create stable st (_ts timestamp, c1 bigint, c2 bool, c3 binary(6), c4 double) tags(t1 nchar(1), t2 nchar(1), t3 nchar(2))
```
## Data Schema Change Handling
This section explains the impact on the data schema under different row data writing scenarios.
When writing with a clearly identified field type using the line protocol, changing the field type definition of that field later will result in a clear schema error, triggering the writing API to report an error. As shown below,
```json
st,t1=3,t2=4,t3=t3 c1=3i64,c3="passit",c2=false,c4=4 1626006833639000000
st,t1=3,t2=4,t3=t3 c1=3i64,c3="passit",c2=false,c4=4i 1626006833640000000
```
The first row maps the data type of column `c4` to Double, but the second row specifies this column as BigInt through the numerical suffix, triggering a parsing error in schemaless writing.
If a row protocol indicates that a column is binary, but subsequent rows require a longer binary length, this will trigger a change in the supertable mode.
```json
st,t1=3,t2=4,t3=t3 c1=3i64,c5="pass" 1626006833639000000
st,t1=3,t2=4,t3=t3 c1=3i64,c5="passit" 1626006833640000000
```
In the first row, the line protocol parsing declares `c5` as a binary(4) field; the second row still parses `c5` as a binary column, but its width is increased to accommodate the new string.
```json
st,t1=3,t2=4,t3=t3 c1=3i64 1626006833639000000
st,t1=3,t2=4,t3=t3 c1=3i64,c6="passit" 1626006833640000000
```
Compared to the first row, the second row contains an additional column `c6`, so a new column `c6` of type binary(6) is automatically added to the supertable.
## Example of Schemaless Writing
Using the smart meter as an example, here are code samples demonstrating how various language connectors use the schemaless writing interface to write data, covering three protocols: InfluxDB's line protocol, OpenTSDB's TELNET line protocol, and OpenTSDB's JSON format protocol.
:::note
- Since the automatic table creation rules for schemaless writing differ from those in previous SQL example sections, please ensure that the `meters`, `metric_telnet`, and `metric_json` tables do not exist before running the code samples.
- The TELNET line protocol and JSON format protocol of OpenTSDB only support a single data column, so other examples have been used.
:::
### Websocket Connection
<Tabs defaultValue="java" groupId="lang">
<TabItem value="java" label="Java">
```java
{{#include docs/examples/java/src/main/java/com/taos/example/SchemalessWsTest.java:schemaless}}
```
Execute schemaless writing with reqId, the last parameter reqId can be used for request tracing.
```java
writer.write(lineDemo, SchemalessProtocolType.LINE, SchemalessTimestampType.NANO_SECONDS, 1L);
```
</TabItem>
<TabItem label="Python" value="python">
```python
{{#include docs/examples/python/schemaless_ws.py}}
```
</TabItem>
<TabItem label="Go" value="go">
```go
{{#include docs/examples/go/schemaless/ws/main.go}}
```
</TabItem>
<TabItem label="Rust" value="rust">
```rust
{{#include docs/examples/rust/restexample/examples/schemaless.rs}}
```
</TabItem>
<TabItem label="Node.js" value="node">
```js
{{#include docs/examples/node/websocketexample/line_example.js}}
```
</TabItem>
<TabItem label="C#" value="csharp">
```csharp
{{#include docs/examples/csharp/wssml/Program.cs:main}}
```
</TabItem>
<TabItem label="C" value="c">
```c
{{#include docs/examples/c-ws/sml_insert_demo.c:schemaless}}
```
</TabItem>
<TabItem label="REST API" value="rest">
Not supported
</TabItem>
</Tabs>
### Native Connection
<Tabs defaultValue="java" groupId="lang">
<TabItem label="Java" value="java">
```java
{{#include docs/examples/java/src/main/java/com/taos/example/SchemalessJniTest.java:schemaless}}
```
Execute schemaless writing with reqId, the last parameter reqId can be used for request tracing.
```java
writer.write(lineDemo, SchemalessProtocolType.LINE, SchemalessTimestampType.NANO_SECONDS, 1L);
```
</TabItem>
<TabItem label="Python" value="python">
```python
{{#include docs/examples/python/schemaless_native.py}}
```
</TabItem>
<TabItem label="Go" value="go">
```go
{{#include docs/examples/go/schemaless/native/main.go}}
```
</TabItem>
<TabItem label="Rust" value="rust">
```rust
{{#include docs/examples/rust/nativeexample/examples/schemaless.rs}}
```
</TabItem>
<TabItem label="Node.js" value="node">
Not supported
</TabItem>
<TabItem label="C#" value="csharp">
```csharp
{{#include docs/examples/csharp/nativesml/Program.cs:main}}
```
</TabItem>
<TabItem label="C" value="c">
```c
{{#include docs/examples/c/sml_insert_demo.c:schemaless}}
```
</TabItem>
<TabItem label="REST API" value="rest">
Not supported
</TabItem>
</Tabs>
## Querying Written Data
Running the code samples from the previous section will automatically create tables in the power database. We can query the data through the taos shell or application. Below are examples of querying supertables and the meters table data using the taos shell.
```shell
taos> show power.stables;
stable_name |
=================================
meter_current |
stb0_0 |
meters |
Query OK, 3 row(s) in set (0.002527s)
taos> select * from power.meters limit 1 \G;
*************************** 1.row ***************************
_ts: 2021-07-11 20:33:53.639
current: 10.300000199999999
voltage: 219
phase: 0.310000000000000
groupid: 2
location: California.SanFrancisco
Query OK, 1 row(s) in set (0.004501s)
```
@ -0,0 +1,158 @@
---
title: Ingesting Data in Parameter Binding Mode
sidebar_label: Parameter Binding
slug: /developer-guide/parameter-binding
---
import Tabs from "@theme/Tabs";
import TabItem from "@theme/TabItem";
Using parameter binding for writing data can avoid the resource consumption of SQL syntax parsing, thus significantly improving writing performance. The reasons parameter binding can enhance writing efficiency include:
- **Reduced Parsing Time**: With parameter binding, the structure of the SQL statement is determined upon the first execution. Subsequent executions only need to replace the parameter values, thereby avoiding syntax parsing for each execution, which reduces parsing time.
- **Precompilation**: When using parameter binding, SQL statements can be precompiled and cached. When executing with different parameter values later, the precompiled version can be used directly, improving execution efficiency.
- **Reduced Network Overhead**: Parameter binding can also reduce the amount of data sent to the database since only parameter values need to be sent rather than the full SQL statement. This difference is particularly noticeable when executing a large number of similar insert or update operations.
**Tip: Parameter binding is the recommended method for writing data.**
Next, we will continue using smart meters as an example to demonstrate how various language connectors efficiently write data using parameter binding:
1. Prepare a parameterized SQL insert statement for inserting data into the supertable `meters`. This statement allows dynamically specifying the subtable name, tags, and column values.
2. Loop to generate multiple subtables and their corresponding data rows. For each subtable:
- Set the subtable name and tag values (group ID and location).
- Generate multiple rows of data, each including a timestamp, randomly generated current, voltage, and phase values.
- Execute batch insert operations to insert these data rows into the corresponding subtable.
3. Finally, print the actual number of rows inserted into the table.
## Websocket Connection
<Tabs defaultValue="java" groupId="lang">
<TabItem value="java" label="Java">
```java
{{#include docs/examples/java/src/main/java/com/taos/example/WSParameterBindingBasicDemo.java:para_bind}}
```
Here is a [more detailed parameter binding example](https://github.com/taosdata/TDengine/blob/main/docs/examples/java/src/main/java/com/taos/example/WSParameterBindingFullDemo.java).
</TabItem>
<TabItem label="Python" value="python">
```python
{{#include docs/examples/python/stmt_ws.py}}
```
</TabItem>
<TabItem label="Go" value="go">
```go
{{#include docs/examples/go/stmt/ws/main.go}}
```
</TabItem>
<TabItem label="Rust" value="rust">
```rust
{{#include docs/examples/rust/restexample/examples/stmt.rs}}
```
</TabItem>
<TabItem label="Node.js" value="node">
```js
{{#include docs/examples/node/websocketexample/stmt_example.js:createConnect}}
```
</TabItem>
<TabItem label="C#" value="csharp">
```csharp
{{#include docs/examples/csharp/wsStmt/Program.cs:main}}
```
</TabItem>
<TabItem label="C" value="c">
```c
{{#include docs/examples/c-ws/stmt_insert_demo.c}}
```
</TabItem>
<TabItem label="REST API" value="rest">
Not supported
</TabItem>
</Tabs>
## Native Connection
<Tabs defaultValue="java" groupId="lang">
<TabItem label="Java" value="java">
```java
{{#include docs/examples/java/src/main/java/com/taos/example/ParameterBindingBasicDemo.java:para_bind}}
```
Here is a [more detailed parameter binding example](https://github.com/taosdata/TDengine/blob/main/docs/examples/java/src/main/java/com/taos/example/ParameterBindingFullDemo.java).
</TabItem>
<TabItem label="Python" value="python">
```python
{{#include docs/examples/python/stmt_native.py}}
```
</TabItem>
<TabItem label="Go" value="go">
```go
{{#include docs/examples/go/stmt/native/main.go}}
```
</TabItem>
<TabItem label="Rust" value="rust">
```rust
{{#include docs/examples/rust/nativeexample/examples/stmt.rs}}
```
</TabItem>
<TabItem label="Node.js" value="node">
Not supported
</TabItem>
<TabItem label="C#" value="csharp">
```csharp
{{#include docs/examples/csharp/stmtInsert/Program.cs:main}}
```
</TabItem>
<TabItem label="C" value="c">
```c
{{#include docs/examples/c/stmt_insert_demo.c}}
```
</TabItem>
<TabItem label="REST API" value="rest">
Not supported
</TabItem>
</Tabs>
1121
docs/en/07-develop/07-tmq.md Normal file

File diff suppressed because it is too large
@ -0,0 +1,935 @@
---
sidebar_label: User-Defined Functions
title: User-Defined Functions (UDF)
slug: /developer-guide/user-defined-functions
---
## Introduction to UDF
In certain application scenarios, the query functions required by the application logic cannot be directly implemented using built-in functions. TDengine allows the writing of User-Defined Functions (UDF) to address the specific needs in such scenarios. Once the UDF is successfully registered in the cluster, it can be called in SQL just like system-built-in functions, with no difference in usage. UDFs are divided into scalar functions and aggregate functions. Scalar functions output a value for each row of data, such as calculating the absolute value (abs), sine function (sin), string concatenation function (concat), etc. Aggregate functions output a value for multiple rows of data, such as calculating the average (avg) or maximum value (max).
TDengine supports writing UDFs in both C and Python programming languages. UDFs written in C have performance almost identical to built-in functions, while those written in Python can leverage the rich Python computation libraries. To prevent exceptions during UDF execution from affecting database services, TDengine uses process separation technology to execute UDFs in another process. Even if a user-defined UDF crashes, it will not affect the normal operation of TDengine.
## Developing UDFs in C
When implementing UDFs in C, it is necessary to implement the specified interface functions:
- Scalar functions need to implement the scalar interface function `scalarfn`.
- Aggregate functions need to implement the aggregate interface functions `aggfn_start`, `aggfn`, `aggfn_finish`.
- If initialization is required, implement `udf_init`.
- If cleanup is required, implement `udf_destroy`.
### Interface Definition
The name of the interface function is the UDF name or a combination of the UDF name and specific suffixes (\_start, \_finish, \_init, \_destroy). The function names described later, such as `scalarfn` and `aggfn`, need to be replaced with the UDF name.
#### Scalar Function Interface
A scalar function is a function that converts input data to output data, typically used for calculations and transformations on a single data value. The prototype for the scalar function interface is as follows.
```c
int32_t scalarfn(SUdfDataBlock* inputDataBlock, SUdfColumn *resultColumn);
```
The main parameter descriptions are as follows:
- `inputDataBlock`: The input data block.
- `resultColumn`: The output column.
#### Aggregate Function Interface
An aggregate function is a special function used to group and calculate data to generate summary information. The workings of an aggregate function are as follows:
- Initialize the result buffer: First, call the `aggfn_start` function to generate a result buffer for storing intermediate results.
- Group data: Relevant data will be divided into multiple row data blocks, each containing a set of data with the same grouping key.
- Update intermediate results: For each data block, call the `aggfn` function to update the intermediate results. The `aggfn` function will compute the data according to the type of aggregate function (such as sum, avg, count, etc.) and store the calculation results in the result buffer.
- Generate final results: After updating the intermediate results of all data blocks, call the `aggfn_finish` function to extract the final result from the result buffer. The final result will contain either 0 or 1 piece of data, depending on the type of aggregate function and the input data.
The prototype for the aggregate function interface is as follows.
```c
int32_t aggfn_start(SUdfInterBuf *interBuf);
int32_t aggfn(SUdfDataBlock* inputBlock, SUdfInterBuf *interBuf, SUdfInterBuf *newInterBuf);
int32_t aggfn_finish(SUdfInterBuf* interBuf, SUdfInterBuf *result);
```
Where `aggfn` is a placeholder for the function name. First, call `aggfn_start` to generate the result buffer, then the relevant data will be divided into multiple row data blocks, and the `aggfn` function will be called for each data block to update the intermediate results. Finally, call `aggfn_finish` to produce the final result from the intermediate results, which can only contain 0 or 1 result data.
The main parameter descriptions are as follows:
- `interBuf`: The intermediate result buffer.
- `inputBlock`: The input data block.
- `newInterBuf`: The new intermediate result buffer.
- `result`: The final result.
#### Initialization and Destruction Interfaces
The initialization and destruction interfaces are shared by both scalar and aggregate functions, with the relevant APIs as follows.
```c
int32_t udf_init()
int32_t udf_destroy()
```
The `udf_init` function performs initialization, while the `udf_destroy` function handles cleanup. If there is no initialization work, there is no need to define the `udf_init` function; if there is no cleanup work, there is no need to define the `udf_destroy` function.
### Scalar Function Template
The template for developing scalar functions in C is as follows.
```c
#include "taos.h"
#include "taoserror.h"
#include "taosudf.h"
// Initialization function.
// If no initialization, we can skip definition of it.
// The initialization function shall be concatenation of the udf name and _init suffix.
// @return error number defined in taoserror.h
int32_t scalarfn_init() {
// initialization.
return TSDB_CODE_SUCCESS;
}
// Scalar function main computation function.
// @param inputDataBlock, input data block composed of multiple columns with each column defined by SUdfColumn
// @param resultColumn, output column
// @return error number defined in taoserror.h
int32_t scalarfn(SUdfDataBlock* inputDataBlock, SUdfColumn* resultColumn) {
// read data from inputDataBlock and process, then output to resultColumn.
return TSDB_CODE_SUCCESS;
}
// Cleanup function.
// If no cleanup related processing, we can skip definition of it.
// The destroy function shall be concatenation of the udf name and _destroy suffix.
// @return error number defined in taoserror.h
int32_t scalarfn_destroy() {
// clean up
return TSDB_CODE_SUCCESS;
}
```
### Aggregate Function Template
The template for developing aggregate functions in C is as follows.
```c
#include "taos.h"
#include "taoserror.h"
#include "taosudf.h"
// Initialization function.
// If no initialization, we can skip definition of it.
// The initialization function shall be concatenation of the udf name and _init suffix.
// @return error number defined in taoserror.h
int32_t aggfn_init() {
// initialization.
return TSDB_CODE_SUCCESS;
}
// Aggregate start function.
// The intermediate value or the state(@interBuf) is initialized in this function.
// The function name shall be concatenation of udf name and _start suffix.
// @param interbuf intermediate value to initialize
// @return error number defined in taoserror.h
int32_t aggfn_start(SUdfInterBuf* interBuf) {
// initialize intermediate value in interBuf
return TSDB_CODE_SUCCESS;
}
// Aggregate reduce function.
// This function aggregate old state(@interbuf) and one data bock(inputBlock) and output a new state(@newInterBuf).
// @param inputBlock input data block
// @param interBuf old state
// @param newInterBuf new state
// @return error number defined in taoserror.h
int32_t aggfn(SUdfDataBlock* inputBlock, SUdfInterBuf *interBuf, SUdfInterBuf *newInterBuf) {
// read from inputBlock and interBuf and output to newInterBuf
return TSDB_CODE_SUCCESS;
}
// Aggregate function finish function.
// This function transforms the intermediate value(@interBuf) into the final output(@result).
// The function name must be concatenation of aggfn and _finish suffix.
// @interBuf : intermediate value
// @result: final result
// @return error number defined in taoserror.h
int32_t aggfn_finish(SUdfInterBuf* interBuf, SUdfInterBuf *result) {
// read data from inputDataBlock and process, then output to result
return TSDB_CODE_SUCCESS;
}
// Cleanup function.
// If no cleanup related processing, we can skip definition of it.
// The destroy function shall be concatenation of the udf name and _destroy suffix.
// @return error number defined in taoserror.h
int32_t aggfn_destroy() {
// clean up
return TSDB_CODE_SUCCESS;
}
```
### Compilation
In TDengine, to implement UDFs, you need to write C source code and compile it into a dynamic link library file according to TDengine's specifications. Following the previously described rules, prepare the source code for the UDF `bit_and.c`. For the Linux operating system, execute the following command to compile and obtain the dynamic link library file.
```shell
gcc -g -O0 -fPIC -shared bit_and.c -o libbitand.so
```
To ensure reliable operation, it is recommended to use GCC version 7.5 or above.
### C UDF Data Structures
```c
typedef struct SUdfColumnMeta {
int16_t type;
int32_t bytes;
uint8_t precision;
uint8_t scale;
} SUdfColumnMeta;
typedef struct SUdfColumnData {
int32_t numOfRows;
int32_t rowsAlloc;
union {
struct {
int32_t nullBitmapLen;
char *nullBitmap;
int32_t dataLen;
char *data;
} fixLenCol;
struct {
int32_t varOffsetsLen;
int32_t *varOffsets;
int32_t payloadLen;
char *payload;
int32_t payloadAllocLen;
} varLenCol;
};
} SUdfColumnData;
typedef struct SUdfColumn {
SUdfColumnMeta colMeta;
bool hasNull;
SUdfColumnData colData;
} SUdfColumn;
typedef struct SUdfDataBlock {
int32_t numOfRows;
int32_t numOfCols;
SUdfColumn **udfCols;
} SUdfDataBlock;
typedef struct SUdfInterBuf {
int32_t bufLen;
char *buf;
int8_t numOfResult; //zero or one
} SUdfInterBuf;
```
The structure descriptions are as follows:
- `SUdfDataBlock` contains the number of rows `numOfRows` and the number of columns `numOfCols`. `udfCols[i]` (0 \<= i \<= `numOfCols`-1) represents each column of data, of type `SUdfColumn*`.
- `SUdfColumn` contains the data type definition `colMeta` and the data `colData`.
- The members of `SUdfColumnMeta` are defined similarly to the data type definitions in `taos.h`.
- `SUdfColumnData` can be of variable length, with `varLenCol` defining variable length data and `fixLenCol` defining fixed length data.
- `SUdfInterBuf` defines the intermediate structure buffer and the number of results in the buffer `numOfResult`.
To better operate on the above data structures, some utility functions are provided, defined in `taosudf.h`.
### C UDF Example Code
#### Scalar Function Example [bit_and](https://github.com/taosdata/TDengine/blob/3.0/tests/script/sh/bit_and.c)
`bit_and` implements the bitwise AND function for multiple columns. If there is only one column, it returns that column. `bit_and` ignores null values.
<details>
<summary>bit_and.c</summary>
```c
{{#include tests/script/sh/bit_and.c}}
```
</details>
#### Aggregate Function Example 1: Numeric Return Value [l2norm](https://github.com/taosdata/TDengine/blob/3.0/tests/script/sh/l2norm.c)
`l2norm` implements the L2 norm of all data in the input column, that is, it squares each value, sums them, and then takes the square root.
<details>
<summary>l2norm.c</summary>
```c
{{#include tests/script/sh/l2norm.c}}
```
</details>
#### Aggregate Function Example 2: String Return Value [max_vol](https://github.com/taosdata/TDengine/blob/3.0/tests/script/sh/max_vol.c)
`max_vol` finds the maximum voltage from multiple input voltage columns and returns a combined string value composed of device ID + the location (row, column) of the maximum voltage + the maximum voltage value.
Create table:
```bash
create table battery(ts timestamp, vol1 float, vol2 float, vol3 float, deviceId varchar(16));
```
Create custom function:
```bash
create aggregate function max_vol as '/root/udf/libmaxvol.so' outputtype binary(64) bufsize 10240 language 'C';
```
Use custom function:
```bash
select max_vol(vol1, vol2, vol3, deviceid) from battery;
```
<details>
<summary>max_vol.c</summary>
```c
{{#include tests/script/sh/max_vol.c}}
```
</details>
## Developing UDFs in Python
### Preparing the Environment
The specific steps to prepare the environment are as follows:
- Step 1: Prepare the Python runtime environment.
- Step 2: Install the Python package `taospyudf`. The command is as follows.
```shell
pip3 install taospyudf
```
- Step 3: Execute the command `ldconfig`.
- Step 4: Start the `taosd` service.
During installation, C++ source code will be compiled, so the system must have `cmake` and `gcc`. The compiled file `libtaospyudf.so` will be automatically copied to the `/usr/local/lib/` directory, so if you are a non-root user, you need to add `sudo` during installation. After installation, you can check if this file exists in the directory:
```shell
root@slave11 ~/udf $ ls -l /usr/local/lib/libtaos*
-rw-r--r-- 1 root root 671344 May 24 22:54 /usr/local/lib/libtaospyudf.so
```
### Interface Definition
When developing UDFs using Python, you need to implement the specified interface functions. The specific requirements are as follows.
- Scalar functions need to implement the scalar interface function `process`.
- Aggregate functions need to implement the aggregate interface functions `start`, `reduce`, and `finish`.
- If initialization is required, implement the `init` function.
- If cleanup is required, implement the `destroy` function.
#### Scalar Function Interface
The interface for scalar functions is as follows.
```Python
def process(input: datablock) -> tuple[output_type]:
```
The main parameter description is as follows:
- `input`: `datablock` similar to a two-dimensional matrix, which reads the Python object located at row `row` and column `col` through the member method `data(row, col)`.
- The return value is a tuple of Python objects, with each element of the output type.
#### Aggregate Function Interface
The interface for aggregate functions is as follows.
```Python
def start() -> bytes:
def reduce(inputs: datablock, buf: bytes) -> bytes
def finish(buf: bytes) -> output_type:
```
The above code defines three functions used to implement a custom aggregate function. The specific process is as follows.
First, call the `start` function to generate the initial result buffer. This result buffer is used to store the internal state of the aggregate function and will be continuously updated as the input data is processed.
Then, the input data will be divided into multiple row data blocks. For each row data block, the `reduce` function will be called, passing the current row data block (`inputs`) and the current intermediate result (`buf`) as parameters. The `reduce` function will update the internal state of the aggregate function based on the input data and current state, returning the new intermediate result.
Finally, when all row data blocks are processed, the `finish` function will be called. This function receives the final intermediate result (`buf`) as a parameter and generates the final output from it. Due to the nature of aggregate functions, the final output can only contain 0 or 1 piece of data. This output result will be returned to the caller as the computation result of the aggregate function.
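As a minimal illustration of this start → reduce → finish flow, here is a hypothetical row-counting aggregate (not one of the official examples) whose intermediate state is simply a serialized integer; it uses the same `shape()` accessor as the examples later in this section:
```python
import pickle

def init():
    pass

def destroy():
    pass

def start() -> bytes:
    return pickle.dumps(0)                # initial state: zero rows seen

def reduce(inputs, buf: bytes) -> bytes:
    count = pickle.loads(buf)             # deserialize the current state
    rows, _ = inputs.shape()              # rows contributed by this data block
    return pickle.dumps(count + rows)

def finish(buf: bytes) -> int:
    return pickle.loads(buf)              # final result: the total row count
```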
#### Initialization and Destruction Interfaces
The initialization and destruction interfaces are as follows.
```Python
def init()
def destroy()
```
Parameter descriptions:
- `init`: Completes initialization work.
- `destroy`: Completes cleanup work.
:::note
When developing UDFs in Python, it is necessary to define the `init` and `destroy` functions.
:::
### Scalar Function Template
The template for developing scalar functions in Python is as follows.
```Python
def init():
# initialization
def destroy():
# destroy
def process(input: datablock) -> tuple[output_type]:
```
### Aggregate Function Template
The template for developing aggregate functions in Python is as follows.
```Python
def init():
#initialization
def destroy():
#destroy
def start() -> bytes:
#return serialize(init_state)
def reduce(inputs: datablock, buf: bytes) -> bytes
# deserialize buf to state
# reduce the inputs and state into new_state.
# use inputs.data(i, j) to access python object of location(i, j)
# serialize new_state into new_state_bytes
return new_state_bytes
def finish(buf: bytes) -> output_type:
#return obj of type outputtype
```
### Data Type Mapping
The following table describes the mapping between TDengine SQL data types and Python data types. Any type of NULL value is mapped to Python's `None` value.
| **TDengine SQL Data Type** | **Python Data Type** |
| :-----------------------: | ------------ |
| TINYINT / SMALLINT / INT / BIGINT | int |
| TINYINT UNSIGNED / SMALLINT UNSIGNED / INT UNSIGNED / BIGINT UNSIGNED | int |
| FLOAT / DOUBLE | float |
| BOOL | bool |
| BINARY / VARCHAR / NCHAR | bytes|
| TIMESTAMP | int |
| JSON and other types | Not supported |
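For illustration, a minimal hypothetical `process` function that relies on this mapping: an integer column arrives as a Python `int`, a NULL cell arrives as `None`, and NULL is passed through unchanged:
```python
def init():
    pass

def destroy():
    pass

def process(block):
    rows, _ = block.shape()
    out = []
    for i in range(rows):
        v = block.data(i, 0)   # BIGINT/INT arrives as Python int, NULL as None
        out.append(None if v is None else v * 2)
    return out
```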
### Development Examples
This section contains five example programs, progressing from simple to complex, along with a number of practical debugging tips.
:::note
Logging cannot be output through the `print` function within UDFs; you need to write to files yourself or use Python's built-in logging library to write to files.
:::
#### Example One
Write a UDF function that only accepts a single integer: input `n`, output `ln(n^2 + 1)`.
First, write a Python file located in a certain system directory, such as `/root/udf/myfun.py`, with the following content.
```python
from math import log
def init():
pass
def destroy():
pass
def process(block):
rows, _ = block.shape()
return [log(block.data(i, 0) ** 2 + 1) for i in range(rows)]
```
This file contains three functions: `init` and `destroy` are both empty functions; they are the lifecycle functions of the UDF and need to be defined even if they do nothing. The key function is `process`, which accepts a data block; this data block object has two methods.
1. `shape()` returns the number of rows and columns in the data block.
2. `data(i, j)` returns the data located at row `i` and column `j`.
The `process` method of the scalar function needs to return as many rows of data as the number of rows in the input data block. The above code ignores the number of columns because it only needs to calculate the first column of each row.
Next, create the corresponding UDF function by executing the following statement in the TDengine CLI.
```sql
create function myfun as '/root/udf/myfun.py' outputtype double language 'Python'
```
The output is as follows.
```shell
taos> create function myfun as '/root/udf/myfun.py' outputtype double language 'Python';
Create OK, 0 row(s) affected (0.005202s)
```
It seems to go smoothly. Next, check all custom functions in the system to confirm that the creation was successful.
```text
taos> show functions;
name |
=================================
myfun |
Query OK, 1 row(s) in set (0.005767s)
```
Generate test data by executing the following commands in the TDengine CLI.
```sql
create database test;
create table t(ts timestamp, v1 int, v2 int, v3 int);
insert into t values('2023-05-01 12:13:14', 1, 2, 3);
insert into t values('2023-05-03 08:09:10', 2, 3, 4);
insert into t values('2023-05-10 07:06:05', 3, 4, 5);
```
Test the `myfun` function.
```sql
taos> select myfun(v1, v2) from t;
DB error: udf function execution failure (0.011088s)
```
Unfortunately, the execution failed. What is the reason? Check the logs of the `udfd` process.
```shell
tail -10 /var/log/taos/udfd.log
```
The following error message is found.
```text
05/24 22:46:28.733545 01665799 UDF ERROR can not load library libtaospyudf.so. error: operation not permitted
05/24 22:46:28.733561 01665799 UDF ERROR can not load python plugin. lib path libtaospyudf.so
```
The error is clear: the Python plugin `libtaospyudf.so` could not be loaded. If you encounter this error, please refer to the preparation environment section above.
After fixing the environment error, execute the command again as follows.
```sql
taos> select myfun(v1) from t;
myfun(v1) |
============================
0.693147181 |
1.609437912 |
2.302585093 |
```
Thus, we have completed the first UDF and learned some simple debugging methods.
#### Example Two
Although the above `myfun` passed the test, it has two shortcomings.
1. This scalar function only uses one column of data as input; if the user passes in multiple columns, it does not raise an exception but silently ignores the extra columns.
```sql
taos> select myfun(v1, v2) from t;
myfun(v1, v2) |
============================
0.693147181 |
1.609437912 |
2.302585093 |
```
2. It does not handle null values. We expect the function to output null when the input contains null, rather than failing. Therefore, the `process` function is improved as follows.
```python
def process(block):
rows, cols = block.shape()
if cols > 1:
raise Exception(f"require 1 parameter but given {cols}")
return [ None if block.data(i, 0) is None else log(block.data(i, 0) ** 2 + 1) for i in range(rows)]
```
Execute the following statement to update the existing UDF.
```sql
create or replace function myfun as '/root/udf/myfun.py' outputtype double language 'Python';
```
Passing two parameters to `myfun` will now cause it to fail.
```sql
taos> select myfun(v1, v2) from t;
DB error: udf function execution failure (0.014643s)
```
The custom exception message is printed in the plugin's log file `/var/log/taos/taospyudf.log`.
```text
2023-05-24 23:21:06.790 ERROR [1666188] [doPyUdfScalarProc@507] call pyUdfScalar proc function. context 0x7faade26d180. error: Exception: require 1 parameter but given 2
At:
/var/lib/taos//.udf/myfun_3_1884e1281d9.py(12): process
```
Thus, we have learned how to update the UDF and check the error logs produced by the UDF.
(Note: If the UDF does not take effect after being updated, in versions of TDengine prior to 3.0.5.0, it is necessary to restart `taosd`; in versions 3.0.5.0 and later, there is no need to restart `taosd` for it to take effect.)
#### Example Three
Input (x1, x2, ..., xn), output the sum of each value multiplied by its index: `1 * x1 + 2 * x2 + ... + n * xn`. If x1 to xn contains null, the result is null.
The difference from Example One is that this can accept any number of columns as input and needs to process the values of each column. Write the UDF file `/root/udf/nsum.py`.
```python
def init():
pass
def destroy():
pass
def process(block):
rows, cols = block.shape()
result = []
for i in range(rows):
total = 0
for j in range(cols):
v = block.data(i, j)
if v is None:
total = None
break
total += (j + 1) * block.data(i, j)
result.append(total)
return result
```
Create the UDF.
```sql
create function nsum as '/root/udf/nsum.py' outputtype double language 'Python';
```
Test the UDF.
```sql
taos> insert into t values('2023-05-25 09:09:15', 6, null, 8);
Insert OK, 1 row(s) affected (0.003675s)
taos> select ts, v1, v2, v3, nsum(v1, v2, v3) from t;
ts | v1 | v2 | v3 | nsum(v1, v2, v3) |
================================================================================================
2023-05-01 12:13:14.000 | 1 | 2 | 3 | 14.000000000 |
2023-05-03 08:09:10.000 | 2 | 3 | 4 | 20.000000000 |
2023-05-10 07:06:05.000 | 3 | 4 | 5 | 26.000000000 |
2023-05-25 09:09:15.000 | 6 | NULL | 8 | NULL |
Query OK, 4 row(s) in set (0.010653s)
```
#### Example Four
Write a UDF that takes a timestamp as input and outputs the next Sunday closest to that time. For example, if today is 2023-05-25, the next Sunday would be 2023-05-28. This function will use the third-party library `moment`. First, install this library.
```shell
pip3 install moment
```
Then, write the UDF file `/root/udf/nextsunday.py`.
```python
import moment
def init():
pass
def destroy():
pass
def process(block):
rows, cols = block.shape()
if cols > 1:
raise Exception("require only 1 parameter")
if not type(block.data(0, 0)) is int:
raise Exception("type error")
return [moment.unix(block.data(i, 0)).replace(weekday=7).format('YYYY-MM-DD')
for i in range(rows)]
```
The UDF framework will map TDengine's timestamp type to Python's int type, so this function only accepts an integer representing milliseconds. The `process` method first performs parameter checks, then uses the `moment` package to replace the weekday of the time with Sunday, and finally formats the output. The output string has a fixed length of 10 characters, so the UDF function can be created as follows.
```sql
create function nextsunday as '/root/udf/nextsunday.py' outputtype binary(10) language 'Python';
```
At this point, test the function; if you started `taosd` using `systemctl`, you will definitely encounter an error.
```sql
taos> select ts, nextsunday(ts) from t;
DB error: udf function execution failure (1.123615s)
```
```shell
tail -20 taospyudf.log
2023-05-25 11:42:34.541 ERROR [1679419] [PyUdf::PyUdf@217] py udf load module failure. error ModuleNotFoundError: No module named 'moment'
```
This is because the location of "moment" is not in the default library search path of the Python UDF plugin. How can we confirm this? By searching `taospyudf.log` with the following command.
```shell
grep 'sys path' taospyudf.log | tail -1
```
The output is as follows:
```text
2023-05-25 10:58:48.554 INFO [1679419] [doPyOpen@592] python sys path: ['', '/lib/python38.zip', '/lib/python3.8', '/lib/python3.8/lib-dynload', '/lib/python3/dist-packages', '/var/lib/taos//.udf']
```
It shows that the default search path for third-party libraries in the Python UDF plugin is: `/lib/python3/dist-packages`, while `moment` is installed by default in `/usr/local/lib/python3.8/dist-packages`. Now, we will modify the default library search path for the Python UDF plugin.
First, open the Python 3 command line and check the current `sys.path`.
```python
>>> import sys
>>> ":".join(sys.path)
'/usr/lib/python3.8:/usr/lib/python3.8/lib-dynload:/usr/local/lib/python3.8/dist-packages:/usr/lib/python3/dist-packages'
```
Copy the output string from the above script, then edit /etc/taos/taos.cfg to add the following configuration.
```shell
UdfdLdLibPath /usr/lib/python3.8:/usr/lib/python3.8/lib-dynload:/usr/local/lib/python3.8/dist-packages:/usr/lib/python3/dist-packages
```
After saving, execute `systemctl restart taosd`, and then test again without errors.
```sql
taos> select ts, nextsunday(ts) from t;
ts | nextsunday(ts) |
===========================================
2023-05-01 12:13:14.000 | 2023-05-07 |
2023-05-03 08:09:10.000 | 2023-05-07 |
2023-05-10 07:06:05.000 | 2023-05-14 |
2023-05-25 09:09:15.000 | 2023-05-28 |
Query OK, 4 row(s) in set (1.011474s)
```
#### Example Five
Write an aggregate function to calculate the difference between the maximum and minimum values of a certain column.
The difference between aggregate functions and scalar functions is that scalar functions correspond to multiple outputs for multiple rows of input, while aggregate functions correspond to a single output for multiple rows of input. The execution process of aggregate functions is somewhat like the execution process of the classic map-reduce framework, which divides the data into several blocks, with each mapper processing one block, and the reducer aggregating the results from the mappers. The difference is that in TDengine's Python UDF, the `reduce` function has both map and reduce functionalities. The `reduce` function takes two parameters: one is the data to be processed, and the other is the result of the reduce function executed by another task. The following example demonstrates this in `/root/udf/myspread.py`.
```python
import io
import math
import pickle
LOG_FILE: io.TextIOBase = None
def init():
global LOG_FILE
LOG_FILE = open("/var/log/taos/spread.log", "wt")
log("init function myspead success")
def log(o):
LOG_FILE.write(str(o) + '\n')
def destroy():
log("close log file: spread.log")
LOG_FILE.close()
def start():
return pickle.dumps((-math.inf, math.inf))
def reduce(block, buf):
max_number, min_number = pickle.loads(buf)
log(f"initial max_number={max_number}, min_number={min_number}")
rows, _ = block.shape()
for i in range(rows):
v = block.data(i, 0)
if v > max_number:
log(f"max_number={v}")
max_number = v
if v < min_number:
log(f"min_number={v}")
min_number = v
return pickle.dumps((max_number, min_number))
def finish(buf):
max_number, min_number = pickle.loads(buf)
return max_number - min_number
```
In this example, we not only define an aggregate function but also add logging functionality to record execution logs.
1. The `init` function opens a file for logging.
2. The `log` function records logs, automatically converting the passed object to a string and adding a newline character.
3. The `destroy` function closes the log file after execution.
4. The `start` function returns the initial buffer for storing intermediate results of the aggregate function, initializing the maximum value to negative infinity and the minimum value to positive infinity.
5. The `reduce` function processes each data block and aggregates the results.
6. The `finish` function converts the buffer into the final output.
Execute the following SQL statement to create the corresponding UDF.
```sql
create or replace aggregate function myspread as '/root/udf/myspread.py' outputtype double bufsize 128 language 'Python';
```
This SQL statement has two important differences from the SQL statement for creating scalar functions.
1. The `aggregate` keyword has been added.
2. The `bufsize` keyword has been added to specify the memory size for storing intermediate results, in bytes. This value can be greater than the actual size used. In this case, the intermediate result is a tuple of two floating-point numbers, which occupies only 32 bytes after serialization, while the specified `bufsize` is 128. The actual number of bytes used can be checked in Python as follows.
```python
>>> len(pickle.dumps((12345.6789, 23456789.9877)))
32
```
Test this function, and you will see that the output of `myspread` is consistent with the output of the built-in `spread` function.
```sql
taos> select myspread(v1) from t;
myspread(v1) |
============================
5.000000000 |
Query OK, 1 row(s) in set (0.013486s)
taos> select spread(v1) from t;
spread(v1) |
============================
5.000000000 |
Query OK, 1 row(s) in set (0.005501s)
```
Finally, check the execution log, and you will see that the `reduce` function was executed three times, with the `max` value being updated four times and the `min` value being updated once.
```shell
root@slave11 /var/log/taos $ cat spread.log
init function myspead success
initial max_number=-inf, min_number=inf
max_number=1
min_number=1
initial max_number=1, min_number=1
max_number=2
max_number=3
initial max_number=3, min_number=1
max_number=6
close log file: spread.log
```
Through this example, we learned how to define aggregate functions and print custom log information.
### More Python UDF Example Code
#### Scalar Function Example [pybitand](https://github.com/taosdata/TDengine/blob/3.0/tests/script/sh/pybitand.py)
`pybitand` implements the bitwise AND function for multiple columns. If there is only one column, it returns that column. `pybitand` ignores null values.
<details>
<summary>pybitand.py</summary>
```Python
{{#include tests/script/sh/pybitand.py}}
```
</details>
#### Aggregate Function Example [pyl2norm](https://github.com/taosdata/TDengine/blob/3.0/tests/script/sh/pyl2norm.py)
`pyl2norm` implements the L2 norm of all data in the input column, that is, it squares each value, sums them, and then takes the square root.
<details>
<summary>pyl2norm.py</summary>
```Python
{{#include tests/script/sh/pyl2norm.py}}
```
</details>
#### Aggregate Function Example [pycumsum](https://github.com/taosdata/TDengine/blob/3.0/tests/script/sh/pycumsum.py)
`pycumsum` calculates the cumulative sum of all data in the input column using `numpy`.
<details>
<summary>pycumsum.py</summary>
```Python
{{#include tests/script/sh/pycumsum.py}}
```
</details>
## Managing UDFs
The process of managing UDFs in the cluster involves creating, using, and maintaining these functions. Users can create and manage UDFs in the cluster via SQL, and once created, all users in the cluster can use these functions in SQL. Since UDFs are stored on the mnode of the cluster, they remain available even after the cluster is restarted.
When creating UDFs, it is necessary to distinguish between scalar functions and aggregate functions. Scalar functions accept zero or more input parameters and return a single value. Aggregate functions accept a set of input values and return a single value through some computation (such as summation, counting, etc.). If the wrong function type is declared during creation, an error will occur when calling the function via SQL.
Additionally, users need to ensure that the input data types match the UDF program, and that the UDF output data types match the `outputtype`. This means that when creating a UDF, the correct data types must be specified for both input parameters and output values. This helps ensure that when calling the UDF, the input data can be correctly passed to the UDF, and that the UDF's output value matches the expected data type.
### Creating Scalar Functions
The SQL syntax for creating scalar functions is as follows.
```sql
CREATE [OR REPLACE] FUNCTION function_name AS library_path OUTPUTTYPE output_type LANGUAGE 'Python';
```
Parameter descriptions are as follows:
- `or replace`: If the function already exists, it will modify the existing function properties.
- `function_name`: The name of the scalar function when called in SQL.
- `language`: Supports C language and Python language (version 3.7 and above), defaulting to C.
- `library_path`: If the programming language is C, the path is the absolute path to the dynamic link library containing the UDF implementation, usually pointing to a `.so` file. If the programming language is Python, the path is the file path containing the UDF implementation in Python. The path needs to be enclosed in single or double quotes.
- `output_type`: The data type name of the function's computation result.
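For example, the following statement registers a C scalar UDF. The function name and library path are illustrative placeholders, not values taken from this guide; replace them with the name and compiled library of your own function.
```sql
-- Illustrative only: register a scalar UDF implemented in C
-- (the function name and .so path are assumptions for the example)
CREATE OR REPLACE FUNCTION bit_and AS '/usr/local/taos/udf/libbitand.so' OUTPUTTYPE INT;
```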
### Creating Aggregate Functions
The SQL syntax for creating aggregate functions is as follows.
```sql
CREATE [OR REPLACE] AGGREGATE FUNCTION function_name AS library_path OUTPUTTYPE output_type BUFSIZE buffer_size LANGUAGE 'Python';
```
Where `buffer_size` indicates the buffer size for intermediate calculation results, measured in bytes. The meanings of other parameters are the same as for scalar functions.
The following SQL creates a UDF named `l2norm`.
```sql
CREATE AGGREGATE FUNCTION l2norm AS "/home/taos/udf_example/libl2norm.so" OUTPUTTYPE DOUBLE bufsize 8;
```
### Deleting UDFs
The SQL syntax for deleting a UDF with the specified name is as follows.
```sql
DROP FUNCTION function_name;
```
### Viewing UDFs
The SQL to display all currently available UDFs in the cluster is as follows.
```sql
show functions;
```
### Viewing Function Information
The version number of the UDF increases by 1 each time it is updated.
```sql
select * from ins_functions \G;
```
View File
@ -0,0 +1,284 @@
---
title: Ingesting Data Efficiently
slug: /developer-guide/ingesting-data-efficiently
---
import Tabs from "@theme/Tabs";
import TabItem from "@theme/TabItem";
import Image from '@theme/IdealImage';
import imgThread from '../assets/ingesting-data-efficiently-01.png';
This section introduces how to efficiently write data to TDengine.
## Principles of Efficient Writing {#principle}
### From the Perspective of the Client Program {#application-view}
From the perspective of the client program, efficient data writing should consider the following factors:
1. The amount of data written at a time. Generally, the larger the batch size, the more efficient the writing (though the advantage may diminish beyond a certain threshold). When using SQL to write to TDengine, try to concatenate more data into a single SQL statement. Currently, the maximum length of a single SQL statement supported by TDengine is 1,048,576 (1MB) characters.
2. The number of concurrent connections. Generally, the more concurrent connections writing data simultaneously, the more efficient the writing (though performance may decline beyond a certain threshold, depending on server capacity).
3. The distribution of data across different tables (or subtables), i.e., the proximity of the data being written. Generally, writing data only to the same table (or subtable) in each batch is more efficient than writing to multiple tables (or subtables).
4. The method of writing. In general:
- Parameter binding is more efficient than SQL writing because it avoids SQL parsing (though it increases the number of C interface calls, which can incur performance overhead).
- SQL writing without automatic table creation is more efficient than with automatic table creation because the latter frequently checks for table existence.
- SQL writing is more efficient than schema-less writing, as the latter automatically creates tables and supports dynamic changes to table structures.
The client program should fully and appropriately utilize these factors. In each writing operation, data should ideally only be written to the same table (or subtable), and the amount of data written per batch should be set to a value that is optimal for the current system's processing capacity based on testing and tuning. The number of concurrent write connections should also be set to an optimal value for the current system's processing capacity to achieve the best writing speed in the existing system.
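As a minimal sketch of the batching and proximity points above, the single INSERT statement below carries several rows grouped by subtable. The subtable names and values are illustrative; the only requirement is that the whole statement stays within the 1,048,576-character limit.
```sql
-- Sketch: one SQL statement, many rows, grouped by subtable
-- (subtable names d1001/d1002 and all values are illustrative)
INSERT INTO
  d1001 VALUES
    ('2018-10-03 14:38:05.000', 10.3, 219, 0.31)
    ('2018-10-03 14:38:15.000', 12.6, 218, 0.33)
  d1002 VALUES
    ('2018-10-03 14:38:05.000', 10.2, 220, 0.23);
```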
### From the Perspective of the Data Source {#datasource-view}
Client programs typically need to read data from a data source and then write it to TDengine. From the data source perspective, the following scenarios necessitate a queue between the read and write threads:
1. Multiple data sources generate data at a rate that is significantly lower than the single-threaded write speed, but the overall data volume is considerable. In this case, the queue's role is to aggregate data from multiple sources to increase the amount of data written in a single operation.
2. A single data source generates data at a rate significantly higher than the single-threaded write speed. Here, the queue's role is to increase the write concurrency.
3. Data for a single table is scattered across multiple data sources. In this case, the queue's role is to aggregate data for the same table in advance, enhancing the proximity of data during writing.
If the data source for the writing application is Kafka, and the writing application itself is a Kafka consumer, the characteristics of Kafka can be leveraged for efficient writing. For example:
1. Write data for the same table to the same Topic's same Partition, increasing data proximity.
2. Aggregate data by subscribing to multiple Topics.
3. Increase write concurrency by adding more Consumer threads.
4. Increase the maximum data amount fetched each time to raise the maximum amount written in a single operation.
### From the Perspective of Server Configuration {#setting-view}
From the perspective of server configuration, it's important to set an appropriate number of vgroups when creating the database based on the number of disks in the system, their I/O capabilities, and processor capacity to fully utilize system performance. If there are too few vgroups, the system performance cannot be fully realized; if there are too many vgroups, unnecessary resource contention may occur. A general recommendation is to set the number of vgroups to twice the number of CPU cores, but tuning should still be based on the specific system resource configuration.
For more tuning parameters, refer to [Manage Databases](../../tdengine-reference/sql-manual/manage-databases/) and [taosd Reference](../../tdengine-reference/components/taosd/).
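For instance, on a host with 16 CPU cores, the rule of thumb above would suggest roughly 32 vgroups. The database name and number below are illustrative; tune them against your own hardware and workload.
```sql
-- Sketch: create the target database with an explicit vgroups setting
-- (the name "test" and the value 32 are assumptions for the example)
CREATE DATABASE IF NOT EXISTS test VGROUPS 32;
```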
## Examples of Efficient Writing {#sample-code}
### Scenario Design {#scenario}
The following example program demonstrates how to efficiently write data, with the scenario designed as follows:
- The TDengine client program continuously reads data from other data sources, simulated in this example by generating mock data.
- A single connection cannot match the reading speed, so the client program starts multiple threads, each establishing a connection to TDengine with a dedicated fixed-size message queue.
- The client program hashes received data based on the associated table name (or subtable name) to determine the corresponding Queue index, ensuring that data belonging to a specific table (or subtable) is processed by a designated thread.
- Each sub-thread writes the data from its associated message queue to TDengine after emptying the queue or reaching a predefined data volume threshold, and continues processing the subsequently received data.
<figure>
<Image img={imgThread} alt="Thread model for efficient writing example"/>
<figcaption>Figure 1. Thread model for efficient writing example</figcaption>
</figure>
### Example Code {#code}
This part provides example code for the above scenario. The principles of efficient writing are the same for other scenarios, but the code needs to be modified accordingly.
This example code assumes that the source data belongs to different subtables of the same supertable (`meters`). The program creates this supertable in the `test` database before writing data. For subtables, they will be automatically created by the application based on the received data. If the actual scenario involves multiple supertables, simply modify the code for automatic table creation in the write task.
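For reference, the supertable that the example program relies on looks roughly like the sketch below. It follows the meters schema used elsewhere in this documentation; the exact statement in the example code may differ slightly.
```sql
-- Sketch of the database and supertable assumed by the example program
CREATE DATABASE IF NOT EXISTS test;
CREATE STABLE IF NOT EXISTS test.meters
  (ts TIMESTAMP, current FLOAT, voltage INT, phase FLOAT)
  TAGS (location BINARY(64), groupId INT);
```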
<Tabs defaultValue="java" groupId="lang">
<TabItem label="Java" value="java">
**Program Listing**
| Class Name | Function Description |
| ---------------------- | --------------------------------------------------------------------------------- |
| FastWriteExample | Main program |
| ReadTask | Reads data from the simulated source, hashes the table name to obtain the Queue index, and writes to the corresponding Queue |
| WriteTask | Retrieves data from the Queue, composes a Batch, and writes to TDengine |
| MockDataSource | Simulates the generation of data for various meters subtables |
| SQLWriter | Depends on this class for SQL concatenation, automatic table creation, SQL writing, and SQL length checking |
| StmtWriter | Implements batch writing via parameter binding (not yet completed) |
| DataBaseMonitor | Monitors write speed and prints the current write speed to the console every 10 seconds |
Below is the complete code for each class and a more detailed function description.
<details>
<summary>FastWriteExample</summary>
The main program is responsible for:
1. Creating message queues
2. Starting write threads
3. Starting read threads
4. Monitoring write speed every 10 seconds
The main program exposes 4 parameters by default, which can be adjusted during each program start for testing and tuning:
1. The number of read threads. Default is 1.
2. The number of write threads. Default is 3.
3. The total number of simulated tables. Default is 1,000. This will be evenly distributed among the read threads. If the total number of tables is large, table creation will take longer, and the initial monitoring of write speed may be slower.
4. The maximum number of records to write in a single batch. Default is 3,000.
The queue capacity (`taskQueueCapacity`) is also a performance-related parameter that can be adjusted by modifying the program. Generally speaking, the larger the queue capacity, the lower the probability of being blocked during enqueuing, and the greater the throughput of the queue, although memory usage will also increase. The default value in the example program has been set sufficiently high.
```java
{{#include docs/examples/java/src/main/java/com/taos/example/highvolume/FastWriteExample.java}}
```
</details>
<details>
<summary>ReadTask</summary>
The read task is responsible for reading data from the data source. Each read task is associated with a simulated data source. Each simulated data source generates a limited amount of data for a table. Different simulated data sources generate data for different tables.
The read task uses a blocking method to write to the message queue. This means that once the queue is full, the write operation will block.
```java
{{#include docs/examples/java/src/main/java/com/taos/example/highvolume/ReadTask.java}}
```
</details>
<details>
<summary>WriteTask</summary>
```java
{{#include docs/examples/java/src/main/java/com/taos/example/highvolume/WriteTask.java}}
```
</details>
<details>
<summary>MockDataSource</summary>
```java
{{#include docs/examples/java/src/main/java/com/taos/example/highvolume/MockDataSource.java}}
```
</details>
<details>
<summary>SQLWriter</summary>
The `SQLWriter` class encapsulates the logic for SQL concatenation and data writing. Note that none of the tables are created in advance; they are batch-created using the supertable as a template when a table-not-found exception occurs, and the INSERT statement is then re-executed. For other exceptions, the SQL statement executed at that time is simply logged, and you can log more clues for error diagnosis and fault recovery.
```java
{{#include docs/examples/java/src/main/java/com/taos/example/highvolume/SQLWriter.java}}
```
</details>
<details>
<summary>DataBaseMonitor</summary>
```java
{{#include docs/examples/java/src/main/java/com/taos/example/highvolume/DataBaseMonitor.java}}
```
</details>
**Execution Steps**
<details>
<summary>Running the Java Example Program</summary>
Before running the program, configure the environment variable `TDENGINE_JDBC_URL`. If the TDengine Server is deployed locally and the username, password, and port are the default values, configure it as follows:
```shell
TDENGINE_JDBC_URL="jdbc:TAOS://localhost:6030?user=root&password=taosdata"
```
**Running the Example Program in a Local IDE**
1. Clone the TDengine repository
```shell
git clone git@github.com:taosdata/TDengine.git --depth 1
```
2. Open the `docs/examples/java` directory in the IDE.
3. Configure the environment variable `TDENGINE_JDBC_URL` in the development environment. If you have already set a global environment variable for `TDENGINE_JDBC_URL`, you can skip this step.
4. Run the class `com.taos.example.highvolume.FastWriteExample`.
**Running the Example Program on a Remote Server**
To run the example program on a server, follow these steps:
1. Package the example code. Execute the following in the directory `TDengine/docs/examples/java`:
```shell
mvn package
```
2. Create an `examples` directory on the remote server:
```shell
mkdir -p examples/java
```
3. Copy the dependencies to the specified directory on the server:
- Copy the dependency packages (only do this once)
```shell
scp -r ./target/lib <user>@<host>:~/examples/java
```
- Copy the jar file for this program (need to copy each time the code is updated)
```shell
scp -r ./target/javaexample-1.0.jar <user>@<host>:~/examples/java
```
4. Configure the environment variable.
Edit `~/.bash_profile` or `~/.bashrc` to add the following content, for example:
```shell
export TDENGINE_JDBC_URL="jdbc:TAOS://localhost:6030?user=root&password=taosdata"
```
The above uses the default JDBC URL for a locally deployed TDengine Server. You need to modify it according to your actual situation.
5. Use the Java command to start the example program with the command template:
```shell
java -classpath lib/*:javaexample-1.0.jar com.taos.example.highvolume.FastWriteExample <read_thread_count> <write_thread_count> <total_table_count> <max_batch_size>
```
6. Terminate the test program. The test program will not automatically end. After achieving a stable write speed under the current configuration, press <kbd>CTRL</kbd> + <kbd>C</kbd> to terminate the program.
Below is an actual run log output, with a machine configuration of 16 cores + 64G + SSD.
```shell
root@vm85$ java -classpath lib/*:javaexample-1.0.jar com.taos.example.highvolume.FastWriteExample 2 12
18:56:35.896 [main] INFO c.t.e.highvolume.FastWriteExample - readTaskCount=2, writeTaskCount=12 tableCount=1000 maxBatchSize=3000
18:56:36.011 [WriteThread-0] INFO c.taos.example.highvolume.WriteTask - started
18:56:36.015 [WriteThread-0] INFO c.taos.example.highvolume.SQLWriter - maxSQLLength=1048576
18:56:36.021 [WriteThread-1] INFO c.taos.example.highvolume.WriteTask - started
18:56:36.022 [WriteThread-1] INFO c.taos.example.highvolume.SQLWriter - maxSQLLength=1048576
18:56:36.031 [WriteThread-2] INFO c.taos.example.highvolume.WriteTask - started
18:56:36.032 [WriteThread-2] INFO c.taos.example.highvolume.SQLWriter - maxSQLLength=1048576
18:56:36.041 [WriteThread-3] INFO c.taos.example.highvolume.WriteTask - started
18:56:36.042 [WriteThread-3] INFO c.taos.example.highvolume.SQLWriter - maxSQLLength=1048576
18:56:36.093 [WriteThread-4] INFO c.taos.example.highvolume.WriteTask - started
18:56:36.094 [WriteThread-4] INFO c.taos.example.highvolume.SQLWriter - maxSQLLength=1048576
18:56:36.099 [WriteThread-5] INFO c.taos.example.highvolume.WriteTask - started
18:56:36.100 [WriteThread-5] INFO c.taos.example.highvolume.SQLWriter - maxSQLLength=1048576
18:56:36.100 [WriteThread-6] INFO c.taos.example.highvolume.WriteTask - started
18:56:36.101 [WriteThread-6] INFO c.taos.example.highvolume.SQLWriter - maxSQLLength=1048576
18:56:36.103 [WriteThread-7] INFO c.taos.example.highvolume.WriteTask - started
18:56:36.104 [WriteThread-7] INFO c.taos.example.highvolume.SQLWriter - maxSQLLength=1048576
18:56:36.105 [WriteThread-8] INFO c.taos.example.highvolume.WriteTask - started
18:56:36.107 [WriteThread-8] INFO c.taos.example.highvolume.SQLWriter - maxSQLLength=1048576
18:56:36.108 [WriteThread-9] INFO c.taos.example.highvolume.WriteTask - started
18:56:36.109 [WriteThread-9] INFO c.taos.example.highvolume.SQLWriter - maxSQLLength=1048576
18:56:36.156 [WriteThread-10] INFO c.taos.example.highvolume.WriteTask - started
18:56:36.157 [WriteThread-11] INFO c.taos.example.highvolume.WriteTask - started
18:56:36.158 [WriteThread-10] INFO c.taos.example.highvolume.SQLWriter - maxSQLLength=1048576
18:56:36.158 [ReadThread-0] INFO com.taos.example.highvolume.ReadTask - started
18:56:36.158 [ReadThread-1] INFO com.taos.example.highvolume.ReadTask - started
18:56:36.158 [WriteThread-11] INFO c.taos.example.highvolume.SQLWriter - maxSQLLength=1048576
18:56:46.369 [main] INFO c.t.e.highvolume.FastWriteExample - count=18554448 speed=1855444
18:56:56.946 [main] INFO c.t.e.highvolume.FastWriteExample - count=39059660 speed=2050521
18:57:07.322 [main] INFO c.t.e.highvolume.FastWriteExample - count=59403604 speed=2034394
18:57:18.032 [main] INFO c.t.e.highvolume.FastWriteExample - count=80262938 speed=2085933
18:57:28.432 [main] INFO c.t.e.highvolume.FastWriteExample - count=101139906 speed=2087696
18:57:38.921 [main] INFO c.t.e.highvolume.FastWriteExample - count=121807202 speed=2066729
18:57:49.375 [main] INFO c.t.e.highvolume.FastWriteExample - count=142952417 speed=2114521
18:58:00.689 [main] INFO c.t.e.highvolume.FastWriteExample - count=163650306 speed=2069788
18:58:11.646 [main] INFO c.t.e.highvolume.FastWriteExample - count=185019808 speed=2136950
```
</details>
:::note
When using Python connectors for multi-process connections to TDengine, there is a limitation: connections cannot be established in the parent process; all connections must be created in the child processes. If a connection is established in the parent process, and then connections are created in the child processes, it will cause a blockage. This is a known issue.
:::
</TabItem>
</Tabs>
View File
@ -1,3 +1,3 @@
```c title="Native Connection"
```c
{{#include docs/examples/c/connect_example.c}}
```
View File
@ -5,3 +5,4 @@
```csharp title="WebSocket Connection"
{{#include docs/examples/csharp/wsConnect/Program.cs}}
```
View File
@ -0,0 +1,17 @@
#### Accessing via the Unified Database Interface
```go title="Native Connection"
{{#include docs/examples/go/connect/cgoexample/main.go}}
```
```go title="REST Connection"
{{#include docs/examples/go/connect/restexample/main.go}}
```
#### Using High-level Encapsulation
You can also establish a connection using the af package of driver-go. This module encapsulates TDengine's advanced features, such as parameter binding, subscription, etc.
```go title="Establishing Native Connection with af Package"
{{#include docs/examples/go/connect/afconn/main.go}}
```
View File
@ -6,10 +6,10 @@
{{#include docs/examples/java/src/main/java/com/taos/example/RESTConnectExample.java:main}}
```
When using REST connection, the feature of bulk pulling can be enabled if the size of resulting data set is huge.
When using REST connection, if the query data volume is large, you can also enable the batch fetching feature.
```java title="Enable Bulk Pulling" {4}
```java title="Enable Batch Fetching Feature" {4}
{{#include docs/examples/java/src/main/java/com/taos/example/WSConnectExample.java:main}}
```
More configuration about connection, please refer to [Java Client Library](../../reference/connectors/java)
For more connection parameter configurations, refer to [Java Connector](../../connector/java).
View File
@ -1,3 +1,3 @@
```php title=""native"
```php title="Native Connection"
{{#include docs/examples/php/connect.php}}
```
View File
@ -1,3 +1,3 @@
```python title="Native Connection"
```python
{{#include docs/examples/python/connect_example.py}}
```
View File
@ -0,0 +1,7 @@
```rust title="Native Connection"
{{#include docs/examples/rust/nativeexample/examples/connect.rs}}
```
:::note
For the Rust connector, the difference in connection methods only reflects the different features used. If the "ws" feature is enabled, only the WebSocket implementation will be compiled in.
:::
View File
@ -0,0 +1,36 @@
---
title: Developer's Guide
description: A guide to help developers quickly get started.
slug: /developer-guide
---
When developing an application and planning to use TDengine as a tool for time-series data processing, here are several steps to follow:
1. **Determine the Connection Method to TDengine**: Regardless of the programming language you use, you can always connect via the REST interface. However, you can also use the dedicated connectors available for each programming language for a more convenient connection.
2. **Define the Data Model Based on Your Application Scenario**: Depending on the characteristics of your data, decide whether to create one or multiple databases; differentiate between static tags and collected data, establish the correct supertable, and create subtables.
3. **Decide on the Data Insertion Method**: TDengine supports standard SQL for data writing, but it also supports Schemaless mode, allowing you to write data directly without manually creating tables.
4. **Determine the SQL Queries Needed**: Based on business requirements, identify which SQL query statements you need to write.
5. **For Lightweight Real-time Statistical Analysis**: If you plan to perform lightweight real-time statistical analysis on time-series data, including various monitoring dashboards, it is recommended to utilize TDengine 3.0's stream computing capabilities without deploying complex stream processing systems like Spark or Flink.
6. **For Applications Needing Data Consumption Notifications**: If your application has modules that need to consume inserted data and require notifications for new data inserts, it is recommended to use the data subscription feature provided by TDengine, rather than deploying Kafka or other message queue software.
7. **Utilize TDengine's Cache Feature**: In many scenarios (e.g., vehicle management), if your application needs to retrieve the latest status of each data collection point, it is advisable to use TDengine's Cache feature instead of deploying separate caching software like Redis.
8. **Use User-Defined Functions (UDFs) if Necessary**: If you find that TDengine's functions do not meet your requirements, you can create user-defined functions (UDFs) to solve your problems.
This section is organized according to the above sequence. For better understanding, TDengine provides example code for each feature and supported programming language, located in the [Example Code](https://github.com/taosdata/TDengine/tree/main/docs/examples). All example codes are verified for correctness through CI, with the scripts located at [Example Code CI](https://github.com/taosdata/TDengine/tree/main/tests/docs-examples-test).
If you wish to delve deeper into SQL usage, refer to the [SQL Manual](../tdengine-reference/sql-manual/). For more information on the use of each connector, please read the [Connector Reference Guide](../tdengine-reference/client-libraries/). If you want to integrate TDengine with third-party systems, such as Grafana, please refer to [Third-party Tools](../third-party-tools/).
If you encounter any issues during development, please click on the "Feedback Issues" link at the bottom of each page to submit an issue directly on GitHub: [Feedback Issues](https://github.com/taosdata/TDengine/issues/new/choose).
```mdx-code-block
import DocCardList from '@theme/DocCardList';
import {useCurrentSidebarCategory} from '@docusaurus/theme-common';
<DocCardList items={useCurrentSidebarCategory().items}/>
```
View File
@ -1,75 +0,0 @@
---
title: Resource Planning
sidebar_label: Resource Planning
description: This document describes how to plan compute and storage resources for your TDengine cluster.
---
It is important to plan computing and storage resources when using TDengine to build an IoT, time-series, or big data platform. This chapter describes how to plan the required CPU, memory, and disk resources.
## Server Memory Requirements
Each database creates a fixed number of vgroups. This number is 2 by default and can be configured with the `vgroups` parameter. The number of replicas can be controlled with the `replica` parameter. Each replica requires one vnode per vgroup. Altogether, the memory required by each database depends on the following configuration options:
- vgroups
- replica
- buffer
- pages
- pagesize
- cachesize
For more information, see [Database](../../reference/taos-sql/database).
The memory required by a database is therefore greater than or equal to:
```
vgroups * replica * (buffer + pages * pagesize + cachesize)
```
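For example, with vgroups = 2, replica = 3, and illustrative per-vnode values of buffer = 256 MB, pages = 256, pagesize = 4 KB, and cachesize = 1 MB (these numbers are assumptions for the sake of the calculation, not recommendations), the estimate works out to:
```
2 * 3 * (256 MB + 256 * 4 KB + 1 MB) = 6 * 258 MB = 1548 MB ≈ 1.5 GB
```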
However, note that this requirement is spread over all dnodes in the cluster, not on a single physical machine. The physical servers that run dnodes meet the requirement together. If a cluster has multiple databases, the memory required increases accordingly. In complex environments where dnodes were added after initial deployment in response to increasing resource requirements, load may not be balanced among the original dnodes and newer dnodes. In this situation, the actual status of your dnodes is more important than theoretical calculations.
## Client Memory Requirements
For the client programs using TDengine client driver `taosc` to connect to the server side there is a memory requirement as well.
The memory consumed by a client program is mainly due to the SQL statements for data insertion, caching of table metadata, and some internal use. Assume the maximum number of tables is N (the metadata of each table consumes 256 bytes), the maximum number of parallel insertion threads is T, and the maximum length of a SQL statement is S (normally 1 MB). The memory required by a client program, in megabytes, can then be estimated using the formula below:
```
M = (T * S * 3 + (N / 4096) + 100)
```
For example, if the number of parallel data insertion threads is 100, total number of tables is 10,000,000, then the minimum memory requirement of a client program is:
```
100 * 3 + (10000000 / 4096) + 100 ≈ 2841 (MBytes)
```
So, at least 3GB needs to be reserved for such a client.
## CPU Requirement
The CPU resources required depend on two aspects:
- **Data Insertion** Each dnode of TDengine can process at least 10,000 insertion requests in one second, while each insertion request can have multiple rows. The difference in computing resource consumed, between inserting 1 row at a time, and inserting 10 rows at a time is very small. So, the more the number of rows that can be inserted one time, the higher the efficiency. If each insert request contains more than 200 records, a single core can process more than 1 million records per second. Inserting in batch also imposes requirements on the client side which needs to cache rows to insert in batch once the number of cached rows reaches a threshold.
- **Data Query** High efficiency query is provided in TDengine, but it's hard to estimate the CPU resource required because the queries used in different use cases and the frequency of queries vary significantly. It can only be verified with the query statements, query frequency, data size to be queried, and other requirements provided by users.
In short, the CPU resource required for data insertion can be estimated but it's hard to do so for query use cases. If possible, ensure that CPU usage remains below 50%. If this threshold is exceeded, it's a reminder for system operator to add more nodes in the cluster to expand resources.
## Disk Requirement
The compression ratio in TDengine is much higher than that in RDBMS. In most cases, the compression ratio in TDengine is bigger than 5, or even 10 in some cases, depending on the characteristics of the original data. The data size before compression can be calculated based on below formula:
```
Raw DataSize = numOfTables * rowSizePerTable * rowsPerTable
```
For example, if there are 10,000,000 meters, each meter collects data every 15 minutes, and each collection is 128 bytes, then the raw data size for one year is: 10000000 \* 128 \* 24 \* 60 / 15 \* 365 = 44.8512 (TB). Assuming a compression ratio of 5, the actual disk size is: 44.8512 / 5 = 8.97024 (TB).
Parameter `keep` can be used to set how long the data will be kept on disk. To further reduce storage cost, multiple storage levels can be enabled in TDengine, with the coldest data stored on the cheapest storage device. This is completely transparent to application programs.
To increase performance, multiple disks can be setup for parallel data reading or data inserting. Please note that an expensive disk array is not necessary because replications are used in TDengine to provide high availability.
## Number of Hosts
A host can be either physical or virtual. The total memory, total CPU, total disk required can be estimated according to the formulae mentioned previously. If the number of data replicas is not 1, the required resources are multiplied by the number of replicas.
Then, according to the system resources that a single host can provide, assuming all hosts have the same resources, the number of hosts can be derived easily.
File diff suppressed because it is too large
View File
@ -1,25 +0,0 @@
---
title: Fault Tolerance and Disaster Recovery
description: This document describes how TDengine provides fault tolerance and disaster recovery.
---
## Fault Tolerance
TDengine uses **WAL**, i.e. Write Ahead Log, to achieve fault tolerance and high reliability.
When a data block is received by TDengine, the original data block is first written into WAL. The log in WAL will be deleted only after the data has been written into data files in the database. Data can be recovered from WAL in case the server is stopped abnormally for any reason and then restarted.
There are 2 configuration parameters related to WAL:
- wal_level: Specifies the WAL level. 1 indicates that WAL is enabled but fsync is disabled. 2 indicates that WAL and fsync are both enabled. The default value is 1.
- wal_fsync_period: This parameter is only valid when wal_level is set to 2. It specifies the interval, in milliseconds, of invoking fsync. If set to 0, it means fsync is invoked immediately once WAL is written.
To achieve absolutely no data loss, set wal_level to 2 and wal_fsync_period to 0. This imposes a performance penalty on the data ingestion rate. However, if the number of concurrent data insertion threads on the client side is large enough, for example 50, data ingestion performance will still be good enough. Our verification shows that the drop is only 30% when wal_fsync_period is set to 3000 milliseconds.
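These parameters can typically be set per database. A minimal sketch, assuming a database named power and that your TDengine version accepts these options in CREATE DATABASE, looks like this:
```sql
-- Sketch: no-data-loss configuration for a database named "power" (name is illustrative)
CREATE DATABASE IF NOT EXISTS power WAL_LEVEL 2 WAL_FSYNC_PERIOD 0;
```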
## Disaster Recovery
TDengine provides disaster recovery by using taosX to replicate data between two TDengine clusters deployed in two distant data centers. Assume there are two TDengine clusters, A and B, where A is the source and B is the target, and A handles the write and query workload. You can deploy `taosX` in the data center where cluster A resides; `taosX` consumes the data written to cluster A and writes it to cluster B. If the data center of cluster A is disrupted by a disaster, you can switch to cluster B for data writing and querying, and deploy a `taosX` in the data center of cluster B to replicate data from cluster B back to cluster A once cluster A has recovered, or to another cluster C if cluster A cannot be recovered.
You can use the data replication feature of `taosX` to build more complicated disaster recovery solutions.
taosX is only provided in the TDengine Enterprise edition; for more details please contact business@tdengine.com.
View File
@ -1,62 +0,0 @@
---
title: Data Import
description: This document describes how to import data into TDengine.
---
There are multiple ways of importing data provided by TDengine: import with script, import from data file, import using `taosdump`.
## Import Using Script
TDengine CLI `taos` supports the `source <filename>` command, which executes the SQL statements in a file in batch. The SQL statements for creating databases, creating tables, and inserting rows can be written in a single file, one statement per line, and the file can then be executed with the `source` command in TDengine CLI to run the statements in order. In the script file, any line beginning with "#" is treated as a comment and ignored.
## Import from Data File
In TDengine CLI, data can be imported from a CSV file into an existing table. The data in a single CSV must belong to the same table and must be consistent with the schema of that table. The SQL statement is as below:
```sql
insert into tb1 file 'path/data.csv';
```
:::note
If there is a description in the first line of the CSV file, please remove it before importing. If there is no value for a column, please use `NULL` without quotes.
:::
For example, there is a subtable d1001 whose schema is as below:
```sql
taos> DESCRIBE d1001
Field | Type | Length | Note |
=================================================================================
ts | TIMESTAMP | 8 | |
current | FLOAT | 4 | |
voltage | INT | 4 | |
phase | FLOAT | 4 | |
location | BINARY | 64 | TAG |
groupid | INT | 4 | TAG |
```
The format of the CSV file to be imported, data.csv, is as below:
```csv
'2018-10-04 06:38:05.000',10.30000,219,0.31000
'2018-10-05 06:38:15.000',12.60000,218,0.33000
'2018-10-06 06:38:16.800',13.30000,221,0.32000
'2018-10-07 06:38:05.000',13.30000,219,0.33000
'2018-10-08 06:38:05.000',14.30000,219,0.34000
'2018-10-09 06:38:05.000',15.30000,219,0.35000
'2018-10-10 06:38:05.000',16.30000,219,0.31000
'2018-10-11 06:38:05.000',17.30000,219,0.32000
'2018-10-12 06:38:05.000',18.30000,219,0.31000
```
Then, the below SQL statement can be used to import data from file "data.csv", assuming the file is located under the home directory of the current Linux user.
```sql
taos> insert into d1001 file '~/data.csv';
Query OK, 9 row(s) affected (0.004763s)
```
## Import using taosdump
A convenient tool for importing and exporting data is provided by TDengine, `taosdump`, which can be used to export data from one TDengine cluster and import into another one. For the details of using `taosdump` please refer to the taosdump documentation.
View File
@ -1,22 +0,0 @@
---
title: Data Export
description: This document describes how to export data from TDengine.
---
There are two ways of exporting data from a TDengine cluster:
- Using a SQL statement in TDengine CLI
- Using the `taosdump` tool
## Export Using SQL
If you want to export the data of a table or a STable, please execute the SQL statement below, in the TDengine CLI.
```sql
select * from <tb_name> >> data.csv;
```
The data of table or STable specified by `tb_name` will be exported into a file named `data.csv` in CSV format.
## Export Using taosdump
With `taosdump`, you can choose to export the data of all databases, a database, a table or a STable, you can also choose to export the data within a time range, or even only export the schema definition of a table. For the details of using `taosdump` please refer to the taosdump documentation.
View File
@ -1,331 +0,0 @@
---
title: TDengine Monitoring
description: This document describes how to monitor your TDengine cluster.
---
After TDengine is started, it automatically writes monitoring data including CPU, memory and disk usage, bandwidth, number of requests, disk I/O speed, slow queries, into a designated database at a predefined interval through taosKeeper. Additionally, some important system operations, like logon, create user, drop database, and alerts and warnings generated in TDengine are written into the `log` database too. A system operator can view the data in `log` database from TDengine CLI or from a web console.
The collection of the monitoring information is enabled by default, but can be disabled by parameter `monitor` in the configuration file.
## TDinsight
TDinsight is a complete solution which uses the monitoring database `log` mentioned previously, and Grafana, to monitor a TDengine cluster.
A script `TDinsight.sh` is provided to deploy TDinsight automatically.
Download `TDinsight.sh` with the below command:
```bash
wget https://github.com/taosdata/grafanaplugin/raw/master/dashboards/TDinsight.sh
chmod +x TDinsight.sh
```
Prepare:
1. TDengine Server
- The URL of REST service: for example `http://localhost:6041` if TDengine is deployed locally
- User name and password
2. Grafana Alert Notification
You can use the command below to set up Grafana alert notification.
An existing Grafana notification channel can be specified with the `-E` parameter; the notifier uid of the channel can be obtained with `curl -u admin:admin localhost:3000/api/alert-notifications | jq`
```bash
./TDinsight.sh -a http://localhost:6041 -u root -p taosdata -E <notifier uid>
```
Launch `TDinsight.sh` with the command above and restart Grafana, then open Dashboard `http://localhost:3000/d/tdinsight`.
## log database
The data for the TDinsight dashboard is stored in the `log` database by default. You can change this in taosKeeper's configuration file; for more information, please refer to the [taoskeeper document](../../reference/components/taosKeeper). taosKeeper creates the `log` database on startup.
### taosd\_cluster\_basic table
`taosd_cluster_basic` table contains cluster basic information.
|field|type|is\_tag|comment|
|:----|:---|:-----|:------|
|ts|TIMESTAMP||timestamp|
|first\_ep|VARCHAR||first ep of cluster|
|first\_ep\_dnode\_id|INT||dnode id of first\_ep|
|cluster_version|VARCHAR||tdengine version. such as: 3.0.4.0|
|cluster\_id|VARCHAR|TAG|cluster id|
### taosd\_cluster\_info table
`taosd_cluster_info` table contains cluster information records.
|field|type|is\_tag|comment|
|:----|:---|:-----|:------|
|ts|TIMESTAMP||timestamp|
|cluster\_uptime|DOUBLE||seconds of master's uptime|
|dbs\_total|DOUBLE||total number of databases in cluster|
|tbs\_total|DOUBLE||total number of tables in cluster|
|stbs\_total|DOUBLE||total number of stables in cluster|
|dnodes\_total|DOUBLE||total number of dnodes in cluster|
|dnodes\_alive|DOUBLE||total number of dnodes in ready state|
|mnodes\_total|DOUBLE||total number of mnodes in cluster|
|mnodes\_alive|DOUBLE||total number of mnodes in ready state|
|vgroups\_total|DOUBLE||total number of vgroups in cluster|
|vgroups\_alive|DOUBLE||total number of vgroups in ready state|
|vnodes\_total|DOUBLE||total number of vnode in cluster|
|vnodes\_alive|DOUBLE||total number of vnode in ready state|
|connections\_total|DOUBLE||total number of connections to cluster|
|topics\_total|DOUBLE||total number of topics in cluster|
|streams\_total|DOUBLE||total number of streams in cluster|
|grants_expire\_time|DOUBLE||time until grants expire in seconds|
|grants_timeseries\_used|DOUBLE||timeseries used|
|grants_timeseries\_total|DOUBLE||total timeseries|
|cluster\_id|VARCHAR|TAG|cluster id|
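As a quick sanity check, you can query these metrics directly. The sketch below assumes the default `log` database name and uses only the columns documented above:
```sql
-- Sketch: check how many dnodes are alive versus the total
SELECT last(dnodes_alive) AS dnodes_alive, last(dnodes_total) AS dnodes_total
FROM log.taosd_cluster_info;
```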
### taosd\_vgroups\_info table
`taosd_vgroups_info` table contains vgroups information records.
|field|type|is\_tag|comment|
|:----|:---|:-----|:------|
|ts|TIMESTAMP||timestamp|
|tables\_num|DOUBLE||number of tables per vgroup|
|status|DOUBLE||status, value range:unsynced = 0, ready = 1|
|vgroup\_id|VARCHAR|TAG|vgroup id|
|database\_name|VARCHAR|TAG|database for the vgroup|
|cluster\_id|VARCHAR|TAG|cluster id|
### taosd\_dnodes\_info table
`taosd_dnodes_info` table contains dnodes information records.
|field|type|is\_tag|comment|
|:----|:---|:-----|:------|
|ts|TIMESTAMP||timestamp|
|uptime|DOUBLE||dnode uptime in `seconds`|
|cpu\_engine|DOUBLE||cpu usage of tdengine. read from `/proc/<taosd_pid>/stat`|
|cpu\_system|DOUBLE||cpu usage of server. read from `/proc/stat`|
|cpu\_cores|DOUBLE||cpu cores of server|
|mem\_engine|DOUBLE||memory usage of tdengine. read from `/proc/<taosd_pid>/status`|
|mem\_free|DOUBLE||available memory on the server in `KB`|
|mem\_total|DOUBLE||total memory of server in `KB`|
|disk\_used|DOUBLE||usage of data dir in `bytes`|
|disk\_total|DOUBLE||the capacity of data dir in `bytes`|
|system\_net\_in|DOUBLE||network throughput rate in byte/s. read from `/proc/net/dev`|
|system\_net\_out|DOUBLE||network throughput rate in byte/s. read from `/proc/net/dev`|
|io\_read|DOUBLE||io throughput rate in byte/s. read from `/proc/<taosd_pid>/io`|
|io\_write|DOUBLE||io throughput rate in byte/s. read from `/proc/<taosd_pid>/io`|
|io\_read\_disk|DOUBLE||io throughput rate of disk in byte/s. read from `/proc/<taosd_pid>/io`|
|io\_write\_disk|DOUBLE||io throughput rate of disk in byte/s. read from `/proc/<taosd_pid>/io`|
|vnodes\_num|DOUBLE||number of vnodes per dnode|
|masters|DOUBLE||number of master vnodes|
|has\_mnode|DOUBLE||if the dnode has mnode, value range:include=1, not_include=0|
|has\_qnode|DOUBLE||if the dnode has qnode, value range:include=1, not_include=0|
|has\_snode|DOUBLE||if the dnode has snode, value range:include=1, not_include=0|
|has\_bnode|DOUBLE||if the dnode has bnode, value range:include=1, not_include=0|
|error\_log\_count|DOUBLE||error count|
|info\_log\_count|DOUBLE||info count|
|debug\_log\_count|DOUBLE||debug count|
|trace\_log\_count|DOUBLE||trace count|
|dnode\_id|VARCHAR|TAG|dnode id|
|dnode\_ep|VARCHAR|TAG|dnode endpoint|
|cluster\_id|VARCHAR|TAG|cluster id|
### taosd\_dnodes\_status table
`taosd_dnodes_status` table contains dnodes information records.
|field|type|is\_tag|comment|
|:----|:---|:-----|:------|
|ts|TIMESTAMP||timestamp|
|status|DOUBLE||dnode status, value range: ready = 1, offline = 0|
|dnode\_id|VARCHAR|TAG|dnode id|
|dnode\_ep|VARCHAR|TAG|dnode endpoint|
|cluster\_id|VARCHAR|TAG|cluster id|
### taosd\_dnodes\_log\_dir table
`log_dir` table contains log directory information records.
|field|type|is\_tag|comment|
|:----|:---|:-----|:------|
|ts|TIMESTAMP||timestamp|
|avail|DOUBLE||available space for log directory in `bytes`|
|used|DOUBLE||used space for data directory in `bytes`|
|total|DOUBLE||total space for data directory in `bytes`|
|name|VARCHAR|TAG|log directory. default is `/var/log/taos/`|
|dnode\_id|VARCHAR|TAG|dnode id|
|dnode\_ep|VARCHAR|TAG|dnode endpoint|
|cluster\_id|VARCHAR|TAG|cluster id|
### taosd\_dnodes\_data\_dir table
`taosd_dnodes_data_dir` table contains data directory information records.
|field|type|is\_tag|comment|
|:----|:---|:-----|:------|
|ts|TIMESTAMP||timestamp|
|avail|DOUBLE||available space for data directory in `bytes`|
|used|DOUBLE||used space for data directory in `bytes`|
|total|DOUBLE||total space for data directory in `bytes`|
|level|VARCHAR|TAG|level for multi-level storage|
|name|VARCHAR|TAG|data directory. default is `/var/lib/taos`|
|dnode\_id|VARCHAR|TAG|dnode id|
|dnode\_ep|VARCHAR|TAG|dnode endpoint|
|cluster\_id|VARCHAR|TAG|cluster id|
### taosd\_mnodes\_info table
`taosd_mnodes_info` table contains mnode information records.
|field|type|is\_tag|comment|
|:----|:---|:-----|:------|
|ts|TIMESTAMP||timestamp|
|role|DOUBLE||the role of mnode. value range:offline = 0,follower = 100,candidate = 101,leader = 102,error = 103,learner = 104|
|mnode\_id|VARCHAR|TAG|master node id|
|mnode\_ep|VARCHAR|TAG|master node endpoint|
|cluster\_id|VARCHAR|TAG|cluster id|
### taosd\_vnodes\_role table
`taosd_vnodes_role` table contains vnode role information records.
|field|type|is\_tag|comment|
|:----|:---|:-----|:------|
|ts|TIMESTAMP||timestamp|
|role|DOUBLE||role. value range:offline = 0,follower = 100,candidate = 101,leader = 102,error = 103,learner = 104|
|vgroup\_id|VARCHAR|TAG|vgroup id|
|database\_name|VARCHAR|TAG|database for the vgroup|
|dnode\_id|VARCHAR|TAG|dnode id|
|cluster\_id|VARCHAR|TAG|cluster id|
### taosd\_sql\_req table
The `taosd_sql_req` table contains taosd sql records.
|field|type|is\_tag|comment|
|:----|:---|:-----|:------|
|ts|TIMESTAMP||timestamp|
|count|DOUBLE||sql count|
|result|VARCHAR|TAG|sql execution result, value range: Success, Failed|
|username|VARCHAR|TAG|user name who executed the sql|
|sql\_type|VARCHAR|TAG|sql type, value range: inserted_rows|
|dnode\_id|VARCHAR|TAG|dnode id|
|dnode\_ep|VARCHAR|TAG|dnode endpoint|
|vgroup\_id|VARCHAR|TAG|vgroup id|
|cluster\_id|VARCHAR|TAG|cluster id|
### taos\_sql\_req table
The `taos_sql_req` table contains taos sql records.
|field|type|is\_tag|comment|
|:----|:---|:-----|:------|
|ts|TIMESTAMP||timestamp|
|count|DOUBLE||sql count|
|result|VARCHAR|TAG|sql execution result, value range: Success, Failed|
|username|VARCHAR|TAG|user name who executed the sql|
|sql\_type|VARCHAR|TAG|sql type, value range: select, insert, delete|
|cluster\_id|VARCHAR|TAG|cluster id|
### taos\_slow\_sql table
The `taos_slow_sql` table contains taos slow sql records.
|field|type|is\_tag|comment|
|:----|:---|:-----|:------|
|ts|TIMESTAMP||timestamp|
|count|DOUBLE||sql count|
|result|VARCHAR|TAG|sql execution result, value range: Success, Failed|
|username|VARCHAR|TAG|user name who executed the sql|
|duration|VARCHAR|TAG|sql execution duration, value range: 3-10s, 10-100s, 100-1000s, 1000s-|
|cluster\_id|VARCHAR|TAG|cluster id|
### keeper\_monitor table
`keeper_monitor` table contains keeper monitor information records.
|field|type|is\_tag|comment|
|:----|:---|:-----|:------|
|ts|TIMESTAMP||timestamp|
|cpu|FLOAT||cpu usage|
|mem|FLOAT||memory usage|
|identify|NCHAR|TAG||
### taosadapter\_restful\_http\_request\_total table
`taosadapter_restful_http_request_total` table contains taosadapter rest request information record. The timestamp column of this table is `_ts`.
|field|type|is\_tag|comment|
|:----|:---|:-----|:------|
|\_ts|TIMESTAMP||timestamp|
|gauge|DOUBLE||metric value|
|client\_ip|NCHAR|TAG|client ip|
|endpoint|NCHAR|TAG|taosadapter endpoint|
|request\_method|NCHAR|TAG|request method|
|request\_uri|NCHAR|TAG|request uri|
|status\_code|NCHAR|TAG|status code|
### taosadapter\_restful\_http\_request\_fail table
`taosadapter_restful_http_request_fail` table contains taosadapter failed rest request information record. The timestamp column of this table is `_ts`.
|field|type|is\_tag|comment|
|:----|:---|:-----|:------|
|\_ts|TIMESTAMP||timestamp|
|gauge|DOUBLE||metric value|
|client\_ip|NCHAR|TAG|client ip|
|endpoint|NCHAR|TAG|taosadapter endpoint|
|request\_method|NCHAR|TAG|request method|
|request\_uri|NCHAR|TAG|request uri|
|status\_code|NCHAR|TAG|status code|
### taosadapter\_restful\_http\_request\_in\_flight table
`taosadapter_restful_http_request_in_flight` table contains taosadapter rest request information record in real time. The timestamp column of this table is `_ts`.
|field|type|is\_tag|comment|
|:----|:---|:-----|:------|
|\_ts|TIMESTAMP||timestamp|
|gauge|DOUBLE||metric value|
|endpoint|NCHAR|TAG|taosadapter endpoint|
### taosadapter\_restful\_http\_request\_summary\_milliseconds table
`taosadapter_restful_http_request_summary_milliseconds` table contains the summary of rest request information records. The timestamp column of this table is `_ts`.
|field|type|is\_tag|comment|
|:----|:---|:-----|:------|
|\_ts|TIMESTAMP||timestamp|
|count|DOUBLE|||
|sum|DOUBLE|||
|0.5|DOUBLE|||
|0.9|DOUBLE|||
|0.99|DOUBLE|||
|0.1|DOUBLE|||
|0.2|DOUBLE|||
|endpoint|NCHAR|TAG|taosadapter endpoint|
|request\_method|NCHAR|TAG|request method|
|request\_uri|NCHAR|TAG|request uri|
### taosadapter\_system\_mem\_percent table
`taosadapter_system_mem_percent` table contains taosadapter memory usage information. The timestamp of this table is `_ts`.
|field|type|is\_tag|comment|
|:----|:---|:-----|:------|
|\_ts|TIMESTAMP||timestamp|
|gauge|DOUBLE||metric value|
|endpoint|NCHAR|TAG|taosadapter endpoint|
### taosadapter\_system\_cpu\_percent table
`taosadapter_system_cpu_percent` table contains taosadapter CPU usage information. The timestamp of this table is `_ts`.
|field|type|is\_tag|comment|
|:----|:---|:-----|:------|
|\_ts|TIMESTAMP||timestamp|
|gauge|DOUBLE||metric value|
|endpoint|NCHAR|TAG|taosadapter endpoint|
View File
@ -1,72 +0,0 @@
---
title: Problem Diagnostics
description: This document describes how to diagnose issues with your TDengine cluster.
---
## Network Connection Diagnostics
When a TDengine client is unable to access a TDengine server, the network connection between the client side and the server side must be checked to find the root cause and resolve problems.
Diagnostics for network connections can be executed between Linux/Windows/macOS.
Diagnostic steps:
1. If the port range to be diagnosed is occupied by a `taosd` server process, please stop `taosd` first.
2. On the server side, execute command `taos -n server -P <port> -l <pktlen>` to monitor the port range starting from the port specified by `-P` parameter with the role of "server".
3. On the client side, execute command `taos -n client -h <fqdn of server> -P <port> -l <pktlen>` to send a testing package to the specified server and port.
-l &lt;pktlen&gt;: The size of the testing package, in bytes. The value range is [11, 64,000] and the default value is 1,000.
Please note that the package length must be the same in the two commands executed on the server side and the client side respectively.
Output of the server side for the example is below:
```bash
# taos -n server -P 6030 -l 1000
network test server is initialized, port:6030
request is received, size:1000
request is received, size:1000
...
...
...
request is received, size:1000
request is received, size:1000
```
Output of the client side for the example is below:
```bash
# taos -n client -h 172.27.0.7 -P 6000
taos -n client -h v3s2 -P 6030 -l 1000
network test client is initialized, the server is v3s2:6030
request is sent, size:1000
response is received, size:1000
request is sent, size:1000
response is received, size:1000
...
...
...
request is sent, size:1000
response is received, size:1000
request is sent, size:1000
response is received, size:1000
total succ: 100/100 cost: 16.23 ms speed: 5.87 MB/s
```
The output needs to be checked carefully for the system operator to find the root cause and resolve the problem.
## Server Log
The parameter `debugFlag` is used to control the log level of the `taosd` server process. The default value is 131. For debugging and tracing, it needs to be set to either 135 or 143 respectively.
Once this parameter is set to 135 or 143, the log file grows very quickly especially when there is a huge volume of data insertion and data query requests. Ensure that the disk drive on which logs are stored has sufficient space.
## Client Log
An independent log file, named "taoslog+&lt;seq num&gt;", is generated for each client program, i.e. each client process. The parameter `debugFlag` is used to control the log level. The default value is 131. For debugging and tracing, it needs to be set to either 135 or 143 respectively.
With the default value of 131, only logs at the INFO/ERROR/WARNING levels are recorded. As stated above, for debugging and tracing it needs to be changed to 135 or 143 respectively, so that logs at the DEBUG or TRACE level can be recorded.
The maximum length of a single log file is controlled by parameter `numOfLogLines` and only 2 log files are kept for each `taosd` server process.
Log files are written asynchronously to minimize the workload on disk, but the trade-off is that a few log lines may be lost in some extreme conditions. You can set `asyncLog` to 0 when needed for troubleshooting purposes to ensure that no log information is lost.
View File
@ -1,13 +0,0 @@
---
title: Administration
description: This document describes how to perform management operations on your TDengine cluster from an administrator's perspective.
---
This chapter is mainly written for system administrators. It covers download, install/uninstall, data import/export, system monitoring, user management, connection management, capacity planning and system optimization.
```mdx-code-block
import DocCardList from '@theme/DocCardList';
import {useCurrentSidebarCategory} from '@docusaurus/theme-common';
<DocCardList items={useCurrentSidebarCategory().items}/>
```
View File
@ -1,17 +0,0 @@
#### Unified Database Access Interface
```go title="Native Connection"
{{#include docs/examples/go/connect/cgoexample/main.go}}
```
```go title="REST Connection"
{{#include docs/examples/go/connect/restexample/main.go}}
```
#### Advanced Features
The af package of driver-go can also be used to establish connection, with this way some advanced features of TDengine, like parameter binding and subscription, can be used.
```go title="Establish native connection using af package"
{{#include docs/examples/go/connect/afconn/main.go}}
```
View File
@ -1,8 +0,0 @@
```rust title="Native Connection"
{{#include docs/examples/rust/nativeexample/examples/connect.rs}}
```
:::note
For Rust client library, the connection depends on the feature being used. If "ws" feature is enabled, then only the implementation for "websocket" is compiled and packaged.
:::
Binary file not shown.
View File
@ -1,300 +0,0 @@
---
title: Connect to TDengine
sidebar_label: Connect
description: This document describes how to establish connections to TDengine and how to install and use TDengine client libraries.
---
import Tabs from "@theme/Tabs";
import TabItem from "@theme/TabItem";
import ConnJava from "./_connect_java.mdx";
import ConnGo from "./_connect_go.mdx";
import ConnRust from "./_connect_rust.mdx";
import ConnNode from "./_connect_node.mdx";
import ConnPythonNative from "./_connect_python.mdx";
import ConnCSNative from "./_connect_cs.mdx";
import ConnC from "./_connect_c.mdx";
import ConnR from "./_connect_r.mdx";
import ConnPHP from "./_connect_php.mdx";
import InstallOnLinux from "../../14-reference/05-connectors/_linux_install.mdx";
import InstallOnWindows from "../../14-reference/05-connectors/_windows_install.mdx";
import InstallOnMacOS from "../../14-reference/05-connectors/_macos_install.mdx";
import VerifyLinux from "../../14-reference/05-connectors/_verify_linux.mdx";
import VerifyWindows from "../../14-reference/05-connectors/_verify_windows.mdx";
import VerifyMacOS from "../../14-reference/05-connectors/_verify_macos.mdx";
Any application running on any platform can access TDengine through the REST API provided by TDengine. For information, see [REST API](../../reference/connectors/rest-api/). Applications can also use the client libraries for various programming languages, including C/C++, Java, Python, Go, Node.js, C#, and Rust, to access TDengine. These client libraries support connecting to TDengine clusters using the native interface (taosc). Some client libraries also support connecting over a REST interface. Community developers have also contributed several unofficial client libraries, such as the ADO.NET, Lua, and PHP libraries.
## Establish Connection
There are three ways for a client library to establish connections to TDengine:
1. Native connection through the TDengine client driver (taosc).
2. REST connection through the REST API provided by the taosAdapter component.
3. Websocket connection provided by the taosAdapter component.
![TDengine connection type](connection-type-en.webp)
For these ways of connections, client libraries provide similar APIs for performing operations and running SQL statements on your databases. The main difference is the method of establishing the connection, which is not visible to users.
Key differences:
1. For a Native connection, the client driver taosc and the server TDengine version must be compatible.
2. For a REST connection, users do not need to install the client driver taosc, providing the advantage of cross-platform ease of use. However, functions such as data subscription and binary data types are not available. Additionally, compared to Native and Websocket connections, a REST connection has the worst performance.
3. For a Websocket connection, users also do not need to install the client driver taosc.
4. To connect to a cloud service instance, you need to use the REST connection or Websocket connection.
Normally we recommend using **Websocket connection**.
## Install Client Driver taosc
If you choose to use the native connection and the application is not on the same host as the TDengine server, the TDengine client driver taosc needs to be installed on the application host. If you choose the REST connection, or the application is on the same host as the TDengine server, this step can be skipped. It's better to use the same version of taosc as the TDengine server.
### Install
<Tabs defaultValue="linux" groupId="os">
<TabItem value="linux" label="Linux">
<InstallOnLinux />
</TabItem>
<TabItem value="windows" label="Windows">
<InstallOnWindows />
</TabItem>
<TabItem value="macos" label="MacOS">
<InstallOnMacOS />
</TabItem>
</Tabs>
### Verify
After the installation and configuration are complete, and the TDengine service has been started and is in service, the TDengine command-line interface `taos` can be launched to access TDengine.
<Tabs defaultValue="linux" groupId="os">
<TabItem value="linux" label="Linux">
<VerifyLinux />
</TabItem>
<TabItem value="windows" label="Windows">
<VerifyWindows />
</TabItem>
<TabItem value="macos" label="MacOS">
<VerifyMacOS />
</TabItem>
</Tabs>
## Install Client Library
<Tabs groupId="lang">
<TabItem label="Java" value="java">
If Maven is used to manage the project, you only need to add the following dependency to `pom.xml`.
```xml
<dependency>
<groupId>com.taosdata.jdbc</groupId>
<artifactId>taos-jdbcdriver</artifactId>
<version>3.3.3</version>
</dependency>
```
</TabItem>
<TabItem label="Python" value="python">
Install from PyPI using `pip`:
```
pip install taospy
```
Install from Git URL:
```
pip install git+https://github.com/taosdata/taos-connector-python.git
```
</TabItem>
<TabItem label="Go" value="go">
You only need to add the `driver-go` dependency to `go.mod`.
```go-mod title=go.mod
module goexample
go 1.17
require github.com/taosdata/driver-go/v3 latest
```
:::note
`driver-go` uses `cgo` to wrap the APIs provided by taosc, while `cgo` needs `gcc` to compile C source code, so please make sure `gcc` is properly installed on your system.
:::
</TabItem>
<TabItem label="Rust" value="rust">
You only need to add the `taos` dependency to `Cargo.toml`.
```toml title=Cargo.toml
[dependencies]
taos = { version = "*"}
```
:::info
The Rust client library uses Cargo features to distinguish the way a connection is established. To establish a Websocket connection, please enable the `ws` feature.
```toml
taos = { version = "*", default-features = false, features = ["ws"] }
```
:::
</TabItem>
<TabItem label="Node.js" value="node">
The Node.js client library provides different packages for different ways of establishing connections.
1. Install Node.js Native Client Library
```
npm install @tdengine/client
```
:::note
It's recommended to use a Node.js version between `node-v12.8.0` and `node-v13.0.0`.
:::
2. Install Node.js REST Client Library
```
npm install @tdengine/rest
```
</TabItem>
<TabItem label="C#" value="csharp">
You only need to add a reference to [TDengine.Connector](https://www.nuget.org/packages/TDengine.Connector/) in the project configuration file.
```xml title=csharp.csproj
<Project Sdk="Microsoft.NET.Sdk">
<PropertyGroup>
<OutputType>Exe</OutputType>
<TargetFramework>net6.0</TargetFramework>
<ImplicitUsings>enable</ImplicitUsings>
<Nullable>enable</Nullable>
<StartupObject>TDengineExample.AsyncQueryExample</StartupObject>
</PropertyGroup>
<ItemGroup>
<PackageReference Include="TDengine.Connector" Version="3.1.0" />
</ItemGroup>
</Project>
```
Or add by `dotnet` command.
```
dotnet add package TDengine.Connector
```
:::note
The sample code below is based on .NET 6.0; it may need to be adjusted if your .NET version is different.
:::
</TabItem>
<TabItem label="R" value="r">
1. Download [taos-jdbcdriver-version-dist.jar](https://repo1.maven.org/maven2/com/taosdata/jdbc/taos-jdbcdriver/3.0.0/).
2. Install the dependency package `RJDBC`:
```R
install.packages("RJDBC")
```
</TabItem>
<TabItem label="C" value="c">
If the client driver (taosc) is already installed, then the C client library is already available.
<br/>
</TabItem>
<TabItem label="PHP" value="php">
**Download Source Code Package and Unzip:**
```shell
curl -L -o php-tdengine.tar.gz https://github.com/Yurunsoft/php-tdengine/archive/refs/tags/v1.0.2.tar.gz \
&& mkdir php-tdengine \
&& tar -xzf php-tdengine.tar.gz -C php-tdengine --strip-components=1
```
> The version number `v1.0.2` is only an example; it can be replaced with any newer version.
**Non-Swoole Environment:**
```shell
phpize && ./configure && make -j && make install
```
**Specify TDengine Location:**
```shell
phpize && ./configure --with-tdengine-dir=/usr/local/Cellar/tdengine/3.0.0.0 && make -j && make install
```
> `--with-tdengine-dir=` is followed by the TDengine installation location.
> This option is useful when the TDengine location can't be found automatically, or on macOS.
**Swoole Environment:**
```shell
phpize && ./configure --enable-swoole && make -j && make install
```
**Enable The Extension:**
Option One: Add `extension=tdengine` in `php.ini`
Option Two: Specify the extension on CLI `php -d extension=tdengine test.php`
</TabItem>
</Tabs>
## Establish a connection
Prior to establishing a connection, please make sure TDengine is already running and accessible. The following sample code assumes TDengine is running on the same host as the client program, with FQDN configured to "localhost" and serverPort configured to "6030".
<Tabs groupId="lang" defaultValue="java">
<TabItem label="Java" value="java">
<ConnJava />
</TabItem>
<TabItem label="Python" value="python">
<ConnPythonNative />
</TabItem>
<TabItem label="Go" value="go">
<ConnGo />
</TabItem>
<TabItem label="Rust" value="rust">
<ConnRust />
</TabItem>
<TabItem label="Node.js" value="node">
<ConnNode />
</TabItem>
<TabItem label="C#" value="csharp">
<ConnCSNative />
</TabItem>
<TabItem label="R" value="r">
<ConnR/>
</TabItem>
<TabItem label="C" value="c">
<ConnC />
</TabItem>
<TabItem label="PHP" value="php">
<ConnPHP />
</TabItem>
</Tabs>
:::tip
If the connection fails, in most cases it's caused by improper configuration for FQDN or firewall. Please refer to the section "Unable to establish connection" in [FAQ](../../train-faq/faq).
:::

View File

@ -1,84 +0,0 @@
---
title: Data Model
description: This document describes the data model of TDengine.
---
The data model employed by TDengine is similar to that of a relational database. You have to create databases and tables. You must design the data model based on your own business and application requirements. You should design the [STable](../../concept/#super-table-stable) (an abbreviation for super table) schema to fit your data. This chapter will explain the big picture without getting into syntactical details.
Note: before you read this chapter, please make sure you have already read through [Key Concepts](../../concept/), since TDengine introduces new concepts like "one table for one [data collection point](../../concept/#data-collection-point)" and "[super table](../../concept/#super-table-stable)".
## Create Database
The characteristics of time-series data from different data collection points may be different. These characteristics include collection frequency, retention policy, and others, and they determine how you create and configure the database. For example, the days to keep data, the number of replicas, the data block size, whether data updates are allowed, and other configurable parameters are determined by the characteristics of your data and your business requirements. For TDengine to operate with the best performance, we strongly recommend that you create and configure different databases for data with different characteristics. This allows you, for example, to set up different storage and retention policies. When creating a database, many parameters can be configured, such as the days to keep data, the number of replicas, the size of the cache, the time precision, the minimum and maximum number of rows in each data block, whether compression is enabled, and the time range of the data in a single data file. An example is shown as follows:
```sql
CREATE DATABASE power KEEP 365 DURATION 10 BUFFER 16 WAL_LEVEL 1;
```
In the above SQL statement:
- a database named "power" is created
- the data in it is retained for 365 days, which means that data older than 365 days will be deleted automatically
- a new data file will be created every 10 days
- the size of the write cache pool on each VNode is 16 MB
- WAL is enabled but fsync is disabled
- other parameters, such as the number of vgroups, use their default values

For more details please refer to [Database](../../reference/taos-sql/database).
After creating a database, the current database in use can be switched using SQL command `USE`. For example the SQL statement below switches the current database to `power`.
```sql
USE power;
```
If no current database is specified, a table name must be prefixed with the name of the database it belongs to.
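For example, assuming the `power` database and the `d1001` table created later in this chapter, a table can be referenced with a database prefix when no current database has been selected:
```sql
-- Reference a table with its database name when no current database is set
SELECT * FROM power.d1001;
```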
:::note
- Any table or STable must belong to a database. To create a table or STable, the database it belongs to must be ready.
- Timestamp needs to be specified when inserting rows or querying historical rows.
:::
## Create STable
In a time-series application, there may be multiple kinds of data collection points. For example, in the electrical power system there are meters, transformers, bus bars, switches, etc. For easy and efficient aggregation of multiple tables, one STable needs to be created for each kind of data collection point. For example, for the meters in [table 1](../../concept/), the SQL statement below can be used to create the super table.
```sql
CREATE STABLE meters (ts timestamp, current float, voltage int, phase float) TAGS (location binary(64), groupId int);
```
Similar to creating a regular table, when creating a STable, the name and schema need to be provided. In the STable schema, the first column must always be a timestamp (like ts in the example), and the other columns (like current, voltage and phase in the example) are the data collected. The remaining columns can [contain data of type](../../reference/taos-sql/data-type/) integer, float, double, string etc. In addition, the schema for tags, like location and groupId in the example, must be provided. The tag type can be integer, float, string, etc. Tags are essentially the static properties of a data collection point. For example, properties like the location, device type, device group ID, manager ID are tags. Tags in the schema can be added, removed or updated. Please refer to [STable](../../reference/taos-sql/stable) for more details.
For each kind of data collection point, a corresponding STable must be created. There may be many STables in an application. For an electrical power system, we need to create STables for meters, transformers, busbars, and switches respectively. There may be multiple kinds of data collection points on a single device; for example, there may be one data collection point for electrical data like current and voltage, and another data collection point for environmental data like temperature, humidity and wind direction. Multiple STables are required for these kinds of devices.
At most 4096 columns are allowed in a STable. If more than 4096 metrics need to be collected for a data collection point, multiple STables are required. There can be multiple databases in a system, and one or more STables can exist in a database.
## Create Table
A specific table needs to be created for each data collection point. Similar to RDBMS, a table name and schema are required to create a table. Additionally, one or more tags can be created for each table. To create a table, a STable needs to be used as the template and the values of its tags need to be specified. For example, for a smart meter, the table can be created using the SQL statement below.
```sql
CREATE TABLE d1001 USING meters TAGS ("California.SanFrancisco", 2);
```
In the above SQL statement, "d1001" is the table name, "meters" is the STable name, followed by the value of tag "Location" and the value of tag "groupId", which are "California.SanFrancisco" and "2" respectively in the example. The tag values can be updated after the table is created. Please refer to [Tables](../../reference/taos-sql/table) for details.
It's suggested to use the globally unique ID of a data collection point as the table name. For example, the device serial number could be used as a unique ID. If a unique ID doesn't exist, multiple IDs that are not globally unique can be combined to form a globally unique ID. It's not recommended to use a globally unique ID as a tag value.
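As a hedged sketch, assuming a device serial number such as `SN20240001` (a hypothetical value) is used as the globally unique ID, the corresponding table could be created as follows:
```sql
-- Hypothetical device serial number used directly as the table name
CREATE TABLE SN20240001 USING meters TAGS ("California.SanFrancisco", 2);
```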
## Create Table Automatically
In some circumstances, it's unknown whether the table already exists when inserting rows. The table can be created automatically using the SQL statement below, and nothing will happen if the table already exists.
```sql
INSERT INTO d1001 USING meters TAGS ("California.SanFrancisco", 2) VALUES (now, 10.2, 219, 0.32);
```
In the above SQL statement, a row with value `(now, 10.2, 219, 0.32)` will be inserted into table "d1001". If table "d1001" doesn't exist, it will be created automatically using STable "meters" as template with tag value `"California.SanFrancisco", 2`.
For more details please refer to [Create Table Automatically](../../reference/taos-sql/insert#automatically-create-table-when-inserting).
## Single Column vs Multiple Column
A multi-column data model is supported in TDengine. As long as multiple metrics are collected by the same data collection point at the same time, i.e. the timestamps are identical, these metrics can be put in a single STable as columns. However, there is another kind of design, the single-column data model, in which a table is created for each metric. This means that a STable is required for each kind of metric. For example, in a single-column model, 3 STables would be required for current, voltage and phase.
It's recommended to use a multi-column data model as much as possible because insert and query performance is higher. In some cases, however, the collected metrics may vary frequently, and the corresponding STable schema would need to change frequently too. In such cases, it's more convenient to use a single-column data model, as sketched below.
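To make the contrast concrete, below is a minimal sketch of the two designs for the same smart meter metrics; the single-column STable names are illustrative only:
```sql
-- Multiple column model: all metrics sampled at the same time in one STable
CREATE STABLE meters (ts TIMESTAMP, current FLOAT, voltage INT, phase FLOAT)
  TAGS (location BINARY(64), groupId INT);

-- Single column model: one STable per metric (illustrative names)
CREATE STABLE meters_current (ts TIMESTAMP, val FLOAT) TAGS (location BINARY(64), groupId INT);
CREATE STABLE meters_voltage (ts TIMESTAMP, val INT)   TAGS (location BINARY(64), groupId INT);
CREATE STABLE meters_phase   (ts TIMESTAMP, val FLOAT) TAGS (location BINARY(64), groupId INT);
```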

View File

@ -1,144 +0,0 @@
---
title: Insert Using SQL
description: This document describes how to insert data into TDengine using SQL.
---
import Tabs from "@theme/Tabs";
import TabItem from "@theme/TabItem";
import JavaSQL from "./_java_sql.mdx";
import JavaStmt from "./_java_stmt.mdx";
import PySQL from "./_py_sql.mdx";
import PyStmt from "./_py_stmt.mdx";
import GoSQL from "./_go_sql.mdx";
import GoStmt from "./_go_stmt.mdx";
import RustSQL from "./_rust_sql.mdx";
import RustStmt from "./_rust_stmt.mdx";
import NodeSQL from "./_js_sql.mdx";
import NodeStmt from "./_js_stmt.mdx";
import CsSQL from "./_cs_sql.mdx";
import CsStmt from "./_cs_stmt.mdx";
import CSQL from "./_c_sql.mdx";
import CStmt from "./_c_stmt.mdx";
import PhpSQL from "./_php_sql.mdx";
import PhpStmt from "./_php_stmt.mdx";
## Introduction
Application programs can execute the `INSERT` statement through client libraries to insert rows. The TDengine CLI can also be used to manually insert data.
### Insert Single Row
The below SQL statement is used to insert one row into table "d1001".
```sql
INSERT INTO d1001 VALUES (ts1, 10.3, 219, 0.31);
```
`ts1` is a Unix timestamp. Only timestamps newer than the current time minus the `KEEP` parameter of the database are allowed. For further detail, refer to [TDengine SQL insert timestamp section](../../../reference/taos-sql/insert).
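For example, assuming the `d1001` table from the data model chapter, the timestamp can be written as `NOW` or as an explicit value:
```sql
-- Use the current time as the timestamp
INSERT INTO d1001 VALUES (NOW, 10.3, 219, 0.31);
-- Or use an explicit timestamp literal
INSERT INTO d1001 VALUES ('2022-03-28 09:56:51.249', 10.3, 219, 0.31);
```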
### Insert Multiple Rows
Multiple rows can be inserted in a single SQL statement. The example below inserts 2 rows into table "d1001".
```sql
INSERT INTO d1001 VALUES (ts1, 10.2, 220, 0.23) (ts2, 10.3, 218, 0.25);
```
`ts1` and `ts2` are Unix timestamps. Only timestamps newer than the current time minus the `KEEP` parameter of the database are allowed. For further detail, refer to [TDengine SQL insert timestamp section](../../../reference/taos-sql/insert).
### Insert into Multiple Tables
Data can be inserted into multiple tables in the same SQL statement. The example below inserts 2 rows into table "d1001" and 1 row into table "d1002".
```sql
INSERT INTO d1001 VALUES (ts1, 10.3, 219, 0.31) (ts2, 12.6, 218, 0.33) d1002 VALUES (ts3, 12.3, 221, 0.31);
```
`ts1`, `ts2` and `ts3` are Unix timestamps. Only timestamps newer than the current time minus the `KEEP` parameter of the database are allowed. For further detail, refer to [TDengine SQL insert timestamp section](../../../reference/taos-sql/insert).
For more details about `INSERT` please refer to [INSERT](../../../reference/taos-sql/insert).
:::info
- Inserting in batches can improve performance. The higher the batch size, the better the performance. Please note that a single row can't exceed 48K bytes and each SQL statement can't exceed 1MB.
- Inserting with multiple threads can also improve performance. However, at a certain point, increasing the number of threads no longer offers any benefit and can even decrease performance due to the overhead involved in frequent thread switching. The optimal number of threads for a system depends on the processing capabilities and configuration of the server, the configuration of the database, the data schema, and the batch size for writing data. In general, more powerful clients and servers can support higher numbers of concurrently writing threads. Given a sufficiently powerful server, a higher number of vgroups for a database also increases the number of concurrent writes. Finally, a simpler data schema enables more concurrent writes as well.
:::
:::warning
- If the timestamp of a new record already exists in a table, the columns for which new data is provided replace the old values at that timestamp, while columns without new data are left unchanged (see the sketch after this note).
- The timestamp to be inserted must be newer than the current time minus the `KEEP` parameter. If `KEEP` is set to 3650 days, data older than 3650 days can't be inserted. The timestamp to be inserted also cannot be newer than the current time plus the `DURATION` parameter. If `DURATION` is set to 2, data with timestamps more than 2 days in the future can't be inserted.
:::
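A minimal sketch of the duplicate-timestamp behavior described in the note above, using the column names from the meters example (timestamps are illustrative):
```sql
-- First write sets all columns
INSERT INTO d1001 (ts, current, voltage, phase) VALUES ('2022-03-28 09:56:51.249', 10.3, 219, 0.31);
-- A second write with the same timestamp replaces only "current";
-- "voltage" and "phase" keep their previous values
INSERT INTO d1001 (ts, current) VALUES ('2022-03-28 09:56:51.249', 12.6);
```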
## Sample program
### Insert Using SQL
<Tabs defaultValue="java" groupId="lang">
<TabItem label="Java" value="java">
<JavaSQL />
</TabItem>
<TabItem label="Python" value="python">
<PySQL />
</TabItem>
<TabItem label="Go" value="go">
<GoSQL />
</TabItem>
<TabItem label="Rust" value="rust">
<RustSQL />
</TabItem>
<TabItem label="Node.js" value="node">
<NodeSQL />
</TabItem>
<TabItem label="C#" value="csharp">
<CsSQL />
</TabItem>
<TabItem label="C" value="c">
<CSQL />
</TabItem>
<TabItem label="PHP" value="php">
<PhpSQL />
</TabItem>
</Tabs>
:::note
1. The above samples work with either a native connection or a REST connection.
2. Please note that `use db` can't be used with a REST connection because REST connections are stateless, so the samples use `dbName.tbName` to specify the table name.
:::
### Insert with Parameter Binding
TDengine also provides API support for parameter binding. Similar to MySQL, only `?` can be used in these APIs to represent the parameters to bind. This avoids the resource consumption of SQL syntax parsing when writing data through the parameter binding interface, thus significantly improving write performance in most cases.
Parameter binding is available only with the native connection.
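As a reference sketch, the statement passed to the parameter binding APIs typically uses `?` placeholders for the table name, tag values, and column values; the exact prepare/bind calls are language specific and are shown in the tabs below:
```sql
-- Typical template used with the parameter binding interface
INSERT INTO ? USING meters TAGS(?, ?) VALUES(?, ?, ?, ?)
```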
<Tabs defaultValue="java" groupId="lang">
<TabItem label="Java" value="java">
<JavaStmt />
</TabItem>
<TabItem label="Python" value="python">
<PyStmt />
</TabItem>
<TabItem label="Go" value="go">
<GoStmt />
</TabItem>
<TabItem label="Rust" value="rust">
<RustStmt />
</TabItem>
<TabItem label="Node.js" value="node">
<NodeStmt />
</TabItem>
<TabItem label="C#" value="csharp">
<CsStmt />
</TabItem>
<TabItem label="C" value="c">
<CStmt />
</TabItem>
<TabItem label="PHP" value="php">
<PhpStmt />
</TabItem>
</Tabs>

View File

@ -1,47 +0,0 @@
---
title: Write from Kafka
description: This document describes how to insert data into TDengine using Kafka.
---
import Tabs from "@theme/Tabs";
import TabItem from "@theme/TabItem";
import PyKafka from "./_py_kafka.mdx";
## About Kafka
Apache Kafka is an open-source distributed event streaming platform, used by thousands of companies for high-performance data pipelines, streaming analytics, data integration, and mission-critical applications. For the key concepts of Kafka, please refer to the [Kafka documentation](https://kafka.apache.org/documentation/#gettingStarted).
### Kafka Topics
Messages in Kafka are organized by topics. A topic may have one or more partitions. We can manage Kafka topics with the `kafka-topics.sh` tool.
Create a topic named `kafka-events`:
```
bin/kafka-topics.sh --create --topic kafka-events --bootstrap-server localhost:9092
```
Alter `kafka-events` topic to set partitions to 3:
```
bin/kafka-topics.sh --alter --topic kafka-events --partitions 3 --bootstrap-server=localhost:9092
```
Show all topics and partitions in Kafka:
```
bin/kafka-topics.sh --bootstrap-server=localhost:9092 --describe
```
## Insert into TDengine
We can write data into TDengine via SQL or Schemaless. For more information, please refer to [Insert Using SQL](../sql-writing/) or [High Performance Writing](../high-volume/) or [Schemaless Writing](../../../reference/schemaless/).
## Examples
<Tabs defaultValue="Python" groupId="lang">
<TabItem label="Python" value="Python">
<PyKafka />
</TabItem>
</Tabs>

View File

@ -1,80 +0,0 @@
---
title: InfluxDB Line Protocol
sidebar_label: InfluxDB Line Protocol
description: This document describes how to insert data into TDengine using the InfluxDB Line Protocol.
---
import Tabs from "@theme/Tabs";
import TabItem from "@theme/TabItem";
import JavaLine from "./_java_line.mdx";
import PyLine from "./_py_line.mdx";
import GoLine from "./_go_line.mdx";
import RustLine from "./_rust_line.mdx";
import NodeLine from "./_js_line.mdx";
import CsLine from "./_cs_line.mdx";
import CLine from "./_c_line.mdx";
## Introduction
In the InfluxDB Line protocol format, a single line of text is used to represent one row of data. Each line contains 4 parts as shown below.
```
measurement,tag_set field_set timestamp
```
- `measurement` will be used as the name of the STable. Enter a comma (`,`) between `measurement` and `tag_set`.
- `tag_set` will be used as tags, with the format `<tag_key>=<tag_value>,<tag_key>=<tag_value>`. Enter a space between `tag_set` and `field_set`.
- `field_set` will be used as data columns, with the format `<field_key>=<field_value>,<field_key>=<field_value>`. Enter a space between `field_set` and `timestamp`.
- `timestamp` is the primary key timestamp corresponding to this row of data.
For example:
```
meters,location=California.LosAngeles,groupid=2 current=13.4,voltage=223,phase=0.29 1648432611249500
```
:::note
- All the data in `tag_set` will be converted to NCHAR type automatically
- Each value in `field_set` must be self-descriptive about its data type. For example, 1.2f32 means the value 1.2 of float type. Without the "f" type suffix, it will be treated as type double
- Multiple kinds of precision can be used for the `timestamp` field. Time precision can be from nanosecond (ns) to hour (h)
- The rule of table name
- The child table name is created automatically in a rule to guarantee its uniqueness.
- You can configure `smlAutoChildTableNameDelimiter` in taos.cfg to specify a delimiter between tag values as the table names. For example, you set `smlAutoChildTableNameDelimiter=-` in taos.cfg, when you insert `st,t0=cpu1,t1=4 c1=3 1626006833639000000`, the child table will be `cpu1-4`
- You can configure `smlChildTableName` in taos.cfg to specify a tag value as the table names if the tag value is unique globally. For example, if a tag is called `tname` and you set `smlChildTableName=tname` in taos.cfg, when you insert `st,tname=cpu1,t1=4 c1=3 1626006833639000000`, the child table `cpu1` will be created automatically. Note that if multiple rows have the same tname but different tag_set values, the tag_set of the first row is used to create the table and the others are ignored
- It is assumed that the order of `field_set` in a supertable is consistent, meaning that the first record contains all fields and subsequent records store fields in the same order. If the order is not consistent, set `smlDataFormat` in taos.cfg to false; otherwise, data will be written out of order and a database error will occur. (`smlDataFormat` defaults to false since version 3.0.1.3 and is no longer used since 3.0.3.0.)
:::
For more details please refer to [InfluxDB Line Protocol](https://docs.influxdata.com/influxdb/v2.0/reference/syntax/line-protocol/) and [TDengine Schemaless](../../../reference/schemaless/)
## Examples
<Tabs defaultValue="java" groupId="lang">
<TabItem label="Java" value="java">
<JavaLine />
</TabItem>
<TabItem label="Python" value="Python">
<PyLine />
</TabItem>
<TabItem label="Go" value="go">
<GoLine />
</TabItem>
<TabItem label="Node.js" value="node">
<NodeLine />
</TabItem>
<TabItem label="C#" value="csharp">
<CsLine />
</TabItem>
<TabItem label="C" value="c">
<CLine />
</TabItem>
</Tabs>
## Query Examples
If you want to query the data where `location=California.LosAngeles,groupid=2`, here is the query SQL:
```sql
SELECT * FROM meters WHERE location = "California.LosAngeles" AND groupid = 2;
```

View File

@ -1,95 +0,0 @@
---
title: OpenTSDB Line Protocol
sidebar_label: OpenTSDB Line Protocol
description: This document describes how to insert data into TDengine using the OpenTSDB Line Protocol.
---
import Tabs from "@theme/Tabs";
import TabItem from "@theme/TabItem";
import JavaTelnet from "./_java_opts_telnet.mdx";
import PyTelnet from "./_py_opts_telnet.mdx";
import GoTelnet from "./_go_opts_telnet.mdx";
import RustTelnet from "./_rust_opts_telnet.mdx";
import NodeTelnet from "./_js_opts_telnet.mdx";
import CsTelnet from "./_cs_opts_telnet.mdx";
import CTelnet from "./_c_opts_telnet.mdx";
## Introduction
A single line of text is used in OpenTSDB line protocol to represent one row of data. OpenTSDB employs a single column data model, so each line can only contain a single data column. There can be multiple tags. Each line contains 4 parts as below:
```txt
<metric> <timestamp> <value> <tagk_1>=<tagv_1>[ <tagk_n>=<tagv_n>]
```
- `metric` will be used as the STable name.
- `timestamp` is the timestamp of current row of data. The time precision will be determined automatically based on the length of the timestamp. Second and millisecond time precision are supported.
- `value` is a metric which must be a numeric value; the corresponding column name is "value".
- The last part is the tag set separated by spaces, all tags will be converted to NCHAR type automatically.
For example:
```txt
meters.current 1648432611250 11.3 location=California.LosAngeles groupid=3
```
- The rule of table name
- The child table name is created automatically in a rule to guarantee its uniqueness.
- You can configure `smlAutoChildTableNameDelimiter` in taos.cfg to specify a delimiter between tag values as the table names. For example, you set `smlAutoChildTableNameDelimiter=-` in taos.cfg, when you insert `st,t0=cpu1,t1=4 c1=3 1626006833639000000`, the child table will be `cpu1-4`
- You can configure `smlChildTableName` in taos.cfg to specify a tag value as the table names if the tag value is unique globally. For example, if a tag is called `tname` and you set `smlChildTableName=tname` in taos.cfg, when you insert `st,tname=cpu1,t1=4 c1=3 1626006833639000000`, the child table `cpu1` will be created automatically. Note that if multiple rows have the same tname but different tag_set values, the tag_set of the first row is used to create the table and the others are ignored
Please refer to [OpenTSDB Telnet API](http://opentsdb.net/docs/build/html/api_telnet/put.html) for more details.
## Examples
<Tabs defaultValue="java" groupId="lang">
<TabItem label="Java" value="java">
<JavaTelnet />
</TabItem>
<TabItem label="Python" value="Python">
<PyTelnet />
</TabItem>
<TabItem label="Go" value="go">
<GoTelnet />
</TabItem>
<TabItem label="Node.js" value="node">
<NodeTelnet />
</TabItem>
<TabItem label="C#" value="csharp">
<CsTelnet />
</TabItem>
<TabItem label="C" value="c">
<CTelnet />
</TabItem>
</Tabs>
In the above sample code, 2 STables are created automatically, and each STable has 4 rows of data.
```cmd
taos> use test;
Database changed.
taos> show stables;
name |
=================================
meters_current |
meters_voltage |
Query OK, 2 row(s) in set (0.002544s)
taos> select tbname, * from `meters_current`;
tbname | _ts | _value | groupid | location |
==================================================================================================================================
t_0e7bcfa21a02331c06764f275... | 2022-03-28 09:56:51.249 | 10.800000000 | 3 | California.LosAngeles |
t_0e7bcfa21a02331c06764f275... | 2022-03-28 09:56:51.250 | 11.300000000 | 3 | California.LosAngeles |
t_7e7b26dd860280242c6492a16... | 2022-03-28 09:56:51.249 | 10.300000000 | 2 | California.SanFrancisco |
t_7e7b26dd860280242c6492a16... | 2022-03-28 09:56:51.250 | 12.600000000 | 2 | California.SanFrancisco |
Query OK, 4 row(s) in set (0.005399s)
```
## Query Examples
If you want to query the data where `location=California.LosAngeles groupid=3`, here is the query SQL:
```sql
SELECT * FROM `meters_current` WHERE location = "California.LosAngeles" AND groupid = 3;
```

View File

@ -1,108 +0,0 @@
---
title: OpenTSDB JSON Protocol
sidebar_label: OpenTSDB JSON Protocol
description: This document describes how to insert data into TDengine using the OpenTSDB JSON protocol.
---
import Tabs from "@theme/Tabs";
import TabItem from "@theme/TabItem";
import JavaJson from "./_java_opts_json.mdx";
import PyJson from "./_py_opts_json.mdx";
import GoJson from "./_go_opts_json.mdx";
import RustJson from "./_rust_opts_json.mdx";
import NodeJson from "./_js_opts_json.mdx";
import CsJson from "./_cs_opts_json.mdx";
import CJson from "./_c_opts_json.mdx";
## Introduction
A JSON string is used in the OpenTSDB JSON protocol to represent one or more rows of data. For example:
```json
[
{
"metric": "sys.cpu.nice",
"timestamp": 1346846400,
"value": 18,
"tags": {
"host": "web01",
"dc": "lga"
}
},
{
"metric": "sys.cpu.nice",
"timestamp": 1346846400,
"value": 9,
"tags": {
"host": "web02",
"dc": "lga"
}
}
]
```
Similar to OpenTSDB line protocol, `metric` will be used as the STable name, `timestamp` is the timestamp to be used, `value` represents the metric collected, `tags` are the tag sets.
Please refer to [OpenTSDB HTTP API](http://opentsdb.net/docs/build/html/api_http/put.html) for more details.
:::note
- In JSON protocol, strings will be converted to NCHAR type and numeric values will be converted to double type.
- The rule of table name
- The child table name is created automatically in a rule to guarantee its uniqueness.
- You can configure `smlAutoChildTableNameDelimiter` in taos.cfg to specify a delimiter between tag values as the table names. For example, you set `smlAutoChildTableNameDelimiter=-` in taos.cfg, when you insert `st,t0=cpu1,t1=4 c1=3 1626006833639000000`, the child table will be `cpu1-4`
- You can configure `smlChildTableName` in taos.cfg to specify a tag value as the table names if the tag value is unique globally. For example, if a tag is called `tname` and you set `smlChildTableName=tname` in taos.cfg, when you insert `st,tname=cpu1,t1=4 c1=3 1626006833639000000`, the child table `cpu1` will be created automatically. Note that if multiple rows have the same tname but different tag_set values, the tag_set of the first row is used to create the table and the others are ignored
:::
## Examples
<Tabs defaultValue="java" groupId="lang">
<TabItem label="Java" value="java">
<JavaJson />
</TabItem>
<TabItem label="Python" value="Python">
<PyJson />
</TabItem>
<TabItem label="Go" value="go">
<GoJson />
</TabItem>
<TabItem label="Node.js" value="node">
<NodeJson />
</TabItem>
<TabItem label="C#" value="csharp">
<CsJson />
</TabItem>
<TabItem label="C" value="c">
<CJson />
</TabItem>
</Tabs>
In the above sample code, 2 STables are created automatically, and each STable has 2 rows of data.
```cmd
taos> use test;
Database changed.
taos> show stables;
name |
=================================
meters_current |
meters_voltage |
Query OK, 2 row(s) in set (0.001954s)
taos> select * from `meters_current`;
_ts | _value | groupid | location |
===================================================================================================================
2022-03-28 09:56:51.249 | 10.300000000 | 2.000000000 | California.SanFrancisco |
2022-03-28 09:56:51.250 | 12.600000000 | 2.000000000 | California.SanFrancisco |
Query OK, 2 row(s) in set (0.004076s)
```
## Query Examples
If you want query the data of "tags": &lcub;"location": "California.LosAngeles", "groupid": 1&rcub;, here is the query SQL:
```sql
SELECT * FROM `meters_current` WHERE location = "California.LosAngeles" AND groupid = 3;
```

View File

@ -1,442 +0,0 @@
---
title: High Performance Writing
sidebar_label: High Performance Writing
description: This document describes how to achieve high performance when writing data into TDengine.
---
import Tabs from "@theme/Tabs";
import TabItem from "@theme/TabItem";
This chapter introduces how to write data into TDengine with high throughput.
## How to achieve high performance data writing
To achieve high performance writing, there are a few aspects to consider. The following sections describe these important factors.
### Application Program
From the perspective of application program, you need to consider:
1. The data size of each single write, also known as batch size. Generally speaking, a larger batch size yields better writing performance; however, beyond a certain value, increasing it further provides no additional benefit. When using SQL to write into TDengine, it's better to put as much data as possible in a single SQL statement. The maximum SQL length supported by TDengine is 1,048,576 bytes, i.e. 1 MB.
2. The number of concurrent connections. Normally more connections produce better results. However, once the number of connections exceeds the processing ability of the server side, performance may degrade.
3. The distribution of data to be written across tables or sub-tables. Writing to a single table in one batch is more efficient than writing to multiple tables in one batch.
4. Data Writing Protocol.
- Parameter binding mode is more efficient than SQL because it doesn't have the cost of parsing SQL.
- Writing to known existing tables is more efficient than writing to uncertain tables in automatic table creation mode, because the latter needs to check whether the table exists before actually writing data into it.
- Writing in SQL is more efficient than writing in schemaless mode because schemaless writing creates tables automatically and may alter table schemas.
Application programs need to take the above factors into account. Each write batch should target a single table, and both the batch size and the number of concurrent connections need to be tuned to proper values on the specific system to achieve the best writing throughput.
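For example, a single batch that writes several rows into one table might look like the hedged sketch below (the `a` suffix denotes milliseconds in TDengine time arithmetic):
```sql
-- One statement, one table, many rows: preferred over many single-row statements
INSERT INTO d1001 VALUES
  (NOW - 2a, 10.2, 219, 0.31)
  (NOW - 1a, 10.3, 218, 0.30)
  (NOW, 10.4, 220, 0.32);
```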
### Data Source
Application programs need to read data from a data source and then write it into TDengine. If you meet one or more of the situations below, you need to set up message queues between the threads that read from the data source and the threads that write into TDengine.
1. There are multiple data sources, and the data generation speed of each data source is much slower than the speed of a single writing thread. In this case, the purpose of message queues is to consolidate the data from multiple data sources together to increase the batch size of a single write.
2. The speed of data generation from a single data source is much higher than the speed of a single writing thread. The purpose of the message queue in this case is to provide a buffer so that data is not lost and multiple writing threads can get data from the buffer.
3. The data for a single table comes from multiple data sources. In this case the purpose of message queues is to combine the data for a single table together to improve write efficiency.
If the data source is Kafka, then the application program is a Kafka consumer and can benefit from some Kafka features to achieve high performance writing:
1. Put the data for a table in a single partition of a single topic so that it's easier to put the data for each table together and write in batches.
2. Subscribe to multiple topics to accumulate data together.
3. Add more consumers to gain more concurrency and throughput.
4. Increase the size of a single fetch to increase the size of the write batch.
### Tune TDengine
On the server side, the database configuration parameter `vgroups` needs to be set carefully to maximize system performance. If it's set too low, the system's capability can't be fully utilized; if it's set too high, unnecessary resource competition may result. A common recommendation for the `vgroups` parameter is 2 times the number of CPU cores. However, depending on the actual system resources, it may still need to be tuned.
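As a hedged example, on a 16-core server the starting point suggested above would be about 32 vgroups; the database name `db` is illustrative:
```sql
-- Roughly 2 x CPU cores as a starting point, then tune based on measurements
CREATE DATABASE IF NOT EXISTS db VGROUPS 32;
```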
For more configuration parameters, please refer to [Database Configuration](../../../reference/taos-sql/database) and [Server Configuration](../../../reference/config).
## Sample Programs
This section will introduce the sample programs to demonstrate how to write into TDengine with high performance.
### Scenario
Below is the scenario for the sample programs of high performance writing.
- The application program reads data from a data source; the sample programs simulate a data source by generating data.
- The speed of a single writing thread is much slower than the speed of generating data, so the program starts multiple writing threads; each thread establishes a connection to TDengine and has a message queue of fixed size.
- The application program maps the received data to different writing threads based on the table name, to make sure all the data for each table is always processed by a specific writing thread.
- Each writing thread writes the received data into TDengine once its message queue becomes empty or the amount of data read reaches a threshold.
![Thread Model of High Performance Writing into TDengine](highvolume.webp)
### Sample Programs
The sample programs listed in this section are based on the scenario described previously. If your scenario is different, please try to adjust the code based on the principles described in this chapter.
The sample programs assume the source data belongs to different sub tables of the same super table (meters). The super table has been created before the sample programs start writing data. Sub tables are created automatically according to the received data. If there are multiple super tables in your case, please adjust the table creation logic accordingly.
<Tabs defaultValue="java" groupId="lang">
<TabItem label="Java" value="java">
**Program Inventory**
| Class | Description |
| ---------------- | ----------------------------------------------------------------------------------------------------- |
| FastWriteExample | Main Program |
| ReadTask | Read data from simulated data source and put into a queue according to the hash value of table name |
| WriteTask | Read data from Queue, compose a write batch and write into TDengine |
| MockDataSource | Generate data for some sub tables of super table meters |
| SQLWriter | WriteTask uses this class to compose SQL, create table automatically, check SQL length and write data |
| StmtWriter | Write in Parameter binding mode (Not finished yet) |
| DataBaseMonitor | Calculate the writing speed and output on console every 10 seconds |
Below is the list of complete code of the classes in above table and more detailed description.
<details>
<summary>FastWriteExample</summary>
The main Program is responsible for:
1. Create message queues
2. Start writing threads
3. Start reading threads
4. Output writing speed every 10 seconds
The main program provides 4 parameters for tuning:
1. The number of reading threads, default value is 1
2. The number of writing threads, default value is 2
3. The total number of tables in the generated data, default value is 1000. These tables are distributed evenly across all writing threads. If the number of tables is very large, creating them initially will take a long time.
4. The batch size of single write, default value is 3,000
The capacity of the message queue also impacts performance and can be tuned by modifying the program. Normally a larger message queue is better: it lowers the possibility of being blocked when enqueueing and increases throughput, but it also consumes more memory. The default value used in the sample programs is already big enough.
```java
{{#include docs/examples/java/src/main/java/com/taos/example/highvolume/FastWriteExample.java}}
```
</details>
<details>
<summary>ReadTask</summary>
ReadTask reads data from the data source. Each ReadTask is associated with a simulated data source; each data source generates data for a group of specific tables, and the data of any table is only generated from a single specific data source.
ReadTask puts data into the message queue in blocking mode. That means the put operation blocks if the message queue is full.
```java
{{#include docs/examples/java/src/main/java/com/taos/example/highvolume/ReadTask.java}}
```
</details>
<details>
<summary>WriteTask</summary>
```java
{{#include docs/examples/java/src/main/java/com/taos/example/highvolume/WriteTask.java}}
```
</details>
<details>
<summary>MockDataSource</summary>
```java
{{#include docs/examples/java/src/main/java/com/taos/example/highvolume/MockDataSource.java}}
```
</details>
<details>
<summary>SQLWriter</summary>
The SQLWriter class encapsulates the logic of composing SQL and writing data. Please note that the tables are not created before writing; they are created automatically when the "table doesn't exist" exception is caught. For other exceptions, the SQL that caused the exception is logged for you to debug.
```java
{{#include docs/examples/java/src/main/java/com/taos/example/highvolume/SQLWriter.java}}
```
</details>
<details>
<summary>DataBaseMonitor</summary>
```java
{{#include docs/examples/java/src/main/java/com/taos/example/highvolume/DataBaseMonitor.java}}
```
</details>
**Steps to Launch**
<details>
<summary>Launch Java Sample Program</summary>
You need to set the environment variable `TDENGINE_JDBC_URL` before launching the program. If the TDengine Server is set up on localhost, then the default values for user name, password and port can be used, as below:
```
TDENGINE_JDBC_URL="jdbc:TAOS://localhost:6030?user=root&password=taosdata"
```
**Launch in IDE**
1. Clone TDengine repository
```
git clone git@github.com:taosdata/TDengine.git --depth 1
```
2. Use IDE to open `docs/examples/java` directory
3. Configure environment variable `TDENGINE_JDBC_URL`, you can also configure it before launching the IDE, if so you can skip this step.
4. Run class `com.taos.example.highvolume.FastWriteExample`
**Launch on server**
If you want to launch the sample program on a remote server, please follow below steps:
1. Package the sample programs. Execute below command under directory `TDengine/docs/examples/java`:
```
mvn package
```
2. Create `examples/java` directory on the server
```
mkdir -p examples/java
```
3. Copy dependencies (below commands assume you are working on a local Windows host and try to launch on a remote Linux host)
- Copy dependent packages
```
scp -r .\target\lib <user>@<host>:~/examples/java
```
- Copy the jar of sample programs
```
scp -r .\target\javaexample-1.0.jar <user>@<host>:~/examples/java
```
4. Configure environment variable
Edit `~/.bash_profile` or `~/.bashrc` and add below:
```
export TDENGINE_JDBC_URL="jdbc:TAOS://localhost:6030?user=root&password=taosdata"
```
If your TDengine server is not deployed on localhost or doesn't use the default port, you need to change the above URL to the correct value for your environment.
5. Launch the sample program
```
java -classpath lib/*:javaexample-1.0.jar com.taos.example.highvolume.FastWriteExample <read_thread_count> <write_thread_count> <total_table_count> <max_batch_size>
```
6. The sample program doesn't exit unless you press <kbd>CTRL</kbd> + <kbd>C</kbd> to terminate it.
Below is the output of running on a server of 16 cores, 64GB memory and SSD hard disk.
```
root@vm85$ java -classpath lib/*:javaexample-1.0.jar com.taos.example.highvolume.FastWriteExample 2 12
18:56:35.896 [main] INFO c.t.e.highvolume.FastWriteExample - readTaskCount=2, writeTaskCount=12 tableCount=1000 maxBatchSize=3000
18:56:36.011 [WriteThread-0] INFO c.taos.example.highvolume.WriteTask - started
18:56:36.015 [WriteThread-0] INFO c.taos.example.highvolume.SQLWriter - maxSQLLength=1048576
18:56:36.021 [WriteThread-1] INFO c.taos.example.highvolume.WriteTask - started
18:56:36.022 [WriteThread-1] INFO c.taos.example.highvolume.SQLWriter - maxSQLLength=1048576
18:56:36.031 [WriteThread-2] INFO c.taos.example.highvolume.WriteTask - started
18:56:36.032 [WriteThread-2] INFO c.taos.example.highvolume.SQLWriter - maxSQLLength=1048576
18:56:36.041 [WriteThread-3] INFO c.taos.example.highvolume.WriteTask - started
18:56:36.042 [WriteThread-3] INFO c.taos.example.highvolume.SQLWriter - maxSQLLength=1048576
18:56:36.093 [WriteThread-4] INFO c.taos.example.highvolume.WriteTask - started
18:56:36.094 [WriteThread-4] INFO c.taos.example.highvolume.SQLWriter - maxSQLLength=1048576
18:56:36.099 [WriteThread-5] INFO c.taos.example.highvolume.WriteTask - started
18:56:36.100 [WriteThread-5] INFO c.taos.example.highvolume.SQLWriter - maxSQLLength=1048576
18:56:36.100 [WriteThread-6] INFO c.taos.example.highvolume.WriteTask - started
18:56:36.101 [WriteThread-6] INFO c.taos.example.highvolume.SQLWriter - maxSQLLength=1048576
18:56:36.103 [WriteThread-7] INFO c.taos.example.highvolume.WriteTask - started
18:56:36.104 [WriteThread-7] INFO c.taos.example.highvolume.SQLWriter - maxSQLLength=1048576
18:56:36.105 [WriteThread-8] INFO c.taos.example.highvolume.WriteTask - started
18:56:36.107 [WriteThread-8] INFO c.taos.example.highvolume.SQLWriter - maxSQLLength=1048576
18:56:36.108 [WriteThread-9] INFO c.taos.example.highvolume.WriteTask - started
18:56:36.109 [WriteThread-9] INFO c.taos.example.highvolume.SQLWriter - maxSQLLength=1048576
18:56:36.156 [WriteThread-10] INFO c.taos.example.highvolume.WriteTask - started
18:56:36.157 [WriteThread-11] INFO c.taos.example.highvolume.WriteTask - started
18:56:36.158 [WriteThread-10] INFO c.taos.example.highvolume.SQLWriter - maxSQLLength=1048576
18:56:36.158 [ReadThread-0] INFO com.taos.example.highvolume.ReadTask - started
18:56:36.158 [ReadThread-1] INFO com.taos.example.highvolume.ReadTask - started
18:56:36.158 [WriteThread-11] INFO c.taos.example.highvolume.SQLWriter - maxSQLLength=1048576
18:56:46.369 [main] INFO c.t.e.highvolume.FastWriteExample - count=18554448 speed=1855444
18:56:56.946 [main] INFO c.t.e.highvolume.FastWriteExample - count=39059660 speed=2050521
18:57:07.322 [main] INFO c.t.e.highvolume.FastWriteExample - count=59403604 speed=2034394
18:57:18.032 [main] INFO c.t.e.highvolume.FastWriteExample - count=80262938 speed=2085933
18:57:28.432 [main] INFO c.t.e.highvolume.FastWriteExample - count=101139906 speed=2087696
18:57:38.921 [main] INFO c.t.e.highvolume.FastWriteExample - count=121807202 speed=2066729
18:57:49.375 [main] INFO c.t.e.highvolume.FastWriteExample - count=142952417 speed=2114521
18:58:00.689 [main] INFO c.t.e.highvolume.FastWriteExample - count=163650306 speed=2069788
18:58:11.646 [main] INFO c.t.e.highvolume.FastWriteExample - count=185019808 speed=2136950
```
</details>
</TabItem>
<TabItem label="Python" value="python">
**Program Inventory**
The sample programs in Python use multiple processes and cross-process message queues.
| Function/Class | Description |
| ---------------------------- | --------------------------------------------------------------------------- |
| main Function | Program entry point, create child processes and message queues |
| run_monitor_process Function | Create database, super table, calculate writing speed and output to console |
| run_read_task Function | Read data and distribute to message queues |
| MockDataSource Class | Simulate data source, return next 1,000 rows of each table |
| run_write_task Function | Read as much as possible data from message queue and write in batch |
| SQLWriter Class | Write in SQL and create table automatically |
| StmtWriter Class | Write in parameter binding mode (not finished yet) |
<details>
<summary>main function</summary>
The `main` function is responsible for creating message queues and forking child processes. There are 3 kinds of child processes:
1. The monitoring process, which initializes the database and calculates the writing speed
2. The reading processes (n), which read data from the data source
3. The writing processes (m), which write data into TDengine
The `main` function accepts 5 parameters:
1. The number of reading tasks, default value is 1
2. The number of writing tasks, default value is 1
3. The number of tables, default value is 1,000
4. The capacity of message queue, default value is 1,000,000 bytes
5. The batch size in single write, default value is 3000
```python
{{#include docs/examples/python/fast_write_example.py:main}}
```
</details>
<details>
<summary>run_monitor_process</summary>
The monitoring process initializes the database and monitors the writing speed.
```python
{{#include docs/examples/python/fast_write_example.py:monitor}}
```
</details>
<details>
<summary>run_read_task function</summary>
The reading process reads data from another data system and distributes it to the message queue allocated to it.
```python
{{#include docs/examples/python/fast_write_example.py:read}}
```
</details>
<details>
<summary>MockDataSource</summary>
Below is the simulated data source; we assume each generated record carries its table name.
```python
{{#include docs/examples/python/mockdatasource.py}}
```
</details>
<details>
<summary>run_write_task function</summary>
The writing process reads as much data as possible from the message queue and writes it in batches.
```python
{{#include docs/examples/python/fast_write_example.py:write}}
```
</details>
<details>
<summary>SQLWriter</summary>
The SQLWriter class encapsulates the logic of composing SQL and writing data. Please note that the tables are not created before writing; they are created automatically when the "table doesn't exist" exception is caught. For other exceptions, the SQL that caused the exception is logged for you to debug. This class also checks the SQL length and passes the maximum SQL length through the parameter maxSQLLength according to the actual TDengine limit.
```python
{{#include docs/examples/python/sql_writer.py}}
```
</details>
**Steps to Launch**
<details>
<summary>Launch Sample Program in Python</summary>
1. Prerequisites
- TDengine client driver has been installed
- Python3 has been installed (the version must be >= 3.8)
- TDengine Python client library `taospy` has been installed
2. Install faster-fifo to replace the Python built-in multiprocessing.Queue
```
pip3 install faster-fifo
```
3. Click the "Copy" in the above sample programs to copy `fast_write_example.py`, `sql_writer.py`, and `mockdatasource.py`.
4. Execute the program
```
python3 fast_write_example.py <READ_TASK_COUNT> <WRITE_TASK_COUNT> <TABLE_COUNT> <QUEUE_SIZE> <MAX_BATCH_SIZE>
```
Below is the output of running on a server of 16 cores, 64GB memory and SSD hard disk.
```
root@vm85$ python3 fast_write_example.py 8 8
2022-07-14 19:13:45,869 [root] - READ_TASK_COUNT=8, WRITE_TASK_COUNT=8, TABLE_COUNT=1000, QUEUE_SIZE=1000000, MAX_BATCH_SIZE=3000
2022-07-14 19:13:48,882 [root] - WriteTask-0 started with pid 718347
2022-07-14 19:13:48,883 [root] - WriteTask-1 started with pid 718348
2022-07-14 19:13:48,884 [root] - WriteTask-2 started with pid 718349
2022-07-14 19:13:48,884 [root] - WriteTask-3 started with pid 718350
2022-07-14 19:13:48,885 [root] - WriteTask-4 started with pid 718351
2022-07-14 19:13:48,885 [root] - WriteTask-5 started with pid 718352
2022-07-14 19:13:48,886 [root] - WriteTask-6 started with pid 718353
2022-07-14 19:13:48,886 [root] - WriteTask-7 started with pid 718354
2022-07-14 19:13:48,887 [root] - ReadTask-0 started with pid 718355
2022-07-14 19:13:48,888 [root] - ReadTask-1 started with pid 718356
2022-07-14 19:13:48,889 [root] - ReadTask-2 started with pid 718357
2022-07-14 19:13:48,889 [root] - ReadTask-3 started with pid 718358
2022-07-14 19:13:48,890 [root] - ReadTask-4 started with pid 718359
2022-07-14 19:13:48,891 [root] - ReadTask-5 started with pid 718361
2022-07-14 19:13:48,892 [root] - ReadTask-6 started with pid 718364
2022-07-14 19:13:48,893 [root] - ReadTask-7 started with pid 718365
2022-07-14 19:13:56,042 [DataBaseMonitor] - count=6676310 speed=667631.0
2022-07-14 19:14:06,196 [DataBaseMonitor] - count=20004310 speed=1332800.0
2022-07-14 19:14:16,366 [DataBaseMonitor] - count=32290310 speed=1228600.0
2022-07-14 19:14:26,527 [DataBaseMonitor] - count=44438310 speed=1214800.0
2022-07-14 19:14:36,673 [DataBaseMonitor] - count=56608310 speed=1217000.0
2022-07-14 19:14:46,834 [DataBaseMonitor] - count=68757310 speed=1214900.0
2022-07-14 19:14:57,280 [DataBaseMonitor] - count=80992310 speed=1223500.0
2022-07-14 19:15:07,689 [DataBaseMonitor] - count=93805310 speed=1281300.0
2022-07-14 19:15:18,020 [DataBaseMonitor] - count=106111310 speed=1230600.0
2022-07-14 19:15:28,356 [DataBaseMonitor] - count=118394310 speed=1228300.0
2022-07-14 19:15:38,690 [DataBaseMonitor] - count=130742310 speed=1234800.0
2022-07-14 19:15:49,000 [DataBaseMonitor] - count=143051310 speed=1230900.0
2022-07-14 19:15:59,323 [DataBaseMonitor] - count=155276310 speed=1222500.0
2022-07-14 19:16:09,649 [DataBaseMonitor] - count=167603310 speed=1232700.0
2022-07-14 19:16:19,995 [DataBaseMonitor] - count=179976310 speed=1237300.0
```
</details>
:::note
Don't establish a connection to TDengine in the parent process when using the Python client library with multiple processes; otherwise all connections in the child processes will be permanently blocked. This is a known issue.
:::
</TabItem>
</Tabs>

View File

@ -1,3 +0,0 @@
```c
{{#include docs/examples/c/line_example.c:main}}
```

View File

@ -1,3 +0,0 @@
```c
{{#include docs/examples/c/json_protocol_example.c:main}}
```

View File

@ -1,3 +0,0 @@
```c
{{#include docs/examples/c/telnet_line_example.c:main}}
```

View File

@ -1,3 +0,0 @@
```c
{{#include docs/examples/c/insert_example.c}}
```

View File

@ -1,6 +0,0 @@
```c title=Single Row Binding
{{#include docs/examples/c/stmt_example.c}}
```
```c title=Multiple Row Binding 72:117
{{#include docs/examples/c/multi_bind_example.c}}
```

View File

@ -1,3 +0,0 @@
```csharp
{{#include docs/examples/csharp/influxdbLine/Program.cs}}
```

View File

@ -1,3 +0,0 @@
```csharp
{{#include docs/examples/csharp/optsJSON/Program.cs}}
```

View File

@ -1,3 +0,0 @@
```csharp
{{#include docs/examples/csharp/optsTelnet/Program.cs}}
```

View File

@ -1,3 +0,0 @@
```csharp
{{#include docs/examples/csharp/sqlInsert/Program.cs}}
```

View File

@ -1,3 +0,0 @@
```csharp
{{#include docs/examples/csharp/stmtInsert/Program.cs}}
```

View File

@ -1,3 +0,0 @@
```go
{{#include docs/examples/go/insert/line/main.go}}
```

View File

@ -1,3 +0,0 @@
```go
{{#include docs/examples/go/insert/json/main.go}}
```

View File

@ -1,3 +0,0 @@
```go
{{#include docs/examples/go/insert/telnet/main.go}}
```

View File

@ -1,3 +0,0 @@
```go
{{#include docs/examples/go/insert/sql/main.go}}
```

View File

@ -1,8 +0,0 @@
```go
{{#include docs/examples/go/insert/stmt/main.go}}
```
:::tip
`github.com/taosdata/driver-go/v3/wrapper` module in driver-go is the wrapper for C API, it can be used to insert data with parameter binding.
:::

View File

@ -1,3 +0,0 @@
```java
{{#include docs/examples/java/src/main/java/com/taos/example/LineProtocolExample.java}}
```

View File

@ -1,3 +0,0 @@
```java
{{#include docs/examples/java/src/main/java/com/taos/example/JSONProtocolExample.java}}
```

View File

@ -1,3 +0,0 @@
```java
{{#include docs/examples/java/src/main/java/com/taos/example/TelnetLineProtocolExample.java}}
```

View File

@ -1,3 +0,0 @@
```java
{{#include docs/examples/java/src/main/java/com/taos/example/RestInsertExample.java:insert}}
```

View File

@ -1,3 +0,0 @@
```java
{{#include docs/examples/java/src/main/java/com/taos/example/StmtInsertExample.java}}
```

Some files were not shown because too many files have changed in this diff.