Merge branch 'main' into merge/3.3.6tomain

This commit is contained in:
Simon Guan 2025-03-25 09:04:46 +08:00
commit dbbe1997b9
25 changed files with 1424 additions and 0 deletions

View File

@ -0,0 +1,193 @@
---
title: Installation
sidebar_label: Installation
---
## Preparing Your Environment
To use the analytics capabilities offered by TDgpt, you deploy an AI node (anode) in your TDengine cluster. Anodes run on Linux and require Python 3.10 or later.
TDgpt is supported in TDengine 3.3.6 and later. You must upgrade your cluster to version 3.3.6 or later before deploying any anodes.
You can run the following commands to install Python 3.10 on Ubuntu.
### Install Python
```shell
sudo apt-get install software-properties-common
sudo add-apt-repository ppa:deadsnakes/ppa
sudo apt update
sudo apt install python3.10
sudo update-alternatives --install /usr/bin/python3 python3 /usr/bin/python3.10 2
sudo update-alternatives --config python3
sudo apt install python3.10-venv
sudo apt install python3.10-dev
```
### Install pip
```shell
curl -sS https://bootstrap.pypa.io/get-pip.py | python3.10
```
### Configure Environment Variables
Add `~/.local/bin` to the `PATH` environment variable in `~/.bashrc` or `~/.bash_profile`.
```shell
export PATH=$PATH:~/.local/bin
```
The Python environment has been installed. You can now install TDgpt.
### Install TDgpt
Obtain the installation package `TDengine-anode-3.3.x.x-Linux-x64.tar.gz` and install it on your machine:
```bash
tar -xzvf TDengine-anode-3.3.6.0-Linux-x64.tar.gz
cd TDengine-anode-3.3.6.0
sudo ./install.sh
```
You can run the `rmtaosanode` command to uninstall TDgpt.
To prevent TDgpt from affecting Python environments that may exist on your machine, anodes are installed in a virtual environment. When you install an anode, a virtual Python environment is deployed in the `/var/lib/taos/taosanode/venv/` directory. All libraries required by the anode are installed in this directory. Note that this virtual environment is not uninstalled automatically by the `rmtaosanode` command. If you are sure that you do not want to use TDgpt on a machine, you can remove the directory manually.
### Start the TDgpt Service
The `taosanoded` service is created when you install an anode. You can use systemd to manage this service:
```bash
systemctl start taosanoded
systemctl stop taosanoded
systemctl status taosanoded
```
## Directory and Configuration Information
The directory structure of an anode is described in the following table:
|Directory or File|Description|
|---------------|------|
|/usr/local/taos/taosanode/bin|Directory containing executable files|
|/usr/local/taos/taosanode/resource|Directory containing resource files, linked to `/var/lib/taos/taosanode/resource/`|
|/usr/local/taos/taosanode/lib|Directory containing libraries|
|/usr/local/taos/taosanode/model|Directory containing models, linked to `/var/lib/taos/taosanode/model`|
|/var/log/taos/taosanode/|Log directory|
|/etc/taos/taosanode.ini|Configuration file|
### Configuration
The anode provides services through an uWSGI driver. The configuration for the anode and for uWSGI are both found in the `taosanode.ini` file, located by default in the `/etc/taos/` directory.
The configuration options are described as follows:
```ini
[uwsgi]
# Anode RESTful service ip:port
http = 127.0.0.1:6090
# base directory for Anode python files; do NOT modify this
chdir = /usr/local/taos/taosanode/lib
# initialize Anode python file
wsgi-file = /usr/local/taos/taosanode/lib/taos/app.py
# pid file
pidfile = /usr/local/taos/taosanode/taosanode.pid
# conflict with systemctl, so do NOT uncomment this
# daemonize = /var/log/taos/taosanode/taosanode.log
# uWSGI log files
logto = /var/log/taos/taosanode/taosanode.log
# uWSGI monitor port
stats = 127.0.0.1:8387
# python virtual environment directory, used by Anode
virtualenv = /usr/local/taos/taosanode/venv/
[taosanode]
# default taosanode log file
app-log = /var/log/taos/taosanode/taosanode.app.log
# model storage directory
model-dir = /usr/local/taos/taosanode/model/
# default log level
log-level = INFO
```
:::note
Do not specify a value for the `daemonize` parameter. This parameter causes a conflict between uWSGI and systemctl. If you enable the `daemonize` parameter, your anode will fail to start.
:::
The configuration file above includes only the basic configuration needed for an anode to provide services. For more information about configuring uWSGI, see the [official documentation](https://uwsgi-docs.readthedocs.io/en/latest/).
The main configuration options for an anode are described as follows:
- app-log: Specify the directory in which anode log files are stored.
- model-dir: Specify the directory in which models are stored. Models are generated by algorithms based on existing datasets.
- log-level: Specify the log level for anode logs.
## Managing Anodes
You manage anodes through the TDengine CLI. The following actions must be performed within the CLI on a client that is connected to your TDengine cluster.
### Create an Anode
```sql
CREATE ANODE {node_url}
```
The `node_url` parameter specifies the IP address and port of the anode. This information is registered to your TDengine cluster. Do not register a single anode to multiple TDengine clusters.
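For example, assuming an anode is listening at `192.168.0.1:6090` (6090 is the default anode port; the address here is illustrative), you could register it as follows:
```sql
CREATE ANODE '192.168.0.1:6090';
```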
### View Anodes
You can run the following command to display the FQDN and status of the anodes in your cluster:
```sql
SHOW ANODES;
taos> show anodes;
id | url | status | create_time | update_time |
==================================================================================================================
1 | 192.168.0.1:6090 | ready | 2024-11-28 18:44:27.089 | 2024-11-28 18:44:27.089 |
Query OK, 1 row(s) in set (0.037205s)
```
### View Advanced Analytics Services
```SQL
SHOW ANODES FULL;
taos> show anodes full;
id | type | algo |
============================================================================
1 | anomaly-detection | shesd |
1 | anomaly-detection | iqr |
1 | anomaly-detection | ksigma |
1 | anomaly-detection | lof |
1 | anomaly-detection | grubbs |
1 | anomaly-detection | ad_encoder |
1 | forecast | holtwinters |
1 | forecast | arima |
Query OK, 8 row(s) in set (0.008796s)
```
### Refresh the Algorithm Cache
```SQL
UPDATE ANODE {anode_id}
UPDATE ALL ANODES
```
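For example, assuming the anode registered above was assigned ID 1, you could refresh its algorithm cache, or the caches of all anodes, as follows:
```SQL
UPDATE ANODE 1;
UPDATE ALL ANODES;
```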
### Delete an Anode
```sql
DROP ANODE {anode_id}
```
Deleting an anode only removes it from your TDengine cluster. To stop an anode, use systemctl on the machine where the anode is located. To uninstall the anode software, run the `rmtaosanode` command on that machine.
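For example, assuming the anode to remove has ID 1, the statement would be:
```sql
DROP ANODE 1;
```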

View File

@ -0,0 +1,65 @@
---
title: Data Preprocessing
sidebar_label: Data Preprocessing
---
import Image from '@theme/IdealImage';
import preprocFlow from '../../assets/tdgpt-02.png';
import wnData from '../../assets/tdgpt-03.png'
## Analysis Workflow
Data must be preprocessed before it can be analyzed by TDgpt. This process is described in the following figure:
<figure>
<Image img={preprocFlow} alt="Preprocessing workflow" />
<figcaption>Preprocessing workflow</figcaption>
</figure>
TDgpt first performs a white noise data check on the dataset that you input. Data that passes this check and is intended for use in forecasting is then resampled and its timestamps are aligned. Note that resampling and alignment are not performed for datasets used in anomaly detection.
After the data has been preprocessed, forecasting or anomaly detection is performed. Preprocessing is not part of the business logic for forecasting and anomaly detection.
## White Noise Data Check
<figure>
<Image img={wnData} alt="White noise data"/>
<figcaption>White noise data</figcaption>
</figure>
The white noise data check determines whether the input data consists of random numbers. The figure above shows an example of randomly distributed white noise data. Random numbers cannot be analyzed meaningfully, and this data is rejected by the system. The white noise data check is performed using the classic Ljung-Box test. The test is performed over the entire time series. If you are certain that your data is not random, you can specify the `wncheck=0` parameter to force TDgpt to skip this check.
TDgpt does not provide white noise checking as an independent feature. It is performed only as part of data preprocessing.
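For example, if you are certain that the column being forecast is not white noise, a query like the following sketch skips the check. The table and column names follow the `foo` example used elsewhere in this documentation:
```SQL
--- ARIMA forecast with the white noise data check disabled
SELECT _flow, _fhigh, _frowts, FORECAST(i32, "algo=arima,wncheck=0")
FROM foo;
```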
## Resampling and Timestamp Alignment
Time-series data must be preprocessed before forecasting can be performed. Preprocessing is intended to resolve the following two issues:
The timestamps of real time-series datasets are not aligned. It is impossible to guarantee that devices generating data or network gateways create timestamps at strict intervals. For this reason, it cannot be guaranteed that the timestamps of time-series data are in strict alignment with the sampling rate of the data. For example, a time series sampled at 1 Hz may have the following timestamps:
```text
['20:12:21.143', '20:12:22.187', '20:12:23.032', '20:12:24.384', '20:12:25.033']
```
The data returned by the forecasting algorithm is strictly aligned by timestamp. For example, the next two data points in the set must be `['20:12:26.000', '20:12:27.000']`. For this reason, data such as the preceding set must be aligned as follows:
```
['20:12:21.000', '20:12:22.000', '20:12:23.000', '20:12:24.000', '20:12:25.000']
```
The sampling rate of the input data can exceed the rate requested for the output. For example, the following data was sampled at 5 second intervals, but the user could request forecasting in 10 second intervals:
```
['20:12:20.000', '20:12:25.000', '20:12:30.000', '20:12:35.000', '20:12:40.000']
```
The data is then resampled to 10 second intervals as follows:
```
['20:12:20.000', '20:12:30.000', '20:12:40.000']
```
This resampled data is then input into the forecasting algorithm. In this case, the data points `['20:12:25.000', '20:12:35.000']` are discarded.
It is important to note that TDgpt does not fill in missing data during preprocessing. If you input the dataset `['20:12:10.113', '20:12:21.393', '20:12:29.143', '20:12:51.330']` and specify an interval of 10 seconds, the aligned dataset will be `['20:12:10.000', '20:12:20.000', '20:12:30.000', '20:12:50.000']`. This will cause the forecasting algorithm to return an error.

View File

@ -0,0 +1,63 @@
---
title: ARIMA
sidebar_label: ARIMA
---
This document describes how to generate autoregressive integrated moving average (ARIMA) models.
## Description
The ARIMA(*p*, *d*, *q*) model is one of the most common models in time-series forecasting. It is an autoregressive model that predicts future values from the historical values of a variable. ARIMA requires that time-series data be stationary. Accurate results cannot be obtained from non-stationary data.
A stationary time series is one whose characteristics do not change based on the time at which it is observed. Time series that experience trends or seasonality are not stationary because they exhibit different characteristics at different times.
The following variables can be dynamically input to generate appropriate ARIMA models:
- *p* is the order of the autoregressive model
- *d* is the order of differencing
- *q* is the order of the moving-average model
## Parameters
Automated ARIMA modeling is performed in TDgpt. For this reason, the results for each input are automatically fitted to the most appropriate model. Forecasting is then performed based on the specified model.
|Parameter|Description|Required?|
|---|---|-----|
|period|The number of data points included in each period. If not specified or set to 0, non-seasonal ARIMA models are used.|No|
|start_p|The starting order of the autoregressive model. Enter an integer greater than or equal to 0. Values greater than 10 are not recommended.|No|
|max_p|The ending order of the autoregressive model. Enter an integer greater than or equal to 0. Values greater than 10 are not recommended.|No|
|start_q|The starting order of the moving-average model. Enter an integer greater than or equal to 0. Values greater than 10 are not recommended.|No|
|max_q|The ending order of the moving-average model. Enter an integer greater than or equal to 0. Values greater than 10 are not recommended.|No|
|d|The order of differencing.|No|
The `start_p`, `max_p`, `start_q`, and `max_q` parameters cause the model to find the optimal solution within the specified restrictions. Given the same input data, a larger range will result in higher resource consumption and slower response time.
## Example
In this example, forecasting is performed on the `i32` column. Each 10 data points in the column form a period. The values of `start_p` and `start_q` are both 1, and the corresponding ending values are both 5. The forecasting results are within a 95% confidence interval.
```
FORECAST(i32, "algo=arima,alpha=95,period=10,start_p=1,max_p=5,start_q=1,max_q=5")
```
The complete SQL statement is shown as follows:
```SQL
SELECT _frowts, FORECAST(i32, "algo=arima,alpha=95,period=10,start_p=1,max_p=5,start_q=1,max_q=5") from foo
```
The forecast results are returned in the following format:
```json5
{
"rows": fc_rows, // Rows returned
"period": period, // Period of results (equivalent to input period)
"alpha": alpha, // Confidence interval of results (equivalent to input confidence interval)
"algo": "arima", // Algorithm
"mse": mse, // Mean square error (MSE) of model generated for input time series
"res": res // Results in column format
}
```
## References
- https://en.wikipedia.org/wiki/Autoregressive_moving-average_model
- [https://baike.baidu.com/item/自回归滑动平均模型/5023931](https://baike.baidu.com/item/%E8%87%AA%E5%9B%9E%E5%BD%92%E6%BB%91%E5%8A%A8%E5%B9%B3%E5%9D%87%E6%A8%A1%E5%9E%8B/5023931)

View File

@ -0,0 +1,53 @@
---
title: Holt-Winters
sidebar_label: Holt-Winters
---
This document describes the usage of the Holt-Winters method for forecasting.
## Description
Holt-Winters, or exponential moving average (EMA), is used to forecast non-stationary time series that have linear trends or periodic fluctuations. This method uses exponential smoothing to constantly adapt the model parameters to the changes in the time series and perform short-term forecasting.
If seasonal variation remains mostly consistent within a time series, the additive Holt-Winters model is used, whereas if seasonal variation is proportional to the level of the time series, the multiplicative Holt-Winters model is used.
Holt-Winters does not provide a confidence interval for its results. The values returned for the upper and lower bounds of the confidence interval are identical to the forecast values.
## Parameters
Automated Holt-Winters modeling is performed in TDgpt. For this reason, the results for each input are automatically fitted to the most appropriate model. Forecasting is then performed based on the specified model.
|Parameter|Description|Required?|
|---|---|---|
|period|The number of data points included in each period. If not specified or set to 0, exponential smoothing is applied for data fitting, and then future data is forecast.|No|
|trend|Use additive (`add`) or multiplicative (`mul`) Holt-Winters for the trend model.|No|
|seasonal|Use additive (`add`) or multiplicative (`mul`) Holt-Winters for seasonality.|No|
## Example
In this example, forecasting is performed on the `i32` column. Each 10 data points in the column form a period. Multiplicative Holt-Winters is used for trends and for seasonality.
```
FORECAST(i32, "algo=holtwinters,period=10,trend=mul,seasonal=mul")
```
The complete SQL statement is shown as follows:
```SQL
SELECT _frowts, FORECAST(i32, "algo=holtwinters,period=10,trend=mul,seasonal=mul") from foo
```
The forecast results are returned in the following format:
```json5
{
"rows": fc_rows, // Rows returned
"period": period, // Period of results (equivalent to input period; set to 0 if no periodicity)
"algo": 'holtwinters' // Algorithm
"mse": mse, // Mean square error (MSE)
"res": res // Results in column format (typically returned as two columns, `timestamp` and `fc_results`.)
}
```
## References
- https://en.wikipedia.org/wiki/Exponential_smoothing
- https://orangematter.solarwinds.com/2019/12/15/holt-winters-forecasting-simplified/

View File

@ -0,0 +1,31 @@
---
title: LSTM
sidebar_label: LSTM
---
This document describes how to use LSTM in TDgpt.
## Description
Long short-term memory (LSTM) is a special type of recurrent neural network (RNN) well-suited for tasks such as time-series data processing and natural language processing. Its unique gating mechanism allows it to effectively capture long-term dependencies and address the gradient vanishing problem found in traditional RNNs, enabling more accurate predictions on sequential data. However, it does not directly provide confidence interval results for its computations.
The complete SQL statement is shown as follows:
```SQL
SELECT _frowts, FORECAST(i32, "algo=lstm,alpha=95,period=10,start_p=1,max_p=5,start_q=1,max_q=5") from foo
```
The forecast results are returned in the following format:
```json5
{
"rows": fc_rows, // Rows returned
"period": period, // Period of results (equivalent to input period)
"alpha": alpha, // Confidence interval of results (equivalent to input confidence interval)
"algo": "lstm", // Algorithm
"mse": mse, // Mean square error (MSE) of model generated for input time series
"res": res // Results in column format
}
```
## References
- [1] Hochreiter S. Long Short-term Memory[J]. Neural Computation MIT-Press, 1997.

View File

@ -0,0 +1,33 @@
---
title: MLP
sidebar_label: MLP
---
This document describes how to use MLP in TDgpt.
## Description
MLP (Multilayer Perceptron) is a classic neural network model that can learn nonlinear relationships from historical data, capture patterns in time-series data, and make future value predictions. It performs feature extraction and mapping through multiple fully connected layers, generating prediction results based on the input historical data. Since it does not directly account for trends or seasonal variations, it typically requires data preprocessing to improve performance. It is well-suited for handling nonlinear and complex time-series problems.
The complete SQL statement is shown as follows:
```SQL
SELECT _frowts, FORECAST(i32, "algo=mlp") from foo
```
The forecast results are returned in the following format:
```json5
{
"rows": fc_rows, // Rows returned
"period": period, // Period of results (equivalent to input period)
"alpha": alpha, // Confidence interval of results (equivalent to input confidence interval)
"algo": "mlp", // Algorithm
"mse": mse, // Mean square error (MSE) of model generated for input time series
"res": res // Results in column format
}
```
## References
- [1]Rumelhart D E, Hinton G E, Williams R J. Learning representations by back-propagating errors[J]. nature, 1986, 323(6088): 533-536.
- [2]Rosenblatt F. The perceptron: a probabilistic model for information storage and organization in the brain[J]. Psychological review, 1958, 65(6): 386.
- [3]LeCun Y, Bottou L, Bengio Y, et al. Gradient-based learning applied to document recognition[J]. Proceedings of the IEEE, 1998, 86(11): 2278-2324.

View File

@ -0,0 +1,198 @@
---
title: Forecasting Algorithms
description: Forecasting Algorithms
---
import Image from '@theme/IdealImage';
import fcResult from '../../../assets/tdgpt-04.png';
Time-series forecasting takes a continuous period of time-series data as its input and forecasts how the data will trend in the next continuous period. The number of data points in the forecast results is not fixed but can be specified by the user. TDgpt uses the `FORECAST` function to provide forecasting. The input for this function is the historical time-series data used as a basis for forecasting, and the output is forecast data. When you call the `FORECAST` function, a forecasting algorithm running on an anode performs the analysis. Forecasting is typically performed on a subtable or on the same time series across tables.
In this section, the table `foo` is used as an example to describe how to perform forecasting and anomaly detection in TDgpt. This table is described as follows:
| Column | Type | Description |
| ------ | --------- | ---------------------------- |
|ts|timestamp|Primary timestamp|
|i32|int32|Metric generated by a device as a 4-byte integer|
```sql
taos> select * from foo;
ts | i32 |
========================================
2020-01-01 00:00:12.681 | 13 |
2020-01-01 00:00:13.727 | 14 |
2020-01-01 00:00:14.378 | 8 |
2020-01-01 00:00:15.774 | 10 |
2020-01-01 00:00:16.170 | 16 |
2020-01-01 00:00:17.558 | 26 |
2020-01-01 00:00:18.938 | 32 |
2020-01-01 00:00:19.308 | 27 |
```
## Syntax
```SQL
FORECAST(column_expr, option_expr)
option_expr: {"
algo=expr1
[,wncheck=1|0]
[,conf=conf_val]
[,every=every_val]
[,rows=rows_val]
[,start=start_ts_val]
[,expr2]
"}
```
1. `column_expr`: The time-series data column to forecast. Enter a column whose data type is numerical.
2. `options`: The parameters for forecasting. Enter parameters in key=value format, separating multiple parameters with a comma (,). It is not necessary to use quotation marks or escape characters. Only ASCII characters are supported. The supported parameters are described as follows:
## Parameter Description
|Parameter|Definition|Default|
| ------- | ------------------------------------------ | ---------------------------------------------- |
|algo|Forecasting algorithm.|holtwinters|
|wncheck|White noise data check. Enter 1 to enable or 0 to disable.|1|
|conf|Confidence interval for forecast data. Enter an integer between 0 and 100, inclusive.|95|
|every|Sampling period.|The sampling period of the input data|
|start|Starting timestamp for forecast data.|One sampling period after the final timestamp in the input data|
|rows|Number of forecast rows to return.|10|
1. Three pseudocolumns are used in forecasting: `_FROWTS`: the timestamp of the forecast data; `_FLOW`: the lower threshold of the confidence interval; and `_FHIGH`: the upper threshold of the confidence interval. For algorithms that do not include a confidence interval, the `_FLOW` and `_FHIGH` pseudocolumns contain the forecast results.
2. You can specify the `START` parameter to modify the starting time of forecast results. This does not affect the forecast values, only the time range.
3. The `EVERY` parameter can be less than or equal to the sampling period of the input data; it cannot be greater than the sampling period.
4. If you specify a confidence interval for an algorithm that does not use it, the upper and lower thresholds of the confidence interval regress to a single point.
5. The maximum value of rows is 1024. If you specify a higher value, only 1024 rows are returned.
6. The maximum size of the input historical data is 40,000 rows. Note that some models may have stricter limitations.
## Example
```SQL
--- ARIMA forecast, return 10 rows of results (default), perform white noise data check, with 95% confidence interval
SELECT _flow, _fhigh, _frowts, FORECAST(i32, "algo=arima")
FROM foo;
--- ARIMA forecast, periodic input data, 10 samples per period, disable white noise data check, with 95% confidence interval
SELECT _flow, _fhigh, _frowts, FORECAST(i32, "algo=arima,alpha=95,period=10,wncheck=0")
FROM foo;
```
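The following sketch uses the `rows` parameter described above to request 20 forecast rows instead of the default 10, with the default Holt-Winters algorithm:
```SQL
--- Holt-Winters forecast, return 20 rows of results
SELECT _flow, _fhigh, _frowts, FORECAST(i32, "algo=holtwinters,rows=20")
FROM foo;
```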
```sql
taos> select _flow, _fhigh, _frowts, forecast(i32) from foo;
_flow | _fhigh | _frowts | forecast(i32) |
========================================================================================
10.5286684 | 41.8038254 | 2020-01-01 00:01:35.000 | 26 |
-21.9861946 | 83.3938904 | 2020-01-01 00:01:36.000 | 30 |
-78.5686035 | 144.6729126 | 2020-01-01 00:01:37.000 | 33 |
-154.9797363 | 230.3057709 | 2020-01-01 00:01:38.000 | 37 |
-253.9852905 | 337.6083984 | 2020-01-01 00:01:39.000 | 41 |
-375.7857971 | 466.4594727 | 2020-01-01 00:01:40.000 | 45 |
-514.8043823 | 622.4426270 | 2020-01-01 00:01:41.000 | 53 |
-680.6343994 | 796.2861328 | 2020-01-01 00:01:42.000 | 57 |
-868.4956665 | 992.8603516 | 2020-01-01 00:01:43.000 | 62 |
-1076.1566162 | 1214.4498291 | 2020-01-01 00:01:44.000 | 69 |
```
## Built-In Forecasting Algorithms
- [ARIMA](./arima/)
- [Holt-Winters](./holtwinters/)
- Complex exponential smoothing (CES)
- Theta
- Prophet
- XGBoost
- LightGBM
- Multiple Seasonal-Trend decomposition using LOESS (MSTL)
- ETS (Error, Trend, Seasonal)
- Long Short-Term Memory (LSTM)
- Multilayer Perceptron (MLP)
- DeepAR
- N-BEATS
- N-HiTS
- Patch Time Series Transformer (PatchTST)
- Temporal Fusion Transformer
- TimesNet
## Evaluating Algorithm Effectiveness
TDengine Enterprise includes `analytics_compare`, a tool that evaluates the effectiveness of time-series forecasting algorithms in TDgpt. You can configure this tool to perform backtesting on data stored in TDengine and determine which algorithms and models are most effective for your data. The evaluation is based on mean squared error (MSE). MAE and MAPE are in development.
The configuration of the evaluation tool is described as follows:
```ini
[forecast]
# number of data points per training period
period = 10
# consider final 10 rows of in-scope data as forecasting results
rows = 10
# start time of training data
start_time = 1949-01-01T00:00:00
# end time of training data
end_time = 1960-12-01T00:00:00
# start time of results
res_start_time = 1730000000000
# specify whether to create a graphical chart
gen_figure = true
```
To use the tool, run `analytics_compare` in TDgpt's `misc` directory. Ensure that you run the tool on a machine with a Python environment installed. You can test the tool as follows:
1. Configure your TDengine cluster information in the `analytics.ini` file:
```ini
[taosd]
# taosd hostname
host = 127.0.0.1
# username
user = root
# password
password = taosdata
# tdengine configuration file
conf = /etc/taos/taos.cfg
[input_data]
# database for testing forecasting algorithms
db_name = test
# table with test data
table_name = passengers
# columns with test data
column_name = val, _c0
```
2. Prepare your data. A sample data file `sample-fc.sql` is included in the `resource` directory. Run the following command to ingest the sample data into TDengine:
```shell
taos -f sample-fc.sql
```
You can now begin the evaluation.
3. Ensure that the Python environment on the local machine is operational. Then run the following command:
```shell
python3.10 ./analytics_compare.py forecast
```
4. The evaluation results are written to `fc_result.xlsx`. The first sheet contains the results, as shown in the following table, including the algorithm name, parameters, mean square error, and elapsed time.
| algorithm | params | MSE | elapsed_time(ms.) |
| ----------- | ------------------------------------------------------------------------- | ------- | ----------------- |
| holtwinters | `{"trend":"add", "seasonal":"add"}` | 351.622 | 125.1721 |
| arima | `{"time_step":3600000, "start_p":0, "max_p":10, "start_q":0, "max_q":10}` | 433.709 | 45577.9187 |
If you set `gen_figure` to `true`, a chart is also generated, as displayed in the following figure.
<figure>
<Image img={fcResult} alt="Forecasting comparison"/>
</figure>

View File

@ -0,0 +1,67 @@
---
title: Statistical Algorithms
sidebar_label: Statistical Algorithms
---
- k-sigma<sup>[1]</sup>, or ***68–95–99.7 rule***: The *k* value defines how many standard deviations indicate an anomaly. The default value is 3. The k-sigma algorithm requires data to be normally distributed. Data points that lie outside of *k* standard deviations are considered anomalous.
|Parameter|Description|Required?|Default|
|---|---|---|---|
|k|Number of standard deviations|No|3|
```SQL
--- Use the k-sigma algorithm with a k value of 2
SELECT _WSTART, COUNT(*)
FROM foo
ANOMALY_WINDOW(foo.i32, "algo=ksigma,k=2")
```
- Interquartile range (IQR)<sup>[2]</sup>: IQR divides a rank-ordered dataset into quartiles Q1 through Q3 and is defined as IQR = Q3 - Q1. A value *v* is considered normal if Q1 - (1.5 x IQR) \<= v \<= Q3 + (1.5 x IQR); data points outside this range are considered anomalous. This algorithm does not take any parameters.
```SQL
--- Use the IQR algorithm.
SELECT _WSTART, COUNT(*)
FROM foo
ANOMALY_WINDOW(foo.i32, "algo=iqr")
```
- Grubbs's test<sup>[3]</sup>, or maximum normalized residual test: Grubbs's test determines whether the deviation from the mean of the maximum or minimum value is anomalous. It requires a univariate dataset that is approximately normally distributed. Grubbs's test cannot be used for datasets that are not normally distributed. This algorithm does not take any parameters.
```SQL
--- Use Grubbs's test.
SELECT _WSTART, COUNT(*)
FROM foo
ANOMALY_WINDOW(foo.i32, "algo=grubbs")
```
- Seasonal Hybrid ESD (S-H-ESD)<sup>[4]</sup>: Extreme Studentized Deviate (ESD) can identify multiple anomalies in time-series data. You define whether to detect positive anomalies (`pos`), negative anomalies (`neg`), or both (`both`). The maximum proportion of data that can be anomalous (`max_anoms`) is at most 49.9%. Typically, the proportion of anomalies in a dataset does not exceed 5%.
|Parameter|Description|Required?|Default|
|---|---|---|---|
|direction|Specify the direction of anomalies ('pos', 'neg', or 'both').|No|"both"|
|max_anoms|Specify maximum proportion of data that can be anomalous *k*, where 0 \< *k* \<= 49.9|No|0.05|
|period|The number of data points included in each period|No|0|
```SQL
--- Use the SHESD algorithm in both directions with a maximum of 5% of the data being anomalous
SELECT _WSTART, COUNT(*)
FROM foo
ANOMALY_WINDOW(foo.i32, "algo=shesd,direction=both,anoms=0.05")
```
The following algorithms are in development:
- Gaussian Process Regression
Change point detection-based algorithms:
- CUSUM (Cumulative Sum Control Chart)
- PELT (Pruned Exact Linear Time)
## References
1. [https://en.wikipedia.org/wiki/68–95–99.7_rule](https://en.wikipedia.org/wiki/68%E2%80%9395%E2%80%9399.7_rule)
2. https://en.wikipedia.org/wiki/Interquartile_range
3. Adikaram, K. K. L. B.; Hussein, M. A.; Effenberger, M.; Becker, T. (2015-01-14). "Data Transformation Technique to Improve the Outlier Detection Power of Grubbs's Test for Data Expected to Follow Linear Relation". Journal of Applied Mathematics. 2015: 1–9. doi:10.1155/2015/708948.
4. Hochenbaum, J., O. S. Vallis, and A. Kejariwal. 2017. Automatic Anomaly Detection in the Cloud Via Statistical Learning. arXiv preprint arXiv:1704.07706 (2017).

View File

@ -0,0 +1,33 @@
---
title: Data Density Algorithms
sidebar_label: Data Density Algorithms
---
## Data Density/Mining Algorithms
Local outlier factor (LOF)<sup>[1]</sup>:
LOF is a density-based algorithm for determining local outliers proposed by Breunig et al. in 2000. It is suitable for data with varying cluster densities and diverse dispersion. First, the local reachability density of each data point is calculated based on the density of its neighborhood. The local reachability density is then used to assign an outlier factor to each data point.
This outlier factor indicates how anomalous a data point is. A higher factor indicates more anomalous data. Finally, the top *k* outliers are output.
```SQL
--- Use LOF.
SELECT count(*)
FROM foo
ANOMALY_WINDOW(foo.i32, "algo=lof")
```
The following algorithms are in development:
- DBSCAN (Density-Based Spatial Clustering of Applications with Noise)
- K-Nearest Neighbors (KNN)
- Principal Component Analysis (PCA)
Third-party anomaly detection algorithms:
- PyOD
## References
1. Breunig, M. M.; Kriegel, H.-P.; Ng, R. T.; Sander, J. (2000). LOF: Identifying Density-based Local Outliers (PDF). Proceedings of the 2000 ACM SIGMOD International Conference on Management of Data. SIGMOD. pp. 93–104. doi:10.1145/335191.335388. ISBN 1-58113-217-4.

View File

@ -0,0 +1,27 @@
---
title: Machine Learning Algorithms
sidebar_label: Machine Learning Algorithms
---
TDgpt includes a built-in autoencoder for anomaly detection.
This algorithm is suitable for detecting anomalies in periodic time-series data. It must be pre-trained on your time-series data.
The trained model is saved to the `ad_autoencoder` directory. You then specify the model in your SQL statement.
```SQL
--- Add the name of the model `ad_autoencoder_foo` in the options of the anomaly window and detect anomalies in the dataset `foo` using the autoencoder algorithm.
SELECT COUNT(*), _WSTART
FROM foo
ANOMALY_WINDOW(col1, 'algo=ad_encoder, model=ad_autoencoder_foo');
```
The following algorithms are in development:
- Isolation Forest
- One-Class Support Vector Machines (SVM)
- Prophet
## References
1. https://en.wikipedia.org/wiki/Autoencoder

View File

@ -0,0 +1,119 @@
---
title: Anomaly Detection Algorithms
description: Anomaly Detection Algorithms
---
import Image from '@theme/IdealImage';
import anomDetect from '../../../assets/tdgpt-05.png';
import adResult from '../../../assets/tdgpt-06.png';
Anomaly detection is provided via an anomaly window that has been introduced into TDengine. An anomaly window is a special type of event window, defined by the anomaly detection algorithm as a time window during which an anomaly is occurring. This window differs from an event window in that the algorithm determines when it opens and closes instead of expressions input by the user. You can use the `ANOMALY_WINDOW` keyword in your queries, in the same position as other window clauses, to invoke the anomaly detection service. The window pseudocolumns `_WSTART`, `_WEND`, and `_WDURATION` record the start, end, and duration of the window. For example:
```SQL
--- Use the IQR algorithm to detect anomalies in the `col_val` column. Also return the start and end time of the anomaly window as well as the sum of the `col` column within the window.
SELECT _wstart, _wend, SUM(col)
FROM foo
ANOMALY_WINDOW(col_val, "algo=iqr");
```
As shown in the following figure, the anode returns the anomaly window [10:51:30, 10:53:40].
<figure>
<Image img={anomDetect} alt="Anomaly detection" />
</figure>
You can then query, aggregate, or perform other operations on the data in the window.
## Syntax
```SQL
ANOMALY_WINDOW(column_name, option_expr)
option_expr: {"
algo=expr1
[,wncheck=1|0]
[,expr2]
"}
```
1. `column_name`: The data column in which to detect anomalies. Specify only one column per query. The data type of the column must be numerical; string types such as NCHAR are not supported. Functions are not supported.
2. `options`: The parameters for anomaly detection. Enter parameters in key=value format, separating multiple parameters with a comma (,). It is not necessary to use quotation marks or escape characters. Only ASCII characters are supported. For example: `algo=ksigma,k=2` indicates that the anomaly detection algorithm is k-sigma and the k value is 2.
3. You can use the results of anomaly detection as the inner part of a nested query, as shown in the example after this list. The same functions are supported as in other windowed queries.
4. White noise checking is performed on the input data by default. If the input data is white noise, no results are returned.
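For example, the following sketch aggregates the per-window results of an anomaly detection query in an outer query. The table and column names follow the `foo` example used elsewhere in this documentation:
```SQL
SELECT AVG(anomaly_count)
FROM (
    SELECT _wstart, COUNT(*) AS anomaly_count
    FROM foo
    ANOMALY_WINDOW(i32, "algo=iqr")
);
```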
## Parameter Description
|Parameter|Definition|Default|
| ------- | ------------------------------------------ | ------ |
|algo|Specify the anomaly detection algorithm.|iqr|
|wncheck|Enter 1 to perform the white noise data check or 0 to disable the white noise data check.|1|
## Example
```SQL
--- Use the IQR algorithm to detect anomalies in the `i32` column.
SELECT _wstart, _wend, SUM(i32)
FROM foo
ANOMALY_WINDOW(i32, "algo=iqr");
--- Use the k-sigma algorithm with k value of 2 to detect anomalies in the `i32`
SELECT _wstart, _wend, SUM(i32)
FROM foo
ANOMALY_WINDOW(i32, "algo=ksigma,k=2");
taos> SELECT _wstart, _wend, count(*) FROM foo ANOMALY_WINDOW(i32);
_wstart | _wend | count(*) |
====================================================================
2020-01-01 00:00:16.000 | 2020-01-01 00:00:17.000 | 2 |
Query OK, 1 row(s) in set (0.028946s)
```
## Built-In Anomaly Detection Algorithms
TDgpt comes with six anomaly detection algorithms, divided among the following three categories: [Statistical Algorithms](./02-statistics-approach.md), [Data Density Algorithms](./03-data-density.md), and [Machine Learning Algorithms](./04-machine-learning.md). If you do not specify an algorithm, the IQR algorithm is used by default.
## Evaluating Algorithm Effectiveness
TDgpt provides an automated tool to compare the effectiveness of different algorithms across various datasets. For anomaly detection algorithms, it uses the recall and precision metrics to evaluate their performance.
By setting the following options in the configuration file `analysis.ini`, you can specify the time range of the test data, whether to generate annotated result images, and the anomaly detection algorithms to be evaluated along with their corresponding parameters.
Before comparing anomaly detection algorithms, you must manually label the results of the anomaly detection dataset. This is done by setting the value of the `anno_res` option. Each number in the array represents the index of an anomaly. For example, in the test dataset below, if the 9th point is an anomaly, the labeled result would be `[9]`.
```ini
[ad]
# training data start time
start_time = 2021-01-01T01:01:01
# training data end time
end_time = 2021-01-01T01:01:11
# draw the results or not
gen_figure = true
# annotate the anomaly_detection result
anno_res = [9]
# algorithms involved in the comparison
[ad.algos]
ksigma={"k": 2}
iqr={}
grubbs={}
lof={"algorithm":"auto", "n_neighbor": 3}
```
After the comparison program finishes running, it automatically generates a file named `ad_result.xlsx`. The first sheet contains the algorithm execution results (as shown in the table below), including five metrics: algorithm name, execution parameters, recall, precision, and execution time.
| algorithm | params | precision(%) | recall(%) | elapsed_time(ms.) |
| --------- | -------------------------------------- | ------------ | --------- | ----------------- |
| ksigma | `{"k":2}` | 100 | 100 | 0.453 |
| iqr | `{}` | 100 | 100 | 2.727 |
| grubbs | `{}` | 100 | 100 | 2.811 |
| lof | `{"algorithm":"auto", "n_neighbor":3}` | 0 | 0 | 4.660 |
If `gen_figure` is set to true, the tool automatically generates a visual representation of the analysis results for each algorithm being compared. The k-sigma algorithm is shown here as an example.
<figure>
<Image img={adResult} alt="Anomaly detection results"/>
</figure>

View File

@ -0,0 +1,112 @@
---
title: Forecasting Algorithms
sidebar_label: Forecasting Algorithms
---
## Input Limitations
`execute` is the core method of forecasting algorithms. Before calling this method, the framework configures the historical time-series data used for forecasting in the `self.list` object attribute.
## Output Limitations and Parent Class Attributes
Running the `execute` method generates the following dictionary objects:
```python
return {
"mse": mse, # Mean squared error of the fit data
"res": res # Result groups [timestamp, forecast results, lower boundary of confidence interval, upper boundary of confidence interval]
}
```
The parent class `AbstractForecastService` of forecasting algorithms includes the following object attributes.
|Attribute|Description|Default|
|---|---|---|
|period|Specify the periodicity of the data, i.e. the number of data points included in each period. If the data is not periodic, enter 0.|0|
|start_ts|Specify the start time of forecasting results.|0|
|time_step|Specify the interval between consecutive data points in the forecast results.|0|
|fc_rows|Specify the number of forecast rows to return.|0|
|return_conf|Specify 1 to include a confidence interval in the forecast results or 0 to not include a confidence interval in the results. If you specify 0, the mean is returned as the upper and lower boundaries.|1|
|conf|Specify a confidence interval quantile.|95|
## Sample Code
The following code is a sample algorithm that always returns 1 as the forecast result.
```python
import numpy as np

from taosanalytics.service import AbstractForecastService


# Algorithm files must start with an underscore ("_") and end with "Service".
class _MyForecastService(AbstractForecastService):
    """ Define a class inheriting from AbstractForecastService and implementing the `execute` method. """

    # Name the algorithm using only lowercase ASCII characters.
    name = 'myfc'

    # Include a description of the algorithm (recommended)
    desc = """return the forecast time series data"""

    def __init__(self):
        """Method to initialize the class"""
        super().__init__()

    def execute(self):
        """ Implementation of algorithm logic"""
        res = []

        """This algorithm always returns 1 as the forecast result. The number of results returned is determined by the self.fc_rows value input by the user."""
        ts_list = [self.start_ts + i * self.time_step for i in range(self.fc_rows)]
        res.append(ts_list)  # set timestamp column for forecast results

        """Generate forecast results whose value is 1. """
        res_list = [1] * self.fc_rows
        res.append(res_list)

        """Check whether user has requested the upper and lower boundaries of the confidence interval."""
        if self.return_conf:
            """If the algorithm does not calculate these values, return the forecast results as the boundaries."""
            bound_list = [1] * self.fc_rows
            res.append(bound_list)  # lower confidence limit
            res.append(bound_list)  # upper confidence limit

        """Return results"""
        return {"res": res, "mse": 0}

    def set_params(self, params):
        """This algorithm does not take any parameters, so it only calls the parent class method."""
        return super().set_params(params)
```
Save this file to the `./lib/taosanalytics/algo/fc/` directory and restart the `taosanode` service. In the TDengine CLI, run `SHOW ANODES FULL` to see your new algorithm. Your applications can now use this algorithm via SQL.
```SQL
--- Forecast the `col_name` column using the newly added `myfc` algorithm
SELECT _flow, _fhigh, _frowts, FORECAST(col_name, "algo=myfc")
FROM foo;
```
If you have never started the anode, see [Installation](../../management/) to add the anode to your TDengine cluster.
## Unit Testing
You can add unit test cases to the `forecast_test.py` file in the `taosanalytics/test` directory or create a file for unit tests. Unit tests have a dependency on the Python unittest module.
```python
def test_myfc(self):
    """ Test the myfc class """
    s = loader.get_service("myfc")

    # Configure data for forecasting
    s.set_input_list(self.get_input_list(), None)

    # Check whether all results are 1
    r = s.set_params(
        {"fc_rows": 10, "start_ts": 171000000, "time_step": 86400 * 30, "start_p": 0}
    )
    r = s.execute()

    expected_list = [1] * 10
    self.assertEqlist(r["res"][0], expected_list)
```

View File

@ -0,0 +1,79 @@
---
title: Anomaly Detection Algorithms
sidebar_label: Anomaly Detection Algorithms
---
## Input Limitations
`execute` is the core method of anomaly detection algorithms. Before calling this method, the framework configures the historical time-series data used for anomaly detection in the `self.list` object attribute.
## Output Limitations
The `execute` method returns an array of the same length as `self.list`. A value of `-1` in the array indicates an anomaly.
For example, in the series `[2, 2, 2, 2, 100]`, assuming that `100` is an anomaly, the method returns `[1, 1, 1, 1, -1]`.
## Sample Code
This section describes an example anomaly detection algorithm that returns the final data point in a time series as an anomaly.
```python
from taosanalytics.service import AbstractAnomalyDetectionService


# Algorithm files must start with an underscore ("_") and end with "Service".
class _MyAnomalyDetectionService(AbstractAnomalyDetectionService):
    """ Define a class inheriting from AbstractAnomalyDetectionService and implementing the abstract method of that class. """

    # Name the algorithm using only lowercase ASCII characters.
    name = 'myad'

    # Include a description of the algorithm (recommended)
    desc = """return the last value as the anomaly data"""

    def __init__(self):
        """Method to initialize the class"""
        super().__init__()

    def execute(self):
        """ Implementation of algorithm logic"""

        """Create an array with length len(self.list) whose results are all 1, then set the final value in the array to -1 to indicate an anomaly"""
        res = [1] * len(self.list)
        res[-1] = -1

        """Return results"""
        return res

    def set_params(self, params):
        """This algorithm does not take any parameters, so this logic is not included."""
```
Save this file to the `./lib/taosanalytics/algo/ad/` directory and restart the `taosanode` service. In the TDengine CLI, run `SHOW ANODES FULL` to see your new algorithm. Your applications can now invoke this algorithm via SQL.
```SQL
--- Detect anomalies in the `col` column using the newly added `myad` algorithm
SELECT COUNT(*) FROM foo ANOMALY_WINDOW(col, 'algo=myad')
```
If you have never started the anode, see [Installation](../../management/) to add the anode to your TDengine cluster.
### Unit Testing
You can add unit test cases to the `anomaly_test.py` file in the `taosanalytics/test` directory or create a file for unit tests. The framework uses the Python unittest module.
```python
def test_myad(self):
    """ Test the myad algorithm """
    s = loader.get_service("myad")

    # Configure the data to test
    s.set_input_list(AnomalyDetectionTest.input_list, None)

    r = s.execute()

    # The final value is an anomaly
    self.assertEqual(r[-1], -1)
    self.assertEqual(len(r), len(AnomalyDetectionTest.input_list))
```

View File

@ -0,0 +1,100 @@
---
title: Algorithm Developer's Guide
sidebar_label: Algorithm Developer's Guide
---
TDgpt is an extensible platform for advanced time-series data analytics. You can follow the steps described in this document to develop your own analytics algorithms and add them to the platform. Your applications can then use SQL statements to invoke these algorithms. Custom algorithms must be developed in Python.
The anode adds algorithms semi-dynamically. When the anode is started, it scans specified directories for files that meet its requirements and adds those files to the platform. To add an algorithm to your TDgpt, perform the following steps:
1. Develop an analytics algorithm according to the TDgpt requirements.
2. Place the source code files in the appropriate directory and restart the anode.
3. Run the `CREATE ANODE` statement to add the anode to your TDengine cluster.
Your algorithm has been added to TDgpt and can be used by your applications. Because TDgpt is decoupled from TDengine, adding or upgrading algorithms on the anode does not affect the TDengine server (taosd). On the application side, it is necessary only to update your SQL statements to start using new or upgraded algorithms.
This extensibility makes TDgpt suitable for a wide range of use cases. You can add any algorithms needed by your use cases on demand and invoke them via SQL. You can also update algorithms without making significant changes to your applications.
This document describes how to add algorithms to an anode and invoke them with SQL statements.
## Directory Structure
The directory structure of an anode is described as follows:
```bash
.
├── bin
├── cfg
├── lib
│   └── taosanalytics
│   ├── algo
│   │   ├── ad
│   │   └── fc
│   ├── misc
│   └── test
├── log -> /var/log/taos/taosanode
├── model -> /var/lib/taos/taosanode/model
└── venv -> /var/lib/taos/taosanode/venv
```
|Directory|Description|
|---|---|
|taosanalytics| Source code, including the `algo` subdirectory for algorithms, the `test` subdirectory for unit and integration tests, and the `misc` subdirectory for other files. Within the `algo` subdirectory, the `ad` subdirectory includes anomaly detection algorithms, and the `fc` subdirectory includes forecasting algorithms.|
|venv| Virtual Python environment |
|model|Trained models for datasets|
|cfg|Configuration files|
:::note
- Place Python source code for anomaly detection in the `./lib/taosanalytics/algo/ad` directory.
- Place Python source code for forecasting in the `./lib/taosanalytics/algo/fc` directory.
:::
## Class Naming Rules
The anode adds algorithms automatically. Your algorithm must therefore consist of appropriately named Python files. Algorithm files must start with an underscore (`_`) and end with `Service`. For example: `_KsigmaService` is the name of the k-sigma anomaly detection algorithm.
## Class Inheritance Rules
- All anomaly detection algorithms must inherit `AbstractAnomalyDetectionService` and implement the `execute` method.
- All forecasting algorithms must inherit `AbstractForecastService` and implement the `execute` method.
## Class Property Initialization
Your classes must initialize the following properties:
- `name`: identifier of the algorithm. Use lowercase letters only. This identifier is displayed when you use the `SHOW` statement to display available algorithms.
- `desc`: basic description of the algorithm.
```SQL
--- The `algo` key takes the defined `name` value.
SELECT COUNT(*)
FROM foo ANOMALY_WINDOW(col_name, 'algo=name')
```
## Adding Algorithms with Models
Certain machine learning algorithms must be trained on your data and generate a model. The same algorithm may use different models for different datasets.
When you add an algorithm that uses models to your anode, first create subdirectories for your models in the `model` directory, and save the trained model for each algorithm and dataset to the corresponding subdirectory. You can specify custom names for these subdirectories in your algorithms. Use the `joblib` library to serialize trained models to ensure that they can be read and loaded.
The following section describes how to add an anomaly detection algorithm that requires trained models. The autoencoder algorithm is used as an example.
First, create the `ad_autoencoder` subdirectory in the `model` directory. This subdirectory is used to store models for the autoencoder algorithm. Next, train the algorithm on the `foo` table and obtain a trained model named `ad_autoencoder_foo`. Use the `joblib` library to serialize the model and save it to the `ad_autoencoder` subdirectory. As shown in the following directory listing, the `ad_autoencoder_foo` model comprises two files: the model file `ad_autoencoder_foo.dat` and the model description `ad_autoencoder_foo.info`.
```bash
.
└── model
└── ad_autoencoder
├── ad_autoencoder_foo.dat
└── ad_autoencoder_foo.info
```
The following section describes how to invoke this model with a SQL statement.
Set the `algo` parameter to `ad_encoder` to instruct TDgpt to use the autoencoder algorithm. This algorithm is in the available algorithms list and can be used directly. Set the `model` parameter to `ad_autoencoder_foo` to instruct TDgpt to use the trained model generated in the previous section.
```SQL
--- Add the name of the model `ad_autoencoder_foo` in the options of the anomaly window and detect anomalies in the dataset `foo` using the autoencoder algorithm.
SELECT COUNT(*), _WSTART
FROM foo
ANOMALY_WINDOW(col1, 'algo=ad_encoder, model=ad_autoencoder_foo');
```

View File

@ -0,0 +1,6 @@
---
title: Data Imputation
sidebar_label: Data Imputation
---
Coming soon

View File

@ -0,0 +1,6 @@
---
title: Time-Series Classification
sidebar_label: Time-Series Classification
---
Coming soon

View File

@ -0,0 +1,75 @@
---
title: Quick Start Guide
sidebar_label: Quick Start Guide
---
## Get Started with Docker
This document describes how to get started with TDgpt in Docker.
### Start TDgpt
If you have installed Docker, pull the latest TDengine container:
```shell
docker pull tdengine/tdengine:latest
```
You can specify a version if desired:
```shell
docker pull tdengine/tdengine:3.3.3.0
```
Then run the following command:
```shell
docker run -d -p 6030:6030 -p 6041:6041 -p 6043:6043 -p 6044-6049:6044-6049 -p 6044-6045:6044-6045/udp -p 6060:6060 tdengine/tdengine
```
Note: TDgpt runs on TCP port 6090. TDgpt is a stateless analytics agent and does not persist data; it only saves log files to the local disk.
Confirm that your Docker container is running:
```shell
docker ps
```
Enter the container and run the bash shell:
```shell
docker exec -it <container name> bash
```
You can now run Linux commands and access TDengine.
## Get Started with an Installation Package
### Obtain the Package
1. Download the tar.gz package from the list:
2. Open the directory containing the downloaded package and decompress it.
3. Open the directory containing the decompressed package and run the `install.sh` script.
Note: Replace `<version>` with the version that you downloaded.
```bash
tar -zxvf TDengine-anode-<version>-Linux-x64.tar.gz
```
Decompress the file, open the directory created, and run the `install.sh` script:
```bash
sudo ./install.sh
```
### Deploy TDgpt
See [Installing TDgpt](../management/) to prepare your environment and deploy TDgpt.
## Get Started in TDengine Cloud
You can use TDgpt with your TDengine Cloud deployment. Register for a TDengine Cloud account, ensure that you have at least one instance, and register TDgpt to your TDengine Cloud instance as described in the documentation. See the TDengine Cloud documentation for more information.
Create a TDgpt instance, and then refer to [Installing TDgpt](../management/) to manage your anode.

View File

@ -0,0 +1,48 @@
---
title: Frequently Asked Questions
sidebar_label: Frequently Asked Questions
---
## 1. During the installation process, uWSGI fails to compile
The TDgpt installation process compiles uWSGI on your local machine. In certain Python distributions, such as Anaconda, conflicts may occur during compilation. In this case, you can choose not to install uWSGI.
However, this means that you must manually run the `python3.10 /usr/local/taos/taosanode/lib/taosanalytics/app.py` command when starting the taosanode service. Use a virtual Python environment when running this command to ensure that dependencies can be loaded.
## 2. Anodes fail to be created because the service cannot be accessed.
```bash
taos> create anode '127.0.0.1:6090';
DB error: Analysis service can't access[0x80000441] (0.117446s)
```
First, use curl to check whether the anode is providing services. If the anode is running, the output of `curl '127.0.0.1:6090'` is as follows:
```bash
TDengine© Time Series Data Analytics Platform (ver 1.0.x)
```
The following output indicates that the anode is not providing services:
```bash
curl: (7) Failed to connect to 127.0.0.1 port 6090: Connection refused
```
If the anode has not started or is not running, check the uWSGI logs in the `/var/log/taos/taosanode/taosanode.log` file to find and resolve any errors.
Note: Do not use systemctl to check the status of the taosanode service.
## 3. The service is operational, but queries return that the service is not available.
```bash
taos> select _frowts,forecast(current, 'algo=arima, alpha=95, wncheck=0, rows=20') from d1 where ts<='2017-07-14 10:40:09.999';
DB error: Analysis service can't access[0x80000441] (60.195613s)
```
The timeout period for the analysis service is 60 seconds. If the analysis process cannot be completed within this period, this error will occur. You can reduce the scope of data being analyzed or try another algorithm to avoid the error.
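For example, narrowing the time range of the input data in the query above reduces the amount of data that the algorithm must process. The lower bound shown here is illustrative:
```SQL
SELECT _frowts, FORECAST(current, 'algo=arima, alpha=95, wncheck=0, rows=20')
FROM d1
WHERE ts >= '2017-07-14 10:30:00.000' AND ts <= '2017-07-14 10:40:09.999';
```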
## 4. An "Illegal json format" error is returned.
This indicates that the analysis results contain an error. Check the anode operation logs in the `/var/log/taos/taosanode/taosanode.app.log` file to find and resolve any issues.

View File

@ -0,0 +1,116 @@
---
sidebar_label: TDgpt
title: TDgpt
---
import Image from '@theme/IdealImage';
import tdgptArch from '../../assets/tdgpt-01.png';
## Introduction
Numerous algorithms have been proposed to perform time-series forecasting, anomaly detection, imputation, and classification, with varying technical characteristics suited for different scenarios.
Typically, these analysis algorithms are packaged as toolkits in high-level programming languages (such as Python or R) and are widely distributed and used through open-source channels. This model helps software developers integrate complex analysis algorithms into their systems and greatly lowers the barrier to using advanced algorithms.
Database system developers have also attempted to integrate data analysis algorithm models directly into database systems. By building machine learning libraries (e.g., Spark's MLlib), they aim to leverage mature analytical techniques to enhance the advanced data analysis capabilities of databases or analytical computing engines.
The rapid development of artificial intelligence (AI) has brought new opportunities to time-series data analysis. Efficiently applying AI capabilities to this field also presents new possibilities for databases. To this end, TDengine has introduced TDgpt, an intelligent agent for time-series analytics. With TDgpt, you can use statistical analysis algorithms, machine learning models, deep learning models, foundational models for time-series data, and large language models via SQL statements. TDgpt exposes the analytical capabilities of these algorithms and models through SQL and applies them to your time-series data using new windows and functions.
## Technical Features
TDgpt is an external agent that integrates seamlessly with TDengine's main process, taosd. It allows time-series analysis services to be embedded directly into TDengine's query execution flow.
TDgpt is a stateless platform that includes the classic statsmodels library of statistical analysis models as well as embedded frameworks such as Torch and Keras for machine and deep learning. In addition, it can directly invoke TDengine's proprietary foundation model, TDtsfm, through request forwarding and adaptation.
As an analytics agent, TDgpt will also support integration with third-party time-series model-as-a-service (MaaS) platforms in the future. By modifying just a single parameter (algo), you will be able to access cutting-edge time-series model services.
TDgpt is an open system to which you can easily add your own algorithms for forecasting, anomaly detection, imputation, and classification. Once added, the new algorithms can be used simply by changing the corresponding parameters in the SQL statement, with no need to modify a single line of application code.
## System Architecture
TDgpt is composed of one or more stateless analysis nodes, called AI nodes (anodes). These anodes can be deployed as needed across the TDengine cluster in appropriate hardware environments (for example, on compute nodes equipped with GPUs) depending on the requirements of the algorithms being used.
TDgpt provides a unified interface and invocation method for different types of analysis algorithms. Based on user-specified parameters, it calls advanced algorithm packages and other analytical tools, then returns the results to TDengines main process (taosd) in a predefined format.
TDgpt consists of four main components:
- Built-in analytics libraries: Includes libraries such as statsmodels, pyculiarity, and pmdarima, offering ready-to-use models for forecasting and anomaly detection.
- Built-in machine learning libraries: Includes libraries like Torch, Keras, and Scikit-learn to run pre-trained machine and deep learning models within TDgpt's process space. The training process can be managed using end-to-end open-source ML frameworks such as Merlion or Kats, and trained models can be deployed by uploading them to a designated TDgpt directory.
- Request adapter for general-purpose LLMs: Converts time-series forecasting requests into prompts for general-purpose LLMs such as Llama in a MaaS manner. (Note: This functionality is not open source.)
- Adapter for locally deployed time-series models: Sends requests directly to models like Time-MoE and TDtsfm that are specifically designed for time-series data. Compared to general-purpose LLMs, these models do not require prompt engineering, are lighter-weight, and are easier to deploy locally with lower hardware requirements. In addition, the adapter can also connect to cloud-based time-series MaaS systems such as TimeGPT, enabling localized analysis powered by cloud-hosted models.
<figure>
<Image img={tdgptArch} alt="TDgpt Architecture"/>
<figcaption>TDgpt architecture</figcaption>
</figure>
During query execution, the vnode in TDengine forwards any elements involving advanced time-series data analytics directly to the anode. Once the analysis is completed, the results are assembled and embedded back into the query execution process.
## Advanced Analytics
The analytics services provided by TDgpt are described as follows:
- Anomaly detection: This service is provided through a new anomaly window introduced into TDengine. An anomaly window is a special type of event window, defined by the anomaly detection algorithm as a time window during which an anomaly is occurring. It differs from an event window in that the algorithm, rather than expressions provided by the user, determines when the window opens and closes. The query operations supported by other windows are also supported for anomaly windows. (See the sketch following this list.)
- Time-series forecasting: The FORECAST function invokes a specified (or default) forecasting algorithm to predict future time-series data based on input historical data.
- Data imputation: To be released in July 2025
- Time-series classification: To be released in July 2025
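The following queries are a minimal sketch of how these services are invoked through SQL. The supertable `meters` and its `current` column are hypothetical, and `iqr` and `arima` stand in for whichever anomaly detection and forecasting algorithms are available on your anodes:
```bash
taos> select _wstart, _wend, count(*) from meters anomaly_window(current, 'algo=iqr');
taos> select _frowts, forecast(current, 'algo=arima, rows=10') from meters;
```
The first query groups rows into anomaly windows detected by the specified algorithm; the second predicts the next 10 rows of the `current` column from its history.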
## Custom Algorithms
TDgpt is an extensible platform to which you can add your own algorithms and models using the process described in [Algorithm Developer's Guide](./dev/). After adding an algorithm, you can access it through SQL statements just like the built-in algorithms. It is not necessary to make updates to your applications.
Custom algorithms must be developed in Python. The anode adds algorithms dynamically. When the anode is started, it scans specified directories for files that meet its requirements and adds those files to the platform. To add an algorithm to your TDgpt, perform the following steps:
1. Develop an analytics algorithm according to the TDgpt requirements.
2. Place the source code files in the appropriate directory and restart the anode.
3. Refresh the algorithm cache table.
You can then use your new algorithm in SQL statements.
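The following sketch illustrates steps 2 and 3 for a hypothetical forecasting algorithm file named `myfc.py`. The target directory is illustrative, and the `UPDATE ALL ANODES` and `SHOW ANODES FULL` statements refresh and display the algorithm cache in TDengine 3.3.6; verify both against your version's SQL reference:
```bash
# Step 2: place the algorithm file in the anode's algorithm directory and restart the anode
sudo cp myfc.py /usr/local/taos/taosanode/lib/taosanalytics/algo/fc/
sudo systemctl restart taosanoded

# Step 3: refresh the algorithm cache and confirm that the new algorithm is listed
taos> update all anodes;
taos> show anodes full;
```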
## Algorithm Evaluation
TDengine Enterprise includes a tool that evaluates the effectiveness of different algorithms and models. You can use this tool on any algorithm or model in TDgpt, including built-in and custom forecasting and anomaly detection algorithms and models. The tool uses quantitative metrics to evaluate the accuracy and performance of each algorithm with a given dataset in TDengine.
## Model Management
Trained models for machine learning frameworks such as Torch, TensorFlow, and Keras must be placed in the designated directory on the anode. The anode automatically detects and loads models from this directory.
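For example, a trained model might be deployed as follows (a minimal sketch; the file name is hypothetical, and the path is the default model directory created by the installer):
```bash
# Copy a trained model into the anode's model directory; the anode detects and loads models placed here
sudo cp my_forecast_model.pt /var/lib/taos/taosanode/model/
```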
TDengine Enterprise includes a model manager that integrates seamlessly with open-source end-to-end ML frameworks for time-series data such as Merlion and Kats.
## Processing Performance
Time-series analytics is a CPU-intensive workflow. Using a more powerful CPU or GPU can improve performance.
Machine and deep learning models in TDgpt are run through Torch, and you can use standard methods to improve performance, such as deploying TDgpt on a machine with more RAM and using a Torch model that can take advantage of GPUs.
You can add different algorithms and models to different anodes to enable concurrent processing.
## Operations and Maintenance
With TDengine OSS, permissions and resource management are not provided for TDgpt.
TDgpt is deployed as a Flask service through uWSGI. You can monitor its status through the port exposed by uWSGI.
## References
[1] Merlion: https://opensource.salesforce.com/Merlion/latest/index.html
[2] Kats: https://facebookresearch.github.io/Kats/
[3] StatsModels: https://www.statsmodels.org/stable/index.html
[4] Keras: https://keras.io/guides/
[5] Torch: https://pytorch.org/
[6] Scikit-learn: https://scikit-learn.org/stable/index.html
[7] Time-MoE: https://github.com/Time-MoE/Time-MoE
[8] TimeGPT: https://docs.nixtla.io/docs/getting-started-about_timegpt
[9] DeepSeek: https://www.deepseek.com/
[10] Llama: https://www.llama.com/docs/overview/
[11] Spark MLlib: https://spark.apache.org/docs/latest/ml-guide.html

BIN docs/en/assets/tdgpt-01.png (new binary file, 280 KiB, not shown)
BIN docs/en/assets/tdgpt-02.png (new binary file, 89 KiB, not shown)
BIN docs/en/assets/tdgpt-03.png (new binary file, 87 KiB, not shown)
BIN docs/en/assets/tdgpt-04.png (new binary file, 61 KiB, not shown)
BIN docs/en/assets/tdgpt-05.png (new binary file, 324 KiB, not shown)
BIN docs/en/assets/tdgpt-06.png (new binary file, 20 KiB, not shown)