Merge branch 'main' into merge/3.3.6tomain

This commit is contained in:
Simon Guan 2025-03-25 09:04:46 +08:00
commit dbbe1997b9
25 changed files with 1424 additions and 0 deletions

View File

@ -0,0 +1,193 @@
---
title: Installation
sidebar_label: Installation
---
## Preparing Your Environment
To use the analytics capabilities offered by TDgpt, you deploy an AI node (anode) in your TDengine cluster. Anodes run on Linux and require Python 3.10 or later.
TDgpt is supported in TDengine 3.3.6 and later. You must upgrade your cluster to version 3.3.6 or later before deploying any anodes.
You can run the following commands to install Python 3.10 on Ubuntu.
### Install Python
```shell
sudo apt-get install software-properties-common
sudo add-apt-repository ppa:deadsnakes/ppa
sudo apt update
sudo apt install python3.10
sudo update-alternatives --install /usr/bin/python3 python3 /usr/bin/python3.10 2
sudo update-alternatives --config python3
sudo apt install python3.10-venv
sudo apt install python3.10-dev
```
### Install pip
```shell
curl -sS https://bootstrap.pypa.io/get-pip.py | python3.10
```
### Configure Environment Variables
Add `~/.local/bin` to the `PATH` environment variable in `~/.bashrc` or `~/.bash_profile`.
```shell
export PATH=$PATH:~/.local/bin
```
The Python environment has been installed. You can now install TDgpt.
### Install TDgpt
Obtain the installation package `TDengine-anode-3.3.x.x-Linux-x64.tar.gz` and install it on your machine:
```bash
tar -xzvf TDengine-anode-3.3.6.0-Linux-x64.tar.gz
cd TDengine-anode-3.3.6.0
sudo ./install.sh
```
You can run the `rmtaosanode` command to uninstall TDgpt.
To prevent TDgpt from affecting Python environments that may exist on your machine, anodes are installed in a virtual environment. When you install an anode, a virtual Python environment is deployed in the `/var/lib/taos/taosanode/venv/` directory. All libraries required by the anode are installed in this directory. Note that this virtual environment is not uninstalled automatically by the `rmtaosanode` command. If you are sure that you do not want to use TDgpt on a machine, you can remove the directory manually.
### Start the TDgpt Service
The `taosanoded` service is created when you install an anode. You can use systemd to manage this service:
```bash
systemctl start taosanoded
systemctl stop taosanoded
systemctl status taosanoded
```
## Directory and Configuration Information
The directory structure of an anode is described in the following table:
|Directory or File|Description|
|---------------|------|
|/usr/local/taos/taosanode/bin|Directory containing executable files|
|/usr/local/taos/taosanode/resource|Directory containing resource files, linked to `/var/lib/taos/taosanode/resource/`|
|/usr/local/taos/taosanode/lib|Directory containing libraries|
|/usr/local/taos/taosanode/model|Directory containing models, linked to `/var/lib/taos/taosanode/model`|
|/var/log/taos/taosanode/|Log directory|
|/etc/taos/taosanode.ini|Configuration file|
### Configuration
The anode provides services through an uWSGI driver. The configuration for the anode and for uWSGI are both found in the `taosanode.ini` file, located by default in the `/etc/taos/` directory.
The configuration options are described as follows:
```ini
[uwsgi]
# Anode RESTful service ip:port
http = 127.0.0.1:6090
# base directory for Anode python files; do NOT modify this
chdir = /usr/local/taos/taosanode/lib
# initialize Anode python file
wsgi-file = /usr/local/taos/taosanode/lib/taos/app.py
# pid file
pidfile = /usr/local/taos/taosanode/taosanode.pid
# conflict with systemctl, so do NOT uncomment this
# daemonize = /var/log/taos/taosanode/taosanode.log
# uWSGI log files
logto = /var/log/taos/taosanode/taosanode.log
# uWSGI monitor port
stats = 127.0.0.1:8387
# python virtual environment directory, used by Anode
virtualenv = /usr/local/taos/taosanode/venv/
[taosanode]
# default taosanode log file
app-log = /var/log/taos/taosanode/taosanode.app.log
# model storage directory
model-dir = /usr/local/taos/taosanode/model/
# default log level
log-level = INFO
```
:::note
Do not specify a value for the `daemonize` parameter. This parameter causes a conflict between uWSGI and systemctl. If you enable the `daemonize` parameter, your anode will fail to start.
:::
The configuration file above includes only the basic configuration needed for an anode to provide services. For more information about configuring uWSGI, see the [official documentation](https://uwsgi-docs.readthedocs.io/en/latest/).
The main configuration options for an anode are described as follows:
- app-log: Specify the directory in which anode log files are stored.
- model-dir: Specify the directory in which models are stored. Models are generated by algorithms based on existing datasets.
- log-level: Specify the log level for anode logs.
## Managing Anodes
You manage anodes through the TDengine CLI. The following actions must be performed within the CLI on a client that is connected to your TDengine cluster.
### Create an Anode
```sql
CREATE ANODE {node_url}
```
The `node_url` parameter specifies the IP address and port of the anode. This information is registered to your TDengine cluster. Do not register a single anode to multiple TDengine clusters.
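For example, assuming an anode is listening at `192.168.0.1:6090` (6090 is the default anode port; the address here is illustrative), you could register it as follows:
```sql
CREATE ANODE '192.168.0.1:6090';
```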
### View Anodes
You can run the following command to display the FQDN and status of the anodes in your cluster:
```sql
SHOW ANODES;
taos> show anodes;
id | url | status | create_time | update_time |
==================================================================================================================
1 | 192.168.0.1:6090 | ready | 2024-11-28 18:44:27.089 | 2024-11-28 18:44:27.089 |
Query OK, 1 row(s) in set (0.037205s)
```
### View Advanced Analytics Services
```SQL
SHOW ANODES FULL;
taos> show anodes full;
id | type | algo |
============================================================================
1 | anomaly-detection | shesd |
1 | anomaly-detection | iqr |
1 | anomaly-detection | ksigma |
1 | anomaly-detection | lof |
1 | anomaly-detection | grubbs |
1 | anomaly-detection | ad_encoder |
1 | forecast | holtwinters |
1 | forecast | arima |
Query OK, 8 row(s) in set (0.008796s)
```
### Refresh the Algorithm Cache
```SQL
UPDATE ANODE {anode_id}
UPDATE ALL ANODES
```
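For example, assuming the anode registered above was assigned ID 1, you could refresh its algorithm cache, or the caches of all anodes, as follows:
```SQL
UPDATE ANODE 1;
UPDATE ALL ANODES;
```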
### Delete an Anode
```sql
DROP ANODE {anode_id}
```
Deleting an anode only removes it from your TDengine cluster. To stop an anode, use systemctl on the machine where the anode is located. To uninstall the anode software, run the `rmtaosanode` command on that machine.
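For example, assuming the anode to remove has ID 1, the statement would be:
```sql
DROP ANODE 1;
```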

View File

@ -0,0 +1,65 @@
---
title: Data Preprocessing
sidebar_label: Data Preprocessing
---
import Image from '@theme/IdealImage';
import preprocFlow from '../../assets/tdgpt-02.png';
import wnData from '../../assets/tdgpt-03.png'
## Analysis Workflow
Data must be preprocessed before it can be analyzed by TDgpt. This process is described in the following figure:
<figure>
<Image img={preprocFlow} alt="Preprocessing workflow" />
<figcaption>Preprocessing workflow</figcaption>
</figure>
TDgpt first performs a white noise data check on the dataset that you input. Data that passes this check and is intended for use in forecasting is then resampled and its timestamps are aligned. Note that resampling and alignment are not performed for datasets used in anomaly detection.
After the data has been preprocessed, forecasting or anomaly detection is performed. Preprocessing is not part of the business logic for forecasting and anomaly detection.
## White Noise Data Check
<figure>
<Image img={wnData} alt="White noise data"/>
<figcaption>White noise data</figcaption>
</figure>
The white noise data check determines whether the input data consists of random numbers. The figure above shows an example of randomly distributed white noise data. Random numbers cannot be analyzed meaningfully, and this data is rejected by the system. The white noise data check is performed using the classic Ljung-Box test. The test is performed over the entire time series. If you are certain that your data is not random, you can specify the `wncheck=0` parameter to force TDgpt to skip this check.
TDgpt does not provide white noise checking as an independent feature. It is performed only as part of data preprocessing.
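For example, if you are certain that the column being forecast is not white noise, a query like the following sketch skips the check. The table and column names follow the `foo` example used elsewhere in this documentation:
```SQL
--- ARIMA forecast with the white noise data check disabled
SELECT _flow, _fhigh, _frowts, FORECAST(i32, "algo=arima,wncheck=0")
FROM foo;
```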
## Resampling and Timestamp Alignment
Time-series data must be preprocessed before forecasting can be performed. Preprocessing is intended to resolve the following two issues:
The timestamps of real time-series datasets are not aligned. It is impossible to guarantee that devices generating data or network gateways create timestamps at strict intervals. For this reason, it cannot be guaranteed that the timestamps of time-series data are in strict alignment with the sampling rate of the data. For example, a time series sampled at 1 Hz may have the following timestamps:
```text
['20:12:21.143', '20:12:22.187', '20:12:23.032', '20:12:24.384', '20:12:25.033']
```
The data returned by the forecasting algorithm is strictly aligned by timestamp. For example, the next two data points in the set must be `['20:12:26.000', '20:12:27.000']`. For this reason, data such as the preceding set must be aligned as follows:
```
['20:12:21.000', '20:12:22.000', '20:12:23.000', '20:12:24.000', '20:12:25.000']
```
The sampling rate of the input data can exceed the rate requested for the output. For example, the following data was sampled at 5 second intervals, but the user could request forecasting in 10 second intervals:
```
['20:12:20.000', '20:12:25.000', '20:12:30.000', '20:12:35.000', '20:12:40.000']
```
The data is then resampled to 10 second intervals as follows:
```
['20:12:20.000', '20:12:30.000', '20:12:40.000']
```
This resampled data is then input into the forecasting algorithm. In this case, the data points `['20:12:25.000', '20:12:35.000']` are discarded.
It is important to note that TDgpt does not fill in missing data during preprocessing. If you input the dataset `['20:12:10.113', '20:12:21.393', '20:12:29.143', '20:12:51.330']` and specify an interval of 10 seconds, the aligned dataset will be `['20:12:10.000', '20:12:20.000', '20:12:30.000', '20:12:50.000']`. This will cause the forecasting algorithm to return an error.

View File

@ -0,0 +1,63 @@
---
title: ARIMA
sidebar_label: ARIMA
---
This document describes how to generate autoregressive integrated moving average (ARIMA) models.
## Description
The ARIMA(*p*, *d*, *q*) model is one of the most common models in time-series forecasting. It is an autoregressive model that predicts future values from the historical values of a variable. ARIMA requires that time-series data be stationary. Accurate results cannot be obtained from non-stationary data.
A stationary time series is one whose characteristics do not change based on the time at which it is observed. Time series that experience trends or seasonality are not stationary because they exhibit different characteristics at different times.
The following variables can be dynamically input to generate appropriate ARIMA models:
- *p* is the order of the autoregressive model
- *d* is the order of differencing
- *q* is the order of the moving-average model
## Parameters
Automated ARIMA modeling is performed in TDgpt. For this reason, the results for each input are automatically fitted to the most appropriate model. Forecasting is then performed based on the specified model.
|Parameter|Description|Required?|
|---|---|-----|
|period|The number of data points included in each period. If not specified or set to 0, non-seasonal ARIMA models are used.|No|
|start_p|The starting order of the autoregressive model. Enter an integer greater than or equal to 0. Values greater than 10 are not recommended.|No|
|max_p|The ending order of the autoregressive model. Enter an integer greater than or equal to 0. Values greater than 10 are not recommended.|No|
|start_q|The starting order of the moving-average model. Enter an integer greater than or equal to 0. Values greater than 10 are not recommended.|No|
|max_q|The ending order of the moving-average model. Enter an integer greater than or equal to 0. Values greater than 10 are not recommended.|No|
|d|The order of differencing.|No|
The `start_p`, `max_p`, `start_q`, and `max_q` parameters cause the model to find the optimal solution within the specified restrictions. Given the same input data, a larger range will result in higher resource consumption and slower response time.
## Example
In this example, forecasting is performed on the `i32` column. Each 10 data points in the column form a period. The values of `start_p` and `start_q` are both 1, and the corresponding ending values are both 5. The forecasting results are within a 95% confidence interval.
```
FORECAST(i32, "algo=arima,alpha=95,period=10,start_p=1,max_p=5,start_q=1,max_q=5")
```
The complete SQL statement is shown as follows:
```SQL
SELECT _frowts, FORECAST(i32, "algo=arima,alpha=95,period=10,start_p=1,max_p=5,start_q=1,max_q=5") from foo
```
The forecast results are returned in the following format:
```json5
{
"rows": fc_rows, // Rows returned
"period": period, // Period of results (equivalent to input period)
"alpha": alpha, // Confidence interval of results (equivalent to input confidence interval)
"algo": "arima", // Algorithm
"mse": mse, // Mean square error (MSE) of model generated for input time series
"res": res // Results in column format
}
```
## References
- https://en.wikipedia.org/wiki/Autoregressive_moving-average_model
- [https://baike.baidu.com/item/自回归滑动平均模型/5023931](https://baike.baidu.com/item/%E8%87%AA%E5%9B%9E%E5%BD%92%E6%BB%91%E5%8A%A8%E5%B9%B3%E5%9D%87%E6%A8%A1%E5%9E%8B/5023931)

View File

@ -0,0 +1,53 @@
---
title: Holt-Winters
sidebar_label: Holt-Winters
---
This document describes the usage of the Holt-Winters method for forecasting.
## Description
Holt-Winters, or exponential moving average (EMA), is used to forecast non-stationary time series that have linear trends or periodic fluctuations. This method uses exponential smoothing to constantly adapt the model parameters to the changes in the time series and perform short-term forecasting.
If seasonal variation remains mostly consistent within a time series, the additive Holt-Winters model is used, whereas if seasonal variation is proportional to the level of the time series, the multiplicative Holt-Winters model is used.
Holt-Winters does not provide a confidence interval for its results. The values returned for the upper and lower bounds of the confidence interval are identical to the forecast values.
## Parameters
Automated Holt-Winters modeling is performed in TDgpt. For this reason, the results for each input are automatically fitted to the most appropriate model. Forecasting is then performed based on the specified model.
|Parameter|Description|Required?|
|---|---|---|
|period|The number of data points included in each period. If not specified or set to 0, exponential smoothing is applied for data fitting, and then future data is forecast.|No|
|trend|Use additive (`add`) or multiplicative (`mul`) Holt-Winters for the trend model.|No|
|seasonal|Use additive (`add`) or multiplicative (`mul`) Holt-Winters for seasonality.|No|
## Example
In this example, forecasting is performed on the `i32` column. Each 10 data points in the column form a period. Multiplicative Holt-Winters is used for trends and for seasonality.
```
FORECAST(i32, "algo=holtwinters,period=10,trend=mul,seasonal=mul")
```
The complete SQL statement is shown as follows:
```SQL
SELECT _frowts, FORECAST(i32, "algo=holtwinters,period=10,trend=mul,seasonal=mul") from foo
```
The forecast results are returned in the following format:
```json5
{
"rows": fc_rows, // Rows returned
"period": period, // Period of results (equivalent to input period; set to 0 if no periodicity)
"algo": 'holtwinters' // Algorithm
"mse": mse, // Mean square error (MSE)
"res": res // Results in column format (typically returned as two columns, `timestamp` and `fc_results`.)
}
```
## References
- https://en.wikipedia.org/wiki/Exponential_smoothing
- https://orangematter.solarwinds.com/2019/12/15/holt-winters-forecasting-simplified/

View File

@ -0,0 +1,31 @@
---
title: LSTM
sidebar_label: LSTM
---
This document describes how to use LSTM in TDgpt.
## Description
Long short-term memory (LSTM) is a special type of recurrent neural network (RNN) well-suited for tasks such as time-series data processing and natural language processing. Its unique gating mechanism allows it to effectively capture long-term dependencies and address the gradient vanishing problem found in traditional RNNs, enabling more accurate predictions on sequential data. However, it does not directly provide confidence interval results for its computations.
The complete SQL statement is shown as follows:
```SQL
SELECT _frowts, FORECAST(i32, "algo=lstm,alpha=95,period=10,start_p=1,max_p=5,start_q=1,max_q=5") from foo
```
The forecast results are returned in the following format:
```json5
{
"rows": fc_rows, // Rows returned
"period": period, // Period of results (equivalent to input period)
"alpha": alpha, // Confidence interval of results (equivalent to input confidence interval)
"algo": "lstm", // Algorithm
"mse": mse, // Mean square error (MSE) of model generated for input time series
"res": res // Results in column format
}
```
## References
- [1] Hochreiter S. Long Short-term Memory[J]. Neural Computation MIT-Press, 1997.

View File

@ -0,0 +1,33 @@
---
title: MLP
sidebar_label: MLP
---
This document describes how to use MLP in TDgpt.
## Description
MLP (Multilayer Perceptron) is a classic neural network model that can learn nonlinear relationships from historical data, capture patterns in time-series data, and make future value predictions. It performs feature extraction and mapping through multiple fully connected layers, generating prediction results based on the input historical data. Since it does not directly account for trends or seasonal variations, it typically requires data preprocessing to improve performance. It is well-suited for handling nonlinear and complex time-series problems.
The complete SQL statement is shown as follows:
```SQL
SELECT _frowts, FORECAST(i32, "algo=mlp") from foo
```
The forecast results are returned in the following format:
```json5
{
"rows": fc_rows, // Rows returned
"period": period, // Period of results (equivalent to input period)
"alpha": alpha, // Confidence interval of results (equivalent to input confidence interval)
"algo": "mlp", // Algorithm
"mse": mse, // Mean square error (MSE) of model generated for input time series
"res": res // Results in column format
}
```
## References
- [1]Rumelhart D E, Hinton G E, Williams R J. Learning representations by back-propagating errors[J]. nature, 1986, 323(6088): 533-536.
- [2]Rosenblatt F. The perceptron: a probabilistic model for information storage and organization in the brain[J]. Psychological review, 1958, 65(6): 386.
- [3]LeCun Y, Bottou L, Bengio Y, et al. Gradient-based learning applied to document recognition[J]. Proceedings of the IEEE, 1998, 86(11): 2278-2324.

View File

@ -0,0 +1,198 @@
---
title: Forecasting Algorithms
description: Forecasting Algorithms
---
import Image from '@theme/IdealImage';
import fcResult from '../../../assets/tdgpt-04.png';
Time-series forecasting takes a continuous period of time-series data as its input and forecasts how the data will trend in the next continuous period. The number of data points in the forecast results is not fixed but can be specified by the user. TDgpt uses the `FORECAST` function to provide forecasting. The input for this function is the historical time-series data used as a basis for forecasting, and the output is forecast data. When you call the `FORECAST` function, a forecasting algorithm running on an anode performs the analysis. Forecasting is typically performed on a subtable or on the same time series across tables.
In this section, the table `foo` is used as an example to describe how to perform forecasting and anomaly detection in TDgpt. This table is described as follows:
| Column | Type | Description |
| ------ | --------- | ---------------------------- |
|ts|timestamp|Primary timestamp|
|i32|int32|Metric generated by a device as a 4-byte integer|
```sql
taos> select * from foo;
ts | i32 |
========================================
2020-01-01 00:00:12.681 | 13 |
2020-01-01 00:00:13.727 | 14 |
2020-01-01 00:00:14.378 | 8 |
2020-01-01 00:00:15.774 | 10 |
2020-01-01 00:00:16.170 | 16 |
2020-01-01 00:00:17.558 | 26 |
2020-01-01 00:00:18.938 | 32 |
2020-01-01 00:00:19.308 | 27 |
```
## Syntax
```SQL
FORECAST(column_expr, option_expr)
option_expr: {"
algo=expr1
[,wncheck=1|0]
[,conf=conf_val]
[,every=every_val]
[,rows=rows_val]
[,start=start_ts_val]
[,expr2]
"}
```
1. `column_expr`: The time-series data column to forecast. Enter a column whose data type is numerical.
2. `options`: The parameters for forecasting. Enter parameters in key=value format, separating multiple parameters with a comma (,). It is not necessary to use quotation marks or escape characters. Only ASCII characters are supported. The supported parameters are described as follows:
## Parameter Description
|Parameter|Definition|Default|
| ------- | ------------------------------------------ | ---------------------------------------------- |
|algo|Forecasting algorithm.|holtwinters|
|wncheck|White noise data check. Enter 1 to enable or 0 to disable.|1|
|conf|Confidence interval for forecast data. Enter an integer between 0 and 100, inclusive.|95|
|every|Sampling period.|The sampling period of the input data|
|start|Starting timestamp for forecast data.|One sampling period after the final timestamp in the input data|
|rows|Number of forecast rows to return.|10|
1. Three pseudocolumns are used in forecasting: `_FROWTS`: the timestamp of the forecast data; `_FLOW`: the lower threshold of the confidence interval; and `_FHIGH`: the upper threshold of the confidence interval. For algorithms that do not include a confidence interval, the `_FLOW` and `_FHIGH` pseudocolumns contain the forecast results.
2. You can specify the `START` parameter to modify the starting time of forecast results. This does not affect the forecast values, only the time range.
3. The `EVERY` parameter can be less than or equal to the sampling period of the input data; it cannot be greater than the sampling period.
4. If you specify a confidence interval for an algorithm that does not use it, the upper and lower thresholds of the confidence interval regress to a single point.
5. The maximum value of rows is 1024. If you specify a higher value, only 1024 rows are returned.
6. The maximum size of the input historical data is 40,000 rows. Note that some models may have stricter limitations.
## Example
```SQL
--- ARIMA forecast, return 10 rows of results (default), perform white noise data check, with 95% confidence interval
SELECT _flow, _fhigh, _frowts, FORECAST(i32, "algo=arima")
FROM foo;
--- ARIMA forecast, periodic input data, 10 samples per period, disable white noise data check, with 95% confidence interval
SELECT _flow, _fhigh, _frowts, FORECAST(i32, "algo=arima,alpha=95,period=10,wncheck=0")
FROM foo;
```
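The following sketch uses the `rows` parameter described above to request 20 forecast rows instead of the default 10, with the default Holt-Winters algorithm:
```SQL
--- Holt-Winters forecast, return 20 rows of results
SELECT _flow, _fhigh, _frowts, FORECAST(i32, "algo=holtwinters,rows=20")
FROM foo;
```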
```sql
taos> select _flow, _fhigh, _frowts, forecast(i32) from foo;
_flow | _fhigh | _frowts | forecast(i32) |
========================================================================================
10.5286684 | 41.8038254 | 2020-01-01 00:01:35.000 | 26 |
-21.9861946 | 83.3938904 | 2020-01-01 00:01:36.000 | 30 |
-78.5686035 | 144.6729126 | 2020-01-01 00:01:37.000 | 33 |
-154.9797363 | 230.3057709 | 2020-01-01 00:01:38.000 | 37 |
-253.9852905 | 337.6083984 | 2020-01-01 00:01:39.000 | 41 |
-375.7857971 | 466.4594727 | 2020-01-01 00:01:40.000 | 45 |
-514.8043823 | 622.4426270 | 2020-01-01 00:01:41.000 | 53 |
-680.6343994 | 796.2861328 | 2020-01-01 00:01:42.000 | 57 |
-868.4956665 | 992.8603516 | 2020-01-01 00:01:43.000 | 62 |
-1076.1566162 | 1214.4498291 | 2020-01-01 00:01:44.000 | 69 |
```
## Built-In Forecasting Algorithms
- [ARIMA](./arima/)
- [Holt-Winters](./holtwinters/)
- Complex exponential smoothing (CES)
- Theta
- Prophet
- XGBoost
- LightGBM
- Multiple Seasonal-Trend decomposition using LOESS (MSTL)
- ETS (Error, Trend, Seasonal)
- Long Short-Term Memory (LSTM)
- Multilayer Perceptron (MLP)
- DeepAR
- N-BEATS
- N-HiTS
- Patch Time Series Transformer (PatchTST)
- Temporal Fusion Transformer
- TimesNet
## Evaluating Algorithm Effectiveness
TDengine Enterprise includes `analytics_compare`, a tool that evaluates the effectiveness of time-series forecasting algorithms in TDgpt. You can configure this tool to perform backtesting on data stored in TDengine and determine which algorithms and models are most effective for your data. The evaluation is based on mean squared error (MSE). MAE and MAPE are in development.
The configuration of the evaluation tool is described as follows:
```ini
[forecast]
# number of data points per training period
period = 10
# consider final 10 rows of in-scope data as forecasting results
rows = 10
# start time of training data
start_time = 1949-01-01T00:00:00
# end time of training data
end_time = 1960-12-01T00:00:00
# start time of results
res_start_time = 1730000000000
# specify whether to create a graphical chart
gen_figure = true
```
To use the tool, run `analytics_compare` in TDgpt's `misc` directory. Ensure that you run the tool on a machine with a Python environment installed. You can test the tool as follows:
1. Configure your TDengine cluster information in the `analytics.ini` file:
```ini
[taosd]
# taosd hostname
host = 127.0.0.1
# username
user = root
# password
password = taosdata
# tdengine configuration file
conf = /etc/taos/taos.cfg
[input_data]
# database for testing forecasting algorithms
db_name = test
# table with test data
table_name = passengers
# columns with test data
column_name = val, _c0
```
2. Prepare your data. A sample data file `sample-fc.sql` is included in the `resource` directory. Run the following command to ingest the sample data into TDengine:
```shell
taos -f sample-fc.sql
```
You can now begin the evaluation.
3. Ensure that the Python environment on the local machine is operational. Then run the following command:
```shell
python3.10 ./analytics_compare.py forecast
```
4. The evaluation results are written to `fc_result.xlsx`. The first sheet contains the results, as shown in the following table, including the algorithm name, parameters, mean square error, and elapsed time.
| algorithm | params | MSE | elapsed_time(ms.) |
| ----------- | ------------------------------------------------------------------------- | ------- | ----------------- |
| holtwinters | `{"trend":"add", "seasonal":"add"}` | 351.622 | 125.1721 |
| arima | `{"time_step":3600000, "start_p":0, "max_p":10, "start_q":0, "max_q":10}` | 433.709 | 45577.9187 |
If you set `gen_figure` to `true`, a chart is also generated, as displayed in the following figure.
<figure>
<Image img={fcResult} alt="Forecasting comparison"/>
</figure>

View File

@ -0,0 +1,67 @@
---
title: Statistical Algorithms
sidebar_label: Statistical Algorithms
---
- k-sigma<sup>[1]</sup>, or ***68–95–99.7 rule***: The *k* value defines how many standard deviations indicate an anomaly. The default value is 3. The k-sigma algorithm requires data to be normally distributed. Data points that lie outside of *k* standard deviations are considered anomalous.
|Parameter|Description|Required?|Default|
|---|---|---|---|
|k|Number of standard deviations|No|3|
```SQL
--- Use the k-sigma algorithm with a k value of 2
SELECT _WSTART, COUNT(*)
FROM foo
ANOMALY_WINDOW(foo.i32, "algo=ksigma,k=2")
```
- Interquartile range (IQR)<sup>[2]</sup>: IQR divides a rank-ordered dataset into quartiles Q1 through Q3 and is defined as IQR = Q3 - Q1. A value *v* is considered normal if Q1 - (1.5 x IQR) \<= v \<= Q3 + (1.5 x IQR); data points outside this range are considered anomalous. This algorithm does not take any parameters.
```SQL
--- Use the IQR algorithm.
SELECT _WSTART, COUNT(*)
FROM foo
ANOMALY_WINDOW(foo.i32, "algo=iqr")
```
- Grubbs's test<sup>[3]</sup>, or maximum normalized residual test: Grubbs's test determines whether the deviation from the mean of the maximum or minimum value is anomalous. It requires a univariate dataset that is approximately normally distributed. Grubbs's test cannot be used for datasets that are not normally distributed. This algorithm does not take any parameters.
```SQL
--- Use Grubbs's test.
SELECT _WSTART, COUNT(*)
FROM foo
ANOMALY_WINDOW(foo.i32, "algo=grubbs")
```
- Seasonal Hybrid ESD (S-H-ESD)<sup>[4]</sup>: Extreme Studentized Deviate (ESD) can identify multiple anomalies in time-series data. You define whether to detect positive anomalies (`pos`), negative anomalies (`neg`), or both (`both`). The maximum proportion of data that can be anomalous (`max_anoms`) is at most 49.9%. Typically, the proportion of anomalies in a dataset does not exceed 5%.
|Parameter|Description|Required?|Default|
|---|---|---|---|
|direction|Specify the direction of anomalies ('pos', 'neg', or 'both').|No|"both"|
|max_anoms|Specify maximum proportion of data that can be anomalous *k*, where 0 \< *k* \<= 49.9|No|0.05|
|period|The number of data points included in each period|No|0|
```SQL
--- Use the SHESD algorithm in both directions with a maximum of 5% of the data being anomalous
SELECT _WSTART, COUNT(*)
FROM foo
ANOMALY_WINDOW(foo.i32, "algo=shesd,direction=both,anoms=0.05")
```
The following algorithms are in development:
- Gaussian Process Regression
Change point detection-based algorithms:
- CUSUM (Cumulative Sum Control Chart)
- PELT (Pruned Exact Linear Time)
## References
1. [https://en.wikipedia.org/wiki/68–95–99.7_rule](https://en.wikipedia.org/wiki/68%E2%80%9395%E2%80%9399.7_rule)
2. https://en.wikipedia.org/wiki/Interquartile_range
3. Adikaram, K. K. L. B.; Hussein, M. A.; Effenberger, M.; Becker, T. (2015-01-14). "Data Transformation Technique to Improve the Outlier Detection Power of Grubbs's Test for Data Expected to Follow Linear Relation". Journal of Applied Mathematics. 2015: 1–9. doi:10.1155/2015/708948.
4. Hochenbaum, J., O. S. Vallis, and A. Kejariwal. 2017. Automatic Anomaly Detection in the Cloud Via Statistical Learning. arXiv preprint arXiv:1704.07706 (2017).

View File

@ -0,0 +1,33 @@
---
title: Data Density Algorithms
sidebar_label: Data Density Algorithms
---
## Data Density/Mining Algorithms
Local outlier factor (LOF)<sup>[1]</sup>:
LOF is a density-based algorithm for determining local outliers proposed by Breunig et al. in 2000. It is suitable for data with varying cluster densities and diverse dispersion. First, the local reachability density of each data point is calculated based on the density of its neighborhood. The local reachability density is then used to assign an outlier factor to each data point.
This outlier factor indicates how anomalous a data point is. A higher factor indicates more anomalous data. Finally, the top *k* outliers are output.
```SQL
--- Use LOF.
SELECT count(*)
FROM foo
ANOMALY_WINDOW(foo.i32, "algo=lof")
```
The following algorithms are in development:
- DBSCAN (Density-Based Spatial Clustering of Applications with Noise)
- K-Nearest Neighbors (KNN)
- Principal Component Analysis (PCA)
Third-party anomaly detection algorithms:
- PyOD
## References
1. Breunig, M. M.; Kriegel, H.-P.; Ng, R. T.; Sander, J. (2000). LOF: Identifying Density-based Local Outliers (PDF). Proceedings of the 2000 ACM SIGMOD International Conference on Management of Data. SIGMOD. pp. 93–104. doi:10.1145/335191.335388. ISBN 1-58113-217-4.

View File

@ -0,0 +1,27 @@
---
title: Machine Learning Algorithms
sidebar_label: Machine Learning Algorithms
---
TDgpt includes a built-in autoencoder for anomaly detection.
This algorithm is suitable for detecting anomalies in periodic time-series data. It must be pre-trained on your time-series data.
The trained model is saved to the `ad_autoencoder` directory. You then specify the model in your SQL statement.
```SQL
--- Add the name of the model `ad_autoencoder_foo` in the options of the anomaly window and detect anomalies in the dataset `foo` using the autoencoder algorithm.
SELECT COUNT(*), _WSTART
FROM foo
ANOMALY_WINDOW(col1, 'algo=ad_encoder, model=ad_autoencoder_foo');
```
The following algorithms are in development:
- Isolation Forest
- One-Class Support Vector Machines (SVM)
- Prophet
## References
1. https://en.wikipedia.org/wiki/Autoencoder

View File

@ -0,0 +1,119 @@
---
title: Anomaly Detection Algorithms
description: Anomaly Detection Algorithms
---
import Image from '@theme/IdealImage';
import anomDetect from '../../../assets/tdgpt-05.png';
import adResult from '../../../assets/tdgpt-06.png';
Anomaly detection is provided via an anomaly window that has been introduced into TDengine. An anomaly window is a special type of event window, defined by the anomaly detection algorithm as a time window during which an anomaly is occurring. This window differs from an event window in that the algorithm determines when it opens and closes instead of expressions input by the user. You can use the `ANOMALY_WINDOW` keyword in your queries, in the same position as other window clauses, to invoke the anomaly detection service. The window pseudocolumns `_WSTART`, `_WEND`, and `_WDURATION` record the start, end, and duration of the window. For example:
```SQL
--- Use the IQR algorithm to detect anomalies in the `col_val` column. Also return the start and end time of the anomaly window as well as the sum of the `col` column within the window.
SELECT _wstart, _wend, SUM(col)
FROM foo
ANOMALY_WINDOW(col_val, "algo=iqr");
```
As shown in the following figure, the anode returns the anomaly window [10:51:30, 10:53:40].
<figure>
<Image img={anomDetect} alt="Anomaly detection" />
</figure>
You can then query, aggregate, or perform other operations on the data in the window.
## Syntax
```SQL
ANOMALY_WINDOW(column_name, option_expr)
option_expr: {"
algo=expr1
[,wncheck=1|0]
[,expr2]
"}
```
1. `column_name`: The data column in which to detect anomalies. Specify only one column per query. The data type of the column must be numerical; string types such as NCHAR are not supported. Functions are not supported.
2. `options`: The parameters for anomaly detection. Enter parameters in key=value format, separating multiple parameters with a comma (,). It is not necessary to use quotation marks or escape characters. Only ASCII characters are supported. For example: `algo=ksigma,k=2` indicates that the anomaly detection algorithm is k-sigma and the k value is 2.
3. You can use the results of anomaly detection as the inner part of a nested query, as shown in the example after this list. The same functions are supported as in other windowed queries.
4. White noise checking is performed on the input data by default. If the input data is white noise, no results are returned.
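For example, the following sketch aggregates the per-window results of an anomaly detection query in an outer query. The table and column names follow the `foo` example used elsewhere in this documentation:
```SQL
SELECT AVG(anomaly_count)
FROM (
    SELECT _wstart, COUNT(*) AS anomaly_count
    FROM foo
    ANOMALY_WINDOW(i32, "algo=iqr")
);
```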
## Parameter Description
|Parameter|Definition|Default|
| ------- | ------------------------------------------ | ------ |
|algo|Specify the anomaly detection algorithm.|iqr|
|wncheck|Enter 1 to perform the white noise data check or 0 to disable the white noise data check.|1|
## Example
```SQL
--- Use the IQR algorithm to detect anomalies in the `i32` column.
SELECT _wstart, _wend, SUM(i32)
FROM foo
ANOMALY_WINDOW(i32, "algo=iqr");
--- Use the k-sigma algorithm with k value of 2 to detect anomalies in the `i32`
SELECT _wstart, _wend, SUM(i32)
FROM foo
ANOMALY_WINDOW(i32, "algo=ksigma,k=2");
taos> SELECT _wstart, _wend, count(*) FROM foo ANOMALY_WINDOW(i32);
_wstart | _wend | count(*) |
====================================================================
2020-01-01 00:00:16.000 | 2020-01-01 00:00:17.000 | 2 |
Query OK, 1 row(s) in set (0.028946s)
```
## Built-In Anomaly Detection Algorithms
TDgpt comes with six anomaly detection algorithms, divided among the following three categories: [Statistical Algorithms](./02-statistics-approach.md), [Data Density Algorithms](./03-data-density.md), and [Machine Learning Algorithms](./04-machine-learning.md). If you do not specify an algorithm, the IQR algorithm is used by default.
## Evaluating Algorithm Effectiveness
TDgpt provides an automated tool to compare the effectiveness of different algorithms across various datasets. For anomaly detection algorithms, it uses the recall and precision metrics to evaluate their performance.
By setting the following options in the configuration file `analysis.ini`, you can specify the time range of the test data, whether to generate annotated result images, and the anomaly detection algorithms to be evaluated along with their corresponding parameters.
Before comparing anomaly detection algorithms, you must manually label the results of the anomaly detection dataset. This is done by setting the value of the `anno_res` option. Each number in the array represents the index of an anomaly. For example, in the test dataset below, if the 9th point is an anomaly, the labeled result would be `[9]`.
```ini
[ad]
# training data start time
start_time = 2021-01-01T01:01:01
# training data end time
end_time = 2021-01-01T01:01:11
# draw the results or not
gen_figure = true
# annotate the anomaly_detection result
anno_res = [9]
# algorithms involved in the comparison
[ad.algos]
ksigma={"k": 2}
iqr={}
grubbs={}
lof={"algorithm":"auto", "n_neighbor": 3}
```
After the comparison program finishes running, it automatically generates a file named `ad_result.xlsx`. The first sheet contains the algorithm execution results (as shown in the table below), including five metrics: algorithm name, execution parameters, recall, precision, and execution time.
| algorithm | params | precision(%) | recall(%) | elapsed_time(ms.) |
| --------- | -------------------------------------- | ------------ | --------- | ----------------- |
| ksigma | `{"k":2}` | 100 | 100 | 0.453 |
| iqr | `{}` | 100 | 100 | 2.727 |
| grubbs | `{}` | 100 | 100 | 2.811 |
| lof | `{"algorithm":"auto", "n_neighbor":3}` | 0 | 0 | 4.660 |
If `gen_figure` is set to true, the tool automatically generates a visual representation of the analysis results for each algorithm being compared. The k-sigma algorithm is shown here as an example.
<figure>
<Image img={adResult} alt="Anomaly detection results"/>
</figure>

View File

@ -0,0 +1,112 @@
---
title: Forecasting Algorithms
sidebar_label: Forecasting Algorithms
---
## Input Limitations
`execute` is the core method of forecasting algorithms. Before calling this method, the framework configures the historical time-series data used for forecasting in the `self.list` object attribute.
## Output Limitations and Parent Class Attributes
Running the `execute` method generates the following dictionary objects:
```python
return {
"mse": mse, # Mean squared error of the fit data
"res": res # Result groups [timestamp, forecast results, lower boundary of confidence interval, upper boundary of confidence interval]
}
```
The parent class `AbstractForecastService` of forecasting algorithms includes the following object attributes.
|Attribute|Description|Default|
|---|---|---|
|period|Specify the periodicity of the data, i.e. the number of data points included in each period. If the data is not periodic, enter 0.|0|
|start_ts|Specify the start time of forecasting results.|0|
|time_step|Specify the interval between consecutive data points in the forecast results.|0|
|fc_rows|Specify the number of forecast rows to return.|0|
|return_conf|Specify 1 to include a confidence interval in the forecast results or 0 to not include a confidence interval in the results. If you specify 0, the mean is returned as the upper and lower boundaries.|1|
|conf|Specify a confidence interval quantile.|95|
## Sample Code
The following code is a sample algorithm that always returns 1 as the forecast result.
```python
import numpy as np

from taosanalytics.service import AbstractForecastService


# Algorithm files must start with an underscore ("_") and end with "Service".
class _MyForecastService(AbstractForecastService):
    """ Define a class inheriting from AbstractForecastService and implementing the `execute` method. """

    # Name the algorithm using only lowercase ASCII characters.
    name = 'myfc'

    # Include a description of the algorithm (recommended)
    desc = """return the forecast time series data"""

    def __init__(self):
        """Method to initialize the class"""
        super().__init__()

    def execute(self):
        """ Implementation of algorithm logic"""
        res = []

        """This algorithm always returns 1 as the forecast result. The number of results returned is determined by the self.fc_rows value input by the user."""
        ts_list = [self.start_ts + i * self.time_step for i in range(self.fc_rows)]
        res.append(ts_list)  # set timestamp column for forecast results

        """Generate forecast results whose value is 1. """
        res_list = [1] * self.fc_rows
        res.append(res_list)

        """Check whether user has requested the upper and lower boundaries of the confidence interval."""
        if self.return_conf:
            """If the algorithm does not calculate these values, return the forecast results as the boundaries."""
            bound_list = [1] * self.fc_rows
            res.append(bound_list)  # lower confidence limit
            res.append(bound_list)  # upper confidence limit

        """Return results"""
        return {"res": res, "mse": 0}

    def set_params(self, params):
        """This algorithm does not take any parameters, so it only calls the parent class method."""
        return super().set_params(params)
```
Save this file to the `./lib/taosanalytics/algo/fc/` directory and restart the `taosanode` service. In the TDengine CLI, run `SHOW ANODES FULL` to see your new algorithm. Your applications can now use this algorithm via SQL.
```SQL
--- Forecast the `col_name` column using the newly added `myfc` algorithm
SELECT _flow, _fhigh, _frowts, FORECAST(col_name, "algo=myfc")
FROM foo;
```
If you have never started the anode, see [Installation](../../management/) to add the anode to your TDengine cluster.
## Unit Testing
You can add unit test cases to the `forecast_test.py` file in the `taosanalytics/test` directory or create a file for unit tests. Unit tests have a dependency on the Python unittest module.
```python
def test_myfc(self):
    """ Test the myfc class """
    s = loader.get_service("myfc")

    # Configure data for forecasting
    s.set_input_list(self.get_input_list(), None)

    # Check whether all results are 1
    r = s.set_params(
        {"fc_rows": 10, "start_ts": 171000000, "time_step": 86400 * 30, "start_p": 0}
    )
    r = s.execute()

    expected_list = [1] * 10
    self.assertEqlist(r["res"][0], expected_list)
```

View File

@ -0,0 +1,79 @@
---
title: Anomaly Detection Algorithms
sidebar_label: Anomaly Detection Algorithms
---
## Input Limitations
`execute` is the core method of anomaly detection algorithms. Before calling this method, the framework configures the historical time-series data used for anomaly detection in the `self.list` object attribute.
## Output Limitations
The `execute` method returns an array of the same length as `self.list`. A value of `-1` in the array indicates an anomaly.
For example, in the series `[2, 2, 2, 2, 100]`, assuming that `100` is an anomaly, the method returns `[1, 1, 1, 1, -1]`.
## Sample Code
This section describes an example anomaly detection algorithm that returns the final data point in a time series as an anomaly.
```python
from taosanalytics.service import AbstractAnomalyDetectionService


# Algorithm files must start with an underscore ("_") and end with "Service".
class _MyAnomalyDetectionService(AbstractAnomalyDetectionService):
    """ Define a class inheriting from AbstractAnomalyDetectionService and implementing the abstract method of that class. """

    # Name the algorithm using only lowercase ASCII characters.
    name = 'myad'

    # Include a description of the algorithm (recommended)
    desc = """return the last value as the anomaly data"""

    def __init__(self):
        """Method to initialize the class"""
        super().__init__()

    def execute(self):
        """ Implementation of algorithm logic"""

        """Create an array with length len(self.list) whose results are all 1, then set the final value in the array to -1 to indicate an anomaly"""
        res = [1] * len(self.list)
        res[-1] = -1

        """Return results"""
        return res

    def set_params(self, params):
        """This algorithm does not take any parameters, so this logic is not included."""
```
Save this file to the `./lib/taosanalytics/algo/ad/` directory and restart the `taosanode` service. In the TDengine CLI, run `SHOW ANODES FULL` to see your new algorithm. Your applications can now invoke this algorithm via SQL.
```SQL
--- Detect anomalies in the `col` column using the newly added `myad` algorithm
SELECT COUNT(*) FROM foo ANOMALY_WINDOW(col, 'algo=myad')
```
If you have never started the anode, see [Installation](../../management/) to add the anode to your TDengine cluster.
### Unit Testing
You can add unit test cases to the `anomaly_test.py` file in the `taosanalytics/test` directory or create a file for unit tests. The framework uses the Python unittest module.
```python
def test_myad(self):
    """ Test the myad algorithm """
    s = loader.get_service("myad")

    # Configure the data to test
    s.set_input_list(AnomalyDetectionTest.input_list, None)

    r = s.execute()

    # The final value is an anomaly
    self.assertEqual(r[-1], -1)
    self.assertEqual(len(r), len(AnomalyDetectionTest.input_list))
```

View File

@ -0,0 +1,100 @@
---
title: Algorithm Developer's Guide
sidebar_label: Algorithm Developer's Guide
---
TDgpt is an extensible platform for advanced time-series data analytics. You can follow the steps described in this document to develop your own analytics algorithms and add them to the platform. Your applications can then use SQL statements to invoke these algorithms. Custom algorithms must be developed in Python.
The anode adds algorithms semi-dynamically. When the anode is started, it scans specified directories for files that meet its requirements and adds those files to the platform. To add an algorithm to your TDgpt, perform the following steps:
1. Develop an analytics algorithm according to the TDgpt requirements.
2. Place the source code files in the appropriate directory and restart the anode.
3. Run the `CREATE ANODE` statement to add the anode to your TDengine cluster.
Your algorithm has been added to TDgpt and can be used by your applications. Because TDgpt is decoupled from TDengine, adding or upgrading algorithms on the anode does not affect the TDengine server (taosd). On the application side, it is necessary only to update your SQL statements to start using new or upgraded algorithms.
This extensibility makes TDgpt suitable for a wide range of use cases. You can add any algorithms needed by your use cases on demand and invoke them via SQL. You can also update algorithms without making significant changes to your applications.
This document describes how to add algorithms to an anode and invoke them with SQL statements.
## Directory Structure
The directory structure of an anode is described as follows:
```bash
.
├── bin
├── cfg
├── lib
│   └── taosanalytics
│   ├── algo
│   │   ├── ad
│   │   └── fc
│   ├── misc
│   └── test
├── log -> /var/log/taos/taosanode
├── model -> /var/lib/taos/taosanode/model
└── venv -> /var/lib/taos/taosanode/venv
```
|Directory|Description|
|---|---|
|taosanalytics| Source code, including the `algo` subdirectory for algorithms, the `test` subdirectory for unit and integration tests, and the `misc` subdirectory for other files. Within the `algo` subdirectory, the `ad` subdirectory includes anomaly detection algorithms, and the `fc` subdirectory includes forecasting algorithms.|
|venv| Virtual Python environment |
|model|Trained models for datasets|
|cfg|Configuration files|
:::note
- Place Python source code for anomaly detection in the `./lib/taosanalytics/algo/ad` directory.
- Place Python source code for forecasting in the `./lib/taosanalytics/algo/fc` directory.
:::
## Class Naming Rules
The anode adds algorithms automatically. Your algorithm must therefore consist of appropriately named Python files. Algorithm files must start with an underscore (`_`) and end with `Service`. For example: `_KsigmaService` is the name of the k-sigma anomaly detection algorithm.
## Class Inheritance Rules
- All anomaly detection algorithms must inherit `AbstractAnomalyDetectionService` and implement the `execute` method.
- All forecasting algorithms must inherit `AbstractForecastService` and implement the `execute` method.
## Class Property Initialization
Your classes must initialize the following properties:
- `name`: identifier of the algorithm. Use lowercase letters only. This identifier is displayed when you use the `SHOW` statement to display available algorithms.
- `desc`: basic description of the algorithm.
```SQL
--- The `algo` key takes the defined `name` value.
SELECT COUNT(*)
FROM foo ANOMALY_WINDOW(col_name, 'algo=name')
```
## Adding Algorithms with Models
Certain machine learning algorithms must be trained on your data and generate a model. The same algorithm may use different models for different datasets.
When you add an algorithm that uses models to your anode, first create subdirectories for your models in the `model` directory, and save the trained model for each algorithm and dataset to the corresponding subdirectory. You can specify custom names for these subdirectories in your algorithms. Use the `joblib` library to serialize trained models to ensure that they can be read and loaded.
The following section describes how to add an anomaly detection algorithm that requires trained models. The autoencoder algorithm is used as an example.
First, create the `ad_autoencoder` subdirectory in the `model` directory. This subdirectory is used to store models for the autoencoder algorithm. Next, train the algorithm on the `foo` table and obtain a trained model named `ad_autoencoder_foo`. Use the `joblib` library to serialize the model and save it to the `ad_autoencoder` subdirectory. As shown in the following directory listing, the `ad_autoencoder_foo` model comprises two files: the model file `ad_autoencoder_foo.dat` and the model description `ad_autoencoder_foo.info`.
```bash
.
└── model
└── ad_autoencoder
├── ad_autoencoder_foo.dat
└── ad_autoencoder_foo.info
```
The following section describes how to invoke this model with a SQL statement.
Set the `algo` parameter to `ad_encoder` to instruct TDgpt to use the autoencoder algorithm. This algorithm is in the available algorithms list and can be used directly. Set the `model` parameter to `ad_autoencoder_foo` to instruct TDgpt to use the trained model generated in the previous section.
```SQL
--- Add the name of the model `ad_autoencoder_foo` in the options of the anomaly window and detect anomalies in the dataset `foo` using the autoencoder algorithm.
SELECT COUNT(*), _WSTART
FROM foo
ANOMALY_WINDOW(col1, 'algo=ad_encoder, model=ad_autoencoder_foo');
```

View File

@ -0,0 +1,6 @@
---
title: Data Imputation
sidebar_label: Data Imputation
---
Coming soon

View File

@ -0,0 +1,6 @@
---
title: Time-Series Classification
sidebar_label: Time-Series Classification
---
Coming soon

View File

@ -0,0 +1,75 @@
---
title: Quick Start Guide
sidebar_label: Quick Start Guide
---
## Get Started with Docker
This document describes how to get started with TDgpt in Docker.
### Start TDgpt
If you have installed Docker, pull the latest TDengine container:
```shell
docker pull tdengine/tdengine:latest
```
You can specify a version if desired:
```shell
docker pull tdengine/tdengine:3.3.3.0
```
Then run the following command:
```shell
docker run -d -p 6030:6030 -p 6041:6041 -p 6043:6043 -p 6044-6049:6044-6049 -p 6044-6045:6044-6045/udp -p 6060:6060 tdengine/tdengine
```
Note: TDgpt runs on TCP port 6090. TDgpt is a stateless analytics agent and does not persist data; it only saves log files to the local disk.
Confirm that your Docker container is running:
```shell
docker ps
```
Enter the container and run the bash shell:
```shell
docker exec -it <container name> bash
```
You can now run Linux commands and access TDengine.
## Get Started with an Installation Package
### Obtain the Package
1. Download the tar.gz package from the list:
2. Open the directory containing the downloaded package and decompress it.
3. Open the directory containing the decompressed package and run the `install.sh` script.
Note: Replace `<version>` with the version that you downloaded.
```bash
tar -zxvf TDengine-anode-<version>-Linux-x64.tar.gz
```
Decompress the file, open the directory created, and run the `install.sh` script:
```bash
sudo ./install.sh
```
### Deploy TDgpt
See [Installing TDgpt](../management/) to prepare your environment and deploy TDgpt.
## Get Started in TDengine Cloud
You can use TDgpt with your TDengine Cloud deployment. Register for a TDengine Cloud account, ensure that you have at least one instance, and register TDgpt to your TDengine Cloud instance as described in the documentation. See the TDengine Cloud documentation for more information.
Create a TDgpt instance, and then refer to [Installing TDgpt](../management/) to manage your anode.

View File

@ -0,0 +1,48 @@
---
title: Frequently Asked Questions
sidebar_label: Frequently Asked Questions
---
## 1. During the installation process, uWSGI fails to compile
The TDgpt installation process compiles uWSGI on your local machine. In certain Python distributions, such as Anaconda, conflicts may occur during compilation. In this case, you can choose not to install uWSGI.
However, this means that you must manually run the `python3.10 /usr/local/taos/taosanode/lib/taosanalytics/app.py` command when starting the taosanode service. Use a virtual Python environment when running this command to ensure that dependencies can be loaded.
## 2. Anodes fail to be created because the service cannot be accessed.
```bash
taos> create anode '127.0.0.1:6090';
DB error: Analysis service can't access[0x80000441] (0.117446s)
```
First, use curl to check whether the anode is providing services. If the anode is running, the output of `curl '127.0.0.1:6090'` is as follows:
```bash
TDengine© Time Series Data Analytics Platform (ver 1.0.x)
```
The following output indicates that the anode is not providing services:
```bash
curl: (7) Failed to connect to 127.0.0.1 port 6090: Connection refused
```
If the anode has not started or is not running, check the uWSGI logs in the `/var/log/taos/taosanode/taosanode.log` file to find and resolve any errors.
Note: Do not use systemctl to check the status of the taosanode service.
## 3. The service is operational, but queries return that the service is not available.
```bash
taos> select _frowts,forecast(current, 'algo=arima, alpha=95, wncheck=0, rows=20') from d1 where ts<='2017-07-14 10:40:09.999';
DB error: Analysis service can't access[0x80000441] (60.195613s)
```
The timeout period for the analysis service is 60 seconds. If the analysis process cannot be completed within this period, this error will occur. You can reduce the scope of data being analyzed or try another algorithm to avoid the error.
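For example, narrowing the time range of the input data in the query above reduces the amount of data that the algorithm must process. The lower bound shown here is illustrative:
```SQL
SELECT _frowts, FORECAST(current, 'algo=arima, alpha=95, wncheck=0, rows=20')
FROM d1
WHERE ts >= '2017-07-14 10:30:00.000' AND ts <= '2017-07-14 10:40:09.999';
```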
## 4. An "Illegal json format" error is returned.
This indicates that the analysis results contain an error. Check the anode operation logs in the `/var/log/taos/taosanode/taosanode.app.log` file to find and resolve any issues.

View File

@ -0,0 +1,116 @@
---
sidebar_label: TDgpt
title: TDgpt
---
import Image from '@theme/IdealImage';
import tdgptArch from '../../assets/tdgpt-01.png';
## Introduction
Numerous algorithms have been proposed to perform time-series forecasting, anomaly detection, imputation, and classification, with varying technical characteristics suited for different scenarios.
Typically, these analysis algorithms are packaged as toolkits in high-level programming languages (such as Python or R) and are widely distributed and used through open-source channels. This model helps software developers integrate complex analysis algorithms into their systems and greatly lowers the barrier to using advanced algorithms.
Database system developers have also attempted to integrate data analysis algorithm models directly into database systems. By building machine learning libraries (e.g., Spark's MLlib), they aim to leverage mature analytical techniques to enhance the advanced data analysis capabilities of databases or analytical computing engines.
The rapid development of artificial intelligence (AI) has brought new opportunities to time-series data analysis. Efficiently applying AI capabilities to this field also presents new possibilities for databases. To this end, TDengine has introduced TDgpt, an intelligent agent for time-series analytics. With TDgpt, you can use statistical analysis algorithms, machine learning models, deep learning models, foundational models for time-series data, and large language models via SQL statements. TDgpt exposes the analytical capabilities of these algorithms and models through SQL and applies them to your time-series data using new windows and functions.
## Technical Features
TDgpt is an external agent that integrates seamlessly with TDengine's main process, taosd. It allows time-series analysis services to be embedded directly into TDengine's query execution flow.
TDgpt is a stateless platform that includes the classic statsmodels library of statistical analysis models as well as embedded frameworks such as Torch and Keras for machine and deep learning. In addition, it can directly invoke TDengine's proprietary foundation model, TDtsfm, through request forwarding and adaptation.
As an analytics agent, TDgpt will also support integration with third-party time-series model-as-a-service (MaaS) platforms in the future. By modifying just a single parameter (algo), you will be able to access cutting-edge time-series model services.
TDgpt is an open system to which you can easily add your own algorithms for forecasting, anomaly detection, imputation, and classification. Once added, the new algorithms can be used simply by changing the corresponding parameters in the SQL statement, with no need to modify a single line of application code.
## System Architecture
TDgpt is composed of one or more stateless analysis nodes, called AI nodes (anodes). These anodes can be deployed as needed across the TDengine cluster in appropriate hardware environments (for example, on compute nodes equipped with GPUs) depending on the requirements of the algorithms being used.
TDgpt provides a unified interface and invocation method for different types of analysis algorithms. Based on user-specified parameters, it calls advanced algorithm packages and other analytical tools, then returns the results to TDengines main process (taosd) in a predefined format.
TDgpt consists of four main components:
- Built-in analytics libraries: Includes libraries such as statsmodels, pyculiarity, and pmdarima, offering ready-to-use models for forecasting and anomaly detection.
- Built-in machine learning libraries: Includes libraries like Torch, Keras, and Scikit-learn to run pre-trained machine and deep learning models within TDgpt's process space. The training process can be managed using end-to-end open-source ML frameworks such as Merlion or Kats, and trained models can be deployed by uploading them to a designated TDgpt directory.
- Request adapter for general-purpose LLMs: Converts time-series forecasting requests into prompts for general-purpose LLMs such as Llama in a MaaS manner. (Note: This functionality is not open source.)
- Adapter for locally deployed time-series models: Sends requests directly to models like Time-MoE and TDtsfm that are specifically designed for time-series data. Compared to general-purpose LLMs, these models do not require prompt engineering, are lighter-weight, and are easier to deploy locally with lower hardware requirements. In addition, the adapter can also connect to cloud-based time-series MaaS systems such as TimeGPT, enabling localized analysis powered by cloud-hosted models.
<figure>
<Image img={tdgptArch} alt="TDgpt Architecture"/>
<figcaption>TDgpt architecture</figcaption>
</figure>
During query execution, the vnode in TDengine forwards any elements involving advanced time-series data analytics directly to the anode. Once the analysis is completed, the results are assembled and embedded back into the query execution process.
## Advanced Analytics
The analytics services provided by TDgpt are described as follows:
- Anomaly detection: This service is provided through a new anomaly window introduced into TDengine. An anomaly window is a special type of event window, defined by the anomaly detection algorithm as a time window during which an anomaly is occurring. It differs from an event window in that the algorithm, rather than expressions provided by the user, determines when the window opens and closes. The query operations supported by other windows are also supported for anomaly windows. (See the sketch following this list.)
- Time-series forecasting: The FORECAST function invokes a specified (or default) forecasting algorithm to predict future time-series data based on input historical data.
- Data imputation: To be released in July 2025
- Time-series classification: To be released in July 2025
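The following queries are a minimal sketch of how these services are invoked through SQL. The supertable `meters` and its `current` column are hypothetical, and `iqr` and `arima` stand in for whichever anomaly detection and forecasting algorithms are available on your anodes:
```bash
taos> select _wstart, _wend, count(*) from meters anomaly_window(current, 'algo=iqr');
taos> select _frowts, forecast(current, 'algo=arima, rows=10') from meters;
```
The first query groups rows into anomaly windows detected by the specified algorithm; the second predicts the next 10 rows of the `current` column from its history.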
## Custom Algorithms
TDgpt is an extensible platform to which you can add your own algorithms and models using the process described in [Algorithm Developer's Guide](./dev/). After adding an algorithm, you can access it through SQL statements just like the built-in algorithms. It is not necessary to make updates to your applications.
Custom algorithms must be developed in Python. The anode adds algorithms dynamically. When the anode is started, it scans specified directories for files that meet its requirements and adds those files to the platform. To add an algorithm to your TDgpt, perform the following steps:
1. Develop an analytics algorithm according to the TDgpt requirements.
2. Place the source code files in the appropriate directory and restart the anode.
3. Refresh the algorithm cache table.
You can then use your new algorithm in SQL statements.
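The following sketch illustrates steps 2 and 3 for a hypothetical forecasting algorithm file named `myfc.py`. The target directory is illustrative, and the `UPDATE ALL ANODES` and `SHOW ANODES FULL` statements refresh and display the algorithm cache in TDengine 3.3.6; verify both against your version's SQL reference:
```bash
# Step 2: place the algorithm file in the anode's algorithm directory and restart the anode
sudo cp myfc.py /usr/local/taos/taosanode/lib/taosanalytics/algo/fc/
sudo systemctl restart taosanoded

# Step 3: refresh the algorithm cache and confirm that the new algorithm is listed
taos> update all anodes;
taos> show anodes full;
```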
## Algorithm Evaluation
TDengine Enterprise includes a tool that evaluates the effectiveness of different algorithms and models. You can use this tool on any algorithm or model in TDgpt, including built-in and custom forecasting and anomaly detection algorithms and models. The tool uses quantitative metrics to evaluate the accuracy and performance of each algorithm with a given dataset in TDengine.
## Model Management
Trained models for machine learning frameworks such as Torch, TensorFlow, and Keras must be placed in the designated directory on the anode. The anode automatically detects and loads models from this directory.
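For example, a trained model might be deployed as follows (a minimal sketch; the file name is hypothetical, and the path is the default model directory created by the installer):
```bash
# Copy a trained model into the anode's model directory; the anode detects and loads models placed here
sudo cp my_forecast_model.pt /var/lib/taos/taosanode/model/
```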
TDengine Enterprise includes a model manager that integrates seamlessly with open-source end-to-end ML frameworks for time-series data such as Merlion and Kats.
## Processing Performance
Time-series analytics is a CPU-intensive workflow. Using a more powerful CPU or GPU can improve performance.
Machine and deep learning models in TDgpt are run through Torch, and you can use standard methods to improve performance, such as deploying TDgpt on a machine with more RAM and using a Torch model that can take advantage of GPUs.
You can add different algorithms and models to different anodes to enable concurrent processing.
## Operations and Maintenance
With TDengine OSS, permissions and resource management are not provided for TDgpt.
TDgpt is deployed as a Flask service through uWSGI. You can monitor its status through the port exposed by uWSGI.
## References
[1] Merlion: https://opensource.salesforce.com/Merlion/latest/index.html
[2] Kats: https://facebookresearch.github.io/Kats/
[3] StatsModels: https://www.statsmodels.org/stable/index.html
[4] Keras: https://keras.io/guides/
[5] Torch: https://pytorch.org/
[6] Scikit-learn: https://scikit-learn.org/stable/index.html
[7] Time-MoE: https://github.com/Time-MoE/Time-MoE
[8] TimeGPT: https://docs.nixtla.io/docs/getting-started-about_timegpt
[9] DeepSeek: https://www.deepseek.com/
[10] Llama: https://www.llama.com/docs/overview/
[11] Spark MLlib: https://spark.apache.org/docs/latest/ml-guide.html

BIN docs/en/assets/tdgpt-01.png (new binary file, 280 KiB, not shown)
BIN docs/en/assets/tdgpt-02.png (new binary file, 89 KiB, not shown)
BIN docs/en/assets/tdgpt-03.png (new binary file, 87 KiB, not shown)
BIN docs/en/assets/tdgpt-04.png (new binary file, 61 KiB, not shown)
BIN docs/en/assets/tdgpt-05.png (new binary file, 324 KiB, not shown)
BIN docs/en/assets/tdgpt-06.png (new binary file, 20 KiB, not shown)