tsma user manual

wangjiaming0909 2024-04-25 17:21:13 +08:00
parent 1b5ababb98
commit 51760ef12f
3 changed files with 21 additions and 29 deletions

@@ -34,6 +34,13 @@ SELECT * FROM information_schema.INS_INDEXES
You can also add filter conditions to limit the results.
````sql
SHOW INDEXES FROM tbl_name [FROM db_name];
SHOW INDEXES FROM [db_name.]tbl_name;
````
Use the `SHOW INDEXES` command to list the indexes that have been created on the specified table, optionally qualified by its database.
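For instance, assuming a hypothetical database `power` that contains a table `meters`, both forms of the syntax above should list the same indexes:

```sql
-- Hypothetical names: database `power`, table `meters`
SHOW INDEXES FROM meters FROM power;
SHOW INDEXES FROM power.meters;
```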
## Detailed Specification
1. Used properly, indexes can improve query performance significantly. The operators supported by the tag index are `=`, `>`, `>=`, `<`, and `<=`. Filters that apply these operators to tags can benefit from the index; operators outside this set do not. Support for more operators will be added in the future.
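A minimal sketch of which filters can benefit from a tag index, assuming a hypothetical super table `power.meters` whose tags `location` and `groupid` are both indexed (all names are illustrative only):

```sql
-- Hypothetical: tags `location` and `groupid` of power.meters are indexed
-- Filters using =, >, >=, <, <= on an indexed tag can use the tag index
SELECT COUNT(*) FROM power.meters WHERE location = 'California.SanFrancisco';
SELECT COUNT(*) FROM power.meters WHERE groupid >= 2 AND groupid < 5;
-- Operators outside that set, such as LIKE, do not benefit from the index
SELECT COUNT(*) FROM power.meters WHERE location LIKE 'California%';
```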

@@ -10,7 +10,7 @@ To improve the performance of aggregate function queries on large datasets, you
```sql
-- Create TSMA based on a super table or regular table
-CREATE TSMA tsma_name ON [dbname].table_name FUNCTION (func_name(func_param) [, ...] ) INTERVAL(time_duration);
+CREATE TSMA tsma_name ON [dbname.]table_name FUNCTION (func_name(func_param) [, ...] ) INTERVAL(time_duration);
-- Create a large window TSMA based on a small window TSMA
CREATE RECURSIVE TSMA tsma_name ON [db_name.]tsma_name1 INTERVAL(time_duration);
@@ -26,9 +26,9 @@ TSMA can only be created based on super tables and regular tables, not on subtab
In the function list, you can only specify supported aggregate functions (see below), and the number of function parameters must be 1, even if the current function supports multiple parameters. The function parameters must be ordinary column names, not tag columns. Duplicate functions and columns in the function list will be deduplicated. When calculating TSMA, all `intermediate results of the functions` will be output to another super table, and the output super table also includes all tag columns of the original table. The maximum number of functions in the function list is the maximum number of columns in the output table (including tag columns) minus the four additional columns added for TSMA calculation, namely `_wstart`, `_wend`, `_wduration`, and a new tag column `tbname`, minus the number of tag columns in the original table. If the number of columns exceeds the limit, an error `Too many columns` will be reported.
-Since the output of TSMA is a super table, the row length of the output table is subject to the maximum row length limit. The size of the `intermediate results of different functions` vary, but they are generally larger than the original data size. If the row length of the output table exceeds the maximum row length limit, an error `Row length exceeds max length` will be reported.
+Since the output of TSMA is a super table, the row length of the output table is subject to the maximum row length limit. The size of the `intermediate results of different functions` varies, but they are generally larger than the original data size. If the row length of the output table exceeds the maximum row length limit, an error `Row length exceeds max length` will be reported. In this case, reduce the number of functions or split commonly used functions into groups across multiple TSMA objects.
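As an illustration of the function-list rules, assume a hypothetical super table `power.meters` with ordinary columns `current` and `voltage` and tag columns `location` and `groupid`, and assume `avg`, `max`, and `count` are among the supported aggregate functions. A valid list passes exactly one ordinary column to each function:

```sql
-- Valid: each function takes exactly one ordinary (non-tag) column
CREATE TSMA tsma_meters_fn ON power.meters FUNCTION(avg(current), max(voltage), count(ts)) INTERVAL(1m);
-- Invalid (left commented out): a tag column used as the function parameter
-- CREATE TSMA tsma_meters_bad ON power.meters FUNCTION(avg(groupid)) INTERVAL(1m);
```

If `avg(current)` were listed twice, it would be deduplicated and computed only once.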
-The window size is limited to [1ms ~ 1h].
+The window size is limited to [1ms ~ 1h]. The units of INTERVAL are the same as those of the INTERVAL clause in queries: a (milliseconds), b (nanoseconds), h (hours), m (minutes), s (seconds), and u (microseconds).
A TSMA is a database-level object, but its name is globally unique. The number of TSMAs that can be created in the cluster is limited by the parameter `maxTsmaNum`, with a default value of 8 and a range of [0-12]. Note that because TSMA background calculation uses stream computing, creating a TSMA also creates a stream. Therefore, the number of TSMAs that can be created is also limited by the number of existing streams and the maximum number of streams that can be created.
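A sketch of window sizes within the [1ms ~ 1h] limit, again using the hypothetical `power.meters` table and assuming `max` is a supported function. The second statement uses the `CREATE RECURSIVE TSMA` form from the syntax section to build a larger window on top of the first TSMA:

```sql
-- 30-second window on the original super table (within [1ms ~ 1h])
CREATE TSMA tsma_meters_30s ON power.meters FUNCTION(max(voltage)) INTERVAL(30s);
-- 1-hour window computed from the 30-second TSMA rather than from raw data
CREATE RECURSIVE TSMA tsma_meters_1h ON power.tsma_meters_30s INTERVAL(1h);
```

Each statement creates a stream in the background, so both count toward `maxTsmaNum` and the cluster's stream limit.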
@@ -65,24 +65,16 @@ Client configuration parameter: `querySmaOptimize`, used to control whether to u
Client configuration parameter: `maxTsmaCalcDelay`, in seconds, controls the TSMA calculation delay that users can accept. If the gap between a TSMA's calculation progress and the latest time is within this range, the TSMA is used; if it exceeds this range, the TSMA is not used. The default value is 600 (10 minutes), with a minimum value of 600 (10 minutes) and a maximum value of 86400 (1 day).
-### Selection of TSMAs
+### Using TSMA During Query
The aggregate functions defined in a TSMA can be used directly in most query scenarios. If multiple TSMAs are available, the one with the larger window size is preferred, and unclosed windows are calculated using a smaller-window TSMA or the original data. However, there are certain scenarios where TSMA cannot be used (see below); in such cases, the entire query is calculated using the original data.
-The default behavior for queries without specified window sizes is to prioritize the use of the largest window TSMA that includes all the aggregate functions used in the query. For example, `SELECT COUNT(*) FROM stable GROUP BY tbname` will use the TSMA with the largest window that includes the `count(ts)` function.
+For queries that do not specify a window size, the default is to use the largest-window TSMA that includes all the aggregate functions used in the query. For example, `SELECT COUNT(*) FROM stable GROUP BY tbname` will use the TSMA with the largest window that includes the `count(ts)` function. Therefore, if aggregate queries are frequent, it is recommended to create TSMAs with window sizes as large as possible.
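For example, assuming hypothetical TSMAs on `power.meters` with `5m` and `1h` windows that both include `count(ts)`, a query without a window clause is expected to be served by the `1h` TSMA:

```sql
-- Hypothetical TSMAs: both contain count(ts)
CREATE TSMA tsma_cnt_5m ON power.meters FUNCTION(count(ts)) INTERVAL(5m);
CREATE RECURSIVE TSMA tsma_cnt_1h ON power.tsma_cnt_5m INTERVAL(1h);
-- No INTERVAL clause: the largest-window TSMA containing count(ts) (tsma_cnt_1h) is preferred
SELECT COUNT(*) FROM power.meters GROUP BY tbname;
```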
-When specifying the window size, which is the `INTERVAL` statement, use the largest TSMA window that is divisible by the window size of the query. In window queries, the window size of the `INTERVAL`, `OFFSET`, and `SLIDING` all affect the TSMA window size that can be used. Divisible window TSMA refers to a TSMA window size that is divisible by the `INTERVAL`, `OFFSET`, and `SLIDING` of the query statement.
+When a window size is specified with the `INTERVAL` clause, the TSMA with the largest window that evenly divides the query window is used. In window queries, the `INTERVAL` window size, `OFFSET`, and `SLIDING` all affect which TSMA window sizes can be used: a TSMA can be used only if its window size evenly divides each of the `INTERVAL`, `OFFSET`, and `SLIDING` values of the query. Therefore, if window queries are frequent, consider the commonly used window sizes, offsets, and sliding sizes when creating TSMAs.
Example 1. If two TSMAs with window sizes of `5m` and `10m` are created and the query is `INTERVAL(30m)`, the `10m` TSMA will be used. If the query is `INTERVAL(30m, 10m) SLIDING(5m)`, only the `5m` TSMA can be used, because the `5m` sliding size is not a multiple of `10m`.
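Example 1 can be written out as follows, assuming the hypothetical `power.meters` table, hypothetical TSMA names, and `avg` as a supported function:

```sql
-- Hypothetical TSMAs matching Example 1
CREATE TSMA tsma_ex1_5m ON power.meters FUNCTION(avg(current)) INTERVAL(5m);
CREATE TSMA tsma_ex1_10m ON power.meters FUNCTION(avg(current)) INTERVAL(10m);
-- 30m is a multiple of both 5m and 10m, so the larger tsma_ex1_10m is used
SELECT _wstart, AVG(current) FROM power.meters INTERVAL(30m);
-- Offset 10m is a multiple of 10m, but sliding 5m is not, so only tsma_ex1_5m qualifies
SELECT _wstart, AVG(current) FROM power.meters INTERVAL(30m, 10m) SLIDING(5m);
```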
When there is a `WHERE` condition with a primary key time column, if the start and end times are not aligned with the window, the boundary window will be calculated from other aligned smaller window TSMAs. If there are no aligned smaller window TSMAs, the calculation will be done directly from the original data.
Example 2. If two TSMAs, `tsma1` and `tsma2`, are created with window sizes of `5m` and `10m` respectively, and the query is `INTERVAL(10m)` with a `WHERE` condition of `ts >= '2024-01-01 10:05:00.000' and ts < '2024-01-01 11:00:00.000'`, then the data in the time interval `['10:05:00.000', '10:10:00.000')` will be calculated by `tsma1`, and the remaining data will be calculated by `tsma2`.
Example 3. If there is no TSMA that aligns with the window, then this portion of data will be calculated using the original data. For example, consider the two TSMA created earlier and the query is `INTERVAL(10m)` with a `WHERE` condition of `ts >= '2024-01-01 10:05:00.000' and ts < '2024-01-01 11:04:00.000'`. In this case, the data in the time interval `['10:05:00.000', '10:10:00.000')` will be calculated by `tsma1`, the data in the time interval `['10:10:00.000', '11:00:00.000')` will be calculated by `tsma2`, and the remaining data `['11:00:00.000', '11:04:00.000')` will be calculated using the original data.
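Examples 2 and 3 correspond to queries like the following, assuming `tsma1` and `tsma2` are defined on the hypothetical `power.meters` table with `avg(current)` (the table, column, and function list are illustrative):

```sql
CREATE TSMA tsma1 ON power.meters FUNCTION(avg(current)) INTERVAL(5m);
CREATE TSMA tsma2 ON power.meters FUNCTION(avg(current)) INTERVAL(10m);
-- Example 2: [10:05, 10:10) from tsma1, [10:10, 11:00) from tsma2
SELECT _wstart, AVG(current) FROM power.meters
  WHERE ts >= '2024-01-01 10:05:00.000' AND ts < '2024-01-01 11:00:00.000'
  INTERVAL(10m);
-- Example 3: as above, plus [11:00, 11:04) from the original data
SELECT _wstart, AVG(current) FROM power.meters
  WHERE ts >= '2024-01-01 10:05:00.000' AND ts < '2024-01-01 11:04:00.000'
  INTERVAL(10m);
```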
Note: In the examples above, the right side of the `WHERE` condition is an open interval, while in SQL, the right side of `BETWEEN` is a closed interval. When the right side of the `WHERE` condition uses a closed interval, the rightmost data will always be calculated using the original data. Even if the right side time aligns with the TSMA window, as in Example 2 above, if the right side of the `WHERE` condition is a closed interval, then the data in the time interval `['10:05:00.000', '10:10:00.000')` will be calculated by `tsma1`, and the data in the time interval `['10:10:00.000', '11:00:00.000')` will be calculated by `tsma2`. The data at the moment `'11:00:00.000'` will be calculated using the original data.
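The closed right bound described in the note can be written either with `<=` or with `BETWEEN`. This sketch assumes the same hypothetical `power.meters` table and that `tsma1` (`5m`) and `tsma2` (`10m`) already exist as in Example 2:

```sql
-- Right side closed: the instant 11:00:00.000 is read from the original data,
-- even though it is aligned with the 10m window of tsma2
SELECT _wstart, AVG(current) FROM power.meters
  WHERE ts >= '2024-01-01 10:05:00.000' AND ts <= '2024-01-01 11:00:00.000'
  INTERVAL(10m);
-- BETWEEN is also closed on the right, so it behaves the same way
SELECT _wstart, AVG(current) FROM power.meters
  WHERE ts BETWEEN '2024-01-01 10:05:00.000' AND '2024-01-01 11:00:00.000'
  INTERVAL(10m);
```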
### Limitations of Query
When the parameter `querySmaOptimize` is enabled and there is no `skip_tsma()` hint, the following query scenarios cannot use TSMA:
@@ -130,7 +122,7 @@ After creating a TSMA, there are certain restrictions on operations that can be
- You must delete all TSMAs on the table before you can delete the table itself.
- All tag columns of the original table cannot be deleted, nor can the tag column names or sub-table tag values be modified. You must first delete the TSMA before you can delete the tag column.
-- If some columns are being used by the TSMA, these columns cannot be deleted. You must first delete the TSMA. However, adding new columns to the table is not affected.
+- If some columns are being used by the TSMA, these columns cannot be deleted. You must first delete the TSMA. Adding new columns to the table is not affected; however, new columns are not included in any existing TSMA, so to calculate on them you need to create a new TSMA.
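A minimal sketch of covering a newly added column, assuming the hypothetical super table `power.meters` and a hypothetical new column `phase`:

```sql
-- Adding a column is allowed even while TSMAs exist on the table
ALTER STABLE power.meters ADD COLUMN phase FLOAT;
-- Existing TSMAs do not include `phase`; a new TSMA is needed to pre-aggregate it
CREATE TSMA tsma_meters_phase ON power.meters FUNCTION(avg(phase)) INTERVAL(10m);
```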
## Show TSMA

@@ -10,7 +10,7 @@ description: Window pre-aggregation user guide
```sql
-- Create a TSMA based on a super table or regular table
-CREATE TSMA tsma_name ON [dbname].table_name FUNCTION (func_name(func_param) [, ...] ) INTERVAL(time_duration);
+CREATE TSMA tsma_name ON [dbname.]table_name FUNCTION (func_name(func_param) [, ...] ) INTERVAL(time_duration);
-- Create a large-window TSMA based on a small-window TSMA
CREATE RECURSIVE TSMA tsma_name ON [db_name.]tsma_name1 INTERVAL(time_duration);
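@@ -26,9 +26,9 @@ TSMA can only be created based on super tables and regular tables, not on subtables.
In the function list, only supported aggregate functions (see below) can be specified, and each function must have exactly one parameter, even if the function supports multiple parameters. The parameter must be an ordinary column name, not a tag column. Identical functions and columns in the function list are deduplicated; for example, if avg(c1) is specified twice, only one output is computed. When computing a TSMA, all `intermediate results of the functions` are output to another super table, which also contains all tag columns of the original table. The maximum number of functions in the function list is the maximum number of columns a table can have (including tag columns), minus the four additional columns added for TSMA computation, namely `_wstart`, `_wend`, `_wduration`, and a new tag column `tbname`, minus the number of tag columns in the original table. If the number of columns exceeds the limit, the error `Too many columns` is reported.
-Since the output of TSMA is a super table, the row length of the output table is subject to the maximum row length limit. The sizes of the `intermediate results` of different functions vary, but they are generally larger than the original data. If the row length of the output table exceeds the maximum row length limit, the error `Row length exceeds max length` is reported.
+Since the output of TSMA is a super table, the row length of the output table is subject to the maximum row length limit. The sizes of the `intermediate results` of different functions vary, but they are generally larger than the original data. If the row length of the output table exceeds the maximum row length limit, the error `Row length exceeds max length` is reported. In this case, reduce the number of functions or split commonly used functions into groups across multiple TSMAs.
-The window size is limited to [1ms ~ 1h].
+The window size is limited to [1ms ~ 1h]. The units of INTERVAL are the same as those of the INTERVAL clause in queries, such as a (milliseconds), b (nanoseconds), h (hours), m (minutes), s (seconds), and u (microseconds).
A TSMA is an object within a database, but its name is globally unique. The total number of TSMAs that can be created in the cluster is limited by the parameter `maxTsmaNum`, with a default value of 8 and a range of [0-12]. Note that because TSMA background computation uses stream computing, each TSMA created also creates a stream, so the number of TSMAs that can be created is also limited by the number of existing streams and the maximum number of streams that can be created.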
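@@ -64,23 +64,16 @@ The TSMA calculation result is a super table in the same database as the original table; for this table, users
Client configuration parameter: `maxTsmaCalcDelay`, in seconds, controls the TSMA calculation delay that users can accept. If the gap between the TSMA's calculation progress and the latest time is within this range, the TSMA is used; if it exceeds this range, it is not used. Default: 600 (10 minutes), minimum: 600 (10 minutes), maximum: 86400 (1 day).
-### Selection of TSMAs
+### Using TSMA During Query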
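Aggregate functions already defined in a TSMA can be used directly in most query scenarios. If multiple TSMAs are available, the larger-window TSMA is preferred, and unclosed windows are calculated by querying smaller-window TSMAs or the original data. There are also some scenarios where TSMA cannot be used (see below); in those cases, the entire query is calculated from the original data.
-For queries that do not specify a window size, the largest-window TSMA that contains all the aggregate functions in the query is used by default. For example, `SELECT COUNT(*) FROM stable GROUP BY tbname` will use the TSMA with the largest window that contains count(ts).
+For queries that do not specify a window size, the largest-window TSMA that contains all the aggregate functions in the query is used by default. For example, `SELECT COUNT(*) FROM stable GROUP BY tbname` will use the TSMA with the largest window that contains count(ts). Therefore, if aggregate queries are frequent, TSMAs with windows as large as possible should be created.
-When a window size is specified, i.e., the `INTERVAL` clause, the largest divisible-window TSMA is used. In window queries, the `INTERVAL` window size, `OFFSET`, and `SLIDING` all affect which TSMA window sizes can be used; a divisible-window TSMA is one whose window size is divisible by the query's `INTERVAL OFFSET SLIDING`.
+When a window size is specified, i.e., the `INTERVAL` clause, the TSMA with the largest window that evenly divides the query window is used. In window queries, the `INTERVAL` window size, `OFFSET`, and `SLIDING` all affect which TSMA window sizes can be used: a TSMA can be used only if its window size evenly divides each of the query's `INTERVAL`, `OFFSET`, and `SLIDING` values. Therefore, if window queries are common, create TSMAs based on the commonly queried window sizes, offsets, and sliding sizes.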
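Example 1. If one TSMA with a `5m` window and one with a `10m` window are created and the query uses `INTERVAL(30m)`, the `10m` TSMA is preferred; if the query is `INTERVAL(30m, 10m) SLIDING(5m)`, only the `5m` TSMA can be used.
When the `WHERE` condition includes the primary key time column and the start and end times are not aligned with the window, the boundary windows are calculated from other aligned smaller-window TSMAs; if no aligned smaller-window TSMA exists, they are calculated directly from the original data.
Example 2. Suppose two TSMAs, `tsma1` and `tsma2`, are created with windows of `5m` and `10m`, and the query uses `INTERVAL(10m)` with the `WHERE` condition `ts >= '2024-01-01 10:05:00.000' and ts < '2024-01-01 11:00:00.000'`. Then the data in the time interval `['10:05:00.000', '10:10:00.000')` is calculated by `tsma1`, and the rest is calculated by `tsma2`.
Example 3. If no TSMA aligns with the window, that portion of data is calculated from the original data. For example, with the same two TSMAs as above, if the query is `INTERVAL(10m)` with the `WHERE` condition `ts >= '2024-01-01 10:05:00.000' and ts < '2024-01-01 11:04:00.000'`, then the data in the time interval `['10:05:00.000', '10:10:00.000')` is calculated by `tsma1`, the data in the time interval `['10:10:00.000', '11:00:00.000')` is calculated by `tsma2`, and the remaining `['11:00:00.000', '11:04:00.000')` is calculated from the original data.
Note: In the examples above, the right side of the `WHERE` condition is always an open interval, whereas the right side of `BETWEEN` in SQL is a closed interval. When `only the right side` of the `WHERE` condition is a closed interval, the rightmost data is always calculated from the original data, even if the right-side time is aligned with a TSMA window. For example, if the right side of the `WHERE` condition in Example 2 above is a closed interval, the data in the time interval `['10:05:00.000', '10:10:00.000')` is calculated by `tsma1`, the data in the time interval `['10:10:00.000', '11:00:00.000')` is calculated by `tsma2`, and the data at the moment `'11:00:00.000'` is calculated from the original data.
### Limitations of Query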
@@ -129,7 +122,7 @@ SELECT COUNT(*), MIN(c1) FROM stable where c2 > 0; ---- can't use tsma1 or tsma2
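- All TSMAs on the table must be deleted before the table itself can be deleted.
- No tag column of the original table can be deleted, and neither tag column names nor subtable tag values can be modified; the TSMA must be deleted first before a tag column can be deleted.
-- If some columns are used by a TSMA, these columns cannot be deleted; the TSMA must be deleted first. Adding columns is not affected.
+- If some columns are used by a TSMA, these columns cannot be deleted; the TSMA must be deleted first. Adding columns is not affected, but newly added columns are not included in any TSMA, so to compute on the new columns, other TSMAs need to be created.
## Show TSMA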
```sql