doc: refine document about compression

2024-07-17 12:25:25 +08:00 · 2024-07-17 12:25:25 +08:00 · acaa85bc74
parent 5e37442be5
commit acaa85bc74
9 changed files with 136 additions and 80 deletions
--- a/docs/en/12-taos-sql/03-table.md
+++ b/docs/en/12-taos-sql/03-table.md
@@ -49,6 +49,7 @@ table_option: {
 7. The escape character "\`" can be used to avoid conflicts between table names and reserved keywords. The rules above are bypassed for table names wrapped in the escape character, but the maximum name length still applies. Table names specified with the escape character are case sensitive.
   For example, \`aBc\` and \`abc\` are different table names, but `abc` and `aBc` are the same table name because both are converted to `abc` internally.
   Only visible ASCII characters can be used with the escape character.
+8. For details on using `ENCODE` and `COMPRESS`, see [Encode and Compress for Column](./compress).

 **Parameter description**

--- a/docs/en/12-taos-sql/04-stable.md
+++ b/docs/en/12-taos-sql/04-stable.md
@@ -13,17 +13,29 @@ create_definition:
    col_name column_definition
 
 column_definition:
-    type_name
+    type_name [COMMENT 'string_value'] [PRIMARY KEY] [ENCODE 'encode_type'] [COMPRESS 'compress_type'] [LEVEL 'level_type']
+
+table_options:
+    table_option ...
+
+table_option: {
+    COMMENT 'string_value'
+  | SMA(col_name [, col_name] ...)
+  | TTL value
+}
+
 ```

 **More explanations**
- Each supertable can have a maximum of 4096 columns, including tags. The minimum number of columns is 3: a timestamp column used as the key, one tag column, and one data column.
- The TAGS keyword defines the tag columns for the supertable. The following restrictions apply to tag columns:
+1. Each supertable can have a maximum of 4096 columns, including tags. The minimum number of columns is 3: a timestamp column used as the key, one tag column, and one data column.
+2. Since version 3.3.0.0, in addition to the timestamp column, you can designate one more column as a primary key with the `PRIMARY KEY` keyword. That column must be of an integer type or VARCHAR.
+3. The TAGS keyword defines the tag columns for the supertable. The following restrictions apply to tag columns:
    - A tag column can use the TIMESTAMP data type, but the values in the column must be fixed numbers. Timestamps including formulae, such as "now + 10s", cannot be stored in a tag column.
    - The name of a tag column cannot be the same as the name of any other column.
    - The name of a tag column cannot be a reserved keyword.
    - Each supertable must contain between 1 and 128 tags. The total length of the TAGS keyword cannot exceed 16 KB.
- For more information about table parameters, see Create a Table.
+4. For details on using `ENCODE` and `COMPRESS`, see [Encode and Compress for Column](./compress).
+5. For more information about table parameters, see Create a Table.

 ## View a Supertable
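
For reference, the extended `column_definition` grammar in the hunk above can be sketched as a small Python helper that renders one column clause with the optional parts in the order the grammar lists them. The helper name `column_definition` and the sample values `'delta-d'`, `'lz4'`, and `'medium'` are assumptions for illustration only; the accepted encode, compress, and level values are defined in the linked Encode and Compress for Column page, not in this diff.

```python
def column_definition(name, type_name, comment=None, primary_key=False,
                      encode=None, compress=None, level=None):
    """Render `col_name column_definition`, keeping the option order of the grammar."""
    parts = [name, type_name]
    if comment is not None:
        # No quote escaping is done here; this is a sketch, not a SQL builder.
        parts.append(f"COMMENT '{comment}'")
    if primary_key:
        parts.append("PRIMARY KEY")
    if encode is not None:
        parts.append(f"ENCODE '{encode}'")
    if compress is not None:
        parts.append(f"COMPRESS '{compress}'")
    if level is not None:
        parts.append(f"LEVEL '{level}'")
    return " ".join(parts)


# The option values below are assumed examples, not taken from this diff.
clause = column_definition("current", "FLOAT", encode="delta-d",
                           compress="lz4", level="medium")
print(clause)
```

Every option is optional, so a call with only a name and a type yields the pre-change form `col_name type_name`.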

--- a/docs/en/21-tdinternal/08-compress.md
+++ b/docs/en/21-tdinternal/08-compress.md
--- a/docs/en/14-reference/12-config/index.md
+++ b/docs/en/14-reference/12-config/index.md
@@ -729,6 +729,57 @@ The charset that takes effect is UTF-8.
 | Value Range | -1: no messages are compressed; 0: all messages are compressed; N (N>0): messages exceeding N bytes are compressed |
 | Default     | -1                                                                                                                 |

+### fPrecision
+
+| Attribute   | Description                                    |
+| ----------- | ---------------------------------------------- |
+| Applicable  | Server Only                                    |
+| Meaning     | Compression precision for the float data type  |
+| Value Range | 0.1 ~ 0.00000001                               |
+| Default     | 0.00000001                                     |
+| Note        | Digits below this precision are truncated      |
+
+### dPrecision
+
+| Attribute   | Description                                    |
+| ----------- | ---------------------------------------------- |
+| Applicable  | Server Only                                    |
+| Meaning     | Compression precision for the double data type |
+| Value Range | 0.1 ~ 0.0000000000000001                       |
+| Default     | 0.0000000000000001                             |
+| Note        | Digits below this precision are truncated      |
+
+### lossyColumns
+
+| Attribute   | Description                                          |
+| ----------- | ---------------------------------------------------- |
+| Applicable  | Server Only                                          |
+| Meaning     | Enable TSZ lossy compression for float and/or double |
+| Value Range | float, double                                        |
+| Default     | none: TSZ lossy compression is disabled              |
+
+**Additional notes**
+1. This parameter is available since version 3.2.0.0. Once you have upgraded to 3.2.0.0 and enabled it, you cannot downgrade to an earlier version.
+2. The TSZ compression algorithm relies on data prediction, so it is best suited to data that follows a regular pattern.
+3. TSZ compression takes longer but achieves a better compression ratio, so it suits deployments with spare CPU capacity and limited disk space.
+4. Example: enable TSZ for both float and double
+```shell
+lossyColumns     float|double
+```
+5. After changing the configuration, the taosd service must be restarted. If the following line then appears in the taosd log file, the setting has taken effect:
+```sql
+   02/22 10:49:27.607990 00002933 UTL  lossyColumns     float|double
+```
+
+### ifAdtFse
+
+| Attribute   | Description                                                                      |
+| ----------- | -------------------------------------------------------------------------------- |
+| Applicable  | Server Only                                                                      |
+| Meaning     | Use FSE instead of HUFFMAN in TSZ; FSE compresses faster but decompresses slower |
+| Value Range | 0: Use HUFFMAN, 1: Use FSE                                                       |
+| Default     | 0: Use HUFFMAN                                                                   |
+

 ## Other Parameters
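
To make the "Note" rows of `fPrecision` and `dPrecision` concrete, here is a minimal Python sketch of what cutting a value off at a given precision means. It only illustrates the documented effect on a stored value; TSZ's actual encoding is not described in this diff and is not reproduced here. The function name `truncate_to_precision` is hypothetical, and the sketch assumes the precision is a power of ten, as in the documented value ranges.

```python
from decimal import Decimal, ROUND_DOWN


def truncate_to_precision(value, precision):
    """Drop every digit of `value` that is finer than `precision`.

    Both arguments are decimal strings, so no binary floating-point
    rounding interferes with the illustration.
    """
    step = Decimal(precision)
    # quantize() keeps as many decimal places as `step` has;
    # ROUND_DOWN truncates toward zero instead of rounding.
    return Decimal(value).quantize(step, rounding=ROUND_DOWN)


print(truncate_to_precision("3.14159265", "0.001"))  # 3.141
print(truncate_to_precision("-2.7189", "0.01"))      # -2.71
```

With the default `fPrecision` of 0.00000001, a value such as 3.14159265 already fits within eight decimal places and would pass through unchanged in this sketch.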

--- a/docs/zh/12-taos-sql/03-table.md
+++ b/docs/zh/12-taos-sql/03-table.md
@@ -49,6 +49,7 @@ table_option: {
 6. 使用数据类型 BINARY/NCHAR/GEOMETRY，需指定其最长的字节数，如 BINARY(20)，表示 20 字节。
 7. 为了兼容支持更多形式的表名，TDengine 引入新的转义符 "\`"，可以让表名与关键词不冲突，同时不受限于上述表名称合法性约束检查。但是同样具有长度限制要求。使用转义字符以后，不再对转义字符中的内容进行大小写统一，
   例如：\`aBc\` 和 \`abc\` 是不同的表名，但是 abc 和 aBc 是相同的表名。
+8. 关于 `ENCODE` 和 `COMPRESS` 的使用，请参考[按列压缩](./compress)

 **参数说明**

--- a/docs/zh/12-taos-sql/04-stable.md
+++ b/docs/zh/12-taos-sql/04-stable.md
@@ -13,17 +13,28 @@ create_definition:
    col_name column_definition
 
 column_definition:
-    type_name
+    type_name [COMMENT 'string_value'] [PRIMARY KEY] [ENCODE 'encode_type'] [COMPRESS 'compress_type'] [LEVEL 'level_type']
+
+table_options:
+    table_option ...
+
+table_option: {
+    COMMENT 'string_value'
+  | SMA(col_name [, col_name] ...)
+  | TTL value
+}
 ```

 **使用说明**
- 超级表中列的最大个数为 4096，需要注意，这里的 4096 是包含 TAG 列在内的，最小个数为 3，包含一个时间戳主键、一个 TAG 列和一个数据列。
- TAGS语法指定超级表的标签列，标签列需要遵循以下约定：
+1. 超级表中列的最大个数为 4096，需要注意，这里的 4096 是包含 TAG 列在内的，最小个数为 3，包含一个时间戳主键、一个 TAG 列和一个数据列。
+2. 除时间戳主键列之外，还可以通过 PRIMARY KEY 关键字指定第二列为额外的主键列。被指定为主键列的第二列必须为整型或字符串类型（varchar）
+3. TAGS语法指定超级表的标签列，标签列需要遵循以下约定：
    - TAGS 中的 TIMESTAMP 列写入数据时需要提供给定值，而暂不支持四则运算，例如 NOW + 10s 这类表达式。
    - TAGS 列名不能与其他列名相同。
    - TAGS 列名不能为预留关键字。
    - TAGS 最多允许 128 个，至少 1 个，总长度不超过 16 KB。
- 关于表参数的详细说明，参见 CREATE TABLE 中的介绍。
+4. 关于 `ENCODE` 和 `COMPRESS` 的使用，请参考 [按列压缩](./compress)
+5. 关于 table_option 中的参数说明，请参考 [建表 SQL 说明](./table)

 ## 查看超级表

--- a/docs/zh/21-tdinternal/08-compress.md
+++ b/docs/zh/21-tdinternal/08-compress.md
@@ -1,5 +1,6 @@
 ---
 title: 可配置压缩算法
+sidebar_label: 可配置压缩
 description: 可配置压缩算法
 ---

--- a/docs/zh/14-reference/12-config/index.md
+++ b/docs/zh/14-reference/12-config/index.md
@@ -784,6 +784,57 @@ charset 的有效值是 UTF-8。
 | 取值范围 | -1: 所有消息都不压缩; 0: 所有消息都压缩; N (N>0): 只有大于 N 个字节的消息才压缩 |
 | 缺省值   | -1                                                                              |

+### fPrecision
+FLOAT 类型压缩精度控制：
+
+| 属性     | 说明                             |
+| -------- | -------------------------------- |
+| 适用范围 | 服务器端                         |
+| 含义     | 设置 float 类型浮点数压缩精度    |
+| 取值范围 | 0.1 ~ 0.00000001                 |
+| 缺省值   | 0.00000001                       |
+| 补充说明 | 小于此值的浮点数尾数部分将被截取 |
+
+### dPrecision
+| 属性     | 说明                             |
+| -------- | -------------------------------- |
+| 适用范围 | 服务器端                         |
+| 含义     | 设置 double 类型浮点数压缩精度   |
+| 取值范围 | 0.1 ~ 0.0000000000000001         |
+| 缺省值   | 0.0000000000000001               |
+| 补充说明 | 小于此值的浮点数尾数部分将被截取 |
+
+### lossyColumns
+
+| 属性     | 说明                             |
+| -------- | -------------------------------- |
+| 适用范围 | 服务器端                         |
+| 含义     | 对 float 和/或 double 类型启用 TSZ 有损压缩 |
+| 取值范围 |  float, double        |
+| 缺省值   | none：表示关闭有损压缩                |
+
+**补充说明**
+1. 在 3.2.0.0 及以后版本生效，启用该参数后不能回退到升级前的版本
+2. TSZ 压缩算法是通过数据预测技术完成的压缩，所以更适合有规律变化的数据
+3. TSZ 压缩时间会更长一些，如果您的服务器 CPU 空闲多，存储空间小的情况下适合选用
+4. 示例：对 float 和 double 类型都启用有损压缩
+```shell
+lossyColumns     float|double
+```
+5. 配置需重启服务生效，重启如果在 taosd 日志中看到以下内容，表明配置已生效：
+```sql
+   02/22 10:49:27.607990 00002933 UTL  lossyColumns     float|double
+```
+
+### ifAdtFse 
+
+| 属性     | 说明                             |
+| -------- | -------------------------------- |
+| 适用范围 | 服务器端                         |
+| 含义     | 在启用 TSZ 有损压缩时，使用 FSE 算法替换 HUFFMAN 算法， FSE 算法压缩速度更快，但解压稍慢，追求压缩速度可选用此算法  |
+| 取值范围 |  0：关闭  1：打开         |
+| 缺省值   | 0：关闭                |
+
 ## 3.0 中有效的配置参数列表

 | #   |        **参数**        | **适用于 2.X ** | **适用于 3.0 **                 | 3.0 版本的当前行为 |
@@ -922,8 +973,5 @@ charset 的有效值是 UTF-8。
 | 73  |      probeSeconds       | 是              | 否              | 3.0 行为未知                                         |
 | 74  |    probeKillSeconds     | 是              | 否              | 3.0 行为未知                                         |
 | 75  |      probeInterval      | 是              | 否              | 3.0 行为未知                                         |
-| 76  |      lossyColumns       | 是              | 否              | 3.0 行为未知                                         |
-| 77  |       fPrecision        | 是              | 否              | 3.0 行为未知                                         |
-| 78  |       dPrecision        | 是              | 否              | 3.0 行为未知                                         |
 | 79  |        maxRange         | 是              | 否              | 3.0 行为未知                                         |
 | 80  |          range          | 是              | 否              | 3.0 行为未知                                         |
--- a/docs/zh/21-tdinternal/07-tsz.md
+++ b/docs/zh/21-tdinternal/07-tsz.md
@@ -1,69 +0,0 @@
---
-title: TSZ 压缩算法
-description: TDengine 对浮点数进行高效压缩的算法
---
-
-TSZ 压缩算法是 TDengine 为浮点数据类型提供的可选压缩算法，可以实现浮点数有损至无损全状态压缩，相比默认压缩算法， TSZ 压缩算法压缩率更高，即使切至无损状态，压缩率也会比默认压缩高一倍。
-
-## 适合场景
-
- TSZ 压缩算法是通过数据预测技术完成的压缩，所以更适合有规律变化的数据
- TSZ 压缩时间会更长一些，如果您的服务器 CPU 空闲多，存储空间小的情况下适合选用
-
-## 使用步骤
- TDengine 支持版本为 3.2.0.0 或以上
- 开启选项
-  在 taos.cfg 配置中增加以下内容，即可开启 TSZ 压缩算法，功能打开后，会替换默认算法。
-  以下表示字段类型是 float 及 double 类型都使用此压缩算法，也可以单独只配置一个
-
-```sql
-   lossyColumns     float|double
-```
-
- 配置需重启服务生效
- Taosd 日志输出以下内容，表明功能已生效：
-
-```sql
-   02/22 10:49:27.607990 00002933 UTL  lossyColumns     float|double
-```
-
-## 配置参数
-
-### fPrecision
-FLOAT 类型精度控制：
-
-| 属性     | 说明                             |
-| -------- | -------------------------------- |
-| 适用范围 | 服务器端                         |
-| 含义     | 设置 float 类型浮点数压缩精度    |
-| 取值范围 | 0.1 ~ 0.00000001                 |
-| 缺省值   | 0.00000001                       |
-| 补充说明 | 小于此值的浮点数尾数部分将被截取 |
-
-
-
-### dPrecision
-DOUBLE 类型精度控制：
-
-| 属性     | 说明                             |
-| -------- | -------------------------------- |
-| 适用范围 | 服务器端                         |
-| 含义     | 设置 double 类型浮点数压缩精度   |
-| 取值范围 | 0.1 ~ 0.0000000000000001         |
-| 缺省值   | 0.0000000000000001               |
-| 补充说明 | 小于此值的浮点数尾数部分将被截取 |
-
-
-### ifAdtFse 
-TSZ 压缩中可选择的算法 FSE，默认为 HUFFMAN：
-
-| 属性     | 说明                             |
-| -------- | -------------------------------- |
-| 适用范围 | 服务器端                         |
-| 含义     | 使用 FSE 算法替换 HUFFMAN 算法， FSE 算法压缩速度更快，但解压稍慢，追求压缩速度可选用此算法  |
-| 取值范围 |  0：关闭  1：打开         |
-| 缺省值   | 0：关闭                |
-
-
-## 注意事项
- 打开 TSZ 后生成的存储数据格式，回退至 3.2.0.0 之前的版本，数据将不能被识别