Merge pull request #25530 from taosdata/doc/configurableColCompress

add compress doc
wade zhang 2024-04-28 08:05:26 +08:00 committed by GitHub
commit f9a601bbe3
2 changed files with 183 additions and 0 deletions

@ -0,0 +1,92 @@
---
title: Configurable Column Compression
description: Configurable column storage compression method
---
# Configurable Storage Compression
Starting with TDengine 3.3.0.0, a more advanced compression feature is available: for each column, you can specify whether to compress it, which compression method to use, and which compression level to apply.
## Compression Terminology Definition
### Compression Level Definition
- Level 1 compression: encoding the data, which is itself a form of compression.
- Level 2 compression: compressing the data blocks on top of the encoding.
### Compression Algorithm Level
In this document, this term refers specifically to the level within a level 2 (secondary) compression algorithm. zstd, for example, offers at least 8 levels, each with different performance; in essence, the level is a tradeoff between compression ratio, compression speed, and decompression speed. To simplify the choice, the following three levels are defined:
- high: The highest compression ratio, the worst compression speed and decompression speed.
- low: The best compression speed and decompression speed, the lowest compression ratio.
- medium: Balancing compression ratio, compression speed, and decompression speed.
### Compression Algorithm List
- Encoding algorithm list (Level 1 compression): simple8b, bit-packing, delta-i, delta-d, disabled
- Compression algorithm list (Level 2 compression): lz4, zlib, zstd, tsz, xz, disabled
- Default compression algorithm list and applicable range for each data type
| Data Type | Optional Encoding Algorithm | Default Encoding Algorithm | Optional Compression Algorithm | Default Compression Algorithm | Default Compression Level |
| :-----------: | :----------: | :-------: | :-------: | :----------: | :----: |
| tinyint/utinyint/smallint/usmallint/int/uint | simple8b | simple8b | lz4/zlib/zstd/xz | lz4 | medium |
| bigint/ubigint/timestamp | simple8b/delta-i | delta-i | lz4/zlib/zstd/xz | lz4 | medium |
| float/double | delta-d | delta-d | lz4/zlib/zstd/xz/tsz | tsz | medium |
| binary/nchar | disabled | disabled | lz4/zlib/zstd/xz | lz4 | medium |
| bool | bit-packing | bit-packing | lz4/zlib/zstd/xz | lz4 | medium |
Note: For floating-point types configured as tsz, the precision is determined by the global configuration of taosd. If a column is configured as tsz but the lossy compression flag is not set, lz4 is used for compression instead.
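To illustrate how the defaults in the table apply, here is a minimal sketch of a table created without any compression options. The database name `power`, the table name `defaults_demo`, and the column names are hypothetical, and the comments only restate the defaults listed above.

```sql
-- Hypothetical example: no ENCODE/COMPRESS/LEVEL is given, so each column
-- falls back to the defaults listed in the table above.
-- Assumes a database named power already exists.
CREATE TABLE power.defaults_demo (
    ts TIMESTAMP,  -- default per the table: delta-i + lz4, level medium
    v  DOUBLE,     -- default per the table: delta-d + tsz, level medium
                   -- (lz4 is used if the lossy compression flag is not set)
    n  BIGINT      -- default per the table: delta-i + lz4, level medium
);
```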
## SQL
### Create Table with Compression
```sql
CREATE TABLE [dbname.]tabname (colName colType [ENCODE 'encode_type'] [COMPRESS 'compress_type'] [LEVEL 'level'] [, other create_definition] ...)
```
**Parameter Description**
- tabname: Name of the super table or ordinary table
- encode_type: Level 1 compression (encoding); see the list above for the supported values
- compress_type: Level 2 compression; see the list above for the supported values
- level: The level of level 2 compression. The default value is medium, and the abbreviations 'h'/'l'/'m' are supported
**Function Description**
- Specifies the compression method for each column when creating a table
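As a minimal sketch of the syntax above, the following statement sets the encoding, compression algorithm, and level per column, using only values from the algorithm list. The database `power`, the table `sensor_data`, and all column names are hypothetical; the first column is left on its defaults.

```sql
-- Hypothetical example; assumes a database named power already exists
-- on a TDengine 3.3.0.0 or later server.
CREATE TABLE power.sensor_data (
    ts      TIMESTAMP,                                          -- defaults
    current FLOAT      ENCODE 'delta-d'     COMPRESS 'zstd' LEVEL 'high',
    voltage INT        ENCODE 'simple8b'    COMPRESS 'zlib' LEVEL 'l',  -- 'l' abbreviates low
    is_on   BOOL       ENCODE 'bit-packing' COMPRESS 'lz4',     -- level defaults to medium
    descr   BINARY(64) ENCODE 'disabled'    COMPRESS 'xz'
);
```

Columns that hold rarely read data are reasonable candidates for `high`, while frequently queried columns may be better served by `low` or `medium`, following the tradeoff described under the compression levels.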
### Change Compression Method
```sql
ALTER TABLE [db_name.]tabName MODIFY COLUMN colName [ENCODE 'encode_type'] [COMPRESS 'compress_type'] [LEVEL 'level']
```
**Parameter Description**
- tabName: Table name; can be a super table or an ordinary table
- colName: The column whose compression method is to be changed; it must be a normal column
**Function Description**
- Changes the compression method of the column
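The statement form above might be used as follows. This is only a sketch: it assumes a hypothetical table `power.sensor_data` that already exists and has a normal INT column named `voltage`.

```sql
-- Hypothetical example: switch the level 2 compression of one column
-- to zstd at the highest compression level.
ALTER TABLE power.sensor_data MODIFY COLUMN voltage COMPRESS 'zstd' LEVEL 'high';
```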
### View Compression Method
```sql
DESCRIBE [dbname.]tabName
```
**Function Description**
- Displays basic information about each column, including its type and compression method
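For instance, to check which compression settings are in effect for a table, a call along these lines could be used. The table name `power.sensor_data` is hypothetical, and the exact output columns are not shown here because they are not specified in this document.

```sql
-- Hypothetical example: list the columns of a table together with
-- their types and compression methods.
DESCRIBE power.sensor_data;
```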
## Compatibility
- Fully compatible with existing data
- After upgrading to 3.3.0.0, you cannot roll back to an earlier version

@ -0,0 +1,91 @@
---
title: Configurable Compression Algorithm
description: Configurable compression algorithm
---
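# Configurable Storage Compression
Starting with version 3.3.0.0, TDengine provides a more advanced compression feature: when creating a table, users can configure for each column whether to compress it, and which compression algorithm and compression level to use.
## Compression Terminology
### Compression Stage
- Level 1 compression: encoding the data, which is itself a form of compression
- Level 2 compression: compressing the data blocks on top of the encoding
### Compression Level
In this document, this term refers specifically to the level within a level 2 compression algorithm. zstd, for example, offers at least 8 levels, each with different performance; in essence, the level is a tradeoff between compression ratio, compression speed, and decompression speed. To simplify the choice, the following three levels are defined:
- high: The highest compression ratio, with the worst compression and decompression speed.
- low: The best compression and decompression speed, with the lowest compression ratio.
- medium: A balance of compression ratio, compression speed, and decompression speed.
### Compression Algorithm List
- Encoding algorithm list (level 1 compression): simple8b, bit-packing, delta-i, delta-d, disabled
- Compression algorithm list (level 2 compression): lz4, zlib, zstd, tsz, xz, disabled
- Default compression algorithms and applicable range for each data type
| Data Type | Optional Encoding Algorithm | Default Encoding Algorithm | Optional Compression Algorithm | Default Compression Algorithm | Default Compression Level |
| :-----------: | :----------: | :-------: | :-------: | :----------: | :----: |
| tinyint/utinyint/smallint/usmallint/int/uint | simple8b | simple8b | lz4/zlib/zstd/xz | lz4 | medium |
| bigint/ubigint/timestamp | simple8b/delta-i | delta-i | lz4/zlib/zstd/xz | lz4 | medium |
| float/double | delta-d | delta-d | lz4/zlib/zstd/xz/tsz | tsz | medium |
| binary/nchar | disabled | disabled | lz4/zlib/zstd/xz | lz4 | medium |
| bool | bit-packing | bit-packing | lz4/zlib/zstd/xz | lz4 | medium |
Note: For floating-point types configured as tsz, the precision is determined by the global configuration of taosd. If a column is configured as tsz but the lossy compression flag is not set, lz4 is used for compression.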
## SQL Syntax
### Specify Compression When Creating a Table
```sql
CREATE TABLE [dbname.]tabname (colName colType [ENCODE 'encode_type'] [COMPRESS 'compress_type'] [LEVEL 'level'] [, other create_definition] ...)
```
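**Parameter Description**
- tabname: Name of the super table or ordinary table
- encode_type: Level 1 compression; see the list above for the supported values
- compress_type: Level 2 compression; see the list above for the supported values
- level: The level of level 2 compression. The default value is medium, and the abbreviations 'h'/'l'/'m' are supported
**Function Description**
- Specifies the compression method for each column when creating a table
### Change the Compression Method of a Column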
```sql
ALTER TABLE [db_name.]tabName MODIFY COLUMN colName [ENCODE 'encode_type'] [COMPRESS 'compress_type'] [LEVEL 'level']
```
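**Parameter Description**
- tabName: Table name; can be a super table or an ordinary table
- colName: The column whose compression algorithm is to be changed; it must be a normal column
**Function Description**
- Changes the compression method of the column
### View the Compression Method of a Column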
```sql
DESCRIBE [dbname.]tabName
```
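**Function Description**
- Displays basic information about each column, including its type and compression method
## Compatibility
- Fully compatible with existing data
- After upgrading from a lower version to 3.3.0.0, you cannot roll back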