Add configurable storage compression documentation

Yihao Deng 2024-04-26 09:59:28 +00:00
parent 7cd2628776
commit 35db3ac29d
2 changed files with 179 additions and 0 deletions

@ -0,0 +1,90 @@
---
title: Configurable Storage Compression
description: Configurable column storage compression method
---
# Configurable Storage Compression
## Compression Terminology Definition
### Compression Level Definition
- Level 1 compression: encoding the data, which is itself a form of compression.
- Level 2 compression: compressing data blocks.
### Compression Algorithm Level
In this document, "compression level" refers specifically to the level inside a level 2 compression algorithm. zstd, for example, offers at least 8 levels, each a different tradeoff between compression ratio, compression speed, and decompression speed. To keep the choice simple, these are reduced to three levels:
- high: The highest compression ratio, with the slowest compression and decompression.
- low: The fastest compression and decompression, with the lowest compression ratio.
- medium: A balance between compression ratio, compression speed, and decompression speed.
### Compression Algorithm List
- Encoding algorithm list (Level 1 compression): simple8b, bit-packing, delta-i, delta-d, disabled
- Compression algorithm list (Level 2 compression): lz4, zlib, zstd, tsz, xz, disabled
- Available and default algorithms for each data type:

| Data Type | Optional Encoding Algorithms | Default Encoding Algorithm | Optional Compression Algorithms | Default Compression Algorithm | Default Compression Level |
| :-----------: | :----------: | :-------: | :-------: | :----------: | :----: |
| tinyint/utinyint/smallint/usmallint/int/uint | simple8b | simple8b | lz4/zlib/zstd/xz | lz4 | medium |
| bigint/ubigint/timestamp | simple8b/delta-i | delta-i | lz4/zlib/zstd/xz | lz4 | medium |
| float/double | delta-d | delta-d | lz4/zlib/zstd/xz/tsz | tsz | medium |
| binary/nchar | disabled | disabled | lz4/zlib/zstd/xz | lz4 | medium |
| bool | bit-packing | bit-packing | lz4/zlib/zstd/xz | lz4 | medium |

Note: For floating-point types configured with tsz, the precision is determined by the global configuration of taosd. If tsz is configured but the lossy compression flag is not set, lz4 is used for compression instead.
## SQL Syntax
### Specify the compression method when creating a table
```
CREATE TABLE [dbname.]tabname (colName colType [ENCODE 'encode_type'] [COMPRESS 'compress_type' [LEVEL 'level']] [, other create_definition]...)
```
**Parameter Description**
- tabname: Name of the super table or ordinary table
- encode_type: Level 1 compression (encoding) algorithm; see the list above for valid values
- compress_type: Level 2 compression algorithm; see the list above for valid values
- level: The level of the level 2 compression algorithm; defaults to medium and can be abbreviated as 'h'/'l'/'m'
**Function Description**
- Specify the compression method for the column when creating a table
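As an illustration of the syntax above, here is a minimal sketch that sets the compression method column by column. The database name `power`, the table name `meters_raw`, and all column names are hypothetical. The algorithm names and levels are taken from the lists in this document. The `--` comments are annotations for the reader and may need to be removed, depending on the client used.

```sql
-- Hypothetical table; every ENCODE/COMPRESS value comes from the algorithm table above.
CREATE TABLE power.meters_raw (
    ts        TIMESTAMP  ENCODE 'delta-i'     COMPRESS 'zstd' LEVEL 'high', -- favor compression ratio
    current   FLOAT      ENCODE 'delta-d'     COMPRESS 'lz4'  LEVEL 'l',    -- 'l' abbreviates 'low'
    voltage   INT        ENCODE 'simple8b'    COMPRESS 'zlib',              -- LEVEL omitted: medium
    is_online BOOL       ENCODE 'bit-packing' COMPRESS 'xz'   LEVEL 'm',    -- 'm' abbreviates 'medium'
    remark    BINARY(64)                                                    -- no clauses: type defaults apply
);
```

Columns that omit a clause should fall back to the defaults listed for their data type. For `remark`, that means encoding disabled, lz4 compression, and level medium.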
### Change the compression method of the column
```
ALTER TABLE [db_name.]tabName MODIFY COLUMN colName [ENCODE 'encode_type'] [COMPRESS 'compress_type'] [LEVEL 'level']
```
**Parameter Description**
- tabName: Table name, can be a super table or an ordinary table
- colName: The column whose compression algorithm is to be changed; it must be a normal column
**Function Description**
- Change the compression method of the column
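A short sketch of the statement in use, assuming a hypothetical table `power.meters_raw` that has a FLOAT column `current` and an INT column `voltage`. Both names are illustrative; the algorithm names and levels come from the lists in this document.

```sql
-- Switch the level 2 compression of one column to zstd at the highest ratio.
ALTER TABLE power.meters_raw MODIFY COLUMN current COMPRESS 'zstd' LEVEL 'high';

-- Change only the level of another column, using the abbreviated form.
ALTER TABLE power.meters_raw MODIFY COLUMN voltage LEVEL 'l';
```

Because every clause is optional in the syntax above, each statement presumably touches only the settings it names. The source does not say whether blocks already written are recompressed, so that should be verified before relying on it.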
### View the compression method of the column
```
DESCRIBE [dbname.]tabName
```
**Function Description**
- Display basic information of the column, including type and compression method
## Compatibility
- Fully compatible with existing data
- Does not support rollback

@ -0,0 +1,89 @@
---
title: Compression Algorithms
description: How TDengine compresses data
---
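# Configurable Storage Compression
## Compression Terminology Definition
### Compression Level Definition
- Level 1 compression: encoding the data, which is itself a form of compression.
- Level 2 compression: compressing data blocks.
### Compression Algorithm Level
In this document, "compression level" refers specifically to the level inside a level 2 compression algorithm. zstd, for example, offers at least 8 levels, each a different tradeoff between compression ratio, compression speed, and decompression speed. To keep the choice simple, these are reduced to three levels:
- high: The highest compression ratio, with relatively the slowest compression and decompression.
- low: The fastest compression and decompression, with relatively the lowest compression ratio.
- medium: A balance between compression ratio, compression speed, and decompression speed.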
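### Compression Algorithm List
- Encoding algorithm list (Level 1 compression): simple8b, bit-packing, delta-i, delta-d, disabled
- Compression algorithm list (Level 2 compression): lz4, zlib, zstd, tsz, xz, disabled
- Available and default algorithms for each data type:

| Data Type | Optional Encoding Algorithms | Default Encoding Algorithm | Optional Compression Algorithms | Default Compression Algorithm | Default Compression Level |
| :-----------: | :----------: | :-------: | :-------: | :----------: | :----: |
| tinyint/utinyint/smallint/usmallint/int/uint | simple8b | simple8b | lz4/zlib/zstd/xz | lz4 | medium |
| bigint/ubigint/timestamp | simple8b/delta-i | delta-i | lz4/zlib/zstd/xz | lz4 | medium |
| float/double | delta-d | delta-d | lz4/zlib/zstd/xz/tsz | tsz | medium |
| binary/nchar | disabled | disabled | lz4/zlib/zstd/xz | lz4 | medium |
| bool | bit-packing | bit-packing | lz4/zlib/zstd/xz | lz4 | medium |

Note: For floating-point types configured with tsz, the precision is determined by the global configuration of taosd. If tsz is configured but the lossy compression flag is not set, lz4 is used for compression instead.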
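## SQL Syntax
### Specify the compression method when creating a table
```
CREATE TABLE [dbname.]tabname (colName colType [ENCODE 'encode_type'] [COMPRESS 'compress_type' [LEVEL 'level']] [, other create_definition]...)
```
**Parameter Description**
- tabname: Name of the super table or ordinary table
- encode_type: Level 1 compression (encoding) algorithm; see the list above for valid values
- compress_type: Level 2 compression algorithm; see the list above for valid values
- level: The level of the level 2 compression algorithm; defaults to medium and can be abbreviated as 'h'/'l'/'m'
**Function Description**
- Specify the compression method for the column when creating a table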
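### Change the compression method of the column
```
ALTER TABLE [db_name.]tabName MODIFY COLUMN colName [ENCODE 'encode_type'] [COMPRESS 'compress_type'] [LEVEL 'level']
```
**Parameter Description**
- tabName: Table name, can be a super table or an ordinary table
- colName: The column whose compression algorithm is to be changed; it must be a normal column
**Function Description**
- Change the compression method of the column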
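### View the compression method of the column
```
DESCRIBE [dbname.]tabName
```
**Function Description**
- Display basic information of the column, including type and compression method
## Compatibility
- Fully compatible with existing data
- Does not support rollback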