docs: [TS-4897] Add en docs for virtual table. (#30404)

This commit is contained in:
Jing Sima 2025-03-24 21:31:05 +08:00 committed by GitHub
parent 2502086b71
commit fb71cc8be7
No known key found for this signature in database
GPG Key ID: B5690EEEBB952194
9 changed files with 463 additions and 5 deletions

View File

@ -6,6 +6,8 @@ slug: /basic-features/data-model
import Image from '@theme/IdealImage';
import dataModel from '../assets/data-model-01.png';
import origintable from '../assets/data-model-origin-table.png';
import origintable2 from '../assets/data-model-origin-table-2.png';
To clearly explain the concepts of time-series data and facilitate the writing of example programs, the TDengine documentation uses smart meters as an example. These example smart meters can collect three metrics: current, voltage, and phase. In addition, each smart meter also has two static attributes: location and group ID. The data collected by these smart meters is shown in the table below.
@ -79,6 +81,19 @@ To better understand the relationship between metrics, tags, supertables, and su
<figcaption>Figure 1. The TDengine data model</figcaption>
</figure>
### Virtual Tables
The design of "one table per data collection point" and "supertables" addresses most challenges in time-series data management and analysis for industrial and IoT scenarios. However, in real-world scenarios, a single device often has multiple sensors with varying collection frequencies. For example, a wind turbine may have electrical parameters, environmental parameters, and mechanical parameters, each collected by different sensors at different intervals. This makes it difficult to describe a device with a single table, often requiring multiple tables. When analyzing data across multiple sensors, multi-level join queries become necessary, which can lead to usability and performance issues. From a user perspective, "one table per device" is more intuitive. However, directly implementing this model would result in excessive NULL values at each timestamp due to varying collection frequencies, reducing storage and query efficiency.
To resolve this, TDengine introduces **Virtual Tables** (VTables). A virtual table is a logical entity that does not store physical data but enables analytical computations by dynamically combining columns from multiple source tables (subtables or regular tables). Like physical tables, virtual tables can be categorized into **virtual supertables**, **virtual subtables**, and **virtual regular tables**. A virtual supertable can represent a complete dataset for a device or group of devices, while each virtual subtable can flexibly reference columns from different sources. This allows users to define custom data views tailored to specific analytical needs, achieving a "personalized schema per user" effect. Virtual tables cannot be written to or deleted from but are queried like physical tables. The key distinction is that virtual table data is dynamically generated during queries—only columns referenced in a query are merged into the virtual table. Thus, the same virtual table may present entirely different datasets across different queries.
**Key Features of Virtual Supertables:**
1. **Column Selection & Merging**: Users can select specific columns from multiple source tables and combine them into a unified view.
2. **Timestamp-Based Alignment**: Data is aligned by timestamp. If multiple tables have data at the same timestamp, their column values are merged into a single row. Missing values are filled with NULL.
3. **Dynamic Updates**: Virtual tables automatically reflect changes in source tables, ensuring real-time data without physical storage.
By introducing virtual tables, TDengine simplifies the management of complex device data. Regardless of how individual collection points are modeled (single-column or multi-column) or distributed across databases/tables, users can freely define data sources through virtual supertables. This enables cross-collection-point aggregation and analysis, making "one table per device" a practical reality.
### Database
A database in TDengine is used to manage a collection of tables. TDengine allows a running instance to contain multiple databases, and each database can be configured with different storage strategies. Since different types of data collection points usually have different data characteristics, such as data collection frequency, data retention period, number of replicas, data block size, etc., it is recommended to create supertables with different data characteristics in different databases.
@ -226,3 +241,174 @@ TDengine supports flexible data model designs, including multi-column and single
Although TDengine recommends using the multi-column model because it generally offers better writing and storage efficiency, the single-column model might be more suitable in certain specific scenarios. For example, if the types of quantities collected at a data collection point frequently change, using a multi-column model would require frequent modifications to the supertable's structural definition, increasing the complexity of the application. In such cases, using a single-column model can simplify the design and management of the application, as it allows independent management and expansion of each physical quantity's supertable.
Overall, TDengine offers flexible data model options, allowing users to choose the most suitable model based on actual needs and scenarios to optimize performance and manage complexity.
### Creating Virtual Tables
Whether using single-column or multi-column models, TDengine enables cross-table operations through virtual tables. Using smart meters as an example, here we introduce two typical use cases for virtual tables:
1. Single-Source Multi-Dimensional Time-Series Aggregation
2. Cross-Source Metric Comparative Analysis
---
#### 1. Single-Source Multi-Dimensional Time-Series Aggregation
In this scenario, "single-source" refers to multiple **single-column time-series tables** from the **same data collection point**. While these tables are physically split due to business requirements or constraints, they maintain logical consistency through device tags and timestamps. Virtual tables restore "vertically" split data into a complete "horizontal" view of the collection point.
For example, Suppose three supertables are created for current, voltage, and phase measurements using a single-column model. Virtual tables can aggregate these three measurements into one unified view.
The SQL statement for creating a supertable in the single-column model is as follows:
```sql
CREATE STABLE current_stb (
ts timestamp,
current float
) TAGS (
device_id varchar(64),
location varchar(64),
group_id int
);
CREATE STABLE voltage_stb (
ts timestamp,
voltage int
) TAGS (
device_id varchar(64),
location varchar(64),
group_id int
);
CREATE STABLE phase_stb (
ts timestamp,
phase float
) TAGS (
device_id varchar(64),
location varchar(64),
group_id int
);
```
Assume there are four devices: d1001, d1002, d1003, and d1004. To create subtables for their current, voltage, and phase measurements, use the following SQL statements:
```sql
create table current_d1001 using current_stb(deviceid, location, group_id) tags("d1001", "California.SanFrancisco", 2);
create table current_d1002 using current_stb(deviceid, location, group_id) tags("d1002", "California.SanFrancisco", 3);
create table current_d1003 using current_stb(deviceid, location, group_id) tags("d1003", "California.LosAngeles", 3);
create table current_d1004 using current_stb(deviceid, location, group_id) tags("d1004", "California.LosAngeles", 2);
create table voltage_d1001 using voltage_stb(deviceid, location, group_id) tags("d1001", "California.SanFrancisco", 2);
create table voltage_d1002 using voltage_stb(deviceid, location, group_id) tags("d1002", "California.SanFrancisco", 3);
create table voltage_d1003 using voltage_stb(deviceid, location, group_id) tags("d1003", "California.LosAngeles", 3);
create table voltage_d1004 using voltage_stb(deviceid, location, group_id) tags("d1004", "California.LosAngeles", 2);
create table phase_d1001 using phase_stb(deviceid, location, group_id) tags("d1001", "California.SanFrancisco", 2);
create table phase_d1002 using phase_stb(deviceid, location, group_id) tags("d1002", "California.SanFrancisco", 3);
create table phase_d1003 using phase_stb(deviceid, location, group_id) tags("d1003", "California.LosAngeles", 3);
create table phase_d1004 using phase_stb(deviceid, location, group_id) tags("d1004", "California.LosAngeles", 2);
```
A virtual supertable can be used to aggregate these three types of measurements into a single table. The SQL statement to create the virtual supertable is as follows:
```sql
CREATE STABLE meters_v (
ts timestamp,
current float,
voltage int,
phase float
) TAGS (
location varchar(64),
group_id int
) VIRTUAL 1;
```
For the four devices d1001, d1002, d1003, and d1004, create virtual subtables with the following SQL statements:
```sql
CREATE VTABLE d1001_v (
current from current_d1001.current,
voltage from voltage_d1001.voltage,
phase from phase_d1001.phase
)
USING meters_v
TAGS (
"California.SanFrancisco",
2
);
CREATE VTABLE d1002_v (
current from current_d1002.current,
voltage from voltage_d1002.voltage,
phase from phase_d1002.phase
)
USING meters_v
TAGS (
"California.SanFrancisco",
3
);
CREATE VTABLE d1003_v (
current from current_d1003.current,
voltage from voltage_d1003.voltage,
phase from phase_d1003.phase
)
USING meters_v
TAGS (
"California.LosAngeles",
3
);
CREATE VTABLE d1004_v (
current from current_d1004.current,
voltage from voltage_d1004.voltage,
phase from phase_d1004.phase
)
USING meters_v
TAGS (
"California.LosAngeles",
2
);
```
Taking device d1001 as an example, assume that the current, voltage, and phase data of device d1001 are as follows:
<img src={origintable} width="500" alt="data-model-origin-table" />
| Timestamp | Current | Voltage | Phase |
|-------------------|---------|---------|-------|
| 1538548685000 | 10.3 | 219 | 0.31 |
| 1538548695000 | 12.6 | 218 | 0.33 |
| 1538548696800 | 12.3 | 221 | 0.31 |
| 1538548697100 | 12.1 | 220 | NULL |
| 1538548697200 | NULL | NULL | 0.32 |
| 1538548697700 | 11.8 | NULL | NULL |
| 1538548697800 | NULL | 222 | 0.33 |
---
#### 2. Cross-Source Metric Comparative Analysis
In this scenario, "cross-source" refers to data from **different data collection points**. Virtual tables align and merge semantically comparable measurements from multiple devices for comparative analysis.
For example, Compare current measurements across devices `d1001`, `d1002`, `d1003`, and `d1004`. The SQL statement to create the virtual table is as follows:
```sql
CREATE VTABLE current_v (
ts TIMESTAMP,
d1001_current FLOAT FROM current_d1001.current,
d1002_current FLOAT FROM current_d1002.current,
d1003_current FLOAT FROM current_d1003.current,
d1004_current FLOAT FROM current_d1004.current
);
```
Assume that the current data of devices d1001, d1002, d1003, and d1004 are as follows:
<img src={origintable2} width="500" alt="data-model-origin-table-2" />
The virtual table `current_v` aligns current data by timestamp:
| Timestamp | d1001_current | d1002_current | d1003_current | d1004_current |
|-------------------|---------------|---------------|---------------|---------------|
| 1538548685000 | 10.3 | 11.7 | 11.2 | 12.4 |
| 1538548695000 | 12.6 | 11.9 | 10.8 | 11.3 |
| 1538548696800 | 12.3 | 12.4 | 12.3 | 10.1 |
| 1538548697100 | 12.1 | NULL | 11.1 | NULL |
| 1538548697200 | NULL | 12.2 | NULL | 11.7 |
| 1538548697700 | 11.8 | 11.4 | NULL | NULL |
| 1538548697800 | NULL | NULL | 12.1 | 12.6 |

View File

@ -0,0 +1,272 @@
---
sidebar_label: Virtual Tables
title: Virtual Tables
description: Various management operations for virtual tables
---
import origintable from './assets/virtual-table-origin-table.png';
import queryres from './assets/virtual-table-query-res.png';
import partres from './assets/virtual-table-query-res-part.png';
## Create Virtual Table
The `CREATE VTABLE` statement is used to create virtual basic tables and virtual subtables using virtual supertables as templates.
### Create Virtual Supertables
Refer to the `VIRTUAL` parameter in [Create Supertable](./04-stable.md#create-supertable).
### Create Virtual Basic Table
```sql
CREATE VTABLE [IF NOT EXISTS] [db_name].vtb_name
ts_col_name timestamp,
(create_definition[ ,create_definition] ...)
create_definition:
vtb_col_name column_definition
column_definition:
type_name [FROM [db_name.]table_name.col_name]
```
### Create Virtual Subtable
```sql
CREATE VTABLE [IF NOT EXISTS] [db_name].vtb_name
(create_definition[ ,create_definition] ...)
USING [db_name.]stb_name
[(tag_name [, tag_name] ...)]
TAGS (tag_value [, tag_value] ...)
create_definition:
[stb_col_name FROM] [db_name.]table_name.col_name
tag_value:
const_value
```
**Usage Notes**
1. Naming rules for virtual tables/columns follow [Name Rules](./19-limit.md#naming-rules).
2. Maximum table name length: 192 characters.
3. The first column must be TIMESTAMP and is automatically set as primary key.
4. Row length cannot exceed 64KB (Note: VARCHAR/NCHAR/GEOMETRY columns consume 2 extra bytes each).
5. Specify maximum length for VARCHAR/NCHAR/GEOMETRY types (e.g., VARCHAR(20)).
6. Use `FROM` to specify column data sources. Cross-database sources are supported via `db_name`.
7. The timestamp column (ts) values of virtual table are merged results from all involved tables' timestamp primary keys during queries.
8. Virtual supertables only support creating virtual subtables, virtual subtables can only use virtual supertables as template.
9. Ensure virtual tables' column/tag data types match their source columns/tags.
10. Virtual table names must be unique within a database and cannot conflict with table names, and it is recommended that view names do not duplicate virtual table names (not enforced). When a view and a virtual table have the same name, operations such as writing, querying, granting, and revoking permissions prioritize the virtual table with the same name. .
11. When creating virtual subtables/basic tables, `FROM` columns must originate from basic tables/subtables (not supertables, views, or other virtual tables).
## Query Virtual Tables
Virtual tables use the same query syntax as regular tables, but their dataset may vary between queries based on data alignment rules.
### Data Alignment Rules
1. Align data from multiple source tables by timestamp.
2. Combine columns with same timestamp into one row; missing values fill with NULL.
3. Virtual table timestamps are the union of all involved columns' origin tables' timestamps. Therefore, the number of rows in the result set may vary when different queries select different columns.
4. Users can combine any columns from multiple tables; unselected columns are excluded.
**Example**
Given tables t1, t2, t3 with data:
<img src={origintable} width="500" alt="Original Table Structure and Data" />
Create a virtual table v1:
```sql
CREATE VTABLE v1 (
ts timestamp,
c1 int FROM t1.value,
c2 int FROM t2.value,
c3 int FROM t3.value1,
c4 int FROM t3.value2);
```
Querying all columns:
```sql
SELECT * FROM v1;
```
Result:
<img src={queryres} width="200" alt="Full Query Result" />
Partial column query:
```sql
SELECT c1, c2 FROM v1;
```
Result:
<img src={partres} width="200" alt="Partial Query Result" />
Since the original tables t1 and t2 (corresponding to columns c1 and c2) lack the timestamp 0:00:03, this timestamp will not appear in the final result.
**Limitations**
1. Querying virtual supertables does not support subtables from different databases.
## Modify Virtual Basic Tables
```sql
ALTER VTABLE [db_name.]vtb_name alter_table_clause
alter_table_clause: {
ADD COLUMN vtb_col_name vtb_column_type [FROM table_name.col_name]
| DROP COLUMN vtb_col_name
| ALTER COLUMN vtb_col_name SET {table_name.col_name | NULL }
| MODIFY COLUMN col_name column_type
| RENAME COLUMN old_col_name new_col_name
}
```
### Add Column
```sql
ALTER VTABLE vtb_name ADD COLUMN vtb_col_name vtb_col_type [FROM [db_name].table_name.col_name]
```
### Drop Column
```sql
ALTER VTABLE vtb_name DROP COLUMN vtb_col_name
```
### Modify Column Width
```sql
ALTER VTABLE vtb_name MODIFY COLUMN vtb_col_name data_type(length);
```
### Rename Column
```sql
ALTER VTABLE vtb_name RENAME COLUMN old_col_name new_col_name
```
### Change Column Source
```sql
ALTER VTABLE vtb_name ALTER COLUMN vtb_col_name SET {[db_name.]table_name.col_name | NULL}
```
## Modify Virtual Subtables
```sql
ALTER VTABLE [db_name.]vtb_name alter_table_clause
alter_table_clause: {
ALTER COLUMN vtb_col_name SET table_name.col_name
| SET TAG tag_name = new_tag_value
}
```
### Modify Subtable Tag Value
```sql
ALTER VTABLE tb_name SET TAG tag_name1=new_tag_value1, tag_name2=new_tag_value2 ...;
```
### Change Column Source
```sql
ALTER VTABLE vtb_name ALTER COLUMN vtb_col_name SET {[db_name.]table_name.col_name | NULL}
```
## Drop Virtual Tables
```sql
DROP VTABLE [IF EXISTS] [dbname].vtb_name;
```
## View Virtual Table Information
### List Virtual Tables
```sql
SHOW [NORMAL | CHILD] [db_name.]VTABLES [LIKE 'pattern'];
```
### Show Creation Statement
```sql
SHOW CREATE VTABLE [db_name.]vtable_name;
```
### Describe Structure
```sql
DESCRIBE [db_name.]vtb_name;
```
### Query All Virtual Tables' Information
```sql
SELECT ... FROM information_schema.ins_tables WHERE type = 'VIRTUAL_NORMAL_TABLE' OR type = 'VIRTUAL_CHILD_TABLE';
```
## Write to Virtual Tables
Writing or deleting data in virtual tables is **not supported**. Virtual tables are logical views computed from source tables.
## Virtual Tables vs. Views
| Property | Virtual Table | View |
|-----------------------|-----------------------------------|-------------------------------|
| **Definition** | Dynamic structure combining multiple tables by timestamp. | Saved SQL query definition. |
| **Data Source** | Multiple tables with timestamp alignment. | Single/multiple table query results. |
| **Storage** | No physical storage; dynamic generation. | No storage; query logic only. |
| **Timestamp Handling**| Aligns timestamps across tables. | Follows query logic. |
| **Update Mechanism** | Real-time reflection of source changes. | Depends on query execution. |
| **Special Features** | Supports NULL filling and interpolation (prev/next/linear). | No built-in interpolation. |
| **Use Case** | Time series alignment, cross-table analysis. | Simplify complex queries, access control. |
| **Performance** | Potentially higher complexity. | Similar to underlying queries. |
Mutual conversion between virtual tables and views is not supported. For example, you cannot create a view based on a virtual table or create a virtual table from a view.
## Permissions
Virtual table permissions are categorized into READ and WRITE. Query operations require READ permission, while operations to delete or modify the virtual table itself require WRITE permission.
### Syntax
#### Grant
```sql
GRANT privileges ON [db_name.]vtable_name TO user_name
privileges: { ALL | READ | WRITE }
```
#### Revoke
```sql
REVOKE privileges ON [db_name.]vtable_name FROM user_name
privileges: { ALL | READ | WRITE }
```
### Permission Rules
1. The creator of a virtual table and the root user have all permissions by default.
2. Users can grant or revoke read/write permissions for specific virtual tables (including virtual supertables and virtual regular tables) via `dbname.vtbname`. Direct permission operations on virtual subtables are not supported.
3. Virtual subtables and virtual supertables do not support tag-based authorization (table-level authorization). Virtual subtables inherit permissions from their virtual supertables.
4. Granting and revoking permissions for other users must be performed through `GRANT` and `REVOKE` statements, and only the root user can execute these operations.
5. The detailed permission control rules are summarized below:
| No. | Operation | Permission Requirements |
|-----|--------------------------|----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|
| 1 | CREATE VTABLE | The user has **WRITE** permission on the database to which the virtual table belongs, and <br />the user has **READ** permission on the source tables corresponding to the virtual table's data sources. |
| 2 | DROP/ALTER VTABLE | The user has **WRITE** permission on the virtual table. If specifying a column's data source, the user must also have **READ** permission on the source table corresponding to that column. |
| 3 | SHOW VTABLES | None |
| 4 | SHOW CREATE VTABLE | None |
| 5 | DESCRIBE VTABLE | None |
| 6 | Query System Tables | None |
| 7 | SELECT FROM VTABLE | The user has **READ** permission on the virtual table. |
| 8 | GRANT/REVOKE | Only the **root user** has permission. |
## Use Cases
| SQL Query | SQL Write | STMT Query | STMT Write | Subscribe | Stream Compute |
|----------|-----------|------------|------------|-----------|----------------|
| Supported | Not Supported | Not Supported | Not Supported | Not Supported | Supported |

Binary file not shown.

After

Width:  |  Height:  |  Size: 7.8 KiB

Binary file not shown.

After

Width:  |  Height:  |  Size: 3.1 KiB

Binary file not shown.

After

Width:  |  Height:  |  Size: 4.9 KiB

Binary file not shown.

After

Width:  |  Height:  |  Size: 55 KiB

Binary file not shown.

After

Width:  |  Height:  |  Size: 53 KiB

View File

@ -357,7 +357,7 @@ TAGS (
以设备 d1001 为例,假设 d1001 设备的电流、电压、相位数据如下:
<img src={origintable} width="70%" alt="data-model-origin-table" />
<img src={origintable} width="500" alt="data-model-origin-table" />
虚拟表 d1001_v 中的数据如下:
@ -390,7 +390,7 @@ CREATE VTABLE current_v (
假设 d1001, d1002, d1003, d1004 四个设备的电流数据如下:
<img src={origintable2} width="70%" alt="data-model-origin-table-2" />
<img src={origintable2} width="500" alt="data-model-origin-table-2" />
虚拟表 current_v 中的数据如下:

View File

@ -73,7 +73,7 @@ CREATE VTABLE [IF NOT EXISTS] [db_name].vtb_name
假设有表 t1、t2、t3 结构和数据如下:
<img src={origintable} width="100%" alt="origintable" />
<img src={origintable} width="500" alt="origintable" />
并且有虚拟普通表 v1 ,创建方式如下:
@ -94,7 +94,7 @@ select * from v1;
结果如下:
<img src={queryres} width="100%" alt="queryres" />
<img src={queryres} width="200" alt="queryres" />
如果没有选择全部列,只是选择了部分列,查询的结果只会包含选择的列的原始表的时间戳,例如执行如下查询:
@ -104,7 +104,7 @@ select c1, c2 from v1;
得到的结果如下图所示:
<img src={partres} width="100%" alt="partres" />
<img src={partres} width="200" alt="partres" />
因为 c1、c2 列对应的原始表 t1、t2 中没有 0:00:03 这个时间戳,所以最后的结果也不会包含这个时间戳。