From c75a787ba61d599c8a916b41deead2c609d3a91f Mon Sep 17 00:00:00 2001 From: dapan1121 Date: Wed, 24 Apr 2024 11:20:55 +0800 Subject: [PATCH] docs: add join description page --- docs/en/12-taos-sql/30-join.md | 240 ++++++++++++++++++--------------- docs/zh/12-taos-sql/30-join.md | 78 ++++++----- 2 files changed, 174 insertions(+), 144 deletions(-) diff --git a/docs/en/12-taos-sql/30-join.md b/docs/en/12-taos-sql/30-join.md index 91a400bf2a..532bd5b1be 100755 --- a/docs/en/12-taos-sql/30-join.md +++ b/docs/en/12-taos-sql/30-join.md @@ -4,19 +4,20 @@ title: JOIN description: JOIN Description --- + ## Join Concept -### Driving table +### Driving Table -The table used for driving JOIN queries which is the left table in the Left Join series and the right table in the Right Join series. +The table used for driving Join queries which is the left table in the Left Join series and the right table in the Right Join series. ### Join Conditions -Join conditions refer to the conditions specified for JOIN operation. All JOIN queries supported by TDengine require specifying join conditions. Join conditions usually only appear in `ON` (except for Inner Join and Window Join). For Inner Join, conditions that appear in `WHERE` can also be regarded as join conditions. For Window Join join conditions are specified in `WINDOW_OFFSET`. +Join conditions refer to the conditions specified for join operation. All join queries supported by TDengine require specifying join conditions. Join conditions usually only appear in `ON` (except for Inner Join and Window Join). For Inner Join, conditions that appear in `WHERE` can also be regarded as join conditions. For Window Join join conditions are specified in `WINDOW_OFFSET` clause. Except for ASOF Join, all join types supported by TDengine must explicitly specify join conditions. 
Since ASOF Join has implicit join conditions defined by default, it is not necessary to explicitly specify the join conditions (if the default conditions meet the requirements). -Except for ASOF/Window Join, the join condition can include not only the primary join condition(refer below), but also any number of other join conditions. The primary join condition must have an `AND` relationship with the other join conditions, while there is no such restriction between the other join conditions. The other join conditions can include any logical operation combination of primary key columns, TAG, normal columns, constants, and their scalar functions or operations. +Except for ASOF/Window Join, the join condition can include not only the primary join condition(refer below), but also any number of other join conditions. The primary join condition must have an `AND` relationship with the other join conditions, while there is no such restriction between the other join conditions. The other join conditions can include any logical operation combination of primary key columns, Tag columns, normal columns, constants, and their scalar functions or operations. Taking smart meters as an example, the following SQL statements all contain valid join conditions: @@ -30,9 +31,9 @@ SELECT a.* FROM meters a LEFT ASOF JOIN meters b ON timetruncate(a.ts, 1s) < tim ### Primary Join Condition -As a time series database, all join queries of TDengine revolve around the primary timestamp column, so all join queries except ASOF/Window Join are required to contain equivalent join condition of the primary key column. The equivalent join condition of the primary key column that first appear in the join conditions in order will be used as the primary join condition. The primary join condition of ASOF Join can contain non-equivalent join condition, for Window Join the primary join condition is specified by `WINDOW_OFFSET`. 
+As a time series database, all join queries of TDengine revolve around the primary timestamp column, so all join queries except ASOF/Window Join are required to contain equivalent join condition of the primary key column. The equivalent join condition of the primary key column that first appear in the join conditions in order will be used as the primary join condition. The primary join condition of ASOF Join can contain non-equivalent join condition, for Window Join the primary join condition is specified by `WINDOW_OFFSET` clause. -Except for Window Join, TDengine supports performing timetruncate function operation in the primary join condition, e.g. `ON timetruncate (a.ts, 1s) = timetruncate (b.ts, 1s)`. Other functions and scalar operations are not currently supported. +Except for Window Join, TDengine supports performing `timetruncate` function operation in the primary join condition, e.g. `ON timetruncate(a.ts, 1s) = timetruncate(b.ts, 1s)`. Other functions and scalar operations to primary key column are not currently supported in the primary join condition. ### Grouping Conditions @@ -41,15 +42,26 @@ ASOF/Window Join supports grouping the input data of join queries, and then perf ### Primary Key Timeline -TDengine, as a time series database, requires that each table must have a primary key timestamp column, which will perform many time-related operations as the primary key timeline of the table. It is also necessary to clarify which column will be regarded as the primary key timeline for subsequent time-related operations in the results of subqueries or Join operations. In subqueries, the ordered first occurrence of the primary key column (or its operation) or the pseudo-column equivalent to the primary key column (_wstart/_wend) in the query results will be regarded as the primary key timeline of the output table. 
The selection of the primary key timeline in the Join output results follows the following rules: +TDengine, as a time series database, requires that each table must have a primary key timestamp column, which will perform many time-related operations as the primary key timeline of the table. It is also necessary to clarify which column will be regarded as the primary key timeline for subsequent time-related operations in the results of subqueries or join operations. In subqueries, the ordered first occurrence of the primary key column (or its operation) or the pseudo-column equivalent to the primary key column (`_wstart`/`_wend`) in the query results will be regarded as the primary key timeline of the output table. The selection of the primary key timeline in the join output results follows the following rules: -- The primary key column of the driving table (subquery) in the Left/Right Join series will be used as the primary key timeline for subsequent queries. In addition, in the Window Join window, because the left and right tables are ordered at the same time, the primary key column of any table can be used as the primary key timeline in the window, and the primary key column of current table is preferentially selected as the primary key timeline. +- The primary key column of the driving table (subquery) in the Left/Right Join series will be used as the primary key timeline for subsequent queries. In addition, in each Window Join window, because the left and right tables are ordered at the same time, the primary key column of any table can be used as the primary key timeline in the window, and the primary key column of current table is preferentially selected as the primary key timeline. - The primary key column of any table in Inner Join can be treated as the primary key timeline. 
When there are similar grouping conditions (equivalent conditions of TAG columns and `AND` relationship with the primary join condition), there will be no available primary key timeline. -- Full Join will not result in any primary key timeline because it cannot generate any valid primary key time series, so no timeline-related operations cannot be performed in Full Join. +- Full Join will not result in any primary key timeline because it cannot generate any valid primary key time series, so no timeline-related operations can be performed in or after a Full Join. +## Syntax Conventions +Because we will introduce the Left/Right Join series simultaneously through sharing below, the introductions of Left/Right Outer, Semi, Anti-Semi, ASOF, and Window series Joins will all use a similar "left/right" approach to introduce Left/Right Join simultaneously. Here is a brief introduction to the meaning of this writing method. The words written before "/" are the words applied to Left Join, and the words written after "/" are the words applied to Right Join. + +For example: + +The phrase "left/right table" means "left table" for Left Join and "right table" for Right Join. + +Similarly, + +The phrase "right/left table" means "right table" for Left Join and "left table" for Right Join. + ## Join Function ### Inner Join @@ -70,13 +82,13 @@ Cartesian product set of left and right table row data that meets the join condi Inner Join are supported between super tables, normal tables, child tables, and subqueries. #### Notes -- For the first type syntax, the `INNER` keyword is optional. The primary join condition and other join conditions can be specified in `ON` and/or `WHERE`, and filters can also be specified in `WHERE`. At least one of `ON/WHERE` must be specified. +- For the first type syntax, the `INNER` keyword is optional. The primary join condition and other join conditions can be specified in `ON` and/or `WHERE`, and filters can also be specified in `WHERE`. 
At least one of `ON`/`WHERE` must be specified. - For the second type syntax, all primary join condition, other join conditions, and filters can be specified in `WHERE`. -- When performing Inner Join on the super table, the Tag column equivalent conditions with the `AND` relationship of the primary join condition will be used as a similar grouping condition, so the output result cannot remain ordered. +- When performing Inner Join on the super table, the Tag column equivalent conditions with the `AND` relationship of the primary join condition will be used as a similar grouping condition, so the output result cannot remain ordered by time. #### Examples -The timestamp when the voltage is greater than 220V occurs simultaneously in Table d1001 and Table d1002 and their respective voltage values: +The timestamp when the voltage is greater than 220V occurs simultaneously in table d1001 and table d1002 and their respective voltage values: ```sql SELECT a.ts, a.voltage, b.voltage FROM d1001 a JOIN d1002 b ON a.ts = b.ts and a.voltage > 220 and b.voltage > 220 ``` @@ -84,199 +96,203 @@ SELECT a.ts, a.voltage, b.voltage FROM d1001 a JOIN d1002 b ON a.ts = b.ts and a ### Left/Right Outer Join -#### 含义 -左/右(外)连接 - 既包含左右表同时符合连接条件的数据集合,也包括左/右表中不符合连接条件的数据集合。 -#### 语法 +#### Meaning +It returns data sets that meet the join conditions for both left and right tables, as well as data sets that do not meet the join conditions in the left/right tables. + +#### Grammar ```sql SELECT ... FROM table_name1 LEFT|RIGHT [OUTER] JOIN table_name2 ON ... [WHERE ...] [...] ``` -#### 结果集 -Inner Join 的结果集 + 左/右表中不符合连接条件的行和右/左表的空数据(NULL)组成的行数据集合。 +#### Result set +The result set of Inner Join + rows in the left/right table that do not meet the join conditions combined with null data (`NULL`) in the right/left table. -#### 适用范围 -支持超级表、普通表、子表、子查询间 Left/Right Join。 +#### Scope +Left/Right Outer Join is supported between super tables, normal tables, child tables, and subqueries. 
-#### 说明 -- OUTER 关键字可选。 +#### Notes +- the `OUTER` keyword is optional. -#### 示例 +#### Examples -表 d1001 所有时刻的电压值以及和表 d1002 中同时出现电压大于 220V 的时刻及各自的电压值: +Timestamp and voltage values at all times in table d1001 and the timestamp when the voltage is greater than 220V occurs simultaneously in table d1001 and table d1002 and their respective voltage values: ```sql SELECT a.ts, a.voltage, b.voltage FROM d1001 a LEFT JOIN d1002 b ON a.ts = b.ts and a.voltage > 220 and b.voltage > 220 ``` ### Left/Right Semi Join -#### 含义 -左/右半连接 - 通常表达的是 IN/EXISTS 的含义,即对左/右表任意一条数据来说,只有当右/左表中存在任一符合连接条件的数据时才返回左/右表行数据。 +#### Meaning +It usually expresses the meaning of `IN`/`EXISTS`, which means that for any data in the left/right table, only when there is any row data in the right/left table that meets the join conditions, will the left/right table row data be returned. -#### 语法 +#### Grammar ```sql SELECT ... FROM table_name1 LEFT|RIGHT SEMI JOIN table_name2 ON ... [WHERE ...] [...] ``` -#### 结果集 -左/右表中符合连接条件的行和右/左表任一符合连接条件的行组成的行数据集合。 +#### Result set +The row data set composed of rows that meet the join conditions in the left/right table and any one row that meets the join conditions in the right/left table. -#### 适用范围 -支持超级表、普通表、子表、子查询间 Left/Right Semi Join。 +#### Scope +Left/Right Semi Join are supported between super tables, normal tables, child tables, and subqueries. -#### 示例 +#### Examples -表 d1001 中出现电压大于 220V 且存在其他电表同一时刻电压也大于 220V 的时间: +The timestamp when the voltage in table d1001 is greater than 220V and there are other meters with voltages greater than 220V at the same time: ```sql SELECT a.ts FROM d1001 a LEFT SEMI JOIN meters b ON a.ts = b.ts and a.voltage > 220 and b.voltage > 220 and b.tbname != 'd1001' ``` ### Left/Right Anti-Semi Join -#### 含义 -左/右反连接 - 同左/右半连接的逻辑正好相反,通常表达的是 NOT IN/NOT EXISTS 的含义,即对左/右表任意一条数据来说,只有当右/左表中不存在任何符合连接条件的数据时才返回左/右表行数据。 +#### Meaning +Opposite meaning to the Left/Right Semi Join. 
It usually expresses the meaning of `NOT IN`/`NOT EXISTS`, that is, a row in the left/right table is returned only when no row in the right/left table meets the join conditions. -#### 语法 +#### Grammar ```sql SELECT ... FROM table_name1 LEFT|RIGHT ANTI JOIN table_name2 ON ... [WHERE ...] [...] ``` -#### 结果集 -左表中不符合连接条件的行和右表的空数据(NULL)组成的行数据集合。 +#### Result set +A collection of rows in the left/right table that do not meet the join conditions and null data (`NULL`) in the right/left table. -#### 适用范围 -支持超级表、普通表、子表、子查询间 Left Anti-Semi Join。 +#### Scope +Left/Right Anti-Semi Join is supported between super tables, normal tables, child tables, and subqueries. -#### 示例 +#### Examples -表 d1001 中出现电压大于 220V 且不存在其他电表同一时刻电压也大于 220V 的时间: +The timestamp when the voltage in table d1001 is greater than 220V and there are no other meters with voltages greater than 220V at the same time: ```sql SELECT a.ts FROM d1001 a LEFT ANTI JOIN meters b ON a.ts = b.ts and b.voltage > 220 and b.tbname != 'd1001' WHERE a.voltage > 220 ``` ### left/Right ASOF Join -#### 含义 -左/右不完全匹配连接 - 不同于其他传统 Join 的完全匹配模式,ASOF Join 允许以指定的匹配模式进行不完全匹配,即按照主键时间戳最接近的方式进行匹配。 +#### Meaning +Unlike the exact matching of other traditional joins, ASOF Join allows inexact matching under a specified matching pattern, that is, matching the rows whose primary key timestamps are closest. -#### 语法 +#### Grammar ```sql SELECT ... FROM table_name1 LEFT|RIGHT ASOF JOIN table_name2 [ON ...] [JLIMIT jlimit_num] [WHERE ...] [...] 
``` -##### 结果集 -左/右表中每一行数据与右/左表中符合连接条件的按主键列排序后时间戳最接近的最多 jlimit_num 条数据或空数据(NULL)的笛卡尔积集合。 - -##### 适用范围 -支持超级表、普通表、子表间 Left/Right ASOF Join。 - -#### 说明 -- 只支持表间 ASOF Join,不支持子查询间 ASOF Join。 -- ON 子句中支持指定主键列或主键列的 timetruncate 函数运算(不支持其他标量运算及函数)后的单个匹配规则(主连接条件),支持的运算符及其含义如下: +##### Result set +The Cartesian product set of up to `jlimit_num` rows data or null data (`NULL`) closest to the timestamp of each row in the left/right table, ordered by primary key, that meets the join conditions in the right/left table. - | **运算符** | **Left ASOF 时含义** | +##### Scope +Left/Right ASOF Join are supported between super tables, normal tables, child tables. + +#### Notes +- Only supports ASOF Join between tables, not between subqueries. +- The `ON` clause supports a single matching rule (primary join condition) with the primary key column or the timetruncate function operation of the primary key column (other scalar operations and functions are not supported). The supported operators and their meanings are as follows: + + + | **Operator** | **Meaning for Left ASOF Join** | | :-------------: | ------------------------ | - | > | 匹配右表中主键时间戳小于左表主键时间戳且时间戳最接近的数据行 | - | >= | 匹配右表中主键时间戳小于等于左表主键时间戳且时间戳最接近的数据行 | - | = | 匹配右表中主键时间戳等于左表主键时间戳的行 | - | < | 匹配右表中主键时间戳大于左表主键时间戳且时间戳最接近的数据行 | - | <= | 匹配右表中主键时间戳大于等于左表主键时间戳且时间戳最接近的数据行 | + | > | Match rows in the right table whose primary key timestamp is less than and the most closed to the left table's primary key timestamp | + | >= | Match rows in the right table whose primary key timestamp is less than or equal to and the most closed to the left table's primary key timestamp | + | = | Match rows in the right table whose primary key timestamp is equal to the left table's primary key timestamp | + | < | Match rows in the right table whose the primary key timestamp is greater than and the most closed to the left table's primary key timestamp | + | <= | Match rows in the right table whose primary key timestamp is greater than or equal to and the most 
closed to the left table's primary key timestamp | - 对于 Right ASOF 来说,上述运算符含义正好相反。 + For Right ASOF Join, the above operators have the opposite meaning. -- 如果不含 ON 子句或 ON 子句中未指定主键列的匹配规则,则默认主键匹配规则运算符是 “>=”, 即(对 Left ASOF Join 来说)右表中主键时戳小于等于左表主键时戳的行数据。不支持多个主连接条件。 -- ON 子句中还可以指定除主键列外的 TAG、普通列(不支持标量函数及运算)之间的等值条件用于分组计算,除此之外不支持其他类型的条件。 -- 所有 ON 条件间只支持 AND 运算。 -- JLIMIT 用于指定单行匹配结果的最大行数,可选,未指定时默认值为1,即左/右表每行数据最多从右/左表中获得一行匹配结果。JLIMIT 取值范围为 [0, 1024]。符合匹配条件的 jlimit_num 条数据不要求时间戳相同,当右/左表中不存在满足条件的 jlimit_num 条数据时,返回的结果行数可能小于 jlimit_num;当右/左表中存在符合条件的多于 jlimit_num 条数据时,如果时间戳相同将随机返回 jlimit_num 条数据。 +- If there is no `ON` clause or no primary join condition is specified in the `ON` clause, the default primary join condition operator will be “>=”, that is, (for Left ASOF Join) matching rows in the right table whose primary key timestamp is less than or equal to the left table's primary key timestamp. Multiple primary join conditions are not supported. +- In the `ON` clause, except for the primary key column, equivalent conditions between Tag columns and ordinary columns (which do not support scalar functions and operations) can be specified for grouping calculations. Other types of conditions are not supported. +- Only `AND` operation is supported between all `ON` conditions. +- `JLIMIT` is used to specify the maximum number of rows for a single row match result. It's optional. The default value is 1 when not specified, which means that each row of data in the left/right table can obtain at most one row of matching results from the right/left table. The value range of `JLIMIT` is [0,1024]. All the `jlimit_num` rows data that meet the join conditions do not require the same timestamp. When there are not enough `jlimit_num` rows data that meet the conditions in the right/left table, the number of returned result rows may be less than `jlimit_num`. 
When there are more than `jlimit_num` rows data that meet the conditions in the right/left table and all their timestamps are the same, random `jlimit_num` rows data will be returned. -#### 示例 +#### Examples -表 d1001 电压值大于 220V 且表 d1002 中同一时刻或稍早前最后时刻出现电压大于 220V 的时间及各自的电压值: +The moment that voltage in table d1001 is greater than 220V and at the same time or at the last moment the voltage in table d1002 is also greater than 220V and their respective voltage values: ```sql SELECT a.ts, a.voltage, a.ts, b.voltage FROM d1001 a LEFT ASOF JOIN d1002 b ON a.ts >= b.ts where a.voltage > 220 and b.voltage > 220 ``` ### Left/Right Window Join -#### 含义 -左/右窗口连接 - 根据左/右表中每一行的主键时间戳和窗口边界构造窗口并据此进行窗口连接,支持窗口内进行投影、标量和聚合操作。 +#### Meaning +Construct windows based on the primary key timestamp of each row in the left/right table and the window boundary, and then perform window join accordingly, supporting projection, scalar, and aggregation operations within the window. -#### 语法 +#### Grammar ```sql SELECT ... FROM table_name1 LEFT|RIGHT WINDOW JOIN table_name2 [ON ...] WINDOW_OFFSET(start_offset, end_offset) [JLIMIT jlimit_num] [WHERE ...] [...] ``` -#### 结果集 -左/右表中每一行数据与右/左表中基于左/右表主键时戳列和 WINDOW_OFFSET 划分的窗口内的至多 jlimit_num 条数据或空数据(NULL)的笛卡尔积集合 或 -左/右表中每一行数据与右/左表中基于左/右表主键时戳列和 WINDOW_OFFSET 划分的窗口内的至多 jlimit_num 条数据的聚合结果或空数据(NULL)组成的行数据集合。 +#### Result set +The Cartesian product of each row of data in the left/right table and null data (`NULL`) or up to `jlimit_num` rows of data in the constructed window(based on the left/right table primary key timestamp and `WINDOW_OFFSET`) in the right/left table. +Or +The Cartesian product of each row of data in the left/right table and null data (`NULL`) or the aggregation result of up to `jlimit_num` rows of data in the constructed window(based on the left/right table primary key timestamp and `WINDOW_OFFSET`) in the right/left table. 
-#### 适用范围 -支持超级表、普通表、子表间 Left/Right Window Join。 +#### Scope +Left/Right Window Join are supported between super tables, normal tables, child tables. -#### 说明 -- 只支持表间 Window Join,不支持子查询间 Window Join; -- ON 子句可选,只支持指定除主键列外的 TAG、普通列(不支持标量函数及运算)之间的等值条件用于分组计算,所有条件间只支持 AND 运算; -- WINDOW_OFFSET 用于指定窗口的左右边界相对于左/右表主键时间戳的偏移量,支持自带时间单位的形式,例如:WINDOW_OFFSET(-1a, 1a),对于 Left Window Join 来说,表示每个窗口为 [左表主键时间戳 - 1毫秒,左表主键时间戳 + 1毫秒] ,左右边界均为闭区间。数字后面的时间单位可以是 b(纳秒)、u(微秒)、a(毫秒)、s(秒)、m(分)、h(小时)、d(天)、w(周),不支持自然月(n)、自然年(y),支持的最小时间单位为数据库精度,左右表所在数据库精度需保持一致。 -- JLIMIT 用于指定单个窗口内的最大匹配行数,可选,未指定时默认获取每个窗口内的所有匹配行。JLIMIT 取值范围为 [0, 1024],当右表中不存在满足条件的 jlimit_num 条数据时,返回的结果行数可能小于 jlimit_num;当右表中存在超过 jlimit_num 条满足条件的数据时,优先返回窗口内主键时间戳最小的 jlimit_num 条数据。 -- SQL 语句中不能含其他 GROUP BY/PARTITION BY/窗口查询; -- 支持在 WHERE 子句中进行标量过滤,支持在 HAVING 子句中针对每个窗口进行聚合函数过滤(不支持标量过滤),不支持 SLIMIT,不支持各种窗口伪列; +#### Notes +- Only supports Window Join between tables, not between subqueries. +- The `ON` clause is optional. Except for the primary key column, equivalent conditions between Tag columns and ordinary columns (which do not support scalar functions and operations) can be specified in `ON` clause for grouping calculations. Other types of conditions are not supported. +- Only `AND` operation is supported between all `ON` conditions. +- `WINDOW_OFFSET` is used to specify the offset of the left and right boundaries of the window relative to the timestamp of the left/right table's primary key. It supports the form of built-in time units. For example: `WINDOW_OFFSET (-1a, 1a)`, for Left Window Join, it means that each window boundary is [left table primary key timestamp - 1 millisecond, left table primary key timestamp + 1 millisecond], and both the left and right boundaries are closed intervals. The time unit after the number can be `b` (nanosecond), `u` (microsecond), `a` (millisecond), `s` (second), `m` (minute), `h` (hour), `d` (day), `w` (week). Natural months (`n`) and natural years (`y`) are not supported. 
The minimum time unit supported is database precision. The precision of the databases where the left and right tables are located should be the same. +- `JLIMIT` is used to specify the maximum number of matching rows in a single window. Optional. If not specified, all matching rows in each window are obtained by default. The value range of `JLIMIT` is [0,1024]. Less than `jlimit_num` rows of data will be returned when there are not enough `jlimit_num` rows of data in the right table that meet the condition. When there are more than `jlimit_num` rows of data in the right table that meet the condition, `jlimit_num` rows of data with the smallest primary key timestamp in the window will be returned. +- No `GROUP BY`/`PARTITION BY`/Window queries could be used together with Window Join in one single SQL statement. +- Supports scalar filtering in the `WHERE` clause, aggregation function filtering for each window in the `HAVING` clause (does not support scalar filtering), does not support `SLIMIT`, and does not support various window pseudo-columns. 
-#### 示例 +#### Examples -表 d1001 电压值大于 220V 时前后1秒的区间内表 d1002 的电压值: +The voltage value of table d1002 within 1 second before and after the moment that voltage value of table d1001 is greater than 220V: ```sql SELECT a.ts, a.voltage, b.voltage FROM d1001 a LEFT WINDOW JOIN d1002 b WINDOW_OFFSET(-1s, 1s) where a.voltage > 220 ``` -表 d1001 电压值大于 220V 且前后1秒的区间内表 d1002 的电压平均值也大于 220V 的时间及电压值: +The moment that the voltage value of table d1001 is greater than 220V and the average voltage value of table d1002 is also greater than 220V in the interval of 1 second before and after that: ```sql SELECT a.ts, a.voltage, avg(b.voltage) FROM d1001 a LEFT WINDOW JOIN d1002 b WINDOW_OFFSET(-1s, 1s) where a.voltage > 220 HAVING(avg(b.voltage) > 220) ``` ### Full Outer Join -#### 含义 -全(外)连接 - 既包含左右表同时符合连接条件的数据集合,也包括左右表中不符合连接条件的数据集合。 +#### Meaning +It includes data sets that meet the join conditions for both left and right tables, as well as data sets that do not meet the join conditions in the left and right tables. -#### 语法 +#### Grammar SELECT ... FROM table_name1 FULL [OUTER] JOIN table_name2 ON ... [WHERE ...] [...] -#### 结果集 -Inner Join 的结果集 + 左表中不符合连接条件的行加上右表的空数据组成的行数据集合 + 右表中不符合连接条件的行加上左表的空数据组成的行数据集合。 +#### Result set +The result set of Inner Join + rows data set composed of rows in the left table that do not meet the join conditions and null data(`NULL`) in the right table + rows data set composed of rows in the right table that do not meet the join conditions and null data(`NULL`) in the left table. -#### 适用范围 -支持超级表、普通表、子表、子查询间 Full Outer Join。 +#### Scope +Full Outer Join is supported between super tables, normal tables, child tables, and subqueries. -#### 说明 -- OUTER 关键字可选。 +#### Notes +- the `OUTER` keyword is optional. 
-#### 示例 +#### Examples -表 d1001 和表 d1002 中记录的所有时刻及电压值: +All timestamps and voltage values recorded in both tables d1001 and d1002: ```sql SELECT a.ts, a.voltage, b.ts, b.voltage FROM d1001 a FULL JOIN d1002 b on a.ts = b.ts ``` -## 约束和限制 +## Limitations -### 输入时间线限制 -- 目前所有 Join 都要求输入数据含有效的主键时间线,所有表查询都可以满足,子查询需要注意输出数据是否含有效的主键时间线。 +### Input timeline limits +- Currently, all types of join require input data to contain a valid primary key timeline, which can be satisfied by all table queries. Subqueries need to pay attention to whether the output data contains a valid primary key timeline. -### 连接条件限制 -- 除 ASOF 和 Window Join 之外,其他 Join 的连接条件中必须含主键列的主连接条件; 且 -- 主连接条件与其他连接条件间只支持 AND 运算; -- 作为主连接条件的主键列只支持 timetruncate 函数运算(不支持其他函数和标量运算),作为其他连接条件时无限制; +### Join conditions limits +- Except for ASOF and Window Join, the join conditions of other types of join must include the primary join condition; +- Only `AND` operation is supported between the primary join condition and other join conditions. +- The primary key column used in the primary join condition only supports `timetruncate` function operations (not other functions and scalar operations), and there are no restrictions when used as other join conditions. -### 分组条件限制 -- 只支持除主键列外的 TAG、普通列的等值条件; -- 不支持标量运算; -- 支持多个分组条件,条件间只支持 AND 运算; +### Grouping conditions limits +- Only support equivalent conditions for Tag and ordinary columns except for primary key columns. +- Does not support scalar operations. +- Supports multiple grouping conditions, and only supports `AND` operation between conditions. -### 查询结果顺序限制 -- 普通表、子表、子查询且无分组条件无排序的场景下,查询结果会按照驱动表的主键列顺序输出; -- 超级表查询、Full Join或有分组条件无排序的场景下,查询结果没有固定的输出顺序; -因此,在有排序需求且输出无固定顺序的场景下,需要进行排序操作。部分依赖时间线的函数可能会因为没有有效的时间线输出而无法执行。 +### Query result order limits +- In scenarios where there are normal tables, subtables, and subqueries without grouping conditions or sorting, the query results will be output in the order of the primary key columns of the driving table. 
+- In scenarios such as super table queries, Full Join, or with grouping conditions and without sorting, there is no fixed output order for query results. +Therefore, in scenarios where sorting is required and the output is not in a fixed order, sorting operations need to be performed. Some functions that rely on timelines may not be able to execute without sorting due to the lack of valid timeline output. -### 嵌套 Join 与多表 Join 限制 -- 目前除 Inner Join 支持嵌套与多表 Join 外,其他类型的 JoiN 暂不支持嵌套与多表 Join。 \ No newline at end of file +### Nested join and multi-table join limits +- Currently, except for Inner Join which supports nesting and multi-table Join, other types of join do not support nesting and multi-table join. \ No newline at end of file diff --git a/docs/zh/12-taos-sql/30-join.md b/docs/zh/12-taos-sql/30-join.md index 7b060fa3ad..6184f4851b 100755 --- a/docs/zh/12-taos-sql/30-join.md +++ b/docs/zh/12-taos-sql/30-join.md @@ -12,11 +12,11 @@ description: 关联查询详细描述 ### 连接条件 -连接条件是指进行表关联所指定的条件,TDengine 支持的所有关联查询都需要指定连接条件,连接条件通常(Inner Join 和 Window Join 例外)只出现在 ON 之后。根据语义,Inner Join 中出现在 WHERE 之后的条件也可以视作连接条件,而 Window Join 是通过 WINDOW_OFFSET 来指定连接条件。 +连接条件是指进行表关联所指定的条件,TDengine 支持的所有关联查询都需要指定连接条件,连接条件通常(Inner Join 和 Window Join 例外)只出现在 `ON` 之后。根据语义,Inner Join 中出现在 `WHERE` 之后的条件也可以视作连接条件,而 Window Join 是通过 `WINDOW_OFFSET` 来指定连接条件。 除 ASOF Join 外,TDengine 支持的所有 Join 类型都必须显式指定连接条件,ASOF Join 因为默认定义有隐式的连接条件,所以(在默认条件可以满足需求的情况下)可以不必显式指定连接条件。 -除 ASOF/Window Join 外,连接条件中除了包含主连接条件外,还可以包含任意多条其他连接条件,主连接条件与其他连接条件间必须是 AND 关系,而其他连接条件之间则没有这个限制。其他连接条件中可以包含主键列、TAG 、普通列、常量及其标量函数或运算的任意逻辑运算组合。 +除 ASOF/Window Join 外,连接条件中除了包含主连接条件外,还可以包含任意多条其他连接条件,主连接条件与其他连接条件间必须是 `AND` 关系,而其他连接条件之间则没有这个限制。其他连接条件中可以包含主键列、Tag 、普通列、常量及其标量函数或运算的任意逻辑运算组合。 以智能电表为例,下面这几条 SQL 都包含合法的连接条件: @@ -29,21 +29,35 @@ SELECT a.* FROM meters a LEFT ASOF JOIN meters b ON timetruncate(a.ts, 1s) < tim ### 主连接条件 -作为一款时序数据库,TDengine 所有的关联查询都围绕主键时戳列进行,因此要求除 ASOF/Window Join 外的所有关联查询都必须含有主键列的等值连接条件,而按照顺序首次出现在连接条件中的主键列等值连接条件将会被作为主连接条件。ASOF Join 
的主连接条件可以包含非等值的连接条件,而 Window Join 的主连接条件则是通过 WINDOW_OFFSET 来指定。 +作为一款时序数据库,TDengine 所有的关联查询都围绕主键时戳列进行,因此要求除 ASOF/Window Join 外的所有关联查询都必须含有主键列的等值连接条件,而按照顺序首次出现在连接条件中的主键列等值连接条件将会被作为主连接条件。ASOF Join 的主连接条件可以包含非等值的连接条件,而 Window Join 的主连接条件则是通过 `WINDOW_OFFSET` 来指定。 -除 Window Join 外,TDengine 支持在主连接条件中进行 timetruncate 函数操作,例如 ON timetruncate(a.ts, 1s) = timetruncate(b.ts, 1s),除此之外,暂不支持其他函数及标量运算。 +除 Window Join 外,TDengine 支持在主连接条件中进行 `timetruncate` 函数操作,例如 `ON timetruncate(a.ts, 1s) = timetruncate(b.ts, 1s)`,除此之外,暂不支持其他函数及标量运算。 ### 分组条件 -时序数据库特色的 ASOF/Window Join 支持对关联查询的输入数据进行分组,然后每个分组内进行关联操作。分组只对关联查询的输入进行,输出结果将不包含分组信息。ASOF/Window Join 中出现在 ON 之后的等值条件(ASOF 的主连接条件除外)将被作为分组条件。 +时序数据库特色的 ASOF/Window Join 支持对关联查询的输入数据进行分组,然后每个分组内进行关联操作。分组只对关联查询的输入进行,输出结果将不包含分组信息。ASOF/Window Join 中出现在 `ON` 之后的等值条件(ASOF 的主连接条件除外)将被作为分组条件。 ### 主键时间线 -TDengine 作为时序数据库要求每个表(子表)中必须有主键时间戳列,它将作为该表的主键时间线进行很多跟时间相关的运算,而子查询的结果或者 Join 运算的结果中也需要明确哪一列将被视作主键时间线参与后续的时间相关的运算。在子查询中,查询结果中存在的有序的第一个出现的主键列(或其运算)或等同主键列的伪列(_wstart/_wend)将被视作该输出表的主键时间线。Join 输出结果中主键时间线的选择遵从以下规则: +TDengine 作为时序数据库要求每个表(子表)中必须有主键时间戳列,它将作为该表的主键时间线进行很多跟时间相关的运算,而子查询的结果或者 Join 运算的结果中也需要明确哪一列将被视作主键时间线参与后续的时间相关的运算。在子查询中,查询结果中存在的有序的第一个出现的主键列(或其运算)或等同主键列的伪列(`_wstart`/`_wend`)将被视作该输出表的主键时间线。Join 输出结果中主键时间线的选择遵从以下规则: - Left/Right Join 系列中驱动表(子查询)的主键列将被作为后续查询的主键时间线;此外,在 Window Join 窗口内,因为左右表同时有序所以在窗口内可以把任意一个表的主键列做作主键时间线,优先选择本表的主键列作为主键时间线。 -- Inner Join 可以把任意一个表的主键列做作主键时间线,当存在类似分组条件(TAG 列的等值条件且与主连接条件 AND 关系)时将无法产生主键时间线。 +- Inner Join 可以把任意一个表的主键列做作主键时间线,当存在类似分组条件(Tag 列的等值条件且与主连接条件 `AND` 关系)时将无法产生主键时间线。 - Full Join 因为无法产生任何一个有效的主键时间序列,因此没有主键时间线,这也就意味着 Full Join 中无法进行时间线相关的运算。 + +## 语法说明 + +在接下来的章节中会通过共用的方式同时介绍 Left/Right Join 系列,因此后续的包括 Outer、Semi、Anti-Semi、ASOF、Window 系列介绍中都采用了类似 "left/right" 的写法来同时进行 Left/Right Join 的介绍。这里简要介绍这种写法的含义,写在 "/" 前面的表示应用于 Left Join,而写在 "/" 后面的表示应用于 Right Join。 + +举例说明: + +"左/右表" 表示对 Left Join 来说,它指的是"左表",对 Right Join 来说,它指的是“右表”; + +同理, + +"右/左表" 表示对 Left Join 来说,它指的是"右表",对 Right Join 来说,它指的是“左表”; + + ## Join 功能 ### Inner 
Join @@ -64,9 +78,9 @@ SELECT ... FROM table_name1, table_name2 WHERE ... [...] 支持超级表、普通表、子表、子查询间 Inner Join。 #### 说明 -- 对于第一种语法,INNER 关键字可选, ON 和/或 WHERE 中可以指定主连接条件和其他连接条件,WHERE 中还可以指定过滤条件,ON/WHERE 两者至少指定一个。 -- 对于第二种语法,可以在 WHERE 中指定主连接条件、其他连接条件、过滤条件。 -- 对超级表进行 Inner Join 时,与主连接条件 AND 关系的 Tag 列等值条件将作为类似分组条件使用,因此输出结果不能保持有序。 +- 对于第一种语法,`INNER` 关键字可选, `ON` 和/或 `WHERE` 中可以指定主连接条件和其他连接条件,`WHERE` 中还可以指定过滤条件,`ON`/`WHERE` 两者至少指定一个。 +- 对于第二种语法,可以在 `WHERE` 中指定主连接条件、其他连接条件、过滤条件。 +- 对超级表进行 Inner Join 时,与主连接条件 `AND` 关系的 Tag 列等值条件将作为类似分组条件使用,因此输出结果不能保持有序。 #### 示例 @@ -87,7 +101,7 @@ SELECT ... FROM table_name1 LEFT|RIGHT [OUTER] JOIN table_name2 ON ... [WHERE .. ``` #### 结果集 -Inner Join 的结果集 + 左/右表中不符合连接条件的行和右/左表的空数据(NULL)组成的行数据集合。 +Inner Join 的结果集 + 左/右表中不符合连接条件的行和右/左表的空数据(`NULL`)组成的行数据集合。 #### 适用范围 支持超级表、普通表、子表、子查询间 Left/Right Join。 @@ -105,7 +119,7 @@ SELECT a.ts, a.voltage, b.voltage FROM d1001 a LEFT JOIN d1002 b ON a.ts = b.ts ### Left/Right Semi Join #### 含义 -左/右半连接 - 通常表达的是 IN/EXISTS 的含义,即对左/右表任意一条数据来说,只有当右/左表中存在任一符合连接条件的数据时才返回左/右表行数据。 +左/右半连接 - 通常表达的是 `IN``/EXISTS` 的含义,即对左/右表任意一条数据来说,只有当右/左表中存在任一符合连接条件的数据时才返回左/右表行数据。 #### 语法 ```sql @@ -128,7 +142,7 @@ SELECT a.ts FROM d1001 a LEFT SEMI JOIN meters b ON a.ts = b.ts and a.voltage > ### Left/Right Anti-Semi Join #### 含义 -左/右反连接 - 同左/右半连接的逻辑正好相反,通常表达的是 NOT IN/NOT EXISTS 的含义,即对左/右表任意一条数据来说,只有当右/左表中不存在任何符合连接条件的数据时才返回左/右表行数据。 +左/右反连接 - 同左/右半连接的逻辑正好相反,通常表达的是 `NOT IN`/`NOT EXISTS` 的含义,即对左/右表任意一条数据来说,只有当右/左表中不存在任何符合连接条件的数据时才返回左/右表行数据。 #### 语法 ```sql @@ -136,10 +150,10 @@ SELECT ... FROM table_name1 LEFT|RIGHT ANTI JOIN table_name2 ON ... [WHERE ...] ``` #### 结果集 -左表中不符合连接条件的行和右表的空数据(NULL)组成的行数据集合。 +左/右表中不符合连接条件的行和右/左表的空数据(`NULL`)组成的行数据集合。 #### 适用范围 -支持超级表、普通表、子表、子查询间 Left Anti-Semi Join。 +支持超级表、普通表、子表、子查询间 Left/Right Anti-Semi Join。 #### 示例 @@ -159,7 +173,7 @@ SELECT ... FROM table_name1 LEFT|RIGHT ASOF JOIN table_name2 [ON ...] 
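[JLIMIT jl
 ```

 ##### Result Set
-The Cartesian product of each row in the left/right table with at most jlimit_num rows from the right/left table that meet the join conditions and are closest in timestamp after sorting by the primary key column, or with null data (NULL).
+The Cartesian product of each row in the left/right table with at most `jlimit_num` rows from the right/left table that meet the join conditions and are closest in timestamp after sorting by the primary key column, or with null data (`NULL`).

 ##### Scope
 Left/Right ASOF Join is supported between super tables, normal tables, and subtables.
@@ -179,10 +193,10 @@ SELECT ... FROM table_name1 LEFT|RIGHT ASOF JOIN table_name2 [ON ...] [JLIMIT jl

 For Right ASOF, the meanings of the above operators are exactly reversed.

-- If there is no ON clause, or the ON clause does not specify a matching rule for the primary key column, the default primary key matching operator is ">=", that is (for Left ASOF Join), rows in the right table whose primary key timestamp is less than or equal to the primary key timestamp of the left table. Multiple primary join conditions are not supported.
-- The ON clause can also specify equality conditions between TAG or normal columns other than the primary key column (scalar functions and operations are not supported) for grouping. No other types of conditions are supported.
-- Only AND operations are supported between all ON conditions.
-- JLIMIT specifies the maximum number of rows matched for a single row. It is optional and defaults to 1 when not specified, meaning that each row of the left/right table obtains at most one matching row from the right/left table. The value range of JLIMIT is [0, 1024]. The jlimit_num matching rows are not required to have the same timestamp. When the right/left table does not contain jlimit_num rows that meet the conditions, fewer than jlimit_num rows may be returned. When the right/left table contains more than jlimit_num rows that meet the conditions and their timestamps are the same, jlimit_num rows are returned at random.
+- If there is no `ON` clause, or the `ON` clause does not specify a matching rule for the primary key column, the default primary key matching operator is ">=", that is (for Left ASOF Join), rows in the right table whose primary key timestamp is less than or equal to the primary key timestamp of the left table. Multiple primary join conditions are not supported.
+- The `ON` clause can also specify equality conditions between Tag or normal columns other than the primary key column (scalar functions and operations are not supported) for grouping. No other types of conditions are supported.
+- Only `AND` operations are supported between all ON conditions.
+- `JLIMIT` specifies the maximum number of rows matched for a single row. It is optional and defaults to 1 when not specified, meaning that each row of the left/right table obtains at most one matching row from the right/left table. The value range of `JLIMIT` is [0, 1024]. The `jlimit_num` matching rows are not required to have the same timestamp. When the right/left table does not contain `jlimit_num` rows that meet the conditions, fewer than `jlimit_num` rows may be returned. When the right/left table contains more than `jlimit_num` rows that meet the conditions and their timestamps are the same, `jlimit_num` rows are returned at random.

 #### Examples

@@ -202,19 +216,19 @@ SELECT ... FROM table_name1 LEFT|RIGHT WINDOW JOIN table_name2 [ON ...]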
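WINDOW_O
 ```

 #### Result Set
-The Cartesian product of each row in the left/right table with at most jlimit_num rows, or null data (NULL), from the right/left table within the window defined by the primary key timestamp column of the left/right table and WINDOW_OFFSET, or
-the set of rows formed from each row in the left/right table and the aggregation result of at most jlimit_num rows, or null data (NULL), from the right/left table within the window defined by the primary key timestamp column of the left/right table and WINDOW_OFFSET.
+The Cartesian product of each row in the left/right table with at most `jlimit_num` rows, or null data (`NULL`), from the right/left table within the window defined by the primary key timestamp column of the left/right table and `WINDOW_OFFSET`, or
+the set of rows formed from each row in the left/right table and the aggregation result of at most `jlimit_num` rows, or null data (`NULL`), from the right/left table within the window defined by the primary key timestamp column of the left/right table and `WINDOW_OFFSET`.

 #### Scope
 Left/Right Window Join is supported between super tables, normal tables, and subtables.

 #### Notes
 - Only Window Join between tables is supported; Window Join between subqueries is not supported;
-- The ON clause is optional. It supports only equality conditions between TAG or normal columns other than the primary key column (scalar functions and operations are not supported) for grouping, and only AND operations are supported between all conditions;
-- WINDOW_OFFSET specifies the offsets of the left and right window boundaries relative to the primary key timestamp of the left/right table. Values with a built-in time unit are supported, for example WINDOW_OFFSET(-1a, 1a), which for Left Window Join means that each window is [left table primary key timestamp - 1 millisecond, left table primary key timestamp + 1 millisecond], with both boundaries closed. The time unit after the number can be b (nanoseconds), u (microseconds), a (milliseconds), s (seconds), m (minutes), h (hours), d (days), or w (weeks). Natural months (n) and natural years (y) are not supported. The smallest supported time unit is the database precision, and the databases containing the left and right tables must have the same precision.
-- JLIMIT specifies the maximum number of matching rows in a single window. It is optional; when not specified, all matching rows in each window are returned by default. The value range of JLIMIT is [0, 1024]. When the right table does not contain jlimit_num rows that meet the conditions, fewer than jlimit_num rows may be returned. When the right table contains more than jlimit_num rows that meet the conditions, the jlimit_num rows with the smallest primary key timestamps in the window are returned first.
-- The SQL statement cannot contain other GROUP BY/PARTITION BY/window queries;
-- Scalar filtering is supported in the WHERE clause, and filtering on aggregate functions for each window is supported in the HAVING clause (scalar filtering is not supported). SLIMIT and the window pseudo-columns are not supported;
+- The `ON` clause is optional. It supports only equality conditions between Tag or normal columns other than the primary key column (scalar functions and operations are not supported) for grouping, and only `AND` operations are supported between all conditions;
+- `WINDOW_OFFSET` specifies the offsets of the left and right window boundaries relative to the primary key timestamp of the left/right table. Values with a built-in time unit are supported, for example `WINDOW_OFFSET(-1a, 1a)`, which for Left Window Join means that each window is [left table primary key timestamp - 1 millisecond, left table primary key timestamp + 1 millisecond], with both boundaries closed. The time unit after the number can be `b` (nanoseconds), `u` (microseconds), `a` (milliseconds), `s` (seconds), `m` (minutes), `h` (hours), `d` (days), or `w` (weeks). Natural months (`n`) and natural years (`y`) are not supported. The smallest supported time unit is the database precision, and the databases containing the left and right tables must have the same precision.
+- `JLIMIT` specifies the maximum number of matching rows in a single window. It is optional; when not specified, all matching rows in each window are returned by default. The value range of `JLIMIT` is [0, 1024]. When the right table does not contain `jlimit_num` rows that meet the conditions, fewer than `jlimit_num` rows may be returned. When the right table contains more than `jlimit_num` rows that meet the conditions, the `jlimit_num` rows with the smallest primary key timestamps in the window are returned first.
+- The SQL statement cannot contain other `GROUP BY`/`PARTITION BY`/window queries;
+- Scalar filtering is supported in the `WHERE` clause, and filtering on aggregate functions for each window is supported in the `HAVING` clause (scalar filtering is not supported). `SLIMIT` and the window pseudo-columns are not supported;

 #### Examples

@@ -237,7 +251,7 @@ SELECT a.ts, a.voltage, avg(b.voltage) FROM d1001 a LEFT WINDOW JOIN d1002 b WIN
 SELECT ...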
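FROM table_name1 FULL [OUTER] JOIN table_name2 ON ... [WHERE ...] [...]

 #### Result Set
-The result set of Inner Join, plus rows formed from the left table rows that do not meet the join conditions combined with null data for the right table, plus rows formed from the right table rows that do not meet the join conditions combined with null data for the left table.
+The result set of Inner Join, plus rows formed from the left table rows that do not meet the join conditions combined with null data for the right table, plus rows formed from the right table rows that do not meet the join conditions combined with null data (`NULL`) for the left table.

 #### Scope
 Full Outer Join is supported between super tables, normal tables, subtables, and subqueries.
@@ -259,13 +273,13 @@ SELECT a.ts, a.voltage, b.ts, b.voltage FROM d1001 a FULL JOIN d1002 b on a.ts =

 ### Join Condition Restrictions
 - Except for ASOF and Window Join, the join conditions of all other Joins must include a primary join condition on the primary key column; and
-- Only AND operations are supported between the primary join condition and other join conditions;
-- A primary key column used in the primary join condition supports only the timetruncate function (no other functions or scalar operations); there is no such restriction when it is used in other join conditions;
+- Only `AND` operations are supported between the primary join condition and other join conditions;
+- A primary key column used in the primary join condition supports only the `timetruncate` function (no other functions or scalar operations); there is no such restriction when it is used in other join conditions;

 ### Grouping Condition Restrictions
-- Only equality conditions on TAG and normal columns other than the primary key column are supported;
+- Only equality conditions on Tag and normal columns other than the primary key column are supported;
 - Scalar operations are not supported;
-- Multiple grouping conditions are supported, and only AND operations are supported between them;
+- Multiple grouping conditions are supported, and only `AND` operations are supported between them;

 ### Query Result Order Restrictions
 - For normal tables, subtables, and subqueries with no grouping conditions and no sorting, query results are output in the order of the driving table's primary key column;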