fix: add taosbenchmark english documents

2024-12-15 16:29:22 +08:00 · 2024-12-15 16:29:22 +08:00 · f62b0ce25a
parent 3830603b8e
commit f62b0ce25a
4 changed files with 100 additions and 49 deletions
--- a/docs/en/14-reference/02-tools/09-taosdump.md
+++ b/docs/en/14-reference/02-tools/09-taosdump.md
@@ -4,22 +4,17 @@ sidebar_label: taosdump
 slug: /tdengine-reference/tools/taosdump
 ---

-taosdump is a tool application that supports backing up data from a running TDengine cluster and restoring the backed-up data to the same or another running TDengine cluster.
-
-taosdump can back up data using databases, supertables, or basic tables as logical data units, and can also back up data records within a specified time period from databases, supertables, and basic tables. You can specify the directory path for data backup; if not specified, taosdump defaults to backing up data to the current directory.
-
-If the specified location already has data files, taosdump will prompt the user and exit immediately to avoid data being overwritten. This means the same path can only be used for one backup.
-If you see related prompts, please operate carefully.
-
-taosdump is a logical backup tool, it should not be used to back up any raw data, environment settings, hardware information, server configuration, or cluster topology. taosdump uses [Apache AVRO](https://avro.apache.org/) as the data file format to store backup data.
+`taosdump` is a TDengine data backup and recovery tool provided for open-source users. Backup data files use the standard [Apache AVRO](https://avro.apache.org/)
+  format, which makes it easy to exchange data with the external ecosystem.  
+ taosdump provides multiple data backup and recovery options to meet different needs; all supported options can be viewed with `--help`.

 ## Installation

-There are two ways to install taosdump:
+taosdump provides two installation methods:

- Install the official taosTools package, please find taosTools on the [release history page](../../../release-history/taostools/) and download it for installation.
+- taosdump is installed by default with the TDengine installation package and can be used once TDengine is installed. For how to install TDengine, refer to [TDengine Installation](../../../get-started/).

- Compile taos-tools separately and install, please refer to the [taos-tools](https://github.com/taosdata/taos-tools) repository for details.
+- Compile and install taos-tools separately; for details, refer to the [taos-tools](https://github.com/taosdata/taos-tools) repository.

 ## Common Use Cases

@@ -30,6 +25,9 @@ There are two ways to install taosdump:
 3. Backup certain supertables or basic tables in a specified database: use the `dbname stbname1 stbname2 tbname1 tbname2 ...` parameter, note that this input sequence starts with the database name, supports only one database, and the second and subsequent parameters are the names of the supertables or basic tables in that database, separated by spaces;
 4. Backup the system log database: TDengine clusters usually include a system database named `log`, which contains data for TDengine's own operation, taosdump does not back up the log database by default. If there is a specific need to back up the log database, you can use the `-a` or `--allow-sys` command line parameter.
 5. "Tolerant" mode backup: Versions after taosdump 1.4.1 provide the `-n` and `-L` parameters, used for backing up data without using escape characters and in "tolerant" mode, which can reduce backup data time and space occupied when table names, column names, and label names do not use escape characters. If unsure whether to use `-n` and `-L`, use the default parameters for "strict" mode backup. For an explanation of escape characters, please refer to the [official documentation](../../sql-manual/escape-characters/).
+6. If a backup file already exists in the directory specified by the `-o` parameter, taosdump reports an error and exits to prevent data from being overwritten. Use another empty directory, or clear the existing data, before backing up.
+7. taosdump does not currently support resuming an interrupted backup. If a backup is interrupted, it must be restarted from the beginning.
+ If the backup takes a long time, consider using the `-S` and `-E` options to specify start and end times and back up the data in segments.

 :::tip

@@ -42,7 +40,8 @@ There are two ways to install taosdump:

 ### taosdump Restore Data

-Restore data files from a specified path: use the `-i` parameter along with the data file path. As mentioned earlier, the same directory should not be used to back up different data sets, nor should the same path be used to back up the same data set multiple times, otherwise, the backup data will cause overwriting or multiple backups.
+- Restore data files from a specified path: use the `-i` parameter along with the data file path. As mentioned earlier, the same directory should not be used to back up different data sets, nor should the same path be used to back up the same data set multiple times, otherwise, the backup data will cause overwriting or multiple backups.
+- taosdump can restore data into a database with a new name by using the `-W` parameter; refer to the command line parameter description for details.

 :::tip
 taosdump internally uses the TDengine stmt binding API to write restored data, currently using 16384 as a batch for writing. If there are many columns in the backup data, it may cause a "WAL size exceeds limit" error, in which case you can try adjusting the `-B` parameter to a smaller value.
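Item 7 of the backup use cases in this file suggests splitting a long backup into time segments with `-S` and `-E`, and item 6 says each run needs an empty `-o` directory. The following Python sketch only builds the command strings for such a segmented backup; it runs nothing. The database name `power`, the output root `/data/backup`, the `part-NNN` directory naming, and the ISO 8601 timestamp layout are illustrative assumptions, so check the time format your taosdump version accepts with `taosdump --help` before relying on it.

```python
from datetime import datetime, timedelta, timezone


def backup_segments(start, end, step):
    """Split the range from start to end into consecutive windows of at most `step`."""
    if step <= timedelta(0):
        raise ValueError("step must be positive")
    windows = []
    cursor = start
    while cursor < end:
        upper = min(cursor + step, end)
        windows.append((cursor, upper))
        cursor = upper
    return windows


def taosdump_commands(db, start, end, step, out_root):
    """Build one taosdump command string per window (nothing is executed)."""
    # Assumed timestamp layout, e.g. 2024-01-01T00:00:00.000+0800.
    fmt = "%Y-%m-%dT%H:%M:%S.000%z"
    commands = []
    for index, (lower, upper) in enumerate(backup_segments(start, end, step)):
        # Hypothetical naming scheme: one fresh directory per segment,
        # because an output directory cannot be reused.
        out_dir = f"{out_root}/part-{index:03d}"
        commands.append(
            f"taosdump -D {db} -S '{lower.strftime(fmt)}' "
            f"-E '{upper.strftime(fmt)}' -o {out_dir}"
        )
    return commands


tz = timezone(timedelta(hours=8))
cmds = taosdump_commands(
    "power",
    datetime(2024, 1, 1, tzinfo=tz),
    datetime(2024, 1, 4, tzinfo=tz),
    timedelta(days=1),
    "/data/backup",
)
for cmd in cmds:
    print(cmd)
```

Adjacent windows share a boundary timestamp. The diff does not say whether `-S` and `-E` are inclusive, so a record that falls exactly on a boundary may need a manual check after restoring.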
--- a/docs/en/14-reference/02-tools/10-taosbenchmark.md
+++ b/docs/en/14-reference/02-tools/10-taosbenchmark.md
@@ -4,35 +4,36 @@ sidebar_label: taosBenchmark
 slug: /tdengine-reference/tools/taosbenchmark
 ---

-taosBenchmark (formerly known as taosdemo) is a tool for testing the performance of the TDengine product. taosBenchmark can test the performance of TDengine's insert, query, and subscription functions. It can simulate massive data generated by a large number of devices and flexibly control the number of databases, supertables, types and number of tag columns, types and number of data columns, number of subtables, data volume per subtable, data insertion interval, number of working threads in taosBenchmark, whether and how to insert out-of-order data, etc. To accommodate the usage habits of past users, the installation package provides taosdemo as a soft link to taosBenchmark.
+taosBenchmark is a performance benchmarking tool for TDengine. It tests TDengine's insert, query, and subscription performance and outputs performance metrics.

 ## Installation

-There are two ways to install taosBenchmark:
+taosBenchmark provides two installation methods:

- taosBenchmark is automatically installed with the official TDengine installation package, for details please refer to [TDengine Installation](../../../get-started/).
+- taosBenchmark is installed by default with the TDengine installation package and can be used once TDengine is installed. For how to install TDengine, refer to [TDengine Installation](../../../get-started/).

- Compile and install taos-tools separately, for details please refer to the [taos-tools](https://github.com/taosdata/taos-tools) repository.
+- Compile and install taos-tools separately; for details, refer to the [taos-tools](https://github.com/taosdata/taos-tools) repository.

 ## Operation

 ### Configuration and Operation Methods

-taosBenchmark needs to be executed in the operating system's terminal, and this tool supports two configuration methods: Command Line Arguments and JSON Configuration File. These two methods are mutually exclusive; when using a configuration file, only one command line argument `-f <json file>` can be used to specify the configuration file. When using command line arguments to run taosBenchmark and control its behavior, the `-f` parameter cannot be used; instead, other parameters must be used for configuration. In addition, taosBenchmark also offers a special mode of operation, which is running without any parameters.
-
-taosBenchmark supports comprehensive performance testing for TDengine, and the TDengine features it supports are divided into three categories: writing, querying, and subscribing. These three functions are mutually exclusive, and each run of taosBenchmark can only select one of them. It is important to note that the type of function to be tested is not configurable when using the command line configuration method; the command line configuration method can only test writing performance. To test TDengine's query and subscription performance, you must use the configuration file method and specify the type of function to be tested through the `filetype` parameter in the configuration file.
+taosBenchmark supports three operating modes:
+- No-parameter mode
+- Command line mode
+- JSON configuration file mode
+The command line mode offers a subset of the functionality of the JSON configuration file mode. When both are used together, parameters specified on the command line take precedence.

 **Ensure that the TDengine cluster is running correctly before running taosBenchmark.**

 ### Running Without Command Line Arguments

-Execute the following command to quickly experience taosBenchmark performing a write performance test on TDengine based on the default configuration.
-
 ```bash
 taosBenchmark
 ```

-When running without parameters, taosBenchmark by default connects to the TDengine cluster specified under `/etc/taos`, and creates a database named `test` in TDengine, under which a supertable named `meters` is created, and 10,000 tables are created under the supertable, each table having 10,000 records inserted. Note that if a `test` database already exists, this command will delete the existing database and create a new `test` database.
+When run without parameters, taosBenchmark connects by default to the TDengine cluster specified in `/etc/taos/taos.cfg`.
+After connecting successfully, it creates a smart meter example database `test`, a supertable `meters`, and 10,000 subtables, and inserts 10,000 records into each subtable. If the `test` database already exists, it is deleted before the new one is created.

 ### Running Using Command Line Configuration Parameters

@@ -46,9 +47,7 @@ The above command `taosBenchmark` will create a database named `test`, establish

 ### Running Using a Configuration File

-The taosBenchmark installation package includes examples of configuration files, located in `<install_directory>/examples/taosbenchmark-json`
-
-Use the following command line to run taosBenchmark and control its behavior through a configuration file.
+Configuration file mode provides all of taosBenchmark's functionality; every parameter is set in the configuration file.  

 ```bash
 taosBenchmark -f <json file>
@@ -214,6 +213,61 @@ taosBenchmark -A INT,DOUBLE,NCHAR,BINARY\(16\)
 - **-?/--help**:
  Displays help information and exits. Cannot be used with other parameters.

+
+## Output performance indicators
+
+#### Write indicators
+
+After writing completes, summary performance metrics are printed in the last two lines of output, in the following format:
+``` bash
+SUCC: Spent 8.527298 (real 8.117379) seconds to insert rows: 10000000 with 8 thread(s) into test 1172704.41 (real 1231924.74) records/second
+SUCC: insert delay, min: 19.6780ms, avg: 64.9390ms, p90: 94.6900ms, p95: 105.1870ms, p99: 130.6660ms, max: 157.0830ms
+```
+The first line reports write throughput:
+- `Spent`: total write time in seconds, measured from the start of writing the first record to the end of writing the last record; here, 8.527298 seconds.
+- `real`: total write time spent in engine calls, excluding the time the test framework spends preparing data; here, 8.117379 seconds. The difference, 8.527298 - 8.117379 = 0.409919 seconds, is the time the test framework spent preparing data.
+- `rows`: total number of rows written; here, 10 million.
+- `thread(s)`: number of writer threads; here, 8 threads writing concurrently.
+- `records/second`: write speed = `total number of rows written` / `total write time`. The `real` value in parentheses has the same meaning as above and is the pure engine write speed.
+
+The second line reports the latency of individual write requests:  
+- `min`: minimum write latency
+- `avg`: average write latency
+- `p90`: 90th percentile write latency
+- `p95`: 95th percentile write latency
+- `p99`: 99th percentile write latency
+- `max`: maximum write latency
+Together, these metrics show the distribution of write request latency.
+
+#### Query indicators
+The query performance test mainly outputs the query request rate (QPS), in the following format:
+
+``` bash
+complete query with 3 threads and 10000 query delay avg: 	0.002686s min: 	0.001182s max: 	0.012189s p90: 	0.002977s p95: 	0.003493s p99: 	0.004645s SQL command: select ...
+INFO: Total specified queries: 30000
+INFO: Spend 26.9530 second completed total queries: 30000, the QPS of all threads: 1113.049
+```
+
+- The first line shows that each of 3 threads executed 10000 queries, followed by the average, minimum, maximum, and percentile latency of the query requests; `SQL command` is the query statement under test.
+- The second line shows that a total of 10000 * 3 = 30000 queries were completed.
+- The third line shows that the total query time was 26.9530 seconds and the queries per second (QPS) was 1113.049.
+
+#### Subscription metrics
+
+The subscription performance test mainly outputs consumer consumption speed indicators, with the following output format:
+``` bash
+INFO: consumer id 0 has poll total msgs: 376, period rate: 37.592 msgs/s, total rows: 3760000, period rate: 375924.815 rows/s
+INFO: consumer id 1 has poll total msgs: 362, period rate: 36.131 msgs/s, total rows: 3620000, period rate: 361313.504 rows/s
+INFO: consumer id 2 has poll total msgs: 364, period rate: 36.378 msgs/s, total rows: 3640000, period rate: 363781.731 rows/s
+INFO: consumerId: 0, consume msgs: 1000, consume rows: 10000000
+INFO: consumerId: 1, consume msgs: 1000, consume rows: 10000000
+INFO: consumerId: 2, consume msgs: 1000, consume rows: 10000000
+INFO: Consumed total msgs: 3000, total rows: 30000000
+```
+- Lines 1 to 3 are real-time output of each consumer's current consumption speed: `msgs/s` is the number of messages consumed per second, where each message contains multiple rows of data, and `rows/s` is the consumption speed counted in rows.
+- Lines 4 to 6 are the overall statistics for each consumer after the test completes: the total number of messages consumed and the total number of rows.
+- Line 7 is the overall statistics for all consumers: `msgs` is the total number of messages consumed and `rows` is the total number of rows consumed.
+
 ## Configuration File Parameters Detailed Explanation

 ### General Configuration Parameters
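The write and query summaries documented in this file can be cross-checked with a little arithmetic: the reported rate should equal the row (or query) count divided by the elapsed seconds. The Python sketch below parses the two sample lines quoted in the diff and recomputes those rates. The regular expressions are assumptions written against these samples only; other taosBenchmark versions may print a different layout.

```python
import re

# Patterns written against the sample lines in this diff (an assumption,
# not a documented output contract).
WRITE_RE = re.compile(
    r"Spent (?P<spent>[\d.]+) \(real (?P<real>[\d.]+)\) seconds to insert rows: "
    r"(?P<rows>\d+) with (?P<threads>\d+) thread\(s\) into (?P<db>\S+) "
    r"(?P<rate>[\d.]+) \(real (?P<real_rate>[\d.]+)\) records/second"
)
QUERY_RE = re.compile(
    r"Spend (?P<seconds>[\d.]+) second completed total queries: (?P<queries>\d+), "
    r"the QPS of all threads:\s+(?P<qps>[\d.]+)"
)


def write_rates(line):
    """Recompute write throughput from a write summary line."""
    m = WRITE_RE.search(line)
    if m is None:
        raise ValueError("not a write summary line")
    rows = int(m["rows"])
    spent, real = float(m["spent"]), float(m["real"])
    return {
        "rate": rows / spent,          # rows per second, including data preparation
        "real_rate": rows / real,      # rows per second spent in engine calls only
        "prep_seconds": spent - real,  # time the test framework spent preparing data
        "reported_rate": float(m["rate"]),
        "reported_real_rate": float(m["real_rate"]),
    }


def query_qps(line):
    """Return (recomputed QPS, reported QPS) from a query summary line."""
    m = QUERY_RE.search(line)
    if m is None:
        raise ValueError("not a query summary line")
    return int(m["queries"]) / float(m["seconds"]), float(m["qps"])


write_line = (
    "SUCC: Spent 8.527298 (real 8.117379) seconds to insert rows: 10000000 "
    "with 8 thread(s) into test 1172704.41 (real 1231924.74) records/second"
)
query_line = (
    "INFO: Spend 26.9530 second completed total queries: 30000, "
    "the QPS of all threads: 1113.049"
)

w = write_rates(write_line)
qps, reported_qps = query_qps(query_line)
print(f"write: {w['rate']:.2f} rows/s (reported {w['reported_rate']:.2f})")
print(f"query: {qps:.3f} QPS (reported {reported_qps:.3f})")
```

Because the printed seconds are rounded, the recomputed values match the reported ones only to within a small tolerance.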
--- a/docs/zh/14-reference/02-tools/09-taosdump.md
+++ b/docs/zh/14-reference/02-tools/09-taosdump.md
@@ -4,26 +4,17 @@ sidebar_label: taosdump
 toc_max_heading_level: 4
 ---

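-taosdump is a tool application that supports backing up data from a running TDengine cluster and restoring the backed-up data to the same or another running TDengine cluster.
+taosdump is a TDengine data backup and recovery tool provided for open-source users. Backup data files use the standard [Apache AVRO](https://avro.apache.org/) format, which makes it easy to exchange data with the external ecosystem. taosdump provides multiple data backup and recovery options to meet different needs; all supported options can be viewed with `--help`.

-taosdump can back up data using databases, supertables, or basic tables as logical data units, and can also back up
-data records within a specified time period from databases, supertables, and basic tables. You can specify the directory path for data backup;
-if no location is specified, taosdump defaults to backing up data to the current directory.
-
-If the specified location already has data files, taosdump will prompt the user and exit immediately to avoid data being overwritten. This means the same path can only be used for one backup.
-If you see related prompts, please operate carefully.
-
-taosdump is a logical backup tool; it should not be used to back up any raw data, environment settings,
-hardware information, server configuration, or cluster topology. taosdump uses
-[Apache AVRO](https://avro.apache.org/) as the data file format to store backup data.

 ## Installation

-There are two ways to install taosdump:
+taosdump provides two installation methods:

- Install the official taosTools package: find taosTools on the [release history page](https://docs.taosdata.com/releases/tools/) and download it for installation.
+- taosdump is installed by default with the TDengine installation package and can be used once TDengine is installed. For how to install TDengine, refer to [TDengine Installation](../../../get-started/).
+
+- Compile and install taos-tools separately; refer to the [taos-tools](https://github.com/taosdata/taos-tools) repository.

- Compile and install taos-tools separately; for details, refer to the [taos-tools](https://github.com/taosdata/taos-tools) repository.

 ## Common Use Cases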

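@@ -31,9 +22,11 @@ There are two ways to install taosdump:

 1.  Back up all databases: specify the `-A` or `--all-databases` parameter;
 2.  Back up multiple specified databases: use the `-D db1,db2,...` parameter;
-3.  Back up certain supertables or basic tables within a specified database: use the `dbname stbname1 stbname2 tbname1 tbname2 ...` parameter; note that the first parameter of this input sequence is the database name, only one database is supported, and the second and subsequent parameters are the names of supertables or basic tables in that database, separated by spaces;
+3.  Back up certain supertables or basic tables in a specified database: use the `dbname stbname1 stbname2 tbname1 tbname2 ...` parameter; note that the first parameter of this input sequence is the database name, only one database is supported, and the second and subsequent parameters are the names of supertables or basic tables in that database, separated by spaces;
 4.  Back up the system log database: TDengine clusters usually include a system database named `log`, which contains data about TDengine's own operation; taosdump does not back up the log database by default. If there is a specific need to back up the log database, use the `-a` or `--allow-sys` command line parameter.
 5.  "Tolerant" mode backup: versions after taosdump 1.4.1 provide the `-n` and `-L` parameters, used for backing up data without escape characters and in "tolerant" mode, which can reduce backup time and the space occupied by backup data when table names, column names, and tag names do not use escape characters. If you are unsure whether the conditions for `-n` and `-L` are met, use the default parameters for a "strict" mode backup. For an explanation of escape characters, refer to the [official documentation](../../taos-sql/escape).
+6.  If a backup file already exists in the directory specified by the `-o` parameter, taosdump reports an error and exits to prevent data from being overwritten. Use another empty directory, or clear the existing data, before backing up. 
+7.  taosdump does not currently support resuming an interrupted backup. If a backup is interrupted, it must be restarted from the beginning. If the backup takes a long time, consider using the `-S` and `-E` options to specify start and end times and back up the data in segments.

 :::tip
 - Versions after taosdump 1.4.1 provide the `-I` parameter, used to parse the schema and data of avro files; if the `-s` parameter is specified, only the schema is parsed.
@@ -45,7 +38,9 @@ There are two ways to install taosdump:

 ### taosdump Restore Data

-Restore data files from a specified path: use the `-i` parameter along with the path of the data files. As mentioned earlier, the same directory should not be used to back up different data sets, nor should the same path be used to back up the same data set multiple times; otherwise, the backup data will be overwritten or backed up multiple times.
+- Restore data files from a specified path: use the `-i` parameter along with the path of the data files. As mentioned earlier, the same directory should not be used to back up different data sets, nor should the same path be used to back up the same data set multiple times; otherwise, the backup data will be overwritten or backed up multiple times.  
+- taosdump can restore data into a database with a new name by using the `-W` parameter; refer to the command line parameter description for details.
+

 :::tip
 taosdump internally uses the TDengine stmt binding API to write restored data; to improve restore performance, it currently writes in batches of 16384. If the backup data has many columns, a "WAL size exceeds limit" error may occur, in which case you can try using the `-B` parameter to set a smaller value.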
--- a/docs/zh/14-reference/02-tools/10-taosbenchmark.md
+++ b/docs/zh/14-reference/02-tools/10-taosbenchmark.md
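@@ -10,7 +10,7 @@ taosBenchmark is a performance benchmarking tool for TDengine products, providing for TDengine

 taosBenchmark provides two installation methods:

- taosBenchmark is a component installed by default with the TDengine installation package and can be used after installing the TDengine installation package. To install TDengine, refer to [TDengine Installation](../../../get-started/).
+- taosBenchmark is installed by default with the TDengine installation package and can be used once TDengine is installed. For how to install TDengine, refer to [TDengine Installation](../../../get-started/).

 - Compile and install taos-tools separately; refer to the [taos-tools](https://github.com/taosdata/taos-tools) repository.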

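@@ -148,7 +148,7 @@ SUCC: insert delay, min: 19.6780ms, avg: 64.9390ms, p90: 94.6900ms, p95: 105.187
 - real: total write time spent in engine calls, excluding the time the test framework spends preparing data; here, 8.117379 seconds. The difference, 8.527298 - 8.117379 = 0.409919 seconds, is the time the test framework spent preparing data
 - rows: total number of rows written; here, 10 million
 - threads: number of writer threads; here, 8 threads writing concurrently
- - records/second: write speed = `total write time` / `total number of rows written`; real in parentheses has the same meaning as above and is the pure engine write speed
+ - records/second: write speed = `total number of rows written` / `total write time`; `real` in parentheses has the same meaning as above and is the pure engine write speed
 The second line reports the latency of individual write requests:
 - min: minimum write latency
 - avg: average write latency
@@ -159,16 +159,19 @@ SUCC: insert delay, min: 19.6780ms, avg: 64.9390ms, p90: 94.6900ms, p95: 105.187
 Together, these metrics show the distribution of write request latency

 #### Query Metrics 
+
 The query performance test mainly outputs the query request rate (QPS), in the following format: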
 ``` bash
 complete query with 3 threads and 10000 query delay avg: 	0.002686s min: 	0.001182s max: 	0.012189s p90: 	0.002977s p95: 	0.003493s p99: 	0.004645s SQL command: select ...
 INFO: Total specified queries: 30000
 INFO: Spend 26.9530 second completed total queries: 30000, the QPS of all threads:   1113.049
 ```
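-The first line shows that each of 3 threads executed 10000 queries, followed by the percentile distribution of query request latency in seconds; SQL command indicates which query statement was executed  
-The second line shows that a total of 10000 * 3 = 30000 queries were completed  
-The third line shows that the total query time was 26.9653 seconds and the queries per second (QPS) was 1113.049
+- The first line shows that each of 3 threads executed 10000 queries, followed by the percentile distribution of query request latency; `SQL command` is the query statement under test  
+- The second line shows that a total of 10000 * 3 = 30000 queries were completed  
+- The third line shows that the total query time was 26.9530 seconds and the queries per second (QPS) was 1113.049
+
 #### Subscription Metrics
+
 The subscription performance test mainly outputs consumer consumption speed metrics, in the following format: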
 ``` bash
 INFO: consumer id 0 has poll total msgs: 376, period rate: 37.592 msgs/s, total rows: 3760000, period rate: 375924.815 rows/s
@@ -179,9 +182,9 @@ INFO: consumerId: 1, consume msgs: 1000, consume rows: 10000000
 INFO: consumerId: 2, consume msgs: 1000, consume rows: 10000000
 INFO: Consumed total msgs: 3000, total rows: 30000000
 ```
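-Lines 1 to 3 are real-time output of each consumer's current consumption speed: msgs/s is the number of messages consumed per second, where each message contains multiple rows of data, and rows/s is the consumption speed counted in rows  
-Lines 4 to 6 are the overall statistics for each consumer after the test completes: the total number of messages consumed and the total number of rows  
-Line 7 is the overall statistics for all consumers: msgs is the total number of messages consumed and rows is the total number of rows consumed
+- Lines 1 to 3 are real-time output of each consumer's current consumption speed: `msgs/s` is the number of messages consumed per second, where each message contains multiple rows of data, and `rows/s` is the consumption speed counted in rows  
+- Lines 4 to 6 are the overall statistics for each consumer after the test completes: the total number of messages consumed and the total number of rows  
+- Line 7 is the overall statistics for all consumers: `msgs` is the total number of messages consumed and `rows` is the total number of rows consumed

 ## Configuration File Parameters Detailed Explanation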
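The subscription output described in this diff ends with one summary line per consumer and a grand total. As a rough illustration of how those lines relate, the Python sketch below sums the per-consumer lines and compares the result with the totals printed on the last line. The regular expression is an assumption based only on the sample output quoted in the diff, and the function name `summarize` is hypothetical.

```python
import re

# Matches the final per-consumer lines in the sample output only (assumed layout).
FINAL_RE = re.compile(r"consumerId: (\d+), consume msgs: (\d+), consume rows: (\d+)")


def summarize(lines):
    """Collect per-consumer totals and sum them across all consumers."""
    per_consumer = {}
    for line in lines:
        match = FINAL_RE.search(line)
        if match:
            consumer_id, msgs, rows = (int(group) for group in match.groups())
            per_consumer[consumer_id] = (msgs, rows)
    total_msgs = sum(msgs for msgs, _ in per_consumer.values())
    total_rows = sum(rows for _, rows in per_consumer.values())
    return per_consumer, total_msgs, total_rows


sample = [
    "INFO: consumer id 0 has poll total msgs: 376, period rate: 37.592 msgs/s, "
    "total rows: 3760000, period rate: 375924.815 rows/s",
    "INFO: consumerId: 0, consume msgs: 1000, consume rows: 10000000",
    "INFO: consumerId: 1, consume msgs: 1000, consume rows: 10000000",
    "INFO: consumerId: 2, consume msgs: 1000, consume rows: 10000000",
    "INFO: Consumed total msgs: 3000, total rows: 30000000",
]

per_consumer, total_msgs, total_rows = summarize(sample)
print(per_consumer)
print(total_msgs, total_rows)
print(total_rows // total_msgs)  # average rows carried by each message
```

The periodic progress line and the grand-total line are deliberately not matched, so only the three per-consumer summaries contribute to the sums.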