From f299a2810944c47d71188ec7f27fe89de841018c Mon Sep 17 00:00:00 2001 From: wangjiaming0909 <604227650@qq.com> Date: Thu, 19 Oct 2023 11:34:04 +0800 Subject: [PATCH 1/6] feat: support to_timestamp/to_char --- docs/en/12-taos-sql/10-function.md | 83 ++ docs/zh/12-taos-sql/10-function.md | 83 ++ include/common/ttime.h | 21 + include/libs/function/functionMgt.h | 2 + include/libs/scalar/scalar.h | 2 + include/util/taoserror.h | 1 + include/util/tdef.h | 9 + source/common/src/ttime.c | 877 +++++++++++++++++- source/common/test/commonTests.cpp | 225 ++++- source/libs/function/src/builtins.c | 48 + source/libs/scalar/src/sclfunc.c | 56 ++ source/util/src/terror.c | 1 + tests/parallel_test/cases.task | 4 + .../2-query/func_to_char_timestamp.py | 160 ++++ 14 files changed, 1566 insertions(+), 6 deletions(-) create mode 100644 tests/system-test/2-query/func_to_char_timestamp.py diff --git a/docs/en/12-taos-sql/10-function.md b/docs/en/12-taos-sql/10-function.md index 340a3e917b..266cdb4958 100644 --- a/docs/en/12-taos-sql/10-function.md +++ b/docs/en/12-taos-sql/10-function.md @@ -483,6 +483,89 @@ return_timestamp: { - The precision of the returned timestamp is same as the precision set for the current data base in use - return_timestamp indicates whether the returned value type is TIMESTAMP or not. If this parameter set to 1, function will return TIMESTAMP type. Otherwise function will return BIGINT type. If parameter is omitted, default return value type is BIGINT. +#### TO_CHAR + +```sql +TO_CHAR(ts, str_literal) +``` + +**Description**: Convert a ts column to string as the format specified + +**Return value type**: VARCHAR + +**Applicable column types**: TIMESTAMP + +**Nested query**: It can be used in both the outer query and inner query in a nested query. 
+ +**Applicable table types**: standard tables and supertables + +**Supported Formats** + +| **Format** | **Comment** | **Example** | +| --- | --- | --- | +|AM,am,PM,pm| Meridiem indicator (without periods) | 07:00:00am| +|A.M.,a.m.,P.M.,p.m.| Meridiem indicator (with periods)| 07:00:00a.m.| +|YYYY,yyyy|year, 4 or more digits| 2023-10-10| +|YYY,yyy| year, last 3 digits| 023-10-10| +|YY,yy| year, last 2 digits| 23-10-10| +|Y,y| year, last digit| 3-10-10| +|MONTH|full month name, uppercase| 2023-JANUARY-01| +|Month|full month name, capitalized| 2023-January-01| +|month|full month name, lowercase| 2023-january-01| +|MON| abbreviated month name, uppercase (3 chars)| JAN, SEP| +|Mon| abbreviated month name, capitalized| Jan, Sep| +|mon|abbreviated month name, lowercase| jan, sep| +|MM,mm|month number, 01-12|2023-01-01| +|DD,dd|day of month, 01-31|| +|DAY|full weekday name, uppercase|MONDAY| +|Day|full weekday name, capitalized|Monday| +|day|full weekday name, lowercase|monday| +|DY|abbreviated weekday name, uppercase|MON| +|Dy|abbreviated weekday name, capitalized|Mon| +|dy|abbreviated weekday name, lowercase|mon| +|DDD|day of year, 001-366|| +|D,d|weekday number, 1-7, Sunday(1) to Saturday(7)|| +|HH24,hh24|hour of day, 00-23|2023-01-30 23:59:59| +|HH12,hh12,HH,hh| hour of day, 01-12|2023-01-30 12:59:59PM| +|MI,mi|minute, 00-59|| +|SS,ss|second, 00-59|| +|MS,ms|millisecond, 000-999|| +|US,us|microsecond, 000000-999999|| +|NS,ns|nanosecond, 000000000-999999999|| +|TZH,tzh|time zone hour|2023-01-30 11:59:59PM +08| + +**More explanations**: +- The output of `Month` and `Day` is left-aligned and padded with spaces on the right to the width of the longest name, like `2023-OCTOBER  -01` and `2023-SEPTEMBER-01`. `September` is the longest month name, so it gets no padding. Weekdays are handled the same way. +- When `ms`, `us`, and `ns` are used in `to_char`, as in `to_char(ts, 'yyyy-mm-dd hh:mi:ss.ms.us.ns')`, all three render the same fractional second at different precisions. When ts is `1697182085123`, the output of `ms` is `123`, `us` is `123000`, and `ns` is `123000000`. 
+- To output characters of the format string without converting them, surround them with double quotes, as in `to_char(ts, 'yyyy-mm-dd "is formatted by yyyy-mm-dd"')`. To output a double quote, put a backslash before it: `to_char(ts, '\"yyyy-mm-dd\"')` will output `"2023-10-10"`. +- For formats that output digits, the uppercase and lowercase forms are equivalent. + +#### TO_TIMESTAMP + +```sql +TO_TIMESTAMP(str_literal, str_literal) +``` + +**Description**: Convert a formatted timestamp string to a timestamp + +**Return value type**: TIMESTAMP + +**Applicable column types**: VARCHAR + +**Nested query**: It can be used in both the outer query and inner query in a nested query. + +**Applicable table types**: standard tables and supertables + +**Supported Formats**: The same as `TO_CHAR`. + +**More explanations**: +- When more than one of `ms`, `us`, `ns` is specified in `to_timestamp`, their values are accumulated. For example, `to_timestamp('2023-10-10 10:10:10.123.000456.000000789', 'yyyy-mm-dd hh:mi:ss.ms.us.ns')` will output the timestamp `2023-10-10 10:10:10.123456789`. +- `MONTH`, `MON`, `DAY`, `DY`, and the formats that output digits are case-insensitive when used in `to_timestamp`. For example, in `to_timestamp('2023-JANUARY-01', 'YYYY-month-dd')`, `month` can be replaced by `MONTH` or `Month`. +- If a component is specified more than once, the later value overwrites the earlier one. For example, in `to_timestamp('2023-22-10-10', 'yyyy-yy-MM-dd')`, the output year will be `2022`. +- Components that are not specified take their default from `1970-01-01 00:00:00` in your local time zone. +- If `AM` or `PM` is specified in the format, the hour must be between `1` and `12`. +- In some cases, `to_timestamp` can convert correctly even when the format and the timestamp string do not match exactly. 
For example, in `to_timestamp('200101/2', 'yyyyMM1/dd')`, the digit `1` in the format string is ignored, and the output timestamp is `2001-01-02 00:00:00`. Spaces and tabs in the format and timestamp strings are also ignored automatically. + ### Time and Date Functions diff --git a/docs/zh/12-taos-sql/10-function.md b/docs/zh/12-taos-sql/10-function.md index 8b87a18e54..806ff3c6a8 100644 --- a/docs/zh/12-taos-sql/10-function.md +++ b/docs/zh/12-taos-sql/10-function.md @@ -483,6 +483,89 @@ return_timestamp: { - 返回的时间戳精度与当前 DATABASE 设置的时间精度一致。 - return_timestamp 指定函数返回值是否为时间戳类型,设置为1时返回 TIMESTAMP 类型,设置为0时返回 BIGINT 类型。如不指定缺省返回 BIGINT 类型。 +#### TO_CHAR + +```sql +TO_CHAR(ts, str_literal) +``` + +**功能说明**: 将timestamp类型按照指定格式转换为字符串 + +**返回结果数据类型**: VARCHAR + +**应用字段**: TIMESTAMP + +**嵌套子查询支持**: 适用于内层查询和外层查询 + +**适用于**: 表和超级表 + +**支持的格式** + +| **格式** | **说明**| **例子** | +| --- | --- | --- | +|AM,am,PM,pm| 无点分隔的上午下午 | 07:00:00am| +|A.M.,a.m.,P.M.,p.m.| 有点分割的上午下午| 07:00:00a.m.| +|YYYY,yyyy|年, 4个及以上数字| 2023-10-10| +|YYY,yyy| 年, 最后3位数字| 023-10-10| +|YY,yy| 年, 最后2位数字| 23-10-10| +|Y,y|年, 最后一位数字| 3-10-10| +|MONTH|月, 全大写| 2023-JANUARY-01| +|Month|月, 首字母大写| 2023-January-01| +|month|月, 全小写| 2023-january-01| +|MON| 月, 缩写, 全大写(三个字符)| JAN, SEP| +|Mon| 月, 缩写, 首字母大写| Jan, Sep| +|mon|月, 缩写, 全小写| jan, sep| +|MM,mm|月, 数字 01-12|2023-01-01| +|DD,dd|月日, 01-31|| +|DAY|周日, 全大写|MONDAY| +|Day|周日, 首字符大写|Monday| +|day|周日, 全小写|monday| +|DY|周日, 缩写, 全大写|MON| +|Dy|周日, 缩写, 首字符大写|Mon| +|dy|周日, 缩写, 全小写|mon| +|DDD|年日, 001-366|| +|D,d|周日, 数字, 1-7, Sunday(1) to Saturday(7)|| +|HH24,hh24|小时, 00-23|2023-01-30 23:59:59| +|hh12,HH12, hh, HH| 小时, 01-12|2023-01-30 12:59:59PM| +|MI,mi|分钟, 00-59|| +|SS,ss|秒, 00-59|| +|MS,ms|毫秒, 000-999|| +|US,us|微秒, 000000-999999|| +|NS,ns|纳秒, 000000000-999999999|| +|TZH,tzh|时区小时|2023-01-30 11:59:59PM +08| + +**使用说明**: +- `Month`, `Day`等的输出格式是左对齐的, 右侧添加空格, 如`2023-OCTOBER -01`, `2023-SEPTEMBER-01`, 9月是月份中英文字母数最长的, 因此9月没有空格. 星期类似. 
+- 使用`ms`, `us`, `ns`时, 以上三种格式的输出只在精度上不同, 比如ts为 `1697182085123`, `ms` 的输出为 `123`, `us` 的输出为 `123000`, `ns` 的输出为 `123000000`. +- 如果想要在格式串中指定某些部分不做转换, 可以使用双引号, 如`to_char(ts, 'yyyy-mm-dd "is formated by yyyy-mm-dd"')`. 如果想要输出双引号, 那么在双引号之前加一个反斜杠, 如 `to_char(ts, '\"yyyy-mm-dd\"')` 将会输出 `"2023-10-10"`. +- 那些输出是数字的格式, 如`YYYY`, `DD`, 大写与小写意义相同, 即`yyyy` 和 `YYYY` 可以互换. + +#### TO_TIMESTAMP + +```sql +TO_TIMESTAMP(str_literal, str_literal) +``` + +**功能说明**: 将字符串按照指定格式转化为时间戳. + +**返回结果数据类型**: TIMESTAMP + +**应用字段**: VARCHAR + +**嵌套子查询支持**: 适用于内层查询和外层查询 + +**适用于**: 表和超级表 + +**支持的格式**: 与`to_char`相同 + +**使用说明**: +- 若`ms`, `us`, `ns`同时指定, 那么结果时间戳包含上述三个字段的和. 如 `to_timestamp('2023-10-10 10:10:10.123.000456.000000789', 'yyyy-mm-dd hh:mi:ss.ms.us.ns')` 输出是 `2023-10-10 10:10:10.123456789`. +- `MONTH`, `MON`, `DAY`, `DY` 以及其他输出为数字的格式的大小写意义相同, 如 `to_timestamp('2023-JANUARY-01', 'YYYY-month-dd')`, `month`可以被替换为`MONTH` 或者`Month`. +- 如果同一字段被指定了多次, 那么前面的指定将会被覆盖. 如 `to_timestamp('2023-22-10-10', 'yyyy-yy-MM-dd')`, 输出年份是`2022`. +- 如果某些部分没有指定 那么默认时间为本地时区的 `1970-01-01 00:00:00`, 未指定部分为对应默认值. +- 如果格式串中有`AM`, `PM`等, 那么小时必须是12小时制, 范围必须是01-12. +- `to_timestamp`转换具有一定的容错机制, 在格式串和时间戳串不完全对应时, 有时也可转换, 如: `to_timestamp('200101/2', 'yyyyMM1/dd')`, 格式串中多出来的1会被丢弃. 格式串与时间戳串中多余的空格字符(空格, tab等)也会被 自动忽略. 如`to_timestamp(' 23 年 - 1 月 - 01 日 ', 'yy 年-MM月-dd日')` 可以被成功转换. 虽然`MM`等字段需要两个数字对应(只有一位时前面补0), 在`to_timestamp`时, 一个数字也可以成功转换. 
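The `to_timestamp` notes above (fraction fields are accumulated, short digit runs are accepted, and `AM`/`PM` implies a 12-hour clock) can be sketched in C. This is a minimal illustration under stated assumptions, not the code added by this patch: the helper names `scale_fraction`, `fraction_to_nanos`, and `hour12_to_24` are hypothetical, and the sketch assumes the fraction is held in nanoseconds, as the `fsec` comment in `struct STm` suggests.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: a fraction field parsed from fewer digits than its
 * keyword width is scaled up, so "5" read for `ms` (width 3) means 500 ms. */
int32_t scale_fraction(int32_t value, int32_t digits_read, int32_t width) {
  assert(digits_read >= 1 && digits_read <= width);
  for (int32_t i = digits_read; i < width; ++i) value *= 10;
  return value;
}

/* Hypothetical helper: when ms, us and ns are all given, their values are
 * summed into one nanosecond fraction (assumed representation). */
int64_t fraction_to_nanos(int32_t ms, int32_t us, int32_t ns) {
  return (int64_t)ms * 1000000LL + (int64_t)us * 1000LL + (int64_t)ns;
}

/* Hypothetical helper: map a 12-hour clock value (1-12) plus an AM/PM flag
 * to an hour in 0-23. 12 AM is midnight, 12 PM is noon. */
int32_t hour12_to_24(int32_t hour, int is_pm) {
  if (is_pm && hour < 12) return hour + 12;
  if (!is_pm && hour == 12) return 0;
  return hour;
}
```

With the documented example `.123.000456.000000789`, the three fields parse as 123, 456, and 789, and `fraction_to_nanos(123, 456, 789)` gives 123456789 nanoseconds, which matches the `.123456789` fraction in the documented result.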
+ ### 时间和日期函数 diff --git a/include/common/ttime.h b/include/common/ttime.h index 37e3045817..75bbcddd0e 100644 --- a/include/common/ttime.h +++ b/include/common/ttime.h @@ -90,6 +90,27 @@ int32_t convertStringToTimestamp(int16_t type, char* inputData, int64_t timePrec void taosFormatUtcTime(char* buf, int32_t bufLen, int64_t ts, int32_t precision); +struct STm { + struct tm tm; + int64_t fsec; // in NANOSECOND +}; + +int32_t taosTs2Tm(int64_t ts, int32_t precision, struct STm* tm); +int32_t taosTm2Ts(struct STm* tm, int64_t* ts, int32_t precision); + +/// @brief convert a timestamp to a formatted string +/// @param format the timestamp format, must be null-terminated +void taosTs2Char(const char* format, int64_t ts, int32_t precision, char* out); +/// @brief convert a formatted timestamp string to a timestamp +/// @param format must be null-terminated +/// @param tsStr must be null-terminated +/// @retval 0 for success, nonzero if an error occurred +int32_t taosChar2Ts(const char* format, const char* tsStr, int64_t* ts, int32_t precision, char* errMsg, + int32_t errMsgLen); + +void TEST_ts2char(const char* format, int64_t ts, int32_t precision, char* out); +int32_t TEST_char2ts(const char* format, int64_t* ts, int32_t precision, const char* tsStr); + #ifdef __cplusplus } #endif diff --git a/include/libs/function/functionMgt.h b/include/libs/function/functionMgt.h index 48c2210f46..865f1b2295 100644 --- a/include/libs/function/functionMgt.h +++ b/include/libs/function/functionMgt.h @@ -94,6 +94,8 @@ typedef enum EFunctionType { FUNCTION_TYPE_TO_ISO8601, FUNCTION_TYPE_TO_UNIXTIMESTAMP, FUNCTION_TYPE_TO_JSON, + FUNCTION_TYPE_TO_TIMESTAMP, + FUNCTION_TYPE_TO_CHAR, // date and time function FUNCTION_TYPE_NOW = 2500, diff --git a/include/libs/scalar/scalar.h b/include/libs/scalar/scalar.h index 2e6652f860..789ba554e2 100644 --- a/include/libs/scalar/scalar.h +++ b/include/libs/scalar/scalar.h @@ -80,6 +80,8 @@ int32_t castFunction(SScalarParam *pInput, int32_t inputNum, SScalarParam 
*pOutp int32_t toISO8601Function(SScalarParam *pInput, int32_t inputNum, SScalarParam *pOutput); int32_t toUnixtimestampFunction(SScalarParam *pInput, int32_t inputNum, SScalarParam *pOutput); int32_t toJsonFunction(SScalarParam *pInput, int32_t inputNum, SScalarParam *pOutput); +int32_t toTimestampFunction(SScalarParam *pInput, int32_t inputNum, SScalarParam *pOutput); +int32_t toCharFunction(SScalarParam *pInput, int32_t inputNum, SScalarParam *pOutput); int32_t timeTruncateFunction(SScalarParam *pInput, int32_t inputNum, SScalarParam *pOutput); int32_t timeDiffFunction(SScalarParam *pInput, int32_t inputNum, SScalarParam *pOutput); int32_t nowFunction(SScalarParam *pInput, int32_t inputNum, SScalarParam *pOutput); diff --git a/include/util/taoserror.h b/include/util/taoserror.h index 39ae3fb97a..10784bdb0c 100644 --- a/include/util/taoserror.h +++ b/include/util/taoserror.h @@ -739,6 +739,7 @@ int32_t* taosGetErrno(); #define TSDB_CODE_FUNC_FUNTION_PARA_VALUE TAOS_DEF_ERROR_CODE(0, 0x2803) #define TSDB_CODE_FUNC_NOT_BUILTIN_FUNTION TAOS_DEF_ERROR_CODE(0, 0x2804) #define TSDB_CODE_FUNC_DUP_TIMESTAMP TAOS_DEF_ERROR_CODE(0, 0x2805) +#define TSDB_CODE_FUNC_TO_TIMESTAMP_FAILED TAOS_DEF_ERROR_CODE(0, 0x2806) //udf #define TSDB_CODE_UDF_STOPPING TAOS_DEF_ERROR_CODE(0, 0x2901) diff --git a/include/util/tdef.h b/include/util/tdef.h index 287617970c..14c507b9a2 100644 --- a/include/util/tdef.h +++ b/include/util/tdef.h @@ -109,6 +109,15 @@ extern const int32_t TYPE_BYTES[21]; #define TSDB_INS_USER_STABLES_DBNAME_COLID 2 +static const int64_t TICK_PER_SECOND[] = { + 1000LL, // MILLISECOND + 1000000LL, // MICROSECOND + 1000000000LL, // NANOSECOND + 0LL, // HOUR + 0LL, // MINUTE + 1LL // SECOND +}; + #define TSDB_TICK_PER_SECOND(precision) \ ((int64_t)((precision) == TSDB_TIME_PRECISION_MILLI \ ? 
1000LL \ diff --git a/source/common/src/ttime.c b/source/common/src/ttime.c index 425218f0e1..3450e32f4a 100644 --- a/source/common/src/ttime.c +++ b/source/common/src/ttime.c @@ -25,7 +25,6 @@ #include "tlog.h" - // ==== mktime() kernel code =================// static int64_t m_deltaUtc = 0; @@ -679,7 +678,7 @@ int64_t taosTimeAdd(int64_t t, int64_t duration, char unit, int32_t precision) { } // The following code handles the y/n time duration - int64_t numOfMonth = (unit == 'y')? duration*12:duration; + int64_t numOfMonth = (unit == 'y') ? duration * 12 : duration; int64_t fraction = t % TSDB_TICK_PER_SECOND(precision); struct tm tm; @@ -722,7 +721,7 @@ int64_t taosTimeAdd(int64_t t, int64_t duration, char unit, int32_t precision) { * Total num of windows is ret + 1(the first window) */ int32_t taosTimeCountIntervalForFill(int64_t skey, int64_t ekey, int64_t interval, char unit, int32_t precision, - int32_t order) { + int32_t order) { if (ekey < skey) { int64_t tmp = ekey; ekey = skey; @@ -765,7 +764,6 @@ int64_t taosTimeTruncate(int64_t ts, const SInterval* pInterval) { int32_t precision = pInterval->precision; if (IS_CALENDAR_TIME_DURATION(pInterval->slidingUnit)) { - start /= (int64_t)(TSDB_TICK_PER_SECOND(precision)); struct tm tm; time_t tt = (time_t)start; @@ -796,7 +794,7 @@ int64_t taosTimeTruncate(int64_t ts, const SInterval* pInterval) { int64_t newe = taosTimeAdd(news, pInterval->interval, pInterval->intervalUnit, precision) - 1; if (newe < ts) { // move towards the greater endpoint - while(newe < ts && news < ts) { + while (newe < ts && news < ts) { news += pInterval->sliding; newe = taosTimeAdd(news, pInterval->interval, pInterval->intervalUnit, precision) - 1; } @@ -975,3 +973,872 @@ void taosFormatUtcTime(char* buf, int32_t bufLen, int64_t t, int32_t precision) tstrncpy(buf, ts, bufLen); } + +int32_t taosTs2Tm(int64_t ts, int32_t precision, struct STm* tm) { + tm->fsec = ts % TICK_PER_SECOND[precision] * (TICK_PER_SECOND[TSDB_TIME_PRECISION_NANO] / 
TICK_PER_SECOND[precision]); + time_t t = ts / TICK_PER_SECOND[precision]; + taosLocalTime(&t, &tm->tm, NULL); + return TSDB_CODE_SUCCESS; +} + +int32_t taosTm2Ts(struct STm* tm, int64_t* ts, int32_t precision) { + *ts = taosMktime(&tm->tm); + *ts *= TICK_PER_SECOND[precision]; + *ts += tm->fsec / (TICK_PER_SECOND[TSDB_TIME_PRECISION_NANO] / TICK_PER_SECOND[precision]); + return TSDB_CODE_SUCCESS; +} + +typedef struct { + const char* name; + int len; + int id; + bool isDigit; +} TSFormatKeyWord; + +typedef enum { + // TSFKW_AD, // BC AD + // TSFKW_A_D, // A.D. B.C. + TSFKW_AM, // AM, PM + TSFKW_A_M, // A.M., P.M. + // TSFKW_BC, // BC AD + // TSFKW_B_C, // B.C. A.D. + TSFKW_DAY, // MONDAY, TUESDAY ... + TSFKW_DDD, // Day of year 001-366 + TSFKW_DD, // Day of month 01-31 + TSFKW_Day, // Sunday, Monday + TSFKW_DY, // MON, TUE + TSFKW_Dy, // Mon, Tue + TSFKW_dy, // mon, tue + TSFKW_D, // 1-7 -> Sunday(1) -> Saturday(7) + TSFKW_HH24, + TSFKW_HH12, + TSFKW_HH, + TSFKW_MI, // minute + TSFKW_MM, + TSFKW_MONTH, // JANUARY, FEBRUARY + TSFKW_MON, + TSFKW_Month, + TSFKW_Mon, + TSFKW_MS, + TSFKW_NS, + TSFKW_OF, + TSFKW_PM, + TSFKW_P_M, + TSFKW_SS, + // TSFKW_TZM, + TSFKW_TZH, + // TSFKW_TZ, + TSFKW_US, + TSFKW_YYYY, + TSFKW_YYY, + TSFKW_YY, + TSFKW_Y, + // TSFKW_a_d, + // TSFKW_ad, + TSFKW_am, + TSFKW_a_m, + // TSFKW_b_c, + // TSFKW_bc, + TSFKW_d, + TSFKW_day, + TSFKW_ddd, + TSFKW_dd, + TSFKW_hh24, + TSFKW_hh12, + TSFKW_hh, + TSFKW_mm, + TSFKW_month, + TSFKW_mon, + TSFKW_ms, + TSFKW_ns, + TSFKW_pm, + TSFKW_p_m, + TSFKW_ss, + TSFKW_tzh, + // TSFKW_tzm, + // TSFKW_tz, + TSFKW_us, + TSFKW_yyyy, + TSFKW_yyy, + TSFKW_yy, + TSFKW_y, + TSFKW_last_ +} TSFormatKeywordId; + +// clang-format off +static const TSFormatKeyWord formatKeyWords[] = { + //{"A.D.", 4, TSFKW_A_D}, + {"A.M.", 4, TSFKW_A_M, false}, + //{"AD", 2, TSFKW_AD, false}, + {"AM", 2, TSFKW_AM, false}, + //{"B.C.", 4, TSFKW_B_C, false}, + //{"BC", 2, TSFKW_BC, false}, + {"DAY", 3, TSFKW_DAY, false}, + {"DDD", 3, TSFKW_DDD, 
true}, + {"DD", 2, TSFKW_DD, true}, + {"DY", 2, TSFKW_DY, false}, + {"Day", 3, TSFKW_Day, false}, + {"Dy", 2, TSFKW_Dy, false}, + {"D", 1, TSFKW_D, true}, + {"HH24", 4, TSFKW_HH24, true}, + {"HH12", 4, TSFKW_HH12, true}, + {"HH", 2, TSFKW_HH, true}, + {"MI", 2, TSFKW_MI, true}, + {"MM", 2, TSFKW_MM, true}, + {"MONTH", 5, TSFKW_MONTH, false}, + {"MON", 3, TSFKW_MON, false}, + {"MS", 2, TSFKW_MS, true}, + {"Month", 5, TSFKW_Month, false}, + {"Mon", 3, TSFKW_Mon, false}, + {"NS", 2, TSFKW_NS, true}, + //{"OF", 2, TSFKW_OF, false}, + {"P.M.", 4, TSFKW_P_M, false}, + {"PM", 2, TSFKW_PM, false}, + {"SS", 2, TSFKW_SS, true}, + {"TZH", 3, TSFKW_TZH, false}, + //{"TZM", 3, TSFKW_TZM}, + //{"TZ", 2, TSFKW_TZ}, + {"US", 2, TSFKW_US, true}, + {"YYYY", 4, TSFKW_YYYY, true}, + {"YYY", 3, TSFKW_YYY, true}, + {"YY", 2, TSFKW_YY, true}, + {"Y", 1, TSFKW_Y, true}, + //{"a.d.", 4, TSFKW_a_d, false}, + {"a.m.", 4, TSFKW_a_m, false}, + //{"ad", 2, TSFKW_ad, false}, + {"am", 2, TSFKW_am, false}, + //{"b.c.", 4, TSFKW_b_c, false}, + //{"bc", 2, TSFKW_bc, false}, + {"day", 3, TSFKW_day, false}, + {"ddd", 3, TSFKW_DDD, true}, + {"dd", 2, TSFKW_DD, true}, + {"dy", 2, TSFKW_dy, false}, + {"d", 1, TSFKW_D, true}, + {"hh24", 4, TSFKW_HH24, true}, + {"hh12", 4, TSFKW_HH12, true}, + {"hh", 2, TSFKW_HH, true}, + {"mi", 2, TSFKW_MI, true}, + {"mm", 2, TSFKW_MM, true}, + {"month", 5, TSFKW_month, false}, + {"mon", 3, TSFKW_mon, false}, + {"ms", 2, TSFKW_MS, true}, + {"ns", 2, TSFKW_NS, true}, + //{"of", 2, TSFKW_OF, false}, + {"p.m.", 4, TSFKW_p_m, false}, + {"pm", 2, TSFKW_pm, false}, + {"ss", 2, TSFKW_SS, true}, + {"tzh", 3, TSFKW_TZH, false}, + //{"tzm", 3, TSFKW_TZM}, + //{"tz", 2, TSFKW_tz}, + {"us", 2, TSFKW_US, true}, + {"yyyy", 4, TSFKW_YYYY, true}, + {"yyy", 3, TSFKW_YYY, true}, + {"yy", 2, TSFKW_YY, true}, + {"y", 1, TSFKW_Y, true}, + {NULL, 0, 0} +}; +// clang-format on + +typedef struct { + uint8_t type; + char c[2]; + const TSFormatKeyWord* key; +} TSFormatNode; + +static const char* 
const weekDays[] = {"Sunday", "Monday", "Tuesday", "Wednesday", + "Thursday", "Friday", "Saturday", NULL}; +static const char* const shortWeekDays[] = {"Sun", "Mon", "Tue", "Wed", "Thu", "Fri", "Sat", NULL}; +static const char* const fullMonths[] = {"January", "February", "March", "April", "May", "June", "July", + "August", "September", "October", "November", "December", NULL}; +static const char* const months[] = {"Jan", "Feb", "Mar", "Apr", "May", "Jun", "Jul", + "Aug", "Sep", "Oct", "Nov", "Dec", NULL}; +#define A_M_STR "A.M." +#define a_m_str "a.m." +#define AM_STR "AM" +#define am_str "am" +#define P_M_STR "P.M." +#define p_m_str "p.m." +#define PM_STR "PM" +#define pm_str "pm" +static const char* const apms[] = {AM_STR, PM_STR, am_str, pm_str, NULL}; +static const char* const long_apms[] = {A_M_STR, P_M_STR, a_m_str, p_m_str, NULL}; + +#define TS_FORMAT_NODE_TYPE_KEYWORD 1 +#define TS_FORMAT_NODE_TYPE_SEPARATOR 2 +#define TS_FORMAT_NODE_TYPE_CHAR 3 + +static const TSFormatKeyWord* keywordSearch(const char* str) { + if (*str < 'A' || *str > 'z' || (*str > 'Z' && *str < 'a')) return NULL; + int32_t idx = 0; + const TSFormatKeyWord* key = &formatKeyWords[idx++]; + while (key->name) { + if (0 == strncmp(key->name, str, key->len)) { + return key; + } + key = &formatKeyWords[idx++]; + } + return NULL; +} + +static bool isSeperatorChar(char c) { + return (c > 0x20 && c < 0x7F && !(c >= 'A' && c <= 'Z') && !(c >= 'a' && c <= 'z') && !(c >= '0' && c <= '9')); +} + +static void parseTsFormat(const char* format_str, SArray* formats) { + while (*format_str) { + const TSFormatKeyWord* key = keywordSearch(format_str); + if (key) { + TSFormatNode format = {.key = key, .type = TS_FORMAT_NODE_TYPE_KEYWORD}; + taosArrayPush(formats, &format); + format_str += key->len; + } else { + if (*format_str == '"') { + // for double quoted string + format_str++; + while (*format_str) { + if (*format_str == '"') { + format_str++; + break; + } + if (*format_str == '\\' && *(format_str 
+ 1)) format_str++; + TSFormatNode format = {.type = TS_FORMAT_NODE_TYPE_CHAR, .key = NULL}; + format.c[0] = *format_str; + format.c[1] = '\0'; + taosArrayPush(formats, &format); + format_str++; + } + } else { + // for other strings + if (*format_str == '\\' && *(format_str + 1)) format_str++; + TSFormatNode format = { + .type = isSeperatorChar(*format_str) ? TS_FORMAT_NODE_TYPE_SEPARATOR : TS_FORMAT_NODE_TYPE_CHAR, + .key = NULL}; + format.c[0] = *format_str; + format.c[1] = '\0'; + taosArrayPush(formats, &format); + format_str++; + } + } + } +} + +static void tm2char(const SArray* formats, const struct STm* tm, char* s) { + int32_t size = taosArrayGetSize(formats); + for (int32_t i = 0; i < size; ++i) { + TSFormatNode* format = taosArrayGet(formats, i); + if (format->type != TS_FORMAT_NODE_TYPE_KEYWORD) { + strcpy(s, format->c); + s += strlen(s); + continue; + } + + switch (format->key->id) { + case TSFKW_AM: + case TSFKW_PM: + sprintf(s, tm->tm.tm_hour % 24 >= 12 ? "PM" : "AM"); + s += strlen(s); + break; + case TSFKW_A_M: + case TSFKW_P_M: + sprintf(s, tm->tm.tm_hour % 24 >= 12 ? "P.M." : "A.M."); + s += strlen(s); + break; + case TSFKW_am: + case TSFKW_pm: + sprintf(s, tm->tm.tm_hour % 24 >= 12 ? "pm" : "am"); + s += strlen(s); + break; + case TSFKW_a_m: + case TSFKW_p_m: + sprintf(s, tm->tm.tm_hour % 24 >= 12 ? "p.m." : "a.m."); + s += strlen(s); + break; + case TSFKW_DDD: + // tm_yday is 0-365, the documented DDD range is 001-366 + sprintf(s, "%03d", tm->tm.tm_yday + 1); + s += strlen(s); + break; + case TSFKW_DD: + sprintf(s, "%02d", tm->tm.tm_mday); + s += strlen(s); + break; + case TSFKW_D: + sprintf(s, "%d", tm->tm.tm_wday + 1); + s += strlen(s); + break; + case TSFKW_DAY: { + // MONDAY, TUESDAY... + const char* wd = weekDays[tm->tm.tm_wday]; + char buf[10] = {0}; + for (int32_t i = 0; i < strlen(wd); ++i) buf[i] = toupper(wd[i]); + sprintf(s, "%-9s", buf); + s += strlen(s); + break; + } + case TSFKW_Day: + // Monday, Tuesday... 
+ sprintf(s, "%-9s", weekDays[tm->tm.tm_wday]); + s += strlen(s); + break; + case TSFKW_day: { + const char* wd = weekDays[tm->tm.tm_wday]; + char buf[10] = {0}; + for (int32_t i = 0; i < strlen(wd); ++i) buf[i] = tolower(wd[i]); + sprintf(s, "%-9s", buf); + s += strlen(s); + break; + } + case TSFKW_DY: { + // MON, TUE + const char* wd = shortWeekDays[tm->tm.tm_wday]; + char buf[8] = {0}; + for (int32_t i = 0; i < strlen(wd); ++i) buf[i] = toupper(wd[i]); + sprintf(s, "%3s", buf); + s += strlen(s); + break; + } + case TSFKW_Dy: + // Mon, Tue + sprintf(s, "%3s", shortWeekDays[tm->tm.tm_wday]); + s += strlen(s); + break; + case TSFKW_dy: { + // mon, tue + const char* wd = shortWeekDays[tm->tm.tm_wday]; + char buf[8] = {0}; + for (int32_t i = 0; i < strlen(wd); ++i) buf[i] = tolower(wd[i]); + sprintf(s, "%3s", buf); + s += strlen(s); + break; + } + case TSFKW_HH24: + sprintf(s, "%02d", tm->tm.tm_hour); + s += strlen(s); + break; + case TSFKW_HH: + case TSFKW_HH12: + // 0 or 12 o'clock in 24H coresponds to 12 o'clock (AM/PM) in 12H + sprintf(s, "%02d", tm->tm.tm_hour % 12 == 0 ? 
12 : tm->tm.tm_hour % 12); + s += strlen(s); + break; + case TSFKW_MI: + sprintf(s, "%02d", tm->tm.tm_min); + s += strlen(s); + break; + case TSFKW_MM: + sprintf(s, "%02d", tm->tm.tm_mon + 1); + s += strlen(s); + break; + case TSFKW_MONTH: { + const char* mon = fullMonths[tm->tm.tm_mon]; + char buf[10] = {0}; + for (int32_t i = 0; i < strlen(mon); ++i) buf[i] = toupper(mon[i]); + sprintf(s, "%-9s", buf); + s += strlen(s); + break; + } + case TSFKW_MON: { + const char* mon = months[tm->tm.tm_mon]; + char buf[10] = {0}; + for (int32_t i = 0; i < strlen(mon); ++i) buf[i] = toupper(mon[i]); + sprintf(s, "%s", buf); + s += strlen(s); + break; + } + case TSFKW_Month: + sprintf(s, "%-9s", fullMonths[tm->tm.tm_mon]); + s += strlen(s); + break; + case TSFKW_month: { + const char* mon = fullMonths[tm->tm.tm_mon]; + char buf[10] = {0}; + for (int32_t i = 0; i < strlen(mon); ++i) buf[i] = tolower(mon[i]); + sprintf(s, "%-9s", buf); + s += strlen(s); + break; + } + case TSFKW_Mon: + sprintf(s, "%s", months[tm->tm.tm_mon]); + s += strlen(s); + break; + case TSFKW_mon: { + const char* mon = months[tm->tm.tm_mon]; + char buf[10] = {0}; + for (int32_t i = 0; i < strlen(mon); ++i) buf[i] = tolower(mon[i]); + sprintf(s, "%s", buf); + s += strlen(s); + break; + } + case TSFKW_SS: + sprintf(s, "%02d", tm->tm.tm_sec); + s += strlen(s); + break; + case TSFKW_MS: + sprintf(s, "%03" PRId64, tm->fsec / 1000000L); + s += strlen(s); + break; + case TSFKW_US: + sprintf(s, "%06" PRId64, tm->fsec / 1000L); + s += strlen(s); + break; + case TSFKW_NS: + sprintf(s, "%09" PRId64, tm->fsec); + s += strlen(s); + break; + case TSFKW_TZH: + sprintf(s, "%s%02d", tsTimezone < 0 ? 
"-" : "+", abs(tsTimezone)); + s += strlen(s); + break; + case TSFKW_YYYY: + sprintf(s, "%04d", tm->tm.tm_year + 1900); + s += strlen(s); + break; + case TSFKW_YYY: + sprintf(s, "%03d", (tm->tm.tm_year + 1900) % 1000); + s += strlen(s); + break; + case TSFKW_YY: + sprintf(s, "%02d", (tm->tm.tm_year + 1900) % 100); + s += strlen(s); + break; + case TSFKW_Y: + sprintf(s, "%01d", (tm->tm.tm_year + 1900) % 10); + s += strlen(s); + break; + default: + break; + } + } +} + +/// @brief find s in arr case insensitively +/// @retval the index in arr if found, -1 if not found +static int32_t strArrayCaseSearch(const char* const* arr, const char* s) { + if (!*s) return -1; + const char* const* fmt = arr; + for (; *fmt; ++fmt) { + const char *l, *r; + for (l = fmt[0], r = s;; l++, r++) { + if (*l == '\0') return fmt - arr; + if (*r == '\0' || tolower(*l) != tolower(*r)) break; + } + } + return -1; +} + +static const char* tsFormatStr2Int32(int32_t* dest, const char* str, int32_t len, bool needMoreDigit) { + char* last; + int64_t res; + const char* s = str; + if (len <= 0) { + res = taosStr2Int64(s, &last, 10); + s = last; + } else { + char buf[16] = {0}; + strncpy(buf, s, len); + int32_t copiedLen = strlen(buf); + if (copiedLen < len) { + if (!needMoreDigit) { + // digits not enough, that's ok, cause we do not need more digits + // '2023-1' 'YYYY-MM' + // '202a' 'YYYY' -> 202 + res = taosStr2Int64(s, &last, 10); + s += copiedLen; + } else { + // bytes not enough, and there are other digit formats to match + // '2023-1' 'YYYY-MMDD' + return NULL; + } + } else { + if (needMoreDigit) { + res = taosStr2Int64(buf, &last, 10); + // bytes enough, but digits not enough, like '202a12' 'YYYYMM', YYYY needs four digits + if (last - buf < len) return NULL; + s += last - buf; + } else { + res = taosStr2Int64(s, &last, 10); + s = last; + } + } + } + if (s == str) { + // no integers found + return NULL; + } + if (errno == ERANGE || res > INT32_MAX || res < INT32_MIN) { + // out of range + return 
NULL; + } + *dest = res; + return s; +} + +static int32_t adjustYearTo2020(int32_t year) { + if (year < 70) return year + 2000; // 2000 - 2069 + if (year < 100) return year + 1900; // 1970 - 1999 + if (year < 520) return year + 2000; // 2100 - 2519 + if (year < 1000) return year + 1000; // 1520 - 1999 + return year; +} + +static bool checkTm(const struct tm* tm) { + if (tm->tm_mon < 0 || tm->tm_mon > 11) return false; + if (tm->tm_wday < 0 || tm->tm_wday > 6) return false; + if (tm->tm_yday < 0 || tm->tm_yday > 365) return false; + if (tm->tm_mday < 0 || tm->tm_mday > 31) return false; + if (tm->tm_hour < 0 || tm->tm_hour > 23) return false; + if (tm->tm_min < 0 || tm->tm_min > 59) return false; + if (tm->tm_sec < 0 || tm->tm_sec > 60) return false; + return true; +} + +static bool needMoreDigits(SArray* formats, int32_t curIdx) { + if (curIdx == taosArrayGetSize(formats) - 1) return false; + TSFormatNode* pNextNode = taosArrayGet(formats, curIdx + 1); + if (pNextNode->type == TS_FORMAT_NODE_TYPE_SEPARATOR) { + return false; + } else if (pNextNode->type == TS_FORMAT_NODE_TYPE_KEYWORD) { + return pNextNode->key->isDigit; + } else { + return isdigit(pNextNode->c[0]); + } +} + +/// @brief convert a formatted time str to timestamp +/// @param[in] s the formatted timestamp str +/// @param[in] formats array of TSFormatNode, output of parseTsFormat +/// @param[out] ts output timestamp +/// @param precision the timestamp precision to convert to, sec/milli/micro/nano +/// @param[out] sErrPos if not NULL, when err occured, points to the failed position of s, only set when ret is -1 +/// @param[out] fErrIdx if not NULL, when err occured, the idx of the failed format idx, only set when ret is -1 +/// @retval 0 for success +/// @retval -1 for format and s mismatch error +/// @retval -2 if datetime err, like 2023-13-32 25:61:69 +static int32_t char2ts(const char* s, SArray* formats, int64_t* ts, int32_t precision, const char** sErrPos, + int32_t* fErrIdx) { + int32_t size = 
taosArrayGetSize(formats); + int32_t pm = 0; // default am + int32_t hour12 = 0; // default HH24 + int32_t year = 0, mon = 0, yd = 0, md = 1, wd = 0; + int32_t hour = 0, min = 0, sec = 0, us = 0, ms = 0, ns = 0; + int32_t tzSign = 1, tz = tsTimezone; + int32_t err = 0; + + for (int32_t i = 0; i < size; ++i) { + while (isspace(*s)) { + s++; + } + TSFormatNode* node = taosArrayGet(formats, i); + if (node->type == TS_FORMAT_NODE_TYPE_SEPARATOR) { + // separator matches any character + if (isSeperatorChar(s[0])) s += strlen(node->c); + continue; + } + if (node->type == TS_FORMAT_NODE_TYPE_CHAR) { + if (!isspace(node->c[0])) s += strlen(node->c); + continue; + } + assert(node->type == TS_FORMAT_NODE_TYPE_KEYWORD); + switch (node->key->id) { + case TSFKW_A_M: + case TSFKW_P_M: + case TSFKW_a_m: + case TSFKW_p_m: { + int32_t idx = strArrayCaseSearch(long_apms, s); + if (idx >= 0) { + s += strlen(long_apms[idx]); + pm = idx % 2; + hour12 = 1; + } else { + err = -1; + } + } break; + case TSFKW_AM: + case TSFKW_PM: + case TSFKW_am: + case TSFKW_pm: { + int32_t idx = strArrayCaseSearch(apms, s); + if (idx >= 0) { + s += strlen(apms[idx]); + pm = idx % 2; + hour12 = 1; + } else { + err = -1; + } + } break; + case TSFKW_HH: + case TSFKW_HH12: { + const char* newPos = tsFormatStr2Int32(&hour, s, 2, needMoreDigits(formats, i)); + if (NULL == newPos || hour > 12 || hour <= 0) { + err = -1; + } else { + hour12 = 1; + s = newPos; + } + } break; + case TSFKW_HH24: { + const char* newPos = tsFormatStr2Int32(&hour, s, 2, needMoreDigits(formats, i)); + if (NULL == newPos) { + err = -1; + } else { + hour12 = 0; + s = newPos; + } + } break; + case TSFKW_MI: { + const char* newPos = tsFormatStr2Int32(&min, s, 2, needMoreDigits(formats, i)); + if (NULL == newPos) { + err = -1; + } else { + s = newPos; + } + } break; + case TSFKW_SS: { + const char* newPos = tsFormatStr2Int32(&sec, s, 2, needMoreDigits(formats, i)); + if (NULL == newPos) + err = -1; + else + s = newPos; + } break; + case 
TSFKW_MS: { + const char* newPos = tsFormatStr2Int32(&ms, s, 3, needMoreDigits(formats, i)); + if (NULL == newPos) + err = -1; + else { + int32_t len = newPos - s; + ms *= len == 1 ? 100 : len == 2 ? 10 : 1; + s = newPos; + } + } break; + case TSFKW_US: { + const char* newPos = tsFormatStr2Int32(&us, s, 6, needMoreDigits(formats, i)); + if (NULL == newPos) + err = -1; + else { + int32_t len = newPos - s; + us *= len == 1 ? 100000 : len == 2 ? 10000 : len == 3 ? 1000 : len == 4 ? 100 : len == 5 ? 10 : 1; + s = newPos; + } + } break; + case TSFKW_NS: { + const char* newPos = tsFormatStr2Int32(&ns, s, 9, needMoreDigits(formats, i)); + if (NULL == newPos) + err = -1; + else { + int32_t len = newPos - s; + ns *= len == 1 ? 100000000 + : len == 2 ? 10000000 + : len == 3 ? 1000000 + : len == 4 ? 100000 + : len == 5 ? 10000 + : len == 6 ? 1000 + : len == 7 ? 100 + : len == 8 ? 10 + : 1; + s = newPos; + } + } break; + case TSFKW_TZH: { + tzSign = *s == '-' ? -1 : 1; + const char* newPos = tsFormatStr2Int32(&tz, s, -1, needMoreDigits(formats, i)); + if (NULL == newPos) + err = -1; + else { + s = newPos; + } + } break; + case TSFKW_MONTH: + case TSFKW_Month: + case TSFKW_month: { + int32_t idx = strArrayCaseSearch(fullMonths, s); + if (idx >= 0) { + s += strlen(fullMonths[idx]); + mon = idx; + } else { + err = -1; + } + } break; + case TSFKW_MON: + case TSFKW_Mon: + case TSFKW_mon: { + int32_t idx = strArrayCaseSearch(months, s); + if (idx >= 0) { + s += strlen(months[idx]); + mon = idx; + } else { + err = -1; + } + } break; + case TSFKW_MM: { + const char* newPos = tsFormatStr2Int32(&mon, s, 2, needMoreDigits(formats, i)); + if (NULL == newPos) { + err = -1; + } else { + s = newPos; + mon -= 1; + } + } break; + case TSFKW_DAY: + case TSFKW_Day: + case TSFKW_day: { + int32_t idx = strArrayCaseSearch(weekDays, s); + if (idx >= 0) { + s += strlen(weekDays[idx]); + wd = idx; + } else { + err = -1; + } + } break; + case TSFKW_DY: + case TSFKW_Dy: + case TSFKW_dy: { + int32_t idx 
= strArrayCaseSearch(shortWeekDays, s); + if (idx >= 0) { + s += strlen(shortWeekDays[idx]); + wd = idx; + } else { + err = -1; + } + } break; + case TSFKW_DDD: { + const char* newPos = tsFormatStr2Int32(&yd, s, 3, needMoreDigits(formats, i)); + if (NULL == newPos) { + err = -1; + } else { + s = newPos; + } + } break; + case TSFKW_DD: { + const char* newPos = tsFormatStr2Int32(&md, s, 2, needMoreDigits(formats, i)); + if (NULL == newPos) { + err = -1; + } else { + s = newPos; + } + } break; + case TSFKW_D: { + const char* newPos = tsFormatStr2Int32(&wd, s, 1, needMoreDigits(formats, i)); + if (NULL == newPos) { + err = -1; + } else { + s = newPos; + } + } break; + case TSFKW_YYYY: { + const char* newPos = tsFormatStr2Int32(&year, s, 4, needMoreDigits(formats, i)); + if (NULL == newPos) { + err = -1; + } else { + s = newPos; + } + } break; + case TSFKW_YYY: { + const char* newPos = tsFormatStr2Int32(&year, s, 3, needMoreDigits(formats, i)); + if (NULL == newPos) { + err = -1; + } else { + year = adjustYearTo2020(year); + s = newPos; + } + } break; + case TSFKW_YY: { + const char* newPos = tsFormatStr2Int32(&year, s, 2, needMoreDigits(formats, i)); + if (NULL == newPos) { + err = -1; + } else { + year = adjustYearTo2020(year); + s = newPos; + } + } break; + case TSFKW_Y: { + const char* newPos = tsFormatStr2Int32(&year, s, 1, needMoreDigits(formats, i)); + if (NULL == newPos) { + err = -1; + } else { + year = adjustYearTo2020(year); + s = newPos; + } + } break; + default: + break; + } + if (err) { + if (sErrPos) *sErrPos = s; + if (fErrIdx) *fErrIdx = i; + return err; + } + } + struct STm tm = {0}; + tm.tm.tm_year = year - 1900; + tm.tm.tm_mon = mon; + tm.tm.tm_yday = yd; + tm.tm.tm_mday = md; + tm.tm.tm_wday = wd; + if (hour12) { + if (pm && hour < 12) + tm.tm.tm_hour = hour + 12; + else if (!pm && hour == 12) + tm.tm.tm_hour = 0; + else + tm.tm.tm_hour = hour; + } else { + tm.tm.tm_hour = hour; + } + tm.tm.tm_min = min; + tm.tm.tm_sec = sec; + if (!checkTm(&tm.tm)) 
return -2; + if (tz < -12 || tz > 12) return -2; + tm.fsec = ms * 1000000 + us * 1000 + ns; + int32_t ret = taosTm2Ts(&tm, ts, precision); + *ts += 60 * 60 * (tsTimezone - tz) * TICK_PER_SECOND[precision]; + return ret; +} + +void taosTs2Char(const char* format, int64_t ts, int32_t precision, char* out) { + SArray* formats = taosArrayInit(8, sizeof(TSFormatNode)); + parseTsFormat(format, formats); + struct STm tm; + taosTs2Tm(ts, precision, &tm); + tm2char(formats, &tm, out); + taosArrayDestroy(formats); +} + +int32_t taosChar2Ts(const char* format, const char* tsStr, int64_t* ts, int32_t precision, char* errMsg, + int32_t errMsgLen) { + const char* sErrPos; + int32_t fErrIdx; + SArray* formats = taosArrayInit(4, sizeof(TSFormatNode)); + parseTsFormat(format, formats); + int32_t code = char2ts(tsStr, formats, ts, precision, &sErrPos, &fErrIdx); + if (code == -1) { + TSFormatNode* fNode = (taosArrayGet(formats, fErrIdx)); + snprintf(errMsg, errMsgLen, "mismatch format for: %s and %s", sErrPos, + fErrIdx < taosArrayGetSize(formats) ? 
((TSFormatNode*)taosArrayGet(formats, fErrIdx))->key->name : ""); + } else if (code == -2) { + snprintf(errMsg, errMsgLen, "timestamp format error: %s -> %s", tsStr, format); + } + taosArrayDestroy(formats); + return code; +} + +void TEST_ts2char(const char* format, int64_t ts, int32_t precision, char* out) { + SArray* formats = taosArrayInit(4, sizeof(TSFormatNode)); + parseTsFormat(format, formats); + struct STm tm; + taosTs2Tm(ts, precision, &tm); + tm2char(formats, &tm, out); + taosArrayDestroy(formats); +} + +int32_t TEST_char2ts(const char* format, int64_t* ts, int32_t precision, const char* tsStr) { + const char* sErrPos; + int32_t fErrIdx; + SArray* formats = taosArrayInit(4, sizeof(TSFormatNode)); + parseTsFormat(format, formats); + int32_t code = char2ts(tsStr, formats, ts, precision, &sErrPos, &fErrIdx); + if (code == -1) { + printf("failed position: %s\n", sErrPos); + printf("failed format: %s\n", ((TSFormatNode*)taosArrayGet(formats, fErrIdx))->key->name); + } + taosArrayDestroy(formats); + return code; +} diff --git a/source/common/test/commonTests.cpp b/source/common/test/commonTests.cpp index 8a77087d23..49a16351ca 100644 --- a/source/common/test/commonTests.cpp +++ b/source/common/test/commonTests.cpp @@ -13,6 +13,7 @@ #include "tdatablock.h" #include "tdef.h" #include "tvariant.h" +#include "ttime.h" namespace { // @@ -260,4 +261,226 @@ TEST(testCase, var_dataBlock_split_test) { } } -#pragma GCC diagnostic pop \ No newline at end of file +void check_tm(const STm* tm, int32_t y, int32_t mon, int32_t d, int32_t h, int32_t m, int32_t s, int64_t fsec) { + ASSERT_EQ(tm->tm.tm_year, y); + ASSERT_EQ(tm->tm.tm_mon, mon); + ASSERT_EQ(tm->tm.tm_mday, d); + ASSERT_EQ(tm->tm.tm_hour, h); + ASSERT_EQ(tm->tm.tm_min, m); + ASSERT_EQ(tm->tm.tm_sec, s); + ASSERT_EQ(tm->fsec, fsec); +} + +void test_timestamp_tm_conversion(int64_t ts, int32_t precision, int32_t y, int32_t mon, int32_t d, int32_t h, int32_t m, int32_t s, int64_t fsec) { + int64_t ts_tmp; + char 
buf[128] = {0}; + struct STm tm; + taosFormatUtcTime(buf, 128, ts, precision); + printf("formated ts of %ld, precision: %d is: %s\n", ts, precision, buf); + taosTs2Tm(ts, precision, &tm); + check_tm(&tm, y, mon, d, h, m, s, fsec); + taosTm2Ts(&tm, &ts_tmp, precision); + ASSERT_EQ(ts, ts_tmp); +} + +TEST(timeTest, timestamp2tm) { + const char* ts_str_ns = "2023-10-12T11:29:00.775726171+0800"; + const char* ts_str_us = "2023-10-12T11:29:00.775726+0800"; + const char* ts_str_ms = "2023-10-12T11:29:00.775+0800"; + int64_t ts, tmp_ts = 0; + struct STm tm; + + ASSERT_EQ(TSDB_CODE_SUCCESS, taosParseTime(ts_str_ns, &ts, strlen(ts_str_ns), TSDB_TIME_PRECISION_NANO, 0)); + test_timestamp_tm_conversion(ts, TSDB_TIME_PRECISION_NANO, 2023 - 1900, 9 /* mon start from 0*/, 12, 11, 29, 0, + 775726171L); + + ASSERT_EQ(TSDB_CODE_SUCCESS, taosParseTime(ts_str_us, &ts, strlen(ts_str_us), TSDB_TIME_PRECISION_MICRO, 0)); + test_timestamp_tm_conversion(ts, TSDB_TIME_PRECISION_MICRO, 2023 - 1900, 9 /* mon start from 0*/, 12, 11, 29, 0, + 775726000L); + + ASSERT_EQ(TSDB_CODE_SUCCESS, taosParseTime(ts_str_ms, &ts, strlen(ts_str_ms), TSDB_TIME_PRECISION_MILLI, 0)); + test_timestamp_tm_conversion(ts, TSDB_TIME_PRECISION_MILLI, 2023 - 1900, 9 /* mon start from 0*/, 12, 11, 29, 0, + 775000000L); + + ts = -5364687943000; // milliseconds since epoch, Wednesday, January 1, 1800 1:00:00 AM GMT+08:06 + test_timestamp_tm_conversion(ts, TSDB_TIME_PRECISION_MILLI, 1800 - 1900, 0 /* mon start from 0*/, 1, 1, 0, 0, + 000000000L); + + ts = 0; + test_timestamp_tm_conversion(ts, TSDB_TIME_PRECISION_MILLI, 1970 - 1900, 0 /* mon start from 0*/, 1, 8, 0, 0, + 000000000L); + + ts = -62198784343000; // milliseconds before epoch, Friday, January 1, -0001 12:00:00 AM GMT+08:06 + test_timestamp_tm_conversion(ts, TSDB_TIME_PRECISION_MILLI, -1 - 1900, 0 /* mon start from 0*/, 1, + 0 /* hour start from 0*/, 0, 0, 000000000L); +} + +void test_ts2char(int64_t ts, const char* format, int32_t precison, const char* 
expected) { + char buf[128] = {0}; + TEST_ts2char(format, ts, precison, buf); + printf("ts: %ld format: %s res: [%s], expected: [%s]\n", ts, format, buf, expected); + ASSERT_STREQ(expected, buf); +} + +TEST(timeTest, ts2char) { + osDefaultInit(); + if (tsTimezone != TdEastZone8) GTEST_SKIP(); + int64_t ts; + const char* format = "YYYY-MM-DD"; + ts = 0; + test_ts2char(ts, format, TSDB_TIME_PRECISION_MILLI, "1970-01-01"); + test_ts2char(ts, format, TSDB_TIME_PRECISION_MICRO, "1970-01-01"); + test_ts2char(ts, format, TSDB_TIME_PRECISION_NANO, "1970-01-01"); + test_ts2char(ts, format, TSDB_TIME_PRECISION_SECONDS, "1970-01-01"); + + ts = 1697163517; + test_ts2char(ts, "YYYY-MM-DD", TSDB_TIME_PRECISION_SECONDS, "2023-10-13"); + ts = 1697163517000; + test_ts2char(ts, "YYYY-MM-DD-Day-DAY", TSDB_TIME_PRECISION_MILLI, "2023-10-13-Friday -FRIDAY "); +#ifndef WINDOWS + // double quoted: year, month, day are not parsed + test_ts2char(ts, + "YYYY-YYY-YY-Y-yyyy-yyy-yy-y-\"年\"-MONTH-MON-Month-Mon-month-mon-\"月\"-DDD-DD-D-ddd-dd-d-DAY-Day-" + "day-\"日\"", + TSDB_TIME_PRECISION_MILLI, + "2023-023-23-3-2023-023-23-3-年-OCTOBER -OCT-October -Oct-october " + "-oct-月-285-13-6-285-13-6-FRIDAY -Friday -friday -日"); +#endif + ts = 1697182085123L; // Friday, October 13, 2023 3:28:05.123 PM GMT+08:00 + test_ts2char(ts, "HH24:hh24:HH12:hh12:HH:hh:MI:mi:SS:ss:MS:ms:US:us:NS:ns:PM:AM:pm:am", TSDB_TIME_PRECISION_MILLI, + "15:15:03:03:03:03:28:28:05:05:123:123:123000:123000:123000000:123000000:PM:PM:pm:pm"); + + // double quotes normal output + test_ts2char(ts, "\\\"HH24:hh24:HH12:hh12:HH:hh:MI:mi:SS:ss:MS:ms:US:us:NS:ns:PM:AM:pm:am\\\"", TSDB_TIME_PRECISION_MILLI, + "\"15:15:03:03:03:03:28:28:05:05:123:123:123000:123000:123000000:123000000:PM:PM:pm:pm\""); + test_ts2char(ts, "\\\"HH24:hh24:HH12:hh12:HH:hh:MI:mi:SS:ss:MS:ms:US:us:NS:ns:PM:AM:pm:am", TSDB_TIME_PRECISION_MILLI, + "\"15:15:03:03:03:03:28:28:05:05:123:123:123000:123000:123000000:123000000:PM:PM:pm:pm"); + // double quoted strings 
recognized as literal string, parsing skipped + test_ts2char(ts, "\"HH24:hh24:HH12:hh12:HH:hh:MI:mi:SS:ss:MS:ms:US:us:NS:ns:PM:AM:pm:am", TSDB_TIME_PRECISION_MILLI, + "HH24:hh24:HH12:hh12:HH:hh:MI:mi:SS:ss:MS:ms:US:us:NS:ns:PM:AM:pm:am"); + test_ts2char(ts, "yyyy-mm-dd hh24:mi:ss.nsamaaa", TSDB_TIME_PRECISION_MILLI, "2023-10-13 15:28:05.123000000pmaaa"); + test_ts2char(ts, "aaa--yyyy-mm-dd hh24:mi:ss.nsamaaa", TSDB_TIME_PRECISION_MILLI, "aaa--2023-10-13 15:28:05.123000000pmaaa"); + test_ts2char(ts, "add--yyyy-mm-dd hh24:mi:ss.nsamaaa", TSDB_TIME_PRECISION_MILLI, "a13--2023-10-13 15:28:05.123000000pmaaa"); + + ts = 1693946405000; + test_ts2char(ts, "Day, Month dd, YYYY hh24:mi:ss AM TZH:tzh", TSDB_TIME_PRECISION_MILLI, "Wednesday, September 06, 2023 04:40:05 AM +08:+08"); + + ts = -62198784343000; // milliseconds before epoch, Friday, January 1, -0001 12:00:00 AM GMT+08:06 + test_ts2char(ts, "Day, Month dd, YYYY hh12:mi:ss AM", TSDB_TIME_PRECISION_MILLI, "Friday , January 01, -001 12:00:00 AM"); +} + +TEST(timeTest, char2ts) { + osDefaultInit(); + if (tsTimezone != TdEastZone8) GTEST_SKIP(); + int64_t ts; + int32_t code = + TEST_char2ts("YYYY-DD-MM HH12:MI:SS:MSPM", &ts, TSDB_TIME_PRECISION_MILLI, "2023-10-10 12:00:00.000AM"); + ASSERT_EQ(code, 0); + ASSERT_EQ(ts, 1696867200000LL); + + // 2009-1-1 00:00:00 + ASSERT_EQ(0, TEST_char2ts("YYYY-YYY-YY-Y", &ts, TSDB_TIME_PRECISION_MILLI, "2023-123-23-9")); + ASSERT_EQ(1230739200000LL, ts); + // 2023-1-1 + ASSERT_EQ(0, TEST_char2ts("YYYY-YYY-YY", &ts, TSDB_TIME_PRECISION_MILLI, "2023-123-23-9")); + ASSERT_EQ(ts, 1672502400000LL); + + // 2123-1-1, the second year(123) is used, which converted to 2123 + ASSERT_EQ(0, TEST_char2ts("YYYY-YYY", &ts, TSDB_TIME_PRECISION_MILLI, "2023-123-23-9")); + ASSERT_EQ(ts, 4828176000000LL); + // 2023-1-1 12:10:10am + ASSERT_EQ(0, TEST_char2ts("yyyy-mm-dd HH12:MI:SSAM", &ts, TSDB_TIME_PRECISION_MILLI, "2023-1-1 12:10:10am")); + ASSERT_EQ(ts, 1672503010000LL); + + // 2023-1-1 21:10:10.123 + 
ASSERT_EQ(0, TEST_char2ts("yy-MM-dd HH12:MI:ss.msa.m.", &ts, TSDB_TIME_PRECISION_MILLI, "23-1-01 9:10:10.123p.m.")); + ASSERT_EQ(ts, 1672578610123LL); + + // 2023-1-1 21:10:10.123456789 + ASSERT_EQ(0, TEST_char2ts("yy-MM-dd HH:MI:ss.ms.us.nsa.m.", &ts, TSDB_TIME_PRECISION_NANO, + "23-1-01 9:10:10.123.000456.000000789p.m.")); + ASSERT_EQ(ts, 1672578610123456789LL); + + // 2023-1-1 21:10:10.120450780 + ASSERT_EQ(0, TEST_char2ts("yy-MM-dd HH24:MI:SS.ms.us.ns", &ts, TSDB_TIME_PRECISION_NANO, + " 23 - 1 - 01 \t 21:10:10 . 12 . \t 00045 . 00000078 \t")); + ASSERT_EQ(ts, 1672578610120450780LL); + +#ifndef WINDOWS + // 2023-1-1 21:10:10.120450780 + ASSERT_EQ(0, TEST_char2ts("yy \"年\"-MM 月-dd \"日\" HH24:MI:ss.ms.us.ns TZH", &ts, TSDB_TIME_PRECISION_NANO, + " 23 年 - 1 月 - 01 日 \t 21:10:10 . 12 . \t 00045 . 00000078 \t+08")); + ASSERT_EQ(ts, 1672578610120450780LL); +#endif + + // 2023-1-1 19:10:10.123456789+06 -> 2023-1-1 21:10:10.123456789+08 + ASSERT_EQ(0, TEST_char2ts("yy-MM-dd HH:MI:ss.ms.us.nsa.m.TZH", &ts, TSDB_TIME_PRECISION_NANO, + "23-1-01 7:10:10.123.000456.000000789p.m.6")); + ASSERT_EQ(ts, 1672578610123456789LL); + + // 2023-1-1 12:10:10.123456789-01 -> 2023-1-1 21:10:10.123456789+08 + ASSERT_EQ(0, TEST_char2ts("yy-MM-dd HH24:MI:ss.ms.us.nsTZH", &ts, TSDB_TIME_PRECISION_NANO, + "23-1-01 12:10:10.123.000456.000000789-1")); + ASSERT_EQ(ts, 1672578610123456789LL); + + // 2100-01-01 11:10:10.124456+08 + ASSERT_EQ( + 0, TEST_char2ts("yyyy-MM-dd HH24:MI:ss.usTZH", &ts, TSDB_TIME_PRECISION_MICRO, "2100-01-01 11:10:10.124456+08")); + ASSERT_EQ(ts, 4102456210124456LL); + + // 2100-01-01 11:10:10.124456+08 Firday + ASSERT_EQ(0, TEST_char2ts("yyyy/MONTH/dd DAY HH24:MI:ss.usTZH", &ts, TSDB_TIME_PRECISION_MICRO, + "2100/january/01 friday 11:10:10.124456+08")); + ASSERT_EQ(ts, 4102456210124456LL); + + ASSERT_EQ(0, TEST_char2ts("yyyy/Month/dd Day HH24:MI:ss.usTZH", &ts, TSDB_TIME_PRECISION_MICRO, + "2100/january/01 FRIDAY 11:10:10.124456+08")); + ASSERT_EQ(ts, 
4102456210124456LL); + ASSERT_EQ(0, TEST_char2ts("yyyy/Month/dd Dy HH24:MI:ss.usTZH", &ts, TSDB_TIME_PRECISION_MICRO, + "2100/january/01 Fri 11:10:10.124456+08")); + ASSERT_EQ(ts, 4102456210124456LL); + + ASSERT_EQ(0, TEST_char2ts("yyyy/month/dd day HH24:MI:ss.usTZH", &ts, TSDB_TIME_PRECISION_MICRO, + "2100/january/01 Friday 11:10:10.124456+08")); + ASSERT_EQ(ts, 4102456210124456LL); + + // 2100-02-01 11:10:10.124456+08 Firday + ASSERT_EQ(0, TEST_char2ts("yyyy/mon/dd DY HH24:MI:ss.usTZH", &ts, TSDB_TIME_PRECISION_MICRO, + "2100/Feb/01 Mon 11:10:10.124456+08")); + ASSERT_EQ(ts, 4105134610124456LL); + + // 2100-02-01 11:10:10.124456+08 Firday + ASSERT_EQ(0, TEST_char2ts("yyyy/mon/dd DY DDD-DD-D HH24:MI:ss.usTZH", &ts, TSDB_TIME_PRECISION_MICRO, + "2100/Feb/01 Mon 100-1-01 11:10:10.124456+08")); + ASSERT_EQ(ts, 4105134610124456LL); + + ASSERT_EQ(0, TEST_char2ts("yyyyMMdd ", &ts, TSDB_TIME_PRECISION_MICRO, "21000101")); + + // What is Fe? + ASSERT_EQ(-1, TEST_char2ts("yyyy/mon/dd ", &ts, TSDB_TIME_PRECISION_MICRO, "2100/Fe/01")); + // '/' cannot convert to MM + ASSERT_EQ(-1, TEST_char2ts("yyyyMMdd ", &ts, TSDB_TIME_PRECISION_MICRO, "2100/2/1")); + // nothing to be converted to dd + ASSERT_EQ(-1, TEST_char2ts("yyyyMMdd ", &ts, TSDB_TIME_PRECISION_MICRO, "210012")); + ASSERT_EQ(-1, TEST_char2ts("yyyyMMdd ", &ts, TSDB_TIME_PRECISION_MICRO, "21001")); + ASSERT_EQ(-1, TEST_char2ts("yyyyMM-dd ", &ts, TSDB_TIME_PRECISION_MICRO, "23a1-1")); + + // 2100-1-2 + ASSERT_EQ(0, TEST_char2ts("yyyyMM/dd ", &ts, TSDB_TIME_PRECISION_MICRO, "21001/2")); + ASSERT_EQ(ts, 4102502400000000LL); + + // default to 1970-1-1 00:00:00+08 -> 1969-12-31 16:00:00+00 + ASSERT_EQ(0, TEST_char2ts("YYYY", &ts, TSDB_TIME_PRECISION_SECONDS, "1970")); + ASSERT_EQ(ts, -1 * tsTimezone * 60 * 60); + + ASSERT_EQ(0, TEST_char2ts("yyyyMM1/dd ", &ts, TSDB_TIME_PRECISION_MICRO, "210001/2")); + ASSERT_EQ(ts, 4102502400000000LL); + + ASSERT_EQ(-2, TEST_char2ts("yyyyMM/dd ", &ts, TSDB_TIME_PRECISION_MICRO, 
"210013/2")); + ASSERT_EQ(-2, TEST_char2ts("yyyyMM/dd ", &ts, TSDB_TIME_PRECISION_MICRO, "210011/32")); + ASSERT_EQ(-1, TEST_char2ts("HH12:MI:SS", &ts, TSDB_TIME_PRECISION_MICRO, "21:12:12")); + ASSERT_EQ(-1, TEST_char2ts("yyyy/MM1/dd ", &ts, TSDB_TIME_PRECISION_MICRO, "2100111111111/11/2")); + ASSERT_EQ(-2, TEST_char2ts("yyyy/MM1/ddTZH", &ts, TSDB_TIME_PRECISION_MICRO, "23/11/2-13")); +} + +#pragma GCC diagnostic pop diff --git a/source/libs/function/src/builtins.c b/source/libs/function/src/builtins.c index 68a83fa662..628a609715 100644 --- a/source/libs/function/src/builtins.c +++ b/source/libs/function/src/builtins.c @@ -2083,6 +2083,34 @@ static int32_t translateToUnixtimestamp(SFunctionNode* pFunc, char* pErrBuf, int return TSDB_CODE_SUCCESS; } +static int32_t translateToTimestamp(SFunctionNode* pFunc, char* pErrBuf, int32_t len) { + if (LIST_LENGTH(pFunc->pParameterList) != 2) { + return invaildFuncParaNumErrMsg(pErrBuf, len, pFunc->functionName); + } + uint8_t para1Type = ((SExprNode*)nodesListGetNode(pFunc->pParameterList, 0))->resType.type; + uint8_t para2Type = ((SExprNode*)nodesListGetNode(pFunc->pParameterList, 1))->resType.type; + if (!IS_STR_DATA_TYPE(para1Type) || !IS_STR_DATA_TYPE(para2Type)) { + return invaildFuncParaTypeErrMsg(pErrBuf, len, pFunc->functionName); + } + pFunc->node.resType = + (SDataType){.bytes = tDataTypes[TSDB_DATA_TYPE_TIMESTAMP].bytes, .type = TSDB_DATA_TYPE_TIMESTAMP}; + return TSDB_CODE_SUCCESS; +} + +static int32_t translateToChar(SFunctionNode* pFunc, char* pErrBuf, int32_t len) { + if (LIST_LENGTH(pFunc->pParameterList) != 2) { + return invaildFuncParaNumErrMsg(pErrBuf, len, pFunc->functionName); + } + uint8_t para1Type = ((SExprNode*)nodesListGetNode(pFunc->pParameterList, 0))->resType.type; + uint8_t para2Type = ((SExprNode*)nodesListGetNode(pFunc->pParameterList, 1))->resType.type; + // currently only support to_char(timestamp, str) + if (!IS_STR_DATA_TYPE(para2Type) || !IS_TIMESTAMP_TYPE(para1Type)) { + return 
invaildFuncParaTypeErrMsg(pErrBuf, len, pFunc->functionName); + } + pFunc->node.resType = (SDataType){.bytes = tDataTypes[TSDB_DATA_TYPE_VARCHAR].bytes, .type = TSDB_DATA_TYPE_VARCHAR}; + return TSDB_CODE_SUCCESS; +} + static int32_t translateTimeTruncate(SFunctionNode* pFunc, char* pErrBuf, int32_t len) { int32_t numOfParams = LIST_LENGTH(pFunc->pParameterList); if (2 != numOfParams && 3 != numOfParams) { @@ -3284,6 +3312,26 @@ const SBuiltinFuncDefinition funcMgtBuiltins[] = { .sprocessFunc = castFunction, .finalizeFunc = NULL }, + { + .name = "to_timestamp", + .type = FUNCTION_TYPE_TO_TIMESTAMP, + .classification = FUNC_MGT_SCALAR_FUNC, + .translateFunc = translateToTimestamp, + .getEnvFunc = NULL, + .initFunc = NULL, + .sprocessFunc = toTimestampFunction, + .finalizeFunc = NULL + }, + { + .name = "to_char", + .type = FUNCTION_TYPE_TO_CHAR, + .classification = FUNC_MGT_SCALAR_FUNC, + .translateFunc = translateToChar, + .getEnvFunc = NULL, + .initFunc = NULL, + .sprocessFunc = toCharFunction, + .finalizeFunc = NULL + }, { .name = "to_iso8601", .type = FUNCTION_TYPE_TO_ISO8601, diff --git a/source/libs/scalar/src/sclfunc.c b/source/libs/scalar/src/sclfunc.c index 4df7454df8..ee2ba47ce8 100644 --- a/source/libs/scalar/src/sclfunc.c +++ b/source/libs/scalar/src/sclfunc.c @@ -1197,6 +1197,62 @@ int32_t toJsonFunction(SScalarParam *pInput, int32_t inputNum, SScalarParam *pOu return TSDB_CODE_SUCCESS; } +#define TS_FORMAT_MAX_LEN 4096 +int32_t toTimestampFunction(SScalarParam* pInput, int32_t inputNum, SScalarParam* pOutput) { + int64_t ts; + char * tsStr = taosMemoryMalloc(TS_FORMAT_MAX_LEN); + char * format = taosMemoryMalloc(TS_FORMAT_MAX_LEN); + int32_t len, code = TSDB_CODE_SUCCESS; + for (int32_t i = 0; i < pInput[0].numOfRows; ++i) { + if (colDataIsNull_s(pInput[1].columnData, i) || colDataIsNull_s(pInput[0].columnData, i)) + colDataSetNULL(pOutput->columnData, i); + + char *tsData = colDataGetData(pInput[0].columnData, i); + char *formatData = 
colDataGetData(pInput[1].columnData, pInput[1].numOfRows > 1 ? i : 0); + len = TMIN(TS_FORMAT_MAX_LEN - 1, varDataLen(tsData)); + strncpy(tsStr, varDataVal(tsData), len); + tsStr[len] = '\0'; + len = TMIN(TS_FORMAT_MAX_LEN - 1, varDataLen(formatData)); + strncpy(format, varDataVal(formatData), len); + format[len] = '\0'; + int32_t precision = pOutput->columnData->info.precision; + char errMsg[128] = {0}; + code = taosChar2Ts(format, tsStr, &ts, precision, errMsg, 128); + if (code) { + qError("func to_timestamp failed %s", errMsg); + code = TSDB_CODE_FUNC_TO_TIMESTAMP_FAILED; + break; + } + colDataSetVal(pOutput->columnData, i, (char *)&ts, false); + } + taosMemoryFree(tsStr); + taosMemoryFree(format); + return code; +} + +int32_t toCharFunction(SScalarParam* pInput, int32_t inputNum, SScalarParam* pOutput) { + char * format = taosMemoryMalloc(TS_FORMAT_MAX_LEN); + char * out = taosMemoryMalloc(TS_FORMAT_MAX_LEN * 2); + int32_t len; + for (int32_t i = 0; i < pInput[0].numOfRows; ++i) { + if (colDataIsNull_s(pInput[1].columnData, i) || colDataIsNull_s(pInput[0].columnData, i)) + colDataSetNULL(pOutput->columnData, i); + + char *ts = colDataGetData(pInput[0].columnData, i); + char *formatData = colDataGetData(pInput[1].columnData, pInput[1].numOfRows > 1 ? 
i : 0); + len = TMIN(TS_FORMAT_MAX_LEN - 1, varDataLen(formatData)); + strncpy(format, varDataVal(formatData), len); + format[len] = '\0'; + int32_t precision = pInput[0].columnData->info.precision; + taosTs2Char(format, *(int64_t *)ts, precision, varDataVal(out)); + varDataSetLen(out, strlen(varDataVal(out))); + colDataSetVal(pOutput->columnData, i, out, false); + } + taosMemoryFree(format); + taosMemoryFree(out); + return TSDB_CODE_SUCCESS; +} + /** Time functions **/ static int64_t offsetFromTz(char *timezone, int64_t factor) { char *minStr = &timezone[3]; diff --git a/source/util/src/terror.c b/source/util/src/terror.c index 383e4e9d8a..9e5a50ce85 100644 --- a/source/util/src/terror.c +++ b/source/util/src/terror.c @@ -601,6 +601,7 @@ TAOS_DEFINE_ERROR(TSDB_CODE_FUNC_FUNTION_PARA_TYPE, "Invalid function par TAOS_DEFINE_ERROR(TSDB_CODE_FUNC_FUNTION_PARA_VALUE, "Invalid function para value") TAOS_DEFINE_ERROR(TSDB_CODE_FUNC_NOT_BUILTIN_FUNTION, "Not buildin function") TAOS_DEFINE_ERROR(TSDB_CODE_FUNC_DUP_TIMESTAMP, "Duplicate timestamps not allowed in function") +TAOS_DEFINE_ERROR(TSDB_CODE_FUNC_TO_TIMESTAMP_FAILED, "Func to_timestamp failed, check log for detail") //udf TAOS_DEFINE_ERROR(TSDB_CODE_UDF_STOPPING, "udf is stopping") diff --git a/tests/parallel_test/cases.task b/tests/parallel_test/cases.task index 8b0451604c..bd20a2745f 100644 --- a/tests/parallel_test/cases.task +++ b/tests/parallel_test/cases.task @@ -61,6 +61,10 @@ ,,y,system-test,./pytest.sh python3 ./test.py -f 2-query/interval_limit_opt_2.py -Q 3 ,,y,system-test,./pytest.sh python3 ./test.py -f 2-query/interval_limit_opt_2.py -Q 2 ,,y,system-test,./pytest.sh python3 ./test.py -f 2-query/interval_limit_opt_2.py +,,y,system-test,./pytest.sh python3 ./test.py -f 2-query/func_to_char_timestamp.py +,,y,system-test,./pytest.sh python3 ./test.py -f 2-query/func_to_char_timestamp.py -Q 2 +,,y,system-test,./pytest.sh python3 ./test.py -f 2-query/func_to_char_timestamp.py -Q 3 
+,,y,system-test,./pytest.sh python3 ./test.py -f 2-query/func_to_char_timestamp.py -Q 4 ,,y,system-test,./pytest.sh python3 ./test.py -f 7-tmq/tmqShow.py ,,y,system-test,./pytest.sh python3 ./test.py -f 7-tmq/tmqDropStb.py ,,y,system-test,./pytest.sh python3 ./test.py -f 7-tmq/subscribeStb0.py diff --git a/tests/system-test/2-query/func_to_char_timestamp.py b/tests/system-test/2-query/func_to_char_timestamp.py new file mode 100644 index 0000000000..3d3435d9c7 --- /dev/null +++ b/tests/system-test/2-query/func_to_char_timestamp.py @@ -0,0 +1,160 @@ +import taos +import sys +import time +import socket +import os +import threading +import math + +from util.log import * +from util.sql import * +from util.cases import * +from util.dnodes import * +from util.common import * +# from tmqCommon import * + +class TDTestCase: + def __init__(self): + self.vgroups = 4 + self.ctbNum = 10 + self.rowsPerTbl = 10000 + self.duraion = '1h' + + def init(self, conn, logSql, replicaVar=1): + self.replicaVar = int(replicaVar) + tdLog.debug(f"start to excute {__file__}") + tdSql.init(conn.cursor(), False) + + def create_database(self,tsql, dbName,dropFlag=1,vgroups=2,replica=1, duration:str='1d'): + if dropFlag == 1: + tsql.execute("drop database if exists %s"%(dbName)) + + tsql.execute("create database if not exists %s vgroups %d replica %d duration %s"%(dbName, vgroups, replica, duration)) + tdLog.debug("complete to create database %s"%(dbName)) + return + + def create_stable(self,tsql, paraDict): + colString = tdCom.gen_column_type_str(colname_prefix=paraDict["colPrefix"], column_elm_list=paraDict["colSchema"]) + tagString = tdCom.gen_tag_type_str(tagname_prefix=paraDict["tagPrefix"], tag_elm_list=paraDict["tagSchema"]) + sqlString = f"create table if not exists %s.%s (%s) tags (%s)"%(paraDict["dbName"], paraDict["stbName"], colString, tagString) + tdLog.debug("%s"%(sqlString)) + tsql.execute(sqlString) + return + + def create_ctable(self,tsql=None, 
dbName='dbx',stbName='stb',ctbPrefix='ctb',ctbNum=1,ctbStartIdx=0): + for i in range(ctbNum): + sqlString = "create table %s.%s%d using %s.%s tags(%d, 'tb%d', 'tb%d', %d, %d, %d)" % \ + (dbName,ctbPrefix,i+ctbStartIdx,dbName,stbName,(i+ctbStartIdx) % 5,i+ctbStartIdx,i+ctbStartIdx,i+ctbStartIdx,i+ctbStartIdx,i+ctbStartIdx) + tsql.execute(sqlString) + + tdLog.debug("complete to create %d child tables by %s.%s" %(ctbNum, dbName, stbName)) + return + + def insert_data(self,tsql,dbName,ctbPrefix,ctbNum,rowsPerTbl,batchNum,startTs,tsStep): + tdLog.debug("start to insert data ............") + tsql.execute("use %s" %dbName) + pre_insert = "insert into " + sql = pre_insert + + for i in range(ctbNum): + rowsBatched = 0 + sql += " %s%d values "%(ctbPrefix,i) + for j in range(rowsPerTbl): + if (i < ctbNum/2): + sql += "(%d, %d, %d, %d,%d,%d,%d,true,'binary%d', 'nchar%d') "%(startTs + j*tsStep, j%10, j%10, j%10, j%10, j%10, j%10, j%10, j%10) + else: + sql += "(%d, %d, NULL, %d,NULL,%d,%d,true,'binary%d', 'nchar%d') "%(startTs + j*tsStep, j%10, j%10, j%10, j%10, j%10, j%10) + rowsBatched += 1 + if ((rowsBatched == batchNum) or (j == rowsPerTbl - 1)): + tsql.execute(sql) + rowsBatched = 0 + if j < rowsPerTbl - 1: + sql = "insert into %s%d values " %(ctbPrefix,i) + else: + sql = "insert into " + if sql != pre_insert: + tsql.execute(sql) + tdLog.debug("insert data ............ 
[OK]") + return + + def prepareTestEnv(self): + tdLog.printNoPrefix("======== prepare test env include database, stable, ctables, and insert data: ") + paraDict = {'dbName': 'test', + 'dropFlag': 1, + 'vgroups': 2, + 'stbName': 'meters', + 'colPrefix': 'c', + 'tagPrefix': 't', + 'colSchema': [{'type': 'INT', 'count':1},{'type': 'BIGINT', 'count':1},{'type': 'FLOAT', 'count':1},{'type': 'DOUBLE', 'count':1},{'type': 'smallint', 'count':1},{'type': 'tinyint', 'count':1},{'type': 'bool', 'count':1},{'type': 'binary', 'len':10, 'count':1},{'type': 'nchar', 'len':10, 'count':1}], + 'tagSchema': [{'type': 'INT', 'count':1},{'type': 'nchar', 'len':20, 'count':1},{'type': 'binary', 'len':20, 'count':1},{'type': 'BIGINT', 'count':1},{'type': 'smallint', 'count':1},{'type': 'DOUBLE', 'count':1}], + 'ctbPrefix': 't', + 'ctbStartIdx': 0, + 'ctbNum': 100, + 'rowsPerTbl': 10000, + 'batchNum': 3000, + 'startTs': 1537146000000, + 'tsStep': 600000} + + paraDict['vgroups'] = self.vgroups + paraDict['ctbNum'] = self.ctbNum + paraDict['rowsPerTbl'] = self.rowsPerTbl + + tdLog.info("create database") + self.create_database(tsql=tdSql, dbName=paraDict["dbName"], dropFlag=paraDict["dropFlag"], vgroups=paraDict["vgroups"], replica=self.replicaVar, duration=self.duraion) + + tdLog.info("create stb") + self.create_stable(tsql=tdSql, paraDict=paraDict) + + tdLog.info("create child tables") + self.create_ctable(tsql=tdSql, dbName=paraDict["dbName"], \ + stbName=paraDict["stbName"],ctbPrefix=paraDict["ctbPrefix"],\ + ctbNum=paraDict["ctbNum"],ctbStartIdx=paraDict["ctbStartIdx"]) + self.insert_data(tsql=tdSql, dbName=paraDict["dbName"],\ + ctbPrefix=paraDict["ctbPrefix"],ctbNum=paraDict["ctbNum"],\ + rowsPerTbl=paraDict["rowsPerTbl"],batchNum=paraDict["batchNum"],\ + startTs=paraDict["startTs"],tsStep=paraDict["tsStep"]) + return + + def convert_ts_and_check(self, ts_str: str, ts_format: str, expect_ts_char: str, expect_ts: str): + tdSql.query("select to_timestamp('%s', '%s')" % (ts_str, 
ts_format), queryTimes=1) + tdSql.checkData(0, 0, expect_ts) + tdSql.query("select to_char(to_timestamp('%s', '%s'), '%s')" % (ts_str, ts_format, ts_format), queryTimes=1) + tdSql.checkData(0, 0, expect_ts_char) + + def test_to_timestamp(self): + self.convert_ts_and_check('2023-10-10 12:13:14.123', 'YYYY-MM-DD HH:MI:SS.MS', '2023-10-10 12:13:14.123', '2023-10-10 00:13:14.123000') + self.convert_ts_and_check('2023-10-10 12:00:00.000AM', 'YYYY-DD-MM HH12:MI:SS.MSPM', '2023-10-10 12:00:00.000AM', '2023-10-10 00:00:00.000000') + self.convert_ts_and_check('2023-01-01 12:10:10am', 'yyyy-mm-dd HH12:MI:SSAM', '2023-01-01 12:10:10AM', '2023-1-1 00:10:10.000000') + self.convert_ts_and_check('23-1-01 9:10:10.123p.m.', 'yy-MM-dd HH12:MI:ss.msa.m.', '23-01-01 09:10:10.123p.m.', '2023-1-1 21:10:10.123000') + self.convert_ts_and_check('23-1-01 9:10:10.123.000456.000000789p.m.', 'yy-MM-dd HH12:MI:ss.ms.us.nsa.m.', '23-01-01 09:10:10.123.123000.123000000p.m.', '2023-1-1 21:10:10.123000') + self.convert_ts_and_check(' 23 -1 - 01 \t 21:10:10 . 12 . \t 00045 . 00000078 \t', 'yy-MM-dd HH24:MI:SS.ms.us.ns', '23-01-01 21:10:10.120.120000.120000000', '2023-1-1 21:10:10.120000') + self.convert_ts_and_check(' 23 年 -1 月 - 01 日 \t 21:10:10 . 12 . \t 00045 . 
00000078 \t+08', 'yy\"年\"-MM月-dd日 HH24:MI:SS.ms.us.ns TZH', '23年-01月-01日 21:10:10.120.120000.120000000 +08', '2023-1-1 21:10:10.120000') + self.convert_ts_and_check('23-1-01 7:10:10.123p.m.6', 'yy-MM-dd HH:MI:ss.msa.m.TZH', '23-01-01 09:10:10.123p.m.+08', '2023-1-1 21:10:10.123000') + + self.convert_ts_and_check('2023-OCTober-19 10:10:10AM Thu', 'yyyy-month-dd hh24:mi:ssam dy', '2023-october -19 10:10:10am thu', '2023-10-19 10:10:10') + + tdSql.error("select to_timestamp('210013/2', 'yyyyMM/dd')") + tdSql.error("select to_timestamp('2100111111111/13/2', 'yyyyMM/dd')") + + tdSql.error("select to_timestamp('210a12/2', 'yyyyMM/dd')") + + tdSql.query("select to_timestamp(to_char(ts, 'yy-mon-dd hh24:mi:ss dy'), 'yy-mon-dd hh24:mi:ss dy') == ts from meters limit 10") + tdSql.checkData(0, 0, 1) + tdSql.checkData(1, 0, 1) + tdSql.checkRows(10) + + tdSql.query("select to_char(ts, 'yy-mon-dd hh24:mi:ss.msa.m.TZH Day') from meters where to_timestamp(to_char(ts, 'yy-mon-dd hh24:mi:ss dy'), 'yy-mon-dd hh24:mi:ss dy') != ts") + tdSql.checkRows(0) + + def run(self): + self.prepareTestEnv() + self.test_to_timestamp() + + def stop(self): + tdSql.close() + tdLog.success(f"{__file__} successfully executed") + +event = threading.Event() + +tdCases.addLinux(__file__, TDTestCase()) +tdCases.addWindows(__file__, TDTestCase()) From 70850697a48445e103badb6519e2642ff059e0f8 Mon Sep 17 00:00:00 2001 From: wangjiaming0909 <604227650@qq.com> Date: Wed, 25 Oct 2023 20:01:22 +0800 Subject: [PATCH 2/6] feat: support to_timestamp/to_char fix comments --- docs/en/12-taos-sql/10-function.md | 5 +- docs/zh/12-taos-sql/10-function.md | 9 +- include/common/ttime.h | 13 +- source/common/src/ttime.c | 229 ++++++++++++------ source/common/test/commonTests.cpp | 20 +- source/libs/function/src/builtins.c | 2 +- source/libs/scalar/src/sclfunc.c | 39 ++- .../2-query/func_to_char_timestamp.py | 25 +- 8 files changed, 236 insertions(+), 106 deletions(-) diff --git a/docs/en/12-taos-sql/10-function.md 
b/docs/en/12-taos-sql/10-function.md
index 266cdb4958..c986a98e46 100644
--- a/docs/en/12-taos-sql/10-function.md
+++ b/docs/en/12-taos-sql/10-function.md
@@ -486,7 +486,7 @@ return_timestamp: {
 #### TO_CHAR
 
 ```sql
-TO_CHAR(ts, str_literal)
+TO_CHAR(ts, format_str_literal)
 ```
 
 **Description**: Convert a ts column to string as the format specified
@@ -539,11 +539,12 @@ TO_CHAR(ts, str_literal)
 - When `ms`,`us`,`ns` are used in `to_char`, like `to_char(ts, 'yyyy-mm-dd hh:mi:ss.ms.us.ns')`, The time of `ms`,`us`,`ns` corresponds to the same fraction seconds. When ts is `1697182085123`, the output of `ms` is `123`, `us` is `123000`, `ns` is `123000000`.
 - If we want to output some characters of format without converting, surround it with double quotes. `to_char(ts, 'yyyy-mm-dd "is formated by yyyy-mm-dd"')`. If want to output double quotes, add a back slash before double quote, like `to_char(ts, '\"yyyy-mm-dd\"')` will output `"2023-10-10"`.
 - For formats that output digits, the uppercase and lowercase formats are the same.
+- The local time zone will be used to convert the timestamp.
 
 #### TO_TIMESTAMP
 
 ```sql
-TO_TIMESTAMP(str_literal, str_literal)
+TO_TIMESTAMP(ts_str_literal, format_str_literal)
 ```
 
 **Description**: Convert a formated timestamp string to a timestamp
diff --git a/docs/zh/12-taos-sql/10-function.md b/docs/zh/12-taos-sql/10-function.md
index 806ff3c6a8..44ab3d5091 100644
--- a/docs/zh/12-taos-sql/10-function.md
+++ b/docs/zh/12-taos-sql/10-function.md
@@ -486,7 +486,7 @@ return_timestamp: {
 #### TO_CHAR
 
 ```sql
-TO_CHAR(ts, str_literal)
+TO_CHAR(ts, format_str_literal)
 ```
 
 **功能说明**: 将timestamp类型按照指定格式转换为字符串
@@ -504,7 +504,7 @@ TO_CHAR(ts, str_literal)
 | **格式** | **说明**| **例子** |
 | --- | --- | --- |
 |AM,am,PM,pm| 无点分隔的上午下午 | 07:00:00am|
-|A.M.,a.m.,P.M.,p.m.| 有点分割的上午下午| 07:00:00a.m.|
+|A.M.,a.m.,P.M.,p.m.| 有点分隔的上午下午| 07:00:00a.m.|
 |YYYY,yyyy|年, 4个及以上数字| 2023-10-10|
 |YYY,yyy| 年, 最后3位数字| 023-10-10|
 |YY,yy| 年, 最后2位数字| 23-10-10|
@@ -537,13 +537,14 @@ TO_CHAR(ts, str_literal)
 **使用说明**:
 - `Month`, `Day`等的输出格式是左对齐的, 右侧添加空格, 如`2023-OCTOBER -01`, `2023-SEPTEMBER-01`, 9月是月份中英文字母数最长的, 因此9月没有空格. 星期类似.
 - 使用`ms`, `us`, `ns`时, 以上三种格式的输出只在精度上不同, 比如ts为 `1697182085123`, `ms` 的输出为 `123`, `us` 的输出为 `123000`, `ns` 的输出为 `123000000`.
-- 如果想要在格式串中指定某些部分不做转换, 可以使用双引号, 如`to_char(ts, 'yyyy-mm-dd "is formated by yyyy-mm-dd"')`. 如果想要输出双引号, 那么在双引号之前加一个反斜杠, 如 `to_char(ts, '\"yyyy-mm-dd\"')` 将会输出 `"2023-10-10"`.
+- 时间格式中无法匹配规则的内容会直接输出. 如果想要在格式串中指定某些能够匹配规则的部分不做转换, 可以使用双引号, 如`to_char(ts, 'yyyy-mm-dd "is formated by yyyy-mm-dd"')`. 如果想要输出双引号, 那么在双引号之前加一个反斜杠, 如 `to_char(ts, '\"yyyy-mm-dd\"')` 将会输出 `"2023-10-10"`.
 - 那些输出是数字的格式, 如`YYYY`, `DD`, 大写与小写意义相同, 即`yyyy` 和 `YYYY` 可以互换.
+- 默认输出的时间为本地时区的时间.
 
 #### TO_TIMESTAMP
 
 ```sql
-TO_TIMESTAMP(str_literal, str_literal)
+TO_TIMESTAMP(ts_str_literal, format_str_literal)
 ```
 
 **功能说明**: 将字符串按照指定格式转化为时间戳.
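Reviewer's note, not part of the patch: the `ms`/`us`/`ns` rule in the TO_CHAR documentation, and the digit scaling that `char2ts` applies to short fractional fields, may be easier to follow with a small sketch. The Python below is an illustration under assumptions only. The names `fraction_fields`, `scale_fraction` and `combine_fraction` are hypothetical and do not exist in TDengine; the arithmetic mirrors what the patch's C code and unit tests appear to do.

```python
# Hypothetical helpers illustrating the ms/us/ns behaviour; not TDengine API.

def fraction_fields(ts: int, ticks_per_second: int) -> dict:
    """Render the sub-second part of an integer timestamp as ms, us and ns.

    All three fields are views of the same fractional second, so a
    millisecond-precision timestamp yields zero-padded us and ns values.
    """
    frac_ticks = ts % ticks_per_second            # sub-second ticks
    fsec_ns = frac_ticks * (1_000_000_000 // ticks_per_second)
    return {
        "ms": f"{fsec_ns // 1_000_000:03d}",
        "us": f"{fsec_ns // 1_000:06d}",
        "ns": f"{fsec_ns:09d}",
    }


def scale_fraction(digits: str, width: int) -> int:
    """Scale a short fractional field as if it were right-padded with zeros.

    This follows the `ms *= len == 1 ? 100 : len == 2 ? 10 : 1` pattern in
    the patch: "12" parsed as ms means 120 milliseconds.
    """
    if not (digits.isascii() and digits.isdigit()) or len(digits) > width:
        raise ValueError(f"bad fractional field: {digits!r}")
    return int(digits) * 10 ** (width - len(digits))


def combine_fraction(ms: int, us: int, ns: int) -> int:
    """Combine the three parsed fields into nanoseconds, as in
    `tm.fsec = ms * 1000000 + us * 1000 + ns`."""
    return ms * 1_000_000 + us * 1_000 + ns


if __name__ == "__main__":
    # Value taken from the documentation example (millisecond precision).
    print(fraction_fields(1697182085123, 1000))
    # Fields taken from the unit test input ". 12 . 00045 . 00000078".
    print(combine_fraction(scale_fraction("12", 3),
                           scale_fraction("00045", 6),
                           scale_fraction("00000078", 9)))
```

The second call reproduces the `.120450780` fraction that the `char2ts` unit test expects. It also shows that `ms`, `us` and `ns` are summed when parsing, whereas when formatting they are three renderings of one value.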
diff --git a/include/common/ttime.h b/include/common/ttime.h index 75bbcddd0e..306b5105d0 100644 --- a/include/common/ttime.h +++ b/include/common/ttime.h @@ -100,15 +100,22 @@ int32_t taosTm2Ts(struct STm* tm, int64_t* ts, int32_t precision); /// @brief convert a timestamp to a formatted string /// @param format the timestamp format, must null terminated -void taosTs2Char(const char* format, int64_t ts, int32_t precision, char* out); +/// @param [in,out] formats the formats array pointer generated. Shouldn't be NULL. +/// If (*formats == NULL), [format] will be used and [formats] will be updated to the new generated +/// formats array; If not NULL, [formats] will be used instead of [format] to skip parse formats again. +/// @param out output buffer, should be initialized by memset +/// @notes remember to free the generated formats +void taosTs2Char(const char* format, SArray** formats, int64_t ts, int32_t precision, char* out, int32_t outLen); /// @brief convert a formatted timestamp string to a timestamp /// @param format must null terminated +/// @param [in, out] formats, see taosTs2Char /// @param tsStr must null terminated /// @retval 0 for success, otherwise error occured -int32_t taosChar2Ts(const char* format, const char* tsStr, int64_t* ts, int32_t precision, char* errMsg, +/// @notes remember to free the generated formats even when error occured +int32_t taosChar2Ts(const char* format, SArray** formats, const char* tsStr, int64_t* ts, int32_t precision, char* errMsg, int32_t errMsgLen); -void TEST_ts2char(const char* format, int64_t ts, int32_t precision, char* out); +void TEST_ts2char(const char* format, int64_t ts, int32_t precision, char* out, int32_t outLen); int32_t TEST_char2ts(const char* format, int64_t* ts, int32_t precision, const char* tsStr); #ifdef __cplusplus diff --git a/source/common/src/ttime.c b/source/common/src/ttime.c index 3450e32f4a..aad844da88 100644 --- a/source/common/src/ttime.c +++ b/source/common/src/ttime.c @@ -1008,7 +1008,6 
@@ typedef enum { TSFKW_Day, // Sunday, Monday TSFKW_DY, // MON, TUE TSFKW_Dy, // Mon, Tue - TSFKW_dy, // mon, tue TSFKW_D, // 1-7 -> Sunday(1) -> Saturday(7) TSFKW_HH24, TSFKW_HH12, @@ -1021,12 +1020,12 @@ typedef enum { TSFKW_Mon, TSFKW_MS, TSFKW_NS, - TSFKW_OF, + //TSFKW_OF, TSFKW_PM, TSFKW_P_M, TSFKW_SS, - // TSFKW_TZM, TSFKW_TZH, + // TSFKW_TZM, // TSFKW_TZ, TSFKW_US, TSFKW_YYYY, @@ -1039,13 +1038,15 @@ typedef enum { TSFKW_a_m, // TSFKW_b_c, // TSFKW_bc, - TSFKW_d, TSFKW_day, TSFKW_ddd, TSFKW_dd, + TSFKW_dy, // mon, tue + TSFKW_d, TSFKW_hh24, TSFKW_hh12, TSFKW_hh, + TSFKW_mi, TSFKW_mm, TSFKW_month, TSFKW_mon, @@ -1067,17 +1068,17 @@ typedef enum { // clang-format off static const TSFormatKeyWord formatKeyWords[] = { - //{"A.D.", 4, TSFKW_A_D}, - {"A.M.", 4, TSFKW_A_M, false}, //{"AD", 2, TSFKW_AD, false}, + //{"A.D.", 4, TSFKW_A_D}, {"AM", 2, TSFKW_AM, false}, - //{"B.C.", 4, TSFKW_B_C, false}, + {"A.M.", 4, TSFKW_A_M, false}, //{"BC", 2, TSFKW_BC, false}, + //{"B.C.", 4, TSFKW_B_C, false}, {"DAY", 3, TSFKW_DAY, false}, {"DDD", 3, TSFKW_DDD, true}, {"DD", 2, TSFKW_DD, true}, - {"DY", 2, TSFKW_DY, false}, {"Day", 3, TSFKW_Day, false}, + {"DY", 2, TSFKW_DY, false}, {"Dy", 2, TSFKW_Dy, false}, {"D", 1, TSFKW_D, true}, {"HH24", 4, TSFKW_HH24, true}, @@ -1087,13 +1088,13 @@ static const TSFormatKeyWord formatKeyWords[] = { {"MM", 2, TSFKW_MM, true}, {"MONTH", 5, TSFKW_MONTH, false}, {"MON", 3, TSFKW_MON, false}, - {"MS", 2, TSFKW_MS, true}, {"Month", 5, TSFKW_Month, false}, {"Mon", 3, TSFKW_Mon, false}, + {"MS", 2, TSFKW_MS, true}, {"NS", 2, TSFKW_NS, true}, //{"OF", 2, TSFKW_OF, false}, - {"P.M.", 4, TSFKW_P_M, false}, {"PM", 2, TSFKW_PM, false}, + {"P.M.", 4, TSFKW_P_M, false}, {"SS", 2, TSFKW_SS, true}, {"TZH", 3, TSFKW_TZH, false}, //{"TZM", 3, TSFKW_TZM}, @@ -1104,9 +1105,9 @@ static const TSFormatKeyWord formatKeyWords[] = { {"YY", 2, TSFKW_YY, true}, {"Y", 1, TSFKW_Y, true}, //{"a.d.", 4, TSFKW_a_d, false}, - {"a.m.", 4, TSFKW_a_m, false}, //{"ad", 2, 
TSFKW_ad, false}, {"am", 2, TSFKW_am, false}, + {"a.m.", 4, TSFKW_a_m, false}, //{"b.c.", 4, TSFKW_b_c, false}, //{"bc", 2, TSFKW_bc, false}, {"day", 3, TSFKW_day, false}, @@ -1124,8 +1125,8 @@ static const TSFormatKeyWord formatKeyWords[] = { {"ms", 2, TSFKW_MS, true}, {"ns", 2, TSFKW_NS, true}, //{"of", 2, TSFKW_OF, false}, - {"p.m.", 4, TSFKW_p_m, false}, {"pm", 2, TSFKW_pm, false}, + {"p.m.", 4, TSFKW_p_m, false}, {"ss", 2, TSFKW_SS, true}, {"tzh", 3, TSFKW_TZH, false}, //{"tzm", 3, TSFKW_TZM}, @@ -1139,9 +1140,34 @@ static const TSFormatKeyWord formatKeyWords[] = { }; // clang-format on +#define TS_FROMAT_KEYWORD_INDEX_SIZE ('z' - 'A' + 1) +static const int TSFormatKeywordIndex[TS_FROMAT_KEYWORD_INDEX_SIZE] = { + /*A*/ TSFKW_AM, -1, -1, + /*D*/ TSFKW_DAY, -1, -1, -1, + /*H*/ TSFKW_HH24, -1, -1, -1, -1, + /*M*/ TSFKW_MI, + /*N*/ TSFKW_NS, -1, + /*P*/ TSFKW_PM, -1, -1, + /*S*/ TSFKW_SS, + /*T*/ TSFKW_TZH, + /*U*/ TSFKW_US, -1, -1, -1, + /*Y*/ TSFKW_YYYY, -1, + /*[ \ ] ^ _ `*/ -1, -1, -1, -1, -1, -1, + /*a*/ TSFKW_am, -1, -1, + /*d*/ TSFKW_day, -1, -1, -1, + /*h*/ TSFKW_hh24, -1, -1, -1, -1, + /*m*/ TSFKW_mi, + /*n*/ TSFKW_ns, -1, + /*p*/ TSFKW_pm, -1, -1, + /*s*/ TSFKW_ss, + /*t*/ TSFKW_tzh, + /*u*/ TSFKW_us, -1, -1, -1, + /*y*/ TSFKW_yyyy, -1}; + typedef struct { uint8_t type; - char c[2]; + const char* c; + int32_t len; const TSFormatKeyWord* key; } TSFormatNode; @@ -1169,9 +1195,10 @@ static const char* const long_apms[] = {A_M_STR, P_M_STR, a_m_str, p_m_str, NULL static const TSFormatKeyWord* keywordSearch(const char* str) { if (*str < 'A' || *str > 'z' || (*str > 'Z' && *str < 'a')) return NULL; - int32_t idx = 0; + int32_t idx = TSFormatKeywordIndex[str[0] - 'A']; + if (idx < 0) return NULL; const TSFormatKeyWord* key = &formatKeyWords[idx++]; - while (key->name) { + while (key->name && str[0] == key->name[0]) { if (0 == strncmp(key->name, str, key->len)) { return key; } @@ -1184,74 +1211,110 @@ static bool isSeperatorChar(char c) { return (c > 0x20 && c < 
0x7F && !(c >= 'A' && c <= 'Z') && !(c >= 'a' && c <= 'z') && !(c >= '0' && c <= '9')); } -static void parseTsFormat(const char* format_str, SArray* formats) { - while (*format_str) { - const TSFormatKeyWord* key = keywordSearch(format_str); +static void parseTsFormat(const char* formatStr, SArray* formats) { + TSFormatNode* lastOtherFormat = NULL; + while (*formatStr) { + const TSFormatKeyWord* key = keywordSearch(formatStr); if (key) { TSFormatNode format = {.key = key, .type = TS_FORMAT_NODE_TYPE_KEYWORD}; taosArrayPush(formats, &format); - format_str += key->len; + formatStr += key->len; + lastOtherFormat = NULL; } else { - if (*format_str == '"') { + if (*formatStr == '"') { + lastOtherFormat = NULL; // for double quoted string - format_str++; - while (*format_str) { - if (*format_str == '"') { - format_str++; + formatStr++; + TSFormatNode* last = NULL; + while (*formatStr) { + if (*formatStr == '"') { + formatStr++; break; } - if (*format_str == '\\' && *(format_str + 1)) format_str++; - TSFormatNode format = {.type = TS_FORMAT_NODE_TYPE_CHAR, .key = NULL}; - format.c[0] = *format_str; - format.c[1] = '\0'; - taosArrayPush(formats, &format); - format_str++; + if (*formatStr == '\\' && *(formatStr + 1)) { + formatStr++; + last = NULL; // stop expanding last format, create new format + } + if (last) { + // expand + assert(last->type == TS_FORMAT_NODE_TYPE_CHAR); + last->len++; + formatStr++; + } else { + // create new + TSFormatNode format = {.type = TS_FORMAT_NODE_TYPE_CHAR, .key = NULL}; + format.c = formatStr; + format.len = 1; + taosArrayPush(formats, &format); + formatStr++; + last = taosArrayGetLast(formats); + } } } else { // for other strings - if (*format_str == '\\' && *(format_str + 1)) format_str++; - TSFormatNode format = { - .type = isSeperatorChar(*format_str) ? 
TS_FORMAT_NODE_TYPE_SEPARATOR : TS_FORMAT_NODE_TYPE_CHAR, + if (*formatStr == '\\' && *(formatStr + 1)) { + formatStr++; + lastOtherFormat = NULL; // stop expanding + } else { + if (lastOtherFormat && !isSeperatorChar(*formatStr)) { + // expanding + } else { + // create new + lastOtherFormat = NULL; + } + } + if (lastOtherFormat) { + assert(lastOtherFormat->type == TS_FORMAT_NODE_TYPE_CHAR); + lastOtherFormat->len++; + formatStr++; + } else { + TSFormatNode format = { + .type = isSeperatorChar(*formatStr) ? TS_FORMAT_NODE_TYPE_SEPARATOR : TS_FORMAT_NODE_TYPE_CHAR, .key = NULL}; - format.c[0] = *format_str; - format.c[1] = '\0'; - taosArrayPush(formats, &format); - format_str++; + format.c = formatStr; + format.len = 1; + taosArrayPush(formats, &format); + formatStr++; + if (format.type == TS_FORMAT_NODE_TYPE_CHAR) lastOtherFormat = taosArrayGetLast(formats); + } } } } } -static void tm2char(const SArray* formats, const struct STm* tm, char* s) { +static void tm2char(const SArray* formats, const struct STm* tm, char* s, int32_t outLen) { int32_t size = taosArrayGetSize(formats); + const char* start = s; for (int32_t i = 0; i < size; ++i) { TSFormatNode* format = taosArrayGet(formats, i); if (format->type != TS_FORMAT_NODE_TYPE_KEYWORD) { - strcpy(s, format->c); - s += strlen(s); + if (s - start + format->len + 1 > outLen) break; + strncpy(s, format->c, format->len); + s += format->len; continue; } + if (s - start + 16 > outLen) break; switch (format->key->id) { case TSFKW_AM: case TSFKW_PM: sprintf(s, tm->tm.tm_hour % 24 >= 12 ? "PM" : "AM"); - s += strlen(s); + s += 2; break; case TSFKW_A_M: case TSFKW_P_M: sprintf(s, tm->tm.tm_hour % 24 >= 12 ? "P.M." : "A.M."); - s += strlen(s); + s += 4; break; case TSFKW_am: case TSFKW_pm: sprintf(s, tm->tm.tm_hour % 24 >= 12 ? "pm" : "am"); - s += strlen(s); + s += 2; break; case TSFKW_a_m: case TSFKW_p_m: sprintf(s, tm->tm.tm_hour % 24 >= 12 ? "p.m." 
: "a.m."); - s += strlen(s); + s += 4; break; case TSFKW_DDD: sprintf(s, "%d", tm->tm.tm_yday); @@ -1259,11 +1322,11 @@ static void tm2char(const SArray* formats, const struct STm* tm, char* s) { break; case TSFKW_DD: sprintf(s, "%02d", tm->tm.tm_mday); - s += strlen(s); + s += 2; break; case TSFKW_D: sprintf(s, "%d", tm->tm.tm_wday + 1); - s += strlen(s); + s += 1; break; case TSFKW_DAY: { // MONDAY, TUESDAY... @@ -1293,13 +1356,13 @@ static void tm2char(const SArray* formats, const struct STm* tm, char* s) { char buf[8] = {0}; for (int32_t i = 0; i < strlen(wd); ++i) buf[i] = toupper(wd[i]); sprintf(s, "%3s", buf); - s += strlen(s); + s += 3; break; } case TSFKW_Dy: // Mon, Tue sprintf(s, "%3s", shortWeekDays[tm->tm.tm_wday]); - s += strlen(s); + s += 3; break; case TSFKW_dy: { // mon, tue @@ -1307,26 +1370,26 @@ static void tm2char(const SArray* formats, const struct STm* tm, char* s) { char buf[8] = {0}; for (int32_t i = 0; i < strlen(wd); ++i) buf[i] = tolower(wd[i]); sprintf(s, "%3s", buf); - s += strlen(s); + s += 3; break; } case TSFKW_HH24: sprintf(s, "%02d", tm->tm.tm_hour); - s += strlen(s); + s += 2; break; case TSFKW_HH: case TSFKW_HH12: // 0 or 12 o'clock in 24H coresponds to 12 o'clock (AM/PM) in 12H sprintf(s, "%02d", tm->tm.tm_hour % 12 == 0 ? 
12 : tm->tm.tm_hour % 12); - s += strlen(s); + s += 2; break; case TSFKW_MI: sprintf(s, "%02d", tm->tm.tm_min); - s += strlen(s); + s += 2; break; case TSFKW_MM: sprintf(s, "%02d", tm->tm.tm_mon + 1); - s += strlen(s); + s += 2; break; case TSFKW_MONTH: { const char* mon = fullMonths[tm->tm.tm_mon]; @@ -1370,19 +1433,19 @@ static void tm2char(const SArray* formats, const struct STm* tm, char* s) { } case TSFKW_SS: sprintf(s, "%02d", tm->tm.tm_sec); - s += strlen(s); + s += 2; break; case TSFKW_MS: sprintf(s, "%03" PRId64, tm->fsec / 1000000L); - s += strlen(s); + s += 3; break; case TSFKW_US: sprintf(s, "%06" PRId64, tm->fsec / 1000L); - s += strlen(s); + s += 6; break; case TSFKW_NS: sprintf(s, "%09" PRId64, tm->fsec); - s += strlen(s); + s += 9; break; case TSFKW_TZH: sprintf(s, "%s%02d", tsTimezone < 0 ? "-" : "+", tsTimezone); @@ -1429,6 +1492,7 @@ static const char* tsFormatStr2Int32(int32_t* dest, const char* str, int32_t len char* last; int64_t res; const char* s = str; + if ('\0' == str[0]) return NULL; if (len <= 0) { res = taosStr2Int64(s, &last, 10); s = last; @@ -1523,18 +1587,27 @@ static int32_t char2ts(const char* s, SArray* formats, int64_t* ts, int32_t prec int32_t tzSign = 1, tz = tsTimezone; int32_t err = 0; - for (int32_t i = 0; i < size; ++i) { - while (isspace(*s)) { + for (int32_t i = 0; i < size && *s != '\0'; ++i) { + while (isspace(*s) && *s != '\0') { s++; } + if (!s) break; TSFormatNode* node = taosArrayGet(formats, i); if (node->type == TS_FORMAT_NODE_TYPE_SEPARATOR) { // separator matches any character - if (isSeperatorChar(s[0])) s += strlen(node->c); + if (isSeperatorChar(s[0])) s += node->len; continue; } if (node->type == TS_FORMAT_NODE_TYPE_CHAR) { - if (!isspace(node->c[0])) s += strlen(node->c); + int32_t pos = 0; + // skip leading spaces + while (isspace(node->c[pos]) && node->len > 0) pos++; + while (pos < node->len && *s != '\0') { + if (!isspace(node->c[pos++])) { + while (isspace(*s) && *s != '\0') s++; + if (*s != '\0') 
s++; // forward together + } + } continue; } assert(node->type == TS_FORMAT_NODE_TYPE_KEYWORD); @@ -1545,7 +1618,7 @@ static int32_t char2ts(const char* s, SArray* formats, int64_t* ts, int32_t prec case TSFKW_p_m: { int32_t idx = strArrayCaseSearch(long_apms, s); if (idx >= 0) { - s += strlen(long_apms[idx]); + s += 4; pm = idx % 2; hour12 = 1; } else { @@ -1558,7 +1631,7 @@ static int32_t char2ts(const char* s, SArray* formats, int64_t* ts, int32_t prec case TSFKW_pm: { int32_t idx = strArrayCaseSearch(apms, s); if (idx >= 0) { - s += strlen(apms[idx]); + s += 2; pm = idx % 2; hour12 = 1; } else { @@ -1793,39 +1866,41 @@ static int32_t char2ts(const char* s, SArray* formats, int64_t* ts, int32_t prec return ret; } -void taosTs2Char(const char* format, int64_t ts, int32_t precision, char* out) { - SArray* formats = taosArrayInit(8, sizeof(TSFormatNode)); - parseTsFormat(format, formats); +void taosTs2Char(const char* format, SArray** formats, int64_t ts, int32_t precision, char* out, int32_t outLen) { + if (!*formats) { + *formats = taosArrayInit(8, sizeof(TSFormatNode)); + parseTsFormat(format, *formats); + } struct STm tm; taosTs2Tm(ts, precision, &tm); - tm2char(formats, &tm, out); - taosArrayDestroy(formats); + tm2char(*formats, &tm, out, outLen); } -int32_t taosChar2Ts(const char* format, const char* tsStr, int64_t* ts, int32_t precision, char* errMsg, +int32_t taosChar2Ts(const char* format, SArray** formats, const char* tsStr, int64_t* ts, int32_t precision, char* errMsg, int32_t errMsgLen) { const char* sErrPos; int32_t fErrIdx; - SArray* formats = taosArrayInit(4, sizeof(TSFormatNode)); - parseTsFormat(format, formats); - int32_t code = char2ts(tsStr, formats, ts, precision, &sErrPos, &fErrIdx); + if (!*formats) { + *formats = taosArrayInit(4, sizeof(TSFormatNode)); + parseTsFormat(format, *formats); + } + int32_t code = char2ts(tsStr, *formats, ts, precision, &sErrPos, &fErrIdx); if (code == -1) { - TSFormatNode* fNode = (taosArrayGet(formats, fErrIdx)); 
+ TSFormatNode* fNode = (taosArrayGet(*formats, fErrIdx)); snprintf(errMsg, errMsgLen, "mismatch format for: %s and %s", sErrPos, - fErrIdx < taosArrayGetSize(formats) ? ((TSFormatNode*)taosArrayGet(formats, fErrIdx))->key->name : ""); + fErrIdx < taosArrayGetSize(*formats) ? ((TSFormatNode*)taosArrayGet(*formats, fErrIdx))->key->name : ""); } else if (code == -2) { snprintf(errMsg, errMsgLen, "timestamp format error: %s -> %s", tsStr, format); } - taosArrayDestroy(formats); return code; } -void TEST_ts2char(const char* format, int64_t ts, int32_t precision, char* out) { +void TEST_ts2char(const char* format, int64_t ts, int32_t precision, char* out, int32_t outLen) { SArray* formats = taosArrayInit(4, sizeof(TSFormatNode)); parseTsFormat(format, formats); struct STm tm; taosTs2Tm(ts, precision, &tm); - tm2char(formats, &tm, out); + tm2char(formats, &tm, out, outLen); taosArrayDestroy(formats); } diff --git a/source/common/test/commonTests.cpp b/source/common/test/commonTests.cpp index 49a16351ca..dc320ebcb2 100644 --- a/source/common/test/commonTests.cpp +++ b/source/common/test/commonTests.cpp @@ -316,8 +316,8 @@ TEST(timeTest, timestamp2tm) { } void test_ts2char(int64_t ts, const char* format, int32_t precison, const char* expected) { - char buf[128] = {0}; - TEST_ts2char(format, ts, precison, buf); + char buf[256] = {0}; + TEST_ts2char(format, ts, precison, buf, 256); printf("ts: %ld format: %s res: [%s], expected: [%s]\n", ts, format, buf, expected); ASSERT_STREQ(expected, buf); } @@ -408,8 +408,8 @@ TEST(timeTest, char2ts) { #ifndef WINDOWS // 2023-1-1 21:10:10.120450780 - ASSERT_EQ(0, TEST_char2ts("yy \"年\"-MM 月-dd \"日\" HH24:MI:ss.ms.us.ns TZH", &ts, TSDB_TIME_PRECISION_NANO, - " 23 年 - 1 月 - 01 日 \t 21:10:10 . 12 . \t 00045 . 00000078 \t+08")); + ASSERT_EQ(0, TEST_char2ts("yy \"年\"-MM 月-dd \"日 子\" HH24:MI:ss.ms.us.ns TZH", &ts, TSDB_TIME_PRECISION_NANO, + " 23 年 - 1 月 - 01 日 子 \t 21:10:10 . 12 . \t 00045 . 
00000078 \t+08")); ASSERT_EQ(ts, 1672578610120450780LL); #endif @@ -437,7 +437,7 @@ TEST(timeTest, char2ts) { "2100/january/01 FRIDAY 11:10:10.124456+08")); ASSERT_EQ(ts, 4102456210124456LL); ASSERT_EQ(0, TEST_char2ts("yyyy/Month/dd Dy HH24:MI:ss.usTZH", &ts, TSDB_TIME_PRECISION_MICRO, - "2100/january/01 Fri 11:10:10.124456+08")); + "2100/january/01 Fri 11:10:10.124456+08:00")); ASSERT_EQ(ts, 4102456210124456LL); ASSERT_EQ(0, TEST_char2ts("yyyy/month/dd day HH24:MI:ss.usTZH", &ts, TSDB_TIME_PRECISION_MICRO, @@ -461,7 +461,8 @@ TEST(timeTest, char2ts) { // '/' cannot convert to MM ASSERT_EQ(-1, TEST_char2ts("yyyyMMdd ", &ts, TSDB_TIME_PRECISION_MICRO, "2100/2/1")); // nothing to be converted to dd - ASSERT_EQ(-1, TEST_char2ts("yyyyMMdd ", &ts, TSDB_TIME_PRECISION_MICRO, "210012")); + ASSERT_EQ(0, TEST_char2ts("yyyyMMdd ", &ts, TSDB_TIME_PRECISION_MICRO, "210012")); + ASSERT_EQ(ts, 4131273600000000LL); // 2100-12-1 ASSERT_EQ(-1, TEST_char2ts("yyyyMMdd ", &ts, TSDB_TIME_PRECISION_MICRO, "21001")); ASSERT_EQ(-1, TEST_char2ts("yyyyMM-dd ", &ts, TSDB_TIME_PRECISION_MICRO, "23a1-1")); @@ -481,6 +482,13 @@ TEST(timeTest, char2ts) { ASSERT_EQ(-1, TEST_char2ts("HH12:MI:SS", &ts, TSDB_TIME_PRECISION_MICRO, "21:12:12")); ASSERT_EQ(-1, TEST_char2ts("yyyy/MM1/dd ", &ts, TSDB_TIME_PRECISION_MICRO, "2100111111111/11/2")); ASSERT_EQ(-2, TEST_char2ts("yyyy/MM1/ddTZH", &ts, TSDB_TIME_PRECISION_MICRO, "23/11/2-13")); + ASSERT_EQ(0, TEST_char2ts("yyyy年 MM/ddTZH", &ts, TSDB_TIME_PRECISION_MICRO, "1970年1/1+0")); + ASSERT_EQ(ts, 0); + ASSERT_EQ(-1, TEST_char2ts("yyyy年a MM/dd", &ts, TSDB_TIME_PRECISION_MICRO, "2023年1/2")); + ASSERT_EQ(0, TEST_char2ts("yyyy年 MM/ddTZH", &ts, TSDB_TIME_PRECISION_MICRO, "1970年 1/1+0")); + ASSERT_EQ(ts, 0); + ASSERT_EQ(0, TEST_char2ts("yyyy年 a a a MM/ddTZH", &ts, TSDB_TIME_PRECISION_MICRO, "1970年 a a a 1/1+0")); + ASSERT_EQ(0, TEST_char2ts("yyyy年 a a a a a a a a a a a a a a a MM/ddTZH", &ts, TSDB_TIME_PRECISION_MICRO, "1970年 a ")); } #pragma GCC diagnostic pop 
diff --git a/source/libs/function/src/builtins.c b/source/libs/function/src/builtins.c index 628a609715..84aff9fa88 100644 --- a/source/libs/function/src/builtins.c +++ b/source/libs/function/src/builtins.c @@ -2107,7 +2107,7 @@ static int32_t translateToChar(SFunctionNode* pFunc, char* pErrBuf, int32_t len) if (!IS_STR_DATA_TYPE(para2Type) || !IS_TIMESTAMP_TYPE(para1Type)) { return invaildFuncParaTypeErrMsg(pErrBuf, len, pFunc->functionName); } - pFunc->node.resType = (SDataType){.bytes = tDataTypes[TSDB_DATA_TYPE_VARCHAR].bytes, .type = TSDB_DATA_TYPE_VARCHAR}; + pFunc->node.resType = (SDataType){.bytes = 4096, .type = TSDB_DATA_TYPE_VARCHAR}; return TSDB_CODE_SUCCESS; } diff --git a/source/libs/scalar/src/sclfunc.c b/source/libs/scalar/src/sclfunc.c index ee2ba47ce8..48886b1eec 100644 --- a/source/libs/scalar/src/sclfunc.c +++ b/source/libs/scalar/src/sclfunc.c @@ -1203,9 +1203,13 @@ int32_t toTimestampFunction(SScalarParam* pInput, int32_t inputNum, SScalarParam char * tsStr = taosMemoryMalloc(TS_FORMAT_MAX_LEN); char * format = taosMemoryMalloc(TS_FORMAT_MAX_LEN); int32_t len, code = TSDB_CODE_SUCCESS; + SArray *formats = NULL; + for (int32_t i = 0; i < pInput[0].numOfRows; ++i) { - if (colDataIsNull_s(pInput[1].columnData, i) || colDataIsNull_s(pInput[0].columnData, i)) + if (colDataIsNull_s(pInput[1].columnData, i) || colDataIsNull_s(pInput[0].columnData, i)) { colDataSetNULL(pOutput->columnData, i); + continue; + } char *tsData = colDataGetData(pInput[0].columnData, i); char *formatData = colDataGetData(pInput[1].columnData, pInput[1].numOfRows > 1 ? 
i : 0); @@ -1213,11 +1217,17 @@ int32_t toTimestampFunction(SScalarParam* pInput, int32_t inputNum, SScalarParam strncpy(tsStr, varDataVal(tsData), len); tsStr[len] = '\0'; len = TMIN(TS_FORMAT_MAX_LEN - 1, varDataLen(formatData)); - strncpy(format, varDataVal(formatData), len); - format[len] = '\0'; + if (pInput[1].numOfRows > 1 || i == 0) { + strncpy(format, varDataVal(formatData), len); + format[len] = '\0'; + if (formats) { + taosArrayDestroy(formats); + formats = NULL; + } + } int32_t precision = pOutput->columnData->info.precision; char errMsg[128] = {0}; - code = taosChar2Ts(format, tsStr, &ts, precision, errMsg, 128); + code = taosChar2Ts(format, &formats, tsStr, &ts, precision, errMsg, 128); if (code) { qError("func to_timestamp failed %s", errMsg); code = TSDB_CODE_FUNC_TO_TIMESTAMP_FAILED; @@ -1225,6 +1235,7 @@ int32_t toTimestampFunction(SScalarParam* pInput, int32_t inputNum, SScalarParam } colDataSetVal(pOutput->columnData, i, (char *)&ts, false); } + if (formats) taosArrayDestroy(formats); taosMemoryFree(tsStr); taosMemoryFree(format); return code; @@ -1232,22 +1243,32 @@ int32_t toTimestampFunction(SScalarParam* pInput, int32_t inputNum, SScalarParam int32_t toCharFunction(SScalarParam* pInput, int32_t inputNum, SScalarParam* pOutput) { char * format = taosMemoryMalloc(TS_FORMAT_MAX_LEN); - char * out = taosMemoryMalloc(TS_FORMAT_MAX_LEN * 2); + char * out = taosMemoryCalloc(1, TS_FORMAT_MAX_LEN + VARSTR_HEADER_SIZE); int32_t len; + SArray *formats = NULL; for (int32_t i = 0; i < pInput[0].numOfRows; ++i) { - if (colDataIsNull_s(pInput[1].columnData, i) || colDataIsNull_s(pInput[0].columnData, i)) + if (colDataIsNull_s(pInput[1].columnData, i) || colDataIsNull_s(pInput[0].columnData, i)) { colDataSetNULL(pOutput->columnData, i); + continue; + } char *ts = colDataGetData(pInput[0].columnData, i); char *formatData = colDataGetData(pInput[1].columnData, pInput[1].numOfRows > 1 ? 
i : 0); len = TMIN(TS_FORMAT_MAX_LEN - 1, varDataLen(formatData)); - strncpy(format, varDataVal(formatData), len); - format[len] = '\0'; + if (pInput[1].numOfRows > 1 || i == 0) { + strncpy(format, varDataVal(formatData), len); + format[len] = '\0'; + if (formats) { + taosArrayDestroy(formats); + formats = NULL; + } + } int32_t precision = pInput[0].columnData->info.precision; - taosTs2Char(format, *(int64_t *)ts, precision, varDataVal(out)); + taosTs2Char(format, &formats, *(int64_t *)ts, precision, varDataVal(out), TS_FORMAT_MAX_LEN); varDataSetLen(out, strlen(varDataVal(out))); colDataSetVal(pOutput->columnData, i, out, false); } + if (formats) taosArrayDestroy(formats); taosMemoryFree(format); taosMemoryFree(out); return TSDB_CODE_SUCCESS; diff --git a/tests/system-test/2-query/func_to_char_timestamp.py b/tests/system-test/2-query/func_to_char_timestamp.py index 3d3435d9c7..639811d275 100644 --- a/tests/system-test/2-query/func_to_char_timestamp.py +++ b/tests/system-test/2-query/func_to_char_timestamp.py @@ -60,10 +60,15 @@ class TDTestCase: rowsBatched = 0 sql += " %s%d values "%(ctbPrefix,i) for j in range(rowsPerTbl): - if (i < ctbNum/2): - sql += "(%d, %d, %d, %d,%d,%d,%d,true,'binary%d', 'nchar%d') "%(startTs + j*tsStep, j%10, j%10, j%10, j%10, j%10, j%10, j%10, j%10) + if i % 3 == 0: + ts_format = 'NULL' else: - sql += "(%d, %d, NULL, %d,NULL,%d,%d,true,'binary%d', 'nchar%d') "%(startTs + j*tsStep, j%10, j%10, j%10, j%10, j%10, j%10) + ts_format = "'yyyy-mm-dd hh24:mi:ss'" + + if (i < ctbNum/2): + sql += "(%d, %d, %d, %d,%d,%d,%d,true,'2023-11-01 10:10:%d', %s, 'nchar%d') "%(startTs + j*tsStep, j%10, j%10, j%10, j%10, j%10, j%10, j%10, ts_format, j%10) + else: + sql += "(%d, %d, NULL, %d,NULL,%d,%d,true,NULL , %s, 'nchar%d') "%(startTs + j*tsStep, j%10, j%10, j%10, j%10, ts_format, j%10) rowsBatched += 1 if ((rowsBatched == batchNum) or (j == rowsPerTbl - 1)): tsql.execute(sql) @@ -85,7 +90,7 @@ class TDTestCase: 'stbName': 'meters', 'colPrefix': 'c', 
'tagPrefix': 't', - 'colSchema': [{'type': 'INT', 'count':1},{'type': 'BIGINT', 'count':1},{'type': 'FLOAT', 'count':1},{'type': 'DOUBLE', 'count':1},{'type': 'smallint', 'count':1},{'type': 'tinyint', 'count':1},{'type': 'bool', 'count':1},{'type': 'binary', 'len':10, 'count':1},{'type': 'nchar', 'len':10, 'count':1}], + 'colSchema': [{'type': 'INT', 'count':1},{'type': 'BIGINT', 'count':1},{'type': 'FLOAT', 'count':1},{'type': 'DOUBLE', 'count':1},{'type': 'smallint', 'count':1},{'type': 'tinyint', 'count':1},{'type': 'bool', 'count':1},{'type': 'varchar', 'len':1024, 'count':2},{'type': 'nchar', 'len':10, 'count':1}], 'tagSchema': [{'type': 'INT', 'count':1},{'type': 'nchar', 'len':20, 'count':1},{'type': 'binary', 'len':20, 'count':1},{'type': 'BIGINT', 'count':1},{'type': 'smallint', 'count':1},{'type': 'DOUBLE', 'count':1}], 'ctbPrefix': 't', 'ctbStartIdx': 0, @@ -146,6 +151,18 @@ class TDTestCase: tdSql.query("select to_char(ts, 'yy-mon-dd hh24:mi:ss.msa.m.TZH Day') from meters where to_timestamp(to_char(ts, 'yy-mon-dd hh24:mi:ss dy'), 'yy-mon-dd hh24:mi:ss dy') != ts") tdSql.checkRows(0) + tdSql.query("select to_timestamp(c8, 'YYYY-MM-DD hh24:mi:ss') from meters") + tdSql.query("select to_timestamp(c8, c9) from meters") + + format = "YYYY-MM-DD HH:MI:SS" + for i in range(500): + format = format + "1234567890" + tdSql.query("select to_char(ts, '%s') from meters" % (format), queryTimes=1) + time_str = '2023-11-11 10:10:10' + for i in range(500): + time_str = time_str + "1234567890" + tdSql.query("select to_timestamp('%s', '%s')" % (time_str, format)) + def run(self): self.prepareTestEnv() self.test_to_timestamp() From 58e61e21a5064ee9be6aeffe1e7bf77a9251d2d1 Mon Sep 17 00:00:00 2001 From: dapan1121 <72057773+dapan1121@users.noreply.github.com> Date: Mon, 30 Oct 2023 13:36:10 +0800 Subject: [PATCH 3/6] Update 10-function.md --- docs/zh/12-taos-sql/10-function.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git 
a/docs/zh/12-taos-sql/10-function.md b/docs/zh/12-taos-sql/10-function.md index 44ab3d5091..6840147112 100644 --- a/docs/zh/12-taos-sql/10-function.md +++ b/docs/zh/12-taos-sql/10-function.md @@ -539,7 +539,7 @@ TO_CHAR(ts, format_str_literal) - 使用`ms`, `us`, `ns`时, 以上三种格式的输出只在精度上不同, 比如ts为 `1697182085123`, `ms` 的输出为 `123`, `us` 的输出为 `123000`, `ns` 的输出为 `123000000`. - 时间格式中无法匹配规则的内容会直接输出. 如果想要在格式串中指定某些能够匹配规则的部分不做转换, 可以使用双引号, 如`to_char(ts, 'yyyy-mm-dd "is formated by yyyy-mm-dd"')`. 如果想要输出双引号, 那么在双引号之前加一个反斜杠, 如 `to_char(ts, '\"yyyy-mm-dd\"')` 将会输出 `"2023-10-10"`. - 那些输出是数字的格式, 如`YYYY`, `DD`, 大写与小写意义相同, 即`yyyy` 和 `YYYY` 可以互换. -- 默认输出的时间为本地时区的时间. +- 推荐在时间格式中带时区信息,如果不带则默认输出的时区为服务端或客户端所配置的时区. #### TO_TIMESTAMP From af1fed4f04f36a05975963c85dc44ff59b1a6997 Mon Sep 17 00:00:00 2001 From: dapan1121 <72057773+dapan1121@users.noreply.github.com> Date: Mon, 30 Oct 2023 13:41:28 +0800 Subject: [PATCH 4/6] Update 10-function.md --- docs/en/12-taos-sql/10-function.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/docs/en/12-taos-sql/10-function.md b/docs/en/12-taos-sql/10-function.md index c986a98e46..5ff02b0f9d 100644 --- a/docs/en/12-taos-sql/10-function.md +++ b/docs/en/12-taos-sql/10-function.md @@ -539,7 +539,7 @@ TO_CHAR(ts, format_str_literal) - When `ms`,`us`,`ns` are used in `to_char`, like `to_char(ts, 'yyyy-mm-dd hh:mi:ss.ms.us.ns')`, The time of `ms`,`us`,`ns` corresponds to the same fraction seconds. When ts is `1697182085123`, the output of `ms` is `123`, `us` is `123000`, `ns` is `123000000`. - If we want to output some characters of format without converting, surround it with double quotes. `to_char(ts, 'yyyy-mm-dd "is formated by yyyy-mm-dd"')`. If want to output double quotes, add a back slash before double quote, like `to_char(ts, '\"yyyy-mm-dd\"')` will output `"2023-10-10"`. - For formats that output digits, the uppercase and lowercase formats are the same. -- The local time zone will be used to convert the timestamp. 
+- It's recommended to include the time zone in the format; if it is omitted, the time zone configured on the server or client is used. #### TO_TIMESTAMP From 46826ea600a1c7b8d5a8c0fd7f12f44b61b45b9e Mon Sep 17 00:00:00 2001 From: dapan1121 <72057773+dapan1121@users.noreply.github.com> Date: Mon, 30 Oct 2023 13:51:08 +0800 Subject: [PATCH 5/6] Update 10-function.md --- docs/zh/12-taos-sql/10-function.md | 3 ++- 1 file changed, 2 insertions(+), 1 deletion(-) diff --git a/docs/zh/12-taos-sql/10-function.md b/docs/zh/12-taos-sql/10-function.md index 6840147112..4371124623 100644 --- a/docs/zh/12-taos-sql/10-function.md +++ b/docs/zh/12-taos-sql/10-function.md @@ -563,7 +563,8 @@ TO_TIMESTAMP(ts_str_literal, format_str_literal) - 若`ms`, `us`, `ns`同时指定, 那么结果时间戳包含上述三个字段的和. 如 `to_timestamp('2023-10-10 10:10:10.123.000456.000000789', 'yyyy-mm-dd hh:mi:ss.ms.us.ns')` 输出是 `2023-10-10 10:10:10.123456789`. - `MONTH`, `MON`, `DAY`, `DY` 以及其他输出为数字的格式的大小写意义相同, 如 `to_timestamp('2023-JANUARY-01', 'YYYY-month-dd')`, `month`可以被替换为`MONTH` 或者`Month`. - 如果同一字段被指定了多次, 那么前面的指定将会被覆盖. 如 `to_timestamp('2023-22-10-10', 'yyyy-yy-MM-dd')`, 输出年份是`2022`. -- 如果某些部分没有指定 那么默认时间为本地时区的 `1970-01-01 00:00:00`, 未指定部分为对应默认值. +- 为避免转换时使用了非预期的时区,推荐在时间中携带时区信息,例如'2023-10-10 10:10:10+08',如果未指定时区则默认时区为服务端或客户端指定的时区。 +- 如果没有指定完整的时间,那么默认时间值为指定或默认时区的 `1970-01-01 00:00:00`, 未指定部分使用该默认值中的对应部分. - 如果格式串中有`AM`, `PM`等, 那么小时必须是12小时制, 范围必须是01-12. - `to_timestamp`转换具有一定的容错机制, 在格式串和时间戳串不完全对应时, 有时也可转换, 如: `to_timestamp('200101/2', 'yyyyMM1/dd')`, 格式串中多出来的1会被丢弃. 格式串与时间戳串中多余的空格字符(空格, tab等)也会被 自动忽略. 如`to_timestamp(' 23 年 - 1 月 - 01 日 ', 'yy 年-MM月-dd日')` 可以被成功转换. 虽然`MM`等字段需要两个数字对应(只有一位时前面补0), 在`to_timestamp`时, 一个数字也可以成功转换. 
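Reviewer note: the TO_TIMESTAMP notes above require a 12-hour value in the range 01-12 whenever `AM`/`PM` appears in the format. The `HH12` branch of `tm2char` in this series prints hours 0 and 12 as 12 and picks `PM` when the hour is at least 12. A minimal C sketch of that mapping follows. The function names are hypothetical, and the reverse direction (`hour12_to_24`) is an assumption about the intended behaviour, since the corresponding `char2ts` logic is not shown in this diff.

```c
#include <assert.h>

/* 24-hour -> 12-hour, mirroring the HH12 rule in tm2char:
 * 0 and 12 o'clock both print as 12. */
int hour24_to_12(int hour24) {
  assert(hour24 >= 0 && hour24 < 24);
  int h = hour24 % 12;
  return h == 0 ? 12 : h;
}

/* Meridiem choice used by the AM/PM keywords: PM from 12:00 onwards. */
int hour24_is_pm(int hour24) { return hour24 % 24 >= 12; }

/* 12-hour + meridiem -> 24-hour (assumed inverse of the mapping above).
 * Returns -1 when the hour is outside 1-12, which the docs reject. */
int hour12_to_24(int hour12, int pm) {
  if (hour12 < 1 || hour12 > 12) return -1;
  return (hour12 % 12) + (pm ? 12 : 0);
}
```

Under these assumptions 12 AM maps to hour 0, 12 PM to hour 12 and 11 PM to hour 23. An input such as 21 with a 12-hour format is rejected, which is in line with the `HH12:MI:SS` / `21:12:12` case in `commonTests.cpp` expecting an error.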
From d73968a6595c863f6991c8fc4bd254f86d1fd63f Mon Sep 17 00:00:00 2001 From: dapan1121 <72057773+dapan1121@users.noreply.github.com> Date: Mon, 30 Oct 2023 13:56:20 +0800 Subject: [PATCH 6/6] Update 10-function.md --- docs/en/12-taos-sql/10-function.md | 3 ++- 1 file changed, 2 insertions(+), 1 deletion(-) diff --git a/docs/en/12-taos-sql/10-function.md b/docs/en/12-taos-sql/10-function.md index 5ff02b0f9d..18c7ffc345 100644 --- a/docs/en/12-taos-sql/10-function.md +++ b/docs/en/12-taos-sql/10-function.md @@ -563,7 +563,8 @@ TO_TIMESTAMP(ts_str_literal, format_str_literal) - When `ms`, `us`, `ns` are used in `to_timestamp`, if multi of them are specified, the results are accumulated. For example, `to_timestamp('2023-10-10 10:10:10.123.000456.000000789', 'yyyy-mm-dd hh:mi:ss.ms.us.ns')` will output the timestamp of `2023-10-10 10:10:10.123456789`. - The uppercase or lowercase of `MONTH`, `MON`, `DAY`, `DY` and formtas that output digits have same effect when used in `to_timestamp`, like `to_timestamp('2023-JANUARY-01', 'YYYY-month-dd')`, `month` can be replaced by `MONTH`, or `month`. The cases are ignored. - If multi times are specified for one component, the previous will be overwritten. Like `to_timestamp('2023-22-10-10', 'yyyy-yy-MM-dd')`, the output year will be `2022`. -- The default timetsamp if some components are not specified will be: `1970-01-01 00:00:00` with your local timezone. +- To avoid an unexpected time zone being used during the conversion, it's recommended to include the time zone in the ts string, e.g. '2023-10-10 10:10:10+08'. If no time zone is specified, the time zone configured on the server or client is used. +- If some components are not specified, they default to the corresponding parts of `1970-01-01 00:00:00` in the specified or default time zone. - If `AM` or `PM` is specified in formats, the Hour must between `1-12`. - In some cases, `to_timestamp` can convert correctly even the format and the timestamp string are not totally matched. 
Like `to_timetamp('200101/2', 'yyyyMM1/dd')`, the digit `1` in format string are ignored, and the output timestsamp is `2001-01-02 00:00:00`. Spaces and tabs in formats and tiemstamp string are also ignored automatically.
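Reviewer note: the `ms`/`us`/`ns` notes in these docs describe two directions. In `to_timestamp` the three fields are accumulated, and in `to_char` all three render the same fractional second at different precisions (`tm2char` divides the nanosecond fraction by 1000000 for `MS` and by 1000 for `US`). The C sketch below restates that arithmetic. The function names are hypothetical, and treating accumulation as a plain weighted sum is an assumption drawn from the documented example.

```c
#include <assert.h>
#include <stdint.h>

/* to_timestamp direction: accumulate separately parsed ms, us and ns
 * fields into one nanosecond fraction (assumed to be a weighted sum). */
int64_t combine_fraction_ns(int64_t ms, int64_t us, int64_t ns) {
  return ms * 1000000LL + us * 1000LL + ns;
}

/* to_char direction: every keyword renders the same fraction, only the
 * precision differs. fsecNs is the sub-second part in nanoseconds. */
void split_fraction_ns(int64_t fsecNs, int64_t *ms, int64_t *us, int64_t *ns) {
  assert(fsecNs >= 0 && fsecNs < 1000000000LL);
  *ms = fsecNs / 1000000LL; /* printed with %03 in tm2char */
  *us = fsecNs / 1000LL;    /* printed with %06 */
  *ns = fsecNs;             /* printed with %09 */
}
```

For the documented input `.123.000456.000000789` the combined fraction is 123456789 ns. For a millisecond timestamp ending in 123 the fraction is 123000000 ns, so the three outputs are 123, 123000 and 123000000, as the TO_CHAR notes state.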