mirror of
https://github.com/alibaba/DataX.git
synced 2025-05-02 15:12:22 +08:00
add english doc
This commit is contained in:
commit
16f8439a3f
349
tdenginewriter/doc/tdenginewriter-EN.md
Normal file
@ -0,0 +1,349 @@
|
||||
# DataX TDengineWriter
|
||||
|
||||
[简体中文](./tdenginewriter.md) | English
|
||||
|
||||
## 1 Quick Introduction
|
||||
|
||||
The TDengineWriter plugin writes data to [TDengine](https://www.taosdata.com/en/). It can be used to synchronize data offline from other databases to TDengine.
|
||||
|
||||
## 2 Implementation
|
||||
|
||||
TDengineWriter gets records from the DataX framework that are generated on the reader side. It has two writing strategies:

1. For data from OpenTSDBReader, which is in JSON format, we use JNI to call functions in `taos.lib` or `taos.dll`. This takes advantage of a new TDengine server feature called [schemaless writing](https://www.taosdata.com/cn/documentation/insert#schemaless), which supports writing JSON data directly. (JNI is used because the feature was not included in taos-jdbcdriver until version 2.0.36.)
2. For other data sources, we use [taos-jdbcdriver](https://www.taosdata.com/cn/documentation/connector/java) to write data. If the target table does not exist beforehand, it will be created automatically according to your configuration.
|
||||
|
||||
## 3 Features Introduction
|
||||
### 3.1 From OpenTSDB to TDengine
|
||||
#### 3.1.1 Sample Setting
|
||||
|
||||
```json
{
  "job": {
    "content": [
      {
        "reader": {
          "name": "opentsdbreader",
          "parameter": {
            "endpoint": "http://192.168.1.180:4242",
            "column": [
              "weather_temperature"
            ],
            "beginDateTime": "2021-01-01 00:00:00",
            "endDateTime": "2021-01-01 01:00:00"
          }
        },
        "writer": {
          "name": "tdenginewriter",
          "parameter": {
            "host": "192.168.1.180",
            "port": 6030,
            "dbname": "test",
            "user": "root",
            "password": "taosdata"
          }
        }
      }
    ],
    "setting": {
      "speed": {
        "channel": 1
      }
    }
  }
}
```
|
||||
|
||||
#### 3.1.2 Configuration
|
||||
|
||||
| Parameter | Description                    | Required | Default  |
| --------- | ------------------------------ | -------- | -------- |
| host      | host of TDengine               | Yes      |          |
| port      | port of TDengine               | Yes      |          |
| user      | user name of TDengine          | No       | root     |
| password  | password of TDengine           | No       | taosdata |
| dbname    | name of target database        | No       |          |
| batchSize | batch size of insert operation | No       | 1        |
|
||||
|
||||
|
||||
#### 3.1.3 Type Conversion

| OpenTSDB Type    | DataX Type | TDengine Type |
| ---------------- | ---------- | ------------- |
| timestamp        | Date       | timestamp     |
| Integer(value)   | Double     | double        |
| Float(value)     | Double     | double        |
| String(value)    | String     | binary        |
| Integer(tag)     | String     | binary        |
| Float(tag)       | String     | binary        |
| String(tag)      | String     | binary        |
|
||||
|
||||
### 3.2 From MongoDB to TDengine
|
||||
|
||||
#### 3.2.1 Sample Setting
|
||||
```json
{
  "job": {
    "setting": {
      "speed": {
        "channel": 2
      }
    },
    "content": [
      {
        "reader": {
          "name": "mongodbreader",
          "parameter": {
            "address": [
              "127.0.0.1:27017"
            ],
            "userName": "user",
            "mechanism": "SCRAM-SHA-1",
            "userPassword": "password",
            "authDb": "admin",
            "dbName": "test",
            "collectionName": "stock",
            "column": [
              {
                "name": "stockID",
                "type": "string"
              },
              {
                "name": "tradeTime",
                "type": "date"
              },
              {
                "name": "lastPrice",
                "type": "double"
              },
              {
                "name": "askPrice1",
                "type": "double"
              },
              {
                "name": "bidPrice1",
                "type": "double"
              },
              {
                "name": "volume",
                "type": "int"
              }
            ]
          }
        },
        "writer": {
          "name": "tdenginewriter",
          "parameter": {
            "host": "localhost",
            "port": 6030,
            "dbname": "test",
            "user": "root",
            "password": "taosdata",
            "stable": "stock",
            "tagColumn": {
              "industry": "energy",
              "stockID": 0
            },
            "fieldColumn": {
              "lastPrice": 2,
              "askPrice1": 3,
              "bidPrice1": 4,
              "volume": 5
            },
            "timestampColumn": {
              "tradeTime": 1
            }
          }
        }
      }
    ]
  }
}
```
|
||||
|
||||
**Note: the writer part of this setting also applies to other data sources, except for OpenTSDB.**
|
||||
|
||||
|
||||
#### 3.2.2 Configuration
|
||||
|
||||
| Parameter       | Description                                                     | Required                  | Default  | Remark              |
| --------------- | --------------------------------------------------------------- | ------------------------- | -------- | ------------------- |
| host            | host of TDengine                                                | Yes                       |          |                     |
| port            | port of TDengine                                                | Yes                       |          |                     |
| user            | user name of TDengine                                           | No                        | root     |                     |
| password        | password of TDengine                                            | No                        | taosdata |                     |
| dbname          | name of target database                                         | Yes                       |          |                     |
| batchSize       | batch size of insert operation                                  | No                        | 1000     |                     |
| stable          | name of target super table                                      | Yes (except for OpenTSDB) |          |                     |
| tagColumn       | name and position of tag columns in the record from reader      | No                        |          | index starts with 0 |
| fieldColumn     | name and position of data columns in the record from reader     | No                        |          |                     |
| timestampColumn | name and position of timestamp column in the record from reader | No                        |          |                     |
|
||||
|
||||
#### 3.2.3 Auto table creating
|
||||
##### 3.2.3.1 Rules
|
||||
|
||||
If `tagColumn`, `fieldColumn` and `timestampColumn` are all provided in the writer configuration, the target super table will be created automatically.
The type of tag columns will always be `NCHAR(64)`. The sample setting above will produce the following SQL:
|
||||
|
||||
```sql
CREATE STABLE IF NOT EXISTS stock (
    tradetime TIMESTAMP,
    lastprice DOUBLE,
    askprice1 DOUBLE,
    bidprice1 DOUBLE,
    volume INT
)
TAGS(
    industry NCHAR(64),
    stockID NCHAR(64)
);
```
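As a rough illustration of how such a statement can be assembled from the three config keys, here is a minimal Python sketch. It is not the plugin's actual code: the function name `create_stable_sql` and its arguments are hypothetical, and the field types are passed in explicitly, whereas the plugin infers them from incoming records.

```python
def create_stable_sql(stable, timestamp_column, field_types, tag_names):
    """Assemble a CREATE STABLE statement (hypothetical helper, not plugin code).

    Column order follows the TDengine convention described in this document:
    timestamp first, then data columns, then tags. Tags are always NCHAR(64).
    """
    columns = [f"{timestamp_column} TIMESTAMP"]
    columns += [f"{name} {td_type}" for name, td_type in field_types.items()]
    tags = [f"{name} NCHAR(64)" for name in tag_names]
    return (
        f"CREATE STABLE IF NOT EXISTS {stable} ("
        + ", ".join(columns)
        + ") TAGS("
        + ", ".join(tags)
        + ");"
    )


# Names taken from the MongoDB sample setting; the types are assumed here.
sql = create_stable_sql(
    "stock",
    "tradetime",
    {"lastprice": "DOUBLE", "askprice1": "DOUBLE", "bidprice1": "DOUBLE", "volume": "INT"},
    ["industry", "stockID"],
)
print(sql)
```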
|
||||
|
||||
##### 3.2.3.2 Sub-table Creating Rules
|
||||
|
||||
The structure of each sub-table is the same as that of the super table. Sub-table names are generated by the rules below:
1. Combine the tag values like this: `tag_value1!tag_value2!tag_value3`.
2. Compute the MD5 hash of the above string as a hex string, named `md5val`.
3. Use "t_md5val" as the sub-table name, in which "t" is a fixed prefix.
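The three naming rules can be sketched in Python as follows. This is only an illustration under stated assumptions: the function name `sub_table_name` is hypothetical, and the UTF-8 encoding and lowercase hex digest are assumptions that the rules do not specify.

```python
import hashlib


def sub_table_name(tag_values):
    """Derive a sub-table name from tag values (hypothetical helper)."""
    # Rule 1: join the tag values with "!".
    joined = "!".join(str(value) for value in tag_values)
    # Rule 2: MD5 hash as hex (UTF-8 input and lowercase hex are assumptions).
    md5val = hashlib.md5(joined.encode("utf-8")).hexdigest()
    # Rule 3: prepend the fixed prefix "t", separated by an underscore.
    return "t_" + md5val


print(sub_table_name(["energy", "600000"]))
```

Because the name depends only on the tag values, rows with identical tags always map to the same sub-table.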
|
||||
|
||||
#### 3.2.4 Use Pre-created Table
|
||||
|
||||
If you have already created the super table, then `tagColumn`, `fieldColumn` and `timestampColumn` can all be omitted. The writer plugin will get the table schema by executing `describe stableName`.
The order of columns in the records received by this plugin must be the same as the order of columns returned by `describe stableName`. For example, if you have a super table as below:
```
 Field    | Type      | Length | Note |
=======================================
 ts       | TIMESTAMP | 8      |      |
 current  | DOUBLE    | 8      |      |
 location | BINARY    | 10     | TAG  |
```
Then the first column received by this writer plugin must be the timestamp, the second column must be `current` with type double, and the third column must be `location` with internal type string.
|
||||
|
||||
#### 3.2.5 Remarks
|
||||
|
||||
1. The config keys `tagColumn`, `fieldColumn` and `timestampColumn` must be either all present or all omitted.
2. If the above three config keys exist and the target table also exists, the column order defined in the config file must match that of the existing table.
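The all-or-none rule in the first remark could be checked before a job is submitted. Here is a minimal sketch, assuming the writer `parameter` block has been loaded as a Python dict; `check_schema_keys` is a hypothetical helper and not part of DataX.

```python
SCHEMA_KEYS = ("tagColumn", "fieldColumn", "timestampColumn")


def check_schema_keys(parameter):
    """Return True if all three schema keys are present, False if none are.

    Raises ValueError when only some of them are present.
    """
    missing = [key for key in SCHEMA_KEYS if key not in parameter]
    if missing and len(missing) != len(SCHEMA_KEYS):
        raise ValueError("schema keys must be all present or all omitted; missing: "
                         + ", ".join(missing))
    return not missing


# Writer parameters taken from the MongoDB sample setting.
writer_parameter = {
    "stable": "stock",
    "tagColumn": {"industry": "energy", "stockID": 0},
    "fieldColumn": {"lastPrice": 2, "askPrice1": 3, "bidPrice1": 4, "volume": 5},
    "timestampColumn": {"tradeTime": 1},
}
print(check_schema_keys(writer_parameter))
```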
|
||||
|
||||
#### 3.2.6 Type Conversion

| MongoDB Type     | DataX Type     | TDengine Type     |
| ---------------- | -------------- | ----------------- |
| int, Long        | Long           | BIGINT            |
| double           | Double         | DOUBLE            |
| string, array    | String         | NCHAR(64)         |
| date             | Date           | TIMESTAMP         |
| boolean          | Boolean        | BOOL              |
| bytes            | Bytes          | BINARY            |
|
||||
|
||||
### 3.3 From Relational Database to TDengine
|
||||
|
||||
Take MySQL as an example.
|
||||
|
||||
#### 3.3.1 Table Structure in MySQL
|
||||
```sql
CREATE TABLE IF NOT EXISTS weather(
    station varchar(100),
    latitude DOUBLE,
    longtitude DOUBLE,
    `date` DATE,
    TMAX int,
    TMIN int
)
```
|
||||
|
||||
#### 3.3.2 Sample Setting
|
||||
|
||||
```json
{
  "job": {
    "content": [
      {
        "reader": {
          "name": "mysqlreader",
          "parameter": {
            "username": "root",
            "password": "passw0rd",
            "column": [
              "*"
            ],
            "splitPk": "station",
            "connection": [
              {
                "table": [
                  "weather"
                ],
                "jdbcUrl": [
                  "jdbc:mysql://127.0.0.1:3306/test?useSSL=false&useUnicode=true&characterEncoding=utf8"
                ]
              }
            ]
          }
        },
        "writer": {
          "name": "tdenginewriter",
          "parameter": {
            "host": "127.0.0.1",
            "port": 6030,
            "dbname": "test",
            "user": "root",
            "password": "taosdata",
            "batchSize": 1000,
            "stable": "weather",
            "tagColumn": {
              "station": 0
            },
            "fieldColumn": {
              "latitude": 1,
              "longtitude": 2,
              "tmax": 4,
              "tmin": 5
            },
            "timestampColumn": {
              "date": 3
            }
          }
        }
      }
    ],
    "setting": {
      "speed": {
        "channel": 1
      }
    }
  }
}
```
|
||||
|
||||
|
||||
## 4 Performance Test
|
||||
|
||||
## 5 Restriction
|
||||
|
||||
1. The NCHAR type has a fixed length of 64 when the super table is created automatically.
2. Rows that have null tag values will be dropped.
|
||||
|
||||
## FAQ
|
||||
|
||||
### How to filter on source table?
|
||||
|
||||
It depends on the reader plugin. Each reader plugin may provide its own way of filtering.
|
||||
|
||||
### How to import multiple source tables at once?
|
||||
|
||||
It depends on the reader plugin. If the reader plugin supports reading multiple tables at once, they can be imported in one job.
|
||||
|
||||
### How many sub-tables will be produced?
|
||||
|
||||
The number of sub-tables is determined by `tagColumn`; it equals the number of distinct combinations of tag values.
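To estimate the number of sub-tables ahead of an import, the distinct tag combinations can be counted on a sample of source rows. The sketch below assumes each record is a sequence indexed from 0 and that tags are given by position; `count_sub_tables` is a hypothetical helper.

```python
def count_sub_tables(records, tag_indexes):
    """Count distinct tag-value combinations across records (hypothetical helper)."""
    combinations = {tuple(record[i] for i in tag_indexes) for record in records}
    return len(combinations)


# Made-up rows shaped like the MySQL weather example: station is column 0.
rows = [
    ("station_a", 39.9, 116.4, "2021-01-01", 5, -3),
    ("station_a", 39.9, 116.4, "2021-01-02", 6, -2),
    ("station_b", 31.2, 121.5, "2021-01-01", 9, 3),
]
print(count_sub_tables(rows, [0]))
```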
|
||||
|
||||
### Must columns in the source table and the target table be in the same order?

No. TDengine requires the first column to be of timestamp type, followed by data columns and then tag columns. The writer plugin creates the super table in this column order, regardless of the original column order.
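The reordering can be pictured with a short sketch that uses the index maps from the MySQL sample setting. It assumes every value in the three maps is a 0-based position in the incoming record; `reorder_record` is a hypothetical name and this is not the plugin's implementation.

```python
def reorder_record(record, timestamp_column, field_column, tag_column):
    """Rearrange a source record into timestamp, data columns, tag columns."""
    timestamp = [record[i] for i in timestamp_column.values()]
    fields = [record[i] for i in field_column.values()]
    tags = [record[i] for i in tag_column.values()]
    return timestamp + fields + tags


# Source order: station, latitude, longtitude, date, TMAX, TMIN (values are made up).
source_row = ["station_a", 39.9, 116.4, "2021-01-01", 5, -3]
target_row = reorder_record(
    source_row,
    timestamp_column={"date": 3},
    field_column={"latitude": 1, "longtitude": 2, "tmax": 4, "tmin": 5},
    tag_column={"station": 0},
)
print(target_row)
```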
|
||||
|
||||
### How does the plugin infer the data type of incoming data?

By the first batch of records it receives.
|
||||
|
||||
### Why can't I insert data from 10 years ago? Doing so produces the error: `TDengine ERROR (2350): failed to execute batch bind`.

Because the database you created keeps only 10 years of data by default. You can create the database like this: `CREATE DATABASE power KEEP 36500;`, which extends the retention period to 100 years.
|
@ -1,5 +1,7 @@
|
||||
# DataX TDengineWriter
|
||||
|
||||
简体中文| [English](./tdenginewriter-EN.md)
|
||||
|
||||
## 1 Quick Introduction

The TDengineWriter plugin writes data to the TDengine database. It can be used to synchronize data offline from other databases to TDengine.
|
||||
@ -203,7 +205,11 @@ TAGS(
|
||||
|
||||
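##### 3.2.3.2 Sub-table Creation Rules

<<<<<<< HEAD
The result of a sub-table is the same as that of the super table. Sub-table names are generated by the following rules:
=======
The structure of a sub-table is the same as that of the super table. Sub-table names are generated by the following rules:
>>>>>>> TD-11503/english-doc-for-writer
1. Combine the tag values into a string like this: `tag_value1!tag_value2!tag_value3`.
2. Compute the MD5 hash of that string, "md5_val".
3. Use "t_md5val" as the sub-table name, in which "t" is a fixed prefix.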
|
||||
@ -225,7 +231,10 @@ TAGS(
|
||||
|
||||
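1. The three fields tagColumn, fieldColumn and timestampColumn describe the structure of the target table. They must be either all present or all omitted.
2. If the above three config keys exist and the target table also exists, the two must be consistent. **Consistency** is the user's responsibility; the plugin does not check it. Inconsistency may cause inserts to fail or inserted data to be corrupted.
<<<<<<< HEAD
3. The plugin prefers the table structure specified in the configuration file.
=======
>>>>>>> TD-11503/english-doc-for-writer

#### 3.2.6 Type Conversion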
|
||||
|
||||
@ -383,11 +392,19 @@ CREATE TABLE IF NOT EXISTS weather(
|
||||
|
||||
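### How many tables in TDengine does one source table map to after import?

<<<<<<< HEAD
This is determined by tagColumn. If all tag columns have the same values, there is only one target table. The target super table has as many sub-tables as there are distinct tag combinations in the source table.

### Are the columns of the source table and the target table in the same order?

TDengine requires the first column of every table to be the timestamp column, followed by ordinary fields, with tag columns last. If the source table is not in this order, the plugin adjusts it automatically when creating the table.
=======
This is determined by tagColumn. If all tag columns have the same values, there is only one target table. The target super table has as many sub-tables as there are distinct tag combinations in the source table.

### Are the columns of the source table and the target table in the same order?

TDengine requires the first column of every table to be the timestamp column, followed by ordinary fields, with tag columns last. If the source table is not in this order, the plugin adjusts it automatically when creating the table.
>>>>>>> TD-11503/english-doc-for-writer

### How does the plugin determine the data type of each column?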
|
||||
|
||||
|