change doc

This commit is contained in:
zyyang 2022-02-20 16:49:48 +08:00
parent c96256c9a2
commit 7436176b27
2 changed files with 141 additions and 308 deletions


@ -140,7 +140,7 @@ create table test.weather (ts timestamp, temperature int, humidity double) tags(
* Required: yes
* Default: none
* batchSize
* Description: every batchSize records are written as one batch
* Required: no
* Default: 1
* ignoreTagsUnmatched
@ -241,14 +241,6 @@ Data types in DataX can be mapped to data types in TDengine
## FAQ
### How many tables in TDengine does one source table map to after import?
This is determined by tagColumn: if all tag columns have identical values, there is only one target table. The target super table gets as many sub-tables as there are distinct tag combinations in the source table.
### Must the field order of the source and target tables be the same?
Yes. TDengineWriter parses the data from DataX in the order of the fields in column.
### How does the plugin determine each column's data type?
It automatically infers each column's type from the first batch of data received.


@ -4,18 +4,42 @@
## 1 Quick Introduction
TDengineWriter Plugin writes data to [TDengine](https://www.taosdata.com/en/). It can be used to offline synchronize data from other databases to TDengine.
The TDengineWriter plugin enables writing data to target tables of a TDengine database. Under the hood, TDengineWriter connects to TDengine through JDBC and, following TDengine SQL syntax, executes insert or schemaless statements to write the data into TDengine.
TDengineWriter can be used as a data migration tool for DBAs to import data from other databases into TDengine.
## 2 Implementation
TDengineWriter gets records from the DataX framework that are generated on the reader side. It has two writing strategies:
TDengineWriter obtains the protocol data generated by the Reader through the DataX framework, connects to TDengine through the JDBC driver, executes insert or schemaless statements, and writes the data to TDengine.
In TDengine, tables are divided into super tables, sub-tables and ordinary tables. Super tables and sub-tables have both columns and tags, and the tag values of a sub-table are fixed. (For details, please refer to: [data model](https://www.taosdata.com/docs/cn/v2.0/architecture#model))
TDengineWriter writes data to super tables, sub-tables and ordinary tables in the following ways, depending on the type of the table and on whether the column parameter contains tbname:
1. The table is a super table and column specifies tbname: use an insert statement with automatic table creation, with tbname as the name of the sub-table.
2. The table is a super table and column does not contain tbname: use schemaless writing; TDengine auto-creates a sub-table name based on the super table name and the tag values.
3. The table is a sub-table: use an insert statement to write; if the ignoreTagsUnmatched parameter is true, records whose tag values do not match those of the table are ignored.
4. The table is an ordinary table: use an insert statement to write.
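Method 1 above can be pictured roughly as follows. This is a minimal sketch: the helper name, quoting logic, and record layout are illustrative assumptions, not the plugin's actual code.

```python
def build_auto_create_insert(stable, record, tbname_idx, tag_idxs, col_idxs):
    """Build a TDengine INSERT with automatic sub-table creation.

    record: one row from the reader; tbname_idx points at the tbname
    column, tag_idxs at the tag columns, col_idxs at the data columns.
    """
    def quote(v):
        # quote strings, pass other values through as-is
        return "'%s'" % v if isinstance(v, str) else str(v)

    tbname = record[tbname_idx]
    tags = ", ".join(quote(record[i]) for i in tag_idxs)
    vals = ", ".join(quote(record[i]) for i in col_idxs)
    return "INSERT INTO %s USING %s TAGS(%s) VALUES(%s)" % (
        tbname, stable, tags, vals)
```

For a record `["tb1", "2022-02-20 12:00:01", 25, True]` against the `weather` super table, this yields an `INSERT INTO tb1 USING weather TAGS(...) VALUES(...)` statement, with tbname taken from the record.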
1. For data from OpenTSDBReader, which is in JSON format, we use JNI to call functions in `taos.lib` or `taos.dll`, to leverage a new feature of TDengine Server that supports writing JSON data directly, called [schemaless writing](https://www.taosdata.com/cn/documentation/insert#schemaless). (The feature was not included in taos-jdbcdriver until version 2.0.36.)
2. For other data sources, we use [taos-jdbcdriver](https://www.taosdata.com/cn/documentation/connector/java) to write data. If the target table does not exist beforehand, it will be created automatically according to your configuration.
## 3 Features Introduction
### 3.1 From OpenTSDB to TDengine
#### 3.1.1 Sample Setting
### 3.1 Sample
Configure a job to write to TDengine
Create a supertable on TDengine:
```sql
create database if not exists test;
create table test.weather (ts timestamp, temperature int, humidity double) tags(is_normal bool, device_id binary(100), address nchar(100));
```
Write data to TDengine using the following Job configuration:
```json
{
@ -23,286 +47,65 @@ TDengineWriter get records from DataX Framework that are generated from reader s
"content": [
{
"reader": {
"name": "opentsdbreader",
"name": "streamreader",
"parameter": {
"endpoint": "http://192.168.1.180:4242",
"column": [
"weather_temperature"
{
"type": "string",
"value": "tb1"
},
{
"type": "date",
"value": "2022-02-20 12:00:01"
},
{
"type": "long",
"random": "0, 10"
},
{
"type": "double",
"random": "0, 10"
},
{
"type": "bool",
"random": "0, 50"
},
{
"type": "bytes",
"value": "abcABC123"
},
{
"type": "string",
"value": "北京朝阳望京"
}
],
"beginDateTime": "2021-01-01 00:00:00",
"endDateTime": "2021-01-01 01:00:00"
"sliceRecordCount": 1
}
},
"writer": {
"name": "tdenginewriter",
"parameter": {
"host": "192.168.1.180",
"port": 6030,
"dbName": "test",
"username": "root",
"password": "taosdata"
}
}
}
],
"setting": {
"speed": {
"channel": 1
}
}
}
}
```
#### 3.1.2 Configuration
| Parameter | Description | Required | Default |
| --------- | ------------------------------ | -------- | -------- |
| host | host of TDengine | Yes | |
| port | port of TDengine | Yes | |
| username  | username of TDengine           | No       | root     |
| password | password of TDengine | No | taosdata |
| dbName | name of target database | No | |
| batchSize | batch size of insert operation | No | 1 |
#### 3.1.3 Type Convert
| OpenTSDB Type | DataX Type | TDengine Type |
| ---------------- | ---------- | ------------- |
| timestamp | Date | timestamp |
| Integervalue | Double | double |
| Floatvalue | Double | double |
| Stringvalue | String | binary |
| Integertag | String | binary |
| Floattag | String | binary |
| Stringtag | String | binary |
### 3.2 From MongoDB to TDengine
#### 3.2.1 Sample Setting
```json
{
"job": {
"setting": {
"speed": {
"channel": 2
}
},
"content": [
{
"reader": {
"name": "mongodbreader",
"parameter": {
"address": [
"127.0.0.1:27017"
],
"userName": "user",
"mechanism": "SCRAM-SHA-1",
"userPassword": "password",
"authDb": "admin",
"dbName": "test",
"collectionName": "stock",
"column": [
{
"name": "stockID",
"type": "string"
},
{
"name": "tradeTime",
"type": "date"
},
{
"name": "lastPrice",
"type": "double"
},
{
"name": "askPrice1",
"type": "double"
},
{
"name": "bidPrice1",
"type": "double"
},
{
"name": "volume",
"type": "int"
}
]
}
},
"writer": {
"name": "tdenginewriter",
"parameter": {
"host": "localhost",
"port": 6030,
"dbName": "test",
"username": "root",
"password": "taosdata",
"stable": "stock",
"tagColumn": {
"industry": "energy",
"stockID": 0
},
"fieldColumn": {
"lastPrice": 2,
"askPrice1": 3,
"bidPrice1": 4,
"volume": 5
},
"timestampColumn": {
"tradeTime": 1
}
}
}
}
]
}
}
```
**Note: the writer part of this setting can also be applied to data sources other than OpenTSDB.**
#### 3.2.2 Configuration
| Parameter       | Description                                                     | Required                  | Default  | Remark              |
| --------------- | --------------------------------------------------------------- | ------------------------- | -------- | ------------------- |
| host            | host of TDengine                                                | Yes                       |          |                     |
| port            | port of TDengine                                                | Yes                       |          |                     |
| username        | username of TDengine                                            | No                        | root     |                     |
| password        | password of TDengine                                            | No                        | taosdata |                     |
| dbName          | name of target database                                         | Yes                       |          |                     |
| batchSize       | batch size of insert operation                                  | No                        | 1000     |                     |
| stable          | name of target super table                                      | Yes (except for OpenTSDB) |          |                     |
| tagColumn       | name and position of tag columns in the record from reader, format: {tagName1: tagInd1, tagName2: tagInd2} | No | | index starts with 0 |
| fieldColumn     | name and position of data columns in the record from reader, format: {fdName1: fdInd1, fdName2: fdInd2} | No | |                     |
| timestampColumn | name and position of timestamp column in the record from reader | No                        |          |                     |
**Note**: the value of tagColumn "industry" above is a fixed string; this is a handy feature of this plugin. Consider this scenario: you have many tables with the same structure, one table per device, and you want to use the device number as a tag in the target super table. This feature is designed exactly for that.
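The mixed tagColumn form above (an integer meaning a position in the record, anything else a fixed tag value) could be resolved per record roughly like this. The helper is hypothetical, not the plugin's code:

```python
def resolve_tags(tag_column, record):
    """tag_column: {tagName: index_or_fixed_value}; integers are
    positions in the incoming record, anything else is a fixed tag value."""
    tags = {}
    for name, spec in tag_column.items():
        tags[name] = record[spec] if isinstance(spec, int) else spec
    return tags
```

With the sample setting, every record gets the fixed tag `industry = "energy"` while `stockID` is read from column 0 of the record.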
#### 3.2.3 Auto table creating
##### 3.2.3.1 Rules
If `tagColumn`, `fieldColumn` and `timestampColumn` are all provided in the writer configuration, the target super table will be created automatically.
The type of tag columns is always `NCHAR(64)`. The sample setting above produces the following SQL:
```sql
CREATE STABLE IF NOT EXISTS stock (
    tradetime TIMESTAMP,
    lastprice DOUBLE,
    askprice1 DOUBLE,
    bidprice1 DOUBLE,
    volume INT
)
TAGS(
    industry NCHAR(64),
    stockID NCHAR(64)
);
```
##### 3.2.3.2 Sub-table Creating Rules
The structure of sub-tables are the same with structure of super table. The names of sub-tables are generated by rules below:
1. combine value of tags like this:`tag_value1!tag_value2!tag_value3`.
2. compute md5 hash hex of above string, named `md5val`
3. use "t_md5val" as sub-table name, in which "t" is fixed prefix.
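The naming rules above can be sketched as a small function; this is an illustration of the rule as stated, not the plugin's actual code:

```python
import hashlib

def subtable_name(tag_values):
    """Derive the auto-generated sub-table name from the tag values."""
    # 1. join tag values with '!'
    joined = "!".join(str(v) for v in tag_values)
    # 2. md5 hash hex of the joined string
    md5val = hashlib.md5(joined.encode("utf-8")).hexdigest()
    # 3. prefix with the fixed "t_"
    return "t_" + md5val
```

The result is deterministic, so the same tag combination always maps to the same sub-table.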
#### 3.2.4 Use Pre-created Table
If you have created the super table beforehand, then all of tagColumn, fieldColumn and timestampColumn can be omitted. The writer plugin gets the table schema by executing `describe stableName`.
The order of the columns in the records received by this plugin must be the same as the order of the columns returned by `describe stableName`. For example, given the super table below:
```
Field | Type | Length | Note |
=================================================================================
ts | TIMESTAMP | 8 | |
current | DOUBLE | 8 | |
location | BINARY | 10 | TAG |
```
Then the first column received by this writer plugin must represent the timestamp, the second column must represent current with type double, and the third column must represent location with internal type string.
#### 3.2.5 Remarks
1. The config keys tagColumn, fieldColumn and timestampColumn must all be present or all be omitted.
2. If the above three config keys exist and the target table also exists, the column order defined in the config file must match that of the existing table.
#### 3.2.6 Type Convert
| MongoDB Type | DataX Type | TDengine Type |
| ---------------- | -------------- | ----------------- |
| int, Long | Long | BIGINT |
| double | Double | DOUBLE |
| string, array | String | NCHAR(64) |
| date | Date | TIMESTAMP |
| boolean | Boolean | BOOL |
| bytes | Bytes | BINARY(64) |
### 3.3 From Relational Database to TDengine
Take MySQL as an example.
#### 3.3.1 Table Structure in MySQL
```sql
CREATE TABLE IF NOT EXISTS weather(
station varchar(100),
latitude DOUBLE,
longtitude DOUBLE,
`date` DATE,
TMAX int,
TMIN int
)
```
#### 3.3.2 Sample Setting
```json
{
"job": {
"content": [
{
"reader": {
"name": "mysqlreader",
"parameter": {
"username": "root",
"password": "passw0rd",
"password": "taosdata",
"column": [
"*"
"tbname",
"ts",
"temperature",
"humidity",
"is_normal",
"device_id",
"address"
],
"splitPk": "station",
"connection": [
{
"table": [
"weather"
],
"jdbcUrl": [
"jdbc:mysql://127.0.0.1:3306/test?useSSL=false&useUnicode=true&characterEncoding=utf8"
]
"jdbcUrl": "jdbc:TAOS-RS://192.168.56.105:6041/test"
}
]
}
},
"writer": {
"name": "tdenginewriter",
"parameter": {
"host": "127.0.0.1",
"port": 6030,
"dbName": "test",
"username": "root",
"password": "taosdata",
"batchSize": 1000,
"stable": "weather",
"tagColumn": {
"station": 0
},
"fieldColumn": {
"latitude": 1,
"longtitude": 2,
"tmax": 4,
"tmin": 5
},
"timestampColumn":{
"date": 3
}
],
"batchSize": 100,
"ignoreTagsUnmatched": true
}
}
}
@ -316,41 +119,79 @@ CREATE TABLE IF NOT EXISTS weather(
}
```
### 3.2 Configuration
## 4 Performance Test
* jdbcUrl
* Description: JDBC connection information of the data source. For TDengine JDBC information, please refer to: [Java connector](https://www.taosdata.com/docs/cn/v2.0/connector/java#url)
* Required: yes
* Default: none
* username
* Description: username
* Required: yes
* Default: none
* password
* Description: password of the username
* Required: yes
* Default: none
* table
* Description: a list of table names; the tables should contain all of the columns in the column parameter (except tbname). Note that tbname in column is used as the TDengine sub-table name.
* Required: yes
* Default: none
* column
* Description: a list of field names; the order of the fields must match the order of the columns in each record
* Required: yes
* Default: none
* batchSize
* Description: every batchSize records are written as one batch
* Required: no
* Default: 1
* ignoreTagsUnmatched
* Description: when the target table is a sub-table in TDengine, it has fixed tag values. If this parameter is true, records whose tag values do not match those of the table are silently discarded.
* Required: no
* Default: false
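The batchSize behaviour described above amounts to grouping incoming records before writing, one insert or schemaless statement per group. A minimal sketch, with an assumed helper name rather than the plugin's real code:

```python
def split_into_batches(records, batch_size=1):
    """Group incoming records into batches of batch_size;
    each batch becomes one write to TDengine."""
    batches = []
    for i in range(0, len(records), batch_size):
        batches.append(records[i:i + batch_size])
    return batches
```

With the default batchSize of 1, every record is written individually; raising it (e.g. to 100 as in the sample job) reduces the number of round trips.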
### 3.3 Type Convert
Data types in DataX can be mapped to data types in TDengine as follows:
| DataX Type | TDengine Type |
| ---------- | ----------------------------------------- |
| INT | TINYINT, SMALLINT, INT |
| LONG | TIMESTAMP, TINYINT, SMALLINT, INT, BIGINT |
| DOUBLE | FLOAT, DOUBLE |
| STRING | TIMESTAMP, BINARY, NCHAR |
| BOOL | BOOL |
| DATE | TIMESTAMP |
| BYTES | BINARY |
### 3.2 From MongoDB to TDengine
Here are some examples of migrating various data sources to TDengine:
| Sample | Configuration |
| -------------------- | ------------------------------------------------------------ |
| TDengine to TDengine | [super table to super table with tbname](../src/test/resources/t2t-1.json) |
| TDengine to TDengine | [super table to super table without tbname](../src/test/resources/t2t-2.json) |
| TDengine to TDengine | [super table to sub-table](../src/test/resources/t2t-3.json) |
| TDengine to TDengine | [table to table](../src/test/resources/t2t-4.json) |
| RDBMS to TDengine | [table to super table with tbname](../src/test/resources/dm2t-1.json) |
| RDBMS to TDengine | [table to super table without tbname](../src/test/resources/dm2t-2.json) |
| RDBMS to TDengine | [table to sub-table](../src/test/resources/dm2t-3.json) |
| RDBMS to TDengine | [table to table](../src/test/resources/dm2t-4.json) |
| OpenTSDB to TDengine | [metric to table](../src/test/resources/o2t-1.json) |
## 4 Restriction
## 5 Restriction
1. NCHAR type has fixed length 64 when auto creating stable.
2. Rows with null tag values will be dropped.
## FAQ
### How to filter on the source table?
It depends on the reader plugin; the method may differ between reader plugins.
### How to import multiple source tables at once?
It depends on the reader plugin. If the reader plugin supports reading multiple tables at once, there is no problem.
### How many sub-tables will be produced?
The number of sub-tables is determined by tagColumn and equals the number of distinct combinations of tag values.
### Must columns in the source table and columns in the target table be in the same order?
No. TDengine requires that the first column be of timestamp type, followed by the data columns and then the tag columns. The writer plugin creates the super table in this column order, regardless of the original column order.
### How does the plugin infer the data type of incoming data?
From the first batch of records it receives.
### Why can't I insert data from 10 years ago? Doing so raises the error: `TDengine ERROR (2350): failed to execute batch bind`.
Because the database you created keeps only 10 years of data by default. Create the database like this: `CREATE DATABASE power KEEP 36500;` to enlarge the retention period to 100 years.
### What should I do if some dependencies of a plugin can't be found?
If this plugin is not necessary for you, just remove it from pom.xml under the project's root directory.
Yes. TDengineWriter parses the data from DataX in the order of the fields in column.