mirror of https://github.com/alibaba/DataX.git
synced 2025-05-02 11:11:08 +08:00

Merge branch 'master' into gaussdb
This commit is contained in: commit dcafd7ac48

107 README.md
@@ -1,9 +1,10 @@

# DataX

DataX is the open-source edition of Alibaba Cloud [DataWorks Data Integration](https://www.aliyun.com/product/bigdata/ide) and an offline data synchronization tool/platform widely used within Alibaba Group. DataX implements efficient data synchronization between heterogeneous data sources, including MySQL, Oracle, OceanBase, SqlServer, Postgre, HDFS, Hive, ADS, HBase, TableStore (OTS), MaxCompute (ODPS), Hologres, and DRDS.

[Contribution Leaderboard](https://opensource.alibaba.com/contribution_leaderboard/details?projectValue=datax)

DataX is the open-source edition of Alibaba Cloud [DataWorks Data Integration](https://www.aliyun.com/product/bigdata/ide) and an offline data synchronization tool/platform widely used within Alibaba Group. DataX implements efficient data synchronization between heterogeneous data sources, including MySQL, Oracle, OceanBase, SqlServer, Postgre, HDFS, Hive, ADS, HBase, TableStore (OTS), MaxCompute (ODPS), Hologres, DRDS, and databend.

# DataX Commercial Edition

Alibaba Cloud DataWorks Data Integration is the DataX team's commercial product on Alibaba Cloud. It focuses on fast, stable data movement between rich heterogeneous data sources across complex network environments, and on data synchronization solutions for complex business scenarios. It currently serves close to 3,000 cloud customers and synchronizes more than 3 trillion records per day. DataWorks Data Integration supports 50+ data sources for offline synchronization and covers whole-database migration, batch migration to the cloud, incremental synchronization, and sharded database/table synchronization. Real-time synchronization was added in 2020, supporting arbitrary read/write combinations of 10+ data sources and one-click full-plus-incremental synchronization from sources such as MySQL and Oracle into Alibaba Cloud big data engines such as MaxCompute and Hologres.

@@ -25,7 +26,7 @@ DataX itself, as a data synchronization framework, abstracts the synchronization of different data sources into reading from a source

# Quick Start

##### Download [DataX download link](https://datax-opensource.oss-cn-hangzhou.aliyuncs.com/202210/datax.tar.gz)
##### Download [DataX download link](https://datax-opensource.oss-cn-hangzhou.aliyuncs.com/202308/datax.tar.gz)

##### Please see: [Quick Start](https://github.com/alibaba/DataX/blob/master/userGuid.md)

@@ -36,44 +37,48 @@ DataX itself, as a data synchronization framework, abstracts the synchronization of different data sources into reading from a source

DataX already has a fairly complete plugin ecosystem: mainstream RDBMS databases, NoSQL stores, and big-data compute systems are all supported. The currently supported data sources are listed below; for details see the [DataX data source reference guide](https://github.com/alibaba/DataX/wiki/DataX-all-data-channels).
| Type | Data Source | Reader | Writer | Docs |
|------|-------------|:------:|:------:|:----:|
| RDBMS (relational databases) | MySQL | √ | √ | [Read](https://github.com/alibaba/DataX/blob/master/mysqlreader/doc/mysqlreader.md), [Write](https://github.com/alibaba/DataX/blob/master/mysqlwriter/doc/mysqlwriter.md) |
| | Oracle | √ | √ | [Read](https://github.com/alibaba/DataX/blob/master/oraclereader/doc/oraclereader.md), [Write](https://github.com/alibaba/DataX/blob/master/oraclewriter/doc/oraclewriter.md) |
| | OceanBase | √ | √ | [Read](https://open.oceanbase.com/docs/community/oceanbase-database/V3.1.0/use-datax-to-full-migration-data-to-oceanbase), [Write](https://open.oceanbase.com/docs/community/oceanbase-database/V3.1.0/use-datax-to-full-migration-data-to-oceanbase) |
| | SQLServer | √ | √ | [Read](https://github.com/alibaba/DataX/blob/master/sqlserverreader/doc/sqlserverreader.md), [Write](https://github.com/alibaba/DataX/blob/master/sqlserverwriter/doc/sqlserverwriter.md) |
| | PostgreSQL | √ | √ | [Read](https://github.com/alibaba/DataX/blob/master/postgresqlreader/doc/postgresqlreader.md), [Write](https://github.com/alibaba/DataX/blob/master/postgresqlwriter/doc/postgresqlwriter.md) |
| | DRDS | √ | √ | [Read](https://github.com/alibaba/DataX/blob/master/drdsreader/doc/drdsreader.md), [Write](https://github.com/alibaba/DataX/blob/master/drdswriter/doc/drdswriter.md) |
| | Kingbase | √ | √ | [Read](https://github.com/alibaba/DataX/blob/master/drdsreader/doc/drdsreader.md), [Write](https://github.com/alibaba/DataX/blob/master/drdswriter/doc/drdswriter.md) |
| | Generic RDBMS (all relational databases) | √ | √ | [Read](https://github.com/alibaba/DataX/blob/master/rdbmsreader/doc/rdbmsreader.md), [Write](https://github.com/alibaba/DataX/blob/master/rdbmswriter/doc/rdbmswriter.md) |
| Alibaba Cloud data warehouse storage | ODPS | √ | √ | [Read](https://github.com/alibaba/DataX/blob/master/odpsreader/doc/odpsreader.md), [Write](https://github.com/alibaba/DataX/blob/master/odpswriter/doc/odpswriter.md) |
| | ADS | | √ | [Write](https://github.com/alibaba/DataX/blob/master/adswriter/doc/adswriter.md) |
| | OSS | √ | √ | [Read](https://github.com/alibaba/DataX/blob/master/ossreader/doc/ossreader.md), [Write](https://github.com/alibaba/DataX/blob/master/osswriter/doc/osswriter.md) |
| | OCS | | √ | [Write](https://github.com/alibaba/DataX/blob/master/ocswriter/doc/ocswriter.md) |
| | Hologres | | √ | [Write](https://github.com/alibaba/DataX/blob/master/hologresjdbcwriter/doc/hologresjdbcwriter.md) |
| | AnalyticDB For PostgreSQL | | √ | Write |
| Alibaba Cloud middleware | datahub | √ | √ | Read, Write |
| | SLS | √ | √ | Read, Write |
| Alibaba Cloud graph database | GDB | √ | √ | [Read](https://github.com/alibaba/DataX/blob/master/gdbreader/doc/gdbreader.md), [Write](https://github.com/alibaba/DataX/blob/master/gdbwriter/doc/gdbwriter.md) |
| NoSQL data stores | OTS | √ | √ | [Read](https://github.com/alibaba/DataX/blob/master/otsreader/doc/otsreader.md), [Write](https://github.com/alibaba/DataX/blob/master/otswriter/doc/otswriter.md) |
| | Hbase0.94 | √ | √ | [Read](https://github.com/alibaba/DataX/blob/master/hbase094xreader/doc/hbase094xreader.md), [Write](https://github.com/alibaba/DataX/blob/master/hbase094xwriter/doc/hbase094xwriter.md) |
| | Hbase1.1 | √ | √ | [Read](https://github.com/alibaba/DataX/blob/master/hbase11xreader/doc/hbase11xreader.md), [Write](https://github.com/alibaba/DataX/blob/master/hbase11xwriter/doc/hbase11xwriter.md) |
| | Phoenix4.x | √ | √ | [Read](https://github.com/alibaba/DataX/blob/master/hbase11xsqlreader/doc/hbase11xsqlreader.md), [Write](https://github.com/alibaba/DataX/blob/master/hbase11xsqlwriter/doc/hbase11xsqlwriter.md) |
| | Phoenix5.x | √ | √ | [Read](https://github.com/alibaba/DataX/blob/master/hbase20xsqlreader/doc/hbase20xsqlreader.md), [Write](https://github.com/alibaba/DataX/blob/master/hbase20xsqlwriter/doc/hbase20xsqlwriter.md) |
| | MongoDB | √ | √ | [Read](https://github.com/alibaba/DataX/blob/master/mongodbreader/doc/mongodbreader.md), [Write](https://github.com/alibaba/DataX/blob/master/mongodbwriter/doc/mongodbwriter.md) |
| | Cassandra | √ | √ | [Read](https://github.com/alibaba/DataX/blob/master/cassandrareader/doc/cassandrareader.md), [Write](https://github.com/alibaba/DataX/blob/master/cassandrawriter/doc/cassandrawriter.md) |
| Data warehouse storage | StarRocks | √ | √ | Read, [Write](https://github.com/alibaba/DataX/blob/master/starrockswriter/doc/starrockswriter.md) |
| | ApacheDoris | | √ | [Write](https://github.com/alibaba/DataX/blob/master/doriswriter/doc/doriswriter.md) |
| | ClickHouse | | √ | Write |
| | Hive | √ | √ | [Read](https://github.com/alibaba/DataX/blob/master/hdfsreader/doc/hdfsreader.md), [Write](https://github.com/alibaba/DataX/blob/master/hdfswriter/doc/hdfswriter.md) |
| | kudu | | √ | [Write](https://github.com/alibaba/DataX/blob/master/hdfswriter/doc/hdfswriter.md) |
| Unstructured data stores | TxtFile | √ | √ | [Read](https://github.com/alibaba/DataX/blob/master/txtfilereader/doc/txtfilereader.md), [Write](https://github.com/alibaba/DataX/blob/master/txtfilewriter/doc/txtfilewriter.md) |
| | FTP | √ | √ | [Read](https://github.com/alibaba/DataX/blob/master/ftpreader/doc/ftpreader.md), [Write](https://github.com/alibaba/DataX/blob/master/ftpwriter/doc/ftpwriter.md) |
| | HDFS | √ | √ | [Read](https://github.com/alibaba/DataX/blob/master/hdfsreader/doc/hdfsreader.md), [Write](https://github.com/alibaba/DataX/blob/master/hdfswriter/doc/hdfswriter.md) |
| | Elasticsearch | | √ | [Write](https://github.com/alibaba/DataX/blob/master/elasticsearchwriter/doc/elasticsearchwriter.md) |
| Time-series databases | OpenTSDB | √ | | [Read](https://github.com/alibaba/DataX/blob/master/opentsdbreader/doc/opentsdbreader.md) |
| | TSDB | √ | √ | [Read](https://github.com/alibaba/DataX/blob/master/tsdbreader/doc/tsdbreader.md), [Write](https://github.com/alibaba/DataX/blob/master/tsdbwriter/doc/tsdbhttpwriter.md) |
| | TDengine | √ | √ | [Read](https://github.com/alibaba/DataX/blob/master/tdenginereader/doc/tdenginereader-CN.md), [Write](https://github.com/alibaba/DataX/blob/master/tdenginewriter/doc/tdenginewriter-CN.md) |
| Type | Data Source | Reader | Writer | Docs |
|------|-------------|:------:|:------:|:----:|
| RDBMS (relational databases) | MySQL | √ | √ | [Read](https://github.com/alibaba/DataX/blob/master/mysqlreader/doc/mysqlreader.md), [Write](https://github.com/alibaba/DataX/blob/master/mysqlwriter/doc/mysqlwriter.md) |
| | Oracle | √ | √ | [Read](https://github.com/alibaba/DataX/blob/master/oraclereader/doc/oraclereader.md), [Write](https://github.com/alibaba/DataX/blob/master/oraclewriter/doc/oraclewriter.md) |
| | OceanBase | √ | √ | [Read](https://open.oceanbase.com/docs/community/oceanbase-database/V3.1.0/use-datax-to-full-migration-data-to-oceanbase), [Write](https://open.oceanbase.com/docs/community/oceanbase-database/V3.1.0/use-datax-to-full-migration-data-to-oceanbase) |
| | SQLServer | √ | √ | [Read](https://github.com/alibaba/DataX/blob/master/sqlserverreader/doc/sqlserverreader.md), [Write](https://github.com/alibaba/DataX/blob/master/sqlserverwriter/doc/sqlserverwriter.md) |
| | PostgreSQL | √ | √ | [Read](https://github.com/alibaba/DataX/blob/master/postgresqlreader/doc/postgresqlreader.md), [Write](https://github.com/alibaba/DataX/blob/master/postgresqlwriter/doc/postgresqlwriter.md) |
| | DRDS | √ | √ | [Read](https://github.com/alibaba/DataX/blob/master/drdsreader/doc/drdsreader.md), [Write](https://github.com/alibaba/DataX/blob/master/drdswriter/doc/drdswriter.md) |
| | Kingbase | √ | √ | [Read](https://github.com/alibaba/DataX/blob/master/drdsreader/doc/drdsreader.md), [Write](https://github.com/alibaba/DataX/blob/master/drdswriter/doc/drdswriter.md) |
| | Generic RDBMS (all relational databases) | √ | √ | [Read](https://github.com/alibaba/DataX/blob/master/rdbmsreader/doc/rdbmsreader.md), [Write](https://github.com/alibaba/DataX/blob/master/rdbmswriter/doc/rdbmswriter.md) |
| Alibaba Cloud data warehouse storage | ODPS | √ | √ | [Read](https://github.com/alibaba/DataX/blob/master/odpsreader/doc/odpsreader.md), [Write](https://github.com/alibaba/DataX/blob/master/odpswriter/doc/odpswriter.md) |
| | ADB | | √ | [Write](https://github.com/alibaba/DataX/blob/master/adbmysqlwriter/doc/adbmysqlwriter.md) |
| | ADS | | √ | [Write](https://github.com/alibaba/DataX/blob/master/adswriter/doc/adswriter.md) |
| | OSS | √ | √ | [Read](https://github.com/alibaba/DataX/blob/master/ossreader/doc/ossreader.md), [Write](https://github.com/alibaba/DataX/blob/master/osswriter/doc/osswriter.md) |
| | OCS | | √ | [Write](https://github.com/alibaba/DataX/blob/master/ocswriter/doc/ocswriter.md) |
| | Hologres | | √ | [Write](https://github.com/alibaba/DataX/blob/master/hologresjdbcwriter/doc/hologresjdbcwriter.md) |
| | AnalyticDB For PostgreSQL | | √ | Write |
| Alibaba Cloud middleware | datahub | √ | √ | Read, Write |
| | SLS | √ | √ | Read, Write |
| Graph databases | Alibaba Cloud GDB | √ | √ | [Read](https://github.com/alibaba/DataX/blob/master/gdbreader/doc/gdbreader.md), [Write](https://github.com/alibaba/DataX/blob/master/gdbwriter/doc/gdbwriter.md) |
| | Neo4j | | √ | [Write](https://github.com/alibaba/DataX/blob/master/neo4jwriter/doc/neo4jwriter.md) |
| NoSQL data stores | OTS | √ | √ | [Read](https://github.com/alibaba/DataX/blob/master/otsreader/doc/otsreader.md), [Write](https://github.com/alibaba/DataX/blob/master/otswriter/doc/otswriter.md) |
| | Hbase0.94 | √ | √ | [Read](https://github.com/alibaba/DataX/blob/master/hbase094xreader/doc/hbase094xreader.md), [Write](https://github.com/alibaba/DataX/blob/master/hbase094xwriter/doc/hbase094xwriter.md) |
| | Hbase1.1 | √ | √ | [Read](https://github.com/alibaba/DataX/blob/master/hbase11xreader/doc/hbase11xreader.md), [Write](https://github.com/alibaba/DataX/blob/master/hbase11xwriter/doc/hbase11xwriter.md) |
| | Phoenix4.x | √ | √ | [Read](https://github.com/alibaba/DataX/blob/master/hbase11xsqlreader/doc/hbase11xsqlreader.md), [Write](https://github.com/alibaba/DataX/blob/master/hbase11xsqlwriter/doc/hbase11xsqlwriter.md) |
| | Phoenix5.x | √ | √ | [Read](https://github.com/alibaba/DataX/blob/master/hbase20xsqlreader/doc/hbase20xsqlreader.md), [Write](https://github.com/alibaba/DataX/blob/master/hbase20xsqlwriter/doc/hbase20xsqlwriter.md) |
| | MongoDB | √ | √ | [Read](https://github.com/alibaba/DataX/blob/master/mongodbreader/doc/mongodbreader.md), [Write](https://github.com/alibaba/DataX/blob/master/mongodbwriter/doc/mongodbwriter.md) |
| | Cassandra | √ | √ | [Read](https://github.com/alibaba/DataX/blob/master/cassandrareader/doc/cassandrareader.md), [Write](https://github.com/alibaba/DataX/blob/master/cassandrawriter/doc/cassandrawriter.md) |
| Data warehouse storage | StarRocks | √ | √ | Read, [Write](https://github.com/alibaba/DataX/blob/master/starrockswriter/doc/starrockswriter.md) |
| | ApacheDoris | | √ | [Write](https://github.com/alibaba/DataX/blob/master/doriswriter/doc/doriswriter.md) |
| | ClickHouse | √ | √ | [Read](https://github.com/alibaba/DataX/blob/master/clickhousereader/doc/clickhousereader.md), [Write](https://github.com/alibaba/DataX/blob/master/clickhousewriter/doc/clickhousewriter.md) |
| | Databend | | √ | [Write](https://github.com/alibaba/DataX/blob/master/databendwriter/doc/databendwriter.md) |
| | Hive | √ | √ | [Read](https://github.com/alibaba/DataX/blob/master/hdfsreader/doc/hdfsreader.md), [Write](https://github.com/alibaba/DataX/blob/master/hdfswriter/doc/hdfswriter.md) |
| | kudu | | √ | [Write](https://github.com/alibaba/DataX/blob/master/hdfswriter/doc/hdfswriter.md) |
| | selectdb | | √ | [Write](https://github.com/alibaba/DataX/blob/master/selectdbwriter/doc/selectdbwriter.md) |
| Unstructured data stores | TxtFile | √ | √ | [Read](https://github.com/alibaba/DataX/blob/master/txtfilereader/doc/txtfilereader.md), [Write](https://github.com/alibaba/DataX/blob/master/txtfilewriter/doc/txtfilewriter.md) |
| | FTP | √ | √ | [Read](https://github.com/alibaba/DataX/blob/master/ftpreader/doc/ftpreader.md), [Write](https://github.com/alibaba/DataX/blob/master/ftpwriter/doc/ftpwriter.md) |
| | HDFS | √ | √ | [Read](https://github.com/alibaba/DataX/blob/master/hdfsreader/doc/hdfsreader.md), [Write](https://github.com/alibaba/DataX/blob/master/hdfswriter/doc/hdfswriter.md) |
| | Elasticsearch | | √ | [Write](https://github.com/alibaba/DataX/blob/master/elasticsearchwriter/doc/elasticsearchwriter.md) |
| Time-series databases | OpenTSDB | √ | | [Read](https://github.com/alibaba/DataX/blob/master/opentsdbreader/doc/opentsdbreader.md) |
| | TSDB | √ | √ | [Read](https://github.com/alibaba/DataX/blob/master/tsdbreader/doc/tsdbreader.md), [Write](https://github.com/alibaba/DataX/blob/master/tsdbwriter/doc/tsdbhttpwriter.md) |
| | TDengine | √ | √ | [Read](https://github.com/alibaba/DataX/blob/master/tdenginereader/doc/tdenginereader-CN.md), [Write](https://github.com/alibaba/DataX/blob/master/tdenginewriter/doc/tdenginewriter-CN.md) |
# Alibaba Cloud DataWorks Data Integration

@@ -95,7 +100,7 @@ DataX already has a fairly complete plugin ecosystem: mainstream RDBMS databases, N

- Whole-database migration: https://help.aliyun.com/document_detail/137809.html
- Batch migration to the cloud: https://help.aliyun.com/document_detail/146671.html
- For more capabilities, see: https://help.aliyun.com/document_detail/137663.html

# Developing a New Plugin

@@ -105,6 +110,24 @@ DataX already has a fairly complete plugin ecosystem: mainstream RDBMS databases, N

DataX plans to iterate with monthly releases going forward, and interested contributors are welcome to submit pull requests; the contents of each monthly release are summarized below.
- [datax_v202308](https://github.com/alibaba/DataX/releases/tag/datax_v202308)
  - OTS plugin update
  - databend plugin update
  - Oceanbase driver fix

- [datax_v202306](https://github.com/alibaba/DataX/releases/tag/datax_v202306)
  - Code cleanup
  - New plugins (neo4jwriter, clickhousewriter)
  - Plugin improvements and bug fixes (oceanbase, hdfs, databend, txtfile)

- [datax_v202303](https://github.com/alibaba/DataX/releases/tag/datax_v202303)
  - Code cleanup
  - New plugins (adbmysqlwriter, databendwriter, selectdbwriter)
  - Plugin improvements and bug fixes (sqlserver, hdfs, cassandra, kudu, oss)
  - Upgraded fastjson to fastjson2

- [datax_v202210](https://github.com/alibaba/DataX/releases/tag/datax_v202210)
  - Channel capability updates (OceanBase, Tdengine, Doris, etc.)
338 adbmysqlwriter/doc/adbmysqlwriter.md Normal file
@@ -0,0 +1,338 @@
# DataX AdbMysqlWriter

---

## 1 Quick Introduction

The AdbMysqlWriter plugin writes data into a destination ADB MySQL table. Under the hood, AdbMysqlWriter connects to the remote ADB MySQL database over JDBC and executes the corresponding `insert into ...` (or `replace into ...`) SQL statements, committing the data to ADB MySQL in batches.

AdbMysqlWriter targets ETL engineers who use it to load data from a data warehouse into ADB MySQL; it can also serve DBAs and other users as a data migration tool.
## 2 Implementation

AdbMysqlWriter receives the protocol records produced by a Reader through the DataX framework, connects to the remote ADB MySQL database over JDBC, and writes the data with one of two statement forms:

* `insert into...` (when a duplicate primary key is encountered, the incoming row is silently skipped and nothing is updated, equivalent to `insert ignore into`)

##### or

* `replace into...` (identical to `insert into` when there is no primary-key or unique-index conflict; on conflict, every field of the existing row is replaced by the new row). For performance, the plugin uses `PreparedStatement + Batch` with `rewriteBatchedStatements=true`, buffering records in a per-thread buffer and only issuing a write request once the buffer reaches a configured threshold.

<br />

Note: the task as a whole needs at least the `insert/replace into...` privilege; whether other privileges are required depends on the statements you specify in preSql and postSql in the task configuration.
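To make the buffering behavior above concrete, here is a minimal, hypothetical sketch of the write path. The endpoint, table `test`, and columns `id`/`name` are illustrative only (the real implementation lives in DataX's `CommonRdbmsWriter`), and 2048 mirrors the plugin's default batchSize:

```java
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.PreparedStatement;

public class BatchWriteSketch {
    public static void main(String[] args) throws Exception {
        // rewriteBatchedStatements=true lets the MySQL driver collapse the batch
        // into multi-value statements, which is what AdbMysqlWriter relies on.
        String url = "jdbc:mysql://host:3306/db?rewriteBatchedStatements=true";
        try (Connection conn = DriverManager.getConnection(url, "user", "pass");
             PreparedStatement ps = conn.prepareStatement(
                     "REPLACE INTO test (id, name) VALUES (?, ?)")) {
            int buffered = 0;
            for (int id = 0; id < 10_000; id++) {
                ps.setInt(1, id);
                ps.setString(2, "row-" + id);
                ps.addBatch();
                // Flush once the buffer reaches the batchSize threshold.
                if (++buffered % 2048 == 0) {
                    ps.executeBatch();
                }
            }
            ps.executeBatch(); // flush the tail
        }
    }
}
```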
## 3 Functionality

### 3.1 Sample Configuration

* This job generates data in memory and imports it into ADB MySQL.
```json
{
    "job": {
        "setting": {
            "speed": {
                "channel": 1
            }
        },
        "content": [
            {
                "reader": {
                    "name": "streamreader",
                    "parameter": {
                        "column" : [
                            {
                                "value": "DataX",
                                "type": "string"
                            },
                            {
                                "value": 19880808,
                                "type": "long"
                            },
                            {
                                "value": "1988-08-08 08:08:08",
                                "type": "date"
                            },
                            {
                                "value": true,
                                "type": "bool"
                            },
                            {
                                "value": "test",
                                "type": "bytes"
                            }
                        ],
                        "sliceRecordCount": 1000
                    }
                },
                "writer": {
                    "name": "adbmysqlwriter",
                    "parameter": {
                        "writeMode": "replace",
                        "username": "root",
                        "password": "root",
                        "column": [
                            "*"
                        ],
                        "preSql": [
                            "truncate table @table"
                        ],
                        "connection": [
                            {
                                "jdbcUrl": "jdbc:mysql://ip:port/database?useUnicode=true",
                                "table": [
                                    "test"
                                ]
                            }
                        ]
                    }
                }
            }
        ]
    }
}
```
### 3.2 Parameter Description

* **jdbcUrl**

    * Description: JDBC connection information for the destination database. At runtime, DataX appends the following properties to the jdbcUrl you provide: yearIsDateType=false&zeroDateTimeBehavior=convertToNull&rewriteBatchedStatements=true

        Notes: 1. Only one jdbcUrl may be configured per database.
        2. An AdbMySQL write task may configure only one jdbcUrl.
        3. The jdbcUrl follows the official MySQL specification and may carry extra connection-control properties; for example, to use gbk as the connection encoding, append useUnicode=true&characterEncoding=gbk to the jdbcUrl. See the official MySQL documentation or consult your DBA for details.

    * Required: yes <br />

    * Default: none <br />

* **username**

    * Description: user name for the destination database <br />

    * Required: yes <br />

    * Default: none <br />

* **password**

    * Description: password for the destination database <br />

    * Required: yes <br />

    * Default: none <br />

* **table**

    * Description: name of the destination table. Only one AdbMySQL table name may be configured.

        Note: table and jdbcUrl must be placed inside the connection configuration block.

    * Required: yes <br />

    * Default: none <br />

* **column**

    * Description: the destination table columns to write, separated by commas, e.g. "column": ["id", "name", "age"]. To write all columns in order, use `*`, e.g. `"column": ["*"]`.

        **The column option is mandatory and must not be left empty!**

        Notes: 1. We strongly discourage the `*` form, because your job may misbehave or fail when the destination table's columns or types change.
        2. column must not contain any constant values.

    * Required: yes <br />

    * Default: none <br />

* **session**

    * Description: SQL statements that DataX executes when it obtains an ADB MySQL connection, modifying the session properties of the current connection.

    * Required: no

    * Default: empty

* **preSql**

    * Description: standard SQL statements executed before data is written to the destination table. If the SQL refers to the table being written, use `@table`; when the statements are actually executed, the variable is replaced with the real table name. For example, to clear a table before loading data into it, configure `"preSql":["truncate table @table"]`; the effect is that `truncate table <table name>` runs before data is written to each table (see the sketch below). <br />

    * Required: no <br />

    * Default: none <br />
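The `@table` placeholder is a plain textual substitution, applied once per configured table; a minimal sketch (table names hypothetical):

```java
public class PreSqlSketch {
    public static void main(String[] args) {
        // Each destination table gets its own copy of preSql with @table expanded.
        String preSql = "truncate table @table";
        for (String table : new String[] {"test_1", "test_2"}) {
            // Prints "truncate table test_1" / "truncate table test_2";
            // each statement runs before records are written to that table.
            System.out.println(preSql.replace("@table", table));
        }
    }
}
```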
* **postSql**

    * Description: standard SQL statements executed after data has been written to the destination table (same mechanism as preSql). <br />

    * Required: no <br />

    * Default: none <br />

* **writeMode**

    * Description: controls whether data is written to the destination table with an `insert into`, a `replace into`, or an `ON DUPLICATE KEY UPDATE` statement (see the sketch below).<br />

    * Required: yes <br />

    * Options: insert/replace/update <br />

    * Default: replace <br />
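As a hedged illustration of how the three writeMode values map onto SQL statement shapes (the column names are hypothetical; the concrete statements are generated by DataX's rdbms writer utilities):

```java
public class WriteModeSketch {
    // Illustrative only: how each writeMode value shapes the statement template.
    static String template(String writeMode) {
        switch (writeMode) {
            case "insert":  // duplicate keys are ignored, existing rows are kept
                return "INSERT IGNORE INTO %s (id, name) VALUES (?, ?)";
            case "replace": // duplicate keys cause the whole row to be replaced
                return "REPLACE INTO %s (id, name) VALUES (?, ?)";
            case "update":  // duplicate keys update the listed columns in place
                return "INSERT INTO %s (id, name) VALUES (?, ?) "
                     + "ON DUPLICATE KEY UPDATE name = VALUES(name)";
            default:
                throw new IllegalArgumentException("unknown writeMode: " + writeMode);
        }
    }

    public static void main(String[] args) {
        for (String mode : new String[] {"insert", "replace", "update"}) {
            System.out.println(mode + " -> " + String.format(template(mode), "test"));
        }
    }
}
```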
* **batchSize**

    * Description: number of records submitted per batch. This value can greatly reduce the number of network round trips between DataX and ADB MySQL and improve overall throughput, but setting it too high may cause the DataX process to run out of memory.<br />

    * Required: no <br />

    * Default: 2048 <br />
### 3.3 Type Conversion

AdbMysqlWriter supports most MySQL types, but a few individual types are not supported; please check your types.

The type conversion table that AdbMysqlWriter applies to MySQL types:

| DataX internal type | AdbMysql data type              |
|---------------------|---------------------------------|
| Long                | tinyint, smallint, int, bigint  |
| Double              | float, double, decimal          |
| String              | varchar                         |
| Date                | date, time, datetime, timestamp |
| Boolean             | boolean                         |
| Bytes               | binary                          |
## 4 Performance Report

### 4.1 Environment

#### 4.1.1 Data Characteristics

TPC-H lineitem table, 17 columns, 59,986,052 randomly generated rows. Total uncompressed data size: 7.3 GiB.

DDL:

    CREATE TABLE `datax_adbmysqlwriter_perf_lineitem` (
        `l_orderkey` bigint NOT NULL COMMENT '',
        `l_partkey` int NOT NULL COMMENT '',
        `l_suppkey` int NOT NULL COMMENT '',
        `l_linenumber` int NOT NULL COMMENT '',
        `l_quantity` decimal(15,2) NOT NULL COMMENT '',
        `l_extendedprice` decimal(15,2) NOT NULL COMMENT '',
        `l_discount` decimal(15,2) NOT NULL COMMENT '',
        `l_tax` decimal(15,2) NOT NULL COMMENT '',
        `l_returnflag` varchar(1024) NOT NULL COMMENT '',
        `l_linestatus` varchar(1024) NOT NULL COMMENT '',
        `l_shipdate` date NOT NULL COMMENT '',
        `l_commitdate` date NOT NULL COMMENT '',
        `l_receiptdate` date NOT NULL COMMENT '',
        `l_shipinstruct` varchar(1024) NOT NULL COMMENT '',
        `l_shipmode` varchar(1024) NOT NULL COMMENT '',
        `l_comment` varchar(1024) NOT NULL COMMENT '',
        `dummy` varchar(1024),
        PRIMARY KEY (`l_orderkey`, `l_linenumber`)
    ) ENGINE=InnoDB DEFAULT CHARSET=utf8 COMMENT='datax perf test';

A typical row looks like:

    l_orderkey: 2122789
    l_partkey: 1233571
    l_suppkey: 8608
    l_linenumber: 1
    l_quantity: 35.00
    l_extendedprice: 52657.85
    l_discount: 0.02
    l_tax: 0.07
    l_returnflag: N
    l_linestatus: O
    l_shipdate: 1996-11-03
    l_commitdate: 1996-12-07
    l_receiptdate: 1996-11-16
    l_shipinstruct: COLLECT COD
    l_shipmode: FOB
    l_comment: ld, regular theodolites.
    dummy:

#### 4.1.2 Machine Specs

* DataX ECS: 24 cores, 48 GB RAM

* Adb MySQL database
    * Compute resources: 16 cores, 64 GB RAM (cluster edition)
    * Elastic IO resources: 3

#### 4.1.3 DataX JVM Parameters

    -Xms1G -Xmx10G -XX:+HeapDumpOnOutOfMemoryError
### 4.2 Test Report

| Channels | Batch size | DataX speed (Rec/s) | DataX throughput (MB/s) | Import time (s) |
|----------|------------|---------------------|-------------------------|-----------------|
| 1 | 512 | 23071 | 2.34 | 2627 |
| 1 | 1024 | 26080 | 2.65 | 2346 |
| 1 | 2048 | 28162 | 2.86 | 2153 |
| 1 | 4096 | 28978 | 2.94 | 2119 |
| 4 | 512 | 56590 | 5.74 | 1105 |
| 4 | 1024 | 81062 | 8.22 | 763 |
| 4 | 2048 | 107117 | 10.87 | 605 |
| 4 | 4096 | 113181 | 11.48 | 579 |
| 8 | 512 | 81062 | 8.22 | 786 |
| 8 | 1024 | 127629 | 12.95 | 519 |
| 8 | 2048 | 187456 | 19.01 | 369 |
| 8 | 4096 | 206848 | 20.98 | 341 |
| 16 | 512 | 130404 | 13.23 | 513 |
| 16 | 1024 | 214235 | 21.73 | 335 |
| 16 | 2048 | 299930 | 30.42 | 253 |
| 16 | 4096 | 333255 | 33.80 | 227 |
| 32 | 512 | 206848 | 20.98 | 347 |
| 32 | 1024 | 315716 | 32.02 | 241 |
| 32 | 2048 | 399907 | 40.56 | 199 |
| 32 | 4096 | 461431 | 46.80 | 184 |
| 64 | 512 | 333255 | 33.80 | 231 |
| 64 | 1024 | 399907 | 40.56 | 204 |
| 64 | 2048 | 428471 | 43.46 | 199 |
| 64 | 4096 | 461431 | 46.80 | 187 |
| 128 | 512 | 333255 | 33.80 | 235 |
| 128 | 1024 | 399907 | 40.56 | 203 |
| 128 | 2048 | 425432 | 43.15 | 197 |
| 128 | 4096 | 387006 | 39.26 | 211 |

Notes:

1. DataX read the source data from local files with txtfilereader, so that the source side would not be a bottleneck.

#### Performance Summary

1. The number of channels and the batchSize both have a large impact on performance.
2. When writing to the database, more than 32 channels is generally not recommended.
## 5 Constraints and Limitations

## FAQ

***

**Q: AdbMysqlWriter failed while executing a postSql statement — was the data still imported into the destination database?**

A: A DataX import consists of three phases: the pre step, the import itself, and the post step. If any of them fails, the whole DataX job fails. Because DataX cannot run these steps inside a single transaction, data may already have landed in the destination.

***

**Q: Given the above, some dirty data may end up in the database. What if that affects a production database?**

A: There are currently two remedies. First, configure a pre statement whose SQL cleans up that day's imported data, so each DataX run first wipes the previous attempt and then imports a complete data set. Second, import into a temporary table and rename it onto the production table once the import completes.

***

**Q: The second approach avoids impacting production data — how do I set it up concretely?**

A: Configure the import to target a temporary table.
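As a hedged sketch of that temporary-table pattern (the table names and the MySQL-style rename statement are illustrative, not an official recipe):

```java
public class TempTableSketch {
    public static void main(String[] args) {
        // Hypothetical pre/post statements: load into a shadow table,
        // then atomically swap it onto the production name.
        String[] preSql  = { "create table if not exists test_tmp like test",
                             "truncate table test_tmp" };
        String[] postSql = { "rename table test to test_old, test_tmp to test" };
        // In a DataX job these strings would go into the writer's
        // "preSql"/"postSql" arrays, with "table" set to test_tmp instead of test.
        System.out.println(String.join("; ", preSql));
        System.out.println(String.join("; ", postSql));
    }
}
```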
79 adbmysqlwriter/pom.xml Executable file
@@ -0,0 +1,79 @@
<project xmlns="http://maven.apache.org/POM/4.0.0" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
         xsi:schemaLocation="http://maven.apache.org/POM/4.0.0 http://maven.apache.org/xsd/maven-4.0.0.xsd">
    <modelVersion>4.0.0</modelVersion>
    <parent>
        <groupId>com.alibaba.datax</groupId>
        <artifactId>datax-all</artifactId>
        <version>0.0.1-SNAPSHOT</version>
    </parent>
    <artifactId>adbmysqlwriter</artifactId>
    <name>adbmysqlwriter</name>
    <packaging>jar</packaging>

    <dependencies>
        <dependency>
            <groupId>com.alibaba.datax</groupId>
            <artifactId>datax-common</artifactId>
            <version>${datax-project-version}</version>
            <exclusions>
                <exclusion>
                    <artifactId>slf4j-log4j12</artifactId>
                    <groupId>org.slf4j</groupId>
                </exclusion>
            </exclusions>
        </dependency>
        <dependency>
            <groupId>org.slf4j</groupId>
            <artifactId>slf4j-api</artifactId>
        </dependency>
        <dependency>
            <groupId>ch.qos.logback</groupId>
            <artifactId>logback-classic</artifactId>
        </dependency>

        <dependency>
            <groupId>com.alibaba.datax</groupId>
            <artifactId>plugin-rdbms-util</artifactId>
            <version>${datax-project-version}</version>
        </dependency>

        <dependency>
            <groupId>mysql</groupId>
            <artifactId>mysql-connector-java</artifactId>
            <version>5.1.40</version>
        </dependency>
    </dependencies>

    <build>
        <plugins>
            <!-- compiler plugin -->
            <plugin>
                <artifactId>maven-compiler-plugin</artifactId>
                <configuration>
                    <source>${jdk-version}</source>
                    <target>${jdk-version}</target>
                    <encoding>${project-sourceEncoding}</encoding>
                </configuration>
            </plugin>
            <!-- assembly plugin -->
            <plugin>
                <artifactId>maven-assembly-plugin</artifactId>
                <configuration>
                    <descriptors>
                        <descriptor>src/main/assembly/package.xml</descriptor>
                    </descriptors>
                    <finalName>datax</finalName>
                </configuration>
                <executions>
                    <execution>
                        <id>dwzip</id>
                        <phase>package</phase>
                        <goals>
                            <goal>single</goal>
                        </goals>
                    </execution>
                </executions>
            </plugin>
        </plugins>
    </build>
</project>
35 adbmysqlwriter/src/main/assembly/package.xml Executable file
@@ -0,0 +1,35 @@
<assembly
        xmlns="http://maven.apache.org/plugins/maven-assembly-plugin/assembly/1.1.0"
        xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
        xsi:schemaLocation="http://maven.apache.org/plugins/maven-assembly-plugin/assembly/1.1.0 http://maven.apache.org/xsd/assembly-1.1.0.xsd">
    <id></id>
    <formats>
        <format>dir</format>
    </formats>
    <includeBaseDirectory>false</includeBaseDirectory>
    <fileSets>
        <fileSet>
            <directory>src/main/resources</directory>
            <includes>
                <include>plugin.json</include>
                <include>plugin_job_template.json</include>
            </includes>
            <outputDirectory>plugin/writer/adbmysqlwriter</outputDirectory>
        </fileSet>
        <fileSet>
            <directory>target/</directory>
            <includes>
                <include>adbmysqlwriter-0.0.1-SNAPSHOT.jar</include>
            </includes>
            <outputDirectory>plugin/writer/adbmysqlwriter</outputDirectory>
        </fileSet>
    </fileSets>

    <dependencySets>
        <dependencySet>
            <useProjectArtifact>false</useProjectArtifact>
            <outputDirectory>plugin/writer/adbmysqlwriter/libs</outputDirectory>
            <scope>runtime</scope>
        </dependencySet>
    </dependencySets>
</assembly>
138 adbmysqlwriter/src/main/java/com/alibaba/datax/plugin/writer/adbmysqlwriter/AdbMysqlWriter.java Normal file
@@ -0,0 +1,138 @@
package com.alibaba.datax.plugin.writer.adbmysqlwriter;

import com.alibaba.datax.common.element.Record;
import com.alibaba.datax.common.plugin.RecordReceiver;
import com.alibaba.datax.common.spi.Writer;
import com.alibaba.datax.common.util.Configuration;
import com.alibaba.datax.plugin.rdbms.util.DataBaseType;
import com.alibaba.datax.plugin.rdbms.writer.CommonRdbmsWriter;
import com.alibaba.datax.plugin.rdbms.writer.Key;
import org.apache.commons.lang3.StringUtils;

import java.sql.Connection;
import java.sql.SQLException;
import java.util.List;

public class AdbMysqlWriter extends Writer {
    private static final DataBaseType DATABASE_TYPE = DataBaseType.ADB;

    public static class Job extends Writer.Job {
        private Configuration originalConfig = null;
        private CommonRdbmsWriter.Job commonRdbmsWriterJob;

        @Override
        public void preCheck() {
            this.init();
            this.commonRdbmsWriterJob.writerPreCheck(this.originalConfig, DATABASE_TYPE);
        }

        @Override
        public void init() {
            this.originalConfig = super.getPluginJobConf();
            this.commonRdbmsWriterJob = new CommonRdbmsWriter.Job(DATABASE_TYPE);
            this.commonRdbmsWriterJob.init(this.originalConfig);
        }

        // In general, pre statements should be deferred to the task phase
        // (the single-table case is the exception).
        @Override
        public void prepare() {
            // Privilege validation is not supported for now.
            //this.commonRdbmsWriterJob.privilegeValid(this.originalConfig, DATABASE_TYPE);
            this.commonRdbmsWriterJob.prepare(this.originalConfig);
        }

        @Override
        public List<Configuration> split(int mandatoryNumber) {
            return this.commonRdbmsWriterJob.split(this.originalConfig, mandatoryNumber);
        }

        // In general, post statements should be deferred to the task phase
        // (the single-table case is the exception).
        @Override
        public void post() {
            this.commonRdbmsWriterJob.post(this.originalConfig);
        }

        @Override
        public void destroy() {
            this.commonRdbmsWriterJob.destroy(this.originalConfig);
        }

    }

    public static class Task extends Writer.Task {

        private Configuration writerSliceConfig;
        private CommonRdbmsWriter.Task commonRdbmsWriterTask;

        public static class DelegateClass extends CommonRdbmsWriter.Task {
            private long writeTime = 0L;
            private long writeCount = 0L;
            private long lastLogTime = 0;

            public DelegateClass(DataBaseType dataBaseType) {
                super(dataBaseType);
            }

            @Override
            protected void doBatchInsert(Connection connection, List<Record> buffer)
                    throws SQLException {
                long startTime = System.currentTimeMillis();

                super.doBatchInsert(connection, buffer);

                writeCount = writeCount + buffer.size();
                writeTime = writeTime + (System.currentTimeMillis() - startTime);

                // log write metrics every 10 seconds
                if (System.currentTimeMillis() - lastLogTime > 10000) {
                    lastLogTime = System.currentTimeMillis();
                    logTotalMetrics();
                }
            }

            public void logTotalMetrics() {
                LOG.info(Thread.currentThread().getName() + ", AdbMySQL writer take " + writeTime + " ms, write " + writeCount + " records.");
            }
        }

        @Override
        public void init() {
            this.writerSliceConfig = super.getPluginJobConf();

            if (StringUtils.isBlank(this.writerSliceConfig.getString(Key.WRITE_MODE))) {
                this.writerSliceConfig.set(Key.WRITE_MODE, "REPLACE");
            }

            this.commonRdbmsWriterTask = new DelegateClass(DATABASE_TYPE);
            this.commonRdbmsWriterTask.init(this.writerSliceConfig);
        }

        @Override
        public void prepare() {
            this.commonRdbmsWriterTask.prepare(this.writerSliceConfig);
        }

        // TODO: switch to a connection pool and make sure every connection obtained is
        // usable (note: the connection's session may need to be re-initialized each time).
        public void startWrite(RecordReceiver recordReceiver) {
            this.commonRdbmsWriterTask.startWrite(recordReceiver, this.writerSliceConfig,
                    super.getTaskPluginCollector());
        }

        @Override
        public void post() {
            this.commonRdbmsWriterTask.post(this.writerSliceConfig);
        }

        @Override
        public void destroy() {
            this.commonRdbmsWriterTask.destroy(this.writerSliceConfig);
        }

        @Override
        public boolean supportFailOver() {
            String writeMode = writerSliceConfig.getString(Key.WRITE_MODE);
            return "replace".equalsIgnoreCase(writeMode);
        }

    }
}
6 adbmysqlwriter/src/main/resources/plugin.json Executable file
@@ -0,0 +1,6 @@
{
    "name": "adbmysqlwriter",
    "class": "com.alibaba.datax.plugin.writer.adbmysqlwriter.AdbMysqlWriter",
    "description": "useScene: prod. mechanism: Jdbc connection using the database, execute insert sql. warn: The more you know about the database, the less problems you encounter.",
    "developer": "alibaba"
}
20 adbmysqlwriter/src/main/resources/plugin_job_template.json Normal file
@@ -0,0 +1,20 @@
{
    "name": "adbmysqlwriter",
    "parameter": {
        "username": "username",
        "password": "password",
        "column": ["col1", "col2", "col3"],
        "connection": [
            {
                "jdbcUrl": "jdbc:mysql://<host>:<port>[/<database>]",
                "table": ["table1", "table2"]
            }
        ],
        "preSql": [],
        "postSql": [],
        "batchSize": 65536,
        "batchByteSize": 134217728,
        "dryRun": false,
        "writeMode": "insert"
    }
}
@@ -110,7 +110,6 @@ DataX connects directly to the ADS endpoint and writes straight into ADS through the INSERT interface that ADS exposes.
        "account": "xxx@aliyun.com",
        "odpsServer": "xxx",
        "tunnelServer": "xxx",
        "accountType": "aliyun",
        "project": "transfer_project"
    },
    "writeMode": "load",
@@ -18,7 +18,7 @@ import com.alibaba.datax.plugin.writer.adswriter.AdsWriterErrorCode;
import com.alibaba.datax.plugin.writer.adswriter.ads.TableInfo;
import com.alibaba.datax.plugin.writer.adswriter.util.Constant;
import com.alibaba.datax.plugin.writer.adswriter.util.Key;
import com.alibaba.fastjson.JSON;
import com.alibaba.fastjson2.JSON;
import org.apache.commons.lang3.StringUtils;
import org.apache.commons.lang3.tuple.Pair;
import org.slf4j.Logger;
@@ -12,7 +12,6 @@ public class TransferProjectConf {
    public final static String KEY_ACCOUNT = "odps.account";
    public final static String KEY_ODPS_SERVER = "odps.odpsServer";
    public final static String KEY_ODPS_TUNNEL = "odps.tunnelServer";
    public final static String KEY_ACCOUNT_TYPE = "odps.accountType";
    public final static String KEY_PROJECT = "odps.project";

    private String accessId;
@@ -20,7 +19,6 @@ public class TransferProjectConf {
    private String account;
    private String odpsServer;
    private String odpsTunnel;
    private String accountType;
    private String project;

    public static TransferProjectConf create(Configuration adsWriterConf) {
@@ -30,7 +28,6 @@ public class TransferProjectConf {
        res.account = adsWriterConf.getString(KEY_ACCOUNT);
        res.odpsServer = adsWriterConf.getString(KEY_ODPS_SERVER);
        res.odpsTunnel = adsWriterConf.getString(KEY_ODPS_TUNNEL);
        res.accountType = adsWriterConf.getString(KEY_ACCOUNT_TYPE, "aliyun");
        res.project = adsWriterConf.getString(KEY_PROJECT);
        return res;
    }
@@ -55,9 +52,6 @@ public class TransferProjectConf {
        return odpsTunnel;
    }

    public String getAccountType() {
        return accountType;
    }

    public String getProject() {
        return project;
@@ -23,7 +23,7 @@ import com.alibaba.datax.common.element.StringColumn;
import com.alibaba.datax.common.exception.DataXException;
import com.alibaba.datax.common.plugin.TaskPluginCollector;
import com.alibaba.datax.common.util.Configuration;
import com.alibaba.fastjson.JSON;
import com.alibaba.fastjson2.JSON;

import com.datastax.driver.core.Cluster;
import com.datastax.driver.core.CodecRegistry;
@@ -298,6 +298,7 @@ public class CassandraReaderHelper {
        record.addColumn(new LongColumn(rs.getInt(i)));
        break;

      case COUNTER:
      case BIGINT:
        record.addColumn(new LongColumn(rs.getLong(i)));
        break;
@@ -558,26 +559,6 @@ public class CassandraReaderHelper {
            String.format(
                "配置信息有错误.列信息中需要包含'%s'字段 .",Key.COLUMN_NAME));
      }
      if( name.startsWith(Key.WRITE_TIME) ) {
        String colName = name.substring(Key.WRITE_TIME.length(),name.length() - 1 );
        ColumnMetadata col = tableMetadata.getColumn(colName);
        if( col == null ) {
          throw DataXException
              .asDataXException(
                  CassandraReaderErrorCode.CONF_ERROR,
                  String.format(
                      "配置信息有错误.列'%s'不存在 .",colName));
        }
      } else {
        ColumnMetadata col = tableMetadata.getColumn(name);
        if( col == null ) {
          throw DataXException
              .asDataXException(
                  CassandraReaderErrorCode.CONF_ERROR,
                  String.format(
                      "配置信息有错误.列'%s'不存在 .",name));
        }
      }
    }
  }


@@ -18,10 +18,10 @@ import java.util.UUID;

import com.alibaba.datax.common.element.Column;
import com.alibaba.datax.common.exception.DataXException;
import com.alibaba.fastjson.JSON;
import com.alibaba.fastjson.JSONArray;
import com.alibaba.fastjson.JSONException;
import com.alibaba.fastjson.JSONObject;
import com.alibaba.fastjson2.JSON;
import com.alibaba.fastjson2.JSONArray;
import com.alibaba.fastjson2.JSONException;
import com.alibaba.fastjson2.JSONObject;

import com.datastax.driver.core.BoundStatement;
import com.datastax.driver.core.CodecRegistry;
@@ -204,7 +204,7 @@ public class CassandraWriterHelper {

      case MAP: {
        Map m = new HashMap();
        for (JSONObject.Entry e : ((JSONObject)jsonObject).entrySet()) {
        for (Map.Entry e : ((JSONObject)jsonObject).entrySet()) {
          Object k = parseFromString((String) e.getKey(), type.getTypeArguments().get(0));
          Object v = parseFromJson(e.getValue(), type.getTypeArguments().get(1));
          m.put(k,v);
@@ -233,7 +233,7 @@ public class CassandraWriterHelper {
      case UDT: {
        UDTValue t = ((UserType) type).newValue();
        UserType userType = t.getType();
        for (JSONObject.Entry e : ((JSONObject)jsonObject).entrySet()) {
        for (Map.Entry e : ((JSONObject)jsonObject).entrySet()) {
          DataType eleType = userType.getFieldType((String)e.getKey());
          t.set((String)e.getKey(), parseFromJson(e.getValue(), eleType), registry.codecFor(eleType).getJavaType());
        }
344 clickhousereader/doc/clickhousereader.md Normal file
@@ -0,0 +1,344 @@
# ClickhouseReader Plugin Documentation

___

## 1 Quick Introduction

The ClickhouseReader plugin reads data from Clickhouse. Under the hood, ClickhouseReader connects to a remote Clickhouse database over JDBC and executes SELECT statements to pull the data out of the Clickhouse database.

## 2 Implementation

In short, ClickhouseReader connects to the remote Clickhouse database through a JDBC connector, generates a SELECT statement from the user's configuration, sends it to the remote Clickhouse database, assembles the returned result set into DataX's abstract data types, and hands the records to the downstream Writer.

When the user configures Table, Column, and Where, ClickhouseReader assembles them into a SQL statement and sends it to the Clickhouse database; when the user configures querySql, ClickhouseReader sends that statement directly to the Clickhouse database.
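A hedged sketch of that assembly step (the real SQL generation is done by DataX's rdbms reader utilities; the names and values here are illustrative):

```java
public class QuerySketch {
    // Builds the SELECT that is sent over JDBC; when querySql is configured,
    // that statement is used verbatim instead.
    static String buildQuery(String[] columns, String table, String where) {
        String sql = "SELECT " + String.join(",", columns) + " FROM " + table;
        if (where != null && !where.isEmpty()) {
            sql += " WHERE " + where;
        }
        return sql;
    }

    public static void main(String[] args) {
        System.out.println(buildQuery(new String[] {"id", "name"}, "table", "id < 10"));
        // -> SELECT id,name FROM table WHERE id < 10
    }
}
```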
## 3 Functionality

### 3.1 Sample Configuration

* A job that extracts data from a Clickhouse database to a local target:
```
{
    "job": {
        "setting": {
            "speed": {
                // Throughput limit in byte/s: DataX approaches, but does not exceed, this speed.
                // channel is the number of channels; byte is the byte rate. With a single-channel
                // speed of 1MB, byte: 1048576 corresponds to one channel.
                "byte": 1048576
            },
            // Error limits
            "errorLimit": {
                // Record count limit, checked first
                "record": 0,
                // Percentage; 1 means 100%
                "percentage": 0.02
            }
        },
        "content": [
            {
                "reader": {
                    "name": "clickhousereader",
                    "parameter": {
                        // Database user name
                        "username": "root",
                        // Database password
                        "password": "root",
                        "column": [
                            "id","name"
                        ],
                        "connection": [
                            {
                                "table": [
                                    "table"
                                ],
                                "jdbcUrl": [
                                    "jdbc:clickhouse://[HOST_NAME]:PORT/[DATABASE_NAME]"
                                ]
                            }
                        ]
                    }
                },
                "writer": {
                    // Writer type
                    "name": "streamwriter",
                    // Whether to print the content
                    "parameter": {
                        "print": true
                    }
                }
            }
        ]
    }
}
```
* A job that synchronizes data to a local target using a custom SQL statement:
```
{
    "job": {
        "setting": {
            "speed": {
                "channel": 5
            }
        },
        "content": [
            {
                "reader": {
                    "name": "clickhousereader",
                    "parameter": {
                        "username": "root",
                        "password": "root",
                        "where": "",
                        "connection": [
                            {
                                "querySql": [
                                    "select db_id,on_line_flag from db_info where db_id < 10"
                                ],
                                "jdbcUrl": [
                                    "jdbc:clickhouse://1.1.1.1:8123/default"
                                ]
                            }
                        ]
                    }
                },
                "writer": {
                    "name": "streamwriter",
                    "parameter": {
                        "visible": false,
                        "encoding": "UTF-8"
                    }
                }
            }
        ]
    }
}
```
### 3.2 Parameter Description

* **jdbcUrl**

    * Description: JDBC connection information for the source database, described as a JSON array; multiple connection addresses may be given for one database. The JSON array form exists because multiple-IP probing is supported inside Alibaba Group: when several addresses are configured, ClickhouseReader probes them in order until it finds a usable one, and reports an error if all of them fail. Note that jdbcUrl must be placed inside the connection configuration block. For use outside Alibaba Group, fill in a single JDBC connection in the JSON array.

        The jdbcUrl follows the official Clickhouse specification and may carry extra connection-control properties. See the [official Clickhouse documentation](https://clickhouse.com/docs/en/engines/table-engines/integrations/jdbc) for details.

    * Required: yes <br />

    * Default: none <br />

* **username**

    * Description: user name for the data source <br />

    * Required: yes <br />

    * Default: none <br />

* **password**

    * Description: password of the given user for the data source <br />

    * Required: yes <br />

    * Default: none <br />

* **table**

    * Description: the table(s) to synchronize, described as a JSON array, so several tables can be extracted at once. When multiple tables are configured, the user must ensure they share the same schema; ClickhouseReader does not check whether they form one logical table. Note that table must be placed inside the connection configuration block.<br />

    * Required: yes <br />

    * Default: none <br />

* **column**

    * Description: the set of columns to synchronize from the configured tables, described as a JSON array. Use \* to select all columns, e.g. ['\*'].

        Column pruning is supported: you may export only a subset of the columns.

        Column reordering is supported: columns need not be exported in schema order.

        Constants are supported, using JSON format:
        ["id", "`table`", "1", "'bazhen.csy'", "null", "to_char(a + 1)", "2.3" , "true"]
        Here id is an ordinary column name, \`table\` is a column name that is a reserved word, 1 is an integer constant, 'bazhen.csy' is a string constant, null is a null value, to_char(a + 1) is an expression, 2.3 is a floating-point number, and true is a boolean.

        Column must be specified explicitly and must not be left empty!

    * Required: yes <br />

    * Default: none <br />
* **splitPk**

    * Description: if splitPk is specified, ClickhouseReader shards the data on that column and launches concurrent tasks to synchronize it, which can greatly improve synchronization throughput (see the sketch below).

        We recommend using the table's primary key as splitPk, since primary keys are usually fairly evenly distributed and the resulting shards are less prone to data hot spots.

        Currently splitPk supports integer columns only; `floating-point, date, and other types are not supported`. If a column of an unsupported type is specified, ClickhouseReader reports an error!

        If splitPk is omitted, the table is not split and ClickhouseReader synchronizes the full data set over a single channel.

    * Required: no <br />

    * Default: none <br />
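A minimal sketch of the idea behind integer splitPk sharding (DataX's actual splitter first queries the real min/max of the key and handles boundaries differently; the values here are illustrative):

```java
public class SplitSketch {
    public static void main(String[] args) {
        // The [min, max] range of the split key is cut into one sub-range
        // per channel, and each concurrent task reads one sub-range.
        long min = 1, max = 59_986_052L;
        int channels = 4;
        long step = (max - min + 1 + channels - 1) / channels; // ceiling division
        for (int i = 0; i < channels; i++) {
            long lo = min + (long) i * step;
            long hi = Math.min(lo + step - 1, max);
            System.out.printf("WHERE split_pk >= %d AND split_pk <= %d%n", lo, hi);
        }
    }
}
```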
* **where**

    * Description: filter condition. ClickhouseReader builds the SQL from the configured column, table, and where clause and extracts data accordingly. In practice, a job often synchronizes only the current day's data by setting where to gmt_create > $bizdate. Note: do not specify where as limit 10; limit is not a valid SQL where clause.<br />

        The where condition is an effective way to do incremental business synchronization.

    * Required: no <br />

    * Default: none <br />

* **querySql**

    * Description: in some business scenarios the where option is not expressive enough, and this option lets the user define the filtering SQL directly. When it is configured, DataX ignores the table and column options and filters the data using this statement's content alone; for example, to synchronize data after a multi-table join, use select a,b from table_a join table_b on table_a.id = table_b.id. <br />

        `When querySql is configured, ClickhouseReader ignores the table, column, and where configuration entirely`.

    * Required: no <br />

    * Default: none <br />
* **fetchSize**

    * Description: the number of rows fetched per batch between the plugin and the database server. It determines the number of network round trips between DataX and the server and can substantially improve extraction performance (see the sketch below).<br />

        `Note: a value that is too large (> 2048) may cause the DataX process to run out of memory.`

    * Required: no <br />

    * Default: 1024 <br />
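fetchSize maps onto the standard JDBC fetch-size hint; a minimal, hypothetical sketch (endpoint, credentials, and table are illustrative):

```java
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.PreparedStatement;
import java.sql.ResultSet;

public class FetchSizeSketch {
    public static void main(String[] args) throws Exception {
        String url = "jdbc:clickhouse://host:8123/default";
        try (Connection conn = DriverManager.getConnection(url, "user", "pass");
             PreparedStatement ps = conn.prepareStatement("SELECT id, name FROM table")) {
            // Ask the driver to stream rows in batches of 1024 instead of
            // materializing the whole result set: this is what fetchSize tunes.
            ps.setFetchSize(1024);
            try (ResultSet rs = ps.executeQuery()) {
                while (rs.next()) {
                    // process one record at a time
                }
            }
        }
    }
}
```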
* **session**

    * Description: controls session settings such as the time format and time zone used when writing data. If the table has time columns, configure this value to make the time format written to clickhouse explicit. Typical parameters are NLS_DATE_FORMAT and NLS_TIME_FORMAT. The value is in JSON format, for example:

    ```
    "session": [
        "alter session set NLS_DATE_FORMAT='yyyy-mm-dd hh24:mi:ss'",
        "alter session set NLS_TIMESTAMP_FORMAT='yyyy-mm-dd hh24:mi:ss'",
        "alter session set NLS_TIMESTAMP_TZ_FORMAT='yyyy-mm-dd hh24:mi:ss'",
        "alter session set TIME_ZONE='US/Pacific'"
    ]
    ```
    `(note that \" is the escaped form of ")`.

    * Required: no <br />

    * Default: none <br />
### 3.3 Type Conversion

ClickhouseReader supports most Clickhouse types, but a few individual types are not supported; please check your types.

The type conversion table that ClickhouseReader applies to Clickhouse types:

| DataX internal type | Clickhouse data type                                                                       |
|---------------------|--------------------------------------------------------------------------------------------|
| Long                | UInt8, UInt16, UInt32, UInt64, UInt128, UInt256, Int8, Int16, Int32, Int64, Int128, Int256 |
| Double              | Float32, Float64, Decimal                                                                  |
| String              | String, FixedString                                                                        |
| Date                | DATE, Date32, DateTime, DateTime64                                                         |
| Boolean             | Boolean                                                                                    |
| Bytes               | BLOB, BFILE, RAW, LONG RAW                                                                 |

Please note:

* `All types other than those listed above are unsupported.`
## 4 Performance Report

### 4.1 Environment

#### 4.1.1 Data Characteristics

To simulate real production data, we designed two Clickhouse tables:

#### 4.1.2 Machine Specs

* Specs of the machine running DataX:

* Specs of the Clickhouse database machine:

### 4.2 Test Report

#### 4.2.1 Test Report for Table 1

| Concurrent tasks | DataX speed (Rec/s) | DataX throughput | NIC traffic | DataX load | DB load |
|------------------|---------------------|------------------|-------------|------------|---------|
| 1 | DataX measured speed (Rec/s) | DataX measured throughput | NIC traffic | DataX load | DB load |
## 5 Constraints and Limitations

### 5.1 Data Recovery Under Primary/Standby Synchronization

Primary/standby synchronization means Clickhouse runs with a primary/standby disaster-recovery setup in which the standby continuously recovers data from the primary's binlog. Because of the inherent lag in primary/standby replication, particularly under conditions such as network latency, the data recovered on the standby can differ substantially from the primary, so data synchronized from the standby is not a complete point-in-time mirror.

To address this we provide the preSql feature; its documentation is still to be added.

### 5.2 Consistency Constraints

In terms of data storage, Clickhouse belongs to the RDBMS family and can expose a strongly consistent query interface. For example, while a synchronization task is running, ClickhouseReader will not see data written concurrently by other writers, thanks to the database's snapshot semantics. For details on database snapshots, see [MVCC Wikipedia](https://en.wikipedia.org/wiki/Multiversion_concurrency_control).

The above holds for ClickhouseReader's single-threaded model. Since ClickhouseReader may extract data concurrently depending on its configuration, strict consistency cannot be guaranteed: after ClickhouseReader splits the data by splitPk, it launches multiple concurrent tasks, and because those tasks do not share one read transaction and are staggered in time, the synchronized data is not a `complete`, `consistent` snapshot.

A consistent multi-threaded snapshot is currently not technically achievable; it can only be approached from an engineering angle, and the engineering approaches involve trade-offs. We offer a few options for users to choose from:

1. Use single-threaded synchronization, i.e. no data splitting. The downside is slower speed, but consistency is well preserved.

2. Shut out other writers so the data is static during the run, e.g. lock the tables or pause standby replication. The downside is possible impact on online business.

### 5.3 Database Encoding

ClickhouseReader extracts data through JDBC, which naturally handles all encodings and performs the conversion at that layer. ClickhouseReader therefore needs no user-specified encoding; it picks up the encoding automatically and transcodes.

If the encoding Clickhouse actually stores differs from the encoding it is configured with, ClickhouseReader cannot detect the inconsistency and offers no remedy; in such cases `the exported data may be garbled`.

### 5.4 Incremental Data Synchronization

ClickhouseReader extracts data with JDBC SELECT statements, so incremental extraction can be done with SELECT...WHERE... in several ways (see the sketch after this list):

* When the online application writes to the database, it fills a modify column with the change timestamp, covering inserts, updates, and (logical) deletes. For such applications, ClickhouseReader only needs a WHERE clause on the timestamp of the previous synchronization phase.
* For append-only, journal-style data, ClickhouseReader can add a WHERE clause on the maximum auto-increment ID of the previous phase.

When the business has no column that distinguishes new and modified data, ClickhouseReader cannot perform incremental synchronization and can only synchronize the full data set.
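A minimal sketch of the checkpoint-driven WHERE clause described above (the column and table names are hypothetical):

```java
public class IncrementalSketch {
    public static void main(String[] args) {
        // Checkpoint persisted by the caller after the previous successful run.
        String lastCheckpoint = "2023-08-01 00:00:00";
        String where = "gmt_modified > '" + lastCheckpoint + "'";
        System.out.println("SELECT id, name FROM orders WHERE " + where);
        // After a successful run, persist max(gmt_modified) as the next checkpoint.
    }
}
```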
### 5.5 SQL Safety

The querySql option lets users supply their own SELECT extraction statements; ClickhouseReader performs no safety validation on querySql. That responsibility lies with the DataX user.

## 6 FAQ

***

**Q: A ClickhouseReader synchronization fails with error XXX**

A: It is a network or permission problem; please test with the Clickhouse command line.

If that command also fails, the environment is confirmed as the problem; please contact your DBA.

**Q: ClickhouseReader extraction is very slow — what can I do?**

A: The main factors affecting extraction time are roughly the following (from DBA Wei Wan):

1. Long extraction time caused by an abnormal SQL plan; during extraction, prefer full table scans over index scans.
2. Tune the SQL concurrency sensibly to reduce extraction time.
3. Keep the extraction SQL simple and avoid functions such as replace, which are very CPU-intensive and severely slow down extraction.
91 clickhousereader/pom.xml Normal file
@@ -0,0 +1,91 @@
<?xml version="1.0" encoding="UTF-8"?>
<project xmlns="http://maven.apache.org/POM/4.0.0"
         xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
         xsi:schemaLocation="http://maven.apache.org/POM/4.0.0 http://maven.apache.org/xsd/maven-4.0.0.xsd">
    <parent>
        <artifactId>datax-all</artifactId>
        <groupId>com.alibaba.datax</groupId>
        <version>0.0.1-SNAPSHOT</version>
    </parent>

    <modelVersion>4.0.0</modelVersion>
    <artifactId>clickhousereader</artifactId>
    <name>clickhousereader</name>
    <packaging>jar</packaging>

    <dependencies>
        <dependency>
            <groupId>ru.yandex.clickhouse</groupId>
            <artifactId>clickhouse-jdbc</artifactId>
            <version>0.2.4</version>
        </dependency>
        <dependency>
            <groupId>com.alibaba.datax</groupId>
            <artifactId>datax-core</artifactId>
            <version>${datax-project-version}</version>
        </dependency>
        <dependency>
            <groupId>com.alibaba.datax</groupId>
            <artifactId>datax-common</artifactId>
            <version>${datax-project-version}</version>
        </dependency>
        <dependency>
            <groupId>org.slf4j</groupId>
            <artifactId>slf4j-api</artifactId>
        </dependency>

        <dependency>
            <groupId>ch.qos.logback</groupId>
            <artifactId>logback-classic</artifactId>
        </dependency>

        <dependency>
            <groupId>com.alibaba.datax</groupId>
            <artifactId>plugin-rdbms-util</artifactId>
            <version>${datax-project-version}</version>
        </dependency>
    </dependencies>

    <build>
        <resources>
            <resource>
                <directory>src/main/java</directory>
                <includes>
                    <include>**/*.properties</include>
                </includes>
            </resource>
        </resources>
        <plugins>
            <!-- compiler plugin -->
            <plugin>
                <artifactId>maven-compiler-plugin</artifactId>
                <configuration>
                    <source>${jdk-version}</source>
                    <target>${jdk-version}</target>
                    <encoding>${project-sourceEncoding}</encoding>
                </configuration>
            </plugin>
            <!-- assembly plugin -->
            <plugin>
                <artifactId>maven-assembly-plugin</artifactId>
                <configuration>
                    <descriptors>
                        <descriptor>src/main/assembly/package.xml</descriptor>
                    </descriptors>
                    <finalName>datax</finalName>
                </configuration>
                <executions>
                    <execution>
                        <id>dwzip</id>
                        <phase>package</phase>
                        <goals>
                            <goal>single</goal>
                        </goals>
                    </execution>
                </executions>
            </plugin>
        </plugins>
    </build>

</project>
35 clickhousereader/src/main/assembly/package.xml Normal file
@@ -0,0 +1,35 @@
<assembly
        xmlns="http://maven.apache.org/plugins/maven-assembly-plugin/assembly/1.1.0"
        xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
        xsi:schemaLocation="http://maven.apache.org/plugins/maven-assembly-plugin/assembly/1.1.0 http://maven.apache.org/xsd/assembly-1.1.0.xsd">
    <id></id>
    <formats>
        <format>dir</format>
    </formats>
    <includeBaseDirectory>false</includeBaseDirectory>
    <fileSets>
        <fileSet>
            <directory>src/main/resources</directory>
            <includes>
                <include>plugin.json</include>
                <include>plugin_job_template.json</include>
            </includes>
            <outputDirectory>plugin/reader/clickhousereader</outputDirectory>
        </fileSet>
        <fileSet>
            <directory>target/</directory>
            <includes>
                <include>clickhousereader-0.0.1-SNAPSHOT.jar</include>
            </includes>
            <outputDirectory>plugin/reader/clickhousereader</outputDirectory>
        </fileSet>
    </fileSets>

    <dependencySets>
        <dependencySet>
            <useProjectArtifact>false</useProjectArtifact>
            <outputDirectory>plugin/reader/clickhousereader/libs</outputDirectory>
            <scope>runtime</scope>
        </dependencySet>
    </dependencySets>
</assembly>
87 clickhousereader/src/main/java/com/alibaba/datax/plugin/reader/clickhousereader/ClickhouseReader.java Normal file
@@ -0,0 +1,87 @@
package com.alibaba.datax.plugin.reader.clickhousereader;
|
||||
|
||||
import java.sql.Array;
|
||||
import java.sql.ResultSet;
|
||||
import java.sql.ResultSetMetaData;
|
||||
import java.sql.SQLException;
|
||||
import java.sql.Types;
|
||||
import java.util.List;
|
||||
|
||||
import com.alibaba.datax.common.element.Record;
|
||||
import com.alibaba.datax.common.element.StringColumn;
|
||||
import com.alibaba.datax.common.plugin.RecordSender;
|
||||
import com.alibaba.datax.common.plugin.TaskPluginCollector;
|
||||
import com.alibaba.datax.common.spi.Reader;
|
||||
import com.alibaba.datax.common.util.Configuration;
|
||||
import com.alibaba.datax.common.util.MessageSource;
|
||||
import com.alibaba.datax.plugin.rdbms.reader.CommonRdbmsReader;
|
||||
import com.alibaba.datax.plugin.rdbms.util.DataBaseType;
|
||||
import com.alibaba.fastjson2.JSON;
|
||||
|
||||
import org.slf4j.Logger;
|
||||
import org.slf4j.LoggerFactory;
|
||||
|
||||
public class ClickhouseReader extends Reader {
|
||||
|
||||
private static final DataBaseType DATABASE_TYPE = DataBaseType.ClickHouse;
|
||||
private static final Logger LOG = LoggerFactory.getLogger(ClickhouseReader.class);
|
||||
|
||||
public static class Job extends Reader.Job {
|
||||
private static MessageSource MESSAGE_SOURCE = MessageSource.loadResourceBundle(ClickhouseReader.class);
|
||||
|
||||
private Configuration jobConfig = null;
|
||||
private CommonRdbmsReader.Job commonRdbmsReaderMaster;
|
||||
|
||||
@Override
|
||||
public void init() {
|
||||
this.jobConfig = super.getPluginJobConf();
|
||||
this.commonRdbmsReaderMaster = new CommonRdbmsReader.Job(DATABASE_TYPE);
|
||||
this.commonRdbmsReaderMaster.init(this.jobConfig);
|
||||
}
|
||||
|
||||
@Override
|
||||
public List<Configuration> split(int mandatoryNumber) {
|
||||
return this.commonRdbmsReaderMaster.split(this.jobConfig, mandatoryNumber);
|
||||
}
|
||||
|
||||
@Override
|
||||
public void post() {
|
||||
this.commonRdbmsReaderMaster.post(this.jobConfig);
|
||||
}
|
||||
|
||||
@Override
|
||||
public void destroy() {
|
||||
this.commonRdbmsReaderMaster.destroy(this.jobConfig);
|
||||
}
|
||||
}
|
||||
|
||||
public static class Task extends Reader.Task {
|
||||
|
||||
private Configuration jobConfig;
|
||||
private CommonRdbmsReader.Task commonRdbmsReaderSlave;
|
||||
|
||||
@Override
|
||||
public void init() {
|
||||
this.jobConfig = super.getPluginJobConf();
|
||||
this.commonRdbmsReaderSlave = new CommonRdbmsReader.Task(DATABASE_TYPE, super.getTaskGroupId(), super.getTaskId());
|
||||
this.commonRdbmsReaderSlave.init(this.jobConfig);
|
||||
}
|
||||
|
||||
@Override
|
||||
public void startRead(RecordSender recordSender) {
|
||||
int fetchSize = this.jobConfig.getInt(com.alibaba.datax.plugin.rdbms.reader.Constant.FETCH_SIZE, 1000);
|
||||
|
||||
this.commonRdbmsReaderSlave.startRead(this.jobConfig, recordSender, super.getTaskPluginCollector(), fetchSize);
|
||||
}
|
||||
|
||||
@Override
|
||||
public void post() {
|
||||
this.commonRdbmsReaderSlave.post(this.jobConfig);
|
||||
}
|
||||
|
||||
@Override
|
||||
public void destroy() {
|
||||
this.commonRdbmsReaderSlave.destroy(this.jobConfig);
|
||||
}
|
||||
}
|
||||
}
|
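The reader itself delegates all real work to CommonRdbmsReader, which boils down to plain JDBC against ClickHouse. A minimal sketch of that underlying flow, assuming a ClickHouse JDBC driver is on the classpath; the host, credentials, and query are illustrative only:

```java
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.ResultSet;
import java.sql.Statement;

public class ClickhouseJdbcSketch {
    public static void main(String[] args) throws Exception {
        // Hypothetical connection details; replace with a real host/database.
        try (Connection conn = DriverManager.getConnection(
                "jdbc:clickhouse://127.0.0.1:8123/default", "user", "password");
             Statement stmt = conn.createStatement()) {
            // CommonRdbmsReader applies the configured fetchSize before querying.
            stmt.setFetchSize(1000);
            try (ResultSet rs = stmt.executeQuery("SELECT str_col FROM all_type_tbl")) {
                while (rs.next()) {
                    // Each row would become a DataX Record pushed into the channel.
                    System.out.println(rs.getString(1));
                }
            }
        }
    }
}
```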
6
clickhousereader/src/main/resources/plugin.json
Normal file
@ -0,0 +1,6 @@
{
    "name": "clickhousereader",
    "class": "com.alibaba.datax.plugin.reader.clickhousereader.ClickhouseReader",
    "description": "useScene: prod. mechanism: Jdbc connection using the database, execute select sql.",
    "developer": "alibaba"
}
16
clickhousereader/src/main/resources/plugin_job_template.json
Normal file
@ -0,0 +1,16 @@
{
    "name": "clickhousereader",
    "parameter": {
        "username": "username",
        "password": "password",
        "column": ["col1", "col2", "col3"],
        "connection": [
            {
                "jdbcUrl": "jdbc:clickhouse://<host>:<port>[/<database>]",
                "table": ["table1", "table2"]
            }
        ],
        "preSql": [],
        "postSql": []
    }
}
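As a quick sanity check, DataX's Configuration utility can load this template and address nested fields by path expression. A minimal sketch, assuming the template file sits at the path shown in the header above:

```java
import java.io.File;

import com.alibaba.datax.common.util.Configuration;

public class TemplateSketch {
    public static void main(String[] args) {
        // Illustrative path; point this at wherever the template lives locally.
        Configuration conf = Configuration.from(
                new File("clickhousereader/src/main/resources/plugin_job_template.json"));
        // Path expressions walk objects with '.' and lists with '[i]'.
        String jdbcUrl = conf.getString("parameter.connection[0].jdbcUrl");
        System.out.println(jdbcUrl); // jdbc:clickhouse://<host>:<port>[/<database>]
    }
}
```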
@ -0,0 +1,74 @@
package com.alibaba.datax.plugin.reader.clickhousereader;

import java.io.File;
import java.io.FileNotFoundException;
import java.io.FileOutputStream;
import java.io.OutputStream;
import java.util.ArrayList;
import java.util.List;

import com.alibaba.datax.common.element.Column;
import com.alibaba.datax.common.element.Record;
import com.alibaba.datax.common.util.Configuration;
import com.alibaba.datax.dataxservice.face.eventcenter.EventLogStore;
import com.alibaba.datax.dataxservice.face.eventcenter.RuntimeContext;
import com.alibaba.datax.test.simulator.BasicReaderPluginTest;
import com.alibaba.datax.test.simulator.junit.extend.log.LoggedRunner;
import com.alibaba.datax.test.simulator.junit.extend.log.TestLogger;
import com.alibaba.fastjson.JSON;

import org.apache.commons.lang3.ArrayUtils;
import org.junit.Assert;
import org.junit.Ignore;
import org.junit.Test;
import org.junit.runner.RunWith;


@RunWith(LoggedRunner.class)
@Ignore
public class ClickhouseReaderTest extends BasicReaderPluginTest {
    @TestLogger(log = "Test basic1.json with constant column configuration.")
    @Test
    public void testBasic1() {
        RuntimeContext.setGlobalJobId(-1);
        EventLogStore.init();
        List<Record> noteRecordForTest = new ArrayList<Record>();

        List<Configuration> subjobs = super.doReaderTest("basic1.json", 1, noteRecordForTest);

        Assert.assertEquals(1, subjobs.size());
        Assert.assertEquals(1, noteRecordForTest.size());

        Assert.assertEquals("[8,16,32,64,-8,-16,-32,-64,\"3.2\",\"6.4\",1,\"str_col\",\"abc\"," + "\"417ddc5d-e556-4d27-95dd-a34d84e46a50\",1580745600000,1580752800000,\"hello\",\"[1,2,3]\"," + "\"[\\\"abc\\\",\\\"cde\\\"]\",\"(8,'uint8_type')\",null,\"[1,2]\",\"[\\\"x\\\",\\\"y\\\"]\",\"127.0.0.1\",\"::\",\"23.345\"]", JSON.toJSONString(listData(noteRecordForTest.get(0))));
    }

    @Override
    protected OutputStream buildDataOutput(String optionalOutputName) {
        File f = new File(optionalOutputName + "-output.txt");
        try {
            return new FileOutputStream(f);
        } catch (FileNotFoundException e) {
            e.printStackTrace();
        }
        return null;
    }

    @Override
    public String getTestPluginName() {
        return "clickhousereader";
    }

    private Object[] listData(Record record) {
        if (null == record) {
            return ArrayUtils.EMPTY_OBJECT_ARRAY;
        }
        Object[] arr = new Object[record.getColumnNumber()];
        for (int i = 0; i < arr.length; i++) {
            Column col = record.getColumn(i);
            if (null != col) {
                arr[i] = col.getRawData();
            }
        }
        return arr;
    }
}
57
clickhousereader/src/test/resources/basic1.json
Executable file
@ -0,0 +1,57 @@
{
    "job": {
        "setting": {
            "speed": {
                "channel": 5
            }
        },
        "content": [
            {
                "reader": {
                    "name": "clickhousereader",
                    "parameter": {
                        "username": "XXXX",
                        "password": "XXXX",
                        "column": [
                            "uint8_col",
                            "uint16_col",
                            "uint32_col",
                            "uint64_col",
                            "int8_col",
                            "int16_col",
                            "int32_col",
                            "int64_col",
                            "float32_col",
                            "float64_col",
                            "bool_col",
                            "str_col",
                            "fixedstr_col",
                            "uuid_col",
                            "date_col",
                            "datetime_col",
                            "enum_col",
                            "ary_uint8_col",
                            "ary_str_col",
                            "tuple_col",
                            "nullable_col",
                            "nested_col.nested_id",
                            "nested_col.nested_str",
                            "ipv4_col",
                            "ipv6_col",
                            "decimal_col"
                        ],
                        "connection": [
                            {
                                "table": [
                                    "all_type_tbl"
                                ],
                                "jdbcUrl":["jdbc:clickhouse://XXXX:8123/default"]
                            }
                        ]
                    }
                },
                "writer": {}
            }
        ]
    }
}
34
clickhousereader/src/test/resources/basic1.sql
Normal file
@ -0,0 +1,34 @@
CREATE TABLE IF NOT EXISTS default.all_type_tbl
(
    `uint8_col` UInt8,
    `uint16_col` UInt16,
    uint32_col UInt32,
    uint64_col UInt64,
    int8_col Int8,
    int16_col Int16,
    int32_col Int32,
    int64_col Int64,
    float32_col Float32,
    float64_col Float64,
    bool_col UInt8,
    str_col String,
    fixedstr_col FixedString(3),
    uuid_col UUID,
    date_col Date,
    datetime_col DateTime,
    enum_col Enum('hello' = 1, 'world' = 2),
    ary_uint8_col Array(UInt8),
    ary_str_col Array(String),
    tuple_col Tuple(UInt8, String),
    nullable_col Nullable(UInt8),
    nested_col Nested
    (
        nested_id UInt32,
        nested_str String
    ),
    ipv4_col IPv4,
    ipv6_col IPv6,
    decimal_col Decimal(5,3)
)
ENGINE = MergeTree()
ORDER BY (uint8_col);
@ -10,8 +10,8 @@ import com.alibaba.datax.common.util.Configuration;
import com.alibaba.datax.plugin.rdbms.util.DBUtilErrorCode;
import com.alibaba.datax.plugin.rdbms.util.DataBaseType;
import com.alibaba.datax.plugin.rdbms.writer.CommonRdbmsWriter;
import com.alibaba.fastjson.JSON;
import com.alibaba.fastjson.JSONArray;
import com.alibaba.fastjson2.JSON;
import com.alibaba.fastjson2.JSONArray;

import java.sql.Array;
import java.sql.Connection;
@ -17,8 +17,8 @@
        <artifactId>commons-lang3</artifactId>
    </dependency>
    <dependency>
        <groupId>com.alibaba</groupId>
        <artifactId>fastjson</artifactId>
        <groupId>com.alibaba.fastjson2</groupId>
        <artifactId>fastjson2</artifactId>
    </dependency>
    <dependency>
        <groupId>commons-io</groupId>
@ -1,6 +1,6 @@
package com.alibaba.datax.common.element;

import com.alibaba.fastjson.JSON;
import com.alibaba.fastjson2.JSON;

import java.math.BigDecimal;
import java.math.BigInteger;
@ -31,7 +31,6 @@ public class PerfTrace {
    private int taskGroupId;
    private int channelNumber;

    private int priority;
    private int batchSize = 500;
    private volatile boolean perfReportEnable = true;

@ -54,12 +53,12 @@
     * @param taskGroupId
     * @return
     */
    public static PerfTrace getInstance(boolean isJob, long jobId, int taskGroupId, int priority, boolean enable) {
    public static PerfTrace getInstance(boolean isJob, long jobId, int taskGroupId, boolean enable) {

        if (instance == null) {
            synchronized (lock) {
                if (instance == null) {
                    instance = new PerfTrace(isJob, jobId, taskGroupId, priority, enable);
                    instance = new PerfTrace(isJob, jobId, taskGroupId, enable);
                }
            }
        }
@ -76,22 +75,21 @@
            LOG.error("PerfTrace instance not be init! must have some error! ");
            synchronized (lock) {
                if (instance == null) {
                    instance = new PerfTrace(false, -1111, -1111, 0, false);
                    instance = new PerfTrace(false, -1111, -1111, false);
                }
            }
        }
        return instance;
    }

    private PerfTrace(boolean isJob, long jobId, int taskGroupId, int priority, boolean enable) {
    private PerfTrace(boolean isJob, long jobId, int taskGroupId, boolean enable) {
        try {
            this.perfTraceId = isJob ? "job_" + jobId : String.format("taskGroup_%s_%s", jobId, taskGroupId);
            this.enable = enable;
            this.isJob = isJob;
            this.taskGroupId = taskGroupId;
            this.instId = jobId;
            this.priority = priority;
            LOG.info(String.format("PerfTrace traceId=%s, isEnable=%s, priority=%s", this.perfTraceId, this.enable, this.priority));
            LOG.info(String.format("PerfTrace traceId=%s, isEnable=%s", this.perfTraceId, this.enable));

        } catch (Exception e) {
            // do nothing
@ -398,7 +396,6 @@ public class PerfTrace {
        jdo.setWindowEnd(this.windowEnd);
        jdo.setJobStartTime(jobStartTime);
        jdo.setJobRunTimeMs(System.currentTimeMillis() - jobStartTime.getTime());
        jdo.setJobPriority(this.priority);
        jdo.setChannelNum(this.channelNumber);
        jdo.setCluster(this.cluster);
        jdo.setJobDomain(this.jobDomain);
@ -609,7 +606,6 @@ public class PerfTrace {
        private Date jobStartTime;
        private Date jobEndTime;
        private Long jobRunTimeMs;
        private Integer jobPriority;
        private Integer channelNum;
        private String cluster;
        private String jobDomain;
@ -680,10 +676,6 @@ public class PerfTrace {
            return jobRunTimeMs;
        }

        public Integer getJobPriority() {
            return jobPriority;
        }

        public Integer getChannelNum() {
            return channelNum;
        }
@ -816,10 +808,6 @@ public class PerfTrace {
            this.jobRunTimeMs = jobRunTimeMs;
        }

        public void setJobPriority(Integer jobPriority) {
            this.jobPriority = jobPriority;
        }

        public void setChannelNum(Integer channelNum) {
            this.channelNum = channelNum;
        }
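The getInstance change above keeps the double-checked locking structure while dropping the priority argument. A standalone sketch of that pattern, using a hypothetical Holder class for illustration; note that a safe double-checked lock needs the instance field to be volatile:

```java
public class Holder {
    private static volatile Holder instance;
    private static final Object lock = new Object();

    private final long jobId;

    private Holder(long jobId) {
        this.jobId = jobId;
    }

    public static Holder getInstance(long jobId) {
        // First check skips locking on the hot path; the second check guards
        // against a racing thread that initialized the instance in between.
        if (instance == null) {
            synchronized (lock) {
                if (instance == null) {
                    instance = new Holder(jobId);
                }
            }
        }
        return instance;
    }
}
```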
@ -77,8 +77,8 @@ public class VMInfo {
        garbageCollectorMXBeanList = java.lang.management.ManagementFactory.getGarbageCollectorMXBeans();
        memoryPoolMXBeanList = java.lang.management.ManagementFactory.getMemoryPoolMXBeans();

        osInfo = runtimeMXBean.getVmVendor() + " " + runtimeMXBean.getSpecVersion() + " " + runtimeMXBean.getVmVersion();
        jvmInfo = osMXBean.getName() + " " + osMXBean.getArch() + " " + osMXBean.getVersion();
        jvmInfo = runtimeMXBean.getVmVendor() + " " + runtimeMXBean.getSpecVersion() + " " + runtimeMXBean.getVmVersion();
        osInfo = osMXBean.getName() + " " + osMXBean.getArch() + " " + osMXBean.getVersion();
        totalProcessorCount = osMXBean.getAvailableProcessors();

        // build startPhyOSStatus
@ -3,8 +3,8 @@ package com.alibaba.datax.common.util;
import com.alibaba.datax.common.exception.CommonErrorCode;
import com.alibaba.datax.common.exception.DataXException;
import com.alibaba.datax.common.spi.ErrorCode;
import com.alibaba.fastjson.JSON;
import com.alibaba.fastjson.serializer.SerializerFeature;
import com.alibaba.fastjson2.JSON;
import com.alibaba.fastjson2.JSONWriter;
import org.apache.commons.io.IOUtils;
import org.apache.commons.lang3.CharUtils;
import org.apache.commons.lang3.StringUtils;
@ -586,7 +586,7 @@
     */
    public String beautify() {
        return JSON.toJSONString(this.getInternal(),
                SerializerFeature.PrettyFormat);
                JSONWriter.Feature.PrettyFormat);
    }

    /**
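The fastjson to fastjson2 migration is mostly a package rename, but serializer flags move from SerializerFeature to JSONWriter.Feature, as the beautify() hunk shows. A minimal before/after sketch of the call:

```java
import java.util.HashMap;
import java.util.Map;

import com.alibaba.fastjson2.JSON;
import com.alibaba.fastjson2.JSONWriter;

public class PrettyPrintSketch {
    public static void main(String[] args) {
        Map<String, Object> conf = new HashMap<>();
        conf.put("name", "clickhousereader");
        // fastjson (old):  JSON.toJSONString(conf, SerializerFeature.PrettyFormat)
        // fastjson2 (new): the feature enum lives on JSONWriter instead
        System.out.println(JSON.toJSONString(conf, JSONWriter.Feature.PrettyFormat));
    }
}
```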
@ -1,62 +0,0 @@
package com.alibaba.datax.common.util;

import java.util.Map;

import org.apache.commons.lang3.StringUtils;
import org.slf4j.Logger;
import org.slf4j.LoggerFactory;

import com.alibaba.datax.common.exception.DataXException;

public class IdAndKeyRollingUtil {
    private static Logger LOGGER = LoggerFactory.getLogger(IdAndKeyRollingUtil.class);
    public static final String SKYNET_ACCESSID = "SKYNET_ACCESSID";
    public static final String SKYNET_ACCESSKEY = "SKYNET_ACCESSKEY";

    public final static String ACCESS_ID = "accessId";
    public final static String ACCESS_KEY = "accessKey";

    public static String parseAkFromSkynetAccessKey() {
        Map<String, String> envProp = System.getenv();
        String skynetAccessID = envProp.get(IdAndKeyRollingUtil.SKYNET_ACCESSID);
        String skynetAccessKey = envProp.get(IdAndKeyRollingUtil.SKYNET_ACCESSKEY);
        String accessKey = null;
        // follow the original condition:
        // if either SKYNET_ACCESSID/SKYNET_ACCESSKEY exists in the environment, assume both are present
        // if (StringUtils.isNotBlank(skynetAccessID) ||
        // StringUtils.isNotBlank(skynetAccessKey)) {
        // stricter check: only proceed when the encrypted key is non-blank; any key that worked before should not be blank
        if (StringUtils.isNotBlank(skynetAccessKey)) {
            LOGGER.info("Try to get accessId/accessKey from environment SKYNET_ACCESSKEY.");
            accessKey = DESCipher.decrypt(skynetAccessKey);
            if (StringUtils.isBlank(accessKey)) {
                // present in the environment, but cannot be parsed
                throw DataXException.asDataXException(String.format(
                        "Failed to get the [accessId]/[accessKey] from the environment variable. The [accessId]=[%s]",
                        skynetAccessID));
            }
        }
        if (StringUtils.isNotBlank(accessKey)) {
            LOGGER.info("Get accessId/accessKey from environment variables SKYNET_ACCESSKEY successfully.");
        }
        return accessKey;
    }

    public static String getAccessIdAndKeyFromEnv(Configuration originalConfig) {
        String accessId = null;
        Map<String, String> envProp = System.getenv();
        accessId = envProp.get(IdAndKeyRollingUtil.SKYNET_ACCESSID);
        String accessKey = null;
        if (StringUtils.isBlank(accessKey)) {
            // the old code did not throw here; it simply failed to obtain the AK
            accessKey = IdAndKeyRollingUtil.parseAkFromSkynetAccessKey();
        }

        if (StringUtils.isNotBlank(accessKey)) {
            // callers of this utility follow the accessId/accessKey naming convention
            originalConfig.set(IdAndKeyRollingUtil.ACCESS_ID, accessId);
            originalConfig.set(IdAndKeyRollingUtil.ACCESS_KEY, accessKey);
        }
        return accessKey;
    }
}
@ -79,16 +79,9 @@ public class Engine {
            perfReportEnable = false;
        }

        int priority = 0;
        try {
            priority = Integer.parseInt(System.getenv("SKYNET_PRIORITY"));
        }catch (NumberFormatException e){
            LOG.warn("prioriy set to 0, because NumberFormatException, the value is: "+System.getProperty("PROIORY"));
        }

        Configuration jobInfoConfig = allConf.getConfiguration(CoreConstant.DATAX_JOB_JOBINFO);
        // initialize PerfTrace
        PerfTrace perfTrace = PerfTrace.getInstance(isJob, instanceId, taskGroupId, priority, traceEnable);
        PerfTrace perfTrace = PerfTrace.getInstance(isJob, instanceId, taskGroupId, traceEnable);
        perfTrace.setJobInfo(jobInfoConfig,perfReportEnable,channelNumber);
        container.start();
@ -114,7 +114,7 @@ public final class JobAssignUtil {
     * The desired effect, illustrated by example:
     * <pre>
     * database a has tables: 0, 1, 2
     * database a has tables: 3, 4
     * database b has tables: 3, 4
     * database c has tables: 5, 6, 7
     *
     * with 4 taskGroups
@ -27,7 +27,7 @@ import com.alibaba.datax.core.util.container.ClassLoaderSwapper;
import com.alibaba.datax.core.util.container.CoreConstant;
import com.alibaba.datax.core.util.container.LoadUtil;
import com.alibaba.datax.dataxservice.face.domain.enums.ExecuteMode;
import com.alibaba.fastjson.JSON;
import com.alibaba.fastjson2.JSON;
import org.apache.commons.lang.StringUtils;
import org.apache.commons.lang.Validate;
import org.slf4j.Logger;
@ -2,7 +2,7 @@ package com.alibaba.datax.core.statistics.communication;

import com.alibaba.datax.common.statistics.PerfTrace;
import com.alibaba.datax.common.util.StrUtil;
import com.alibaba.fastjson.JSON;
import com.alibaba.fastjson2.JSON;
import org.apache.commons.lang.Validate;

import java.text.DecimalFormat;
@ -6,7 +6,7 @@ import com.alibaba.datax.common.util.Configuration;
import com.alibaba.datax.core.statistics.communication.Communication;
import com.alibaba.datax.core.util.container.CoreConstant;
import com.alibaba.datax.core.statistics.plugin.task.util.DirtyRecord;
import com.alibaba.fastjson.JSON;
import com.alibaba.fastjson2.JSON;

import org.apache.commons.lang3.StringUtils;
import org.slf4j.Logger;
@ -4,7 +4,7 @@ import com.alibaba.datax.common.element.Column;
import com.alibaba.datax.common.element.Record;
import com.alibaba.datax.common.exception.DataXException;
import com.alibaba.datax.core.util.FrameworkErrorCode;
import com.alibaba.fastjson.JSON;
import com.alibaba.fastjson2.JSON;

import java.math.BigDecimal;
import java.math.BigInteger;
@ -27,7 +27,7 @@ import com.alibaba.datax.core.util.TransformerUtil;
import com.alibaba.datax.core.util.container.CoreConstant;
import com.alibaba.datax.core.util.container.LoadUtil;
import com.alibaba.datax.dataxservice.face.domain.enums.State;
import com.alibaba.fastjson.JSON;
import com.alibaba.fastjson2.JSON;
import org.apache.commons.lang3.Validate;
import org.slf4j.Logger;
import org.slf4j.LoggerFactory;
@ -29,7 +29,7 @@ public class MemoryChannel extends Channel {

    private ReentrantLock lock;

    private Condition notInsufficient, notEmpty;
    private Condition notSufficient, notEmpty;

    public MemoryChannel(final Configuration configuration) {
        super(configuration);
@ -37,7 +37,7 @@ public class MemoryChannel extends Channel {
        this.bufferSize = configuration.getInt(CoreConstant.DATAX_CORE_TRANSPORT_EXCHANGER_BUFFERSIZE);

        lock = new ReentrantLock();
        notInsufficient = lock.newCondition();
        notSufficient = lock.newCondition();
        notEmpty = lock.newCondition();
    }

@ -75,7 +75,7 @@ public class MemoryChannel extends Channel {
            lock.lockInterruptibly();
            int bytes = getRecordBytes(rs);
            while (memoryBytes.get() + bytes > this.byteCapacity || rs.size() > this.queue.remainingCapacity()) {
                notInsufficient.await(200L, TimeUnit.MILLISECONDS);
                notSufficient.await(200L, TimeUnit.MILLISECONDS);
            }
            this.queue.addAll(rs);
            waitWriterTime += System.nanoTime() - startTime;
@ -116,7 +116,7 @@ public class MemoryChannel extends Channel {
            waitReaderTime += System.nanoTime() - startTime;
            int bytes = getRecordBytes(rs);
            memoryBytes.addAndGet(-bytes);
            notInsufficient.signalAll();
            notSufficient.signalAll();
        } catch (InterruptedException e) {
            throw DataXException.asDataXException(
                    FrameworkErrorCode.RUNTIME_ERROR, e);
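The rename above is purely cosmetic: the condition is the "space available" side of a classic bounded buffer; producers wait on it while the channel is full and consumers signal it after draining records. A self-contained sketch of the same pattern, simplified from MemoryChannel (the real channel additionally tracks byte capacity):

```java
import java.util.ArrayDeque;
import java.util.Queue;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.locks.Condition;
import java.util.concurrent.locks.ReentrantLock;

public class BoundedBuffer<T> {
    private final Queue<T> queue = new ArrayDeque<>();
    private final int capacity;
    private final ReentrantLock lock = new ReentrantLock();
    private final Condition notSufficient = lock.newCondition(); // space available
    private final Condition notEmpty = lock.newCondition();      // items available

    public BoundedBuffer(int capacity) {
        this.capacity = capacity;
    }

    public void push(T item) throws InterruptedException {
        lock.lockInterruptibly();
        try {
            while (queue.size() >= capacity) {
                // producer waits, waking periodically as MemoryChannel does
                notSufficient.await(200L, TimeUnit.MILLISECONDS);
            }
            queue.add(item);
            notEmpty.signalAll();
        } finally {
            lock.unlock();
        }
    }

    public T pull() throws InterruptedException {
        lock.lockInterruptibly();
        try {
            while (queue.isEmpty()) {
                notEmpty.await(200L, TimeUnit.MILLISECONDS);
            }
            T item = queue.poll();
            notSufficient.signalAll(); // space was freed for producers
            return item;
        } finally {
            lock.unlock();
        }
    }
}
```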
@ -5,7 +5,7 @@ import com.alibaba.datax.common.element.Record;
import com.alibaba.datax.common.exception.DataXException;
import com.alibaba.datax.core.util.ClassSize;
import com.alibaba.datax.core.util.FrameworkErrorCode;
import com.alibaba.fastjson.JSON;
import com.alibaba.fastjson2.JSON;

import java.util.ArrayList;
import java.util.HashMap;
@ -168,6 +168,7 @@ public final class ConfigParser {
            boolean isDefaultPath = StringUtils.isBlank(pluginPath);
            if (isDefaultPath) {
                configuration.set("path", path);
                configuration.set("loadType","jarLoader");
            }

            Configuration result = Configuration.newDefault();
@ -105,7 +105,7 @@ public class CoreConstant {

    public static final String DATAX_JOB_POSTHANDLER_PLUGINNAME = "job.postHandler.pluginName";
    // ----------------------------- variables with local scope
    public static final String JOB_WRITER = "reader";
    public static final String JOB_WRITER = "writer";

    public static final String JOB_READER = "reader";
@ -15,7 +15,7 @@ import java.util.List;
/**
 * Provides an isolated jar-loading mechanism: the given paths, their sub-paths, and the jar files under them are added to the classpath.
 */
public class JarLoader extends URLClassLoader {
public class JarLoader extends URLClassLoader{
    public JarLoader(String[] paths) {
        this(paths, JarLoader.class.getClassLoader());
    }
@ -49,7 +49,7 @@ public class LoadUtil {
    /**
     * cache of jarLoaders
     */
    private static Map<String, JarLoader> jarLoaderCenter = new HashMap<String, JarLoader>();
    private static Map<String, JarLoader> jarLoaderCenter = new HashMap();

    /**
     * Set pluginConfigs so that plugins can retrieve them later
183
databendwriter/doc/databendwriter-CN.md
Normal file
@ -0,0 +1,183 @@
# DataX DatabendWriter
[简体中文](./databendwriter-CN.md) | [English](./databendwriter.md)

## 1 Quick Introduction

Databend Writer is a DataX plugin that writes data from DataX into Databend tables.
The plugin is based on the [databend JDBC driver](https://github.com/databendcloud/databend-jdbc), which uses the [RESTful HTTP protocol](https://databend.rs/doc/integrations/api/rest)
to execute queries on open-source Databend and [Databend Cloud](https://app.databend.com/).

In each write batch, the writer uploads the batch data into an internal S3 stage and then executes the corresponding insert SQL to load the data into the Databend table.

For the best user experience with the Databend community edition, you should try to adopt [S3](https://aws.amazon.com/s3/)/[minio](https://min.io/)/[OSS](https://www.alibabacloud.com/product/object-storage-service) as its underlying storage layer, since
they support presigned upload operations; otherwise you may spend unneeded cost on data transfer.

You can find more details in the [documentation](https://databend.rs/doc/deploy/deploying-databend).

## 2 Implementation

Databend Writer uses DataX to fetch the records generated by a DataX Reader, and batch inserts the records into the designated columns of the Databend table.

## 3 Features

### 3.1 Sample Configuration

* The following configuration reads some generated data from memory and uploads the data into a Databend table.

#### Preparation
```sql
--- create table in databend
drop table if exists datax.sample1;
drop database if exists datax;
create database if not exists datax;
create table if not exists datax.sample1(a string, b int64, c date, d timestamp, e bool, f string, g variant);
```

#### Sample Configuration
```json
{
  "job": {
    "content": [
      {
        "reader": {
          "name": "streamreader",
          "parameter": {
            "column" : [
              {
                "value": "DataX",
                "type": "string"
              },
              {
                "value": 19880808,
                "type": "long"
              },
              {
                "value": "1926-08-08 08:08:08",
                "type": "date"
              },
              {
                "value": "1988-08-08 08:08:08",
                "type": "date"
              },
              {
                "value": true,
                "type": "bool"
              },
              {
                "value": "test",
                "type": "bytes"
              },
              {
                "value": "{\"type\": \"variant\", \"value\": \"test\"}",
                "type": "string"
              }

            ],
            "sliceRecordCount": 10000
          }
        },
        "writer": {
          "name": "databendwriter",
          "parameter": {
            "writeMode": "replace",
            "onConflictColumn": ["id"],
            "username": "databend",
            "password": "databend",
            "column": ["a", "b", "c", "d", "e", "f", "g"],
            "batchSize": 1000,
            "preSql": [
            ],
            "postSql": [
            ],
            "connection": [
              {
                "jdbcUrl": "jdbc:databend://localhost:8000/datax",
                "table": [
                  "sample1"
                ]
              }
            ]
          }
        }
      }
    ],
    "setting": {
      "speed": {
        "channel": 1
      }
    }
  }
}
```

### 3.2 Parameters
* jdbcUrl
    * Description: JDBC data source URL. See the detailed [doc](https://github.com/databendcloud/databend-jdbc) in the driver repository
    * Required: yes
    * Default: none
    * Example: jdbc:databend://localhost:8000/datax
* username
    * Description: JDBC data source user name
    * Required: yes
    * Default: none
    * Example: databend
* password
    * Description: JDBC data source password
    * Required: yes
    * Default: none
    * Example: databend
* table
    * Description: list of table names; each table should contain all of the columns in the column parameter.
    * Required: yes
    * Default: none
    * Example: ["sample1"]
* column
    * Description: list of column names in the table; the field order should match the column types of the reader's records
    * Required: yes
    * Default: none
    * Example: ["a", "b", "c", "d", "e", "f", "g"]
* batchSize
    * Description: number of records per batch
    * Required: no
    * Default: 1000
    * Example: 1000
* preSql
    * Description: SQL statements executed before writing data
    * Required: no
    * Default: none
    * Example: ["delete from datax.sample1"]
* postSql
    * Description: SQL statements executed after writing data
    * Required: no
    * Default: none
    * Example: ["select count(*) from datax.sample1"]
* writeMode
    * Description: write mode; supports the insert and replace modes, default insert. When set to replace, onConflictColumn must be provided
    * Required: no
    * Default: insert
    * Example: "replace"
* onConflictColumn
    * Description: on-conflict columns; required once writeMode is set to replace
    * Required: no
    * Default: none
    * Example: ["id","user"]

### 3.3 Type Conversion
Data types in DataX can be converted to the corresponding data types in Databend. The table below shows the correspondence between the two.

| DataX internal type | Databend type |
|------------|-----------------------------------------------------------|
| INT | TINYINT, INT8, SMALLINT, INT16, INT, INT32, BIGINT, INT64 |
| LONG | TINYINT, INT8, SMALLINT, INT16, INT, INT32, BIGINT, INT64 |
| STRING | STRING, VARCHAR |
| DOUBLE | FLOAT, DOUBLE |
| BOOL | BOOLEAN, BOOL |
| DATE | DATE, TIMESTAMP |
| BYTES | STRING, VARCHAR |

## 4 Performance Tests

## 5 Restrictions
Complex data type support is currently unstable; if you want to use complex data types such as tuple or array, please check later releases of Databend and the JDBC driver.

## FAQ
176
databendwriter/doc/databendwriter.md
Normal file
@ -0,0 +1,176 @@
# DataX DatabendWriter
[简体中文](./databendwriter-CN.md) | [English](./databendwriter.md)

## 1 Introduction
Databend Writer is a plugin for DataX that writes data from DataX records into Databend tables.
The plugin is based on the [databend JDBC driver](https://github.com/databendcloud/databend-jdbc), which uses the [RESTful HTTP protocol](https://databend.rs/doc/integrations/api/rest)
to execute queries on open-source Databend and [Databend Cloud](https://app.databend.com/).

During each write batch, the writer uploads the batch data into an internal S3 stage and executes the corresponding insert SQL to load the data into the Databend table.

For the best user experience, if you are using the Databend community distribution, you should try to adopt [S3](https://aws.amazon.com/s3/)/[minio](https://min.io/)/[OSS](https://www.alibabacloud.com/product/object-storage-service) as its underlying storage layer, since
they support presigned upload operations; otherwise you may spend unneeded cost on data transfer.

You can see more details in the [doc](https://databend.rs/doc/deploy/deploying-databend).

## 2 Detailed Implementation
Databend Writer uses DataX to fetch the records generated by a DataX Reader, and then batch inserts the records into the designated columns of your Databend table.

## 3 Features
### 3.1 Example Configurations
* The following configuration reads some generated data in memory and uploads the data into a Databend table.

#### Preparation
```sql
--- create table in databend
drop table if exists datax.sample1;
drop database if exists datax;
create database if not exists datax;
create table if not exists datax.sample1(a string, b int64, c date, d timestamp, e bool, f string, g variant);
```

#### Configurations
```json
{
  "job": {
    "content": [
      {
        "reader": {
          "name": "streamreader",
          "parameter": {
            "column" : [
              {
                "value": "DataX",
                "type": "string"
              },
              {
                "value": 19880808,
                "type": "long"
              },
              {
                "value": "1926-08-08 08:08:08",
                "type": "date"
              },
              {
                "value": "1988-08-08 08:08:08",
                "type": "date"
              },
              {
                "value": true,
                "type": "bool"
              },
              {
                "value": "test",
                "type": "bytes"
              },
              {
                "value": "{\"type\": \"variant\", \"value\": \"test\"}",
                "type": "string"
              }

            ],
            "sliceRecordCount": 10000
          }
        },
        "writer": {
          "name": "databendwriter",
          "parameter": {
            "username": "databend",
            "password": "databend",
            "column": ["a", "b", "c", "d", "e", "f", "g"],
            "batchSize": 1000,
            "preSql": [
            ],
            "postSql": [
            ],
            "connection": [
              {
                "jdbcUrl": "jdbc:databend://localhost:8000/datax",
                "table": [
                  "sample1"
                ]
              }
            ]
          }
        }
      }
    ],
    "setting": {
      "speed": {
        "channel": 1
      }
    }
  }
}
```

### 3.2 Configuration Description
* jdbcUrl
    * Description: JDBC data source URL in Databend. Please take a look at the driver repository for the detailed [doc](https://github.com/databendcloud/databend-jdbc)
    * Required: yes
    * Default: none
    * Example: jdbc:databend://localhost:8000/datax
* username
    * Description: Databend user name
    * Required: yes
    * Default: none
    * Example: databend
* password
    * Description: Databend user password
    * Required: yes
    * Default: none
    * Example: databend
* table
    * Description: A list of table names; each table should contain all of the columns in the column parameter.
    * Required: yes
    * Default: none
    * Example: ["sample1"]
* column
    * Description: A list of column field names that should be inserted into the table. If you want to insert all column fields, use `["*"]` instead.
    * Required: yes
    * Default: none
    * Example: ["a", "b", "c", "d", "e", "f", "g"]
* batchSize
    * Description: The number of records to be inserted in each batch.
    * Required: no
    * Default: 1024
* preSql
    * Description: A list of SQL statements that will be executed before the write operation.
    * Required: no
    * Default: none
* postSql
    * Description: A list of SQL statements that will be executed after the write operation.
    * Required: no
    * Default: none
* writeMode
    * Description: The write mode; supports the `insert` and `replace` modes.
    * Required: no
    * Default: insert
    * Example: "replace"
* onConflictColumn
    * Description: The list of on-conflict columns; required when writeMode is `replace`.
    * Required: no
    * Default: none
    * Example: ["id","user"]

### 3.3 Type Conversion
Data types in DataX can be converted to the corresponding data types in Databend. The following table shows the correspondence between the two.

| DataX Type | Databend Type |
|------------|-----------------------------------------------------------|
| INT | TINYINT, INT8, SMALLINT, INT16, INT, INT32, BIGINT, INT64 |
| LONG | TINYINT, INT8, SMALLINT, INT16, INT, INT32, BIGINT, INT64 |
| STRING | STRING, VARCHAR |
| DOUBLE | FLOAT, DOUBLE |
| BOOL | BOOLEAN, BOOL |
| DATE | DATE, TIMESTAMP |
| BYTES | STRING, VARCHAR |


## 4 Performance Test


## 5 Restrictions
Currently, complex data type support is not stable; if you want to use complex data types such as tuple or array, please check later releases of Databend and the JDBC driver.

## FAQ
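Before wiring the writer into a job, it can help to verify the JDBC URL and credentials on their own. A minimal sketch, assuming the databend-jdbc driver jar is on the classpath and reusing the localhost URL, user, and table from the examples above:

```java
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.ResultSet;
import java.sql.Statement;

public class DatabendConnectCheck {
    public static void main(String[] args) throws Exception {
        // Same URL/user/password as the sample configuration above.
        try (Connection conn = DriverManager.getConnection(
                "jdbc:databend://localhost:8000/datax", "databend", "databend");
             Statement stmt = conn.createStatement();
             ResultSet rs = stmt.executeQuery("select count(*) from datax.sample1")) {
            if (rs.next()) {
                System.out.println("rows in datax.sample1: " + rs.getLong(1));
            }
        }
    }
}
```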
101
databendwriter/pom.xml
Normal file
@ -0,0 +1,101 @@
<?xml version="1.0" encoding="UTF-8"?>
<project xmlns="http://maven.apache.org/POM/4.0.0"
         xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
         xsi:schemaLocation="http://maven.apache.org/POM/4.0.0 http://maven.apache.org/xsd/maven-4.0.0.xsd">
    <parent>
        <artifactId>datax-all</artifactId>
        <groupId>com.alibaba.datax</groupId>
        <version>0.0.1-SNAPSHOT</version>
    </parent>

    <modelVersion>4.0.0</modelVersion>
    <artifactId>databendwriter</artifactId>
    <name>databendwriter</name>
    <packaging>jar</packaging>

    <dependencies>
        <dependency>
            <groupId>com.databend</groupId>
            <artifactId>databend-jdbc</artifactId>
            <version>0.1.0</version>
        </dependency>
        <dependency>
            <groupId>com.alibaba.datax</groupId>
            <artifactId>datax-core</artifactId>
            <version>${datax-project-version}</version>
        </dependency>
        <dependency>
            <groupId>com.alibaba.datax</groupId>
            <artifactId>datax-common</artifactId>
            <version>${datax-project-version}</version>
        </dependency>
        <dependency>
            <groupId>org.slf4j</groupId>
            <artifactId>slf4j-api</artifactId>
        </dependency>

        <dependency>
            <groupId>ch.qos.logback</groupId>
            <artifactId>logback-classic</artifactId>
        </dependency>

        <dependency>
            <groupId>com.alibaba.datax</groupId>
            <artifactId>plugin-rdbms-util</artifactId>
            <version>${datax-project-version}</version>
            <exclusions>
                <exclusion>
                    <groupId>com.google.guava</groupId>
                    <artifactId>guava</artifactId>
                </exclusion>
            </exclusions>
        </dependency>


        <dependency>
            <groupId>junit</groupId>
            <artifactId>junit</artifactId>
            <scope>test</scope>
        </dependency>
    </dependencies>
    <build>
        <resources>
            <resource>
                <directory>src/main/java</directory>
                <includes>
                    <include>**/*.properties</include>
                </includes>
            </resource>
        </resources>
        <plugins>
            <!-- compiler plugin -->
            <plugin>
                <artifactId>maven-compiler-plugin</artifactId>
                <configuration>
                    <source>${jdk-version}</source>
                    <target>${jdk-version}</target>
                    <encoding>${project-sourceEncoding}</encoding>
                </configuration>
            </plugin>
            <!-- assembly plugin -->
            <plugin>
                <artifactId>maven-assembly-plugin</artifactId>
                <configuration>
                    <descriptors>
                        <descriptor>src/main/assembly/package.xml</descriptor>
                    </descriptors>
                    <finalName>datax</finalName>
                </configuration>
                <executions>
                    <execution>
                        <id>dwzip</id>
                        <phase>package</phase>
                        <goals>
                            <goal>single</goal>
                        </goals>
                    </execution>
                </executions>
            </plugin>
        </plugins>
    </build>
</project>
34
databendwriter/src/main/assembly/package.xml
Executable file
@ -0,0 +1,34 @@
<assembly
        xmlns="http://maven.apache.org/plugins/maven-assembly-plugin/assembly/1.1.0"
        xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
        xsi:schemaLocation="http://maven.apache.org/plugins/maven-assembly-plugin/assembly/1.1.0 http://maven.apache.org/xsd/assembly-1.1.0.xsd">
    <id></id>
    <formats>
        <format>dir</format>
    </formats>
    <includeBaseDirectory>false</includeBaseDirectory>
    <fileSets>
        <fileSet>
            <directory>src/main/resources</directory>
            <includes>
                <include>plugin.json</include>
                <include>plugin_job_template.json</include>
            </includes>
            <outputDirectory>plugin/writer/databendwriter</outputDirectory>
        </fileSet>
        <fileSet>
            <directory>target/</directory>
            <includes>
                <include>databendwriter-0.0.1-SNAPSHOT.jar</include>
            </includes>
            <outputDirectory>plugin/writer/databendwriter</outputDirectory>
        </fileSet>
    </fileSets>

    <dependencySets>
        <dependencySet>
            <useProjectArtifact>false</useProjectArtifact>
            <outputDirectory>plugin/writer/databendwriter/libs</outputDirectory>
        </dependencySet>
    </dependencySets>
</assembly>
@ -0,0 +1,241 @@
package com.alibaba.datax.plugin.writer.databendwriter;

import com.alibaba.datax.common.element.Column;
import com.alibaba.datax.common.element.StringColumn;
import com.alibaba.datax.common.exception.CommonErrorCode;
import com.alibaba.datax.common.exception.DataXException;
import com.alibaba.datax.common.plugin.RecordReceiver;
import com.alibaba.datax.common.spi.Writer;
import com.alibaba.datax.common.util.Configuration;
import com.alibaba.datax.plugin.rdbms.util.DataBaseType;
import com.alibaba.datax.plugin.rdbms.writer.CommonRdbmsWriter;
import com.alibaba.datax.plugin.writer.databendwriter.util.DatabendWriterUtil;
import org.slf4j.Logger;
import org.slf4j.LoggerFactory;

import java.sql.*;
import java.util.List;
import java.util.regex.Pattern;

public class DatabendWriter extends Writer {
    private static final DataBaseType DATABASE_TYPE = DataBaseType.Databend;

    public static class Job
            extends Writer.Job {
        private static final Logger LOG = LoggerFactory.getLogger(Job.class);
        private Configuration originalConfig;
        private CommonRdbmsWriter.Job commonRdbmsWriterMaster;

        @Override
        public void init() throws DataXException {
            this.originalConfig = super.getPluginJobConf();
            this.commonRdbmsWriterMaster = new CommonRdbmsWriter.Job(DATABASE_TYPE);
            this.commonRdbmsWriterMaster.init(this.originalConfig);
            // placeholder currently not supported by databend driver, needs special treatment
            DatabendWriterUtil.dealWriteMode(this.originalConfig);
        }

        @Override
        public void preCheck() {
            this.init();
            this.commonRdbmsWriterMaster.writerPreCheck(this.originalConfig, DATABASE_TYPE);
        }

        @Override
        public void prepare() {
            this.commonRdbmsWriterMaster.prepare(this.originalConfig);
        }

        @Override
        public List<Configuration> split(int mandatoryNumber) {
            return this.commonRdbmsWriterMaster.split(this.originalConfig, mandatoryNumber);
        }

        @Override
        public void post() {
            this.commonRdbmsWriterMaster.post(this.originalConfig);
        }

        @Override
        public void destroy() {
            this.commonRdbmsWriterMaster.destroy(this.originalConfig);
        }
    }


    public static class Task extends Writer.Task {
        private static final Logger LOG = LoggerFactory.getLogger(Task.class);

        private Configuration writerSliceConfig;

        private CommonRdbmsWriter.Task commonRdbmsWriterSlave;

        @Override
        public void init() {
            this.writerSliceConfig = super.getPluginJobConf();

            this.commonRdbmsWriterSlave = new CommonRdbmsWriter.Task(DataBaseType.Databend) {
                @Override
                protected PreparedStatement fillPreparedStatementColumnType(PreparedStatement preparedStatement, int columnIndex, int columnSqltype, String typeName, Column column) throws SQLException {
                    try {
                        if (column.getRawData() == null) {
                            preparedStatement.setNull(columnIndex + 1, columnSqltype);
                            return preparedStatement;
                        }

                        java.util.Date utilDate;
                        switch (columnSqltype) {

                            case Types.TINYINT:
                            case Types.SMALLINT:
                            case Types.INTEGER:
                                preparedStatement.setInt(columnIndex + 1, column.asBigInteger().intValue());
                                break;
                            case Types.BIGINT:
                                preparedStatement.setLong(columnIndex + 1, column.asLong());
                                break;
                            case Types.DECIMAL:
                                preparedStatement.setBigDecimal(columnIndex + 1, column.asBigDecimal());
                                break;
                            case Types.FLOAT:
                            case Types.REAL:
                                preparedStatement.setFloat(columnIndex + 1, column.asDouble().floatValue());
                                break;
                            case Types.DOUBLE:
                                preparedStatement.setDouble(columnIndex + 1, column.asDouble());
                                break;
                            case Types.DATE:
                                java.sql.Date sqlDate = null;
                                try {
                                    utilDate = column.asDate();
                                } catch (DataXException e) {
                                    throw new SQLException(String.format(
                                            "Date type conversion error: [%s]", column));
                                }

                                if (null != utilDate) {
                                    sqlDate = new java.sql.Date(utilDate.getTime());
                                }
                                preparedStatement.setDate(columnIndex + 1, sqlDate);
                                break;

                            case Types.TIME:
                                java.sql.Time sqlTime = null;
                                try {
                                    utilDate = column.asDate();
                                } catch (DataXException e) {
                                    throw new SQLException(String.format(
                                            "Date type conversion error: [%s]", column));
                                }

                                if (null != utilDate) {
                                    sqlTime = new java.sql.Time(utilDate.getTime());
                                }
                                preparedStatement.setTime(columnIndex + 1, sqlTime);
                                break;

                            case Types.TIMESTAMP:
                                Timestamp sqlTimestamp = null;
                                if (column instanceof StringColumn && column.asString() != null) {
                                    String timeStampStr = column.asString();
                                    // a Java Timestamp argument must look like "2017-07-12 14:39:00.123566"
                                    String pattern = "^\\d+-\\d+-\\d+ \\d+:\\d+:\\d+.\\d+";
                                    boolean isMatch = Pattern.matches(pattern, timeStampStr);
                                    if (isMatch) {
                                        sqlTimestamp = Timestamp.valueOf(timeStampStr);
                                        preparedStatement.setTimestamp(columnIndex + 1, sqlTimestamp);
                                        break;
                                    }
                                }
                                try {
                                    utilDate = column.asDate();
                                } catch (DataXException e) {
                                    throw new SQLException(String.format(
                                            "Date type conversion error: [%s]", column));
                                }

                                if (null != utilDate) {
                                    sqlTimestamp = new Timestamp(
                                            utilDate.getTime());
                                }
                                preparedStatement.setTimestamp(columnIndex + 1, sqlTimestamp);
                                break;

                            case Types.BINARY:
                            case Types.VARBINARY:
                            case Types.BLOB:
                            case Types.LONGVARBINARY:
                                preparedStatement.setBytes(columnIndex + 1, column
                                        .asBytes());
                                break;

                            case Types.BOOLEAN:

                            // warn: bit(1) -> Types.BIT, setBoolean can be used
                            // warn: bit(>1) -> Types.VARBINARY, setBytes can be used
                            case Types.BIT:
                                if (this.dataBaseType == DataBaseType.MySql) {
                                    Boolean asBoolean = column.asBoolean();
                                    if (asBoolean != null) {
                                        preparedStatement.setBoolean(columnIndex + 1, asBoolean);
                                    } else {
                                        preparedStatement.setNull(columnIndex + 1, Types.BIT);
                                    }
                                } else {
                                    preparedStatement.setString(columnIndex + 1, column.asString());
                                }
                                break;

                            default:
                                // casting variant / array to string is fine.
                                preparedStatement.setString(columnIndex + 1, column.asString());
                                break;
                        }
                        return preparedStatement;
                    } catch (DataXException e) {
                        // fix: when type conversion fails or overflows, report which column caused it
                        if (e.getErrorCode() == CommonErrorCode.CONVERT_NOT_SUPPORT ||
                                e.getErrorCode() == CommonErrorCode.CONVERT_OVER_FLOW) {
                            throw DataXException
                                    .asDataXException(
                                            e.getErrorCode(),
                                            String.format(
                                                    "type conversion error. columnName: [%s], columnType:[%d], columnJavaType: [%s]. please change the data type in given column field or do not sync on the column.",
                                                    this.resultSetMetaData.getLeft()
                                                            .get(columnIndex),
                                                    this.resultSetMetaData.getMiddle()
                                                            .get(columnIndex),
                                                    this.resultSetMetaData.getRight()
                                                            .get(columnIndex)));
                        } else {
                            throw e;
                        }
                    }
                }

            };
            this.commonRdbmsWriterSlave.init(this.writerSliceConfig);
        }

        @Override
        public void destroy() {
            this.commonRdbmsWriterSlave.destroy(this.writerSliceConfig);
        }

        @Override
        public void prepare() {
            this.commonRdbmsWriterSlave.prepare(this.writerSliceConfig);
        }

        @Override
        public void post() {
            this.commonRdbmsWriterSlave.post(this.writerSliceConfig);
        }

        @Override
        public void startWrite(RecordReceiver lineReceiver) {
            this.commonRdbmsWriterSlave.startWrite(lineReceiver, this.writerSliceConfig, this.getTaskPluginCollector());
        }

    }
}
@ -0,0 +1,33 @@
package com.alibaba.datax.plugin.writer.databendwriter;

import com.alibaba.datax.common.spi.ErrorCode;


public enum DatabendWriterErrorCode implements ErrorCode {
    CONF_ERROR("DatabendWriter-00", "Configuration error."),
    WRITE_DATA_ERROR("DatabendWriter-01", "Failed to write data."),
    ;

    private final String code;
    private final String description;

    private DatabendWriterErrorCode(String code, String description) {
        this.code = code;
        this.description = description;
    }

    @Override
    public String getCode() {
        return this.code;
    }

    @Override
    public String getDescription() {
        return this.description;
    }

    @Override
    public String toString() {
        return String.format("Code:[%s], Description:[%s].", this.code, this.description);
    }
}
@ -0,0 +1,72 @@
package com.alibaba.datax.plugin.writer.databendwriter.util;

import com.alibaba.datax.common.exception.DataXException;
import com.alibaba.datax.common.util.Configuration;
import com.alibaba.datax.plugin.rdbms.writer.Constant;
import com.alibaba.datax.plugin.rdbms.writer.Key;

import com.alibaba.datax.plugin.writer.databendwriter.DatabendWriterErrorCode;
import org.apache.commons.lang3.StringUtils;
import org.slf4j.Logger;
import org.slf4j.LoggerFactory;

import java.util.List;
import java.util.StringJoiner;

public final class DatabendWriterUtil {
    private static final Logger LOG = LoggerFactory.getLogger(DatabendWriterUtil.class);

    private DatabendWriterUtil() {
    }

    public static void dealWriteMode(Configuration originalConfig) throws DataXException {
        List<String> columns = originalConfig.getList(Key.COLUMN, String.class);
        List<String> onConflictColumns = originalConfig.getList(Key.ONCONFLICT_COLUMN, String.class);
        StringBuilder writeDataSqlTemplate = new StringBuilder();

        String jdbcUrl = originalConfig.getString(String.format("%s[0].%s",
                Constant.CONN_MARK, Key.JDBC_URL));

        String writeMode = originalConfig.getString(Key.WRITE_MODE, "INSERT");
        LOG.info("write mode is {}", writeMode);
        if (writeMode.toLowerCase().contains("replace")) {
            if (onConflictColumns == null || onConflictColumns.isEmpty()) {
                throw DataXException
                        .asDataXException(
                                DatabendWriterErrorCode.CONF_ERROR,
                                "Replace mode must have an onConflictColumn config.");
            }

            // for databend, replace mode is selected with: "writeMode": "replace"
            writeDataSqlTemplate.append("REPLACE INTO %s (")
                    .append(StringUtils.join(columns, ",")).append(") ").append(onConFlictDoString(onConflictColumns))
                    .append(" VALUES");

            LOG.info("Replace data [\n{}\n], which jdbcUrl like:[{}]", writeDataSqlTemplate, jdbcUrl);
            originalConfig.set(Constant.INSERT_OR_REPLACE_TEMPLATE_MARK, writeDataSqlTemplate);
        } else {
            writeDataSqlTemplate.append("INSERT INTO %s");
            StringJoiner columnString = new StringJoiner(",");

            for (String column : columns) {
                columnString.add(column);
            }
            writeDataSqlTemplate.append(String.format("(%s)", columnString));
            writeDataSqlTemplate.append(" VALUES");

            LOG.info("Insert data [\n{}\n], which jdbcUrl like:[{}]", writeDataSqlTemplate, jdbcUrl);

            originalConfig.set(Constant.INSERT_OR_REPLACE_TEMPLATE_MARK, writeDataSqlTemplate);
        }

    }

    public static String onConFlictDoString(List<String> conflictColumns) {
        return " ON " +
                "(" +
                StringUtils.join(conflictColumns, ",") + ") ";
    }
}
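To see what dealWriteMode actually produces, a small driver can feed it a hand-built configuration and print the resulting SQL template. A sketch under two assumptions: that Key.ONCONFLICT_COLUMN maps to the "onConflictColumn" field used in the docs, and that Constant.INSERT_OR_REPLACE_TEMPLATE_MARK resolves to the "insertOrReplaceTemplate" key:

```java
import com.alibaba.datax.common.util.Configuration;
import com.alibaba.datax.plugin.writer.databendwriter.util.DatabendWriterUtil;

public class WriteModeSketch {
    public static void main(String[] args) {
        Configuration conf = Configuration.from("{\n" +
                "  \"column\": [\"a\", \"b\"],\n" +
                "  \"writeMode\": \"replace\",\n" +
                "  \"onConflictColumn\": [\"id\"],\n" +
                "  \"connection\": [{\"jdbcUrl\": \"jdbc:databend://localhost:8000/datax\"}]\n" +
                "}");
        DatabendWriterUtil.dealWriteMode(conf);
        // Expected template along the lines of:
        //   REPLACE INTO %s (a,b)  ON (id)  VALUES
        // where %s is later substituted with the target table name.
        System.out.println(conf.getString("insertOrReplaceTemplate"));
    }
}
```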
6
databendwriter/src/main/resources/plugin.json
Normal file
@ -0,0 +1,6 @@
{
    "name": "databendwriter",
    "class": "com.alibaba.datax.plugin.writer.databendwriter.DatabendWriter",
    "description": "execute batch insert sql to write dataX data into databend",
    "developer": "databend"
}
19
databendwriter/src/main/resources/plugin_job_template.json
Normal file
@ -0,0 +1,19 @@
{
    "name": "databendwriter",
    "parameter": {
        "username": "username",
        "password": "password",
        "column": ["col1", "col2", "col3"],
        "connection": [
            {
                "jdbcUrl": "jdbc:databend://<host>:<port>[/<database>]",
                "table": "table1"
            }
        ],
        "preSql": [],
        "postSql": [],

        "maxBatchRows": 65536,
        "maxBatchSize": 134217728
    }
}
@ -1,8 +1,8 @@
package com.alibaba.datax.plugin.reader.datahubreader;

import com.alibaba.datax.common.util.Configuration;
import com.alibaba.fastjson.JSON;
import com.alibaba.fastjson.TypeReference;
import com.alibaba.fastjson2.JSON;
import com.alibaba.fastjson2.TypeReference;
import com.aliyun.datahub.client.DatahubClient;
import com.aliyun.datahub.client.DatahubClientBuilder;
import com.aliyun.datahub.client.auth.Account;
@ -3,8 +3,8 @@ package com.alibaba.datax.plugin.writer.datahubwriter;
import org.apache.commons.lang3.StringUtils;

import com.alibaba.datax.common.util.Configuration;
import com.alibaba.fastjson.JSON;
import com.alibaba.fastjson.TypeReference;
import com.alibaba.fastjson2.JSON;
import com.alibaba.fastjson2.TypeReference;
import com.aliyun.datahub.client.DatahubClient;
import com.aliyun.datahub.client.DatahubClientBuilder;
import com.aliyun.datahub.client.auth.Account;
@ -8,7 +8,7 @@ import com.alibaba.datax.common.spi.Writer;
import com.alibaba.datax.common.util.Configuration;
import com.alibaba.datax.common.util.DataXCaseEnvUtil;
import com.alibaba.datax.common.util.RetryUtil;
import com.alibaba.fastjson.JSON;
import com.alibaba.fastjson2.JSON;
import com.aliyun.datahub.client.DatahubClient;
import com.aliyun.datahub.client.model.FieldType;
import com.aliyun.datahub.client.model.GetTopicResult;
20
datax-example/datax-example-core/pom.xml
Normal file
@ -0,0 +1,20 @@
<?xml version="1.0" encoding="UTF-8"?>
<project xmlns="http://maven.apache.org/POM/4.0.0"
         xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
         xsi:schemaLocation="http://maven.apache.org/POM/4.0.0 http://maven.apache.org/xsd/maven-4.0.0.xsd">
    <modelVersion>4.0.0</modelVersion>
    <parent>
        <groupId>com.alibaba.datax</groupId>
        <artifactId>datax-example</artifactId>
        <version>0.0.1-SNAPSHOT</version>
    </parent>

    <artifactId>datax-example-core</artifactId>

    <properties>
        <maven.compiler.source>8</maven.compiler.source>
        <maven.compiler.target>8</maven.compiler.target>
        <project.build.sourceEncoding>UTF-8</project.build.sourceEncoding>
    </properties>

</project>
@ -0,0 +1,26 @@
package com.alibaba.datax.example;

import com.alibaba.datax.common.util.Configuration;
import com.alibaba.datax.core.Engine;
import com.alibaba.datax.example.util.ExampleConfigParser;

/**
 * {@code Date} 2023/8/6 11:22
 *
 * @author fuyouj
 */

public class ExampleContainer {
    /**
     * The public entry point exposed by the example module.
     * Read datax-example/doc/README.MD before using it.
     *
     * @param jobPath absolute path of the job json file
     */
    public static void start(String jobPath) {

        Configuration configuration = ExampleConfigParser.parse(jobPath);

        Engine engine = new Engine();
        engine.start(configuration);
    }
}
@ -0,0 +1,23 @@
package com.alibaba.datax.example;


import com.alibaba.datax.example.util.PathUtil;

/**
 * @author fuyouj
 */
public class Main {

    /**
     * 1. Add the plugin you want to debug as a dependency in the example module's pom file;
     *    open this module's pom to see how streamreader and streamwriter are brought in.
     * 2. Point to your job file here.
     */
    public static void main(String[] args) {

        String classPathJobPath = "/job/stream2stream.json";
        String absJobPath = PathUtil.getAbsolutePathFromClassPath(classPathJobPath);
        ExampleContainer.start(absJobPath);
    }

}
@ -0,0 +1,154 @@
package com.alibaba.datax.example.util;

import com.alibaba.datax.common.exception.DataXException;
import com.alibaba.datax.common.util.Configuration;
import com.alibaba.datax.core.util.ConfigParser;
import com.alibaba.datax.core.util.FrameworkErrorCode;
import com.alibaba.datax.core.util.container.CoreConstant;

import java.io.File;
import java.io.IOException;
import java.net.URL;
import java.nio.file.Paths;
import java.util.*;

/**
 * @author fuyouj
 */
public class ExampleConfigParser {
    private static final String CORE_CONF = "/example/conf/core.json";

    private static final String PLUGIN_DESC_FILE = "plugin.json";

    /**
     * Given a job configuration path, this parser resolves the job, plugin and core
     * information and returns them as a single Configuration.
     * Unlike the core ConfigParser, the core and plugin configs here do not depend on
     * a packaged datax.home; the compiled target directories are scanned instead.
     */
    public static Configuration parse(final String jobPath) {

        Configuration configuration = ConfigParser.parseJobConfig(jobPath);
        configuration.merge(coreConfig(),
                false);

        Map<String, String> pluginTypeMap = new HashMap<>();
        String readerName = configuration.getString(CoreConstant.DATAX_JOB_CONTENT_READER_NAME);
        String writerName = configuration.getString(CoreConstant.DATAX_JOB_CONTENT_WRITER_NAME);
        pluginTypeMap.put(readerName, "reader");
        pluginTypeMap.put(writerName, "writer");
        Configuration pluginsDescConfig = parsePluginsConfig(pluginTypeMap);
        configuration.merge(pluginsDescConfig, false);
        return configuration;
    }

    private static Configuration parsePluginsConfig(Map<String, String> pluginTypeMap) {

        Configuration configuration = Configuration.newDefault();

        // The original plan was to scan plugins from the working directory via user.dir,
        // but user.dir is not reliable across environments, so that option was dropped.

        for (File basePackage : runtimeBasePackages()) {
            if (pluginTypeMap.isEmpty()) {
                break;
            }
            scanPluginByPackage(basePackage, configuration, basePackage.listFiles(), pluginTypeMap);
        }
        if (!pluginTypeMap.isEmpty()) {
            String failedPlugin = pluginTypeMap.keySet().toString();
            String message = "\nplugin %s load failed: try to analyze the reasons from the following aspects.\n" +
                    "1: Check if the name of the plugin is spelled correctly, and verify whether DataX supports this plugin\n" +
                    "2: Verify if the <resource></resource> tag has been added under the <build></build> section in the pom file of the relevant plugin.\n<resource>" +
                    "        <directory>src/main/resources</directory>\n" +
                    "        <includes>\n" +
                    "            <include>**/*.*</include>\n" +
                    "        </includes>\n" +
                    "        <filtering>true</filtering>\n" +
                    "    </resource>\n [Refer to the streamreader pom file] \n" +
                    "3: Check that the datax-yourPlugin-example module imported your test plugin";
            message = String.format(message, failedPlugin);
            throw DataXException.asDataXException(FrameworkErrorCode.PLUGIN_INIT_ERROR, message);
        }
        return configuration;
    }

    /**
     * Collects the compiled output directories via the context ClassLoader.
     *
     * @return File[/datax-example/target/classes,xxReader/target/classes,xxWriter/target/classes]
     */
    private static File[] runtimeBasePackages() {
        List<File> basePackages = new ArrayList<>();
        ClassLoader classLoader = Thread.currentThread().getContextClassLoader();
        Enumeration<URL> resources = null;
        try {
            resources = classLoader.getResources("");
        } catch (IOException e) {
            throw DataXException.asDataXException(e.getMessage());
        }

        while (resources.hasMoreElements()) {
            URL resource = resources.nextElement();
            File file = new File(resource.getFile());
            if (file.isDirectory()) {
                basePackages.add(file);
            }
        }

        return basePackages.toArray(new File[0]);
    }

    /**
     * @param packageFile       the compiled target/classes root; using the root directory as the
     *                          plugin URL is the safest choice when a plugin is found
     * @param configuration     pluginConfig
     * @param files             files to scan
     * @param needPluginTypeMap plugins still to be resolved
     */
    private static void scanPluginByPackage(File packageFile,
                                            Configuration configuration,
                                            File[] files,
                                            Map<String, String> needPluginTypeMap) {
        if (files == null) {
            return;
        }
        for (File file : files) {
            if (file.isFile() && PLUGIN_DESC_FILE.equals(file.getName())) {
                Configuration pluginDesc = Configuration.from(file);
                String descPluginName = pluginDesc.getString("name", "");

                if (needPluginTypeMap.containsKey(descPluginName)) {

                    String type = needPluginTypeMap.get(descPluginName);
                    configuration.merge(parseOnePlugin(packageFile.getAbsolutePath(), type, descPluginName, pluginDesc), false);
                    needPluginTypeMap.remove(descPluginName);

                }
            } else {
                scanPluginByPackage(packageFile, configuration, file.listFiles(), needPluginTypeMap);
            }
        }
    }


    private static Configuration parseOnePlugin(String packagePath,
                                                String pluginType,
                                                String pluginName,
                                                Configuration pluginDesc) {
        // Set "path" so that the URLClassLoader-based JarLoader can load the plugin.
        pluginDesc.set("path", packagePath);
        Configuration pluginConfInJob = Configuration.newDefault();
        pluginConfInJob.set(
                String.format("plugin.%s.%s", pluginType, pluginName),
                pluginDesc.getInternal());
        return pluginConfInJob;
    }

    private static Configuration coreConfig() {
        try {
            URL resource = ExampleConfigParser.class.getResource(CORE_CONF);
            return Configuration.from(Paths.get(resource.toURI()).toFile());
        } catch (Exception ignore) {
            throw DataXException.asDataXException("Failed to load the configuration file core.json. " +
                    "Please check whether /example/conf/core.json exists!");
        }
    }
}
@ -0,0 +1,26 @@
package com.alibaba.datax.example.util;


import com.alibaba.datax.common.exception.DataXException;

import java.net.URI;
import java.net.URISyntaxException;
import java.net.URL;
import java.nio.file.Paths;

/**
 * @author fuyouj
 */
public class PathUtil {
    public static String getAbsolutePathFromClassPath(String path) {
        URL resource = PathUtil.class.getResource(path);
        try {
            assert resource != null;
            URI uri = resource.toURI();
            return Paths.get(uri).toString();
        } catch (NullPointerException | URISyntaxException e) {
            throw DataXException.asDataXException("path error, please check whether the path is correct");
        }

    }
}
60
datax-example/datax-example-core/src/main/resources/example/conf/core.json
Executable file
@ -0,0 +1,60 @@
{
    "entry": {
        "jvm": "-Xms1G -Xmx1G",
        "environment": {}
    },
    "common": {
        "column": {
            "datetimeFormat": "yyyy-MM-dd HH:mm:ss",
            "timeFormat": "HH:mm:ss",
            "dateFormat": "yyyy-MM-dd",
            "extraFormats": ["yyyyMMdd"],
            "timeZone": "GMT+8",
            "encoding": "utf-8"
        }
    },
    "core": {
        "dataXServer": {
            "address": "http://localhost:7001/api",
            "timeout": 10000,
            "reportDataxLog": false,
            "reportPerfLog": false
        },
        "transport": {
            "channel": {
                "class": "com.alibaba.datax.core.transport.channel.memory.MemoryChannel",
                "speed": {
                    "byte": -1,
                    "record": -1
                },
                "flowControlInterval": 20,
                "capacity": 512,
                "byteCapacity": 67108864
            },
            "exchanger": {
                "class": "com.alibaba.datax.core.plugin.BufferedRecordExchanger",
                "bufferSize": 32
            }
        },
        "container": {
            "job": {
                "reportInterval": 10000
            },
            "taskGroup": {
                "channel": 5
            },
            "trace": {
                "enable": "false"
            }
        },
        "statistics": {
            "collector": {
                "plugin": {
                    "taskClass": "com.alibaba.datax.core.statistics.plugin.task.StdoutPluginCollector",
                    "maxDirtyNumber": 10
                }
            }
        }
    }
}
@ -0,0 +1,19 @@
package com.alibaba.datax.example.util;

import org.junit.Assert;
import org.junit.Test;

/**
 * {@code Author} FuYouJ
 * {@code Date} 2023/8/19 21:38
 */

public class PathUtilTest {

    @Test
    public void testParseClassPathFile() {
        String path = "/pathTest.json";
        String absolutePathFromClassPath = PathUtil.getAbsolutePathFromClassPath(path);
        Assert.assertNotNull(absolutePathFromClassPath);
    }
}
@ -0,0 +1 @@
{}
43
datax-example/datax-example-neo4j/pom.xml
Normal file
@ -0,0 +1,43 @@
<?xml version="1.0" encoding="UTF-8"?>
<project xmlns="http://maven.apache.org/POM/4.0.0"
         xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
         xsi:schemaLocation="http://maven.apache.org/POM/4.0.0 http://maven.apache.org/xsd/maven-4.0.0.xsd">
    <modelVersion>4.0.0</modelVersion>
    <parent>
        <groupId>com.alibaba.datax</groupId>
        <artifactId>datax-example</artifactId>
        <version>0.0.1-SNAPSHOT</version>
    </parent>

    <artifactId>datax-example-neo4j</artifactId>

    <properties>
        <maven.compiler.source>8</maven.compiler.source>
        <maven.compiler.target>8</maven.compiler.target>
        <project.build.sourceEncoding>UTF-8</project.build.sourceEncoding>
        <test.container.version>1.17.6</test.container.version>
        <neo4j-java-driver.version>4.4.9</neo4j-java-driver.version>
    </properties>
    <dependencies>
        <dependency>
            <groupId>com.alibaba.datax</groupId>
            <artifactId>datax-example-core</artifactId>
            <version>0.0.1-SNAPSHOT</version>
        </dependency>
        <dependency>
            <groupId>org.testcontainers</groupId>
            <artifactId>testcontainers</artifactId>
            <version>${test.container.version}</version>
        </dependency>
        <dependency>
            <groupId>com.alibaba.datax</groupId>
            <artifactId>neo4jwriter</artifactId>
            <version>0.0.1-SNAPSHOT</version>
        </dependency>
        <dependency>
            <groupId>com.alibaba.datax</groupId>
            <artifactId>datax-example-streamreader</artifactId>
            <version>0.0.1-SNAPSHOT</version>
        </dependency>
    </dependencies>
</project>
@ -0,0 +1,138 @@
package com.alibaba.datax.example.neo4j;

import com.alibaba.datax.example.ExampleContainer;
import com.alibaba.datax.example.util.PathUtil;
import org.junit.After;
import org.junit.Assert;
import org.junit.Before;
import org.junit.Test;
import org.neo4j.driver.*;
import org.neo4j.driver.types.Node;
import org.slf4j.Logger;
import org.slf4j.LoggerFactory;
import org.testcontainers.containers.GenericContainer;
import org.testcontainers.containers.Network;
import org.testcontainers.containers.output.Slf4jLogConsumer;
import org.testcontainers.lifecycle.Startables;
import org.testcontainers.shaded.org.awaitility.Awaitility;
import org.testcontainers.utility.DockerImageName;
import org.testcontainers.utility.DockerLoggerFactory;

import java.net.URI;
import java.util.Arrays;
import java.util.concurrent.TimeUnit;
import java.util.stream.Stream;

/**
 * {@code Author} FuYouJ
 * {@code Date} 2023/8/19 21:48
 */

public class StreamReader2Neo4jWriterTest {
    private static final Logger LOGGER = LoggerFactory.getLogger(StreamReader2Neo4jWriterTest.class);
    private static final String CONTAINER_IMAGE = "neo4j:5.9.0";

    private static final String CONTAINER_HOST = "neo4j-host";
    private static final int HTTP_PORT = 7474;
    private static final int BOLT_PORT = 7687;
    private static final String CONTAINER_NEO4J_USERNAME = "neo4j";
    private static final String CONTAINER_NEO4J_PASSWORD = "Test@12343";
    private static final URI CONTAINER_URI = URI.create("neo4j://localhost:" + BOLT_PORT);

    protected static final Network NETWORK = Network.newNetwork();

    private GenericContainer<?> container;
    protected Driver neo4jDriver;
    protected Session neo4jSession;
    private static final int CHANNEL = 5;
    private static final int READER_NUM = 10;

    @Before
    public void init() {
        DockerImageName imageName = DockerImageName.parse(CONTAINER_IMAGE);
        container =
                new GenericContainer<>(imageName)
                        .withNetwork(NETWORK)
                        .withNetworkAliases(CONTAINER_HOST)
                        .withExposedPorts(HTTP_PORT, BOLT_PORT)
                        .withEnv(
                                "NEO4J_AUTH",
                                CONTAINER_NEO4J_USERNAME + "/" + CONTAINER_NEO4J_PASSWORD)
                        .withEnv("apoc.export.file.enabled", "true")
                        .withEnv("apoc.import.file.enabled", "true")
                        .withEnv("apoc.import.file.use_neo4j_config", "true")
                        .withEnv("NEO4J_PLUGINS", "[\"apoc\"]")
                        .withLogConsumer(
                                new Slf4jLogConsumer(
                                        DockerLoggerFactory.getLogger(CONTAINER_IMAGE)));
        container.setPortBindings(
                Arrays.asList(
                        String.format("%s:%s", HTTP_PORT, HTTP_PORT),
                        String.format("%s:%s", BOLT_PORT, BOLT_PORT)));
        Startables.deepStart(Stream.of(container)).join();
        LOGGER.info("container started");
        Awaitility.given()
                .ignoreExceptions()
                .await()
                .atMost(30, TimeUnit.SECONDS)
                .untilAsserted(this::initConnection);
    }

    // Run the whole job through the Example harness inside the neo4jwriter module,
    // which makes problems in the end-to-end flow easy to spot.
    @Test
    public void streamReader2Neo4j() {

        deleteHistoryIfExist();

        String path = "/streamreader2neo4j.json";
        String jobPath = PathUtil.getAbsolutePathFromClassPath(path);

        ExampleContainer.start(jobPath);

        // Verify the result set against the mock data implied by the channel count and reader rows.
        verifyWriteResult();
    }

    private void deleteHistoryIfExist() {
        String query = "match (n:StreamReader) return n limit 1";
        String delete = "match (n:StreamReader) delete n";
        if (neo4jSession.run(query).hasNext()) {
            neo4jSession.run(delete);
        }
    }

    private void verifyWriteResult() {
        int total = CHANNEL * READER_NUM;
        String query = "match (n:StreamReader) return n";
        Result run = neo4jSession.run(query);
        int count = 0;
        while (run.hasNext()) {
            Record record = run.next();
            Node node = record.get("n").asNode();
            if (node.hasLabel("StreamReader")) {
                count++;
            }
        }
        Assert.assertEquals(count, total);
    }

    @After
    public void destroy() {
        if (neo4jSession != null) {
            neo4jSession.close();
        }
        if (neo4jDriver != null) {
            neo4jDriver.close();
        }
        if (container != null) {
            container.close();
        }
    }

    private void initConnection() {
        neo4jDriver =
                GraphDatabase.driver(
                        CONTAINER_URI,
                        AuthTokens.basic(CONTAINER_NEO4J_USERNAME, CONTAINER_NEO4J_PASSWORD));
        neo4jSession = neo4jDriver.session(SessionConfig.forDatabase("neo4j"));
    }
}
@ -0,0 +1,51 @@
{
    "job": {
        "content": [
            {
                "reader": {
                    "name": "streamreader",
                    "parameter": {
                        "sliceRecordCount": 10,
                        "column": [
                            {
                                "type": "string",
                                "value": "StreamReader"
                            },
                            {
                                "type": "string",
                                "value": "1997"
                            }
                        ]
                    }
                },
                "writer": {
                    "name": "neo4jWriter",
                    "parameter": {
                        "uri": "bolt://localhost:7687",
                        "username": "neo4j",
                        "password": "Test@12343",
                        "database": "neo4j",
                        "cypher": "unwind $batch as row CALL apoc.cypher.doIt( 'create (n:`' + row.Label + '`{id:$id})' ,{id: row.id} ) YIELD value RETURN 1 ",
                        "batchDataVariableName": "batch",
                        "batchSize": "3",
                        "properties": [
                            {
                                "name": "Label",
                                "type": "string"
                            },
                            {
                                "name": "id",
                                "type": "STRING"
                            }
                        ]
                    }
                }
            }
        ],
        "setting": {
            "speed": {
                "channel": 5
            }
        }
    }
}
37
datax-example/datax-example-streamreader/pom.xml
Normal file
@ -0,0 +1,37 @@
<?xml version="1.0" encoding="UTF-8"?>
<project xmlns="http://maven.apache.org/POM/4.0.0"
         xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
         xsi:schemaLocation="http://maven.apache.org/POM/4.0.0 http://maven.apache.org/xsd/maven-4.0.0.xsd">
    <modelVersion>4.0.0</modelVersion>
    <parent>
        <groupId>com.alibaba.datax</groupId>
        <artifactId>datax-example</artifactId>
        <version>0.0.1-SNAPSHOT</version>
    </parent>

    <artifactId>datax-example-streamreader</artifactId>

    <properties>
        <maven.compiler.source>8</maven.compiler.source>
        <maven.compiler.target>8</maven.compiler.target>
        <project.build.sourceEncoding>UTF-8</project.build.sourceEncoding>
    </properties>
    <dependencies>
        <dependency>
            <groupId>com.alibaba.datax</groupId>
            <artifactId>datax-example-core</artifactId>
            <version>0.0.1-SNAPSHOT</version>
        </dependency>
        <dependency>
            <groupId>com.alibaba.datax</groupId>
            <artifactId>streamreader</artifactId>
            <version>0.0.1-SNAPSHOT</version>
        </dependency>
        <dependency>
            <groupId>com.alibaba.datax</groupId>
            <artifactId>streamwriter</artifactId>
            <version>0.0.1-SNAPSHOT</version>
        </dependency>
    </dependencies>

</project>
@ -0,0 +1,19 @@
package com.alibaba.datax.example.streamreader;

import com.alibaba.datax.example.ExampleContainer;
import com.alibaba.datax.example.util.PathUtil;
import org.junit.Test;

/**
 * {@code Author} FuYouJ
 * {@code Date} 2023/8/14 20:16
 */

public class StreamReader2StreamWriterTest {
    @Test
    public void testStreamReader2StreamWriter() {
        String path = "/stream2stream.json";
        String jobPath = PathUtil.getAbsolutePathFromClassPath(path);
        ExampleContainer.start(jobPath);
    }
}
@ -0,0 +1,36 @@
{
    "job": {
        "content": [
            {
                "reader": {
                    "name": "streamreader",
                    "parameter": {
                        "sliceRecordCount": 10,
                        "column": [
                            {
                                "type": "long",
                                "value": "10"
                            },
                            {
                                "type": "string",
                                "value": "hello,你好,世界-DataX"
                            }
                        ]
                    }
                },
                "writer": {
                    "name": "streamwriter",
                    "parameter": {
                        "encoding": "UTF-8",
                        "print": true
                    }
                }
            }
        ],
        "setting": {
            "speed": {
                "channel": 5
            }
        }
    }
}
107
datax-example/doc/README.md
Normal file
@ -0,0 +1,107 @@
## [DataX-Example] A module for debugging DataX plugins

### Why this module exists

A DataX sync job is normally launched from the datax.py script, which resolves the datax package directory and exports it as the system property datax.home; from then on, loading the core plugins and initializing configuration both depend on datax.home. This is inconvenient. Take locally debugging the streamreader plugin as an example:

- package datax with maven to produce the datax directory
- set the system property datax.home in the IDE, or hard-code datax.home in the Engine startup class
- modify the streamreader plugin code
- package with maven again, so that JarLoader can load the latest streamreader code
- debug the code

Of these steps, packaging is entirely unnecessary and the most time-consuming, and waiting for it is the most painful part.

So I wrote a new module (datax-example) dedicated to local debugging and bug reproduction. With this module in place, the workflow above shrinks to two steps:

- modify the streamreader plugin code
- debug the code

<img src="img/img01.png" alt="img" style="zoom:40%;" />

### Directory layout
The layout below shows how to use datax-example-core to write test cases and verify the code flow.
<img src="img/img03.png" alt="img" style="zoom:100%;" />

### How it works

- The original ConfigParser is not modified; a new ExampleConfigParser, used only by the example module, takes its place. It does not depend on datax.home but on the target directories produced by the IDE build; a condensed sketch follows below.
- The IDE's target directory serves as the class-loading directory for each plugin.

![img](img/img02.png)

### How to use
1. Modify the plugin's pom file as follows, taking streamreader as the example.<br/>
Before:
```xml
<build>
    <plugins>
        <!-- compiler plugin -->
        <plugin>
            <artifactId>maven-compiler-plugin</artifactId>
            <configuration>
                <source>${jdk-version}</source>
                <target>${jdk-version}</target>
                <encoding>${project-sourceEncoding}</encoding>
            </configuration>
        </plugin>
    </plugins>
</build>
```
After:
```xml
<build>
    <resources>
        <!-- copy the resources directory into target as well -->
        <resource>
            <directory>src/main/resources</directory>
            <includes>
                <include>**/*.*</include>
            </includes>
            <filtering>true</filtering>
        </resource>
    </resources>
    <plugins>
        <!-- compiler plugin -->
        <plugin>
            <artifactId>maven-compiler-plugin</artifactId>
            <configuration>
                <source>${jdk-version}</source>
                <target>${jdk-version}</target>
                <encoding>${project-sourceEncoding}</encoding>
            </configuration>
        </plugin>
    </plugins>
</build>
```
#### Using it in a test module
See StreamReader2StreamWriterTest.java in datax-example/datax-example-streamreader:
```java
public class StreamReader2StreamWriterTest {
    @Test
    public void testStreamReader2StreamWriter() {
        String path = "/stream2stream.json";
        String jobPath = PathUtil.getAbsolutePathFromClassPath(path);
        ExampleContainer.start(jobPath);
    }
}

```
See StreamReader2Neo4jWriterTest in datax-example/datax-example-neo4j:
```java
public class StreamReader2Neo4jWriterTest {
    @Test
    public void streamReader2Neo4j() {

        deleteHistoryIfExist();

        String path = "/streamreader2neo4j.json";
        String jobPath = PathUtil.getAbsolutePathFromClassPath(path);

        ExampleContainer.start(jobPath);

        // verify the result set against the mock data implied by the channel count and reader rows
        verifyWriteResult();
    }
}
```
BIN
datax-example/doc/img/img01.png
Normal file
Binary file not shown. After: Width | Height | Size: 71 KiB
BIN
datax-example/doc/img/img02.png
Normal file
Binary file not shown. After: Width | Height | Size: 66 KiB
BIN
datax-example/doc/img/img03.png
Normal file
Binary file not shown. After: Width | Height | Size: 43 KiB
68
datax-example/pom.xml
Normal file
@ -0,0 +1,68 @@
<?xml version="1.0" encoding="UTF-8"?>
<project xmlns="http://maven.apache.org/POM/4.0.0"
         xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
         xsi:schemaLocation="http://maven.apache.org/POM/4.0.0 http://maven.apache.org/xsd/maven-4.0.0.xsd">
    <modelVersion>4.0.0</modelVersion>
    <parent>
        <groupId>com.alibaba.datax</groupId>
        <artifactId>datax-all</artifactId>
        <version>0.0.1-SNAPSHOT</version>
    </parent>

    <artifactId>datax-example</artifactId>
    <packaging>pom</packaging>
    <modules>
        <module>datax-example-core</module>
        <module>datax-example-streamreader</module>
        <module>datax-example-neo4j</module>
    </modules>

    <properties>
        <maven.compiler.source>8</maven.compiler.source>
        <maven.compiler.target>8</maven.compiler.target>
        <project.build.sourceEncoding>UTF-8</project.build.sourceEncoding>
        <junit4.version>4.13.2</junit4.version>
    </properties>
    <dependencies>
        <dependency>
            <groupId>com.alibaba.datax</groupId>
            <artifactId>datax-common</artifactId>
            <version>0.0.1-SNAPSHOT</version>
        </dependency>
        <dependency>
            <groupId>com.alibaba.datax</groupId>
            <artifactId>datax-core</artifactId>
            <version>0.0.1-SNAPSHOT</version>
        </dependency>
        <dependency>
            <groupId>junit</groupId>
            <artifactId>junit</artifactId>
            <version>${junit4.version}</version>
            <scope>test</scope>
        </dependency>
    </dependencies>

    <build>
        <resources>
            <resource>
                <directory>src/main/resources</directory>
                <includes>
                    <include>**/*.*</include>
                </includes>
                <filtering>true</filtering>
            </resource>
        </resources>
        <plugins>
            <!-- compiler plugin -->
            <plugin>
                <artifactId>maven-compiler-plugin</artifactId>
                <configuration>
                    <source>${jdk-version}</source>
                    <target>${jdk-version}</target>
                    <encoding>${project-sourceEncoding}</encoding>
                </configuration>
            </plugin>
        </plugins>
    </build>

</project>
@ -447,6 +447,9 @@ Internally, DataX maps its data types onto different Java types:
3. The user specifies the plugin name in the `name` field of the `reader`/`writer` configuration. Based on the plugin type (`reader`/`writer`) and the plugin name, the framework scans all jars under the plugin's path and adds them to the `classpath`.
4. Based on the entry class declared in the plugin configuration, the framework instantiates the corresponding `Job` and `Task` objects via reflection.

### Writing test cases
1. Create a new plugin test module under the datax-example project and call `ExampleContainer.start(jobPath)` to check that your code logic is correct; a minimal sketch follows this item. [Using datax-example](https://github.com/alibaba/DataX/datax-example/doc/README.md)
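
   A minimal sketch of such a test class (the module, class, and job-file names are illustrative; it assumes your plugin and datax-example-core are test dependencies of the new module):

```java
package com.alibaba.datax.example.yourplugin;

import com.alibaba.datax.example.ExampleContainer;
import com.alibaba.datax.example.util.PathUtil;
import org.junit.Test;

public class YourPluginWriterTest {
    @Test
    public void testStreamReader2YourPluginWriter() {
        // stream2yourplugin.json lives under src/test/resources of the test module
        String jobPath = PathUtil.getAbsolutePathFromClassPath("/stream2yourplugin.json");
        ExampleContainer.start(jobPath);
    }
}
```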

## 3. Last but not Least

@ -29,9 +29,11 @@
            "postSql": [],
            "preSql": [],
            "connection": [
                {
                    "jdbcUrl": "jdbc:mysql://192.168.1.1:9030/",
                    "table": ["xxx"],
                    "selectedDatabase": "xxxx"
                }
            ]
        }
    }
@ -1,7 +1,7 @@
package com.alibaba.datax.plugin.writer.doriswriter;

import com.alibaba.datax.common.element.Record;
import com.alibaba.fastjson.JSON;
import com.alibaba.fastjson2.JSON;

import java.util.HashMap;
import java.util.List;
@ -1,6 +1,6 @@
package com.alibaba.datax.plugin.writer.doriswriter;

import com.alibaba.fastjson.JSON;
import com.alibaba.fastjson2.JSON;
import org.apache.commons.codec.binary.Base64;
import org.apache.http.HttpEntity;
import org.apache.http.HttpHeaders;
@ -5,8 +5,8 @@ import com.alibaba.datax.common.util.Configuration;
import com.alibaba.datax.plugin.writer.elasticsearchwriter.jest.ClusterInfo;
import com.alibaba.datax.plugin.writer.elasticsearchwriter.jest.ClusterInfoResult;
import com.alibaba.datax.plugin.writer.elasticsearchwriter.jest.PutMapping7;
import com.alibaba.fastjson.JSON;
import com.alibaba.fastjson.JSONObject;
import com.alibaba.fastjson2.JSON;
import com.alibaba.fastjson2.JSONObject;
import com.google.gson.Gson;
import com.google.gson.JsonElement;
import com.google.gson.JsonObject;
@ -53,6 +53,8 @@ public class ElasticSearchClient {
    public ElasticSearchClient(Configuration conf) {
        this.conf = conf;
        String endpoint = Key.getEndpoint(conf);
        // ES supports writing to a cluster, so the endpoint may list several nodes
        String[] endpoints = endpoint.split(",");
        String user = Key.getUsername(conf);
        String passwd = Key.getPassword(conf);
        boolean multiThread = Key.isMultiThread(conf);
@ -63,7 +65,7 @@ public class ElasticSearchClient {
        int totalConnection = this.conf.getInt("maxTotalConnection", 200);
        JestClientFactory factory = new JestClientFactory();
        Builder httpClientConfig = new HttpClientConfig
                .Builder(endpoint)
                .Builder(Arrays.asList(endpoints))
                // .setPreemptiveAuth(new HttpHost(endpoint))
                .multiThreaded(multiThread)
                .connTimeout(readTimeout)
@ -9,11 +9,11 @@ import com.alibaba.datax.common.util.Configuration;
import com.alibaba.datax.common.util.DataXCaseEnvUtil;
import com.alibaba.datax.common.util.RetryUtil;
import com.alibaba.datax.plugin.writer.elasticsearchwriter.Key.ActionType;
import com.alibaba.fastjson.JSON;
import com.alibaba.fastjson.JSONArray;
import com.alibaba.fastjson.JSONObject;
import com.alibaba.fastjson.TypeReference;
import com.alibaba.fastjson.serializer.SerializerFeature;
import com.alibaba.fastjson2.JSON;
import com.alibaba.fastjson2.JSONArray;
import com.alibaba.fastjson2.JSONObject;
import com.alibaba.fastjson2.TypeReference;
import com.alibaba.fastjson2.JSONWriter;
import com.google.common.base.Joiner;
import io.searchbox.client.JestResult;
import io.searchbox.core.*;
@ -927,9 +927,8 @@ public class ElasticSearchWriter extends Writer {
        Index.Builder builder = null;
        if (this.enableWriteNull) {
            builder = new Index.Builder(
                    JSONObject.toJSONString(data, SerializerFeature.WriteMapNullValue,
                            SerializerFeature.QuoteFieldNames, SerializerFeature.SkipTransientField,
                            SerializerFeature.WriteEnumUsingToString, SerializerFeature.SortField));
                    JSONObject.toJSONString(data, JSONWriter.Feature.WriteMapNullValue,
                            JSONWriter.Feature.WriteEnumUsingToString));
        } else {
            builder = new Index.Builder(JSONObject.toJSONString(data));
        }
@ -958,9 +957,8 @@ public class ElasticSearchWriter extends Writer {
        if (this.enableWriteNull) {
            // write: {a:"1",b:null}
            update = new Update.Builder(
                    JSONObject.toJSONString(updateDoc, SerializerFeature.WriteMapNullValue,
                            SerializerFeature.QuoteFieldNames, SerializerFeature.SkipTransientField,
                            SerializerFeature.WriteEnumUsingToString, SerializerFeature.SortField));
                    JSONObject.toJSONString(updateDoc, JSONWriter.Feature.WriteMapNullValue,
                            JSONWriter.Feature.WriteEnumUsingToString));
            // on top of DEFAULT_GENERATE_FEATURE, this only adds WriteMapNullValue
        } else {
            // write: {"a":"1"}
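
The fastjson-to-fastjson2 migration in these hunks is mechanical: `SerializerFeature` flags become `JSONWriter.Feature` flags. A minimal standalone sketch of the null-preserving call (assuming only fastjson2 on the classpath; the class name is illustrative):

```java
import com.alibaba.fastjson2.JSON;
import com.alibaba.fastjson2.JSONWriter;

import java.util.LinkedHashMap;
import java.util.Map;

public class NullFieldJsonDemo {
    public static void main(String[] args) {
        Map<String, Object> doc = new LinkedHashMap<>();
        doc.put("a", "1");
        doc.put("b", null);
        // WriteMapNullValue keeps null-valued keys in the output,
        // matching the enableWriteNull branch above.
        System.out.println(JSON.toJSONString(doc, JSONWriter.Feature.WriteMapNullValue));
        // prints: {"a":"1","b":null}
    }
}
```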
@ -2,7 +2,7 @@ package com.alibaba.datax.plugin.writer.elasticsearchwriter;

import java.util.List;

import com.alibaba.fastjson.JSONObject;
import com.alibaba.fastjson2.JSONObject;

public class JsonPathUtil {

@ -1,8 +1,8 @@
package com.alibaba.datax.plugin.writer.elasticsearchwriter;

import com.alibaba.fastjson.JSON;
import com.alibaba.fastjson.JSONException;
import com.alibaba.fastjson.JSONObject;
import com.alibaba.fastjson2.JSON;
import com.alibaba.fastjson2.JSONException;
import com.alibaba.fastjson2.JSONObject;

/**
 * @author bozu
@ -1,8 +1,8 @@
package com.alibaba.datax.plugin.writer.elasticsearchwriter;

import com.alibaba.datax.common.util.Configuration;
import com.alibaba.fastjson.JSON;
import com.alibaba.fastjson.TypeReference;
import com.alibaba.fastjson2.JSON;
import com.alibaba.fastjson2.TypeReference;

import org.apache.commons.lang3.StringUtils;

@ -45,7 +45,7 @@
    <dependency>
        <groupId>com.jcraft</groupId>
        <artifactId>jsch</artifactId>
        <version>0.1.51</version>
        <version>0.1.54</version>
    </dependency>
    <dependency>
        <groupId>commons-net</groupId>
@ -89,4 +89,4 @@
    </plugins>
</build>

</project>
</project>
@ -64,6 +64,8 @@ public class SftpHelper extends FtpHelper {
                String message = String.format("请确认连接ftp服务器端口是否正确,错误的端口: [%s] ", port);
                LOG.error(message);
                throw DataXException.asDataXException(FtpReaderErrorCode.FAIL_LOGIN, message, e);
            } else {
                throw DataXException.asDataXException(FtpReaderErrorCode.COMMAND_FTP_IO_EXCEPTION, "", e);
            }
        } else {
            if ("Auth fail".equals(e.getMessage())) {
@ -24,7 +24,7 @@ FtpWriter converts records from the DataX protocol into FTP files; FTP files themselves are unstructured

What we cannot do:

1. A single file does not support concurrent writes.
1. Concurrent writes to a single file.


## 3 Features

@ -45,7 +45,7 @@
    <dependency>
        <groupId>com.jcraft</groupId>
        <artifactId>jsch</artifactId>
        <version>0.1.51</version>
        <version>0.1.54</version>
    </dependency>
    <dependency>
        <groupId>commons-net</groupId>
@ -14,8 +14,8 @@ import org.slf4j.LoggerFactory;

import com.alibaba.datax.common.exception.DataXException;
import com.alibaba.datax.plugin.writer.ftpwriter.FtpWriterErrorCode;
import com.alibaba.fastjson.JSON;
import com.alibaba.fastjson.serializer.SerializerFeature;
import com.alibaba.fastjson2.JSON;
import com.alibaba.fastjson2.JSONWriter;
import com.jcraft.jsch.ChannelSftp;
import com.jcraft.jsch.JSch;
import com.jcraft.jsch.JSchException;
@ -251,7 +251,7 @@ public class SftpHelperImpl implements IFtpHelper {
        @SuppressWarnings("rawtypes")
        Vector allFiles = this.channelSftp.ls(dir);
        LOG.debug(String.format("ls: %s", JSON.toJSONString(allFiles,
                SerializerFeature.UseSingleQuotes)));
                JSONWriter.Feature.UseSingleQuotes)));
        for (int i = 0; i < allFiles.size(); i++) {
            LsEntry le = (LsEntry) allFiles.get(i);
            String strName = le.getFilename();
@ -18,8 +18,8 @@ import org.slf4j.LoggerFactory;

import com.alibaba.datax.common.exception.DataXException;
import com.alibaba.datax.plugin.writer.ftpwriter.FtpWriterErrorCode;
import com.alibaba.fastjson.JSON;
import com.alibaba.fastjson.serializer.SerializerFeature;
import com.alibaba.fastjson2.JSON;
import com.alibaba.fastjson2.JSONWriter;

public class StandardFtpHelperImpl implements IFtpHelper {
    private static final Logger LOG = LoggerFactory
@ -244,7 +244,7 @@ public class StandardFtpHelperImpl implements IFtpHelper {
        FTPFile[] fs = this.ftpClient.listFiles(dir);
        // LOG.debug(JSON.toJSONString(this.ftpClient.listNames(dir)));
        LOG.debug(String.format("ls: %s",
                JSON.toJSONString(fs, SerializerFeature.UseSingleQuotes)));
                JSON.toJSONString(fs, JSONWriter.Feature.UseSingleQuotes)));
        for (FTPFile ff : fs) {
            String strName = ff.getName();
            if (strName.startsWith(prefixFileName)) {
@ -19,8 +19,8 @@ import com.alibaba.datax.plugin.writer.gdbwriter.Key;
import com.alibaba.datax.plugin.writer.gdbwriter.model.GdbEdge;
import com.alibaba.datax.plugin.writer.gdbwriter.model.GdbElement;
import com.alibaba.datax.plugin.writer.gdbwriter.model.GdbVertex;
import com.alibaba.fastjson.JSONArray;
import com.alibaba.fastjson.JSONObject;
import com.alibaba.fastjson2.JSONArray;
import com.alibaba.fastjson2.JSONObject;

import lombok.extern.slf4j.Slf4j;

@ -12,8 +12,8 @@ import org.apache.commons.lang3.StringUtils;
import com.alibaba.datax.common.exception.DataXException;
import com.alibaba.datax.common.util.Configuration;
import com.alibaba.datax.plugin.writer.gdbwriter.GdbWriterErrorCode;
import com.alibaba.fastjson.JSON;
import com.alibaba.fastjson.JSONObject;
import com.alibaba.fastjson2.JSON;
import com.alibaba.fastjson2.JSONObject;

/**
 * @author jerrywang
@ -2,8 +2,8 @@ package com.alibaba.datax.plugin.reader.hbase094xreader;

import com.alibaba.datax.common.exception.DataXException;
import com.alibaba.datax.common.util.Configuration;
import com.alibaba.fastjson.JSON;
import com.alibaba.fastjson.TypeReference;
import com.alibaba.fastjson2.JSON;
import com.alibaba.fastjson2.TypeReference;
import org.apache.commons.lang3.StringUtils;
import org.apache.commons.lang3.Validate;
import org.apache.hadoop.fs.Path;
@ -2,8 +2,8 @@ package com.alibaba.datax.plugin.writer.hbase094xwriter;

import com.alibaba.datax.common.exception.DataXException;
import com.alibaba.datax.common.util.Configuration;
import com.alibaba.fastjson.JSON;
import com.alibaba.fastjson.TypeReference;
import com.alibaba.fastjson2.JSON;
import com.alibaba.fastjson2.TypeReference;
import org.apache.commons.lang3.StringUtils;
import org.apache.commons.lang3.Validate;
import org.apache.hadoop.fs.Path;
@ -2,8 +2,8 @@ package com.alibaba.datax.plugin.reader.hbase11xreader;

import com.alibaba.datax.common.exception.DataXException;
import com.alibaba.datax.common.util.Configuration;
import com.alibaba.fastjson.JSON;
import com.alibaba.fastjson.TypeReference;
import com.alibaba.fastjson2.JSON;
import com.alibaba.fastjson2.TypeReference;
import org.apache.commons.lang3.StringUtils;
import org.apache.commons.lang3.Validate;
import org.apache.hadoop.hbase.HBaseConfiguration;
@ -60,12 +60,16 @@ The hbase11xsqlreader plugin reads data from Phoenix (HBase SQL). Under the hood it
          //zk quorum of the HBase cluster that backs Phoenix
          "hbaseConfig": {
            "hbase.zookeeper.quorum": "hb-proxy-xxx-002.hbase.rds.aliyuncs.com,hb-proxy-xxx-001.hbase.rds.aliyuncs.com,hb-proxy-xxx-003.hbase.rds.aliyuncs.com"
          },
          },
          //the Phoenix namespace to read
          "schema": "TAG",
          //the Phoenix table to read
          "table": "US_POPULATION",
          //the columns to read; leave empty to read all columns
          "column": [
          ]
          ],
          //query condition
          "where": "id="
        }
      },
      "writer": {
@ -92,11 +96,18 @@ The hbase11xsqlreader plugin reads data from Phoenix (HBase SQL). Under the hood it

* Required: yes <br />

* Default: none <br />
* **schema**

* Description: the Phoenix namespace; set it to '' when the table has no namespace

* Required: yes <br />

* Default: none <br />

* **table**

* Description: the Phoenix table name; if there is a namespace, set the value to 'namespace.tablename'
* Description: the Phoenix table name; set the value to 'tablename'

* Required: yes <br />

@ -109,7 +120,13 @@ The hbase11xsqlreader plugin reads data from Phoenix (HBase SQL). Under the hood it
* Required: yes <br />

* Default: none <br />
* **where**

* Description: the filter condition applied when reading from the Phoenix table; see the illustrative snippet below.

* Optional: yes <br />

* Default: none <br />
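
For instance, a reader `parameter` block combining `schema`, `table`, and `where` might look like this (the hosts and the condition value are illustrative):

```json
"parameter": {
    "hbaseConfig": {
        "hbase.zookeeper.quorum": "zk-node-1,zk-node-2,zk-node-3"
    },
    "schema": "TAG",
    "table": "US_POPULATION",
    "column": [],
    "where": "id=1"
}
```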

### 3.3 Type conversion

@ -172,11 +189,14 @@ The hbase11xsqlreader plugin reads data from Phoenix (HBase SQL). Under the hood it
          "hbaseConfig": {
            "hbase.zookeeper.quorum": "hb-proxy-xxx-002.hbase.rds.aliyuncs.com,hb-proxy-xxx-001.hbase.rds.aliyuncs.com,hb-proxy-xxx-003.hbase.rds.aliyuncs.com"
          },
          "schema": "TAG",
          //the Phoenix table to read
          "table": "US_POPULATION",
          //the columns to read; leave empty to read all columns
          "column": [
          ]
          ],
          //query condition
          "where": "id="
        }
      },
      "writer": {
@ -204,7 +224,13 @@ The hbase11xsqlreader plugin reads data from Phoenix (HBase SQL). Under the hood it
* Required: yes <br />

* Default: none <br />

* **schema**

* Description: the Phoenix namespace; set it to '' when the table has no namespace

* Required: yes <br />

* Default: none <br />
* **table**

* Description: the Phoenix table name; if there is a namespace, set the value to 'namespace.tablename'
@ -220,7 +246,13 @@ The hbase11xsqlreader plugin reads data from Phoenix (HBase SQL). Under the hood it
* Required: yes <br />

* Default: none <br />
* **where**

* Description: the filter condition applied when reading from the Phoenix table.

* Optional: yes <br />

* Default: none <br />

### 3.3 Type conversion

@ -2,8 +2,8 @@ package com.alibaba.datax.plugin.reader.hbase11xsqlreader;

import com.alibaba.datax.common.exception.DataXException;
import com.alibaba.datax.common.util.Configuration;
import com.alibaba.fastjson.JSON;
import com.alibaba.fastjson.TypeReference;
import com.alibaba.fastjson2.JSON;
import com.alibaba.fastjson2.TypeReference;
import org.apache.hadoop.hbase.HConstants;
import org.apache.hadoop.hbase.util.Pair;
import org.apache.hadoop.mapreduce.InputSplit;
@ -26,9 +26,7 @@ import java.io.IOException;
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.SQLException;
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.*;


public class HbaseSQLHelper {
@ -50,11 +48,15 @@ public class HbaseSQLHelper {
        String zkUrl = readerConfig.getZkUrl();

        PhoenixConfigurationUtil.setInputClass(conf, PhoenixRecordWritable.class);
        PhoenixConfigurationUtil.setInputTableName(conf, table);

        PhoenixConfigurationUtil.setInputTableName(conf, readerConfig.getSchema() + "." + table);

        if (!columns.isEmpty()) {
            PhoenixConfigurationUtil.setSelectColumnNames(conf, columns.toArray(new String[columns.size()]));
        }
        if (Objects.nonNull(readerConfig.getWhere())) {
            PhoenixConfigurationUtil.setInputTableConditions(conf, readerConfig.getWhere());
        }
        PhoenixEmbeddedDriver.ConnectionInfo info = null;
        try {
            info = PhoenixEmbeddedDriver.ConnectionInfo.create(zkUrl);
@ -67,15 +69,19 @@ public class HbaseSQLHelper {
        conf.setInt(HConstants.ZOOKEEPER_CLIENT_PORT, info.getPort());
        if (info.getRootNode() != null)
            conf.set(HConstants.ZOOKEEPER_ZNODE_PARENT, info.getRootNode());
        // enable Phoenix namespace mapping so schema-qualified tables resolve correctly
        conf.set(Key.NAME_SPACE_MAPPING_ENABLED, "true");
        conf.set(Key.SYSTEM_TABLES_TO_NAMESPACE, "true");
        return conf;
    }

    public static List<String> getPColumnNames(String connectionString, String tableName) throws SQLException {
        Connection con =
                DriverManager.getConnection(connectionString);
    public static List<String> getPColumnNames(String connectionString, String tableName, String schema) throws SQLException {
        Properties pro = new Properties();
        pro.put(Key.NAME_SPACE_MAPPING_ENABLED, true);
        pro.put(Key.SYSTEM_TABLES_TO_NAMESPACE, true);
        Connection con = DriverManager.getConnection(connectionString, pro);
        PhoenixConnection phoenixConnection = con.unwrap(PhoenixConnection.class);
        MetaDataClient metaDataClient = new MetaDataClient(phoenixConnection);
        PTable table = metaDataClient.updateCache("", tableName).getTable();
        PTable table = metaDataClient.updateCache(schema, tableName).getTable();
        List<String> columnNames = new ArrayList<String>();
        for (PColumn pColumn : table.getColumns()) {
            if (!pColumn.getName().getString().equals(SaltingUtil.SALTING_COLUMN_NAME))
@ -9,6 +9,7 @@ import org.slf4j.LoggerFactory;

import java.sql.SQLException;
import java.util.List;
import java.util.StringJoiner;

public class HbaseSQLReaderConfig {
    private final static Logger LOG = LoggerFactory.getLogger(HbaseSQLReaderConfig.class);
@ -27,6 +28,9 @@ public class HbaseSQLReaderConfig {
    private String tableName;
    private List<String> columns; // all column names of the target table, key and non-key alike, excluding the time column

    private String where; // filter condition

    private String schema; // Phoenix namespace
    /**
     * @return the original datax configuration
     */
@ -96,22 +100,27 @@ public class HbaseSQLReaderConfig {
        }
        String zkQuorum = zkCfg.getFirst();
        String znode = zkCfg.getSecond();

        if (zkQuorum == null || zkQuorum.isEmpty()) {
            throw DataXException.asDataXException(
                    HbaseSQLReaderErrorCode.ILLEGAL_VALUE, "HBase的hbase.zookeeper.quorum配置不能为空" );
        }
        // Build the connection string used for SQL, format: jdbc:phoenix:zk_quorum:2181:/znode_parent
        cfg.connectionString = "jdbc:phoenix:" + zkQuorum;
        cfg.zkUrl = zkQuorum + ":2181";
        StringBuilder connectionString = new StringBuilder("jdbc:phoenix:");
        connectionString.append(zkQuorum);
        cfg.connectionString = connectionString.toString();
        StringBuilder zkUrl = new StringBuilder(zkQuorum);
        cfg.zkUrl = zkUrl.append(":2181").toString();
        if (!znode.isEmpty()) {
            cfg.connectionString += cfg.connectionString + ":" + znode;
            cfg.zkUrl += cfg.zkUrl + ":" + znode;
            cfg.connectionString = connectionString.append(":").append(znode).toString();
            cfg.zkUrl = zkUrl.append(":").append(znode).toString();
        }
    }

    private static void parseTableConfig(HbaseSQLReaderConfig cfg, Configuration dataxCfg) {
        // parse and validate the table name
        cfg.tableName = dataxCfg.getString(Key.TABLE);
        cfg.schema = dataxCfg.getString(Key.SCHEMA);
        if (cfg.tableName == null || cfg.tableName.isEmpty()) {
            throw DataXException.asDataXException(
                    HbaseSQLReaderErrorCode.ILLEGAL_VALUE, "HBase的tableName配置不能为空,请检查并修改配置." );
@ -124,13 +133,14 @@ public class HbaseSQLReaderConfig {
                    HbaseSQLReaderErrorCode.ILLEGAL_VALUE, "您配置的tableName含有非法字符{0},请检查您的配置.");
        } else if (cfg.columns.isEmpty()) {
            try {
                cfg.columns = HbaseSQLHelper.getPColumnNames(cfg.connectionString, cfg.tableName);
                cfg.columns = HbaseSQLHelper.getPColumnNames(cfg.connectionString, cfg.tableName, cfg.schema);
                dataxCfg.set(Key.COLUMN, cfg.columns);
            } catch (SQLException e) {
                throw DataXException.asDataXException(
                        HbaseSQLReaderErrorCode.GET_PHOENIX_COLUMN_ERROR, "HBase的columns配置不能为空,请添加目标表的列名配置." + e.getMessage(), e);
            }
        }
        cfg.where = dataxCfg.getString(Key.WHERE);
    }

    @Override
@ -151,6 +161,8 @@ public class HbaseSQLReaderConfig {
            ret.append(",");
        }
        ret.setLength(ret.length() - 1);
        ret.append("[where=]").append(getWhere());
        ret.append("[schema=]").append(getSchema());
        ret.append("\n");

        return ret.toString();
@ -161,4 +173,20 @@
     */
    private HbaseSQLReaderConfig() {
    }

    public String getWhere() {
        return where;
    }

    public void setWhere(String where) {
        this.where = where;
    }

    public String getSchema() {
        return schema;
    }

    public void setSchema(String schema) {
        this.schema = schema;
    }
}
@ -19,10 +19,8 @@ import org.slf4j.LoggerFactory;
import java.io.IOException;
import java.math.BigDecimal;
import java.sql.*;
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.sql.Date;
import java.util.*;

/**
 * Created by admin on 1/3/18.
@ -42,11 +40,14 @@ public class HbaseSQLReaderTask {
    }

    private void getPColumns() throws SQLException {
        Properties pro = new Properties();
        pro.put(Key.NAME_SPACE_MAPPING_ENABLED, true);
        pro.put(Key.SYSTEM_TABLES_TO_NAMESPACE, true);
        Connection con =
                DriverManager.getConnection(this.readerConfig.getConnectionString());
        DriverManager.getConnection(this.readerConfig.getConnectionString(), pro);
        PhoenixConnection phoenixConnection = con.unwrap(PhoenixConnection.class);
        MetaDataClient metaDataClient = new MetaDataClient(phoenixConnection);
        PTable table = metaDataClient.updateCache("", this.readerConfig.getTableName()).getTable();
        PTable table = metaDataClient.updateCache(this.readerConfig.getSchema(), this.readerConfig.getTableName()).getTable();
        List<String> columnNames = this.readerConfig.getColumns();
        for (PColumn pColumn : table.getColumns()) {
            if (columnNames.contains(pColumn.getName().getString())) {
@ -24,5 +24,18 @@ public final class Key {
     * [Required] column configuration
     */
    public final static String COLUMN = "column";
    /**
     * [Optional] query condition (where clause)
     */
    public static final String WHERE = "where";

    /**
     * [Optional] schema that the Phoenix table belongs to; empty by default
     */
    public static final String SCHEMA = "schema";

    public static final String NAME_SPACE_MAPPING_ENABLED = "phoenix.schema.isNamespaceMappingEnabled";

    public static final String SYSTEM_TABLES_TO_NAMESPACE = "phoenix.schema.mapSystemTablesToNamespace";

}
@ -2,8 +2,8 @@ package com.alibaba.datax.plugin.writer.hbase11xsqlwriter;

import com.alibaba.datax.common.exception.DataXException;
import com.alibaba.datax.common.util.Configuration;
import com.alibaba.fastjson.JSON;
import com.alibaba.fastjson.TypeReference;
import com.alibaba.fastjson2.JSON;
import com.alibaba.fastjson2.TypeReference;
import org.apache.hadoop.hbase.TableName;
import org.apache.hadoop.hbase.client.Admin;
import org.apache.hadoop.hbase.util.Pair;
@ -2,8 +2,8 @@ package com.alibaba.datax.plugin.writer.hbase11xwriter;

import com.alibaba.datax.common.exception.DataXException;
import com.alibaba.datax.common.util.Configuration;
import com.alibaba.fastjson.JSON;
import com.alibaba.fastjson.TypeReference;
import com.alibaba.fastjson2.JSON;
import com.alibaba.fastjson2.TypeReference;
import org.apache.commons.lang3.StringUtils;
import org.apache.commons.lang3.Validate;
import org.apache.hadoop.hbase.HBaseConfiguration;
@ -166,20 +166,20 @@ HdfsReader reads file data from the Hadoop distributed file system (HDFS) and converts it
By default, all data can be read as String, configured as follows:

```json
"column": ["*"]
```

Column information can also be specified explicitly, configured as follows:

```json
{
    "type": "long",
    "index": 0    // take the int field from the first column of the source text file
},
{
    "type": "string",
    "value": "alibaba"    // HdfsReader generates the constant string "alibaba" as this field
}
```

When column information is specified, `type` is mandatory and exactly one of `index`/`value` must be chosen.
Some files were not shown because too many files have changed in this diff.