diff --git a/README.md b/README.md index b3aa35bb..17f87308 100644 --- a/README.md +++ b/README.md @@ -1,12 +1,13 @@ ![Datax-logo](https://github.com/alibaba/DataX/blob/master/images/DataX-logo.jpg) - # DataX -DataX 是阿里云 [DataWorks数据集成](https://www.aliyun.com/product/bigdata/ide) 的开源版本,在阿里巴巴集团内被广泛使用的离线数据同步工具/平台。DataX 实现了包括 MySQL、Oracle、OceanBase、SqlServer、Postgre、HDFS、Hive、ADS、HBase、TableStore(OTS)、MaxCompute(ODPS)、Hologres、DRDS 等各种异构数据源之间高效的数据同步功能。 +[![Leaderboard](https://img.shields.io/badge/DataX-%E6%9F%A5%E7%9C%8B%E8%B4%A1%E7%8C%AE%E6%8E%92%E8%A1%8C%E6%A6%9C-orange)](https://opensource.alibaba.com/contribution_leaderboard/details?projectValue=datax) + +DataX 是阿里云 [DataWorks数据集成](https://www.aliyun.com/product/bigdata/ide) 的开源版本,在阿里巴巴集团内被广泛使用的离线数据同步工具/平台。DataX 实现了包括 MySQL、Oracle、OceanBase、SqlServer、Postgre、HDFS、Hive、ADS、HBase、TableStore(OTS)、MaxCompute(ODPS)、Hologres、DRDS, databend 等各种异构数据源之间高效的数据同步功能。 # DataX 商业版本 -阿里云DataWorks数据集成是DataX团队在阿里云上的商业化产品,致力于提供复杂网络环境下、丰富的异构数据源之间高速稳定的数据移动能力,以及繁杂业务背景下的数据同步解决方案。目前已经支持云上近3000家客户,单日同步数据超过3万亿条。DataWorks数据集成目前支持离线50+种数据源,可以进行整库迁移、批量上云、增量同步、分库分表等各类同步解决方案。2020年更新实时同步能力,2020年更新实时同步能力,支持10+种数据源的读写任意组合。提供MySQL,Oracle等多种数据源到阿里云MaxCompute,Hologres等大数据引擎的一键全增量同步解决方案。 +阿里云DataWorks数据集成是DataX团队在阿里云上的商业化产品,致力于提供复杂网络环境下、丰富的异构数据源之间高速稳定的数据移动能力,以及繁杂业务背景下的数据同步解决方案。目前已经支持云上近3000家客户,单日同步数据超过3万亿条。DataWorks数据集成目前支持离线50+种数据源,可以进行整库迁移、批量上云、增量同步、分库分表等各类同步解决方案。2020年更新实时同步能力,支持10+种数据源的读写任意组合。提供MySQL,Oracle等多种数据源到阿里云MaxCompute,Hologres等大数据引擎的一键全增量同步解决方案。 商业版本参见: https://www.aliyun.com/product/bigdata/ide @@ -25,7 +26,7 @@ DataX本身作为数据同步框架,将不同数据源的同步抽象为从源 # Quick Start -##### Download [DataX下载地址](https://datax-opensource.oss-cn-hangzhou.aliyuncs.com/20220530/datax.tar.gz) +##### Download [DataX下载地址](https://datax-opensource.oss-cn-hangzhou.aliyuncs.com/202308/datax.tar.gz) ##### 请点击:[Quick Start](https://github.com/alibaba/DataX/blob/master/userGuid.md) @@ -36,35 +37,49 @@ DataX本身作为数据同步框架,将不同数据源的同步抽象为从源 DataX目前已经有了比较全面的插件体系,主流的RDBMS数据库、NOSQL、大数据计算系统都已经接入,目前支持数据如下图,详情请点击:[DataX数据源参考指南](https://github.com/alibaba/DataX/wiki/DataX-all-data-channels) -| 类型 | 数据源 | Reader(读) | Writer(写) |文档| -| ------------ | ---------- | :-------: | :-------: |:-------: | -| RDBMS 关系型数据库 | MySQL | √ | √ |[读](https://github.com/alibaba/DataX/blob/master/mysqlreader/doc/mysqlreader.md) 、[写](https://github.com/alibaba/DataX/blob/master/mysqlwriter/doc/mysqlwriter.md)| -|             | Oracle     |     √     |     √     |[读](https://github.com/alibaba/DataX/blob/master/oraclereader/doc/oraclereader.md) 、[写](https://github.com/alibaba/DataX/blob/master/oraclewriter/doc/oraclewriter.md)| -|             | OceanBase  |     √     |     √     |[读](https://open.oceanbase.com/docs/community/oceanbase-database/V3.1.0/use-datax-to-full-migration-data-to-oceanbase) 、[写](https://open.oceanbase.com/docs/community/oceanbase-database/V3.1.0/use-datax-to-full-migration-data-to-oceanbase)| -| | SQLServer | √ | √ |[读](https://github.com/alibaba/DataX/blob/master/sqlserverreader/doc/sqlserverreader.md) 、[写](https://github.com/alibaba/DataX/blob/master/sqlserverwriter/doc/sqlserverwriter.md)| -| | PostgreSQL | √ | √ |[读](https://github.com/alibaba/DataX/blob/master/postgresqlreader/doc/postgresqlreader.md) 、[写](https://github.com/alibaba/DataX/blob/master/postgresqlwriter/doc/postgresqlwriter.md)| -| | DRDS | √ | √ |[读](https://github.com/alibaba/DataX/blob/master/drdsreader/doc/drdsreader.md) 、[写](https://github.com/alibaba/DataX/blob/master/drdswriter/doc/drdswriter.md)| -| | 通用RDBMS(支持所有关系型数据库) | √ | √ 
|[读](https://github.com/alibaba/DataX/blob/master/rdbmsreader/doc/rdbmsreader.md) 、[写](https://github.com/alibaba/DataX/blob/master/rdbmswriter/doc/rdbmswriter.md)| -| 阿里云数仓数据存储 | ODPS | √ | √ |[读](https://github.com/alibaba/DataX/blob/master/odpsreader/doc/odpsreader.md) 、[写](https://github.com/alibaba/DataX/blob/master/odpswriter/doc/odpswriter.md)| -| | ADS | | √ |[写](https://github.com/alibaba/DataX/blob/master/adswriter/doc/adswriter.md)| -| | OSS | √ | √ |[读](https://github.com/alibaba/DataX/blob/master/ossreader/doc/ossreader.md) 、[写](https://github.com/alibaba/DataX/blob/master/osswriter/doc/osswriter.md)| -| | OCS | | √ |[写](https://github.com/alibaba/DataX/blob/master/ocswriter/doc/ocswriter.md)| -| NoSQL数据存储 | OTS | √ | √ |[读](https://github.com/alibaba/DataX/blob/master/otsreader/doc/otsreader.md) 、[写](https://github.com/alibaba/DataX/blob/master/otswriter/doc/otswriter.md)| -| | Hbase0.94 | √ | √ |[读](https://github.com/alibaba/DataX/blob/master/hbase094xreader/doc/hbase094xreader.md) 、[写](https://github.com/alibaba/DataX/blob/master/hbase094xwriter/doc/hbase094xwriter.md)| -| | Hbase1.1 | √ | √ |[读](https://github.com/alibaba/DataX/blob/master/hbase11xreader/doc/hbase11xreader.md) 、[写](https://github.com/alibaba/DataX/blob/master/hbase11xwriter/doc/hbase11xwriter.md)| -| | Phoenix4.x | √ | √ |[读](https://github.com/alibaba/DataX/blob/master/hbase11xsqlreader/doc/hbase11xsqlreader.md) 、[写](https://github.com/alibaba/DataX/blob/master/hbase11xsqlwriter/doc/hbase11xsqlwriter.md)| -| | Phoenix5.x | √ | √ |[读](https://github.com/alibaba/DataX/blob/master/hbase20xsqlreader/doc/hbase20xsqlreader.md) 、[写](https://github.com/alibaba/DataX/blob/master/hbase20xsqlwriter/doc/hbase20xsqlwriter.md)| -| | MongoDB | √ | √ |[读](https://github.com/alibaba/DataX/blob/master/mongodbreader/doc/mongodbreader.md) 、[写](https://github.com/alibaba/DataX/blob/master/mongodbwriter/doc/mongodbwriter.md)| -| | Hive | √ | √ |[读](https://github.com/alibaba/DataX/blob/master/hdfsreader/doc/hdfsreader.md) 、[写](https://github.com/alibaba/DataX/blob/master/hdfswriter/doc/hdfswriter.md)| -| | Cassandra | √ | √ |[读](https://github.com/alibaba/DataX/blob/master/cassandrareader/doc/cassandrareader.md) 、[写](https://github.com/alibaba/DataX/blob/master/cassandrawriter/doc/cassandrawriter.md)| -| 无结构化数据存储 | TxtFile | √ | √ |[读](https://github.com/alibaba/DataX/blob/master/txtfilereader/doc/txtfilereader.md) 、[写](https://github.com/alibaba/DataX/blob/master/txtfilewriter/doc/txtfilewriter.md)| -| | FTP | √ | √ |[读](https://github.com/alibaba/DataX/blob/master/ftpreader/doc/ftpreader.md) 、[写](https://github.com/alibaba/DataX/blob/master/ftpwriter/doc/ftpwriter.md)| -| | HDFS | √ | √ |[读](https://github.com/alibaba/DataX/blob/master/hdfsreader/doc/hdfsreader.md) 、[写](https://github.com/alibaba/DataX/blob/master/hdfswriter/doc/hdfswriter.md)| -| | Elasticsearch | | √ |[写](https://github.com/alibaba/DataX/blob/master/elasticsearchwriter/doc/elasticsearchwriter.md)| -| 时间序列数据库 | OpenTSDB | √ | |[读](https://github.com/alibaba/DataX/blob/master/opentsdbreader/doc/opentsdbreader.md)| -| | TSDB | √ | √ |[读](https://github.com/alibaba/DataX/blob/master/tsdbreader/doc/tsdbreader.md) 、[写](https://github.com/alibaba/DataX/blob/master/tsdbwriter/doc/tsdbhttpwriter.md)| -| | TDengine2.0 | √ | √ |[读](https://github.com/taosdata/DataX/blob/master/tdengine20reader/doc/tdengine20reader-CN.md) 、[写](https://github.com/alibaba/DataX/blob/master/tdengine20writer/doc/tdengine20writer-CN.md)| -| | TDengine3.0 | √ | √ 
|[读](https://github.com/taosdata/DataX/blob/master/tdengine30reader/doc/tdengine30reader-CN.md) 、[写](https://github.com/alibaba/DataX/blob/master/tdengine30writer/doc/tdengine30writer-CN.md)| + +| 类型 | 数据源 | Reader(读) | Writer(写) | 文档 | +|--------------|---------------------------|:---------:|:---------:|:----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------:| +| RDBMS 关系型数据库 | MySQL | √ | √ | [读](https://github.com/alibaba/DataX/blob/master/mysqlreader/doc/mysqlreader.md) 、[写](https://github.com/alibaba/DataX/blob/master/mysqlwriter/doc/mysqlwriter.md) | +| | Oracle | √ | √ | [读](https://github.com/alibaba/DataX/blob/master/oraclereader/doc/oraclereader.md) 、[写](https://github.com/alibaba/DataX/blob/master/oraclewriter/doc/oraclewriter.md) | +| | OceanBase | √ | √ | [读](https://open.oceanbase.com/docs/community/oceanbase-database/V3.1.0/use-datax-to-full-migration-data-to-oceanbase) 、[写](https://open.oceanbase.com/docs/community/oceanbase-database/V3.1.0/use-datax-to-full-migration-data-to-oceanbase) | +| | SQLServer | √ | √ | [读](https://github.com/alibaba/DataX/blob/master/sqlserverreader/doc/sqlserverreader.md) 、[写](https://github.com/alibaba/DataX/blob/master/sqlserverwriter/doc/sqlserverwriter.md) | +| | PostgreSQL | √ | √ | [读](https://github.com/alibaba/DataX/blob/master/postgresqlreader/doc/postgresqlreader.md) 、[写](https://github.com/alibaba/DataX/blob/master/postgresqlwriter/doc/postgresqlwriter.md) | +| | DRDS | √ | √ | [读](https://github.com/alibaba/DataX/blob/master/drdsreader/doc/drdsreader.md) 、[写](https://github.com/alibaba/DataX/blob/master/drdswriter/doc/drdswriter.md) | +| | Kingbase | √ | √ | [读](https://github.com/alibaba/DataX/blob/master/drdsreader/doc/drdsreader.md) 、[写](https://github.com/alibaba/DataX/blob/master/drdswriter/doc/drdswriter.md) | +| | 通用RDBMS(支持所有关系型数据库) | √ | √ | [读](https://github.com/alibaba/DataX/blob/master/rdbmsreader/doc/rdbmsreader.md) 、[写](https://github.com/alibaba/DataX/blob/master/rdbmswriter/doc/rdbmswriter.md) | +| 阿里云数仓数据存储 | ODPS | √ | √ | [读](https://github.com/alibaba/DataX/blob/master/odpsreader/doc/odpsreader.md) 、[写](https://github.com/alibaba/DataX/blob/master/odpswriter/doc/odpswriter.md) | +| | ADB | | √ | [写](https://github.com/alibaba/DataX/blob/master/adbmysqlwriter/doc/adbmysqlwriter.md) | +| | ADS | | √ | [写](https://github.com/alibaba/DataX/blob/master/adswriter/doc/adswriter.md) | +| | OSS | √ | √ | [读](https://github.com/alibaba/DataX/blob/master/ossreader/doc/ossreader.md) 、[写](https://github.com/alibaba/DataX/blob/master/osswriter/doc/osswriter.md) | +| | OCS | | √ | [写](https://github.com/alibaba/DataX/blob/master/ocswriter/doc/ocswriter.md) | +| | Hologres | | √ | [写](https://github.com/alibaba/DataX/blob/master/hologresjdbcwriter/doc/hologresjdbcwriter.md) | +| | AnalyticDB For PostgreSQL | | √ | 写 | +| 阿里云中间件 | datahub | √ | √ | 读 、写 | +| | SLS | √ | √ | 读 、写 | +| 图数据库 | 阿里云 GDB | √ | √ | [读](https://github.com/alibaba/DataX/blob/master/gdbreader/doc/gdbreader.md) 、[写](https://github.com/alibaba/DataX/blob/master/gdbwriter/doc/gdbwriter.md) | +| | Neo4j | | √ | [写](https://github.com/alibaba/DataX/blob/master/neo4jwriter/doc/neo4jwriter.md) | +| NoSQL数据存储 | OTS | √ | √ | [读](https://github.com/alibaba/DataX/blob/master/otsreader/doc/otsreader.md) 、[写](https://github.com/alibaba/DataX/blob/master/otswriter/doc/otswriter.md) | +| | 
Hbase0.94 | √ | √ | [读](https://github.com/alibaba/DataX/blob/master/hbase094xreader/doc/hbase094xreader.md) 、[写](https://github.com/alibaba/DataX/blob/master/hbase094xwriter/doc/hbase094xwriter.md) | +| | Hbase1.1 | √ | √ | [读](https://github.com/alibaba/DataX/blob/master/hbase11xreader/doc/hbase11xreader.md) 、[写](https://github.com/alibaba/DataX/blob/master/hbase11xwriter/doc/hbase11xwriter.md) | +| | Phoenix4.x | √ | √ | [读](https://github.com/alibaba/DataX/blob/master/hbase11xsqlreader/doc/hbase11xsqlreader.md) 、[写](https://github.com/alibaba/DataX/blob/master/hbase11xsqlwriter/doc/hbase11xsqlwriter.md) | +| | Phoenix5.x | √ | √ | [读](https://github.com/alibaba/DataX/blob/master/hbase20xsqlreader/doc/hbase20xsqlreader.md) 、[写](https://github.com/alibaba/DataX/blob/master/hbase20xsqlwriter/doc/hbase20xsqlwriter.md) | +| | MongoDB | √ | √ | [读](https://github.com/alibaba/DataX/blob/master/mongodbreader/doc/mongodbreader.md) 、[写](https://github.com/alibaba/DataX/blob/master/mongodbwriter/doc/mongodbwriter.md) | +| | Cassandra | √ | √ | [读](https://github.com/alibaba/DataX/blob/master/cassandrareader/doc/cassandrareader.md) 、[写](https://github.com/alibaba/DataX/blob/master/cassandrawriter/doc/cassandrawriter.md) | +| 数仓数据存储 | StarRocks | √ | √ | 读 、[写](https://github.com/alibaba/DataX/blob/master/starrockswriter/doc/starrockswriter.md) | +| | ApacheDoris | | √ | [写](https://github.com/alibaba/DataX/blob/master/doriswriter/doc/doriswriter.md) | +| | ClickHouse | √ | √ | [读](https://github.com/alibaba/DataX/blob/master/clickhousereader/doc/clickhousereader.md) 、[写](https://github.com/alibaba/DataX/blob/master/clickhousewriter/doc/clickhousewriter.md) | +| | Databend | | √ | [写](https://github.com/alibaba/DataX/blob/master/databendwriter/doc/databendwriter.md) | +| | Hive | √ | √ | [读](https://github.com/alibaba/DataX/blob/master/hdfsreader/doc/hdfsreader.md) 、[写](https://github.com/alibaba/DataX/blob/master/hdfswriter/doc/hdfswriter.md) | +| | kudu | | √ | [写](https://github.com/alibaba/DataX/blob/master/hdfswriter/doc/hdfswriter.md) | +| | selectdb | | √ | [写](https://github.com/alibaba/DataX/blob/master/selectdbwriter/doc/selectdbwriter.md) | +| 无结构化数据存储 | TxtFile | √ | √ | [读](https://github.com/alibaba/DataX/blob/master/txtfilereader/doc/txtfilereader.md) 、[写](https://github.com/alibaba/DataX/blob/master/txtfilewriter/doc/txtfilewriter.md) | +| | FTP | √ | √ | [读](https://github.com/alibaba/DataX/blob/master/ftpreader/doc/ftpreader.md) 、[写](https://github.com/alibaba/DataX/blob/master/ftpwriter/doc/ftpwriter.md) | +| | HDFS | √ | √ | [读](https://github.com/alibaba/DataX/blob/master/hdfsreader/doc/hdfsreader.md) 、[写](https://github.com/alibaba/DataX/blob/master/hdfswriter/doc/hdfswriter.md) | +| | Elasticsearch | | √ | [写](https://github.com/alibaba/DataX/blob/master/elasticsearchwriter/doc/elasticsearchwriter.md) | +| 时间序列数据库 | OpenTSDB | √ | | [读](https://github.com/alibaba/DataX/blob/master/opentsdbreader/doc/opentsdbreader.md) | +| | TSDB | √ | √ | [读](https://github.com/alibaba/DataX/blob/master/tsdbreader/doc/tsdbreader.md) 、[写](https://github.com/alibaba/DataX/blob/master/tsdbwriter/doc/tsdbhttpwriter.md) | +| | TDengine | √ | √ | [读](https://github.com/alibaba/DataX/blob/master/tdenginereader/doc/tdenginereader-CN.md) 、[写](https://github.com/alibaba/DataX/blob/master/tdenginewriter/doc/tdenginewriter-CN.md) | # 阿里云DataWorks数据集成 @@ -86,7 +101,7 @@ DataX目前已经有了比较全面的插件体系,主流的RDBMS数据库、N - 整库迁移:https://help.aliyun.com/document_detail/137809.html - 
批量上云:https://help.aliyun.com/document_detail/146671.html - 更新更多能力请访问:https://help.aliyun.com/document_detail/137663.html - + - # 我要开发新的插件 @@ -96,6 +111,40 @@ DataX目前已经有了比较全面的插件体系,主流的RDBMS数据库、N DataX 后续计划月度迭代更新,也欢迎感兴趣的同学提交 Pull requests,月度更新内容会介绍介绍如下。 +- [datax_v202309](https://github.com/alibaba/DataX/releases/tag/datax_v202309) + - 支持Phoenix 同步数据添加 where条件 + - 支持华为 GuassDB读写插件 + - 修复ClickReader 插件运行报错 Can't find bundle for base name + - 增加 DataX调试模块 + - 修复 orc空文件报错问题 + - 优化obwriter性能 + - txtfilewriter 增加导出为insert语句功能支持 + - HdfsReader/HdfsWriter 支持parquet读写能力 + +- [datax_v202308](https://github.com/alibaba/DataX/releases/tag/datax_v202308) + - OTS 插件更新 + - databend 插件更新 + - Oceanbase驱动修复 + + +- [datax_v202306](https://github.com/alibaba/DataX/releases/tag/datax_v202306) + - 精简代码 + - 新增插件(neo4jwriter、clickhousewriter) + - 优化插件、修复问题(oceanbase、hdfs、databend、txtfile) + + +- [datax_v202303](https://github.com/alibaba/DataX/releases/tag/datax_v202303) + - 精简代码 + - 新增插件(adbmysqlwriter、databendwriter、selectdbwriter) + - 优化插件、修复问题(sqlserver、hdfs、cassandra、kudu、oss) + - fastjson 升级到 fastjson2 + +- [datax_v202210](https://github.com/alibaba/DataX/releases/tag/datax_v202210) + - 涉及通道能力更新(OceanBase、Tdengine、Doris等) + +- [datax_v202209](https://github.com/alibaba/DataX/releases/tag/datax_v202209) + - 涉及通道能力更新(MaxCompute、Datahub、SLS等)、安全漏洞更新、通用打包更新等 + - [datax_v202205](https://github.com/alibaba/DataX/releases/tag/datax_v202205) - 涉及通道能力更新(MaxCompute、Hologres、OSS、Tdengine等)、安全漏洞更新、通用打包更新等 diff --git a/adbmysqlwriter/doc/adbmysqlwriter.md b/adbmysqlwriter/doc/adbmysqlwriter.md new file mode 100644 index 00000000..27ac6b10 --- /dev/null +++ b/adbmysqlwriter/doc/adbmysqlwriter.md @@ -0,0 +1,338 @@ +# DataX AdbMysqlWriter + + +--- + + +## 1 快速介绍 + +AdbMysqlWriter 插件实现了写入数据到 ADB MySQL 目的表的功能。在底层实现上, AdbMysqlWriter 通过 JDBC 连接远程 ADB MySQL 数据库,并执行相应的 `insert into ...` 或者 ( `replace into ...` ) 的 SQL 语句将数据写入 ADB MySQL,内部会分批次提交入库。 + +AdbMysqlWriter 面向ETL开发工程师,他们使用 AdbMysqlWriter 从数仓导入数据到 ADB MySQL。同时 AdbMysqlWriter 亦可以作为数据迁移工具为DBA等用户提供服务。 + + +## 2 实现原理 + +AdbMysqlWriter 通过 DataX 框架获取 Reader 生成的协议数据,AdbMysqlWriter 通过 JDBC 连接远程 ADB MySQL 数据库,并执行相应的 `insert into ...` 或者 ( `replace into ...` ) 的 SQL 语句将数据写入 ADB MySQL。 + + +* `insert into...`(遇到主键重复时会自动忽略当前写入数据,不做更新,作用等同于`insert ignore into`) + +##### 或者 + +* `replace into...`(没有遇到主键/唯一性索引冲突时,与 insert into 行为一致,冲突时会用新行替换原有行所有字段) 的语句写入数据到 MySQL。出于性能考虑,采用了 `PreparedStatement + Batch`,并且设置了:`rewriteBatchedStatements=true`,将数据缓冲到线程上下文 Buffer 中,当 Buffer 累计到预定阈值时,才发起写入请求。 + +
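+下面给出一段极简的 JDBC 批量写入示意,帮助理解上文所说的 `replace into` 配合 `PreparedStatement + Batch` 的写入方式;其中连接串、表名 demo_tbl、字段和账号都是假设的示例值,并非 AdbMysqlWriter 插件的真实实现,仅供参考:
+
+```java
+import java.sql.Connection;
+import java.sql.DriverManager;
+import java.sql.PreparedStatement;
+
+public class ReplaceIntoBatchDemo {
+    public static void main(String[] args) throws Exception {
+        // 假设的连接串:rewriteBatchedStatements=true 让驱动把一批语句合并发送
+        String url = "jdbc:mysql://127.0.0.1:3306/demo_db?useUnicode=true&rewriteBatchedStatements=true";
+        // 假设目的表为 demo_tbl(id, name);主键冲突时 replace into 会用新行整体替换旧行
+        String sql = "REPLACE INTO demo_tbl (id, name) VALUES (?, ?)";
+        int batchSize = 2048; // 类似插件的 batchSize:攒满一批再提交,减少网络交互次数
+
+        try (Connection conn = DriverManager.getConnection(url, "user", "password");
+             PreparedStatement ps = conn.prepareStatement(sql)) {
+            for (int i = 0; i < 10000; i++) {
+                ps.setLong(1, i);
+                ps.setString(2, "row-" + i);
+                ps.addBatch();                // 先缓冲到 Batch 中
+                if ((i + 1) % batchSize == 0) {
+                    ps.executeBatch();        // 缓冲达到阈值时才真正发起写入
+                    ps.clearBatch();
+                }
+            }
+            ps.executeBatch();                // 提交不足一批的剩余数据
+        }
+    }
+}
+```
+
+把多条单行 insert 合并为少量批量请求,正是上文强调的减少网络交互、提升吞吐的关键。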
+ + 注意:整个任务至少需要具备 `insert/replace into...` 的权限,是否需要其他权限,取决于你任务配置中在 preSql 和 postSql 中指定的语句。 + + +## 3 功能说明 + +### 3.1 配置样例 + +* 这里使用一份从内存产生到 ADB MySQL 导入的数据。 + +```json +{ + "job": { + "setting": { + "speed": { + "channel": 1 + } + }, + "content": [ + { + "reader": { + "name": "streamreader", + "parameter": { + "column" : [ + { + "value": "DataX", + "type": "string" + }, + { + "value": 19880808, + "type": "long" + }, + { + "value": "1988-08-08 08:08:08", + "type": "date" + }, + { + "value": true, + "type": "bool" + }, + { + "value": "test", + "type": "bytes" + } + ], + "sliceRecordCount": 1000 + } + }, + "writer": { + "name": "adbmysqlwriter", + "parameter": { + "writeMode": "replace", + "username": "root", + "password": "root", + "column": [ + "*" + ], + "preSql": [ + "truncate table @table" + ], + "connection": [ + { + "jdbcUrl": "jdbc:mysql://ip:port/database?useUnicode=true", + "table": [ + "test" + ] + } + ] + } + } + } + ] + } +} + +``` + + +### 3.2 参数说明 + +* **jdbcUrl** + + * 描述:目的数据库的 JDBC 连接信息。作业运行时,DataX 会在你提供的 jdbcUrl 后面追加如下属性:yearIsDateType=false&zeroDateTimeBehavior=convertToNull&rewriteBatchedStatements=true + + 注意:1、在一个数据库上只能配置一个 jdbcUrl + 2、一个 AdbMySQL 写入任务仅能配置一个 jdbcUrl + 3、jdbcUrl按照MySQL官方规范,并可以填写连接附加控制信息,比如想指定连接编码为 gbk ,则在 jdbcUrl 后面追加属性 useUnicode=true&characterEncoding=gbk。具体请参看 Mysql官方文档或者咨询对应 DBA。 + + * 必选:是
+ + * 默认值:无
+ +* **username** + + * 描述:目的数据库的用户名
+ + * 必选:是
+ + * 默认值:无
+ +* **password** + + * 描述:目的数据库的密码
+ + * 必选:是
+ + * 默认值:无
+ +* **table** + + * 描述:目的表的表名称。只能配置一个 AdbMySQL 的表名称。 + + 注意:table 和 jdbcUrl 必须包含在 connection 配置单元中 + + * 必选:是
+ + * 默认值:无
+ +* **column** + + * 描述:目的表需要写入数据的字段,字段之间用英文逗号分隔。例如: "column": ["id", "name", "age"]。如果要依次写入全部列,使用`*`表示, 例如: `"column": ["*"]`。 + + **column配置项必须指定,不能留空!** + + 注意:1、我们强烈不推荐你这样配置,因为当你目的表字段个数、类型等有改动时,你的任务可能运行不正确或者失败 + 2、 column 不能配置任何常量值 + + * 必选:是
+ + * 默认值:无
+ +* **session** + + * 描述: DataX在获取 ADB MySQL 连接时,执行session指定的SQL语句,修改当前connection session属性 + + * 必须: 否 + + * 默认值: 空 + +* **preSql** + + * 描述:写入数据到目的表前,会先执行这里的标准语句。如果 Sql 中有你需要操作到的表名称,请使用 `@table` 表示,这样在实际执行 SQL 语句时,会对变量按照实际表名称进行替换。比如希望导入数据前,先对表中数据进行删除操作,那么你可以这样配置:`"preSql":["truncate table @table"]`,效果是:在执行到每个表写入数据前,会先执行对应的 `truncate table 对应表名称`
+ + * 必选:否
+ + * 默认值:无
+ +* **postSql** + + * 描述:写入数据到目的表后,会执行这里的标准语句。(原理同 preSql )
+ + * 必选:否
+ + * 默认值:无
+ +* **writeMode** + + * 描述:控制写入数据到目标表采用 `insert into` 或者 `replace into` 或者 `ON DUPLICATE KEY UPDATE` 语句
+ + * 必选:是
+ + * 所有选项:insert/replace/update
+ + * 默认值:replace
+ +* **batchSize** + + * 描述:一次性批量提交的记录数大小,该值可以极大减少DataX与 Adb MySQL 的网络交互次数,并提升整体吞吐量。但是该值设置过大可能会造成DataX运行进程OOM情况。
+ + * 必选:否
+ + * 默认值:2048
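+
+下面用一小段 Java 代码示意上文 preSql 参数中 `@table` 变量按实际表名替换后的执行效果。这只是帮助理解的假设示例:表名 demo_tbl 为虚构,替换逻辑也不代表插件内部的真实实现:
+
+```java
+import java.util.Arrays;
+import java.util.List;
+
+public class PreSqlTemplateDemo {
+    public static void main(String[] args) {
+        String table = "demo_tbl"; // 假设的目的表名,对应 connection.table 配置
+        // 假设任务配置了 "preSql": ["truncate table @table"]
+        List<String> preSql = Arrays.asList("truncate table @table");
+
+        for (String template : preSql) {
+            // 写入数据前,把 @table 替换为实际表名后再执行
+            String sql = template.replace("@table", table);
+            System.out.println(sql); // 输出:truncate table demo_tbl
+        }
+    }
+}
+```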
+ + +### 3.3 类型转换 + +目前 AdbMysqlWriter 支持大部分 MySQL 类型,但也存在部分个别类型没有支持的情况,请注意检查你的类型。 + +下面列出 AdbMysqlWriter 针对 MySQL 类型转换列表: + +| DataX 内部类型 | AdbMysql 数据类型 | +|---------------|---------------------------------| +| Long | tinyint, smallint, int, bigint | +| Double | float, double, decimal | +| String | varchar | +| Date | date, time, datetime, timestamp | +| Boolean | boolean | +| Bytes | binary | + +## 4 性能报告 + +### 4.1 环境准备 + +#### 4.1.1 数据特征 +TPC-H 数据集 lineitem 表,共 17 个字段, 随机生成总记录行数 59986052。未压缩总数据量:7.3GiB + +建表语句: + + CREATE TABLE `datax_adbmysqlwriter_perf_lineitem` ( + `l_orderkey` bigint NOT NULL COMMENT '', + `l_partkey` int NOT NULL COMMENT '', + `l_suppkey` int NOT NULL COMMENT '', + `l_linenumber` int NOT NULL COMMENT '', + `l_quantity` decimal(15,2) NOT NULL COMMENT '', + `l_extendedprice` decimal(15,2) NOT NULL COMMENT '', + `l_discount` decimal(15,2) NOT NULL COMMENT '', + `l_tax` decimal(15,2) NOT NULL COMMENT '', + `l_returnflag` varchar(1024) NOT NULL COMMENT '', + `l_linestatus` varchar(1024) NOT NULL COMMENT '', + `l_shipdate` date NOT NULL COMMENT '', + `l_commitdate` date NOT NULL COMMENT '', + `l_receiptdate` date NOT NULL COMMENT '', + `l_shipinstruct` varchar(1024) NOT NULL COMMENT '', + `l_shipmode` varchar(1024) NOT NULL COMMENT '', + `l_comment` varchar(1024) NOT NULL COMMENT '', + `dummy` varchar(1024), + PRIMARY KEY (`l_orderkey`, `l_linenumber`) + ) ENGINE=InnoDB DEFAULT CHARSET=utf8 COMMENT='datax perf test'; + +单行记录类似于: + + l_orderkey: 2122789 + l_partkey: 1233571 + l_suppkey: 8608 + l_linenumber: 1 + l_quantity: 35.00 + l_extendedprice: 52657.85 + l_discount: 0.02 + l_tax: 0.07 + l_returnflag: N + l_linestatus: O + l_shipdate: 1996-11-03 + l_commitdate: 1996-12-07 + l_receiptdate: 1996-11-16 + l_shipinstruct: COLLECT COD + l_shipmode: FOB + l_comment: ld, regular theodolites. + dummy: + +#### 4.1.2 机器参数 + +* DataX ECS: 24Core48GB + +* Adb MySQL 数据库 + * 计算资源:16Core64GB(集群版) + * 弹性IO资源:3 + +#### 4.1.3 DataX jvm 参数 + + -Xms1G -Xmx10G -XX:+HeapDumpOnOutOfMemoryError + +### 4.2 测试报告 + +| 通道数 | 批量提交行数 | DataX速度(Rec/s) | DataX流量(MB/s) | 导入用时(s) | +|-----|-------|------------------|---------------|---------| +| 1 | 512 | 23071 | 2.34 | 2627 | +| 1 | 1024 | 26080 | 2.65 | 2346 | +| 1 | 2048 | 28162 | 2.86 | 2153 | +| 1 | 4096 | 28978 | 2.94 | 2119 | +| 4 | 512 | 56590 | 5.74 | 1105 | +| 4 | 1024 | 81062 | 8.22 | 763 | +| 4 | 2048 | 107117 | 10.87 | 605 | +| 4 | 4096 | 113181 | 11.48 | 579 | +| 8 | 512 | 81062 | 8.22 | 786 | +| 8 | 1024 | 127629 | 12.95 | 519 | +| 8 | 2048 | 187456 | 19.01 | 369 | +| 8 | 4096 | 206848 | 20.98 | 341 | +| 16 | 512 | 130404 | 13.23 | 513 | +| 16 | 1024 | 214235 | 21.73 | 335 | +| 16 | 2048 | 299930 | 30.42 | 253 | +| 16 | 4096 | 333255 | 33.80 | 227 | +| 32 | 512 | 206848 | 20.98 | 347 | +| 32 | 1024 | 315716 | 32.02 | 241 | +| 32 | 2048 | 399907 | 40.56 | 199 | +| 32 | 4096 | 461431 | 46.80 | 184 | +| 64 | 512 | 333255 | 33.80 | 231 | +| 64 | 1024 | 399907 | 40.56 | 204 | +| 64 | 2048 | 428471 | 43.46 | 199 | +| 64 | 4096 | 461431 | 46.80 | 187 | +| 128 | 512 | 333255 | 33.80 | 235 | +| 128 | 1024 | 399907 | 40.56 | 203 | +| 128 | 2048 | 425432 | 43.15 | 197 | +| 128 | 4096 | 387006 | 39.26 | 211 | + +说明: + +1. datax 使用 txtfilereader 读取本地文件,避免源端存在性能瓶颈。 + +#### 性能测试小结 +1. channel通道个数和batchSize对性能影响比较大 +2. 
通常不建议写入数据库时,通道个数 > 32 + +## 5 约束限制 + +## FAQ + +*** + +**Q: AdbMysqlWriter 执行 postSql 语句报错,那么数据导入到目标数据库了吗?** + +A: DataX 导入过程存在三块逻辑,pre 操作、导入操作、post 操作,其中任意一环报错,DataX 作业报错。由于 DataX 不能保证在同一个事务完成上述几个操作,因此有可能数据已经落入到目标端。 + +*** + +**Q: 按照上述说法,那么有部分脏数据导入数据库,如果影响到线上数据库怎么办?** + +A: 目前有两种解法,第一种配置 pre 语句,该 sql 可以清理当天导入数据, DataX 每次导入时候可以把上次清理干净并导入完整数据。第二种,向临时表导入数据,完成后再 rename 到线上表。 + +*** + +**Q: 上面第二种方法可以避免对线上数据造成影响,那我具体怎样操作?** + +A: 可以配置临时表导入 diff --git a/adbmysqlwriter/pom.xml b/adbmysqlwriter/pom.xml new file mode 100755 index 00000000..6ffcab85 --- /dev/null +++ b/adbmysqlwriter/pom.xml @@ -0,0 +1,79 @@ + + 4.0.0 + + com.alibaba.datax + datax-all + 0.0.1-SNAPSHOT + + adbmysqlwriter + adbmysqlwriter + jar + + + + com.alibaba.datax + datax-common + ${datax-project-version} + + + slf4j-log4j12 + org.slf4j + + + + + org.slf4j + slf4j-api + + + ch.qos.logback + logback-classic + + + + com.alibaba.datax + plugin-rdbms-util + ${datax-project-version} + + + + mysql + mysql-connector-java + 5.1.40 + + + + + + + + maven-compiler-plugin + + ${jdk-version} + ${jdk-version} + ${project-sourceEncoding} + + + + + maven-assembly-plugin + + + src/main/assembly/package.xml + + datax + + + + dwzip + package + + single + + + + + + + diff --git a/adbmysqlwriter/src/main/assembly/package.xml b/adbmysqlwriter/src/main/assembly/package.xml new file mode 100755 index 00000000..7192e531 --- /dev/null +++ b/adbmysqlwriter/src/main/assembly/package.xml @@ -0,0 +1,35 @@ + + + + dir + + false + + + src/main/resources + + plugin.json + plugin_job_template.json + + plugin/writer/adbmysqlwriter + + + target/ + + adbmysqlwriter-0.0.1-SNAPSHOT.jar + + plugin/writer/adbmysqlwriter + + + + + + false + plugin/writer/adbmysqlwriter/libs + runtime + + + diff --git a/adbmysqlwriter/src/main/java/com/alibaba/datax/plugin/writer/adbmysqlwriter/AdbMysqlWriter.java b/adbmysqlwriter/src/main/java/com/alibaba/datax/plugin/writer/adbmysqlwriter/AdbMysqlWriter.java new file mode 100755 index 00000000..762c4934 --- /dev/null +++ b/adbmysqlwriter/src/main/java/com/alibaba/datax/plugin/writer/adbmysqlwriter/AdbMysqlWriter.java @@ -0,0 +1,138 @@ +package com.alibaba.datax.plugin.writer.adbmysqlwriter; + +import com.alibaba.datax.common.element.Record; +import com.alibaba.datax.common.plugin.RecordReceiver; +import com.alibaba.datax.common.spi.Writer; +import com.alibaba.datax.common.util.Configuration; +import com.alibaba.datax.plugin.rdbms.util.DataBaseType; +import com.alibaba.datax.plugin.rdbms.writer.CommonRdbmsWriter; +import com.alibaba.datax.plugin.rdbms.writer.Key; +import org.apache.commons.lang3.StringUtils; + +import java.sql.Connection; +import java.sql.SQLException; +import java.util.List; + +public class AdbMysqlWriter extends Writer { + private static final DataBaseType DATABASE_TYPE = DataBaseType.ADB; + + public static class Job extends Writer.Job { + private Configuration originalConfig = null; + private CommonRdbmsWriter.Job commonRdbmsWriterJob; + + @Override + public void preCheck(){ + this.init(); + this.commonRdbmsWriterJob.writerPreCheck(this.originalConfig, DATABASE_TYPE); + } + + @Override + public void init() { + this.originalConfig = super.getPluginJobConf(); + this.commonRdbmsWriterJob = new CommonRdbmsWriter.Job(DATABASE_TYPE); + this.commonRdbmsWriterJob.init(this.originalConfig); + } + + // 一般来说,是需要推迟到 task 中进行pre 的执行(单表情况例外) + @Override + public void prepare() { + //实跑先不支持 权限 检验 + //this.commonRdbmsWriterJob.privilegeValid(this.originalConfig, DATABASE_TYPE); + 
this.commonRdbmsWriterJob.prepare(this.originalConfig); + } + + @Override + public List split(int mandatoryNumber) { + return this.commonRdbmsWriterJob.split(this.originalConfig, mandatoryNumber); + } + + // 一般来说,是需要推迟到 task 中进行post 的执行(单表情况例外) + @Override + public void post() { + this.commonRdbmsWriterJob.post(this.originalConfig); + } + + @Override + public void destroy() { + this.commonRdbmsWriterJob.destroy(this.originalConfig); + } + + } + + public static class Task extends Writer.Task { + + private Configuration writerSliceConfig; + private CommonRdbmsWriter.Task commonRdbmsWriterTask; + + public static class DelegateClass extends CommonRdbmsWriter.Task { + private long writeTime = 0L; + private long writeCount = 0L; + private long lastLogTime = 0; + + public DelegateClass(DataBaseType dataBaseType) { + super(dataBaseType); + } + + @Override + protected void doBatchInsert(Connection connection, List buffer) + throws SQLException { + long startTime = System.currentTimeMillis(); + + super.doBatchInsert(connection, buffer); + + writeCount = writeCount + buffer.size(); + writeTime = writeTime + (System.currentTimeMillis() - startTime); + + // log write metrics every 10 seconds + if (System.currentTimeMillis() - lastLogTime > 10000) { + lastLogTime = System.currentTimeMillis(); + logTotalMetrics(); + } + } + + public void logTotalMetrics() { + LOG.info(Thread.currentThread().getName() + ", AdbMySQL writer take " + writeTime + " ms, write " + writeCount + " records."); + } + } + + @Override + public void init() { + this.writerSliceConfig = super.getPluginJobConf(); + + if (StringUtils.isBlank(this.writerSliceConfig.getString(Key.WRITE_MODE))) { + this.writerSliceConfig.set(Key.WRITE_MODE, "REPLACE"); + } + + this.commonRdbmsWriterTask = new DelegateClass(DATABASE_TYPE); + this.commonRdbmsWriterTask.init(this.writerSliceConfig); + } + + @Override + public void prepare() { + this.commonRdbmsWriterTask.prepare(this.writerSliceConfig); + } + + //TODO 改用连接池,确保每次获取的连接都是可用的(注意:连接可能需要每次都初始化其 session) + public void startWrite(RecordReceiver recordReceiver) { + this.commonRdbmsWriterTask.startWrite(recordReceiver, this.writerSliceConfig, + super.getTaskPluginCollector()); + } + + @Override + public void post() { + this.commonRdbmsWriterTask.post(this.writerSliceConfig); + } + + @Override + public void destroy() { + this.commonRdbmsWriterTask.destroy(this.writerSliceConfig); + } + + @Override + public boolean supportFailOver(){ + String writeMode = writerSliceConfig.getString(Key.WRITE_MODE); + return "replace".equalsIgnoreCase(writeMode); + } + + } +} diff --git a/adbmysqlwriter/src/main/resources/plugin.json b/adbmysqlwriter/src/main/resources/plugin.json new file mode 100755 index 00000000..58c69533 --- /dev/null +++ b/adbmysqlwriter/src/main/resources/plugin.json @@ -0,0 +1,6 @@ +{ + "name": "adbmysqlwriter", + "class": "com.alibaba.datax.plugin.writer.adbmysqlwriter.AdbMysqlWriter", + "description": "useScene: prod. mechanism: Jdbc connection using the database, execute insert sql. 
warn: The more you know about the database, the less problems you encounter.", + "developer": "alibaba" +} \ No newline at end of file diff --git a/adbmysqlwriter/src/main/resources/plugin_job_template.json b/adbmysqlwriter/src/main/resources/plugin_job_template.json new file mode 100644 index 00000000..9537ee5a --- /dev/null +++ b/adbmysqlwriter/src/main/resources/plugin_job_template.json @@ -0,0 +1,20 @@ +{ + "name": "adbmysqlwriter", + "parameter": { + "username": "username", + "password": "password", + "column": ["col1", "col2", "col3"], + "connection": [ + { + "jdbcUrl": "jdbc:mysql://:[/]", + "table": ["table1", "table2"] + } + ], + "preSql": [], + "postSql": [], + "batchSize": 65536, + "batchByteSize": 134217728, + "dryRun": false, + "writeMode": "insert" + } +} \ No newline at end of file diff --git a/adswriter/doc/adswriter.md b/adswriter/doc/adswriter.md index 4a0fd961..c02f8018 100644 --- a/adswriter/doc/adswriter.md +++ b/adswriter/doc/adswriter.md @@ -110,7 +110,6 @@ DataX 将数据直连ADS接口,利用ADS暴露的INSERT接口直写到ADS。 "account": "xxx@aliyun.com", "odpsServer": "xxx", "tunnelServer": "xxx", - "accountType": "aliyun", "project": "transfer_project" }, "writeMode": "load", diff --git a/adswriter/src/main/java/com/alibaba/datax/plugin/writer/adswriter/insert/AdsClientProxy.java b/adswriter/src/main/java/com/alibaba/datax/plugin/writer/adswriter/insert/AdsClientProxy.java index 8fdc70d6..326b464d 100644 --- a/adswriter/src/main/java/com/alibaba/datax/plugin/writer/adswriter/insert/AdsClientProxy.java +++ b/adswriter/src/main/java/com/alibaba/datax/plugin/writer/adswriter/insert/AdsClientProxy.java @@ -18,7 +18,7 @@ import com.alibaba.datax.plugin.writer.adswriter.AdsWriterErrorCode; import com.alibaba.datax.plugin.writer.adswriter.ads.TableInfo; import com.alibaba.datax.plugin.writer.adswriter.util.Constant; import com.alibaba.datax.plugin.writer.adswriter.util.Key; -import com.alibaba.fastjson.JSON; +import com.alibaba.fastjson2.JSON; import org.apache.commons.lang3.StringUtils; import org.apache.commons.lang3.tuple.Pair; import org.slf4j.Logger; diff --git a/adswriter/src/main/java/com/alibaba/datax/plugin/writer/adswriter/load/TransferProjectConf.java b/adswriter/src/main/java/com/alibaba/datax/plugin/writer/adswriter/load/TransferProjectConf.java index bff4b7b9..3d28a833 100644 --- a/adswriter/src/main/java/com/alibaba/datax/plugin/writer/adswriter/load/TransferProjectConf.java +++ b/adswriter/src/main/java/com/alibaba/datax/plugin/writer/adswriter/load/TransferProjectConf.java @@ -12,7 +12,6 @@ public class TransferProjectConf { public final static String KEY_ACCOUNT = "odps.account"; public final static String KEY_ODPS_SERVER = "odps.odpsServer"; public final static String KEY_ODPS_TUNNEL = "odps.tunnelServer"; - public final static String KEY_ACCOUNT_TYPE = "odps.accountType"; public final static String KEY_PROJECT = "odps.project"; private String accessId; @@ -20,7 +19,6 @@ public class TransferProjectConf { private String account; private String odpsServer; private String odpsTunnel; - private String accountType; private String project; public static TransferProjectConf create(Configuration adsWriterConf) { @@ -30,7 +28,6 @@ public class TransferProjectConf { res.account = adsWriterConf.getString(KEY_ACCOUNT); res.odpsServer = adsWriterConf.getString(KEY_ODPS_SERVER); res.odpsTunnel = adsWriterConf.getString(KEY_ODPS_TUNNEL); - res.accountType = adsWriterConf.getString(KEY_ACCOUNT_TYPE, "aliyun"); res.project = adsWriterConf.getString(KEY_PROJECT); return res; } @@ -55,9 +52,6 @@ 
public class TransferProjectConf { return odpsTunnel; } - public String getAccountType() { - return accountType; - } public String getProject() { return project; diff --git a/adswriter/src/main/java/com/alibaba/datax/plugin/writer/adswriter/odps/DataType.java b/adswriter/src/main/java/com/alibaba/datax/plugin/writer/adswriter/odps/DataType.java index 595b1dfd..f625336e 100644 --- a/adswriter/src/main/java/com/alibaba/datax/plugin/writer/adswriter/odps/DataType.java +++ b/adswriter/src/main/java/com/alibaba/datax/plugin/writer/adswriter/odps/DataType.java @@ -70,7 +70,7 @@ public class DataType { } else if ("datetime".equals(type)) { return DATETIME; } else { - throw new IllegalArgumentException("unkown type: " + type); + throw new IllegalArgumentException("unknown type: " + type); } } diff --git a/cassandrareader/src/main/java/com/alibaba/datax/plugin/reader/cassandrareader/CassandraReaderHelper.java b/cassandrareader/src/main/java/com/alibaba/datax/plugin/reader/cassandrareader/CassandraReaderHelper.java index 0a4e83fa..f5937c2f 100644 --- a/cassandrareader/src/main/java/com/alibaba/datax/plugin/reader/cassandrareader/CassandraReaderHelper.java +++ b/cassandrareader/src/main/java/com/alibaba/datax/plugin/reader/cassandrareader/CassandraReaderHelper.java @@ -23,7 +23,7 @@ import com.alibaba.datax.common.element.StringColumn; import com.alibaba.datax.common.exception.DataXException; import com.alibaba.datax.common.plugin.TaskPluginCollector; import com.alibaba.datax.common.util.Configuration; -import com.alibaba.fastjson.JSON; +import com.alibaba.fastjson2.JSON; import com.datastax.driver.core.Cluster; import com.datastax.driver.core.CodecRegistry; @@ -298,6 +298,7 @@ public class CassandraReaderHelper { record.addColumn(new LongColumn(rs.getInt(i))); break; + case COUNTER: case BIGINT: record.addColumn(new LongColumn(rs.getLong(i))); break; @@ -558,26 +559,6 @@ public class CassandraReaderHelper { String.format( "配置信息有错误.列信息中需要包含'%s'字段 .",Key.COLUMN_NAME)); } - if( name.startsWith(Key.WRITE_TIME) ) { - String colName = name.substring(Key.WRITE_TIME.length(),name.length() - 1 ); - ColumnMetadata col = tableMetadata.getColumn(colName); - if( col == null ) { - throw DataXException - .asDataXException( - CassandraReaderErrorCode.CONF_ERROR, - String.format( - "配置信息有错误.列'%s'不存在 .",colName)); - } - } else { - ColumnMetadata col = tableMetadata.getColumn(name); - if( col == null ) { - throw DataXException - .asDataXException( - CassandraReaderErrorCode.CONF_ERROR, - String.format( - "配置信息有错误.列'%s'不存在 .",name)); - } - } } } diff --git a/cassandrawriter/src/main/java/com/alibaba/datax/plugin/writer/cassandrawriter/CassandraWriterHelper.java b/cassandrawriter/src/main/java/com/alibaba/datax/plugin/writer/cassandrawriter/CassandraWriterHelper.java index b68af281..5ac392b7 100644 --- a/cassandrawriter/src/main/java/com/alibaba/datax/plugin/writer/cassandrawriter/CassandraWriterHelper.java +++ b/cassandrawriter/src/main/java/com/alibaba/datax/plugin/writer/cassandrawriter/CassandraWriterHelper.java @@ -18,10 +18,10 @@ import java.util.UUID; import com.alibaba.datax.common.element.Column; import com.alibaba.datax.common.exception.DataXException; -import com.alibaba.fastjson.JSON; -import com.alibaba.fastjson.JSONArray; -import com.alibaba.fastjson.JSONException; -import com.alibaba.fastjson.JSONObject; +import com.alibaba.fastjson2.JSON; +import com.alibaba.fastjson2.JSONArray; +import com.alibaba.fastjson2.JSONException; +import com.alibaba.fastjson2.JSONObject; import 
com.datastax.driver.core.BoundStatement; import com.datastax.driver.core.CodecRegistry; @@ -204,7 +204,7 @@ public class CassandraWriterHelper { case MAP: { Map m = new HashMap(); - for (JSONObject.Entry e : ((JSONObject)jsonObject).entrySet()) { + for (Map.Entry e : ((JSONObject)jsonObject).entrySet()) { Object k = parseFromString((String) e.getKey(), type.getTypeArguments().get(0)); Object v = parseFromJson(e.getValue(), type.getTypeArguments().get(1)); m.put(k,v); @@ -233,7 +233,7 @@ public class CassandraWriterHelper { case UDT: { UDTValue t = ((UserType) type).newValue(); UserType userType = t.getType(); - for (JSONObject.Entry e : ((JSONObject)jsonObject).entrySet()) { + for (Map.Entry e : ((JSONObject)jsonObject).entrySet()) { DataType eleType = userType.getFieldType((String)e.getKey()); t.set((String)e.getKey(), parseFromJson(e.getValue(), eleType), registry.codecFor(eleType).getJavaType()); } diff --git a/clickhousereader/doc/clickhousereader.md b/clickhousereader/doc/clickhousereader.md new file mode 100644 index 00000000..bf3cd203 --- /dev/null +++ b/clickhousereader/doc/clickhousereader.md @@ -0,0 +1,344 @@ + +# ClickhouseReader 插件文档 + + +___ + + +## 1 快速介绍 + +ClickhouseReader插件实现了从Clickhouse读取数据。在底层实现上,ClickhouseReader通过JDBC连接远程Clickhouse数据库,并执行相应的sql语句将数据从Clickhouse库中SELECT出来。 + +## 2 实现原理 + +简而言之,ClickhouseReader通过JDBC连接器连接到远程的Clickhouse数据库,并根据用户配置的信息生成查询SELECT SQL语句并发送到远程Clickhouse数据库,并将该SQL执行返回结果使用DataX自定义的数据类型拼装为抽象的数据集,并传递给下游Writer处理。 + +对于用户配置Table、Column、Where的信息,ClickhouseReader将其拼接为SQL语句发送到Clickhouse数据库;对于用户配置querySql信息,Clickhouse直接将其发送到Clickhouse数据库。 + + +## 3 功能说明 + +### 3.1 配置样例 + +* 配置一个从Clickhouse数据库同步抽取数据到本地的作业: + +``` +{ + "job": { + "setting": { + "speed": { + //设置传输速度 byte/s 尽量逼近这个速度但是不高于它. + // channel 表示通道数量,byte表示通道速度,如果单通道速度1MB,配置byte为1048576表示一个channel + "byte": 1048576 + }, + //出错限制 + "errorLimit": { + //先选择record + "record": 0, + //百分比 1表示100% + "percentage": 0.02 + } + }, + "content": [ + { + "reader": { + "name": "clickhousereader", + "parameter": { + // 数据库连接用户名 + "username": "root", + // 数据库连接密码 + "password": "root", + "column": [ + "id","name" + ], + "connection": [ + { + "table": [ + "table" + ], + "jdbcUrl": [ + "jdbc:clickhouse://[HOST_NAME]:PORT/[DATABASE_NAME]" + ] + } + ] + } + }, + "writer": { + //writer类型 + "name": "streamwriter", + // 是否打印内容 + "parameter": { + "print": true + } + } + } + ] + } +} + +``` + +* 配置一个自定义SQL的数据库同步任务到本地内容的作业: + +``` +{ + "job": { + "setting": { + "speed": { + "channel": 5 + } + }, + "content": [ + { + "reader": { + "name": "clickhousereader", + "parameter": { + "username": "root", + "password": "root", + "where": "", + "connection": [ + { + "querySql": [ + "select db_id,on_line_flag from db_info where db_id < 10" + ], + "jdbcUrl": [ + "jdbc:clickhouse://1.1.1.1:8123/default" + ] + } + ] + } + }, + "writer": { + "name": "streamwriter", + "parameter": { + "visible": false, + "encoding": "UTF-8" + } + } + } + ] + } +} +``` + + +### 3.2 参数说明 + +* **jdbcUrl** + + * 描述:描述的是到对端数据库的JDBC连接信息,使用JSON的数组描述,并支持一个库填写多个连接地址。之所以使用JSON数组描述连接信息,是因为阿里集团内部支持多个IP探测,如果配置了多个,ClickhouseReader可以依次探测ip的可连接性,直到选择一个合法的IP。如果全部连接失败,ClickhouseReader报错。 注意,jdbcUrl必须包含在connection配置单元中。对于阿里集团外部使用情况,JSON数组填写一个JDBC连接即可。 + + jdbcUrl按照Clickhouse官方规范,并可以填写连接附件控制信息。具体请参看[Clickhouse官方文档](https://clickhouse.com/docs/en/engines/table-engines/integrations/jdbc)。 + + * 必选:是
+ + * 默认值:无
+ +* **username** + + * 描述:数据源的用户名
+ + * 必选:是
+ + * 默认值:无
+ +* **password** + + * 描述:数据源指定用户名的密码
+ + * 必选:是
+ + * 默认值:无
+ +* **table** + + * 描述:所选取的需要同步的表。使用JSON的数组描述,因此支持多张表同时抽取。当配置为多张表时,用户自己需保证多张表是同一schema结构,ClickhouseReader不予检查表是否同一逻辑表。注意,table必须包含在connection配置单元中。
+ + * 必选:是
+ + * 默认值:无
+ +* **column** + + * 描述:所配置的表中需要同步的列名集合,使用JSON的数组描述字段信息。用户使用\*代表默认使用所有列配置,例如['\*']。 + + 支持列裁剪,即列可以挑选部分列进行导出。 + + 支持列换序,即列可以不按照表schema信息进行导出。 + + 支持常量配置,用户需要按照JSON格式: + ["id", "\`table\`", "1", "'bazhen.csy'", "null", "to_char(a + 1)", "2.3" , "true"] + id为普通列名,\`table\`为包含保留字的列名,1为整型数字常量,'bazhen.csy'为字符串常量,null为空指针,to_char(a + 1)为表达式,2.3为浮点数,true为布尔值。 + + Column必须显式填写,不允许为空! + + * 必选:是
+ + * 默认值:无
+ +* **splitPk** + + * 描述:ClickhouseReader进行数据抽取时,如果指定splitPk,表示用户希望使用splitPk代表的字段进行数据分片,DataX因此会启动并发任务进行数据同步,这样可以大大提高数据同步的效能。 + + 推荐用户使用表主键作为splitPk,因为表主键通常情况下比较均匀,因此切分出来的分片也不容易出现数据热点。 + + 目前splitPk仅支持整型数据切分,`不支持浮点、日期等其他类型`。如果用户指定其他非支持类型,ClickhouseReader将报错! + + splitPk如果不填写,将视作用户不对单表进行切分,ClickhouseReader使用单通道同步全量数据。 + + * 必选:否
+ + * 默认值:无
+ +* **where** + + * 描述:筛选条件,ClickhouseReader根据指定的column、table、where条件拼接SQL,并根据这个SQL进行数据抽取。在实际业务场景中,往往会选择当天的数据进行同步,可以将where条件指定为gmt_create > $bizdate。注意:不可以将where条件指定为limit 10,limit不是SQL的合法where子句。
+ + where条件可以有效地进行业务增量同步。 + + * 必选:否
+ + * 默认值:无
+ +* **querySql** + + * 描述:在有些业务场景下,where这一配置项不足以描述所筛选的条件,用户可以通过该配置项来自定义筛选SQL。当用户配置了这一项之后,DataX系统就会忽略table,column这些配置项,直接使用这个配置项的内容对数据进行筛选,例如需要进行多表join后同步数据,使用select a,b from table_a join table_b on table_a.id = table_b.id
+ + `当用户配置querySql时,ClickhouseReader直接忽略table、column、where条件的配置`。 + + * 必选:否
+ + * 默认值:无
+ +* **fetchSize** + + * 描述:该配置项定义了插件和数据库服务器端每次批量数据获取条数,该值决定了DataX和服务器端的网络交互次数,能够较大地提升数据抽取性能。
+ + `注意,该值过大(>2048)可能造成DataX进程OOM。`。 + + * 必选:否
+ + * 默认值:1024
+ +* **session** + + * 描述:控制写入数据的时间格式,时区等的配置,如果表中有时间字段,配置该值以明确告知写入 clickhouse 的时间格式。通常配置的参数为:NLS_DATE_FORMAT,NLS_TIME_FORMAT。其配置的值为 json 格式,例如: +``` +"session": [ + "alter session set NLS_DATE_FORMAT='yyyy-mm-dd hh24:mi:ss'", + "alter session set NLS_TIMESTAMP_FORMAT='yyyy-mm-dd hh24:mi:ss'", + "alter session set NLS_TIMESTAMP_TZ_FORMAT='yyyy-mm-dd hh24:mi:ss'", + "alter session set TIME_ZONE='US/Pacific'" + ] +``` + `(注意"是 " 的转义字符串)`。 + + * 必选:否
+ + * 默认值:无
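+
+下面用一小段 Java 代码示意按整型 splitPk 划分并发抽取区间的大致思路(对应上文 splitPk 参数的说明)。其中切分列 id、取值范围和通道数都是假设值,并不代表 DataX 内部的真实切分实现,仅帮助理解:
+
+```java
+public class SplitPkRangeDemo {
+    public static void main(String[] args) {
+        // 假设已经通过 select min(id), max(id) 查到 splitPk 列的取值范围
+        long min = 1L, max = 100_000L;
+        int channel = 4; // 并发通道数,每个通道负责一段区间
+
+        long step = (max - min + 1 + channel - 1) / channel; // 区间长度,向上取整
+        for (int i = 0; i < channel; i++) {
+            long start = min + step * i;
+            long end = Math.min(start + step - 1, max);
+            // 每个并发任务各自带一段 where 条件去抽取数据
+            System.out.println("select ... from table1 where id >= " + start + " and id <= " + end);
+        }
+    }
+}
+```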
+ + +### 3.3 类型转换 + +目前ClickhouseReader支持大部分Clickhouse类型,但也存在部分个别类型没有支持的情况,请注意检查你的类型。 + +下面列出ClickhouseReader针对Clickhouse类型转换列表: + + +| DataX 内部类型| Clickhouse 数据类型 | +| -------- |--------------------------------------------------------------------------------------------| +| Long | UInt8, UInt16, UInt32, UInt64, UInt128, UInt256, Int8, Int16, Int32, Int64, Int128, Int256 | +| Double | Float32, Float64, Decimal | +| String | String, FixedString | +| Date | DATE, Date32, DateTime, DateTime64 | +| Boolean | Boolean | +| Bytes | BLOB,BFILE,RAW,LONG RAW | + + + +请注意: + +* `除上述罗列字段类型外,其他类型均不支持`。 + + +## 4 性能报告 + +### 4.1 环境准备 + +#### 4.1.1 数据特征 + +为了模拟线上真实数据,我们设计两个Clickhouse数据表,分别为: + +#### 4.1.2 机器参数 + +* 执行DataX的机器参数为: + +* Clickhouse数据库机器参数为: + +### 4.2 测试报告 + +#### 4.2.1 表1测试报告 + + +| 并发任务数| DataX速度(Rec/s)|DataX流量|网卡流量|DataX运行负载|DB运行负载| +|--------| --------|--------|--------|--------|--------| +|1| DataX 统计速度(Rec/s)|DataX统计流量|网卡流量|DataX运行负载|DB运行负载| + +## 5 约束限制 + +### 5.1 主备同步数据恢复问题 + +主备同步问题指Clickhouse使用主从灾备,备库从主库不间断通过binlog恢复数据。由于主备数据同步存在一定的时间差,特别在于某些特定情况,例如网络延迟等问题,导致备库同步恢复的数据与主库有较大差别,导致从备库同步的数据不是一份当前时间的完整镜像。 + +针对这个问题,我们提供了preSql功能,该功能待补充。 + +### 5.2 一致性约束 + +Clickhouse在数据存储划分中属于RDBMS系统,对外可以提供强一致性数据查询接口。例如当一次同步任务启动运行过程中,当该库存在其他数据写入方写入数据时,ClickhouseReader完全不会获取到写入更新数据,这是由于数据库本身的快照特性决定的。关于数据库快照特性,请参看[MVCC Wikipedia](https://en.wikipedia.org/wiki/Multiversion_concurrency_control) + +上述是在ClickhouseReader单线程模型下数据同步一致性的特性,由于ClickhouseReader可以根据用户配置信息使用了并发数据抽取,因此不能严格保证数据一致性:当ClickhouseReader根据splitPk进行数据切分后,会先后启动多个并发任务完成数据同步。由于多个并发任务相互之间不属于同一个读事务,同时多个并发任务存在时间间隔。因此这份数据并不是`完整的`、`一致的`数据快照信息。 + +针对多线程的一致性快照需求,在技术上目前无法实现,只能从工程角度解决,工程化的方式存在取舍,我们提供几个解决思路给用户,用户可以自行选择: + +1. 使用单线程同步,即不再进行数据切片。缺点是速度比较慢,但是能够很好保证一致性。 + +2. 关闭其他数据写入方,保证当前数据为静态数据,例如,锁表、关闭备库同步等等。缺点是可能影响在线业务。 + +### 5.3 数据库编码问题 + + +ClickhouseReader底层使用JDBC进行数据抽取,JDBC天然适配各类编码,并在底层进行了编码转换。因此ClickhouseReader不需用户指定编码,可以自动获取编码并转码。 + +对于Clickhouse底层写入编码和其设定的编码不一致的混乱情况,ClickhouseReader对此无法识别,对此也无法提供解决方案,对于这类情况,`导出有可能为乱码`。 + +### 5.4 增量数据同步 + +ClickhouseReader使用JDBC SELECT语句完成数据抽取工作,因此可以使用SELECT...WHERE...进行增量数据抽取,方式有多种: + +* 数据库在线应用写入数据库时,填充modify字段为更改时间戳,包括新增、更新、删除(逻辑删)。对于这类应用,ClickhouseReader只需要WHERE条件跟上一同步阶段时间戳即可。 +* 对于新增流水型数据,ClickhouseReader可以WHERE条件后跟上一阶段最大自增ID即可。 + +对于业务上无字段区分新增、修改数据情况,ClickhouseReader也无法进行增量数据同步,只能同步全量数据。 + +### 5.5 Sql安全性 + +ClickhouseReader提供querySql语句交给用户自己实现SELECT抽取语句,ClickhouseReader本身对querySql不做任何安全性校验。这块交由DataX用户方自己保证。 + +## 6 FAQ + +*** + +**Q: ClickhouseReader同步报错,报错信息为XXX** + + A: 网络或者权限问题,请使用Clickhouse命令行测试 + + +如果上述命令也报错,那可以证实是环境问题,请联系你的DBA。 + + +**Q: ClickhouseReader抽取速度很慢怎么办?** + + A: 影响抽取时间的原因大概有如下几个:(来自专业 DBA 卫绾) + 1. 由于SQL的plan异常,导致的抽取时间长; 在抽取时,尽可能使用全表扫描代替索引扫描; + 2. 合理sql的并发度,减少抽取时间; + 3. 
抽取sql要简单,尽量不用replace等函数,这个非常消耗cpu,会严重影响抽取速度; diff --git a/clickhousereader/pom.xml b/clickhousereader/pom.xml new file mode 100644 index 00000000..4b095796 --- /dev/null +++ b/clickhousereader/pom.xml @@ -0,0 +1,91 @@ + + + + datax-all + com.alibaba.datax + 0.0.1-SNAPSHOT + + + 4.0.0 + clickhousereader + clickhousereader + jar + + + + ru.yandex.clickhouse + clickhouse-jdbc + 0.2.4 + + + com.alibaba.datax + datax-core + ${datax-project-version} + + + com.alibaba.datax + datax-common + ${datax-project-version} + + + org.slf4j + slf4j-api + + + + ch.qos.logback + logback-classic + + + + com.alibaba.datax + plugin-rdbms-util + ${datax-project-version} + + + + + + + src/main/java + + **/*.properties + + + + + + + maven-compiler-plugin + + ${jdk-version} + ${jdk-version} + ${project-sourceEncoding} + + + + + maven-assembly-plugin + + + src/main/assembly/package.xml + + datax + + + + dwzip + package + + single + + + + + + + + + \ No newline at end of file diff --git a/clickhousereader/src/main/assembly/package.xml b/clickhousereader/src/main/assembly/package.xml new file mode 100644 index 00000000..9dc7fc13 --- /dev/null +++ b/clickhousereader/src/main/assembly/package.xml @@ -0,0 +1,35 @@ + + + + dir + + false + + + src/main/resources + + plugin.json + plugin_job_template.json + + plugin/reader/clickhousereader + + + target/ + + clickhousereader-0.0.1-SNAPSHOT.jar + + plugin/reader/clickhousereader + + + + + + false + plugin/reader/clickhousereader/libs + runtime + + + \ No newline at end of file diff --git a/clickhousereader/src/main/java/com/alibaba/datax/plugin/reader/clickhousereader/ClickhouseReader.java b/clickhousereader/src/main/java/com/alibaba/datax/plugin/reader/clickhousereader/ClickhouseReader.java new file mode 100644 index 00000000..cfa6be99 --- /dev/null +++ b/clickhousereader/src/main/java/com/alibaba/datax/plugin/reader/clickhousereader/ClickhouseReader.java @@ -0,0 +1,85 @@ +package com.alibaba.datax.plugin.reader.clickhousereader; + +import java.sql.Array; +import java.sql.ResultSet; +import java.sql.ResultSetMetaData; +import java.sql.SQLException; +import java.sql.Types; +import java.util.List; + +import com.alibaba.datax.common.element.Record; +import com.alibaba.datax.common.element.StringColumn; +import com.alibaba.datax.common.plugin.RecordSender; +import com.alibaba.datax.common.plugin.TaskPluginCollector; +import com.alibaba.datax.common.spi.Reader; +import com.alibaba.datax.common.util.Configuration; +import com.alibaba.datax.common.util.MessageSource; +import com.alibaba.datax.plugin.rdbms.reader.CommonRdbmsReader; +import com.alibaba.datax.plugin.rdbms.util.DataBaseType; +import com.alibaba.fastjson2.JSON; + +import org.slf4j.Logger; +import org.slf4j.LoggerFactory; + +public class ClickhouseReader extends Reader { + + private static final DataBaseType DATABASE_TYPE = DataBaseType.ClickHouse; + private static final Logger LOG = LoggerFactory.getLogger(ClickhouseReader.class); + + public static class Job extends Reader.Job { + private Configuration jobConfig = null; + private CommonRdbmsReader.Job commonRdbmsReaderMaster; + + @Override + public void init() { + this.jobConfig = super.getPluginJobConf(); + this.commonRdbmsReaderMaster = new CommonRdbmsReader.Job(DATABASE_TYPE); + this.commonRdbmsReaderMaster.init(this.jobConfig); + } + + @Override + public List split(int mandatoryNumber) { + return this.commonRdbmsReaderMaster.split(this.jobConfig, mandatoryNumber); + } + + @Override + public void post() { + this.commonRdbmsReaderMaster.post(this.jobConfig); + } + + 
@Override + public void destroy() { + this.commonRdbmsReaderMaster.destroy(this.jobConfig); + } + } + + public static class Task extends Reader.Task { + + private Configuration jobConfig; + private CommonRdbmsReader.Task commonRdbmsReaderSlave; + + @Override + public void init() { + this.jobConfig = super.getPluginJobConf(); + this.commonRdbmsReaderSlave = new CommonRdbmsReader.Task(DATABASE_TYPE, super.getTaskGroupId(), super.getTaskId()); + this.commonRdbmsReaderSlave.init(this.jobConfig); + } + + @Override + public void startRead(RecordSender recordSender) { + int fetchSize = this.jobConfig.getInt(com.alibaba.datax.plugin.rdbms.reader.Constant.FETCH_SIZE, 1000); + + this.commonRdbmsReaderSlave.startRead(this.jobConfig, recordSender, super.getTaskPluginCollector(), fetchSize); + } + + @Override + public void post() { + this.commonRdbmsReaderSlave.post(this.jobConfig); + } + + @Override + public void destroy() { + this.commonRdbmsReaderSlave.destroy(this.jobConfig); + } + } +} diff --git a/clickhousereader/src/main/resources/plugin.json b/clickhousereader/src/main/resources/plugin.json new file mode 100644 index 00000000..5d608f6c --- /dev/null +++ b/clickhousereader/src/main/resources/plugin.json @@ -0,0 +1,6 @@ +{ + "name": "clickhousereader", + "class": "com.alibaba.datax.plugin.reader.clickhousereader.ClickhouseReader", + "description": "useScene: prod. mechanism: Jdbc connection using the database, execute select sql.", + "developer": "alibaba" +} \ No newline at end of file diff --git a/clickhousereader/src/main/resources/plugin_job_template.json b/clickhousereader/src/main/resources/plugin_job_template.json new file mode 100644 index 00000000..1814e510 --- /dev/null +++ b/clickhousereader/src/main/resources/plugin_job_template.json @@ -0,0 +1,16 @@ +{ + "name": "clickhousereader", + "parameter": { + "username": "username", + "password": "password", + "column": ["col1", "col2", "col3"], + "connection": [ + { + "jdbcUrl": "jdbc:clickhouse://:[/]", + "table": ["table1", "table2"] + } + ], + "preSql": [], + "postSql": [] + } +} \ No newline at end of file diff --git a/clickhousereader/src/test/resources/basic1.json b/clickhousereader/src/test/resources/basic1.json new file mode 100755 index 00000000..c45a45e7 --- /dev/null +++ b/clickhousereader/src/test/resources/basic1.json @@ -0,0 +1,57 @@ +{ + "job": { + "setting": { + "speed": { + "channel": 5 + } + }, + "content": [ + { + "reader": { + "name": "clickhousereader", + "parameter": { + "username": "XXXX", + "password": "XXXX", + "column": [ + "uint8_col", + "uint16_col", + "uint32_col", + "uint64_col", + "int8_col", + "int16_col", + "int32_col", + "int64_col", + "float32_col", + "float64_col", + "bool_col", + "str_col", + "fixedstr_col", + "uuid_col", + "date_col", + "datetime_col", + "enum_col", + "ary_uint8_col", + "ary_str_col", + "tuple_col", + "nullable_col", + "nested_col.nested_id", + "nested_col.nested_str", + "ipv4_col", + "ipv6_col", + "decimal_col" + ], + "connection": [ + { + "table": [ + "all_type_tbl" + ], + "jdbcUrl":["jdbc:clickhouse://XXXX:8123/default"] + } + ] + } + }, + "writer": {} + } + ] + } +} \ No newline at end of file diff --git a/clickhousereader/src/test/resources/basic1.sql b/clickhousereader/src/test/resources/basic1.sql new file mode 100644 index 00000000..f937b889 --- /dev/null +++ b/clickhousereader/src/test/resources/basic1.sql @@ -0,0 +1,34 @@ +CREATE TABLE IF NOT EXISTS default.all_type_tbl +( +`uint8_col` UInt8, +`uint16_col` UInt16, +uint32_col UInt32, +uint64_col UInt64, +int8_col Int8, 
+int16_col Int16, +int32_col Int32, +int64_col Int64, +float32_col Float32, +float64_col Float64, +bool_col UInt8, +str_col String, +fixedstr_col FixedString(3), +uuid_col UUID, +date_col Date, +datetime_col DateTime, +enum_col Enum('hello' = 1, 'world' = 2), +ary_uint8_col Array(UInt8), +ary_str_col Array(String), +tuple_col Tuple(UInt8, String), +nullable_col Nullable(UInt8), +nested_col Nested + ( + nested_id UInt32, + nested_str String + ), +ipv4_col IPv4, +ipv6_col IPv6, +decimal_col Decimal(5,3) +) +ENGINE = MergeTree() +ORDER BY (uint8_col); \ No newline at end of file diff --git a/clickhousewriter/src/main/java/com/alibaba/datax/plugin/writer/clickhousewriter/ClickhouseWriter.java b/clickhousewriter/src/main/java/com/alibaba/datax/plugin/writer/clickhousewriter/ClickhouseWriter.java index b928d421..83c421ee 100644 --- a/clickhousewriter/src/main/java/com/alibaba/datax/plugin/writer/clickhousewriter/ClickhouseWriter.java +++ b/clickhousewriter/src/main/java/com/alibaba/datax/plugin/writer/clickhousewriter/ClickhouseWriter.java @@ -10,8 +10,8 @@ import com.alibaba.datax.common.util.Configuration; import com.alibaba.datax.plugin.rdbms.util.DBUtilErrorCode; import com.alibaba.datax.plugin.rdbms.util.DataBaseType; import com.alibaba.datax.plugin.rdbms.writer.CommonRdbmsWriter; -import com.alibaba.fastjson.JSON; -import com.alibaba.fastjson.JSONArray; +import com.alibaba.fastjson2.JSON; +import com.alibaba.fastjson2.JSONArray; import java.sql.Array; import java.sql.Connection; @@ -68,7 +68,7 @@ public class ClickhouseWriter extends Writer { this.commonRdbmsWriterSlave = new CommonRdbmsWriter.Task(DATABASE_TYPE) { @Override - protected PreparedStatement fillPreparedStatementColumnType(PreparedStatement preparedStatement, int columnIndex, int columnSqltype, Column column) throws SQLException { + protected PreparedStatement fillPreparedStatementColumnType(PreparedStatement preparedStatement, int columnIndex, int columnSqltype, String typeName, Column column) throws SQLException { try { if (column.getRawData() == null) { preparedStatement.setNull(columnIndex + 1, columnSqltype); diff --git a/clickhousewriter/src/main/resources/plugin.json b/clickhousewriter/src/main/resources/plugin.json index ff1acf01..d70e2b1d 100755 --- a/clickhousewriter/src/main/resources/plugin.json +++ b/clickhousewriter/src/main/resources/plugin.json @@ -2,5 +2,5 @@ "name": "clickhousewriter", "class": "com.alibaba.datax.plugin.writer.clickhousewriter.ClickhouseWriter", "description": "useScene: prod. 
mechanism: Jdbc connection using the database, execute insert sql.", - "developer": "jiye.tjy" + "developer": "alibaba" } \ No newline at end of file diff --git a/common/pom.xml b/common/pom.xml index eafdb5da..59d7073d 100755 --- a/common/pom.xml +++ b/common/pom.xml @@ -17,8 +17,8 @@ commons-lang3 - com.alibaba - fastjson + com.alibaba.fastjson2 + fastjson2 commons-io diff --git a/common/src/main/java/com/alibaba/datax/common/element/Column.java b/common/src/main/java/com/alibaba/datax/common/element/Column.java index 2e093a7a..13cfc7de 100755 --- a/common/src/main/java/com/alibaba/datax/common/element/Column.java +++ b/common/src/main/java/com/alibaba/datax/common/element/Column.java @@ -1,6 +1,6 @@ package com.alibaba.datax.common.element; -import com.alibaba.fastjson.JSON; +import com.alibaba.fastjson2.JSON; import java.math.BigDecimal; import java.math.BigInteger; diff --git a/common/src/main/java/com/alibaba/datax/common/element/DateColumn.java b/common/src/main/java/com/alibaba/datax/common/element/DateColumn.java index f688d163..df5e1e4a 100755 --- a/common/src/main/java/com/alibaba/datax/common/element/DateColumn.java +++ b/common/src/main/java/com/alibaba/datax/common/element/DateColumn.java @@ -5,6 +5,7 @@ import com.alibaba.datax.common.exception.DataXException; import java.math.BigDecimal; import java.math.BigInteger; +import java.sql.Time; import java.util.Date; /** @@ -12,18 +13,54 @@ import java.util.Date; */ public class DateColumn extends Column { - private DateType subType = DateType.DATETIME; + private DateType subType = DateType.DATETIME; - public static enum DateType { - DATE, TIME, DATETIME - } + private int nanos = 0; - /** - * 构建值为null的DateColumn,使用Date子类型为DATETIME - * */ - public DateColumn() { - this((Long)null); - } + private int precision = -1; + + public static enum DateType { + DATE, TIME, DATETIME + } + + /** + * 构建值为time(java.sql.Time)的DateColumn,使用Date子类型为TIME,只有时间,没有日期 + */ + public DateColumn(Time time, int nanos, int jdbcPrecision) { + this(time); + if (time != null) { + setNanos(nanos); + } + if (jdbcPrecision == 10) { + setPrecision(0); + } + if (jdbcPrecision >= 12 && jdbcPrecision <= 17) { + setPrecision(jdbcPrecision - 11); + } + } + + public long getNanos() { + return nanos; + } + + public void setNanos(int nanos) { + this.nanos = nanos; + } + + public int getPrecision() { + return precision; + } + + public void setPrecision(int precision) { + this.precision = precision; + } + + /** + * 构建值为null的DateColumn,使用Date子类型为DATETIME + */ + public DateColumn() { + this((Long) null); + } /** * 构建值为stamp(Unix时间戳)的DateColumn,使用Date子类型为DATETIME diff --git a/common/src/main/java/com/alibaba/datax/common/statistics/PerfTrace.java b/common/src/main/java/com/alibaba/datax/common/statistics/PerfTrace.java index ea9aa421..cf0457bc 100644 --- a/common/src/main/java/com/alibaba/datax/common/statistics/PerfTrace.java +++ b/common/src/main/java/com/alibaba/datax/common/statistics/PerfTrace.java @@ -31,7 +31,6 @@ public class PerfTrace { private int taskGroupId; private int channelNumber; - private int priority; private int batchSize = 500; private volatile boolean perfReportEnable = true; @@ -54,12 +53,12 @@ public class PerfTrace { * @param taskGroupId * @return */ - public static PerfTrace getInstance(boolean isJob, long jobId, int taskGroupId, int priority, boolean enable) { + public static PerfTrace getInstance(boolean isJob, long jobId, int taskGroupId, boolean enable) { if (instance == null) { synchronized (lock) { if (instance == null) { - instance = new 
PerfTrace(isJob, jobId, taskGroupId, priority, enable); + instance = new PerfTrace(isJob, jobId, taskGroupId, enable); } } } @@ -76,22 +75,21 @@ public class PerfTrace { LOG.error("PerfTrace instance not be init! must have some error! "); synchronized (lock) { if (instance == null) { - instance = new PerfTrace(false, -1111, -1111, 0, false); + instance = new PerfTrace(false, -1111, -1111, false); } } } return instance; } - private PerfTrace(boolean isJob, long jobId, int taskGroupId, int priority, boolean enable) { + private PerfTrace(boolean isJob, long jobId, int taskGroupId, boolean enable) { try { this.perfTraceId = isJob ? "job_" + jobId : String.format("taskGroup_%s_%s", jobId, taskGroupId); this.enable = enable; this.isJob = isJob; this.taskGroupId = taskGroupId; this.instId = jobId; - this.priority = priority; - LOG.info(String.format("PerfTrace traceId=%s, isEnable=%s, priority=%s", this.perfTraceId, this.enable, this.priority)); + LOG.info(String.format("PerfTrace traceId=%s, isEnable=%s", this.perfTraceId, this.enable)); } catch (Exception e) { // do nothing @@ -398,7 +396,6 @@ public class PerfTrace { jdo.setWindowEnd(this.windowEnd); jdo.setJobStartTime(jobStartTime); jdo.setJobRunTimeMs(System.currentTimeMillis() - jobStartTime.getTime()); - jdo.setJobPriority(this.priority); jdo.setChannelNum(this.channelNumber); jdo.setCluster(this.cluster); jdo.setJobDomain(this.jobDomain); @@ -609,7 +606,6 @@ public class PerfTrace { private Date jobStartTime; private Date jobEndTime; private Long jobRunTimeMs; - private Integer jobPriority; private Integer channelNum; private String cluster; private String jobDomain; @@ -680,10 +676,6 @@ public class PerfTrace { return jobRunTimeMs; } - public Integer getJobPriority() { - return jobPriority; - } - public Integer getChannelNum() { return channelNum; } @@ -816,10 +808,6 @@ public class PerfTrace { this.jobRunTimeMs = jobRunTimeMs; } - public void setJobPriority(Integer jobPriority) { - this.jobPriority = jobPriority; - } - public void setChannelNum(Integer channelNum) { this.channelNum = channelNum; } diff --git a/common/src/main/java/com/alibaba/datax/common/statistics/VMInfo.java b/common/src/main/java/com/alibaba/datax/common/statistics/VMInfo.java index cab42a4b..423c794e 100644 --- a/common/src/main/java/com/alibaba/datax/common/statistics/VMInfo.java +++ b/common/src/main/java/com/alibaba/datax/common/statistics/VMInfo.java @@ -77,8 +77,8 @@ public class VMInfo { garbageCollectorMXBeanList = java.lang.management.ManagementFactory.getGarbageCollectorMXBeans(); memoryPoolMXBeanList = java.lang.management.ManagementFactory.getMemoryPoolMXBeans(); - osInfo = runtimeMXBean.getVmVendor() + " " + runtimeMXBean.getSpecVersion() + " " + runtimeMXBean.getVmVersion(); - jvmInfo = osMXBean.getName() + " " + osMXBean.getArch() + " " + osMXBean.getVersion(); + jvmInfo = runtimeMXBean.getVmVendor() + " " + runtimeMXBean.getSpecVersion() + " " + runtimeMXBean.getVmVersion(); + osInfo = osMXBean.getName() + " " + osMXBean.getArch() + " " + osMXBean.getVersion(); totalProcessorCount = osMXBean.getAvailableProcessors(); //构建startPhyOSStatus diff --git a/common/src/main/java/com/alibaba/datax/common/util/Configuration.java b/common/src/main/java/com/alibaba/datax/common/util/Configuration.java index f570dd00..c1194532 100755 --- a/common/src/main/java/com/alibaba/datax/common/util/Configuration.java +++ b/common/src/main/java/com/alibaba/datax/common/util/Configuration.java @@ -3,8 +3,8 @@ package com.alibaba.datax.common.util; import 
com.alibaba.datax.common.exception.CommonErrorCode; import com.alibaba.datax.common.exception.DataXException; import com.alibaba.datax.common.spi.ErrorCode; -import com.alibaba.fastjson.JSON; -import com.alibaba.fastjson.serializer.SerializerFeature; +import com.alibaba.fastjson2.JSON; +import com.alibaba.fastjson2.JSONWriter; import org.apache.commons.io.IOUtils; import org.apache.commons.lang3.CharUtils; import org.apache.commons.lang3.StringUtils; @@ -411,6 +411,15 @@ public class Configuration { return list; } + public List getListWithJson(final String path, Class t) { + Object object = this.get(path, List.class); + if (null == object) { + return null; + } + + return JSON.parseArray(JSON.toJSONString(object),t); + } + /** * 根据用户提供的json path,寻址List对象,如果对象不存在,返回null */ @@ -577,7 +586,7 @@ public class Configuration { */ public String beautify() { return JSON.toJSONString(this.getInternal(), - SerializerFeature.PrettyFormat); + JSONWriter.Feature.PrettyFormat); } /** diff --git a/common/src/main/java/com/alibaba/datax/common/util/IdAndKeyRollingUtil.java b/common/src/main/java/com/alibaba/datax/common/util/IdAndKeyRollingUtil.java deleted file mode 100644 index 8bab301e..00000000 --- a/common/src/main/java/com/alibaba/datax/common/util/IdAndKeyRollingUtil.java +++ /dev/null @@ -1,62 +0,0 @@ -package com.alibaba.datax.common.util; - -import java.util.Map; - -import org.apache.commons.lang3.StringUtils; -import org.slf4j.Logger; -import org.slf4j.LoggerFactory; - -import com.alibaba.datax.common.exception.DataXException; - -public class IdAndKeyRollingUtil { - private static Logger LOGGER = LoggerFactory.getLogger(IdAndKeyRollingUtil.class); - public static final String SKYNET_ACCESSID = "SKYNET_ACCESSID"; - public static final String SKYNET_ACCESSKEY = "SKYNET_ACCESSKEY"; - - public final static String ACCESS_ID = "accessId"; - public final static String ACCESS_KEY = "accessKey"; - - public static String parseAkFromSkynetAccessKey() { - Map envProp = System.getenv(); - String skynetAccessID = envProp.get(IdAndKeyRollingUtil.SKYNET_ACCESSID); - String skynetAccessKey = envProp.get(IdAndKeyRollingUtil.SKYNET_ACCESSKEY); - String accessKey = null; - // follow 原有的判断条件 - // 环境变量中,如果存在SKYNET_ACCESSID/SKYNET_ACCESSKEy(只要有其中一个变量,则认为一定是两个都存在的! - // if (StringUtils.isNotBlank(skynetAccessID) || - // StringUtils.isNotBlank(skynetAccessKey)) { - // 检查严格,只有加密串不为空的时候才进去,不过 之前能跑的加密串都不应该为空 - if (StringUtils.isNotBlank(skynetAccessKey)) { - LOGGER.info("Try to get accessId/accessKey from environment SKYNET_ACCESSKEY."); - accessKey = DESCipher.decrypt(skynetAccessKey); - if (StringUtils.isBlank(accessKey)) { - // 环境变量里面有,但是解析不到 - throw DataXException.asDataXException(String.format( - "Failed to get the [accessId]/[accessKey] from the environment variable. 
The [accessId]=[%s]", - skynetAccessID)); - } - } - if (StringUtils.isNotBlank(accessKey)) { - LOGGER.info("Get accessId/accessKey from environment variables SKYNET_ACCESSKEY successfully."); - } - return accessKey; - } - - public static String getAccessIdAndKeyFromEnv(Configuration originalConfig) { - String accessId = null; - Map envProp = System.getenv(); - accessId = envProp.get(IdAndKeyRollingUtil.SKYNET_ACCESSID); - String accessKey = null; - if (StringUtils.isBlank(accessKey)) { - // 老的没有出异常,只是获取不到ak - accessKey = IdAndKeyRollingUtil.parseAkFromSkynetAccessKey(); - } - - if (StringUtils.isNotBlank(accessKey)) { - // 确认使用这个的都是 accessId、accessKey的命名习惯 - originalConfig.set(IdAndKeyRollingUtil.ACCESS_ID, accessId); - originalConfig.set(IdAndKeyRollingUtil.ACCESS_KEY, accessKey); - } - return accessKey; - } -} diff --git a/common/src/main/java/com/alibaba/datax/common/util/LimitLogger.java b/common/src/main/java/com/alibaba/datax/common/util/LimitLogger.java new file mode 100644 index 00000000..a307e0fb --- /dev/null +++ b/common/src/main/java/com/alibaba/datax/common/util/LimitLogger.java @@ -0,0 +1,34 @@ +package com.alibaba.datax.common.util; + +import org.apache.commons.lang3.StringUtils; + +import java.util.HashMap; +import java.util.Map; + +/** + * @author jitongchen + * @date 2023/9/7 9:47 AM + */ +public class LimitLogger { + + private static Map lastPrintTime = new HashMap<>(); + + public static void limit(String name, long limit, LoggerFunction function) { + if (StringUtils.isBlank(name)) { + name = "__all__"; + } + if (limit <= 0) { + function.apply(); + } else { + if (!lastPrintTime.containsKey(name)) { + lastPrintTime.put(name, System.currentTimeMillis()); + function.apply(); + } else { + if (System.currentTimeMillis() > lastPrintTime.get(name) + limit) { + lastPrintTime.put(name, System.currentTimeMillis()); + function.apply(); + } + } + } + } +} diff --git a/common/src/main/java/com/alibaba/datax/common/util/LoggerFunction.java b/common/src/main/java/com/alibaba/datax/common/util/LoggerFunction.java new file mode 100644 index 00000000..ef24504f --- /dev/null +++ b/common/src/main/java/com/alibaba/datax/common/util/LoggerFunction.java @@ -0,0 +1,10 @@ +package com.alibaba.datax.common.util; + +/** + * @author molin.lxd + * @date 2021-05-09 + */ +public interface LoggerFunction { + + void apply(); +} diff --git a/common/src/main/java/com/alibaba/datax/common/util/StrUtil.java b/common/src/main/java/com/alibaba/datax/common/util/StrUtil.java index 82222b0d..867a9516 100755 --- a/common/src/main/java/com/alibaba/datax/common/util/StrUtil.java +++ b/common/src/main/java/com/alibaba/datax/common/util/StrUtil.java @@ -3,6 +3,8 @@ package com.alibaba.datax.common.util; import org.apache.commons.lang3.StringUtils; import org.apache.commons.lang3.Validate; +import java.security.MessageDigest; +import java.security.NoSuchAlgorithmException; import java.text.DecimalFormat; import java.util.HashMap; import java.util.Map; @@ -82,4 +84,20 @@ public class StrUtil { return s.substring(0, headLength) + "..." 
+ s.substring(s.length() - tailLength); } + public static String getMd5(String plainText) { + try { + StringBuilder builder = new StringBuilder(); + for (byte b : MessageDigest.getInstance("MD5").digest(plainText.getBytes())) { + int i = b & 0xff; + if (i < 0x10) { + builder.append('0'); + } + builder.append(Integer.toHexString(i)); + } + return builder.toString(); + } catch (NoSuchAlgorithmException e) { + throw new RuntimeException(e); + } + } + } diff --git a/core/pom.xml b/core/pom.xml index 970f95a6..7685001b 100755 --- a/core/pom.xml +++ b/core/pom.xml @@ -41,7 +41,7 @@ org.apache.httpcomponents httpclient - 4.5 + 4.5.13 org.apache.httpcomponents diff --git a/core/src/main/java/com/alibaba/datax/core/Engine.java b/core/src/main/java/com/alibaba/datax/core/Engine.java index 38342532..4ba9fc18 100755 --- a/core/src/main/java/com/alibaba/datax/core/Engine.java +++ b/core/src/main/java/com/alibaba/datax/core/Engine.java @@ -79,16 +79,9 @@ public class Engine { perfReportEnable = false; } - int priority = 0; - try { - priority = Integer.parseInt(System.getenv("SKYNET_PRIORITY")); - }catch (NumberFormatException e){ - LOG.warn("prioriy set to 0, because NumberFormatException, the value is: "+System.getProperty("PROIORY")); - } - Configuration jobInfoConfig = allConf.getConfiguration(CoreConstant.DATAX_JOB_JOBINFO); //初始化PerfTrace - PerfTrace perfTrace = PerfTrace.getInstance(isJob, instanceId, taskGroupId, priority, traceEnable); + PerfTrace perfTrace = PerfTrace.getInstance(isJob, instanceId, taskGroupId, traceEnable); perfTrace.setJobInfo(jobInfoConfig,perfReportEnable,channelNumber); container.start(); diff --git a/core/src/main/java/com/alibaba/datax/core/container/util/JobAssignUtil.java b/core/src/main/java/com/alibaba/datax/core/container/util/JobAssignUtil.java index 31ba60a4..cbd0d2a1 100755 --- a/core/src/main/java/com/alibaba/datax/core/container/util/JobAssignUtil.java +++ b/core/src/main/java/com/alibaba/datax/core/container/util/JobAssignUtil.java @@ -114,7 +114,7 @@ public final class JobAssignUtil { * 需要实现的效果通过例子来说是: *
      * a 库上有表:0, 1, 2
-     * a 库上有表:3, 4
+     * b 库上有表:3, 4
      * c 库上有表:5, 6, 7
      *
      * 如果有 4个 taskGroup
diff --git a/core/src/main/java/com/alibaba/datax/core/job/JobContainer.java b/core/src/main/java/com/alibaba/datax/core/job/JobContainer.java
index 26b2989f..49f5a0a1 100755
--- a/core/src/main/java/com/alibaba/datax/core/job/JobContainer.java
+++ b/core/src/main/java/com/alibaba/datax/core/job/JobContainer.java
@@ -27,7 +27,7 @@ import com.alibaba.datax.core.util.container.ClassLoaderSwapper;
 import com.alibaba.datax.core.util.container.CoreConstant;
 import com.alibaba.datax.core.util.container.LoadUtil;
 import com.alibaba.datax.dataxservice.face.domain.enums.ExecuteMode;
-import com.alibaba.fastjson.JSON;
+import com.alibaba.fastjson2.JSON;
 import org.apache.commons.lang.StringUtils;
 import org.apache.commons.lang.Validate;
 import org.slf4j.Logger;
diff --git a/core/src/main/java/com/alibaba/datax/core/statistics/communication/CommunicationTool.java b/core/src/main/java/com/alibaba/datax/core/statistics/communication/CommunicationTool.java
index 51a601ae..1815ea02 100755
--- a/core/src/main/java/com/alibaba/datax/core/statistics/communication/CommunicationTool.java
+++ b/core/src/main/java/com/alibaba/datax/core/statistics/communication/CommunicationTool.java
@@ -2,7 +2,7 @@ package com.alibaba.datax.core.statistics.communication;
 
 import com.alibaba.datax.common.statistics.PerfTrace;
 import com.alibaba.datax.common.util.StrUtil;
-import com.alibaba.fastjson.JSON;
+import com.alibaba.fastjson2.JSON;
 import org.apache.commons.lang.Validate;
 
 import java.text.DecimalFormat;
diff --git a/core/src/main/java/com/alibaba/datax/core/statistics/plugin/task/StdoutPluginCollector.java b/core/src/main/java/com/alibaba/datax/core/statistics/plugin/task/StdoutPluginCollector.java
index 8b2a8378..d88ad0a8 100755
--- a/core/src/main/java/com/alibaba/datax/core/statistics/plugin/task/StdoutPluginCollector.java
+++ b/core/src/main/java/com/alibaba/datax/core/statistics/plugin/task/StdoutPluginCollector.java
@@ -6,7 +6,7 @@ import com.alibaba.datax.common.util.Configuration;
 import com.alibaba.datax.core.statistics.communication.Communication;
 import com.alibaba.datax.core.util.container.CoreConstant;
 import com.alibaba.datax.core.statistics.plugin.task.util.DirtyRecord;
-import com.alibaba.fastjson.JSON;
+import com.alibaba.fastjson2.JSON;
 
 import org.apache.commons.lang3.StringUtils;
 import org.slf4j.Logger;
diff --git a/core/src/main/java/com/alibaba/datax/core/statistics/plugin/task/util/DirtyRecord.java b/core/src/main/java/com/alibaba/datax/core/statistics/plugin/task/util/DirtyRecord.java
index 1b0d5238..caa4cb5b 100755
--- a/core/src/main/java/com/alibaba/datax/core/statistics/plugin/task/util/DirtyRecord.java
+++ b/core/src/main/java/com/alibaba/datax/core/statistics/plugin/task/util/DirtyRecord.java
@@ -4,7 +4,7 @@ import com.alibaba.datax.common.element.Column;
 import com.alibaba.datax.common.element.Record;
 import com.alibaba.datax.common.exception.DataXException;
 import com.alibaba.datax.core.util.FrameworkErrorCode;
-import com.alibaba.fastjson.JSON;
+import com.alibaba.fastjson2.JSON;
 
 import java.math.BigDecimal;
 import java.math.BigInteger;
diff --git a/core/src/main/java/com/alibaba/datax/core/taskgroup/TaskGroupContainer.java b/core/src/main/java/com/alibaba/datax/core/taskgroup/TaskGroupContainer.java
index c30c94d9..b4b45695 100755
--- a/core/src/main/java/com/alibaba/datax/core/taskgroup/TaskGroupContainer.java
+++ b/core/src/main/java/com/alibaba/datax/core/taskgroup/TaskGroupContainer.java
@@ -27,7 +27,7 @@ import com.alibaba.datax.core.util.TransformerUtil;
 import com.alibaba.datax.core.util.container.CoreConstant;
 import com.alibaba.datax.core.util.container.LoadUtil;
 import com.alibaba.datax.dataxservice.face.domain.enums.State;
-import com.alibaba.fastjson.JSON;
+import com.alibaba.fastjson2.JSON;
 import org.apache.commons.lang3.Validate;
 import org.slf4j.Logger;
 import org.slf4j.LoggerFactory;
diff --git a/core/src/main/java/com/alibaba/datax/core/transport/channel/memory/MemoryChannel.java b/core/src/main/java/com/alibaba/datax/core/transport/channel/memory/MemoryChannel.java
index e49c7878..5bce085f 100755
--- a/core/src/main/java/com/alibaba/datax/core/transport/channel/memory/MemoryChannel.java
+++ b/core/src/main/java/com/alibaba/datax/core/transport/channel/memory/MemoryChannel.java
@@ -29,7 +29,7 @@ public class MemoryChannel extends Channel {
 
 	private ReentrantLock lock;
 
-	private Condition notInsufficient, notEmpty;
+	private Condition notSufficient, notEmpty;
 
 	public MemoryChannel(final Configuration configuration) {
 		super(configuration);
@@ -37,7 +37,7 @@ public class MemoryChannel extends Channel {
 		this.bufferSize = configuration.getInt(CoreConstant.DATAX_CORE_TRANSPORT_EXCHANGER_BUFFERSIZE);
 
 		lock = new ReentrantLock();
-		notInsufficient = lock.newCondition();
+		notSufficient = lock.newCondition();
 		notEmpty = lock.newCondition();
 	}
 
@@ -75,7 +75,7 @@ public class MemoryChannel extends Channel {
 			lock.lockInterruptibly();
 			int bytes = getRecordBytes(rs);
 			while (memoryBytes.get() + bytes > this.byteCapacity || rs.size() > this.queue.remainingCapacity()) {
-				notInsufficient.await(200L, TimeUnit.MILLISECONDS);
+				notSufficient.await(200L, TimeUnit.MILLISECONDS);
             }
 			this.queue.addAll(rs);
 			waitWriterTime += System.nanoTime() - startTime;
@@ -116,7 +116,7 @@ public class MemoryChannel extends Channel {
 			waitReaderTime += System.nanoTime() - startTime;
 			int bytes = getRecordBytes(rs);
 			memoryBytes.addAndGet(-bytes);
-			notInsufficient.signalAll();
+			notSufficient.signalAll();
 		} catch (InterruptedException e) {
 			throw DataXException.asDataXException(
 					FrameworkErrorCode.RUNTIME_ERROR, e);
diff --git a/core/src/main/java/com/alibaba/datax/core/transport/record/DefaultRecord.java b/core/src/main/java/com/alibaba/datax/core/transport/record/DefaultRecord.java
index c78a2a87..1dfa02e8 100755
--- a/core/src/main/java/com/alibaba/datax/core/transport/record/DefaultRecord.java
+++ b/core/src/main/java/com/alibaba/datax/core/transport/record/DefaultRecord.java
@@ -5,7 +5,7 @@ import com.alibaba.datax.common.element.Record;
 import com.alibaba.datax.common.exception.DataXException;
 import com.alibaba.datax.core.util.ClassSize;
 import com.alibaba.datax.core.util.FrameworkErrorCode;
-import com.alibaba.fastjson.JSON;
+import com.alibaba.fastjson2.JSON;
 
 import java.util.ArrayList;
 import java.util.HashMap;
diff --git a/core/src/main/java/com/alibaba/datax/core/transport/transformer/DigestTransformer.java b/core/src/main/java/com/alibaba/datax/core/transport/transformer/DigestTransformer.java
new file mode 100644
index 00000000..d2bf1431
--- /dev/null
+++ b/core/src/main/java/com/alibaba/datax/core/transport/transformer/DigestTransformer.java
@@ -0,0 +1,87 @@
+package com.alibaba.datax.core.transport.transformer;
+
+import com.alibaba.datax.common.element.Column;
+import com.alibaba.datax.common.element.Record;
+import com.alibaba.datax.common.element.StringColumn;
+import com.alibaba.datax.common.exception.DataXException;
+import com.alibaba.datax.transformer.Transformer;
+
+import org.apache.commons.codec.digest.DigestUtils;
+import org.apache.commons.lang.StringUtils;
+
+import java.util.Arrays;
+
+/**
+ * no comments.
+ *
+ * @author XuDaojie
+ * @since 2021-08-16
+ */
+public class DigestTransformer extends Transformer {
+
+    private static final String MD5 = "md5";
+    private static final String SHA1 = "sha1";
+    private static final String TO_UPPER_CASE = "toUpperCase";
+    private static final String TO_LOWER_CASE = "toLowerCase";
+
+    public DigestTransformer() {
+        setTransformerName("dx_digest");
+    }
+
+    @Override
+    public Record evaluate(Record record, Object... paras) {
+
+        int columnIndex;
+        String type;
+        String charType;
+
+        try {
+            if (paras.length != 3) {
+                throw new RuntimeException("dx_digest paras length must be 3");
+            }
+
+            columnIndex = (Integer) paras[0];
+            type = (String) paras[1];
+            charType = (String) paras[2];
+
+            if (!StringUtils.equalsIgnoreCase(MD5, type) && !StringUtils.equalsIgnoreCase(SHA1, type)) {
+                throw new RuntimeException("dx_digest paras index 1 must be md5 or sha1");
+            }
+            if (!StringUtils.equalsIgnoreCase(TO_UPPER_CASE, charType) && !StringUtils.equalsIgnoreCase(TO_LOWER_CASE, charType)) {
+                throw new RuntimeException("dx_digest paras index 2 must be toUpperCase or toLowerCase");
+            }
+        } catch (Exception e) {
+            throw DataXException.asDataXException(TransformerErrorCode.TRANSFORMER_ILLEGAL_PARAMETER, "paras:" + Arrays.asList(paras) + " => " + e.getMessage());
+        }
+
+        Column column = record.getColumn(columnIndex);
+
+        try {
+            String oriValue = column.asString();
+
+            // 如果字段为空,作为空字符串处理
+            if (oriValue == null) {
+                oriValue = "";
+            }
+            String newValue;
+            if (MD5.equals(type)) {
+                newValue = DigestUtils.md5Hex(oriValue);
+            } else {
+                newValue = DigestUtils.sha1Hex(oriValue);
+            }
+
+            if (TO_UPPER_CASE.equals(charType)) {
+                newValue = newValue.toUpperCase();
+            } else {
+                newValue = newValue.toLowerCase();
+            }
+
+            record.setColumn(columnIndex, new StringColumn(newValue));
+
+        } catch (Exception e) {
+            throw DataXException.asDataXException(TransformerErrorCode.TRANSFORMER_RUN_EXCEPTION, e.getMessage(), e);
+        }
+        return record;
+    }
+
+}
diff --git a/core/src/main/java/com/alibaba/datax/core/transport/transformer/FilterTransformer.java b/core/src/main/java/com/alibaba/datax/core/transport/transformer/FilterTransformer.java
index 8f6492fa..a3251715 100644
--- a/core/src/main/java/com/alibaba/datax/core/transport/transformer/FilterTransformer.java
+++ b/core/src/main/java/com/alibaba/datax/core/transport/transformer/FilterTransformer.java
@@ -61,7 +61,7 @@ public class FilterTransformer extends Transformer {
             } else if (code.equalsIgnoreCase("<=")) {
                 return doLess(record, value, column, true);
             } else {
-                throw new RuntimeException("dx_filter can't suport code:" + code);
+                throw new RuntimeException("dx_filter can't support code:" + code);
             }
         } catch (Exception e) {
             throw DataXException.asDataXException(TransformerErrorCode.TRANSFORMER_RUN_EXCEPTION, e.getMessage(), e);
diff --git a/core/src/main/java/com/alibaba/datax/core/transport/transformer/GroovyTransformerStaticUtil.java b/core/src/main/java/com/alibaba/datax/core/transport/transformer/GroovyTransformerStaticUtil.java
index 4c872993..487a8be8 100644
--- a/core/src/main/java/com/alibaba/datax/core/transport/transformer/GroovyTransformerStaticUtil.java
+++ b/core/src/main/java/com/alibaba/datax/core/transport/transformer/GroovyTransformerStaticUtil.java
@@ -1,10 +1,18 @@
 package com.alibaba.datax.core.transport.transformer;
 
+import org.apache.commons.codec.digest.DigestUtils;
+
 /**
  * GroovyTransformer的帮助类,供groovy代码使用,必须全是static的方法
  * Created by liqiang on 16/3/4.
  */
 public class GroovyTransformerStaticUtil  {
 
+    public static String md5(final String data) {
+        return DigestUtils.md5Hex(data);
+    }
 
+    public static String sha1(final String data) {
+        return DigestUtils.sha1Hex(data);
+    }
 }
diff --git a/core/src/main/java/com/alibaba/datax/core/transport/transformer/TransformerRegistry.java b/core/src/main/java/com/alibaba/datax/core/transport/transformer/TransformerRegistry.java
index 96a0d988..3c625153 100644
--- a/core/src/main/java/com/alibaba/datax/core/transport/transformer/TransformerRegistry.java
+++ b/core/src/main/java/com/alibaba/datax/core/transport/transformer/TransformerRegistry.java
@@ -36,6 +36,7 @@ public class TransformerRegistry {
         registTransformer(new ReplaceTransformer());
         registTransformer(new FilterTransformer());
         registTransformer(new GroovyTransformer());
+        registTransformer(new DigestTransformer());
     }
 
     public static void loadTransformerFromLocalStorage() {
diff --git a/core/src/main/java/com/alibaba/datax/core/util/ConfigParser.java b/core/src/main/java/com/alibaba/datax/core/util/ConfigParser.java
index 20039864..24f43d55 100755
--- a/core/src/main/java/com/alibaba/datax/core/util/ConfigParser.java
+++ b/core/src/main/java/com/alibaba/datax/core/util/ConfigParser.java
@@ -168,6 +168,7 @@ public final class ConfigParser {
         boolean isDefaultPath = StringUtils.isBlank(pluginPath);
         if (isDefaultPath) {
             configuration.set("path", path);
+            configuration.set("loadType","jarLoader");
         }
 
         Configuration result = Configuration.newDefault();
diff --git a/core/src/main/java/com/alibaba/datax/core/util/container/CoreConstant.java b/core/src/main/java/com/alibaba/datax/core/util/container/CoreConstant.java
index 6a0b6205..a1ca164d 100755
--- a/core/src/main/java/com/alibaba/datax/core/util/container/CoreConstant.java
+++ b/core/src/main/java/com/alibaba/datax/core/util/container/CoreConstant.java
@@ -105,7 +105,7 @@ public class CoreConstant {
 
     public static final String DATAX_JOB_POSTHANDLER_PLUGINNAME = "job.postHandler.pluginName";
     // ----------------------------- 局部使用的变量
-    public static final String JOB_WRITER = "reader";
+    public static final String JOB_WRITER = "writer";
 
 	public static final String JOB_READER = "reader";
 
diff --git a/core/src/main/java/com/alibaba/datax/core/util/container/JarLoader.java b/core/src/main/java/com/alibaba/datax/core/util/container/JarLoader.java
index 9fc113dc..ddf22bae 100755
--- a/core/src/main/java/com/alibaba/datax/core/util/container/JarLoader.java
+++ b/core/src/main/java/com/alibaba/datax/core/util/container/JarLoader.java
@@ -15,7 +15,7 @@ import java.util.List;
 /**
  * 提供Jar隔离的加载机制,会把传入的路径、及其子路径、以及路径中的jar文件加入到class path。
  */
-public class JarLoader extends URLClassLoader {
+public class JarLoader extends URLClassLoader{
     public JarLoader(String[] paths) {
         this(paths, JarLoader.class.getClassLoader());
     }
diff --git a/core/src/main/java/com/alibaba/datax/core/util/container/LoadUtil.java b/core/src/main/java/com/alibaba/datax/core/util/container/LoadUtil.java
index 30e926c3..9a6a8302 100755
--- a/core/src/main/java/com/alibaba/datax/core/util/container/LoadUtil.java
+++ b/core/src/main/java/com/alibaba/datax/core/util/container/LoadUtil.java
@@ -49,7 +49,7 @@ public class LoadUtil {
     /**
      * jarLoader的缓冲
      */
-    private static Map jarLoaderCenter = new HashMap();
+    private static Map jarLoaderCenter = new HashMap();
 
     /**
      * 设置pluginConfigs,方便后面插件来获取
diff --git a/core/src/main/job/job.json b/core/src/main/job/job.json
index 58206592..cc353877 100755
--- a/core/src/main/job/job.json
+++ b/core/src/main/job/job.json
@@ -2,7 +2,7 @@
     "job": {
         "setting": {
             "speed": {
-                "byte":10485760
+                "channel":1
             },
             "errorLimit": {
                 "record": 0,
diff --git a/databendwriter/doc/databendwriter-CN.md b/databendwriter/doc/databendwriter-CN.md
new file mode 100644
index 00000000..5b26ed7e
--- /dev/null
+++ b/databendwriter/doc/databendwriter-CN.md
@@ -0,0 +1,183 @@
+# DataX DatabendWriter
+[简体中文](./databendwriter-CN.md) | [English](./databendwriter.md)
+
+## 1 快速介绍
+
+Databend Writer 是一个 DataX 的插件,用于从 DataX 中写入数据到 Databend 表中。
+该插件基于[databend JDBC driver](https://github.com/databendcloud/databend-jdbc) ,它使用 [RESTful http protocol](https://databend.rs/doc/integrations/api/rest)
+在开源的 databend 和 [databend cloud](https://app.databend.com/) 上执行查询。
+
+在每个写入批次中,databend writer 将批量数据上传到内部的 S3 stage,然后执行相应的 insert SQL 将数据上传到 databend 表中。
+
+为了最佳的用户体验,如果您使用的是 databend 社区版本,您应该尝试采用 [S3](https://aws.amazon.com/s3/)/[minio](https://min.io/)/[OSS](https://www.alibabacloud.com/product/object-storage-service) 作为其底层存储层,因为
+它们支持预签名上传操作,否则您可能会在数据传输上浪费不必要的成本。
+
+您可以在[文档](https://databend.rs/doc/deploy/deploying-databend)中了解更多详细信息
+
+## 2 实现原理
+
+Databend Writer 将使用 DataX 从 DataX Reader 中获取生成的记录,并将记录批量插入到 databend 表中指定的列中。
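+
+作为补充说明,下面给出一个最小示意(仅作说明,并非插件的实际实现):用原生 JDBC 按批次向 Databend 写入数据。其中的连接串、表名、列名与下文 3.1 的示例保持一致,均为假设值;插件内部实际复用 CommonRdbmsWriter 完成取数与攒批。
+
+```java
+import java.sql.Connection;
+import java.sql.DriverManager;
+import java.sql.PreparedStatement;
+
+public class DatabendBatchInsertSketch {
+    public static void main(String[] args) throws Exception {
+        String jdbcUrl = "jdbc:databend://localhost:8000/datax";
+        try (Connection conn = DriverManager.getConnection(jdbcUrl, "databend", "databend");
+             PreparedStatement ps = conn.prepareStatement("INSERT INTO sample1(a, b) VALUES (?, ?)")) {
+            int batchSize = 1000;                    // 对应 batchSize 参数
+            int pending = 0;
+            for (int i = 0; i < 10000; i++) {        // 模拟从 Reader 收到的记录
+                ps.setString(1, "DataX-" + i);
+                ps.setLong(2, i);
+                ps.addBatch();
+                if (++pending >= batchSize) {        // 攒够一批,提交一次
+                    ps.executeBatch();
+                    pending = 0;
+                }
+            }
+            if (pending > 0) {
+                ps.executeBatch();                   // 刷出最后不满一批的数据
+            }
+        }
+    }
+}
+```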
+
+## 3 功能说明
+
+### 3.1 配置样例
+
+* 以下配置将从内存中读取一些生成的数据,并将数据上传到databend表中
+
+#### 准备工作
+```sql
+--- create table in databend
+drop table if exists datax.sample1;
+drop database if exists datax;
+create database if not exists datax;
+create table if not exists datax.sample1(a string, b int64, c date, d timestamp, e bool, f string, g variant);
+```
+
+#### 配置样例
+```json
+{
+  "job": {
+    "content": [
+      {
+        "reader": {
+          "name": "streamreader",
+          "parameter": {
+            "column" : [
+              {
+                "value": "DataX",
+                "type": "string"
+              },
+              {
+                "value": 19880808,
+                "type": "long"
+              },
+              {
+                "value": "1926-08-08 08:08:08",
+                "type": "date"
+              },
+              {
+                "value": "1988-08-08 08:08:08",
+                "type": "date"
+              },
+              {
+                "value": true,
+                "type": "bool"
+              },
+              {
+                "value": "test",
+                "type": "bytes"
+              },
+              {
+                "value": "{\"type\": \"variant\", \"value\": \"test\"}",
+                "type": "string"
+              }
+
+            ],
+            "sliceRecordCount": 10000
+          }
+        },
+        "writer": {
+          "name": "databendwriter",
+          "parameter": {
+            "writeMode": "replace", 
+            "onConflictColumn": ["id"],
+            "username": "databend",
+            "password": "databend",
+            "column": ["a", "b", "c", "d", "e", "f", "g"],
+            "batchSize": 1000,
+            "preSql": [
+            ],
+            "postSql": [
+            ],
+            "connection": [
+              {
+                "jdbcUrl": "jdbc:databend://localhost:8000/datax",
+                "table": [
+                  "sample1"
+                ]
+              }
+            ]
+          }
+        }
+      }
+    ],
+    "setting": {
+      "speed": {
+        "channel": 1
+       }
+    }
+  }
+}
+```
+
+### 3.2 参数说明
+* jdbcUrl
+    * 描述: JDBC 数据源 url。请参阅仓库中的详细[文档](https://github.com/databendcloud/databend-jdbc)
+    * 必选: 是
+    * 默认值: 无
+    * 示例: jdbc:databend://localhost:8000/datax
+* username
+    * 描述: JDBC 数据源用户名
+    * 必选: 是
+    * 默认值: 无
+    * 示例: databend
+* password
+    * 描述: JDBC 数据源密码
+    * 必选: 是
+    * 默认值: 无
+    * 示例: databend
+* table
+    * 描述: 目标表名的集合,每张表都应包含 column 参数中指定的所有列。
+    * 必选: 是
+    * 默认值: 无
+    * 示例: ["sample1"]
+* column
+    * 描述: 表中的列名集合,字段顺序应该与reader的record中的column类型对应
+    * 必选: 是
+    * 默认值: 无
+    * 示例: ["a", "b", "c", "d", "e", "f", "g"]
+* batchSize
+    * 描述: 每个批次的记录数
+    * 必选: 否
+    * 默认值: 1000
+    * 示例: 1000
+* preSql
+    * 描述: 在写入数据之前执行的SQL语句
+    * 必选: 否
+    * 默认值: 无
+    * 示例: ["delete from datax.sample1"]
+* postSql
+    * 描述: 在写入数据之后执行的SQL语句
+    * 必选: 否
+    * 默认值: 无
+    * 示例: ["select count(*) from datax.sample1"]
+* writeMode
+    * 描述:写入模式,支持 insert 和 replace 两种模式,默认为 insert。若为 replace,务必填写 onConflictColumn 参数,生成的 SQL 模板形式可参考本节末尾的示意
+    * 必选:否
+    * 默认值:insert
+    * 示例:"replace"
+* onConflictColumn
+    * 描述:on conflict 字段,指定 writeMode 为 replace 后,需要此参数
+    * 必选:否
+    * 默认值:无
+    * 示例:["id","user"]
+
+### 3.3 类型转化
+DataX中的数据类型可以转换为databend中的相应数据类型。下表显示了两种类型之间的对应关系。
+
+| DataX 内部类型 | Databend 数据类型                                             |
+|------------|-----------------------------------------------------------|
+| INT        | TINYINT, INT8, SMALLINT, INT16, INT, INT32, BIGINT, INT64 |
+| LONG       | TINYINT, INT8, SMALLINT, INT16, INT, INT32, BIGINT, INT64 |
+| STRING     | STRING, VARCHAR                                           |
+| DOUBLE     | FLOAT, DOUBLE                                             |
+| BOOL       | BOOLEAN, BOOL                                             |
+| DATE       | DATE, TIMESTAMP                                           |
+| BYTES      | STRING, VARCHAR                                           |
+
+## 4 性能测试
+
+## 5 约束限制
+目前,复杂数据类型支持不稳定,如果您想使用复杂数据类型,例如元组,数组,请检查databend和jdbc驱动程序的进一步版本。
+
+## FAQ
\ No newline at end of file
diff --git a/databendwriter/doc/databendwriter.md b/databendwriter/doc/databendwriter.md
new file mode 100644
index 00000000..c92d6387
--- /dev/null
+++ b/databendwriter/doc/databendwriter.md
@@ -0,0 +1,176 @@
+# DataX DatabendWriter
+[简体中文](./databendwriter-CN.md) | [English](./databendwriter.md)
+
+## 1 Introduction
+Databend Writer is a DataX plugin that writes DataX records into a Databend table.
+The plugin is based on the [databend JDBC driver](https://github.com/databendcloud/databend-jdbc), which uses the [RESTful http protocol](https://databend.rs/doc/integrations/api/rest)
+to execute queries on open source databend and [databend cloud](https://app.databend.com/).
+
+During each write batch, Databend Writer uploads the batch data to an internal S3 stage and then executes the corresponding insert SQL to load the data into the Databend table.
+
+For the best experience, if you are using the databend community distribution, you should adopt [S3](https://aws.amazon.com/s3/)/[minio](https://min.io/)/[OSS](https://www.alibabacloud.com/product/object-storage-service) as its underlying storage layer, since
+they support presigned upload operations; otherwise you may incur unnecessary data transfer costs.
+
+You can find more details in the [doc](https://databend.rs/doc/deploy/deploying-databend).
+
+## 2 Detailed Implementation
+Databend Writer fetches the records generated by a DataX Reader and batch-inserts them into the designated columns of your Databend table.
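+
+As a rough illustration of what "insert records into the designated columns" involves, the sketch below binds a value to a JDBC prepared statement based on its SQL type, similar in spirit to the fillPreparedStatementColumnType override added in this plugin. It is a simplified, hypothetical helper: the real writer handles many more types (DATE, TIMESTAMP, BINARY, ...) as well as null values.
+
+```java
+import java.sql.PreparedStatement;
+import java.sql.SQLException;
+import java.sql.Types;
+
+public class ColumnBindingSketch {
+    // Minimal sketch: bind one value to a prepared statement parameter based on its JDBC type.
+    static void bind(PreparedStatement ps, int index, int sqlType, Object value) throws SQLException {
+        switch (sqlType) {
+            case Types.INTEGER:
+                ps.setInt(index, ((Number) value).intValue());
+                break;
+            case Types.BIGINT:
+                ps.setLong(index, ((Number) value).longValue());
+                break;
+            case Types.DOUBLE:
+                ps.setDouble(index, ((Number) value).doubleValue());
+                break;
+            default:
+                // variant / array and other types are written as strings
+                ps.setString(index, String.valueOf(value));
+                break;
+        }
+    }
+}
+```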
+
+## 3 Features
+### 3.1 Example Configurations
+* The following configuration reads some generated data from memory and uploads it into a Databend table.
+
+#### Preparation
+```sql
+--- create table in databend
+drop table if exists datax.sample1;
+drop database if exists datax;
+create database if not exists datax;
+create table if not exists datax.sample1(a string, b int64, c date, d timestamp, e bool, f string, g variant);
+```
+
+#### Configurations
+```json
+{
+  "job": {
+    "content": [
+      {
+        "reader": {
+          "name": "streamreader",
+          "parameter": {
+            "column" : [
+              {
+                "value": "DataX",
+                "type": "string"
+              },
+              {
+                "value": 19880808,
+                "type": "long"
+              },
+              {
+                "value": "1926-08-08 08:08:08",
+                "type": "date"
+              },
+              {
+                "value": "1988-08-08 08:08:08",
+                "type": "date"
+              },
+              {
+                "value": true,
+                "type": "bool"
+              },
+              {
+                "value": "test",
+                "type": "bytes"
+              },
+              {
+                "value": "{\"type\": \"variant\", \"value\": \"test\"}",
+                "type": "string"
+              }
+
+            ],
+            "sliceRecordCount": 10000
+          }
+        },
+        "writer": {
+          "name": "databendwriter",
+          "parameter": {
+            "username": "databend",
+            "password": "databend",
+            "column": ["a", "b", "c", "d", "e", "f", "g"],
+            "batchSize": 1000,
+            "preSql": [
+            ],
+            "postSql": [
+            ],
+            "connection": [
+              {
+                "jdbcUrl": "jdbc:databend://localhost:8000/datax",
+                "table": [
+                  "sample1"
+                ]
+              }
+            ]
+          }
+        }
+      }
+    ],
+    "setting": {
+      "speed": {
+        "channel": 1
+       }
+    }
+  }
+}
+```
+
+### 3.2 Configuration Description
+* jdbcUrl
+  * Description: The JDBC data source URL for Databend. See the repository for the detailed [doc](https://github.com/databendcloud/databend-jdbc).
+  * Required: yes
+  * Default: none
+  * Example: jdbc:databend://localhost:8000/datax
+* username
+  * Description: Databend user name
+  * Required: yes
+  * Default: none
+  * Example: databend
+* password
+  * Description: Databend user password
+  * Required: yes
+  * Default: none
+  * Example: databend
+* table
+  * Description: A list of target table names. Each table should contain all of the columns specified in the column parameter.
+  * Required: yes
+  * Default: none
+  * Example: ["sample1"]
+* column
+  * Description: A list of column field names that should be inserted into the table. If you want to insert all column fields, use `["*"]` instead.
+  * Required: yes
+  * Default: none
+  * Example: ["a", "b", "c", "d", "e", "f", "g"]
+* batchSize
+  * Description: The number of records to be inserted in each batch.
+  * Required: no
+  * Default: 1024
+* preSql
+  * Description: A list of SQL statements that will be executed before the write operation.
+  * Required: no
+  * Default: none
+* postSql
+  * Description: A list of SQL statements that will be executed after the write operation.
+  * Required: no
+  * Default: none
+* writeMode
+  * Description: The write mode; supports the `insert` and `replace` modes. When set to `replace`, the onConflictColumn parameter is required (see the sketch after this list).
+  * Required: no
+  * Default: insert
+  * Example: "replace"
+* onConflictColumn
+  * Description: The list of conflict columns; required when writeMode is `replace`.
+  * Required: no
+  * Default: none
+  * Example: ["id","user"]
+
+### 3.3 Type Convert
+Data types in DataX can be converted to the corresponding data types in Databend. The following table shows the correspondence between the two types.
+
+| DataX Type | Databend Type                                             |
+|------------|-----------------------------------------------------------|
+| INT        | TINYINT, INT8, SMALLINT, INT16, INT, INT32, BIGINT, INT64 |
+| LONG       | TINYINT, INT8, SMALLINT, INT16, INT, INT32, BIGINT, INT64 |
+| STRING     | STRING, VARCHAR                                           |
+| DOUBLE     | FLOAT, DOUBLE                                             |
+| BOOL       | BOOLEAN, BOOL                                             |
+| DATE       | DATE, TIMESTAMP                                           |
+| BYTES      | STRING, VARCHAR                                           |
+
+
+## 4 Performance Test
+
+
+## 5 Restrictions
+Currently, complex data type support is not stable. If you want to use complex data types such as tuple or array, please check later releases of Databend and the JDBC driver.
+
+## FAQ
diff --git a/databendwriter/pom.xml b/databendwriter/pom.xml
new file mode 100644
index 00000000..b99ca5d8
--- /dev/null
+++ b/databendwriter/pom.xml
@@ -0,0 +1,101 @@
+
+
+    
+        datax-all
+        com.alibaba.datax
+        0.0.1-SNAPSHOT
+    
+
+    4.0.0
+    databendwriter
+    databendwriter
+    jar
+
+    
+        
+            com.databend
+            databend-jdbc
+            0.1.0
+        
+        
+            com.alibaba.datax
+            datax-core
+            ${datax-project-version}
+        
+        
+            com.alibaba.datax
+            datax-common
+            ${datax-project-version}
+        
+        
+            org.slf4j
+            slf4j-api
+        
+
+        
+            ch.qos.logback
+            logback-classic
+        
+
+        
+            com.alibaba.datax
+            plugin-rdbms-util
+            ${datax-project-version}
+            
+                
+                    com.google.guava
+                    guava
+                
+            
+        
+
+
+        
+            junit
+            junit
+            test
+        
+    
+    
+        
+            
+                src/main/java
+                
+                    **/*.properties
+                
+            
+        
+        
+            
+            
+                maven-compiler-plugin
+                
+                    ${jdk-version}
+                    ${jdk-version}
+                    ${project-sourceEncoding}
+                
+            
+            
+            
+                maven-assembly-plugin
+                
+                    
+                        src/main/assembly/package.xml
+                    
+                    datax
+                
+                
+                    
+                        dwzip
+                        package
+                        
+                            single
+                        
+                    
+                
+            
+        
+    
+
diff --git a/databendwriter/src/main/assembly/package.xml b/databendwriter/src/main/assembly/package.xml
new file mode 100755
index 00000000..8a9ba1b2
--- /dev/null
+++ b/databendwriter/src/main/assembly/package.xml
@@ -0,0 +1,34 @@
+
+    
+    
+        dir
+    
+    false
+    
+        
+            src/main/resources
+            
+                plugin.json
+ 				plugin_job_template.json
+ 			
+            plugin/writer/databendwriter
+        
+        
+            target/
+            
+                databendwriter-0.0.1-SNAPSHOT.jar
+            
+            plugin/writer/databendwriter
+        
+    
+
+    
+        
+            false
+            plugin/writer/databendwriter/libs
+        
+    
+
diff --git a/databendwriter/src/main/java/com/alibaba/datax/plugin/writer/databendwriter/DatabendWriter.java b/databendwriter/src/main/java/com/alibaba/datax/plugin/writer/databendwriter/DatabendWriter.java
new file mode 100644
index 00000000..ddb8fc9a
--- /dev/null
+++ b/databendwriter/src/main/java/com/alibaba/datax/plugin/writer/databendwriter/DatabendWriter.java
@@ -0,0 +1,241 @@
+package com.alibaba.datax.plugin.writer.databendwriter;
+
+import com.alibaba.datax.common.element.Column;
+import com.alibaba.datax.common.element.StringColumn;
+import com.alibaba.datax.common.exception.CommonErrorCode;
+import com.alibaba.datax.common.exception.DataXException;
+import com.alibaba.datax.common.plugin.RecordReceiver;
+import com.alibaba.datax.common.spi.Writer;
+import com.alibaba.datax.common.util.Configuration;
+import com.alibaba.datax.plugin.rdbms.util.DataBaseType;
+import com.alibaba.datax.plugin.rdbms.writer.CommonRdbmsWriter;
+import com.alibaba.datax.plugin.writer.databendwriter.util.DatabendWriterUtil;
+import org.slf4j.Logger;
+import org.slf4j.LoggerFactory;
+
+import java.sql.*;
+import java.util.List;
+import java.util.regex.Pattern;
+
+public class DatabendWriter extends Writer {
+    private static final DataBaseType DATABASE_TYPE = DataBaseType.Databend;
+
+    public static class Job
+            extends Writer.Job {
+        private static final Logger LOG = LoggerFactory.getLogger(Job.class);
+        private Configuration originalConfig;
+        private CommonRdbmsWriter.Job commonRdbmsWriterMaster;
+
+        @Override
+        public void init() throws DataXException {
+            this.originalConfig = super.getPluginJobConf();
+            this.commonRdbmsWriterMaster = new CommonRdbmsWriter.Job(DATABASE_TYPE);
+            this.commonRdbmsWriterMaster.init(this.originalConfig);
+            // placeholder currently not supported by databend driver, needs special treatment
+            DatabendWriterUtil.dealWriteMode(this.originalConfig);
+        }
+
+        @Override
+        public void preCheck() {
+            this.init();
+            this.commonRdbmsWriterMaster.writerPreCheck(this.originalConfig, DATABASE_TYPE);
+        }
+
+        @Override
+        public void prepare() {
+            this.commonRdbmsWriterMaster.prepare(this.originalConfig);
+        }
+
+        @Override
+        public List<Configuration> split(int mandatoryNumber) {
+            return this.commonRdbmsWriterMaster.split(this.originalConfig, mandatoryNumber);
+        }
+
+        @Override
+        public void post() {
+            this.commonRdbmsWriterMaster.post(this.originalConfig);
+        }
+
+        @Override
+        public void destroy() {
+            this.commonRdbmsWriterMaster.destroy(this.originalConfig);
+        }
+    }
+
+
+    public static class Task extends Writer.Task {
+        private static final Logger LOG = LoggerFactory.getLogger(Task.class);
+
+        private Configuration writerSliceConfig;
+
+        private CommonRdbmsWriter.Task commonRdbmsWriterSlave;
+
+        @Override
+        public void init() {
+            this.writerSliceConfig = super.getPluginJobConf();
+
+            this.commonRdbmsWriterSlave = new CommonRdbmsWriter.Task(DataBaseType.Databend) {
+                @Override
+                protected PreparedStatement fillPreparedStatementColumnType(PreparedStatement preparedStatement, int columnIndex, int columnSqltype, String typeName, Column column) throws SQLException {
+                    try {
+                        if (column.getRawData() == null) {
+                            preparedStatement.setNull(columnIndex + 1, columnSqltype);
+                            return preparedStatement;
+                        }
+
+                        java.util.Date utilDate;
+                        switch (columnSqltype) {
+
+                            case Types.TINYINT:
+                            case Types.SMALLINT:
+                            case Types.INTEGER:
+                                preparedStatement.setInt(columnIndex + 1, column.asBigInteger().intValue());
+                                break;
+                            case Types.BIGINT:
+                                preparedStatement.setLong(columnIndex + 1, column.asLong());
+                                break;
+                            case Types.DECIMAL:
+                                preparedStatement.setBigDecimal(columnIndex + 1, column.asBigDecimal());
+                                break;
+                            case Types.FLOAT:
+                            case Types.REAL:
+                                preparedStatement.setFloat(columnIndex + 1, column.asDouble().floatValue());
+                                break;
+                            case Types.DOUBLE:
+                                preparedStatement.setDouble(columnIndex + 1, column.asDouble());
+                                break;
+                            case Types.DATE:
+                                java.sql.Date sqlDate = null;
+                                try {
+                                    utilDate = column.asDate();
+                                } catch (DataXException e) {
+                                    throw new SQLException(String.format(
+                                            "Date type conversion error: [%s]", column));
+                                }
+
+                                if (null != utilDate) {
+                                    sqlDate = new java.sql.Date(utilDate.getTime());
+                                }
+                                preparedStatement.setDate(columnIndex + 1, sqlDate);
+                                break;
+
+                            case Types.TIME:
+                                java.sql.Time sqlTime = null;
+                                try {
+                                    utilDate = column.asDate();
+                                } catch (DataXException e) {
+                                    throw new SQLException(String.format(
+                                            "Date type conversion error: [%s]", column));
+                                }
+
+                                if (null != utilDate) {
+                                    sqlTime = new java.sql.Time(utilDate.getTime());
+                                }
+                                preparedStatement.setTime(columnIndex + 1, sqlTime);
+                                break;
+
+                            case Types.TIMESTAMP:
+                                Timestamp sqlTimestamp = null;
+                                if (column instanceof StringColumn && column.asString() != null) {
+                                    String timeStampStr = column.asString();
+                                    // JAVA TIMESTAMP 类型入参必须是 "2017-07-12 14:39:00.123566" 格式
+                                    String pattern = "^\\d+-\\d+-\\d+ \\d+:\\d+:\\d+.\\d+";
+                                    boolean isMatch = Pattern.matches(pattern, timeStampStr);
+                                    if (isMatch) {
+                                        sqlTimestamp = Timestamp.valueOf(timeStampStr);
+                                        preparedStatement.setTimestamp(columnIndex + 1, sqlTimestamp);
+                                        break;
+                                    }
+                                }
+                                try {
+                                    utilDate = column.asDate();
+                                } catch (DataXException e) {
+                                    throw new SQLException(String.format(
+                                            "Date type conversion error: [%s]", column));
+                                }
+
+                                if (null != utilDate) {
+                                    sqlTimestamp = new Timestamp(
+                                            utilDate.getTime());
+                                }
+                                preparedStatement.setTimestamp(columnIndex + 1, sqlTimestamp);
+                                break;
+
+                            case Types.BINARY:
+                            case Types.VARBINARY:
+                            case Types.BLOB:
+                            case Types.LONGVARBINARY:
+                                preparedStatement.setBytes(columnIndex + 1, column
+                                        .asBytes());
+                                break;
+
+                            case Types.BOOLEAN:
+
+                                // warn: bit(1) -> Types.BIT 可使用setBoolean
+                                // warn: bit(>1) -> Types.VARBINARY 可使用setBytes
+                            case Types.BIT:
+                                if (this.dataBaseType == DataBaseType.MySql) {
+                                    Boolean asBoolean = column.asBoolean();
+                                    if (asBoolean != null) {
+                                        preparedStatement.setBoolean(columnIndex + 1, asBoolean);
+                                    } else {
+                                        preparedStatement.setNull(columnIndex + 1, Types.BIT);
+                                    }
+                                } else {
+                                    preparedStatement.setString(columnIndex + 1, column.asString());
+                                }
+                                break;
+
+                            default:
+                                // cast variant / array into string is fine.
+                                preparedStatement.setString(columnIndex + 1, column.asString());
+                                break;
+                        }
+                        return preparedStatement;
+                    } catch (DataXException e) {
+                        // fix类型转换或者溢出失败时,将具体哪一列打印出来
+                        if (e.getErrorCode() == CommonErrorCode.CONVERT_NOT_SUPPORT ||
+                                e.getErrorCode() == CommonErrorCode.CONVERT_OVER_FLOW) {
+                            throw DataXException
+                                    .asDataXException(
+                                            e.getErrorCode(),
+                                            String.format(
+                                                    "type conversion error. columnName: [%s], columnType:[%d], columnJavaType: [%s]. please change the data type in given column field or do not sync on the column.",
+                                                    this.resultSetMetaData.getLeft()
+                                                            .get(columnIndex),
+                                                    this.resultSetMetaData.getMiddle()
+                                                            .get(columnIndex),
+                                                    this.resultSetMetaData.getRight()
+                                                            .get(columnIndex)));
+                        } else {
+                            throw e;
+                        }
+                    }
+                }
+
+            };
+            this.commonRdbmsWriterSlave.init(this.writerSliceConfig);
+        }
+
+        @Override
+        public void destroy() {
+            this.commonRdbmsWriterSlave.destroy(this.writerSliceConfig);
+        }
+
+        @Override
+        public void prepare() {
+            this.commonRdbmsWriterSlave.prepare(this.writerSliceConfig);
+        }
+
+        @Override
+        public void post() {
+            this.commonRdbmsWriterSlave.post(this.writerSliceConfig);
+        }
+
+        @Override
+        public void startWrite(RecordReceiver lineReceiver) {
+            this.commonRdbmsWriterSlave.startWrite(lineReceiver, this.writerSliceConfig, this.getTaskPluginCollector());
+        }
+
+    }
+}
diff --git a/databendwriter/src/main/java/com/alibaba/datax/plugin/writer/databendwriter/DatabendWriterErrorCode.java b/databendwriter/src/main/java/com/alibaba/datax/plugin/writer/databendwriter/DatabendWriterErrorCode.java
new file mode 100644
index 00000000..21cbf428
--- /dev/null
+++ b/databendwriter/src/main/java/com/alibaba/datax/plugin/writer/databendwriter/DatabendWriterErrorCode.java
@@ -0,0 +1,33 @@
+package com.alibaba.datax.plugin.writer.databendwriter;
+
+import com.alibaba.datax.common.spi.ErrorCode;
+
+
+public enum DatabendWriterErrorCode implements ErrorCode {
+    CONF_ERROR("DatabendWriter-00", "配置错误."),
+    WRITE_DATA_ERROR("DatabendWriter-01", "写入数据时失败."),
+    ;
+
+    private final String code;
+    private final String description;
+
+    private DatabendWriterErrorCode(String code, String description) {
+        this.code = code;
+        this.description = description;
+    }
+
+    @Override
+    public String getCode() {
+        return this.code;
+    }
+
+    @Override
+    public String getDescription() {
+        return this.description;
+    }
+
+    @Override
+    public String toString() {
+        return String.format("Code:[%s], Description:[%s].", this.code, this.description);
+    }
+}
\ No newline at end of file
diff --git a/databendwriter/src/main/java/com/alibaba/datax/plugin/writer/databendwriter/util/DatabendWriterUtil.java b/databendwriter/src/main/java/com/alibaba/datax/plugin/writer/databendwriter/util/DatabendWriterUtil.java
new file mode 100644
index 00000000..516a75eb
--- /dev/null
+++ b/databendwriter/src/main/java/com/alibaba/datax/plugin/writer/databendwriter/util/DatabendWriterUtil.java
@@ -0,0 +1,72 @@
+package com.alibaba.datax.plugin.writer.databendwriter.util;
+
+import com.alibaba.datax.common.exception.DataXException;
+import com.alibaba.datax.common.util.Configuration;
+import com.alibaba.datax.plugin.rdbms.writer.Constant;
+import com.alibaba.datax.plugin.rdbms.writer.Key;
+
+import com.alibaba.datax.plugin.writer.databendwriter.DatabendWriterErrorCode;
+import org.apache.commons.lang3.StringUtils;
+import org.slf4j.Logger;
+import org.slf4j.LoggerFactory;
+
+import java.util.List;
+import java.util.StringJoiner;
+
+public final class DatabendWriterUtil {
+    private static final Logger LOG = LoggerFactory.getLogger(DatabendWriterUtil.class);
+
+    private DatabendWriterUtil() {
+    }
+
+    public static void dealWriteMode(Configuration originalConfig) throws DataXException {
+        List<String> columns = originalConfig.getList(Key.COLUMN, String.class);
+        List<String> onConflictColumns = originalConfig.getList(Key.ONCONFLICT_COLUMN, String.class);
+        StringBuilder writeDataSqlTemplate = new StringBuilder();
+
+        String jdbcUrl = originalConfig.getString(String.format("%s[0].%s",
+                Constant.CONN_MARK, Key.JDBC_URL));
+
+        String writeMode = originalConfig.getString(Key.WRITE_MODE, "INSERT");
+        LOG.info("write mode is {}", writeMode);
+        if (writeMode.toLowerCase().contains("replace")) {
+            if (onConflictColumns == null || onConflictColumns.size() == 0) {
+                throw DataXException
+                        .asDataXException(
+                                DatabendWriterErrorCode.CONF_ERROR,
+                                String.format(
+                                        "Replace mode must has onConflictColumn config."
+                                ));
+            }
+
+            // for databend if you want to use replace mode, the writeMode should be:  "writeMode": "replace"
+            writeDataSqlTemplate.append("REPLACE INTO %s (")
+                    .append(StringUtils.join(columns, ",")).append(") ").append(onConFlictDoString(onConflictColumns))
+                    .append(" VALUES");
+
+            LOG.info("Replace data [\n{}\n], which jdbcUrl like:[{}]", writeDataSqlTemplate, jdbcUrl);
+            originalConfig.set(Constant.INSERT_OR_REPLACE_TEMPLATE_MARK, writeDataSqlTemplate);
+        } else {
+            writeDataSqlTemplate.append("INSERT INTO %s");
+            StringJoiner columnString = new StringJoiner(",");
+
+            for (String column : columns) {
+                columnString.add(column);
+            }
+            writeDataSqlTemplate.append(String.format("(%s)", columnString));
+            writeDataSqlTemplate.append(" VALUES");
+
+            LOG.info("Insert data [\n{}\n], which jdbcUrl like:[{}]", writeDataSqlTemplate, jdbcUrl);
+
+            originalConfig.set(Constant.INSERT_OR_REPLACE_TEMPLATE_MARK, writeDataSqlTemplate);
+        }
+
+    }
+
+    public static String onConFlictDoString(List<String> conflictColumns) {
+        return " ON " +
+                "(" +
+                StringUtils.join(conflictColumns, ",") + ") ";
+    }
+}
diff --git a/databendwriter/src/main/resources/plugin.json b/databendwriter/src/main/resources/plugin.json
new file mode 100644
index 00000000..bab0130d
--- /dev/null
+++ b/databendwriter/src/main/resources/plugin.json
@@ -0,0 +1,6 @@
+{
+  "name": "databendwriter",
+  "class": "com.alibaba.datax.plugin.writer.databendwriter.DatabendWriter",
+  "description": "execute batch insert sql to write dataX data into databend",
+  "developer": "databend"
+}
\ No newline at end of file
diff --git a/databendwriter/src/main/resources/plugin_job_template.json b/databendwriter/src/main/resources/plugin_job_template.json
new file mode 100644
index 00000000..34d4b251
--- /dev/null
+++ b/databendwriter/src/main/resources/plugin_job_template.json
@@ -0,0 +1,19 @@
+{
+  "name": "databendwriter",
+  "parameter": {
+    "username": "username",
+    "password": "password",
+    "column": ["col1", "col2", "col3"],
+    "connection": [
+      {
+        "jdbcUrl": "jdbc:databend://<host>:<port>[/<database>]",
+        "table": "table1"
+      }
+    ],
+    "preSql": [],
+    "postSql": [],
+
+    "maxBatchRows": 65536,
+    "maxBatchSize": 134217728
+  }
+}
\ No newline at end of file
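The template above only exercises the default INSERT path. DatabendWriterUtil.dealWriteMode also accepts a replace mode, switched on by `"writeMode": "replace"` together with a list of conflict-key columns (the key name `onConflictColumn` is taken from the error message in that class; the table, column and conflict-key values below are placeholders, not part of this patch). With such a config the writer renders a statement template of the shape `REPLACE INTO table (id, name, age) ON (id) VALUES ...`. A minimal sketch:

```json
{
  "name": "databendwriter",
  "parameter": {
    "username": "username",
    "password": "password",
    "column": ["id", "name", "age"],
    "writeMode": "replace",
    "onConflictColumn": ["id"],
    "connection": [
      {
        "jdbcUrl": "jdbc:databend://127.0.0.1:8000/default",
        "table": "example_table"
      }
    ]
  }
}
```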
diff --git a/datahubreader/pom.xml b/datahubreader/pom.xml
new file mode 100644
index 00000000..c0022b44
--- /dev/null
+++ b/datahubreader/pom.xml
@@ -0,0 +1,79 @@
+<?xml version="1.0" encoding="UTF-8"?>
+<project xmlns="http://maven.apache.org/POM/4.0.0" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:schemaLocation="http://maven.apache.org/POM/4.0.0 http://maven.apache.org/xsd/maven-4.0.0.xsd">
+    <parent>
+        <artifactId>datax-all</artifactId>
+        <groupId>com.alibaba.datax</groupId>
+        <version>0.0.1-SNAPSHOT</version>
+    </parent>
+    <modelVersion>4.0.0</modelVersion>
+
+    <artifactId>datahubreader</artifactId>
+
+    <version>0.0.1-SNAPSHOT</version>
+
+    <dependencies>
+        <dependency>
+            <groupId>com.alibaba.datax</groupId>
+            <artifactId>datax-common</artifactId>
+            <version>${datax-project-version}</version>
+            <exclusions>
+                <exclusion>
+                    <artifactId>slf4j-log4j12</artifactId>
+                    <groupId>org.slf4j</groupId>
+                </exclusion>
+            </exclusions>
+        </dependency>
+        <dependency>
+            <groupId>org.slf4j</groupId>
+            <artifactId>slf4j-api</artifactId>
+        </dependency>
+        <dependency>
+            <groupId>ch.qos.logback</groupId>
+            <artifactId>logback-classic</artifactId>
+        </dependency>
+        <dependency>
+            <groupId>com.aliyun.datahub</groupId>
+            <artifactId>aliyun-sdk-datahub</artifactId>
+            <version>2.21.6-public</version>
+        </dependency>
+        <dependency>
+            <groupId>junit</groupId>
+            <artifactId>junit</artifactId>
+            <version>4.12</version>
+            <scope>test</scope>
+        </dependency>
+    </dependencies>
+
+    <build>
+        <plugins>
+            <plugin>
+                <artifactId>maven-compiler-plugin</artifactId>
+                <configuration>
+                    <source>${jdk-version}</source>
+                    <target>${jdk-version}</target>
+                    <encoding>${project-sourceEncoding}</encoding>
+                </configuration>
+            </plugin>
+            <plugin>
+                <artifactId>maven-assembly-plugin</artifactId>
+                <configuration>
+                    <descriptors>
+                        <descriptor>src/main/assembly/package.xml</descriptor>
+                    </descriptors>
+                    <finalName>datax</finalName>
+                </configuration>
+                <executions>
+                    <execution>
+                        <id>dwzip</id>
+                        <phase>package</phase>
+                        <goals>
+                            <goal>single</goal>
+                        </goals>
+                    </execution>
+                </executions>
+            </plugin>
+        </plugins>
+    </build>
+</project>
diff --git a/datahubreader/src/main/assembly/package.xml b/datahubreader/src/main/assembly/package.xml
new file mode 100644
index 00000000..d14ea981
--- /dev/null
+++ b/datahubreader/src/main/assembly/package.xml
@@ -0,0 +1,34 @@
+<assembly xmlns="http://maven.apache.org/plugins/maven-assembly-plugin/assembly/1.1.0" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:schemaLocation="http://maven.apache.org/plugins/maven-assembly-plugin/assembly/1.1.0 http://maven.apache.org/xsd/assembly-1.1.0.xsd">
+    <id></id>
+    <formats>
+        <format>dir</format>
+    </formats>
+    <includeBaseDirectory>false</includeBaseDirectory>
+    <fileSets>
+        <fileSet>
+            <directory>src/main/resources</directory>
+            <includes>
+                <include>plugin.json</include>
+            </includes>
+            <outputDirectory>plugin/reader/datahubreader</outputDirectory>
+        </fileSet>
+        <fileSet>
+            <directory>target/</directory>
+            <includes>
+                <include>datahubreader-0.0.1-SNAPSHOT.jar</include>
+            </includes>
+            <outputDirectory>plugin/reader/datahubreader</outputDirectory>
+        </fileSet>
+    </fileSets>
+
+    <dependencySets>
+        <dependencySet>
+            <useProjectArtifact>false</useProjectArtifact>
+            <outputDirectory>plugin/reader/datahubreader/libs</outputDirectory>
+            <scope>runtime</scope>
+        </dependencySet>
+    </dependencySets>
+</assembly>
diff --git a/datahubreader/src/main/java/com/alibaba/datax/plugin/reader/datahubreader/Constant.java b/datahubreader/src/main/java/com/alibaba/datax/plugin/reader/datahubreader/Constant.java
new file mode 100644
index 00000000..bee3ccd7
--- /dev/null
+++ b/datahubreader/src/main/java/com/alibaba/datax/plugin/reader/datahubreader/Constant.java
@@ -0,0 +1,8 @@
+package com.alibaba.datax.plugin.reader.datahubreader;
+
+public class Constant {
+
+    public static String DATETIME_FORMAT = "yyyyMMddHHmmss";
+    public static String DATE_FORMAT = "yyyyMMdd";
+
+}
diff --git a/datahubreader/src/main/java/com/alibaba/datax/plugin/reader/datahubreader/DatahubClientHelper.java b/datahubreader/src/main/java/com/alibaba/datax/plugin/reader/datahubreader/DatahubClientHelper.java
new file mode 100644
index 00000000..2b7bcec4
--- /dev/null
+++ b/datahubreader/src/main/java/com/alibaba/datax/plugin/reader/datahubreader/DatahubClientHelper.java
@@ -0,0 +1,42 @@
+package com.alibaba.datax.plugin.reader.datahubreader;
+
+import com.alibaba.datax.common.util.Configuration;
+import com.alibaba.fastjson2.JSON;
+import com.alibaba.fastjson2.TypeReference;
+import com.aliyun.datahub.client.DatahubClient;
+import com.aliyun.datahub.client.DatahubClientBuilder;
+import com.aliyun.datahub.client.auth.Account;
+import com.aliyun.datahub.client.auth.AliyunAccount;
+import com.aliyun.datahub.client.common.DatahubConfig;
+import com.aliyun.datahub.client.http.HttpConfig;
+import org.apache.commons.lang3.StringUtils;
+
+public class DatahubClientHelper {
+    public static DatahubClient getDatahubClient(Configuration jobConfig) {
+        String accessId = jobConfig.getNecessaryValue(Key.CONFIG_KEY_ACCESS_ID,
+                DatahubWriterErrorCode.MISSING_REQUIRED_VALUE);
+        String accessKey = jobConfig.getNecessaryValue(Key.CONFIG_KEY_ACCESS_KEY,
+                DatahubWriterErrorCode.MISSING_REQUIRED_VALUE);
+        String endpoint = jobConfig.getNecessaryValue(Key.CONFIG_KEY_ENDPOINT,
+                DatahubWriterErrorCode.MISSING_REQUIRED_VALUE);
+        Account account = new AliyunAccount(accessId, accessKey);
+        // 是否开启二进制传输,服务端2.12版本开始支持
+        boolean enableBinary = jobConfig.getBool("enableBinary", false);
+        DatahubConfig datahubConfig = new DatahubConfig(endpoint, account, enableBinary);
+        // HttpConfig可不设置,不设置时采用默认值
+        // 读写数据推荐打开网络传输 LZ4压缩
+        HttpConfig httpConfig = null;
+        String httpConfigStr = jobConfig.getString("httpConfig");
+        if (StringUtils.isNotBlank(httpConfigStr)) {
+            httpConfig = JSON.parseObject(httpConfigStr, new TypeReference<HttpConfig>() {
+            });
+        }
+
+        DatahubClientBuilder builder = DatahubClientBuilder.newBuilder().setDatahubConfig(datahubConfig);
+        if (null != httpConfig) {
+            builder.setHttpConfig(httpConfig);
+        }
+        DatahubClient datahubClient = builder.build();
+        return datahubClient;
+    }
+}
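Beyond the three required keys, this helper reads two optional job parameters: `enableBinary` (binary transport, supported by DataHub server 2.12+ according to the comment above, default false) and `httpConfig`, a JSON string deserialized onto the SDK's HttpConfig class (its field names follow that class, so they are not repeated here). A minimal sketch with placeholder values:

```json
{
  "name": "datahubreader",
  "parameter": {
    "endpoint": "<datahub endpoint>",
    "accessId": "<accessId>",
    "accessKey": "<accessKey>",
    "enableBinary": false
  }
}
```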
diff --git a/datahubreader/src/main/java/com/alibaba/datax/plugin/reader/datahubreader/DatahubReader.java b/datahubreader/src/main/java/com/alibaba/datax/plugin/reader/datahubreader/DatahubReader.java
new file mode 100644
index 00000000..4792ac39
--- /dev/null
+++ b/datahubreader/src/main/java/com/alibaba/datax/plugin/reader/datahubreader/DatahubReader.java
@@ -0,0 +1,292 @@
+package com.alibaba.datax.plugin.reader.datahubreader;
+
+import java.text.ParseException;
+import java.util.ArrayList;
+import java.util.HashMap;
+import java.util.List;
+
+import com.aliyun.datahub.client.model.*;
+import org.slf4j.Logger;
+import org.slf4j.LoggerFactory;
+
+import com.alibaba.datax.common.element.Column;
+import com.alibaba.datax.common.element.Record;
+import com.alibaba.datax.common.element.StringColumn;
+import com.alibaba.datax.common.exception.DataXException;
+import com.alibaba.datax.common.plugin.RecordSender;
+import com.alibaba.datax.common.spi.Reader;
+import com.alibaba.datax.common.util.Configuration;
+
+
+import com.aliyun.datahub.client.DatahubClient;
+
+
+public class DatahubReader extends Reader {
+    public static class Job extends Reader.Job {
+        private static final Logger LOG = LoggerFactory.getLogger(Job.class);
+        
+        private Configuration originalConfig;
+        
+        private Long beginTimestampMillis;
+        private Long endTimestampMillis;
+        
+        DatahubClient datahubClient;
+        
+        @Override
+        public void init() {
+            LOG.info("datahub reader job init begin ...");
+            this.originalConfig = super.getPluginJobConf();
+            validateParameter(originalConfig);
+            this.datahubClient = DatahubClientHelper.getDatahubClient(this.originalConfig);
+            LOG.info("datahub reader job init end.");
+        }
+        
+        private void validateParameter(Configuration conf){
+            conf.getNecessaryValue(Key.ENDPOINT,DatahubReaderErrorCode.REQUIRE_VALUE);
+            conf.getNecessaryValue(Key.ACCESSKEYID,DatahubReaderErrorCode.REQUIRE_VALUE);
+            conf.getNecessaryValue(Key.ACCESSKEYSECRET,DatahubReaderErrorCode.REQUIRE_VALUE);
+            conf.getNecessaryValue(Key.PROJECT,DatahubReaderErrorCode.REQUIRE_VALUE);
+            conf.getNecessaryValue(Key.TOPIC,DatahubReaderErrorCode.REQUIRE_VALUE);
+            conf.getNecessaryValue(Key.COLUMN,DatahubReaderErrorCode.REQUIRE_VALUE);
+            conf.getNecessaryValue(Key.BEGINDATETIME,DatahubReaderErrorCode.REQUIRE_VALUE);
+            conf.getNecessaryValue(Key.ENDDATETIME,DatahubReaderErrorCode.REQUIRE_VALUE);
+            
+            int batchSize = this.originalConfig.getInt(Key.BATCHSIZE, 1024);
+            if (batchSize <= 0 || batchSize > 10000) {
+                throw DataXException.asDataXException(DatahubReaderErrorCode.BAD_CONFIG_VALUE,
+                        "Invalid batchSize[" + batchSize + "], it must be in range (0,10000]!");
+            }
+            
+            String beginDateTime = this.originalConfig.getString(Key.BEGINDATETIME);            
+            if (beginDateTime != null) {
+                try {
+                    beginTimestampMillis = DatahubReaderUtils.getUnixTimeFromDateTime(beginDateTime);
+                } catch (ParseException e) {
+                    throw DataXException.asDataXException(DatahubReaderErrorCode.BAD_CONFIG_VALUE,
+                            "Invalid beginDateTime[" + beginDateTime + "], format [yyyyMMddHHmmss]!");    
+                }
+            }
+            
+            if (beginTimestampMillis != null && beginTimestampMillis <= 0) {
+                throw DataXException.asDataXException(DatahubReaderErrorCode.BAD_CONFIG_VALUE,
+                        "Invalid beginTimestampMillis[" + beginTimestampMillis + "]!");               
+            }
+            
+            String endDateTime = this.originalConfig.getString(Key.ENDDATETIME);            
+            if (endDateTime != null) {
+                try {
+                    endTimestampMillis = DatahubReaderUtils.getUnixTimeFromDateTime(endDateTime);
+                } catch (ParseException e) {
+                    throw DataXException.asDataXException(DatahubReaderErrorCode.BAD_CONFIG_VALUE,
+                            "Invalid endDateTime[" + endDateTime + "], format [yyyyMMddHHmmss]!");
+                }
+            }
+            
+            if (endTimestampMillis != null && endTimestampMillis <= 0) {
+                throw DataXException.asDataXException(DatahubReaderErrorCode.BAD_CONFIG_VALUE,
+                        "Invalid endTimestampMillis[" + endTimestampMillis + "]!");                
+            }
+            
+            if (beginTimestampMillis != null && endTimestampMillis != null
+                    && endTimestampMillis <= beginTimestampMillis) {
+                throw DataXException.asDataXException(DatahubReaderErrorCode.BAD_CONFIG_VALUE,
+                        "endTimestampMillis[" + endTimestampMillis + "] must be bigger than beginTimestampMillis[" + beginTimestampMillis + "]!");
+            }
+        }
+        
+        @Override
+        public void prepare() {
+            // create datahub client
+            String project = originalConfig.getNecessaryValue(Key.PROJECT, DatahubReaderErrorCode.REQUIRE_VALUE);
+            String topic = originalConfig.getNecessaryValue(Key.TOPIC, DatahubReaderErrorCode.REQUIRE_VALUE);
+            RecordType recordType = null;
+            try {
+                DatahubClient client = DatahubClientHelper.getDatahubClient(this.originalConfig);
+                GetTopicResult getTopicResult = client.getTopic(project, topic);
+                recordType = getTopicResult.getRecordType();
+            } catch (Exception e) {
+                LOG.warn("get topic type error: {}", e.getMessage());
+            }
+            if (null != recordType) {
+                if (recordType == RecordType.BLOB) {
+                    throw DataXException.asDataXException(DatahubReaderErrorCode.BAD_CONFIG_VALUE,
+                            "DatahubReader only support 'Tuple' RecordType now, but your RecordType is 'BLOB'");
+                }
+            }
+        }
+
+        @Override
+        public void destroy() {
+        }
+
+        @Override
+        public List<Configuration> split(int adviceNumber) {
+            LOG.info("split() begin...");
+
+            List<Configuration> readerSplitConfigs = new ArrayList<Configuration>();
+            
+            String project = this.originalConfig.getString(Key.PROJECT);
+            String topic = this.originalConfig.getString(Key.TOPIC);
+            
+            List<ShardEntry> shardEntrys = DatahubReaderUtils.getShardsWithRetry(this.datahubClient, project, topic);
+            if (shardEntrys == null || shardEntrys.isEmpty()) {
+                throw DataXException.asDataXException(DatahubReaderErrorCode.BAD_CONFIG_VALUE,
+                        "Project [" + project + "] Topic [" + topic + "] has no shards, please check!");
+            }
+            
+            for (ShardEntry shardEntry : shardEntrys) {
+                Configuration splitedConfig = this.originalConfig.clone();
+                splitedConfig.set(Key.SHARDID, shardEntry.getShardId());
+                readerSplitConfigs.add(splitedConfig);
+            }
+            
+            LOG.info("split() ok and end...");
+            return readerSplitConfigs;
+        }
+        
+    }
+    
+    public static class Task extends Reader.Task {
+        private static final Logger LOG = LoggerFactory.getLogger(Task.class);
+        
+        private Configuration taskConfig;
+        
+        private String accessId;
+        private String accessKey;
+        private String endpoint;
+        private String project;
+        private String topic;
+        private String shardId;
+        private Long beginTimestampMillis;
+        private Long endTimestampMillis;
+        private int batchSize;
+        private List<String> columns;
+        private RecordSchema schema;
+        private String timeStampUnit;
+        
+        DatahubClient datahubClient;
+        
+        @Override
+        public void init() {
+            this.taskConfig = super.getPluginJobConf();
+            
+            this.accessId = this.taskConfig.getString(Key.ACCESSKEYID);
+            this.accessKey = this.taskConfig.getString(Key.ACCESSKEYSECRET);
+            this.endpoint = this.taskConfig.getString(Key.ENDPOINT);
+            this.project = this.taskConfig.getString(Key.PROJECT);
+            this.topic = this.taskConfig.getString(Key.TOPIC);
+            this.shardId = this.taskConfig.getString(Key.SHARDID);
+            this.batchSize = this.taskConfig.getInt(Key.BATCHSIZE, 1024);
+            this.timeStampUnit = this.taskConfig.getString(Key.TIMESTAMP_UNIT, "MICROSECOND");
+            try {
+                this.beginTimestampMillis = DatahubReaderUtils.getUnixTimeFromDateTime(this.taskConfig.getString(Key.BEGINDATETIME));
+            } catch (ParseException e) {                
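+                // beginDateTime was already validated in Job.init(), so a ParseException is not expected here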
+            }
+            
+            try {
+                this.endTimestampMillis = DatahubReaderUtils.getUnixTimeFromDateTime(this.taskConfig.getString(Key.ENDDATETIME));
+            } catch (ParseException e) {                
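+                // endDateTime was already validated in Job.init(), so a ParseException is not expected here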
+            }
+            
+            this.columns = this.taskConfig.getList(Key.COLUMN, String.class);
+            
+            this.datahubClient = DatahubClientHelper.getDatahubClient(this.taskConfig);
+
+
+            this.schema = DatahubReaderUtils.getDatahubSchemaWithRetry(this.datahubClient, this.project, topic);
+            
+            LOG.info("init datahub reader task finished.project:{} topic:{} batchSize:{}", project, topic, batchSize);
+        }
+
+        @Override
+        public void destroy() {
+        }
+
+        @Override
+        public void startRead(RecordSender recordSender) {
+            LOG.info("read start");
+            
+            String beginCursor = DatahubReaderUtils.getCursorWithRetry(this.datahubClient, this.project, 
+                    this.topic, this.shardId, this.beginTimestampMillis);
+            String endCursor = DatahubReaderUtils.getCursorWithRetry(this.datahubClient, this.project, 
+                    this.topic, this.shardId, this.endTimestampMillis);
+            
+            if (beginCursor == null) {
+                LOG.info("Shard:{} has no data!", this.shardId);
+                return;
+            } else if (endCursor == null) {
+                endCursor = DatahubReaderUtils.getLatestCursorWithRetry(this.datahubClient, this.project,
+                        this.topic, this.shardId);
+            }
+            
+            String curCursor = beginCursor;
+            
+            boolean exit = false;
+            
+            while (true) {
+                
+                GetRecordsResult result = DatahubReaderUtils.getRecordsResultWithRetry(this.datahubClient, this.project, this.topic,
+                        this.shardId, this.batchSize, curCursor, this.schema);
+                                
+                List<RecordEntry> records = result.getRecords();
+                if (records.size() > 0) {
+                    for (RecordEntry record : records) {
+                        if (record.getSystemTime() >= this.endTimestampMillis) {
+                            exit = true;
+                            break;
+                        }
+                        
+                        HashMap<String, Column> dataMap = new HashMap<String, Column>();
+                        List<Field> fields = ((TupleRecordData) record.getRecordData()).getRecordSchema().getFields();
+                        for (int i = 0; i < fields.size(); i++) {
+                            Field field = fields.get(i);
+                            Column column = DatahubReaderUtils.getColumnFromField(record, field, this.timeStampUnit);
+                            dataMap.put(field.getName(), column);
+                        }
+                        
+                        Record dataxRecord = recordSender.createRecord();
+                        
+                        if (null != this.columns && 1 == this.columns.size()) {
+                            String columnsInStr = columns.get(0).toString();
+                            if ("\"*\"".equals(columnsInStr) || "*".equals(columnsInStr)) {
+                                for (int i = 0; i < fields.size(); i++) {
+                                    dataxRecord.addColumn(dataMap.get(fields.get(i).getName()));
+                                }
+
+                            } else {
+                                if (dataMap.containsKey(columnsInStr)) {
+                                    dataxRecord.addColumn(dataMap.get(columnsInStr));
+                                } else {
+                                    dataxRecord.addColumn(new StringColumn(null));
+                                }
+                            }
+                        } else {
+                            for (String col : this.columns) {
+                                if (dataMap.containsKey(col)) {
+                                    dataxRecord.addColumn(dataMap.get(col));
+                                } else {
+                                    dataxRecord.addColumn(new StringColumn(null));
+                                }
+                            }
+                        }                         
+
+                        recordSender.sendToWriter(dataxRecord);                           
+                    }
+                } else {
+                    break;
+                }
+                
+                if (exit) {
+                    break;
+                }
+                
+                curCursor = result.getNextCursor();
+            }
+            
+            
+            LOG.info("end read datahub shard...");
+        }
+        
+    }
+
+}
diff --git a/datahubreader/src/main/java/com/alibaba/datax/plugin/reader/datahubreader/DatahubReaderErrorCode.java b/datahubreader/src/main/java/com/alibaba/datax/plugin/reader/datahubreader/DatahubReaderErrorCode.java
new file mode 100644
index 00000000..949a66f0
--- /dev/null
+++ b/datahubreader/src/main/java/com/alibaba/datax/plugin/reader/datahubreader/DatahubReaderErrorCode.java
@@ -0,0 +1,35 @@
+package com.alibaba.datax.plugin.reader.datahubreader;
+
+import com.alibaba.datax.common.spi.ErrorCode;
+
+public enum DatahubReaderErrorCode implements ErrorCode {
+    BAD_CONFIG_VALUE("DatahubReader-00", "The value you configured is invalid."),
+    LOG_HUB_ERROR("DatahubReader-01","Datahub exception"),
+    REQUIRE_VALUE("DatahubReader-02","Missing parameters"),
+    EMPTY_LOGSTORE_VALUE("DatahubReader-03","There is no shard under this topic");
+
+
+    private final String code;
+    private final String description;
+
+    private DatahubReaderErrorCode(String code, String description) {
+        this.code = code;
+        this.description = description;
+    }
+
+    @Override
+    public String getCode() {
+        return this.code;
+    }
+
+    @Override
+    public String getDescription() {
+        return this.description;
+    }
+
+    @Override
+    public String toString() {
+        return String.format("Code:[%s], Description:[%s]. ", this.code,
+                this.description);
+    }
+}
diff --git a/datahubreader/src/main/java/com/alibaba/datax/plugin/reader/datahubreader/DatahubReaderUtils.java b/datahubreader/src/main/java/com/alibaba/datax/plugin/reader/datahubreader/DatahubReaderUtils.java
new file mode 100644
index 00000000..6c3455df
--- /dev/null
+++ b/datahubreader/src/main/java/com/alibaba/datax/plugin/reader/datahubreader/DatahubReaderUtils.java
@@ -0,0 +1,200 @@
+package com.alibaba.datax.plugin.reader.datahubreader;
+
+import java.math.BigDecimal;
+import java.text.ParseException;
+import java.text.SimpleDateFormat;
+import java.util.Date;
+import java.util.List;
+import java.util.concurrent.Callable;
+
+import com.alibaba.datax.common.element.*;
+import com.alibaba.datax.common.exception.DataXException;
+import com.alibaba.datax.common.util.DataXCaseEnvUtil;
+import com.alibaba.datax.common.util.RetryUtil;
+
+import com.aliyun.datahub.client.DatahubClient;
+import com.aliyun.datahub.client.exception.InvalidParameterException;
+import com.aliyun.datahub.client.model.*;
+
+public class DatahubReaderUtils {
+
+    public static long getUnixTimeFromDateTime(String dateTime) throws ParseException {
+        try {
+            String format = Constant.DATETIME_FORMAT;
+            SimpleDateFormat simpleDateFormat = new SimpleDateFormat(format);
+            return simpleDateFormat.parse(dateTime).getTime();
+        } catch (ParseException ignored) {
+            throw DataXException.asDataXException(DatahubReaderErrorCode.BAD_CONFIG_VALUE,
+                    "Invalid DateTime[" + dateTime + "]!");   
+        }
+    }
+    
+    public static List<ShardEntry> getShardsWithRetry(final DatahubClient datahubClient, final String project, final String topic) {
+
+        List<ShardEntry> shards = null;
+        try {
+            shards = RetryUtil.executeWithRetry(new Callable<List<ShardEntry>>() {
+                @Override
+                public List<ShardEntry> call() throws Exception {
+                    ListShardResult listShardResult = datahubClient.listShard(project, topic);
+                    return listShardResult.getShards(); 
+                }
+            }, DataXCaseEnvUtil.getRetryTimes(7), DataXCaseEnvUtil.getRetryInterval(1000L), DataXCaseEnvUtil.getRetryExponential(true));
+            
+        } catch (Exception e) {
+            throw DataXException.asDataXException(DatahubReaderErrorCode.BAD_CONFIG_VALUE,
+                    "get Shards error, please check! detail error message: " + e.toString());
+        }         
+        return shards;
+    }
+    
+    public static String getCursorWithRetry(final DatahubClient datahubClient, final String project, final String topic, 
+            final String shardId, final long timestamp) {
+        
+        String cursor;
+        try {
+            cursor = RetryUtil.executeWithRetry(new Callable<String>() {
+                @Override
+                public String call() throws Exception {
+                    try {
+                        return datahubClient.getCursor(project, topic, shardId, CursorType.SYSTEM_TIME, timestamp).getCursor();
+                    } catch (InvalidParameterException e) {
+                        if (e.getErrorMessage().indexOf("Time in seek request is out of range") >= 0) {
+                            return null;
+                        } else {
+                            throw e;
+                        }
+                        
+                    }
+                }
+            }, DataXCaseEnvUtil.getRetryTimes(7), DataXCaseEnvUtil.getRetryInterval(1000L), DataXCaseEnvUtil.getRetryExponential(true));
+            
+        } catch (Exception e) {
+            throw DataXException.asDataXException(DatahubReaderErrorCode.BAD_CONFIG_VALUE,
+                    "get Cursor error, please check! detail error message: " + e.toString());
+        }         
+        return cursor;
+    }
+    
+    public static String getLatestCursorWithRetry(final DatahubClient datahubClient, final String project, final String topic,
+            final String shardId) {
+        
+        String cursor;
+        try {
+            cursor = RetryUtil.executeWithRetry(new Callable<String>() {
+                @Override
+                public String call() throws Exception {
+                    return datahubClient.getCursor(project, topic, shardId, CursorType.LATEST).getCursor();
+                }
+            }, DataXCaseEnvUtil.getRetryTimes(7), DataXCaseEnvUtil.getRetryInterval(1000L), DataXCaseEnvUtil.getRetryExponential(true));
+            
+        } catch (Exception e) {
+            throw DataXException.asDataXException(DatahubReaderErrorCode.BAD_CONFIG_VALUE,
+                    "get Cursor error, please check! detail error message: " + e.toString());
+        }         
+        return cursor;
+    }    
+    
+    public static RecordSchema getDatahubSchemaWithRetry(final DatahubClient datahubClient, final String project, final String topic) {
+        
+        RecordSchema schema;
+        try {
+            schema = RetryUtil.executeWithRetry(new Callable<RecordSchema>() {
+                @Override
+                public RecordSchema call() throws Exception {
+                    return datahubClient.getTopic(project, topic).getRecordSchema();
+                }
+            }, DataXCaseEnvUtil.getRetryTimes(7), DataXCaseEnvUtil.getRetryInterval(1000L), DataXCaseEnvUtil.getRetryExponential(true));
+            
+        } catch (Exception e) {
+            throw DataXException.asDataXException(DatahubReaderErrorCode.BAD_CONFIG_VALUE,
+                    "get Topic Schema error, please check! detail error message: " + e.toString());
+        }         
+        return schema;
+    } 
+    
+    public static GetRecordsResult getRecordsResultWithRetry(final DatahubClient datahubClient, final String project,
+            final String topic, final String shardId, final int batchSize, final String cursor, final RecordSchema schema) {
+        
+        GetRecordsResult result;
+        try  {
+            result = RetryUtil.executeWithRetry(new Callable<GetRecordsResult>() {
+                @Override
+                public GetRecordsResult call() throws Exception {
+                    return datahubClient.getRecords(project, topic, shardId, schema, cursor, batchSize);
+                }
+            }, DataXCaseEnvUtil.getRetryTimes(7), DataXCaseEnvUtil.getRetryInterval(1000L), DataXCaseEnvUtil.getRetryExponential(true));
+            
+        } catch (Exception e) {
+            throw DataXException.asDataXException(DatahubReaderErrorCode.BAD_CONFIG_VALUE,
+                    "get Record Result error, please check! detail error message: " + e.toString());
+        }     
+        return result;
+        
+    }
+    
+    public static Column getColumnFromField(RecordEntry record, Field field, String timeStampUnit) {
+        Column col = null;
+        TupleRecordData o = (TupleRecordData) record.getRecordData();
+
+        switch (field.getType()) {
+            case SMALLINT:
+                Short shortValue = ((Short) o.getField(field.getName()));
+                col = new LongColumn(shortValue == null ? null: shortValue.longValue());
+                break;
+            case INTEGER:
+                col = new LongColumn((Integer) o.getField(field.getName()));
+                break;
+            case BIGINT: {
+                col = new LongColumn((Long) o.getField(field.getName()));
+                break;
+            }
+            case TINYINT: {
+                Byte byteValue = ((Byte) o.getField(field.getName()));
+                col = new LongColumn(byteValue == null ? null : byteValue.longValue());
+                break;
+            }
+            case BOOLEAN: {
+                col = new BoolColumn((Boolean) o.getField(field.getName()));
+                break;
+            }
+            case FLOAT:
+                col = new DoubleColumn((Float) o.getField(field.getName()));
+                break;
+            case DOUBLE: {
+                col = new DoubleColumn((Double) o.getField(field.getName()));
+                break;
+            }
+            case STRING: {
+                col = new StringColumn((String) o.getField(field.getName()));
+                break;
+            }
+            case DECIMAL: {
+                BigDecimal value = (BigDecimal) o.getField(field.getName());
+                col = new DoubleColumn(value == null ? null : value.doubleValue());
+                break;
+            }
+            case TIMESTAMP: {
+                Long value = (Long) o.getField(field.getName());
+
+                if ("MILLISECOND".equals(timeStampUnit)) {
+                    // MILLISECOND, 13位精度,直接 new Date()
+                    col = new DateColumn(value == null ? null : new Date(value));
+                }
+                else if ("SECOND".equals(timeStampUnit)){
+                    col = new DateColumn(value == null ? null : new Date(value * 1000));
+                }
+                else {
+                    // 默认都是 MICROSECOND, 16位精度, 和之前的逻辑保持一致。
+                    col = new DateColumn(value == null ? null : new Date(value / 1000));
+                }
+                break;
+            }
+            default:
+                throw new RuntimeException("Unknown column type: " + field.getType());
+        }
+        
+        return col;
+    }
+    
+}
diff --git a/datahubreader/src/main/java/com/alibaba/datax/plugin/reader/datahubreader/DatahubWriterErrorCode.java b/datahubreader/src/main/java/com/alibaba/datax/plugin/reader/datahubreader/DatahubWriterErrorCode.java
new file mode 100644
index 00000000..c8633ea8
--- /dev/null
+++ b/datahubreader/src/main/java/com/alibaba/datax/plugin/reader/datahubreader/DatahubWriterErrorCode.java
@@ -0,0 +1,37 @@
+package com.alibaba.datax.plugin.reader.datahubreader;
+
+import com.alibaba.datax.common.spi.ErrorCode;
+import com.alibaba.datax.common.util.MessageSource;
+
+public enum DatahubWriterErrorCode implements ErrorCode {
+    MISSING_REQUIRED_VALUE("DatahubWriter-01", MessageSource.loadResourceBundle(DatahubWriterErrorCode.class).message("errorcode.missing_required_value")),
+    INVALID_CONFIG_VALUE("DatahubWriter-02", MessageSource.loadResourceBundle(DatahubWriterErrorCode.class).message("errorcode.invalid_config_value")),
+    GET_TOPOIC_INFO_FAIL("DatahubWriter-03", MessageSource.loadResourceBundle(DatahubWriterErrorCode.class).message("errorcode.get_topic_info_fail")),
+    WRITE_DATAHUB_FAIL("DatahubWriter-04", MessageSource.loadResourceBundle(DatahubWriterErrorCode.class).message("errorcode.write_datahub_fail")),
+    SCHEMA_NOT_MATCH("DatahubWriter-05", MessageSource.loadResourceBundle(DatahubWriterErrorCode.class).message("errorcode.schema_not_match")),
+    ;
+
+    private final String code;
+    private final String description;
+
+    private DatahubWriterErrorCode(String code, String description) {
+        this.code = code;
+        this.description = description;
+    }
+
+    @Override
+    public String getCode() {
+        return this.code;
+    }
+
+    @Override
+    public String getDescription() {
+        return this.description;
+    }
+
+    @Override
+    public String toString() {
+        return String.format("Code:[%s], Description:[%s]. ", this.code,
+                this.description);
+    }
+}
\ No newline at end of file
diff --git a/datahubreader/src/main/java/com/alibaba/datax/plugin/reader/datahubreader/Key.java b/datahubreader/src/main/java/com/alibaba/datax/plugin/reader/datahubreader/Key.java
new file mode 100644
index 00000000..3cb84b4b
--- /dev/null
+++ b/datahubreader/src/main/java/com/alibaba/datax/plugin/reader/datahubreader/Key.java
@@ -0,0 +1,35 @@
+package com.alibaba.datax.plugin.reader.datahubreader;
+
+public final class Key {
+
+    /**
+     * 此处声明插件用到的需要插件使用者提供的配置项
+     */
+    public static final String ENDPOINT = "endpoint";
+
+    public static final String ACCESSKEYID = "accessId";
+
+    public static final String ACCESSKEYSECRET = "accessKey";
+
+    public static final String PROJECT = "project";
+    
+    public static final String TOPIC = "topic";
+        
+    public static final String BEGINDATETIME = "beginDateTime";
+    
+    public static final String ENDDATETIME = "endDateTime";
+
+    public static final String BATCHSIZE = "batchSize";
+    
+    public static final String COLUMN = "column";
+    
+    public static final String SHARDID = "shardId";
+
+    public static final String CONFIG_KEY_ENDPOINT = "endpoint";
+    public static final String CONFIG_KEY_ACCESS_ID = "accessId";
+    public static final String CONFIG_KEY_ACCESS_KEY = "accessKey";
+
+
+    public static final String TIMESTAMP_UNIT = "timeStampUnit";
+    
+}
\ No newline at end of file
diff --git a/datahubreader/src/main/java/com/alibaba/datax/plugin/reader/datahubreader/LocalStrings.properties b/datahubreader/src/main/java/com/alibaba/datax/plugin/reader/datahubreader/LocalStrings.properties
new file mode 100644
index 00000000..e85c8ab3
--- /dev/null
+++ b/datahubreader/src/main/java/com/alibaba/datax/plugin/reader/datahubreader/LocalStrings.properties
@@ -0,0 +1,5 @@
+errorcode.missing_required_value=\u60A8\u7F3A\u5931\u4E86\u5FC5\u987B\u586B\u5199\u7684\u53C2\u6570\u503C.
+errorcode.invalid_config_value=\u60A8\u7684\u53C2\u6570\u914D\u7F6E\u9519\u8BEF.
+errorcode.get_topic_info_fail=\u83B7\u53D6shard\u5217\u8868\u5931\u8D25.
+errorcode.write_datahub_fail=\u5199\u6570\u636E\u5931\u8D25.
+errorcode.schema_not_match=\u6570\u636E\u683C\u5F0F\u9519\u8BEF.
diff --git a/datahubreader/src/main/java/com/alibaba/datax/plugin/reader/datahubreader/LocalStrings_en_US.properties b/datahubreader/src/main/java/com/alibaba/datax/plugin/reader/datahubreader/LocalStrings_en_US.properties
new file mode 100644
index 00000000..31a291e6
--- /dev/null
+++ b/datahubreader/src/main/java/com/alibaba/datax/plugin/reader/datahubreader/LocalStrings_en_US.properties
@@ -0,0 +1,5 @@
+errorcode.missing_required_value=\u60A8\u7F3A\u5931\u4E86\u5FC5\u987B\u586B\u5199\u7684\u53C2\u6570\u503C.
+errorcode.invalid_config_value=\u60A8\u7684\u53C2\u6570\u914D\u7F6E\u9519\u8BEF.
+errorcode.get_topic_info_fail=\u83B7\u53D6shard\u5217\u8868\u5931\u8D25.
+errorcode.write_datahub_fail=\u5199\u6570\u636E\u5931\u8D25.
+errorcode.schema_not_match=\u6570\u636E\u683C\u5F0F\u9519\u8BEF.
\ No newline at end of file
diff --git a/datahubreader/src/main/java/com/alibaba/datax/plugin/reader/datahubreader/LocalStrings_ja_JP.properties b/datahubreader/src/main/java/com/alibaba/datax/plugin/reader/datahubreader/LocalStrings_ja_JP.properties
new file mode 100644
index 00000000..31a291e6
--- /dev/null
+++ b/datahubreader/src/main/java/com/alibaba/datax/plugin/reader/datahubreader/LocalStrings_ja_JP.properties
@@ -0,0 +1,5 @@
+errorcode.missing_required_value=\u60A8\u7F3A\u5931\u4E86\u5FC5\u987B\u586B\u5199\u7684\u53C2\u6570\u503C.
+errorcode.invalid_config_value=\u60A8\u7684\u53C2\u6570\u914D\u7F6E\u9519\u8BEF.
+errorcode.get_topic_info_fail=\u83B7\u53D6shard\u5217\u8868\u5931\u8D25.
+errorcode.write_datahub_fail=\u5199\u6570\u636E\u5931\u8D25.
+errorcode.schema_not_match=\u6570\u636E\u683C\u5F0F\u9519\u8BEF.
\ No newline at end of file
diff --git a/datahubreader/src/main/java/com/alibaba/datax/plugin/reader/datahubreader/LocalStrings_zh_CN.properties b/datahubreader/src/main/java/com/alibaba/datax/plugin/reader/datahubreader/LocalStrings_zh_CN.properties
new file mode 100644
index 00000000..31a291e6
--- /dev/null
+++ b/datahubreader/src/main/java/com/alibaba/datax/plugin/reader/datahubreader/LocalStrings_zh_CN.properties
@@ -0,0 +1,5 @@
+errorcode.missing_required_value=\u60A8\u7F3A\u5931\u4E86\u5FC5\u987B\u586B\u5199\u7684\u53C2\u6570\u503C.
+errorcode.invalid_config_value=\u60A8\u7684\u53C2\u6570\u914D\u7F6E\u9519\u8BEF.
+errorcode.get_topic_info_fail=\u83B7\u53D6shard\u5217\u8868\u5931\u8D25.
+errorcode.write_datahub_fail=\u5199\u6570\u636E\u5931\u8D25.
+errorcode.schema_not_match=\u6570\u636E\u683C\u5F0F\u9519\u8BEF.
\ No newline at end of file
diff --git a/datahubreader/src/main/java/com/alibaba/datax/plugin/reader/datahubreader/LocalStrings_zh_HK.properties b/datahubreader/src/main/java/com/alibaba/datax/plugin/reader/datahubreader/LocalStrings_zh_HK.properties
new file mode 100644
index 00000000..c6a3a0e0
--- /dev/null
+++ b/datahubreader/src/main/java/com/alibaba/datax/plugin/reader/datahubreader/LocalStrings_zh_HK.properties
@@ -0,0 +1,9 @@
+errorcode.missing_required_value=\u60A8\u7F3A\u5931\u4E86\u5FC5\u987B\u586B\u5199\u7684\u53C2\u6570\u503C.
+errorcode.invalid_config_value=\u60A8\u7684\u53C2\u6570\u914D\u7F6E\u9519\u8BEF.
+errorcode.get_topic_info_fail=\u83B7\u53D6shard\u5217\u8868\u5931\u8D25.
+errorcode.write_datahub_fail=\u5199\u6570\u636E\u5931\u8D25.
+errorcode.schema_not_match=\u6570\u636E\u683C\u5F0F\u9519\u8BEF.errorcode.missing_required_value=您缺失了必須填寫的參數值.
+errorcode.invalid_config_value=您的參數配寘錯誤.
+errorcode.get_topic_info_fail=獲取shard清單失敗.
+errorcode.write_datahub_fail=寫數據失敗.
+errorcode.schema_not_match=數據格式錯誤.
diff --git a/datahubreader/src/main/java/com/alibaba/datax/plugin/reader/datahubreader/LocalStrings_zh_TW.properties b/datahubreader/src/main/java/com/alibaba/datax/plugin/reader/datahubreader/LocalStrings_zh_TW.properties
new file mode 100644
index 00000000..c6a3a0e0
--- /dev/null
+++ b/datahubreader/src/main/java/com/alibaba/datax/plugin/reader/datahubreader/LocalStrings_zh_TW.properties
@@ -0,0 +1,9 @@
+errorcode.missing_required_value=\u60A8\u7F3A\u5931\u4E86\u5FC5\u987B\u586B\u5199\u7684\u53C2\u6570\u503C.
+errorcode.invalid_config_value=\u60A8\u7684\u53C2\u6570\u914D\u7F6E\u9519\u8BEF.
+errorcode.get_topic_info_fail=\u83B7\u53D6shard\u5217\u8868\u5931\u8D25.
+errorcode.write_datahub_fail=\u5199\u6570\u636E\u5931\u8D25.
+errorcode.schema_not_match=\u6570\u636E\u683C\u5F0F\u9519\u8BEF.errorcode.missing_required_value=您缺失了必須填寫的參數值.
+errorcode.invalid_config_value=您的參數配寘錯誤.
+errorcode.get_topic_info_fail=獲取shard清單失敗.
+errorcode.write_datahub_fail=寫數據失敗.
+errorcode.schema_not_match=數據格式錯誤.
diff --git a/datahubreader/src/main/resources/job_config_template.json b/datahubreader/src/main/resources/job_config_template.json
new file mode 100644
index 00000000..eaf89804
--- /dev/null
+++ b/datahubreader/src/main/resources/job_config_template.json
@@ -0,0 +1,14 @@
+{
+    "name": "datahubreader",
+    "parameter": {
+        "endpoint":"",
+        "accessId": "",
+        "accessKey": "",
+        "project": "",
+        "topic": "",
+        "beginDateTime": "20180913121019",
+        "endDateTime": "20180913121119",
+        "batchSize": 1024,
+        "column": []
+    }
+}
\ No newline at end of file
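A filled-in variant of the template above may help; it follows the rules enforced in DatahubReader.Job.validateParameter (both datetimes in yyyyMMddHHmmss format, endDateTime later than beginDateTime, batchSize no larger than 10000) and adds the optional `timeStampUnit` read by the Task (MICROSECOND by default, MILLISECOND or SECOND otherwise). Endpoint, credentials, project, topic and column values are placeholders:

```json
{
  "name": "datahubreader",
  "parameter": {
    "endpoint": "<datahub endpoint>",
    "accessId": "<accessId>",
    "accessKey": "<accessKey>",
    "project": "example_project",
    "topic": "example_topic",
    "beginDateTime": "20231001000000",
    "endDateTime": "20231001010000",
    "batchSize": 1024,
    "column": ["*"],
    "timeStampUnit": "MICROSECOND"
  }
}
```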
diff --git a/datahubreader/src/main/resources/plugin.json b/datahubreader/src/main/resources/plugin.json
new file mode 100644
index 00000000..47b1c86b
--- /dev/null
+++ b/datahubreader/src/main/resources/plugin.json
@@ -0,0 +1,6 @@
+{
+    "name": "datahubreader",
+    "class": "com.alibaba.datax.plugin.reader.datahubreader.DatahubReader",
+    "description": "datahub reader",
+    "developer": "alibaba"
+}
\ No newline at end of file
diff --git a/datahubwriter/pom.xml b/datahubwriter/pom.xml
new file mode 100644
index 00000000..1ee1fe9b
--- /dev/null
+++ b/datahubwriter/pom.xml
@@ -0,0 +1,79 @@
+<?xml version="1.0" encoding="UTF-8"?>
+<project xmlns="http://maven.apache.org/POM/4.0.0" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:schemaLocation="http://maven.apache.org/POM/4.0.0 http://maven.apache.org/xsd/maven-4.0.0.xsd">
+    <parent>
+        <artifactId>datax-all</artifactId>
+        <groupId>com.alibaba.datax</groupId>
+        <version>0.0.1-SNAPSHOT</version>
+    </parent>
+    <modelVersion>4.0.0</modelVersion>
+
+    <artifactId>datahubwriter</artifactId>
+
+    <version>0.0.1-SNAPSHOT</version>
+
+    <dependencies>
+        <dependency>
+            <groupId>com.alibaba.datax</groupId>
+            <artifactId>datax-common</artifactId>
+            <version>${datax-project-version}</version>
+            <exclusions>
+                <exclusion>
+                    <artifactId>slf4j-log4j12</artifactId>
+                    <groupId>org.slf4j</groupId>
+                </exclusion>
+            </exclusions>
+        </dependency>
+        <dependency>
+            <groupId>org.slf4j</groupId>
+            <artifactId>slf4j-api</artifactId>
+        </dependency>
+        <dependency>
+            <groupId>ch.qos.logback</groupId>
+            <artifactId>logback-classic</artifactId>
+        </dependency>
+        <dependency>
+            <groupId>com.aliyun.datahub</groupId>
+            <artifactId>aliyun-sdk-datahub</artifactId>
+            <version>2.21.6-public</version>
+        </dependency>
+        <dependency>
+            <groupId>junit</groupId>
+            <artifactId>junit</artifactId>
+            <version>4.12</version>
+            <scope>test</scope>
+        </dependency>
+    </dependencies>
+
+    <build>
+        <plugins>
+            <plugin>
+                <artifactId>maven-compiler-plugin</artifactId>
+                <configuration>
+                    <source>${jdk-version}</source>
+                    <target>${jdk-version}</target>
+                    <encoding>${project-sourceEncoding}</encoding>
+                </configuration>
+            </plugin>
+            <plugin>
+                <artifactId>maven-assembly-plugin</artifactId>
+                <configuration>
+                    <descriptors>
+                        <descriptor>src/main/assembly/package.xml</descriptor>
+                    </descriptors>
+                    <finalName>datax</finalName>
+                </configuration>
+                <executions>
+                    <execution>
+                        <id>dwzip</id>
+                        <phase>package</phase>
+                        <goals>
+                            <goal>single</goal>
+                        </goals>
+                    </execution>
+                </executions>
+            </plugin>
+        </plugins>
+    </build>
+</project>
diff --git a/datahubwriter/src/main/assembly/package.xml b/datahubwriter/src/main/assembly/package.xml
new file mode 100644
index 00000000..aaef9f99
--- /dev/null
+++ b/datahubwriter/src/main/assembly/package.xml
@@ -0,0 +1,34 @@
+<assembly xmlns="http://maven.apache.org/plugins/maven-assembly-plugin/assembly/1.1.0" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:schemaLocation="http://maven.apache.org/plugins/maven-assembly-plugin/assembly/1.1.0 http://maven.apache.org/xsd/assembly-1.1.0.xsd">
+    <id></id>
+    <formats>
+        <format>dir</format>
+    </formats>
+    <includeBaseDirectory>false</includeBaseDirectory>
+    <fileSets>
+        <fileSet>
+            <directory>src/main/resources</directory>
+            <includes>
+                <include>plugin.json</include>
+            </includes>
+            <outputDirectory>plugin/writer/datahubwriter</outputDirectory>
+        </fileSet>
+        <fileSet>
+            <directory>target/</directory>
+            <includes>
+                <include>datahubwriter-0.0.1-SNAPSHOT.jar</include>
+            </includes>
+            <outputDirectory>plugin/writer/datahubwriter</outputDirectory>
+        </fileSet>
+    </fileSets>
+
+    <dependencySets>
+        <dependencySet>
+            <useProjectArtifact>false</useProjectArtifact>
+            <outputDirectory>plugin/writer/datahubwriter/libs</outputDirectory>
+            <scope>runtime</scope>
+        </dependencySet>
+    </dependencySets>
+</assembly>
diff --git a/datahubwriter/src/main/java/com/alibaba/datax/plugin/writer/datahubwriter/DatahubClientHelper.java b/datahubwriter/src/main/java/com/alibaba/datax/plugin/writer/datahubwriter/DatahubClientHelper.java
new file mode 100644
index 00000000..c25d1210
--- /dev/null
+++ b/datahubwriter/src/main/java/com/alibaba/datax/plugin/writer/datahubwriter/DatahubClientHelper.java
@@ -0,0 +1,43 @@
+package com.alibaba.datax.plugin.writer.datahubwriter;
+
+import org.apache.commons.lang3.StringUtils;
+
+import com.alibaba.datax.common.util.Configuration;
+import com.alibaba.fastjson2.JSON;
+import com.alibaba.fastjson2.TypeReference;
+import com.aliyun.datahub.client.DatahubClient;
+import com.aliyun.datahub.client.DatahubClientBuilder;
+import com.aliyun.datahub.client.auth.Account;
+import com.aliyun.datahub.client.auth.AliyunAccount;
+import com.aliyun.datahub.client.common.DatahubConfig;
+import com.aliyun.datahub.client.http.HttpConfig;
+
+public class DatahubClientHelper {
+    public static DatahubClient getDatahubClient(Configuration jobConfig) {
+        String accessId = jobConfig.getNecessaryValue(Key.CONFIG_KEY_ACCESS_ID,
+                DatahubWriterErrorCode.MISSING_REQUIRED_VALUE);
+        String accessKey = jobConfig.getNecessaryValue(Key.CONFIG_KEY_ACCESS_KEY,
+                DatahubWriterErrorCode.MISSING_REQUIRED_VALUE);
+        String endpoint = jobConfig.getNecessaryValue(Key.CONFIG_KEY_ENDPOINT,
+                DatahubWriterErrorCode.MISSING_REQUIRED_VALUE);
+        Account account = new AliyunAccount(accessId, accessKey);
+        // 是否开启二进制传输,服务端2.12版本开始支持
+        boolean enableBinary = jobConfig.getBool("enableBinary", false);
+        DatahubConfig datahubConfig = new DatahubConfig(endpoint, account, enableBinary);
+        // HttpConfig可不设置,不设置时采用默认值
+        // 读写数据推荐打开网络传输 LZ4压缩
+        HttpConfig httpConfig = null;
+        String httpConfigStr = jobConfig.getString("httpConfig");
+        if (StringUtils.isNotBlank(httpConfigStr)) {
+            httpConfig = JSON.parseObject(httpConfigStr, new TypeReference<HttpConfig>() {
+            });
+        }
+
+        DatahubClientBuilder builder = DatahubClientBuilder.newBuilder().setDatahubConfig(datahubConfig);
+        if (null != httpConfig) {
+            builder.setHttpConfig(httpConfig);
+        }
+        DatahubClient datahubClient = builder.build();
+        return datahubClient;
+    }
+}
diff --git a/datahubwriter/src/main/java/com/alibaba/datax/plugin/writer/datahubwriter/DatahubWriter.java b/datahubwriter/src/main/java/com/alibaba/datax/plugin/writer/datahubwriter/DatahubWriter.java
new file mode 100644
index 00000000..cd414fc5
--- /dev/null
+++ b/datahubwriter/src/main/java/com/alibaba/datax/plugin/writer/datahubwriter/DatahubWriter.java
@@ -0,0 +1,355 @@
+package com.alibaba.datax.plugin.writer.datahubwriter;
+
+import com.alibaba.datax.common.element.Column;
+import com.alibaba.datax.common.element.Record;
+import com.alibaba.datax.common.exception.DataXException;
+import com.alibaba.datax.common.plugin.RecordReceiver;
+import com.alibaba.datax.common.spi.Writer;
+import com.alibaba.datax.common.util.Configuration;
+import com.alibaba.datax.common.util.DataXCaseEnvUtil;
+import com.alibaba.datax.common.util.RetryUtil;
+import com.alibaba.fastjson2.JSON;
+import com.aliyun.datahub.client.DatahubClient;
+import com.aliyun.datahub.client.model.FieldType;
+import com.aliyun.datahub.client.model.GetTopicResult;
+import com.aliyun.datahub.client.model.ListShardResult;
+import com.aliyun.datahub.client.model.PutErrorEntry;
+import com.aliyun.datahub.client.model.PutRecordsResult;
+import com.aliyun.datahub.client.model.RecordEntry;
+import com.aliyun.datahub.client.model.RecordSchema;
+import com.aliyun.datahub.client.model.RecordType;
+import com.aliyun.datahub.client.model.ShardEntry;
+import com.aliyun.datahub.client.model.ShardState;
+import com.aliyun.datahub.client.model.TupleRecordData;
+
+import org.apache.commons.lang3.StringUtils;
+import org.slf4j.Logger;
+import org.slf4j.LoggerFactory;
+
+import java.util.ArrayList;
+import java.util.Arrays;
+import java.util.List;
+import java.util.Random;
+import java.util.concurrent.Callable;
+
+public class DatahubWriter extends Writer {
+
+    /**
+     * Job 中的方法仅执行一次,Task 中方法会由框架启动多个 Task 线程并行执行。
+     * <p/>
+     * 整个 Writer 执行流程是:
+     * <pre>
+     * Job类init-->prepare-->split
+     *
+     *                          Task类init-->prepare-->startWrite-->post-->destroy
+     *                          Task类init-->prepare-->startWrite-->post-->destroy
+     *
+     *                                                                            Job类post-->destroy
+     * </pre>
+ */ + public static class Job extends Writer.Job { + private static final Logger LOG = LoggerFactory + .getLogger(Job.class); + + private Configuration jobConfig = null; + + @Override + public void init() { + this.jobConfig = super.getPluginJobConf(); + jobConfig.getNecessaryValue(Key.CONFIG_KEY_ENDPOINT, DatahubWriterErrorCode.MISSING_REQUIRED_VALUE); + jobConfig.getNecessaryValue(Key.CONFIG_KEY_ACCESS_ID, DatahubWriterErrorCode.MISSING_REQUIRED_VALUE); + jobConfig.getNecessaryValue(Key.CONFIG_KEY_ACCESS_KEY, DatahubWriterErrorCode.MISSING_REQUIRED_VALUE); + jobConfig.getNecessaryValue(Key.CONFIG_KEY_PROJECT, DatahubWriterErrorCode.MISSING_REQUIRED_VALUE); + jobConfig.getNecessaryValue(Key.CONFIG_KEY_TOPIC, DatahubWriterErrorCode.MISSING_REQUIRED_VALUE); + } + + @Override + public void prepare() { + String project = jobConfig.getNecessaryValue(Key.CONFIG_KEY_PROJECT, + DatahubWriterErrorCode.MISSING_REQUIRED_VALUE); + String topic = jobConfig.getNecessaryValue(Key.CONFIG_KEY_TOPIC, + DatahubWriterErrorCode.MISSING_REQUIRED_VALUE); + RecordType recordType = null; + DatahubClient client = DatahubClientHelper.getDatahubClient(this.jobConfig); + try { + GetTopicResult getTopicResult = client.getTopic(project, topic); + recordType = getTopicResult.getRecordType(); + } catch (Exception e) { + LOG.warn("get topic type error: {}", e.getMessage()); + } + if (null != recordType) { + if (recordType == RecordType.BLOB) { + throw DataXException.asDataXException(DatahubWriterErrorCode.WRITE_DATAHUB_FAIL, + "DatahubWriter only support 'Tuple' RecordType now, but your RecordType is 'BLOB'"); + } + } + } + + @Override + public List split(int mandatoryNumber) { + List configs = new ArrayList(); + for (int i = 0; i < mandatoryNumber; ++i) { + configs.add(jobConfig.clone()); + } + return configs; + } + + @Override + public void post() {} + + @Override + public void destroy() {} + + } + + public static class Task extends Writer.Task { + private static final Logger LOG = LoggerFactory + .getLogger(Task.class); + private static final List FATAL_ERRORS_DEFAULT = Arrays.asList( + "InvalidParameterM", + "MalformedRecord", + "INVALID_SHARDID", + "NoSuchTopic", + "NoSuchShard" + ); + + private Configuration taskConfig; + private DatahubClient client; + private String project; + private String topic; + private List shards; + private int maxCommitSize; + private int maxRetryCount; + private RecordSchema schema; + private long retryInterval; + private Random random; + private List column; + private List columnIndex; + private boolean enableColumnConfig; + private List fatalErrors; + + @Override + public void init() { + this.taskConfig = super.getPluginJobConf(); + project = taskConfig.getNecessaryValue(Key.CONFIG_KEY_PROJECT, DatahubWriterErrorCode.MISSING_REQUIRED_VALUE); + topic = taskConfig.getNecessaryValue(Key.CONFIG_KEY_TOPIC, DatahubWriterErrorCode.MISSING_REQUIRED_VALUE); + maxCommitSize = taskConfig.getInt(Key.CONFIG_KEY_MAX_COMMIT_SIZE, 1024*1024); + maxRetryCount = taskConfig.getInt(Key.CONFIG_KEY_MAX_RETRY_COUNT, 500); + this.retryInterval = taskConfig.getInt(Key.RETRY_INTERVAL, 650); + this.random = new Random(); + this.column = this.taskConfig.getList(Key.CONFIG_KEY_COLUMN, String.class); + // ["*"] + if (null != this.column && 1 == this.column.size()) { + if (StringUtils.equals("*", this.column.get(0))) { + this.column = null; + } + } + this.columnIndex = new ArrayList(); + // 留个开关保平安 + this.enableColumnConfig = this.taskConfig.getBool("enableColumnConfig", true); + this.fatalErrors = 
this.taskConfig.getList("fatalErrors", Task.FATAL_ERRORS_DEFAULT, String.class); + this.client = DatahubClientHelper.getDatahubClient(this.taskConfig); + } + + @Override + public void prepare() { + final String shardIdConfig = this.taskConfig.getString(Key.CONFIG_KEY_SHARD_ID); + this.shards = new ArrayList(); + try { + RetryUtil.executeWithRetry(new Callable() { + @Override + public Void call() throws Exception { + ListShardResult result = client.listShard(project, topic); + if (StringUtils.isNotBlank(shardIdConfig)) { + shards.add(shardIdConfig); + } else { + for (ShardEntry shard : result.getShards()) { + if (shard.getState() == ShardState.ACTIVE || shard.getState() == ShardState.OPENING) { + shards.add(shard.getShardId()); + } + } + } + schema = client.getTopic(project, topic).getRecordSchema(); + return null; + } + }, DataXCaseEnvUtil.getRetryTimes(5), DataXCaseEnvUtil.getRetryInterval(10000L), DataXCaseEnvUtil.getRetryExponential(false)); + } catch (Exception e) { + throw DataXException.asDataXException(DatahubWriterErrorCode.GET_TOPOIC_INFO_FAIL, + "get topic info failed", e); + } + LOG.info("datahub topic {} shard to write: {}", this.topic, JSON.toJSONString(this.shards)); + LOG.info("datahub topic {} has schema: {}", this.topic, JSON.toJSONString(this.schema)); + + // 根据 schmea 顺序 和用户配置的 column,计算写datahub的顺序关系,以支持列换序 + // 后续统一使用 columnIndex 的顺位关系写 datahub + int totalSize = this.schema.getFields().size(); + if (null != this.column && !this.column.isEmpty() && this.enableColumnConfig) { + for (String eachCol : this.column) { + int indexFound = -1; + for (int i = 0; i < totalSize; i++) { + // warn: 大小写ignore + if (StringUtils.equalsIgnoreCase(eachCol, this.schema.getField(i).getName())) { + indexFound = i; + break; + } + } + if (indexFound >= 0) { + this.columnIndex.add(indexFound); + } else { + throw DataXException.asDataXException(DatahubWriterErrorCode.SCHEMA_NOT_MATCH, + String.format("can not find column %s in datahub topic %s", eachCol, this.topic)); + } + } + } else { + for (int i = 0; i < totalSize; i++) { + this.columnIndex.add(i); + } + } + } + + @Override + public void startWrite(RecordReceiver recordReceiver) { + Record record; + List records = new ArrayList(); + String shardId = null; + if (1 == this.shards.size()) { + shardId = shards.get(0); + } else { + shardId = shards.get(this.random.nextInt(shards.size())); + } + int commitSize = 0; + try { + while ((record = recordReceiver.getFromReader()) != null) { + RecordEntry dhRecord = convertRecord(record, shardId); + if (dhRecord != null) { + records.add(dhRecord); + } + commitSize += record.getByteSize(); + if (commitSize >= maxCommitSize) { + commit(records); + records.clear(); + commitSize = 0; + if (1 == this.shards.size()) { + shardId = shards.get(0); + } else { + shardId = shards.get(this.random.nextInt(shards.size())); + } + } + } + if (commitSize > 0) { + commit(records); + } + } catch (Exception e) { + throw DataXException.asDataXException( + DatahubWriterErrorCode.WRITE_DATAHUB_FAIL, e); + } + } + + @Override + public void post() {} + + @Override + public void destroy() {} + + private void commit(List records) throws InterruptedException { + PutRecordsResult result = client.putRecords(project, topic, records); + if (result.getFailedRecordCount() > 0) { + for (int i = 0; i < maxRetryCount; ++i) { + boolean limitExceededMessagePrinted = false; + for (PutErrorEntry error : result.getPutErrorEntries()) { + // 如果是 LimitExceeded 这样打印日志,不能每行记录打印一次了 + if (StringUtils.equalsIgnoreCase("LimitExceeded", 
error.getErrorcode())) { + if (!limitExceededMessagePrinted) { + LOG.warn("write record error, request id: {}, error code: {}, error message: {}", + result.getRequestId(), error.getErrorcode(), error.getMessage()); + limitExceededMessagePrinted = true; + } + } else { + LOG.error("write record error, request id: {}, error code: {}, error message: {}", + result.getRequestId(), error.getErrorcode(), error.getMessage()); + } + if (this.fatalErrors.contains(error.getErrorcode())) { + throw DataXException.asDataXException( + DatahubWriterErrorCode.WRITE_DATAHUB_FAIL, + error.getMessage()); + } + } + + if (this.retryInterval >= 0) { + Thread.sleep(this.retryInterval); + } else { + Thread.sleep(new Random().nextInt(700) + 300); + } + + result = client.putRecords(project, topic, result.getFailedRecords()); + if (result.getFailedRecordCount() == 0) { + return; + } + } + throw DataXException.asDataXException( + DatahubWriterErrorCode.WRITE_DATAHUB_FAIL, + "write datahub failed"); + } + } + + private RecordEntry convertRecord(Record dxRecord, String shardId) { + try { + RecordEntry dhRecord = new RecordEntry(); + dhRecord.setShardId(shardId); + TupleRecordData data = new TupleRecordData(this.schema); + for (int i = 0; i < this.columnIndex.size(); ++i) { + int orderInSchema = this.columnIndex.get(i); + FieldType type = this.schema.getField(orderInSchema).getType(); + Column column = dxRecord.getColumn(i); + switch (type) { + case BIGINT: + data.setField(orderInSchema, column.asLong()); + break; + case DOUBLE: + data.setField(orderInSchema, column.asDouble()); + break; + case STRING: + data.setField(orderInSchema, column.asString()); + break; + case BOOLEAN: + data.setField(orderInSchema, column.asBoolean()); + break; + case TIMESTAMP: + if (null == column.asDate()) { + data.setField(orderInSchema, null); + } else { + data.setField(orderInSchema, column.asDate().getTime() * 1000); + } + break; + case DECIMAL: + // warn + data.setField(orderInSchema, column.asBigDecimal()); + break; + case INTEGER: + data.setField(orderInSchema, column.asLong()); + break; + case FLOAT: + data.setField(orderInSchema, column.asDouble()); + break; + case TINYINT: + data.setField(orderInSchema, column.asLong()); + break; + case SMALLINT: + data.setField(orderInSchema, column.asLong()); + break; + default: + throw DataXException.asDataXException( + DatahubWriterErrorCode.SCHEMA_NOT_MATCH, + String.format("does not support type: %s", type)); + } + } + dhRecord.setRecordData(data); + return dhRecord; + } catch (Exception e) { + super.getTaskPluginCollector().collectDirtyRecord(dxRecord, e, "convert recor failed"); + } + return null; + } + } + +} \ No newline at end of file diff --git a/datahubwriter/src/main/java/com/alibaba/datax/plugin/writer/datahubwriter/DatahubWriterErrorCode.java b/datahubwriter/src/main/java/com/alibaba/datax/plugin/writer/datahubwriter/DatahubWriterErrorCode.java new file mode 100644 index 00000000..ad03abd1 --- /dev/null +++ b/datahubwriter/src/main/java/com/alibaba/datax/plugin/writer/datahubwriter/DatahubWriterErrorCode.java @@ -0,0 +1,37 @@ +package com.alibaba.datax.plugin.writer.datahubwriter; + +import com.alibaba.datax.common.spi.ErrorCode; +import com.alibaba.datax.common.util.MessageSource; + +public enum DatahubWriterErrorCode implements ErrorCode { + MISSING_REQUIRED_VALUE("DatahubWriter-01", MessageSource.loadResourceBundle(DatahubWriterErrorCode.class).message("errorcode.missing_required_value")), + INVALID_CONFIG_VALUE("DatahubWriter-02", 
MessageSource.loadResourceBundle(DatahubWriterErrorCode.class).message("errorcode.invalid_config_value")), + GET_TOPOIC_INFO_FAIL("DatahubWriter-03", MessageSource.loadResourceBundle(DatahubWriterErrorCode.class).message("errorcode.get_topic_info_fail")), + WRITE_DATAHUB_FAIL("DatahubWriter-04", MessageSource.loadResourceBundle(DatahubWriterErrorCode.class).message("errorcode.write_datahub_fail")), + SCHEMA_NOT_MATCH("DatahubWriter-05", MessageSource.loadResourceBundle(DatahubWriterErrorCode.class).message("errorcode.schema_not_match")), + ; + + private final String code; + private final String description; + + private DatahubWriterErrorCode(String code, String description) { + this.code = code; + this.description = description; + } + + @Override + public String getCode() { + return this.code; + } + + @Override + public String getDescription() { + return this.description; + } + + @Override + public String toString() { + return String.format("Code:[%s], Description:[%s]. ", this.code, + this.description); + } +} \ No newline at end of file diff --git a/datahubwriter/src/main/java/com/alibaba/datax/plugin/writer/datahubwriter/Key.java b/datahubwriter/src/main/java/com/alibaba/datax/plugin/writer/datahubwriter/Key.java new file mode 100644 index 00000000..5f179234 --- /dev/null +++ b/datahubwriter/src/main/java/com/alibaba/datax/plugin/writer/datahubwriter/Key.java @@ -0,0 +1,26 @@ +package com.alibaba.datax.plugin.writer.datahubwriter; + +public final class Key { + + /** + * 此处声明插件用到的需要插件使用者提供的配置项 + */ + public static final String CONFIG_KEY_ENDPOINT = "endpoint"; + public static final String CONFIG_KEY_ACCESS_ID = "accessId"; + public static final String CONFIG_KEY_ACCESS_KEY = "accessKey"; + public static final String CONFIG_KEY_PROJECT = "project"; + public static final String CONFIG_KEY_TOPIC = "topic"; + public static final String CONFIG_KEY_WRITE_MODE = "mode"; + public static final String CONFIG_KEY_SHARD_ID = "shardId"; + public static final String CONFIG_KEY_MAX_COMMIT_SIZE = "maxCommitSize"; + public static final String CONFIG_KEY_MAX_RETRY_COUNT = "maxRetryCount"; + + public static final String CONFIG_VALUE_SEQUENCE_MODE = "sequence"; + public static final String CONFIG_VALUE_RANDOM_MODE = "random"; + + public final static String MAX_RETRY_TIME = "maxRetryTime"; + + public final static String RETRY_INTERVAL = "retryInterval"; + + public final static String CONFIG_KEY_COLUMN = "column"; +} diff --git a/datahubwriter/src/main/java/com/alibaba/datax/plugin/writer/datahubwriter/LocalStrings.properties b/datahubwriter/src/main/java/com/alibaba/datax/plugin/writer/datahubwriter/LocalStrings.properties new file mode 100644 index 00000000..e85c8ab3 --- /dev/null +++ b/datahubwriter/src/main/java/com/alibaba/datax/plugin/writer/datahubwriter/LocalStrings.properties @@ -0,0 +1,5 @@ +errorcode.missing_required_value=\u60A8\u7F3A\u5931\u4E86\u5FC5\u987B\u586B\u5199\u7684\u53C2\u6570\u503C. +errorcode.invalid_config_value=\u60A8\u7684\u53C2\u6570\u914D\u7F6E\u9519\u8BEF. +errorcode.get_topic_info_fail=\u83B7\u53D6shard\u5217\u8868\u5931\u8D25. +errorcode.write_datahub_fail=\u5199\u6570\u636E\u5931\u8D25. +errorcode.schema_not_match=\u6570\u636E\u683C\u5F0F\u9519\u8BEF. 
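上文 DatahubWriter 的 prepare() 会按 topic schema 的顺序解析用户配置的 column,并以 columnIndex 的顺位关系写入 DataHub,从而支持列换序。下面是一个最小化的 Java 示意(独立的演示类,不属于本次提交的插件源码;schemaFields 用字符串列表代替实际的 RecordSchema,异常类型也做了简化):

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class ColumnIndexDemo {
    // 大小写不敏感地把用户配置的列映射为 schema 中的下标;找不到时报错(实际插件抛出 SCHEMA_NOT_MATCH)
    static List<Integer> resolveColumnIndex(List<String> schemaFields, List<String> userColumns) {
        List<Integer> columnIndex = new ArrayList<>();
        for (String col : userColumns) {
            int found = -1;
            for (int i = 0; i < schemaFields.size(); i++) {
                if (schemaFields.get(i).equalsIgnoreCase(col)) {
                    found = i;
                    break;
                }
            }
            if (found < 0) {
                throw new IllegalArgumentException("can not find column " + col + " in topic schema");
            }
            columnIndex.add(found);
        }
        return columnIndex;
    }

    public static void main(String[] args) {
        List<String> schema = Arrays.asList("id", "name", "gmt_create");
        // 用户按 name、ID 的顺序配置列时输出 [1, 0],后续写入即按该顺位换序
        System.out.println(resolveColumnIndex(schema, Arrays.asList("name", "ID")));
    }
}
```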
diff --git a/datahubwriter/src/main/java/com/alibaba/datax/plugin/writer/datahubwriter/LocalStrings_en_US.properties b/datahubwriter/src/main/java/com/alibaba/datax/plugin/writer/datahubwriter/LocalStrings_en_US.properties new file mode 100644 index 00000000..31a291e6 --- /dev/null +++ b/datahubwriter/src/main/java/com/alibaba/datax/plugin/writer/datahubwriter/LocalStrings_en_US.properties @@ -0,0 +1,5 @@ +errorcode.missing_required_value=\u60A8\u7F3A\u5931\u4E86\u5FC5\u987B\u586B\u5199\u7684\u53C2\u6570\u503C. +errorcode.invalid_config_value=\u60A8\u7684\u53C2\u6570\u914D\u7F6E\u9519\u8BEF. +errorcode.get_topic_info_fail=\u83B7\u53D6shard\u5217\u8868\u5931\u8D25. +errorcode.write_datahub_fail=\u5199\u6570\u636E\u5931\u8D25. +errorcode.schema_not_match=\u6570\u636E\u683C\u5F0F\u9519\u8BEF. \ No newline at end of file diff --git a/datahubwriter/src/main/java/com/alibaba/datax/plugin/writer/datahubwriter/LocalStrings_ja_JP.properties b/datahubwriter/src/main/java/com/alibaba/datax/plugin/writer/datahubwriter/LocalStrings_ja_JP.properties new file mode 100644 index 00000000..31a291e6 --- /dev/null +++ b/datahubwriter/src/main/java/com/alibaba/datax/plugin/writer/datahubwriter/LocalStrings_ja_JP.properties @@ -0,0 +1,5 @@ +errorcode.missing_required_value=\u60A8\u7F3A\u5931\u4E86\u5FC5\u987B\u586B\u5199\u7684\u53C2\u6570\u503C. +errorcode.invalid_config_value=\u60A8\u7684\u53C2\u6570\u914D\u7F6E\u9519\u8BEF. +errorcode.get_topic_info_fail=\u83B7\u53D6shard\u5217\u8868\u5931\u8D25. +errorcode.write_datahub_fail=\u5199\u6570\u636E\u5931\u8D25. +errorcode.schema_not_match=\u6570\u636E\u683C\u5F0F\u9519\u8BEF. \ No newline at end of file diff --git a/datahubwriter/src/main/java/com/alibaba/datax/plugin/writer/datahubwriter/LocalStrings_zh_CN.properties b/datahubwriter/src/main/java/com/alibaba/datax/plugin/writer/datahubwriter/LocalStrings_zh_CN.properties new file mode 100644 index 00000000..31a291e6 --- /dev/null +++ b/datahubwriter/src/main/java/com/alibaba/datax/plugin/writer/datahubwriter/LocalStrings_zh_CN.properties @@ -0,0 +1,5 @@ +errorcode.missing_required_value=\u60A8\u7F3A\u5931\u4E86\u5FC5\u987B\u586B\u5199\u7684\u53C2\u6570\u503C. +errorcode.invalid_config_value=\u60A8\u7684\u53C2\u6570\u914D\u7F6E\u9519\u8BEF. +errorcode.get_topic_info_fail=\u83B7\u53D6shard\u5217\u8868\u5931\u8D25. +errorcode.write_datahub_fail=\u5199\u6570\u636E\u5931\u8D25. +errorcode.schema_not_match=\u6570\u636E\u683C\u5F0F\u9519\u8BEF. \ No newline at end of file diff --git a/datahubwriter/src/main/java/com/alibaba/datax/plugin/writer/datahubwriter/LocalStrings_zh_HK.properties b/datahubwriter/src/main/java/com/alibaba/datax/plugin/writer/datahubwriter/LocalStrings_zh_HK.properties new file mode 100644 index 00000000..c6a3a0e0 --- /dev/null +++ b/datahubwriter/src/main/java/com/alibaba/datax/plugin/writer/datahubwriter/LocalStrings_zh_HK.properties @@ -0,0 +1,9 @@ +errorcode.missing_required_value=\u60A8\u7F3A\u5931\u4E86\u5FC5\u987B\u586B\u5199\u7684\u53C2\u6570\u503C. +errorcode.invalid_config_value=\u60A8\u7684\u53C2\u6570\u914D\u7F6E\u9519\u8BEF. +errorcode.get_topic_info_fail=\u83B7\u53D6shard\u5217\u8868\u5931\u8D25. +errorcode.write_datahub_fail=\u5199\u6570\u636E\u5931\u8D25. +errorcode.schema_not_match=\u6570\u636E\u683C\u5F0F\u9519\u8BEF.errorcode.missing_required_value=您缺失了必須填寫的參數值. +errorcode.invalid_config_value=您的參數配寘錯誤. +errorcode.get_topic_info_fail=獲取shard清單失敗. +errorcode.write_datahub_fail=寫數據失敗. +errorcode.schema_not_match=數據格式錯誤. 
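与上文 commit() 的重试逻辑对应,下面给出一个简化的 Java 示意(其中 Sender 是为演示假设的发送接口,实际插件调用的是 DataHub SDK 的 putRecords,并会对 fatalErrors 直接终止):每轮只重发失败记录,重试间隔优先取配置的 retryInterval,未配置(小于 0)时退化为 300~1000 毫秒的随机等待。

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Random;

public class RetryDemo {
    /** 假想的发送接口:返回未写入成功的记录,空列表表示全部成功。 */
    interface Sender {
        List<String> send(List<String> records);
    }

    static void commitWithRetry(Sender sender, List<String> records,
                                int maxRetryCount, long retryInterval) throws InterruptedException {
        List<String> failed = sender.send(records);
        for (int i = 0; i < maxRetryCount && !failed.isEmpty(); i++) {
            // 与上文 commit() 一致:retryInterval >= 0 时按配置等待,否则随机等待 300~1000 毫秒
            Thread.sleep(retryInterval >= 0 ? retryInterval : new Random().nextInt(700) + 300);
            failed = sender.send(failed);
        }
        if (!failed.isEmpty()) {
            throw new RuntimeException("write datahub failed after " + maxRetryCount + " retries");
        }
    }

    public static void main(String[] args) throws InterruptedException {
        final int[] attempt = {0};
        // mock:第一次调用有一条记录失败,第二次全部成功
        Sender mock = records -> ++attempt[0] == 1 ? records.subList(0, 1) : new ArrayList<String>();
        commitWithRetry(mock, Arrays.asList("r1", "r2"), 3, -1L);
        System.out.println("all records committed, attempts = " + attempt[0]);
    }
}
```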
diff --git a/datahubwriter/src/main/java/com/alibaba/datax/plugin/writer/datahubwriter/LocalStrings_zh_TW.properties b/datahubwriter/src/main/java/com/alibaba/datax/plugin/writer/datahubwriter/LocalStrings_zh_TW.properties new file mode 100644 index 00000000..c6a3a0e0 --- /dev/null +++ b/datahubwriter/src/main/java/com/alibaba/datax/plugin/writer/datahubwriter/LocalStrings_zh_TW.properties @@ -0,0 +1,9 @@ +errorcode.missing_required_value=\u60A8\u7F3A\u5931\u4E86\u5FC5\u987B\u586B\u5199\u7684\u53C2\u6570\u503C. +errorcode.invalid_config_value=\u60A8\u7684\u53C2\u6570\u914D\u7F6E\u9519\u8BEF. +errorcode.get_topic_info_fail=\u83B7\u53D6shard\u5217\u8868\u5931\u8D25. +errorcode.write_datahub_fail=\u5199\u6570\u636E\u5931\u8D25. +errorcode.schema_not_match=\u6570\u636E\u683C\u5F0F\u9519\u8BEF.errorcode.missing_required_value=您缺失了必須填寫的參數值. +errorcode.invalid_config_value=您的參數配寘錯誤. +errorcode.get_topic_info_fail=獲取shard清單失敗. +errorcode.write_datahub_fail=寫數據失敗. +errorcode.schema_not_match=數據格式錯誤. diff --git a/datahubwriter/src/main/resources/job_config_template.json b/datahubwriter/src/main/resources/job_config_template.json new file mode 100644 index 00000000..8b0b41ae --- /dev/null +++ b/datahubwriter/src/main/resources/job_config_template.json @@ -0,0 +1,14 @@ +{ + "name": "datahubwriter", + "parameter": { + "endpoint":"", + "accessId": "", + "accessKey": "", + "project": "", + "topic": "", + "mode": "random", + "shardId": "", + "maxCommitSize": 524288, + "maxRetryCount": 500 + } +} \ No newline at end of file diff --git a/datahubwriter/src/main/resources/plugin.json b/datahubwriter/src/main/resources/plugin.json new file mode 100644 index 00000000..91c17292 --- /dev/null +++ b/datahubwriter/src/main/resources/plugin.json @@ -0,0 +1,6 @@ +{ + "name": "datahubwriter", + "class": "com.alibaba.datax.plugin.writer.datahubwriter.DatahubWriter", + "description": "datahub writer", + "developer": "alibaba" +} \ No newline at end of file diff --git a/datax-example/datax-example-core/pom.xml b/datax-example/datax-example-core/pom.xml new file mode 100644 index 00000000..6a2e9e8e --- /dev/null +++ b/datax-example/datax-example-core/pom.xml @@ -0,0 +1,20 @@ + + + 4.0.0 + + com.alibaba.datax + datax-example + 0.0.1-SNAPSHOT + + + datax-example-core + + + 8 + 8 + UTF-8 + + + \ No newline at end of file diff --git a/datax-example/datax-example-core/src/main/java/com/alibaba/datax/example/ExampleContainer.java b/datax-example/datax-example-core/src/main/java/com/alibaba/datax/example/ExampleContainer.java new file mode 100644 index 00000000..a4229fd1 --- /dev/null +++ b/datax-example/datax-example-core/src/main/java/com/alibaba/datax/example/ExampleContainer.java @@ -0,0 +1,26 @@ +package com.alibaba.datax.example; + +import com.alibaba.datax.common.util.Configuration; +import com.alibaba.datax.core.Engine; +import com.alibaba.datax.example.util.ExampleConfigParser; + +/** + * {@code Date} 2023/8/6 11:22 + * + * @author fuyouj + */ + +public class ExampleContainer { + /** + * example对外暴露的启动入口 + * 使用前最好看下 datax-example/doc/README.MD + * @param jobPath 任务json绝对路径 + */ + public static void start(String jobPath) { + + Configuration configuration = ExampleConfigParser.parse(jobPath); + + Engine engine = new Engine(); + engine.start(configuration); + } +} diff --git a/datax-example/datax-example-core/src/main/java/com/alibaba/datax/example/Main.java b/datax-example/datax-example-core/src/main/java/com/alibaba/datax/example/Main.java new file mode 100644 index 00000000..56bf9f0b --- /dev/null +++ 
b/datax-example/datax-example-core/src/main/java/com/alibaba/datax/example/Main.java @@ -0,0 +1,23 @@ +package com.alibaba.datax.example; + + +import com.alibaba.datax.example.util.PathUtil; + +/** + * @author fuyouj + */ +public class Main { + + /** + * 1.在example模块pom文件添加你依赖的的调试插件, + * 你可以直接打开本模块的pom文件,参考是如何引入streamreader,streamwriter + * 2. 在此处指定你的job文件 + */ + public static void main(String[] args) { + + String classPathJobPath = "/job/stream2stream.json"; + String absJobPath = PathUtil.getAbsolutePathFromClassPath(classPathJobPath); + ExampleContainer.start(absJobPath); + } + +} diff --git a/datax-example/datax-example-core/src/main/java/com/alibaba/datax/example/util/ExampleConfigParser.java b/datax-example/datax-example-core/src/main/java/com/alibaba/datax/example/util/ExampleConfigParser.java new file mode 100644 index 00000000..6bbb4a23 --- /dev/null +++ b/datax-example/datax-example-core/src/main/java/com/alibaba/datax/example/util/ExampleConfigParser.java @@ -0,0 +1,154 @@ +package com.alibaba.datax.example.util; + +import com.alibaba.datax.common.exception.DataXException; +import com.alibaba.datax.common.util.Configuration; +import com.alibaba.datax.core.util.ConfigParser; +import com.alibaba.datax.core.util.FrameworkErrorCode; +import com.alibaba.datax.core.util.container.CoreConstant; + +import java.io.File; +import java.io.IOException; +import java.net.URL; +import java.nio.file.Paths; +import java.util.*; + +/** + * @author fuyouj + */ +public class ExampleConfigParser { + private static final String CORE_CONF = "/example/conf/core.json"; + + private static final String PLUGIN_DESC_FILE = "plugin.json"; + + /** + * 指定Job配置路径,ConfigParser会解析Job、Plugin、Core全部信息,并以Configuration返回 + * 不同于Core的ConfigParser,这里的core,plugin 不依赖于编译后的datax.home,而是扫描程序编译后的target目录 + */ + public static Configuration parse(final String jobPath) { + + Configuration configuration = ConfigParser.parseJobConfig(jobPath); + configuration.merge(coreConfig(), + false); + + Map pluginTypeMap = new HashMap<>(); + String readerName = configuration.getString(CoreConstant.DATAX_JOB_CONTENT_READER_NAME); + String writerName = configuration.getString(CoreConstant.DATAX_JOB_CONTENT_WRITER_NAME); + pluginTypeMap.put(readerName, "reader"); + pluginTypeMap.put(writerName, "writer"); + Configuration pluginsDescConfig = parsePluginsConfig(pluginTypeMap); + configuration.merge(pluginsDescConfig, false); + return configuration; + } + + private static Configuration parsePluginsConfig(Map pluginTypeMap) { + + Configuration configuration = Configuration.newDefault(); + + //最初打算通过user.dir获取工作目录来扫描插件, + //但是user.dir在不同有一些不确定性,所以废弃了这个选择 + + for (File basePackage : runtimeBasePackages()) { + if (pluginTypeMap.isEmpty()) { + break; + } + scanPluginByPackage(basePackage, configuration, basePackage.listFiles(), pluginTypeMap); + } + if (!pluginTypeMap.isEmpty()) { + String failedPlugin = pluginTypeMap.keySet().toString(); + String message = "\nplugin %s load failed :ry to analyze the reasons from the following aspects.。\n" + + "1: Check if the name of the plugin is spelled correctly, and verify whether DataX supports this plugin\n" + + "2:Verify if the tag has been added under section in the pom file of the relevant plugin.\n" + + " src/main/resources\n" + + " \n" + + " **/*.*\n" + + " \n" + + " true\n" + + " \n [Refer to the streamreader pom file] \n" + + "3: Check that the datax-yourPlugin-example module imported your test plugin"; + message = String.format(message, failedPlugin); + throw 
DataXException.asDataXException(FrameworkErrorCode.PLUGIN_INIT_ERROR, message); + } + return configuration; + } + + /** + * 通过classLoader获取程序编译的输出目录 + * + * @return File[/datax-example/target/classes,xxReader/target/classes,xxWriter/target/classes] + */ + private static File[] runtimeBasePackages() { + List basePackages = new ArrayList<>(); + ClassLoader classLoader = Thread.currentThread().getContextClassLoader(); + Enumeration resources = null; + try { + resources = classLoader.getResources(""); + } catch (IOException e) { + throw DataXException.asDataXException(e.getMessage()); + } + + while (resources.hasMoreElements()) { + URL resource = resources.nextElement(); + File file = new File(resource.getFile()); + if (file.isDirectory()) { + basePackages.add(file); + } + } + + return basePackages.toArray(new File[0]); + } + + /** + * @param packageFile 编译出来的target/classes根目录 便于找到插件时设置插件的URL目录,设置根目录是最保险的方式 + * @param configuration pluginConfig + * @param files 待扫描文件 + * @param needPluginTypeMap 需要的插件 + */ + private static void scanPluginByPackage(File packageFile, + Configuration configuration, + File[] files, + Map needPluginTypeMap) { + if (files == null) { + return; + } + for (File file : files) { + if (file.isFile() && PLUGIN_DESC_FILE.equals(file.getName())) { + Configuration pluginDesc = Configuration.from(file); + String descPluginName = pluginDesc.getString("name", ""); + + if (needPluginTypeMap.containsKey(descPluginName)) { + + String type = needPluginTypeMap.get(descPluginName); + configuration.merge(parseOnePlugin(packageFile.getAbsolutePath(), type, descPluginName, pluginDesc), false); + needPluginTypeMap.remove(descPluginName); + + } + } else { + scanPluginByPackage(packageFile, configuration, file.listFiles(), needPluginTypeMap); + } + } + } + + + private static Configuration parseOnePlugin(String packagePath, + String pluginType, + String pluginName, + Configuration pluginDesc) { + //设置path 兼容jarLoader的加载方式URLClassLoader + pluginDesc.set("path", packagePath); + Configuration pluginConfInJob = Configuration.newDefault(); + pluginConfInJob.set( + String.format("plugin.%s.%s", pluginType, pluginName), + pluginDesc.getInternal()); + return pluginConfInJob; + } + + private static Configuration coreConfig() { + try { + URL resource = ExampleConfigParser.class.getResource(CORE_CONF); + return Configuration.from(Paths.get(resource.toURI()).toFile()); + } catch (Exception ignore) { + throw DataXException.asDataXException("Failed to load the configuration file core.json. 
" + + "Please check whether /example/conf/core.json exists!"); + } + } +} diff --git a/datax-example/datax-example-core/src/main/java/com/alibaba/datax/example/util/PathUtil.java b/datax-example/datax-example-core/src/main/java/com/alibaba/datax/example/util/PathUtil.java new file mode 100644 index 00000000..e197fa73 --- /dev/null +++ b/datax-example/datax-example-core/src/main/java/com/alibaba/datax/example/util/PathUtil.java @@ -0,0 +1,26 @@ +package com.alibaba.datax.example.util; + + +import com.alibaba.datax.common.exception.DataXException; + +import java.net.URI; +import java.net.URISyntaxException; +import java.net.URL; +import java.nio.file.Paths; + +/** + * @author fuyouj + */ +public class PathUtil { + public static String getAbsolutePathFromClassPath(String path) { + URL resource = PathUtil.class.getResource(path); + try { + assert resource != null; + URI uri = resource.toURI(); + return Paths.get(uri).toString(); + } catch (NullPointerException | URISyntaxException e) { + throw DataXException.asDataXException("path error,please check whether the path is correct"); + } + + } +} diff --git a/datax-example/datax-example-core/src/main/resources/example/conf/core.json b/datax-example/datax-example-core/src/main/resources/example/conf/core.json new file mode 100755 index 00000000..33281ac0 --- /dev/null +++ b/datax-example/datax-example-core/src/main/resources/example/conf/core.json @@ -0,0 +1,60 @@ +{ + "entry": { + "jvm": "-Xms1G -Xmx1G", + "environment": {} + }, + "common": { + "column": { + "datetimeFormat": "yyyy-MM-dd HH:mm:ss", + "timeFormat": "HH:mm:ss", + "dateFormat": "yyyy-MM-dd", + "extraFormats":["yyyyMMdd"], + "timeZone": "GMT+8", + "encoding": "utf-8" + } + }, + "core": { + "dataXServer": { + "address": "http://localhost:7001/api", + "timeout": 10000, + "reportDataxLog": false, + "reportPerfLog": false + }, + "transport": { + "channel": { + "class": "com.alibaba.datax.core.transport.channel.memory.MemoryChannel", + "speed": { + "byte": -1, + "record": -1 + }, + "flowControlInterval": 20, + "capacity": 512, + "byteCapacity": 67108864 + }, + "exchanger": { + "class": "com.alibaba.datax.core.plugin.BufferedRecordExchanger", + "bufferSize": 32 + } + }, + "container": { + "job": { + "reportInterval": 10000 + }, + "taskGroup": { + "channel": 5 + }, + "trace": { + "enable": "false" + } + + }, + "statistics": { + "collector": { + "plugin": { + "taskClass": "com.alibaba.datax.core.statistics.plugin.task.StdoutPluginCollector", + "maxDirtyNumber": 10 + } + } + } + } +} diff --git a/datax-example/datax-example-core/src/test/java/com/alibaba/datax/example/util/PathUtilTest.java b/datax-example/datax-example-core/src/test/java/com/alibaba/datax/example/util/PathUtilTest.java new file mode 100644 index 00000000..8985b54c --- /dev/null +++ b/datax-example/datax-example-core/src/test/java/com/alibaba/datax/example/util/PathUtilTest.java @@ -0,0 +1,19 @@ +package com.alibaba.datax.example.util; + +import org.junit.Assert; +import org.junit.Test; + +/** + * {@code Author} FuYouJ + * {@code Date} 2023/8/19 21:38 + */ + +public class PathUtilTest { + + @Test + public void testParseClassPathFile() { + String path = "/pathTest.json"; + String absolutePathFromClassPath = PathUtil.getAbsolutePathFromClassPath(path); + Assert.assertNotNull(absolutePathFromClassPath); + } +} diff --git a/datax-example/datax-example-core/src/test/resources/pathTest.json b/datax-example/datax-example-core/src/test/resources/pathTest.json new file mode 100644 index 00000000..9e26dfee --- /dev/null +++ 
b/datax-example/datax-example-core/src/test/resources/pathTest.json @@ -0,0 +1 @@ +{} \ No newline at end of file diff --git a/datax-example/datax-example-neo4j/pom.xml b/datax-example/datax-example-neo4j/pom.xml new file mode 100644 index 00000000..303b14a8 --- /dev/null +++ b/datax-example/datax-example-neo4j/pom.xml @@ -0,0 +1,43 @@ + + + 4.0.0 + + com.alibaba.datax + datax-example + 0.0.1-SNAPSHOT + + + datax-example-neo4j + + + 8 + 8 + UTF-8 + 1.17.6 + 4.4.9 + + + + com.alibaba.datax + datax-example-core + 0.0.1-SNAPSHOT + + + org.testcontainers + testcontainers + ${test.container.version} + + + com.alibaba.datax + neo4jwriter + 0.0.1-SNAPSHOT + + + com.alibaba.datax + datax-example-streamreader + 0.0.1-SNAPSHOT + + + \ No newline at end of file diff --git a/datax-example/datax-example-neo4j/src/test/java/com/alibaba/datax/example/neo4j/StreamReader2Neo4jWriterTest.java b/datax-example/datax-example-neo4j/src/test/java/com/alibaba/datax/example/neo4j/StreamReader2Neo4jWriterTest.java new file mode 100644 index 00000000..9cf01253 --- /dev/null +++ b/datax-example/datax-example-neo4j/src/test/java/com/alibaba/datax/example/neo4j/StreamReader2Neo4jWriterTest.java @@ -0,0 +1,138 @@ +package com.alibaba.datax.example.neo4j; + +import com.alibaba.datax.example.ExampleContainer; +import com.alibaba.datax.example.util.PathUtil; +import org.junit.After; +import org.junit.Assert; +import org.junit.Before; +import org.junit.Test; +import org.neo4j.driver.*; +import org.neo4j.driver.types.Node; +import org.slf4j.Logger; +import org.slf4j.LoggerFactory; +import org.testcontainers.containers.GenericContainer; +import org.testcontainers.containers.Network; +import org.testcontainers.containers.output.Slf4jLogConsumer; +import org.testcontainers.lifecycle.Startables; +import org.testcontainers.shaded.org.awaitility.Awaitility; +import org.testcontainers.utility.DockerImageName; +import org.testcontainers.utility.DockerLoggerFactory; + +import java.net.URI; +import java.util.Arrays; +import java.util.concurrent.TimeUnit; +import java.util.stream.Stream; + +/** + * {@code Author} FuYouJ + * {@code Date} 2023/8/19 21:48 + */ + +public class StreamReader2Neo4jWriterTest { + private static final Logger LOGGER = LoggerFactory.getLogger(StreamReader2Neo4jWriterTest.class); + private static final String CONTAINER_IMAGE = "neo4j:5.9.0"; + + private static final String CONTAINER_HOST = "neo4j-host"; + private static final int HTTP_PORT = 7474; + private static final int BOLT_PORT = 7687; + private static final String CONTAINER_NEO4J_USERNAME = "neo4j"; + private static final String CONTAINER_NEO4J_PASSWORD = "Test@12343"; + private static final URI CONTAINER_URI = URI.create("neo4j://localhost:" + BOLT_PORT); + + protected static final Network NETWORK = Network.newNetwork(); + + private GenericContainer container; + protected Driver neo4jDriver; + protected Session neo4jSession; + private static final int CHANNEL = 5; + private static final int READER_NUM = 10; + + @Before + public void init() { + DockerImageName imageName = DockerImageName.parse(CONTAINER_IMAGE); + container = + new GenericContainer<>(imageName) + .withNetwork(NETWORK) + .withNetworkAliases(CONTAINER_HOST) + .withExposedPorts(HTTP_PORT, BOLT_PORT) + .withEnv( + "NEO4J_AUTH", + CONTAINER_NEO4J_USERNAME + "/" + CONTAINER_NEO4J_PASSWORD) + .withEnv("apoc.export.file.enabled", "true") + .withEnv("apoc.import.file.enabled", "true") + .withEnv("apoc.import.file.use_neo4j_config", "true") + .withEnv("NEO4J_PLUGINS", "[\"apoc\"]") + .withLogConsumer( 
+ new Slf4jLogConsumer( + DockerLoggerFactory.getLogger(CONTAINER_IMAGE))); + container.setPortBindings( + Arrays.asList( + String.format("%s:%s", HTTP_PORT, HTTP_PORT), + String.format("%s:%s", BOLT_PORT, BOLT_PORT))); + Startables.deepStart(Stream.of(container)).join(); + LOGGER.info("container started"); + Awaitility.given() + .ignoreExceptions() + .await() + .atMost(30, TimeUnit.SECONDS) + .untilAsserted(this::initConnection); + } + + //在neo4jWriter模块使用Example测试整个job,方便发现整个流程的代码问题 + @Test + public void streamReader2Neo4j() { + + deleteHistoryIfExist(); + + String path = "/streamreader2neo4j.json"; + String jobPath = PathUtil.getAbsolutePathFromClassPath(path); + + ExampleContainer.start(jobPath); + + //根据channel和reader的mock数据,校验结果集是否符合预期 + verifyWriteResult(); + } + + private void deleteHistoryIfExist() { + String query = "match (n:StreamReader) return n limit 1"; + String delete = "match (n:StreamReader) delete n"; + if (neo4jSession.run(query).hasNext()) { + neo4jSession.run(delete); + } + } + + private void verifyWriteResult() { + int total = CHANNEL * READER_NUM; + String query = "match (n:StreamReader) return n"; + Result run = neo4jSession.run(query); + int count = 0; + while (run.hasNext()) { + Record record = run.next(); + Node node = record.get("n").asNode(); + if (node.hasLabel("StreamReader")) { + count++; + } + } + Assert.assertEquals(count, total); + } + @After + public void destroy() { + if (neo4jSession != null) { + neo4jSession.close(); + } + if (neo4jDriver != null) { + neo4jDriver.close(); + } + if (container != null) { + container.close(); + } + } + + private void initConnection() { + neo4jDriver = + GraphDatabase.driver( + CONTAINER_URI, + AuthTokens.basic(CONTAINER_NEO4J_USERNAME, CONTAINER_NEO4J_PASSWORD)); + neo4jSession = neo4jDriver.session(SessionConfig.forDatabase("neo4j")); + } +} diff --git a/datax-example/datax-example-neo4j/src/test/resources/streamreader2neo4j.json b/datax-example/datax-example-neo4j/src/test/resources/streamreader2neo4j.json new file mode 100644 index 00000000..3d543ce3 --- /dev/null +++ b/datax-example/datax-example-neo4j/src/test/resources/streamreader2neo4j.json @@ -0,0 +1,51 @@ +{ + "job": { + "content": [ + { + "reader": { + "name": "streamreader", + "parameter": { + "sliceRecordCount": 10, + "column": [ + { + "type": "string", + "value": "StreamReader" + }, + { + "type": "string", + "value": "1997" + } + ] + } + }, + "writer": { + "name": "neo4jWriter", + "parameter": { + "uri": "bolt://localhost:7687", + "username":"neo4j", + "password":"Test@12343", + "database":"neo4j", + "cypher": "unwind $batch as row CALL apoc.cypher.doIt( 'create (n:`' + row.Label + '`{id:$id})' ,{id: row.id} ) YIELD value RETURN 1 ", + "batchDataVariableName": "batch", + "batchSize": "3", + "properties": [ + { + "name": "Label", + "type": "string" + }, + { + "name": "id", + "type": "STRING" + } + ] + } + } + } + ], + "setting": { + "speed": { + "channel": 5 + } + } + } +} \ No newline at end of file diff --git a/datax-example/datax-example-streamreader/pom.xml b/datax-example/datax-example-streamreader/pom.xml new file mode 100644 index 00000000..ea70de10 --- /dev/null +++ b/datax-example/datax-example-streamreader/pom.xml @@ -0,0 +1,37 @@ + + + 4.0.0 + + com.alibaba.datax + datax-example + 0.0.1-SNAPSHOT + + + datax-example-streamreader + + + 8 + 8 + UTF-8 + + + + com.alibaba.datax + datax-example-core + 0.0.1-SNAPSHOT + + + com.alibaba.datax + streamreader + 0.0.1-SNAPSHOT + + + com.alibaba.datax + streamwriter + 0.0.1-SNAPSHOT + + + + \ No newline at 
end of file diff --git a/datax-example/datax-example-streamreader/src/test/java/com/alibaba/datax/example/streamreader/StreamReader2StreamWriterTest.java b/datax-example/datax-example-streamreader/src/test/java/com/alibaba/datax/example/streamreader/StreamReader2StreamWriterTest.java new file mode 100644 index 00000000..71d083d0 --- /dev/null +++ b/datax-example/datax-example-streamreader/src/test/java/com/alibaba/datax/example/streamreader/StreamReader2StreamWriterTest.java @@ -0,0 +1,19 @@ +package com.alibaba.datax.example.streamreader; + +import com.alibaba.datax.example.ExampleContainer; +import com.alibaba.datax.example.util.PathUtil; +import org.junit.Test; + +/** + * {@code Author} FuYouJ + * {@code Date} 2023/8/14 20:16 + */ + +public class StreamReader2StreamWriterTest { + @Test + public void testStreamReader2StreamWriter() { + String path = "/stream2stream.json"; + String jobPath = PathUtil.getAbsolutePathFromClassPath(path); + ExampleContainer.start(jobPath); + } +} diff --git a/datax-example/datax-example-streamreader/src/test/resources/stream2stream.json b/datax-example/datax-example-streamreader/src/test/resources/stream2stream.json new file mode 100644 index 00000000..b2a57395 --- /dev/null +++ b/datax-example/datax-example-streamreader/src/test/resources/stream2stream.json @@ -0,0 +1,36 @@ +{ + "job": { + "content": [ + { + "reader": { + "name": "streamreader", + "parameter": { + "sliceRecordCount": 10, + "column": [ + { + "type": "long", + "value": "10" + }, + { + "type": "string", + "value": "hello,你好,世界-DataX" + } + ] + } + }, + "writer": { + "name": "streamwriter", + "parameter": { + "encoding": "UTF-8", + "print": true + } + } + } + ], + "setting": { + "speed": { + "channel": 5 + } + } + } +} \ No newline at end of file diff --git a/datax-example/doc/README.md b/datax-example/doc/README.md new file mode 100644 index 00000000..15f77e87 --- /dev/null +++ b/datax-example/doc/README.md @@ -0,0 +1,107 @@ +## [DataX-Example]调试datax插件的模块 + +### 为什么要开发这个模块 + +一般使用DataX启动数据同步任务是从datax.py 脚本开始,获取程序datax包目录设置到系统变量datax.home里,此后系统核心插件的加载,配置初始化均依赖于变量datax.home,这带来了一些麻烦,以一次本地 DeBug streamreader 插件为例。 + +- maven 打包 datax 生成 datax 目录 +- 在 IDE 中 设置系统环境变量 datax.home,或者在Engine启动类中硬编码设置datax.home。 +- 修改插件 streamreader 代码 +- 再次 maven 打包,使JarLoader 能够加载到最新的 streamreader 代码。 +- 调试代码 + +在以上步骤中,打包完全不必要且最耗时,等待打包也最煎熬。 + +所以我编写一个新的模块(datax-example),此模块特用于本地调试和复现 BUG。如果模块顺利编写完成,那么以上流程将被简化至两步。 + +- 修改插件 streamreader 代码。 +- 调试代码 + +img + +### 目录结构 +该目录结构演示了如何使用datax-example-core编写测试用例,和校验代码流程。 +img + +### 实现原理 + +- 不修改原有的ConfigParer,使用新的ExampleConfigParser,仅用于example模块。他不依赖datax.home,而是依赖ide编译后的target目录 +- 将ide的target目录作为每个插件的目录类加载目录。 + +![img](img/img02.png) + +### 如何使用 +1.修改插件的pom文件,做如下改动。以streamreader为例。
+改动前 +```xml + + + + + maven-compiler-plugin + + ${jdk-version} + ${jdk-version} + ${project-sourceEncoding} + + + + +``` +改动后 +```xml + + + + + src/main/resources + + **/*.* + + true + + + + + + maven-compiler-plugin + + ${jdk-version} + ${jdk-version} + ${project-sourceEncoding} + + + + +``` +#### 在测试模块模块使用 +参考datax-example/datax-example-streamreader的StreamReader2StreamWriterTest.java +```java +public class StreamReader2StreamWriterTest { + @Test + public void testStreamReader2StreamWriter() { + String path = "/stream2stream.json"; + String jobPath = PathUtil.getAbsolutePathFromClassPath(path); + ExampleContainer.start(jobPath); + } +} + +``` +参考datax-example/datax-example-neo4j的StreamReader2Neo4jWriterTest +```java +public class StreamReader2Neo4jWriterTest{ +@Test + public void streamReader2Neo4j() { + + deleteHistoryIfExist(); + + String path = "/streamreader2neo4j.json"; + String jobPath = PathUtil.getAbsolutePathFromClassPath(path); + + ExampleContainer.start(jobPath); + + //根据channel和reader的mock数据,校验结果集是否符合预期 + verifyWriteResult(); + } +} +``` \ No newline at end of file diff --git a/datax-example/doc/img/img01.png b/datax-example/doc/img/img01.png new file mode 100644 index 00000000..d0431c1a Binary files /dev/null and b/datax-example/doc/img/img01.png differ diff --git a/datax-example/doc/img/img02.png b/datax-example/doc/img/img02.png new file mode 100644 index 00000000..eec860d4 Binary files /dev/null and b/datax-example/doc/img/img02.png differ diff --git a/datax-example/doc/img/img03.png b/datax-example/doc/img/img03.png new file mode 100644 index 00000000..731f81bd Binary files /dev/null and b/datax-example/doc/img/img03.png differ diff --git a/datax-example/pom.xml b/datax-example/pom.xml new file mode 100644 index 00000000..9c4c9200 --- /dev/null +++ b/datax-example/pom.xml @@ -0,0 +1,68 @@ + + + 4.0.0 + + com.alibaba.datax + datax-all + 0.0.1-SNAPSHOT + + + datax-example + pom + + datax-example-core + datax-example-streamreader + datax-example-neo4j + + + + 8 + 8 + UTF-8 + 4.13.2 + + + + com.alibaba.datax + datax-common + 0.0.1-SNAPSHOT + + + com.alibaba.datax + datax-core + 0.0.1-SNAPSHOT + + + junit + junit + ${junit4.version} + test + + + + + + + src/main/resources + + **/*.* + + true + + + + + + maven-compiler-plugin + + ${jdk-version} + ${jdk-version} + ${project-sourceEncoding} + + + + + + \ No newline at end of file diff --git a/dataxPluginDev.md b/dataxPluginDev.md index 4483f270..8c7241bf 100644 --- a/dataxPluginDev.md +++ b/dataxPluginDev.md @@ -447,6 +447,9 @@ DataX的内部类型在实现上会选用不同的java类型: 3. 用户在插件中在`reader`/`writer`配置的`name`字段指定插件名字。框架根据插件的类型(`reader`/`writer`)和插件名称去插件的路径下扫描所有的jar,加入`classpath`。 4. 根据插件配置中定义的入口类,框架通过反射实例化对应的`Job`和`Task`对象。 +### 编写测试用例 +1. 
在datax-example工程下新建新的插件测试模块,调用`ExampleContainer.start(jobPath)`方法来检测你的代码逻辑是否正确。[datax-example使用](https://github.com/alibaba/DataX/blob/master/datax-example/doc/README.md) + ## 三、Last but not Least diff --git a/dorisreader/doc/dorisreader.md b/dorisreader/doc/dorisreader.md new file mode 100644 index 00000000..c249c178 --- /dev/null +++ b/dorisreader/doc/dorisreader.md @@ -0,0 +1,224 @@ +# DorisReader 插件文档 + +___ + +## 1 快速介绍 + +DorisReader插件实现了从Doris读取数据。在底层实现上,DorisReader通过JDBC连接远程Doris数据库,并执行相应的sql语句将数据从doris库中SELECT出来。 + +## 2 实现原理 + +简而言之,DorisReader通过JDBC连接器连接到远程的Doris数据库,并根据用户配置的信息生成查询SELECT +SQL语句,然后发送到远程Doris数据库,并将该SQL执行返回结果使用DataX自定义的数据类型拼装为抽象的数据集,并传递给下游Writer处理。 + +对于用户配置Table、Column、Where的信息,DorisReader将其拼接为SQL语句发送到Doris数据库;对于用户配置querySql信息,DorisReader直接将其发送到Doris数据库。 + +## 3 功能说明 + +### 3.1 配置样例 + +* 配置一个从Doris数据库同步抽取数据到本地的作业: + +``` +{ + "job": { + "setting": { + "speed": { + "channel": 3 + }, + "errorLimit": { + "record": 0, + "percentage": 0.02 + } + }, + "content": [ + { + "reader": { + "name": "dorisreader", + "parameter": { + "username": "root", + "password": "root", + "column": [ + "id", + "name" + ], + "splitPk": "db_id", + "connection": [ + { + "table": [ + "table" + ], + "jdbcUrl": [ + "jdbc:Doris://127.0.0.1:9030/database" + ] + } + ] + } + }, + "writer": { + "name": "streamwriter", + "parameter": { + "print":true + } + } + } + ] + } +} + +``` + +* 配置一个自定义SQL的数据库同步任务到本地内容的作业: + +``` +{ + "job": { + "setting": { + "speed": { + "channel":1 + } + }, + "content": [ + { + "reader": { + "name": "dorisreader", + "parameter": { + "username": "root", + "password": "root", + "connection": [ + { + "querySql": [ + "select db_id,on_line_flag from db_info where db_id < 10;", + "select db_id,on_line_flag from db_info where db_id >= 10;" + + ], + "jdbcUrl": [ + "jdbc:Doris://127.0.0.1:9030/database" + ] + } + ] + } + }, + "writer": { + "name": "streamwriter", + "parameter": { + "print": false, + "encoding": "UTF-8" + } + } + } + ] + } +} +``` + +### 3.2 参数说明 + +* **jdbcUrl** + + * + 描述:描述的是到对端数据库的JDBC连接信息,使用JSON的数组描述,并支持一个库填写多个连接地址。之所以使用JSON数组描述连接信息,是因为阿里集团内部支持多个IP探测,如果配置了多个,DorisReader可以依次探测ip的可连接性,直到选择一个合法的IP。如果全部连接失败,DorisReader报错。 + 注意,jdbcUrl必须包含在connection配置单元中。对于阿里集团外部使用情况,JSON数组填写一个JDBC连接即可。 + + * 必选:是
+ + * 默认值:无
+ +* **username** + + * 描述:数据源的用户名
+ + * 必选:是
+ + * 默认值:无
+ +* **password** + + * 描述:数据源指定用户名的密码
+ + * 必选:是
+ + * 默认值:无
+ +* **table** + + * + 描述:所选取的需要同步的表。使用JSON的数组描述,因此支持多张表同时抽取。当配置为多张表时,用户自己需保证多张表是同一schema结构,DorisReader不予检查表是否同一逻辑表。注意,table必须包含在connection配置单元中。
+ + * 必选:是
+ + * 默认值:无
+ +* **column** + + * 描述:所配置的表中需要同步的列名集合,使用JSON的数组描述字段信息。用户使用\*代表默认使用所有列配置,例如['\*']。 + + 支持列裁剪,即列可以挑选部分列进行导出。 + + 支持列换序,即列可以不按照表schema信息进行导出。 + + 支持常量配置,用户需要按照Doris SQL语法格式: + ["id", "\`table\`", "1", "'bazhen.csy'", "null", "to_char(a + 1)", "2.3" , "true"] + id为普通列名,\`table\`为包含保留字的列名,1为整形数字常量,'bazhen.csy'为字符串常量,null为空指针,to_char(a + 1)为表达式,2.3为浮点数,true为布尔值。 + + * 必选:是
+ + * 默认值:无
+ +* **splitPk** + + * 描述:DorisReader进行数据抽取时,如果指定splitPk,表示用户希望使用splitPk代表的字段进行数据分片,DataX因此会启动并发任务进行数据同步,这样可以大大提高数据同步的效能(切分思路可参考本节末尾的示意代码)。 + + 推荐splitPk用户使用表主键,因为表主键通常情况下比较均匀,因此切分出来的分片也不容易出现数据热点。 + + 目前splitPk仅支持整形数据切分,`不支持浮点、字符串、日期等其他类型`。如果用户指定其他非支持类型,DorisReader将报错! + + 如果splitPk不填写,包括不提供splitPk或者splitPk值为空,DataX视作使用单通道同步该表数据。 + + * 必选:否
+ + * 默认值:空
+ +* **where** + + * 描述:筛选条件,DorisReader根据指定的column、table、where条件拼接SQL,并根据这个SQL进行数据抽取。在实际业务场景中,往往会选择当天的数据进行同步,可以将where条件指定为gmt_create > + $bizdate 。注意:不可以将where条件指定为limit 10,limit不是SQL的合法where子句。
+ + where条件可以有效地进行业务增量同步。如果不填写where语句,包括不提供where的key或者value,DataX均视作同步全量数据。 + + * 必选:否
+ + * 默认值:无
+ +* **querySql** + + * + 描述:在有些业务场景下,where这一配置项不足以描述所筛选的条件,用户可以通过该配置项来自定义筛选SQL。当用户配置了这一项之后,DataX系统就会忽略table,column这些配置项,直接使用这个配置项的内容对数据进行筛选,例如需要进行多表join后同步数据,使用select + a,b from table_a join table_b on table_a.id = table_b.id
+ + `当用户配置querySql时,DorisReader直接忽略table、column、where条件的配置`,querySql优先级大于table、column、where选项。 + + * 必选:否
+ + * 默认值:无
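下面用一段独立的 Java 示意说明整型 splitPk 的切分思路(简化模型,并非 DataX/DorisReader 的实际切分实现,函数名与取值均为演示假设):按 splitPk 的最小值和最大值把区间均分为若干段,每段对应一个可并发执行的查询条件。实际同步中每个分片对应一个并发 Task,分片越均匀,越不容易出现数据热点。

```java
import java.util.ArrayList;
import java.util.List;

public class SplitPkDemo {
    // 把 [min, max] 均分为 sliceCount 段,返回每段对应的 where 片段
    static List<String> split(String splitPk, long min, long max, int sliceCount) {
        List<String> conditions = new ArrayList<>();
        long step = Math.max(1, (max - min + 1) / sliceCount);
        for (long start = min; start <= max; start += step) {
            long end = Math.min(max, start + step - 1);
            conditions.add(String.format("%s >= %d and %s <= %d", splitPk, start, splitPk, end));
        }
        return conditions;
    }

    public static void main(String[] args) {
        // 例如 db_id 取值范围为 [1, 100]、希望切成 4 片时,可得到 4 个可并发执行的查询条件
        split("db_id", 1, 100, 4).forEach(System.out::println);
    }
}
```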
+ +### 3.3 类型转换 + +目前DorisReader支持大部分Doris类型,但也存在部分个别类型没有支持的情况,请注意检查你的类型。 + +下面列出DorisReaderr针对Doris类型转换列表: + +| DataX 内部类型| doris 数据类型 | +| -------- |-------------------------------------------------------| +| Long | int, tinyint, smallint, int, bigint,Largint | +| Double | float, double, decimal | +| String | varchar, char, text, string, map, json, array, struct | +| Date | date, datetime | +| Boolean | Boolean | + +请注意: + +* `tinyint(1) DataX视作为整形`。 + + + diff --git a/dorisreader/pom.xml b/dorisreader/pom.xml new file mode 100755 index 00000000..15a025b6 --- /dev/null +++ b/dorisreader/pom.xml @@ -0,0 +1,81 @@ + + + 4.0.0 + + com.alibaba.datax + datax-all + 0.0.1-SNAPSHOT + + dorisreader + dorisreader + jar + + + + com.alibaba.datax + datax-common + ${datax-project-version} + + + slf4j-log4j12 + org.slf4j + + + + + org.slf4j + slf4j-api + + + ch.qos.logback + logback-classic + + + + com.alibaba.datax + plugin-rdbms-util + ${datax-project-version} + + + mysql + mysql-connector-java + ${mysql.driver.version} + + + + + + + + + + maven-compiler-plugin + + ${jdk-version} + ${jdk-version} + ${project-sourceEncoding} + + + + + maven-assembly-plugin + + + src/main/assembly/package.xml + + datax + + + + dwzip + package + + single + + + + + + + diff --git a/dorisreader/src/main/assembly/package.xml b/dorisreader/src/main/assembly/package.xml new file mode 100755 index 00000000..724613f9 --- /dev/null +++ b/dorisreader/src/main/assembly/package.xml @@ -0,0 +1,35 @@ + + + + dir + + false + + + src/main/resources + + plugin.json + plugin_job_template.json + + plugin/reader/dorisreader + + + target/ + + dorisreader-0.0.1-SNAPSHOT.jar + + plugin/reader/dorisreader + + + + + + false + plugin/reader/dorisreader/libs + runtime + + + diff --git a/dorisreader/src/main/java/com/alibaba/datax/plugin/reader/dorisreader/DorisReader.java b/dorisreader/src/main/java/com/alibaba/datax/plugin/reader/dorisreader/DorisReader.java new file mode 100755 index 00000000..56a44316 --- /dev/null +++ b/dorisreader/src/main/java/com/alibaba/datax/plugin/reader/dorisreader/DorisReader.java @@ -0,0 +1,94 @@ +package com.alibaba.datax.plugin.reader.dorisreader; + +import com.alibaba.datax.common.plugin.RecordSender; +import com.alibaba.datax.common.spi.Reader; +import com.alibaba.datax.common.util.Configuration; +import com.alibaba.datax.plugin.rdbms.reader.CommonRdbmsReader; +import com.alibaba.datax.plugin.rdbms.reader.Constant; +import com.alibaba.datax.plugin.rdbms.util.DataBaseType; +import org.slf4j.Logger; +import org.slf4j.LoggerFactory; + +import java.util.List; + +public class DorisReader extends Reader { + + private static final DataBaseType DATABASE_TYPE = DataBaseType.Doris; + + public static class Job extends Reader.Job { + private static final Logger LOG = LoggerFactory + .getLogger(Job.class); + + private Configuration originalConfig = null; + private CommonRdbmsReader.Job commonRdbmsReaderJob; + + @Override + public void init() { + this.originalConfig = super.getPluginJobConf(); + + Integer fetchSize = this.originalConfig.getInt(Constant.FETCH_SIZE,Integer.MIN_VALUE); + this.originalConfig.set(Constant.FETCH_SIZE, fetchSize); + + this.commonRdbmsReaderJob = new CommonRdbmsReader.Job(DATABASE_TYPE); + this.commonRdbmsReaderJob.init(this.originalConfig); + } + + @Override + public void preCheck(){ + init(); + this.commonRdbmsReaderJob.preCheck(this.originalConfig,DATABASE_TYPE); + + } + + @Override + public List split(int adviceNumber) { + return this.commonRdbmsReaderJob.split(this.originalConfig, adviceNumber); 
+ + } + + @Override + public void post() { + this.commonRdbmsReaderJob.post(this.originalConfig); + } + + @Override + public void destroy() { + this.commonRdbmsReaderJob.destroy(this.originalConfig); + } + + } + + public static class Task extends Reader.Task { + + private Configuration readerSliceConfig; + private CommonRdbmsReader.Task commonRdbmsReaderTask; + + @Override + public void init() { + this.readerSliceConfig = super.getPluginJobConf(); + this.commonRdbmsReaderTask = new CommonRdbmsReader.Task(DATABASE_TYPE,super.getTaskGroupId(), super.getTaskId()); + this.commonRdbmsReaderTask.init(this.readerSliceConfig); + + } + + @Override + public void startRead(RecordSender recordSender) { + int fetchSize = this.readerSliceConfig.getInt(Constant.FETCH_SIZE); + + this.commonRdbmsReaderTask.startRead(this.readerSliceConfig, recordSender, + super.getTaskPluginCollector(), fetchSize); + } + + @Override + public void post() { + this.commonRdbmsReaderTask.post(this.readerSliceConfig); + } + + @Override + public void destroy() { + this.commonRdbmsReaderTask.destroy(this.readerSliceConfig); + } + + } + +} diff --git a/elasticsearchwriter/src/main/java/com/alibaba/datax/plugin/writer/elasticsearchwriter/ESWriterErrorCode.java b/dorisreader/src/main/java/com/alibaba/datax/plugin/reader/dorisreader/DorisReaderErrorCode.java old mode 100644 new mode 100755 similarity index 50% rename from elasticsearchwriter/src/main/java/com/alibaba/datax/plugin/writer/elasticsearchwriter/ESWriterErrorCode.java rename to dorisreader/src/main/java/com/alibaba/datax/plugin/reader/dorisreader/DorisReaderErrorCode.java index 59dcbd0a..f9a8c449 --- a/elasticsearchwriter/src/main/java/com/alibaba/datax/plugin/writer/elasticsearchwriter/ESWriterErrorCode.java +++ b/dorisreader/src/main/java/com/alibaba/datax/plugin/reader/dorisreader/DorisReaderErrorCode.java @@ -1,20 +1,14 @@ -package com.alibaba.datax.plugin.writer.elasticsearchwriter; +package com.alibaba.datax.plugin.reader.dorisreader; import com.alibaba.datax.common.spi.ErrorCode; -public enum ESWriterErrorCode implements ErrorCode { - BAD_CONFIG_VALUE("ESWriter-00", "您配置的值不合法."), - ES_INDEX_DELETE("ESWriter-01", "删除index错误."), - ES_INDEX_CREATE("ESWriter-02", "创建index错误."), - ES_MAPPINGS("ESWriter-03", "mappings错误."), - ES_INDEX_INSERT("ESWriter-04", "插入数据错误."), - ES_ALIAS_MODIFY("ESWriter-05", "别名修改错误."), +public enum DorisReaderErrorCode implements ErrorCode { ; private final String code; private final String description; - ESWriterErrorCode(String code, String description) { + private DorisReaderErrorCode(String code, String description) { this.code = code; this.description = description; } @@ -34,4 +28,4 @@ public enum ESWriterErrorCode implements ErrorCode { return String.format("Code:[%s], Description:[%s]. ", this.code, this.description); } -} \ No newline at end of file +} diff --git a/dorisreader/src/main/resources/plugin.json b/dorisreader/src/main/resources/plugin.json new file mode 100755 index 00000000..981d1af8 --- /dev/null +++ b/dorisreader/src/main/resources/plugin.json @@ -0,0 +1,6 @@ +{ + "name": "dorisreader", + "class": "com.alibaba.datax.plugin.reader.dorisreader.DorisReader", + "description": "useScene: prod. mechanism: Jdbc connection using the database, execute select sql, retrieve data from the ResultSet. 
warn: The more you know about the database, the less problems you encounter.", + "developer": "alibaba" +} \ No newline at end of file diff --git a/dorisreader/src/main/resources/plugin_job_template.json b/dorisreader/src/main/resources/plugin_job_template.json new file mode 100644 index 00000000..2e3d9fa8 --- /dev/null +++ b/dorisreader/src/main/resources/plugin_job_template.json @@ -0,0 +1,15 @@ +{ + "name": "dorisreader", + "parameter": { + "username": "", + "password": "", + "column": [], + "connection": [ + { + "jdbcUrl": [], + "table": [] + } + ], + "where": "" + } +} \ No newline at end of file diff --git a/doriswriter/doc/doriswriter.md b/doriswriter/doc/doriswriter.md new file mode 100644 index 00000000..58a688b8 --- /dev/null +++ b/doriswriter/doc/doriswriter.md @@ -0,0 +1,181 @@ +# DorisWriter 插件文档 + +## 1 快速介绍 +DorisWriter支持将大批量数据写入Doris中。 + +## 2 实现原理 +DorisWriter 通过Doris原生支持Stream load方式导入数据, DorisWriter会将`reader`读取的数据进行缓存在内存中,拼接成Json文本,然后批量导入至Doris。 + +## 3 功能说明 + +### 3.1 配置样例 + +这里是一份从Stream读取数据后导入至Doris的配置文件。 + +``` +{ + "job": { + "content": [ + { + "reader": { + "name": "mysqlreader", + "parameter": { + "column": ["emp_no", "birth_date", "first_name","last_name","gender","hire_date"], + "connection": [ + { + "jdbcUrl": ["jdbc:mysql://localhost:3306/demo"], + "table": ["employees_1"] + } + ], + "username": "root", + "password": "xxxxx", + "where": "" + } + }, + "writer": { + "name": "doriswriter", + "parameter": { + "loadUrl": ["172.16.0.13:8030"], + "loadProps": { + }, + "column": ["emp_no", "birth_date", "first_name","last_name","gender","hire_date"], + "username": "root", + "password": "xxxxxx", + "postSql": ["select count(1) from all_employees_info"], + "preSql": [], + "flushInterval":30000, + "connection": [ + { + "jdbcUrl": "jdbc:mysql://172.16.0.13:9030/demo", + "selectedDatabase": "demo", + "table": ["all_employees_info"] + } + ], + "loadProps": { + "format": "json", + "strip_outer_array": true + } + } + } + } + ], + "setting": { + "speed": { + "channel": "1" + } + } + } +} +``` + +### 3.2 参数说明 + +* **jdbcUrl** + + - 描述:Doris 的 JDBC 连接串,用户执行 preSql 或 postSQL。 + - 必选:是 + - 默认值:无 + +* **loadUrl** + + - 描述:作为 Stream Load 的连接目标。格式为 "ip:port"。其中 IP 是 FE 节点 IP,port 是 FE 节点的 http_port。可以填写多个,多个之间使用英文状态的分号隔开:`;`,doriswriter 将以轮询的方式访问。 + - 必选:是 + - 默认值:无 + +* **username** + + - 描述:访问Doris数据库的用户名 + - 必选:是 + - 默认值:无 + +* **password** + + - 描述:访问Doris数据库的密码 + - 必选:否 + - 默认值:空 + +* **connection.selectedDatabase** + - 描述:需要写入的Doris数据库名称。 + - 必选:是 + - 默认值:无 + +* **connection.table** + - 描述:需要写入的Doris表名称。 + - 必选:是 + - 默认值:无 + +* **column** + + - 描述:目的表**需要写入数据**的字段,这些字段将作为生成的 Json 数据的字段名。字段之间用英文逗号分隔。例如: "column": ["id","name","age"]。 + - 必选:是 + - 默认值:否 + +* **preSql** + + - 描述:写入数据到目的表前,会先执行这里的标准语句。 + - 必选:否 + - 默认值:无 + +* **postSql** + + - 描述:写入数据到目的表后,会执行这里的标准语句。 + - 必选:否 + - 默认值:无 + + +* **maxBatchRows** + + - 描述:每批次导入数据的最大行数。和 **batchSize** 共同控制每批次的导入数量。每批次数据达到两个阈值之一,即开始导入这一批次的数据。 + - 必选:否 + - 默认值:500000 + +* **batchSize** + + - 描述:每批次导入数据的最大数据量。和 **maxBatchRows** 共同控制每批次的导入数量。每批次数据达到两个阈值之一,即开始导入这一批次的数据。 + - 必选:否 + - 默认值:104857600 + +* **maxRetries** + + - 描述:每批次导入数据失败后的重试次数。 + - 必选:否 + - 默认值:0 + +* **labelPrefix** + + - 描述:每批次导入任务的 label 前缀。最终的 label 将有 `labelPrefix + UUID` 组成全局唯一的 label,确保数据不会重复导入 + - 必选:否 + - 默认值:`datax_doris_writer_` + +* **loadProps** + + - 描述:StreamLoad 的请求参数,详情参照StreamLoad介绍页面。[Stream load - Apache Doris](https://doris.apache.org/zh-CN/docs/data-operate/import/import-way/stream-load-manual) + + 
这里包括导入的数据格式:format等,导入数据格式默认我们使用csv,支持JSON,具体可以参照下面类型转换部分,也可以参照上面Stream load 官方信息 + + - 必选:否 + + - 默认值:无 + +### 类型转换 + +默认传入的数据均会被转为字符串,并以`\t`作为列分隔符,`\n`作为行分隔符,组成`csv`文件进行StreamLoad导入操作。 + +默认是csv格式导入,如需更改列分隔符, 则正确配置 `loadProps` 即可: + +```json +"loadProps": { + "column_separator": "\\x01", + "line_delimiter": "\\x02" +} +``` + +如需更改导入格式为`json`, 则正确配置 `loadProps` 即可: +```json +"loadProps": { + "format": "json", + "strip_outer_array": true +} +``` + +更多信息请参照 Doris 官网:[Stream load - Apache Doris](https://doris.apache.org/zh-CN/docs/data-operate/import/import-way/stream-load-manual) \ No newline at end of file diff --git a/doriswriter/doc/mysql2doris.json b/doriswriter/doc/mysql2doris.json new file mode 100644 index 00000000..5810d6db --- /dev/null +++ b/doriswriter/doc/mysql2doris.json @@ -0,0 +1,48 @@ +{ + "job": { + "content": [ + { + "reader": { + "name": "mysqlreader", + "parameter": { + "column": ["k1", "k2", "k3"], + "connection": [ + { + "jdbcUrl": ["jdbc:mysql://192.168.10.10:3306/db1"], + "table": ["t1"] + } + ], + "username": "root", + "password": "", + "where": "" + } + }, + "writer": { + "name": "doriswriter", + "parameter": { + "loadUrl": ["192.168.1.1:8030"], + "loadProps": {}, + "database": "db1", + "column": ["k1", "k2", "k3"], + "username": "root", + "password": "", + "postSql": [], + "preSql": [], + "connection": [ + { + "jdbcUrl":"jdbc:mysql://192.168.1.1:9030/", + "table":["xxx"], + "selectedDatabase":"xxxx" + } + ] + } + } + } + ], + "setting": { + "speed": { + "channel": "1" + } + } + } +} diff --git a/doriswriter/pom.xml b/doriswriter/pom.xml new file mode 100644 index 00000000..aa1e6ff0 --- /dev/null +++ b/doriswriter/pom.xml @@ -0,0 +1,99 @@ + + + + + datax-all + com.alibaba.datax + 0.0.1-SNAPSHOT + + 4.0.0 + doriswriter + doriswriter + jar + + + com.alibaba.datax + datax-common + ${datax-project-version} + + + slf4j-log4j12 + org.slf4j + + + + + org.slf4j + slf4j-api + + + ch.qos.logback + logback-classic + + + com.alibaba.datax + plugin-rdbms-util + ${datax-project-version} + + + mysql + mysql-connector-java + ${mysql.driver.version} + + + org.apache.httpcomponents + httpclient + 4.5.13 + + + + + + + maven-compiler-plugin + + ${jdk-version} + ${jdk-version} + ${project-sourceEncoding} + + + + + maven-assembly-plugin + + + src/main/assembly/package.xml + + datax + + + + dwzip + package + + single + + + + + + + diff --git a/doriswriter/src/main/assembly/package.xml b/doriswriter/src/main/assembly/package.xml new file mode 100644 index 00000000..71596332 --- /dev/null +++ b/doriswriter/src/main/assembly/package.xml @@ -0,0 +1,52 @@ + + + + + + dir + + false + + + src/main/resources + + plugin.json + plugin_job_template.json + + plugin/writer/doriswriter + + + target/ + + doriswriter-0.0.1-SNAPSHOT.jar + + plugin/writer/doriswriter + + + + + false + plugin/writer/doriswriter/libs + runtime + + + diff --git a/doriswriter/src/main/java/com/alibaba/datax/plugin/writer/doriswriter/DelimiterParser.java b/doriswriter/src/main/java/com/alibaba/datax/plugin/writer/doriswriter/DelimiterParser.java new file mode 100644 index 00000000..e84bd7dd --- /dev/null +++ b/doriswriter/src/main/java/com/alibaba/datax/plugin/writer/doriswriter/DelimiterParser.java @@ -0,0 +1,54 @@ +package com.alibaba.datax.plugin.writer.doriswriter; + +import com.google.common.base.Strings; + +import java.io.StringWriter; + +public class DelimiterParser { + + private static final String HEX_STRING = "0123456789ABCDEF"; + + public static String parse(String sp, String dSp) throws RuntimeException { + if ( 
Strings.isNullOrEmpty(sp)) { + return dSp; + } + if (!sp.toUpperCase().startsWith("\\X")) { + return sp; + } + String hexStr = sp.substring(2); + // check hex str + if (hexStr.isEmpty()) { + throw new RuntimeException("Failed to parse delimiter: `Hex str is empty`"); + } + if (hexStr.length() % 2 != 0) { + throw new RuntimeException("Failed to parse delimiter: `Hex str length error`"); + } + for (char hexChar : hexStr.toUpperCase().toCharArray()) { + if (HEX_STRING.indexOf(hexChar) == -1) { + throw new RuntimeException("Failed to parse delimiter: `Hex str format error`"); + } + } + // transform to separator + StringWriter writer = new StringWriter(); + for (byte b : hexStrToBytes(hexStr)) { + writer.append((char) b); + } + return writer.toString(); + } + + private static byte[] hexStrToBytes(String hexStr) { + String upperHexStr = hexStr.toUpperCase(); + int length = upperHexStr.length() / 2; + char[] hexChars = upperHexStr.toCharArray(); + byte[] bytes = new byte[length]; + for (int i = 0; i < length; i++) { + int pos = i * 2; + bytes[i] = (byte) (charToByte(hexChars[pos]) << 4 | charToByte(hexChars[pos + 1])); + } + return bytes; + } + + private static byte charToByte(char c) { + return (byte) HEX_STRING.indexOf(c); + } +} diff --git a/doriswriter/src/main/java/com/alibaba/datax/plugin/writer/doriswriter/DorisBaseCodec.java b/doriswriter/src/main/java/com/alibaba/datax/plugin/writer/doriswriter/DorisBaseCodec.java new file mode 100644 index 00000000..ee7ded56 --- /dev/null +++ b/doriswriter/src/main/java/com/alibaba/datax/plugin/writer/doriswriter/DorisBaseCodec.java @@ -0,0 +1,23 @@ +package com.alibaba.datax.plugin.writer.doriswriter; + +import com.alibaba.datax.common.element.Column; + +public class DorisBaseCodec { + protected String convertionField( Column col) { + if (null == col.getRawData() || Column.Type.NULL == col.getType()) { + return null; + } + if ( Column.Type.BOOL == col.getType()) { + return String.valueOf(col.asLong()); + } + if ( Column.Type.BYTES == col.getType()) { + byte[] bts = (byte[])col.getRawData(); + long value = 0; + for (int i = 0; i < bts.length; i++) { + value += (bts[bts.length - i - 1] & 0xffL) << (8 * i); + } + return String.valueOf(value); + } + return col.asString(); + } +} diff --git a/doriswriter/src/main/java/com/alibaba/datax/plugin/writer/doriswriter/DorisCodec.java b/doriswriter/src/main/java/com/alibaba/datax/plugin/writer/doriswriter/DorisCodec.java new file mode 100644 index 00000000..a2437a1c --- /dev/null +++ b/doriswriter/src/main/java/com/alibaba/datax/plugin/writer/doriswriter/DorisCodec.java @@ -0,0 +1,10 @@ +package com.alibaba.datax.plugin.writer.doriswriter; + +import com.alibaba.datax.common.element.Record; + +import java.io.Serializable; + +public interface DorisCodec extends Serializable { + + String codec( Record row); +} diff --git a/doriswriter/src/main/java/com/alibaba/datax/plugin/writer/doriswriter/DorisCodecFactory.java b/doriswriter/src/main/java/com/alibaba/datax/plugin/writer/doriswriter/DorisCodecFactory.java new file mode 100644 index 00000000..22c4b409 --- /dev/null +++ b/doriswriter/src/main/java/com/alibaba/datax/plugin/writer/doriswriter/DorisCodecFactory.java @@ -0,0 +1,19 @@ +package com.alibaba.datax.plugin.writer.doriswriter; + +import java.util.Map; + +public class DorisCodecFactory { + public DorisCodecFactory (){ + + } + public static DorisCodec createCodec( Keys writerOptions) { + if ( Keys.StreamLoadFormat.CSV.equals(writerOptions.getStreamLoadFormat())) { + Map props = writerOptions.getLoadProps(); + 
return new DorisCsvCodec (null == props || !props.containsKey("column_separator") ? null : String.valueOf(props.get("column_separator"))); + } + if ( Keys.StreamLoadFormat.JSON.equals(writerOptions.getStreamLoadFormat())) { + return new DorisJsonCodec (writerOptions.getColumns()); + } + throw new RuntimeException("Failed to create row serializer, unsupported `format` from stream load properties."); + } +} diff --git a/doriswriter/src/main/java/com/alibaba/datax/plugin/writer/doriswriter/DorisCsvCodec.java b/doriswriter/src/main/java/com/alibaba/datax/plugin/writer/doriswriter/DorisCsvCodec.java new file mode 100644 index 00000000..518aa304 --- /dev/null +++ b/doriswriter/src/main/java/com/alibaba/datax/plugin/writer/doriswriter/DorisCsvCodec.java @@ -0,0 +1,27 @@ +package com.alibaba.datax.plugin.writer.doriswriter; + +import com.alibaba.datax.common.element.Record; + +public class DorisCsvCodec extends DorisBaseCodec implements DorisCodec { + + private static final long serialVersionUID = 1L; + + private final String columnSeparator; + + public DorisCsvCodec ( String sp) { + this.columnSeparator = DelimiterParser.parse(sp, "\t"); + } + + @Override + public String codec( Record row) { + StringBuilder sb = new StringBuilder(); + for (int i = 0; i < row.getColumnNumber(); i++) { + String value = convertionField(row.getColumn(i)); + sb.append(null == value ? "\\N" : value); + if (i < row.getColumnNumber() - 1) { + sb.append(columnSeparator); + } + } + return sb.toString(); + } +} diff --git a/doriswriter/src/main/java/com/alibaba/datax/plugin/writer/doriswriter/DorisJsonCodec.java b/doriswriter/src/main/java/com/alibaba/datax/plugin/writer/doriswriter/DorisJsonCodec.java new file mode 100644 index 00000000..68abd9eb --- /dev/null +++ b/doriswriter/src/main/java/com/alibaba/datax/plugin/writer/doriswriter/DorisJsonCodec.java @@ -0,0 +1,33 @@ +package com.alibaba.datax.plugin.writer.doriswriter; + +import com.alibaba.datax.common.element.Record; +import com.alibaba.fastjson2.JSON; + +import java.util.HashMap; +import java.util.List; +import java.util.Map; + +public class DorisJsonCodec extends DorisBaseCodec implements DorisCodec { + + private static final long serialVersionUID = 1L; + + private final List fieldNames; + + public DorisJsonCodec ( List fieldNames) { + this.fieldNames = fieldNames; + } + + @Override + public String codec( Record row) { + if (null == fieldNames) { + return ""; + } + Map rowMap = new HashMap<> (fieldNames.size()); + int idx = 0; + for (String fieldName : fieldNames) { + rowMap.put(fieldName, convertionField(row.getColumn(idx))); + idx++; + } + return JSON.toJSONString(rowMap); + } +} diff --git a/doriswriter/src/main/java/com/alibaba/datax/plugin/writer/doriswriter/DorisStreamLoadObserver.java b/doriswriter/src/main/java/com/alibaba/datax/plugin/writer/doriswriter/DorisStreamLoadObserver.java new file mode 100644 index 00000000..e1f6e0ee --- /dev/null +++ b/doriswriter/src/main/java/com/alibaba/datax/plugin/writer/doriswriter/DorisStreamLoadObserver.java @@ -0,0 +1,235 @@ +package com.alibaba.datax.plugin.writer.doriswriter; + +import com.alibaba.fastjson2.JSON; +import org.apache.commons.codec.binary.Base64; +import org.apache.http.HttpEntity; +import org.apache.http.HttpHeaders; +import org.apache.http.client.config.RequestConfig; +import org.apache.http.client.methods.CloseableHttpResponse; +import org.apache.http.client.methods.HttpGet; +import org.apache.http.client.methods.HttpPut; +import org.apache.http.entity.ByteArrayEntity; +import 
org.apache.http.impl.client.CloseableHttpClient; +import org.apache.http.impl.client.DefaultRedirectStrategy; +import org.apache.http.impl.client.HttpClientBuilder; +import org.apache.http.impl.client.HttpClients; +import org.apache.http.util.EntityUtils; +import org.slf4j.Logger; +import org.slf4j.LoggerFactory; + +import java.io.IOException; +import java.net.HttpURLConnection; +import java.net.URL; +import java.nio.ByteBuffer; +import java.nio.charset.StandardCharsets; +import java.util.Collections; +import java.util.HashMap; +import java.util.List; +import java.util.Map; +import java.util.concurrent.TimeUnit; +import java.util.stream.Collectors; + +public class DorisStreamLoadObserver { + private static final Logger LOG = LoggerFactory.getLogger(DorisStreamLoadObserver.class); + + private Keys options; + + private long pos; + private static final String RESULT_FAILED = "Fail"; + private static final String RESULT_LABEL_EXISTED = "Label Already Exists"; + private static final String LAEBL_STATE_VISIBLE = "VISIBLE"; + private static final String LAEBL_STATE_COMMITTED = "COMMITTED"; + private static final String RESULT_LABEL_PREPARE = "PREPARE"; + private static final String RESULT_LABEL_ABORTED = "ABORTED"; + private static final String RESULT_LABEL_UNKNOWN = "UNKNOWN"; + + + public DorisStreamLoadObserver ( Keys options){ + this.options = options; + } + + public void streamLoad(WriterTuple data) throws Exception { + String host = getLoadHost(); + if(host == null){ + throw new IOException ("load_url cannot be empty, or the host cannot connect.Please check your configuration."); + } + String loadUrl = new StringBuilder(host) + .append("/api/") + .append(options.getDatabase()) + .append("/") + .append(options.getTable()) + .append("/_stream_load") + .toString(); + LOG.info("Start to join batch data: rows[{}] bytes[{}] label[{}].", data.getRows().size(), data.getBytes(), data.getLabel()); + Map loadResult = put(loadUrl, data.getLabel(), addRows(data.getRows(), data.getBytes().intValue())); + LOG.info("StreamLoad response :{}",JSON.toJSONString(loadResult)); + final String keyStatus = "Status"; + if (null == loadResult || !loadResult.containsKey(keyStatus)) { + throw new IOException("Unable to flush data to Doris: unknown result status."); + } + LOG.debug("StreamLoad response:{}",JSON.toJSONString(loadResult)); + if (RESULT_FAILED.equals(loadResult.get(keyStatus))) { + throw new IOException( + new StringBuilder("Failed to flush data to Doris.\n").append(JSON.toJSONString(loadResult)).toString() + ); + } else if (RESULT_LABEL_EXISTED.equals(loadResult.get(keyStatus))) { + LOG.debug("StreamLoad response:{}",JSON.toJSONString(loadResult)); + checkStreamLoadState(host, data.getLabel()); + } + } + + private void checkStreamLoadState(String host, String label) throws IOException { + int idx = 0; + while(true) { + try { + TimeUnit.SECONDS.sleep(Math.min(++idx, 5)); + } catch (InterruptedException ex) { + break; + } + try (CloseableHttpClient httpclient = HttpClients.createDefault()) { + HttpGet httpGet = new HttpGet(new StringBuilder(host).append("/api/").append(options.getDatabase()).append("/get_load_state?label=").append(label).toString()); + httpGet.setHeader("Authorization", getBasicAuthHeader(options.getUsername(), options.getPassword())); + httpGet.setHeader("Connection", "close"); + + try (CloseableHttpResponse resp = httpclient.execute(httpGet)) { + HttpEntity respEntity = getHttpEntity(resp); + if (respEntity == null) { + throw new IOException(String.format("Failed to flush data to Doris, 
Error " + + "could not get the final state of label[%s].\n", label), null); + } + Map result = (Map)JSON.parse(EntityUtils.toString(respEntity)); + String labelState = (String)result.get("data"); + if (null == labelState) { + throw new IOException(String.format("Failed to flush data to Doris, Error " + + "could not get the final state of label[%s]. response[%s]\n", label, EntityUtils.toString(respEntity)), null); + } + LOG.info(String.format("Checking label[%s] state[%s]\n", label, labelState)); + switch(labelState) { + case LAEBL_STATE_VISIBLE: + case LAEBL_STATE_COMMITTED: + return; + case RESULT_LABEL_PREPARE: + continue; + case RESULT_LABEL_ABORTED: + throw new DorisWriterExcetion (String.format("Failed to flush data to Doris, Error " + + "label[%s] state[%s]\n", label, labelState), null, true); + case RESULT_LABEL_UNKNOWN: + default: + throw new IOException(String.format("Failed to flush data to Doris, Error " + + "label[%s] state[%s]\n", label, labelState), null); + } + } + } + } + } + + private byte[] addRows(List rows, int totalBytes) { + if (Keys.StreamLoadFormat.CSV.equals(options.getStreamLoadFormat())) { + Map props = (options.getLoadProps() == null ? new HashMap<> () : options.getLoadProps()); + byte[] lineDelimiter = DelimiterParser.parse((String)props.get("line_delimiter"), "\n").getBytes(StandardCharsets.UTF_8); + ByteBuffer bos = ByteBuffer.allocate(totalBytes + rows.size() * lineDelimiter.length); + for (byte[] row : rows) { + bos.put(row); + bos.put(lineDelimiter); + } + return bos.array(); + } + + if (Keys.StreamLoadFormat.JSON.equals(options.getStreamLoadFormat())) { + ByteBuffer bos = ByteBuffer.allocate(totalBytes + (rows.isEmpty() ? 2 : rows.size() + 1)); + bos.put("[".getBytes(StandardCharsets.UTF_8)); + byte[] jsonDelimiter = ",".getBytes(StandardCharsets.UTF_8); + boolean isFirstElement = true; + for (byte[] row : rows) { + if (!isFirstElement) { + bos.put(jsonDelimiter); + } + bos.put(row); + isFirstElement = false; + } + bos.put("]".getBytes(StandardCharsets.UTF_8)); + return bos.array(); + } + throw new RuntimeException("Failed to join rows data, unsupported `format` from stream load properties:"); + } + private Map put(String loadUrl, String label, byte[] data) throws IOException { + LOG.info(String.format("Executing stream load to: '%s', size: '%s'", loadUrl, data.length)); + final HttpClientBuilder httpClientBuilder = HttpClients.custom() + .setRedirectStrategy(new DefaultRedirectStrategy () { + @Override + protected boolean isRedirectable(String method) { + return true; + } + }); + try ( CloseableHttpClient httpclient = httpClientBuilder.build()) { + HttpPut httpPut = new HttpPut(loadUrl); + httpPut.removeHeaders(HttpHeaders.CONTENT_LENGTH); + httpPut.removeHeaders(HttpHeaders.TRANSFER_ENCODING); + List cols = options.getColumns(); + if (null != cols && !cols.isEmpty() && Keys.StreamLoadFormat.CSV.equals(options.getStreamLoadFormat())) { + httpPut.setHeader("columns", String.join(",", cols.stream().map(f -> String.format("`%s`", f)).collect(Collectors.toList()))); + } + if (null != options.getLoadProps()) { + for (Map.Entry entry : options.getLoadProps().entrySet()) { + httpPut.setHeader(entry.getKey(), String.valueOf(entry.getValue())); + } + } + httpPut.setHeader("Expect", "100-continue"); + httpPut.setHeader("label", label); + httpPut.setHeader("two_phase_commit", "false"); + httpPut.setHeader("Authorization", getBasicAuthHeader(options.getUsername(), options.getPassword())); + httpPut.setEntity(new ByteArrayEntity(data)); + 
httpPut.setConfig(RequestConfig.custom().setRedirectsEnabled(true).build()); + try ( CloseableHttpResponse resp = httpclient.execute(httpPut)) { + HttpEntity respEntity = getHttpEntity(resp); + if (respEntity == null) + return null; + return (Map)JSON.parse(EntityUtils.toString(respEntity)); + } + } + } + + private String getBasicAuthHeader(String username, String password) { + String auth = username + ":" + password; + byte[] encodedAuth = Base64.encodeBase64(auth.getBytes(StandardCharsets.UTF_8)); + return new StringBuilder("Basic ").append(new String(encodedAuth)).toString(); + } + + private HttpEntity getHttpEntity(CloseableHttpResponse resp) { + int code = resp.getStatusLine().getStatusCode(); + if (200 != code) { + LOG.warn("Request failed with code:{}", code); + return null; + } + HttpEntity respEntity = resp.getEntity(); + if (null == respEntity) { + LOG.warn("Request failed with empty response."); + return null; + } + return respEntity; + } + + private String getLoadHost() { + List hostList = options.getLoadUrlList(); + Collections.shuffle(hostList); + String host = new StringBuilder("http://").append(hostList.get((0))).toString(); + if (checkConnection(host)){ + return host; + } + return null; + } + + private boolean checkConnection(String host) { + try { + URL url = new URL(host); + HttpURLConnection co = (HttpURLConnection) url.openConnection(); + co.setConnectTimeout(5000); + co.connect(); + co.disconnect(); + return true; + } catch (Exception e1) { + e1.printStackTrace(); + return false; + } + } +} diff --git a/doriswriter/src/main/java/com/alibaba/datax/plugin/writer/doriswriter/DorisUtil.java b/doriswriter/src/main/java/com/alibaba/datax/plugin/writer/doriswriter/DorisUtil.java new file mode 100644 index 00000000..5f5a6f34 --- /dev/null +++ b/doriswriter/src/main/java/com/alibaba/datax/plugin/writer/doriswriter/DorisUtil.java @@ -0,0 +1,105 @@ +package com.alibaba.datax.plugin.writer.doriswriter; + +import com.alibaba.datax.plugin.rdbms.util.DBUtil; +import com.alibaba.datax.plugin.rdbms.util.DataBaseType; +import com.alibaba.datax.plugin.rdbms.util.RdbmsException; +import com.alibaba.datax.plugin.rdbms.writer.Constant; +import com.alibaba.druid.sql.parser.ParserException; +import com.google.common.base.Strings; +import org.slf4j.Logger; +import org.slf4j.LoggerFactory; + +import java.sql.Connection; +import java.sql.ResultSet; +import java.sql.Statement; +import java.util.ArrayList; +import java.util.Collections; +import java.util.List; + +/** + * jdbc util + */ +public class DorisUtil { + private static final Logger LOG = LoggerFactory.getLogger(DorisUtil.class); + + private DorisUtil() {} + + public static List getDorisTableColumns( Connection conn, String databaseName, String tableName) { + String currentSql = String.format("SELECT COLUMN_NAME FROM `information_schema`.`COLUMNS` WHERE `TABLE_SCHEMA` = '%s' AND `TABLE_NAME` = '%s' ORDER BY `ORDINAL_POSITION` ASC;", databaseName, tableName); + List columns = new ArrayList<> (); + ResultSet rs = null; + try { + rs = DBUtil.query(conn, currentSql); + while (DBUtil.asyncResultSetNext(rs)) { + String colName = rs.getString("COLUMN_NAME"); + columns.add(colName); + } + return columns; + } catch (Exception e) { + throw RdbmsException.asQueryException(DataBaseType.MySql, e, currentSql, null, null); + } finally { + DBUtil.closeDBResources(rs, null, null); + } + } + + public static List renderPreOrPostSqls(List preOrPostSqls, String tableName) { + if (null == preOrPostSqls) { + return Collections.emptyList(); + } + List 
renderedSqls = new ArrayList<>(); + for (String sql : preOrPostSqls) { + if (! Strings.isNullOrEmpty(sql)) { + renderedSqls.add(sql.replace(Constant.TABLE_NAME_PLACEHOLDER, tableName)); + } + } + return renderedSqls; + } + + public static void executeSqls(Connection conn, List sqls) { + Statement stmt = null; + String currentSql = null; + try { + stmt = conn.createStatement(); + for (String sql : sqls) { + currentSql = sql; + DBUtil.executeSqlWithoutResultSet(stmt, sql); + } + } catch (Exception e) { + throw RdbmsException.asQueryException(DataBaseType.MySql, e, currentSql, null, null); + } finally { + DBUtil.closeDBResources(null, stmt, null); + } + } + + public static void preCheckPrePareSQL( Keys options) { + String table = options.getTable(); + List preSqls = options.getPreSqlList(); + List renderedPreSqls = DorisUtil.renderPreOrPostSqls(preSqls, table); + if (null != renderedPreSqls && !renderedPreSqls.isEmpty()) { + LOG.info("Begin to preCheck preSqls:[{}].", String.join(";", renderedPreSqls)); + for (String sql : renderedPreSqls) { + try { + DBUtil.sqlValid(sql, DataBaseType.MySql); + } catch ( ParserException e) { + throw RdbmsException.asPreSQLParserException(DataBaseType.MySql,e,sql); + } + } + } + } + + public static void preCheckPostSQL( Keys options) { + String table = options.getTable(); + List postSqls = options.getPostSqlList(); + List renderedPostSqls = DorisUtil.renderPreOrPostSqls(postSqls, table); + if (null != renderedPostSqls && !renderedPostSqls.isEmpty()) { + LOG.info("Begin to preCheck postSqls:[{}].", String.join(";", renderedPostSqls)); + for(String sql : renderedPostSqls) { + try { + DBUtil.sqlValid(sql, DataBaseType.MySql); + } catch (ParserException e){ + throw RdbmsException.asPostSQLParserException(DataBaseType.MySql,e,sql); + } + } + } + } +} diff --git a/doriswriter/src/main/java/com/alibaba/datax/plugin/writer/doriswriter/DorisWriter.java b/doriswriter/src/main/java/com/alibaba/datax/plugin/writer/doriswriter/DorisWriter.java new file mode 100644 index 00000000..b44d5440 --- /dev/null +++ b/doriswriter/src/main/java/com/alibaba/datax/plugin/writer/doriswriter/DorisWriter.java @@ -0,0 +1,164 @@ +// Licensed to the Apache Software Foundation (ASF) under one +// or more contributor license agreements. See the NOTICE file +// distributed with this work for additional information +// regarding copyright ownership. The ASF licenses this file +// to you under the Apache License, Version 2.0 (the +// "License"); you may not use this file except in compliance +// with the License. You may obtain a copy of the License at +// +// http://www.apache.org/licenses/LICENSE-2.0 +// +// Unless required by applicable law or agreed to in writing, +// software distributed under the License is distributed on an +// "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY +// KIND, either express or implied. See the License for the +// specific language governing permissions and limitations +// under the License. 
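+// DorisWriter wires the DataX Writer SPI: Job handles preSql/postSql rendering, pre-checks and task splitting, while Task encodes records and hands them to the stream-load writer manager.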
+ +package com.alibaba.datax.plugin.writer.doriswriter; + +import com.alibaba.datax.common.element.Record; +import com.alibaba.datax.common.exception.DataXException; +import com.alibaba.datax.common.plugin.RecordReceiver; +import com.alibaba.datax.common.spi.Writer; +import com.alibaba.datax.common.util.Configuration; +import com.alibaba.datax.plugin.rdbms.util.DBUtil; +import com.alibaba.datax.plugin.rdbms.util.DBUtilErrorCode; +import com.alibaba.datax.plugin.rdbms.util.DataBaseType; +import org.slf4j.Logger; +import org.slf4j.LoggerFactory; + +import java.sql.Connection; +import java.util.ArrayList; +import java.util.List; + +/** + * doris data writer + */ +public class DorisWriter extends Writer { + + public static class Job extends Writer.Job { + + private static final Logger LOG = LoggerFactory.getLogger(Job.class); + private Configuration originalConfig = null; + private Keys options; + + @Override + public void init() { + this.originalConfig = super.getPluginJobConf(); + options = new Keys (super.getPluginJobConf()); + options.doPretreatment(); + } + + @Override + public void preCheck(){ + this.init(); + DorisUtil.preCheckPrePareSQL(options); + DorisUtil.preCheckPostSQL(options); + } + + @Override + public void prepare() { + String username = options.getUsername(); + String password = options.getPassword(); + String jdbcUrl = options.getJdbcUrl(); + List renderedPreSqls = DorisUtil.renderPreOrPostSqls(options.getPreSqlList(), options.getTable()); + if (null != renderedPreSqls && !renderedPreSqls.isEmpty()) { + Connection conn = DBUtil.getConnection(DataBaseType.MySql, jdbcUrl, username, password); + LOG.info("Begin to execute preSqls:[{}]. context info:{}.", String.join(";", renderedPreSqls), jdbcUrl); + DorisUtil.executeSqls(conn, renderedPreSqls); + DBUtil.closeDBResources(null, null, conn); + } + } + + @Override + public List split(int mandatoryNumber) { + List configurations = new ArrayList<>(mandatoryNumber); + for (int i = 0; i < mandatoryNumber; i++) { + configurations.add(originalConfig); + } + return configurations; + } + + @Override + public void post() { + String username = options.getUsername(); + String password = options.getPassword(); + String jdbcUrl = options.getJdbcUrl(); + List renderedPostSqls = DorisUtil.renderPreOrPostSqls(options.getPostSqlList(), options.getTable()); + if (null != renderedPostSqls && !renderedPostSqls.isEmpty()) { + Connection conn = DBUtil.getConnection(DataBaseType.MySql, jdbcUrl, username, password); + LOG.info("Start to execute preSqls:[{}]. 
context info:{}.", String.join(";", renderedPostSqls), jdbcUrl); + DorisUtil.executeSqls(conn, renderedPostSqls); + DBUtil.closeDBResources(null, null, conn); + } + } + + @Override + public void destroy() { + } + + } + + public static class Task extends Writer.Task { + private DorisWriterManager writerManager; + private Keys options; + private DorisCodec rowCodec; + + @Override + public void init() { + options = new Keys (super.getPluginJobConf()); + if (options.isWildcardColumn()) { + Connection conn = DBUtil.getConnection(DataBaseType.MySql, options.getJdbcUrl(), options.getUsername(), options.getPassword()); + List columns = DorisUtil.getDorisTableColumns(conn, options.getDatabase(), options.getTable()); + options.setInfoCchemaColumns(columns); + } + writerManager = new DorisWriterManager(options); + rowCodec = DorisCodecFactory.createCodec(options); + } + + @Override + public void prepare() { + } + + public void startWrite(RecordReceiver recordReceiver) { + try { + Record record; + while ((record = recordReceiver.getFromReader()) != null) { + if (record.getColumnNumber() != options.getColumns().size()) { + throw DataXException + .asDataXException( + DBUtilErrorCode.CONF_ERROR, + String.format( + "There is an error in the column configuration information. " + + "This is because you have configured a task where the number of fields to be read from the source:%s " + + "is not equal to the number of fields to be written to the destination table:%s. " + + "Please check your configuration and make changes.", + record.getColumnNumber(), + options.getColumns().size())); + } + writerManager.writeRecord(rowCodec.codec(record)); + } + } catch (Exception e) { + throw DataXException.asDataXException(DBUtilErrorCode.WRITE_DATA_ERROR, e); + } + } + + @Override + public void post() { + try { + writerManager.close(); + } catch (Exception e) { + throw DataXException.asDataXException(DBUtilErrorCode.WRITE_DATA_ERROR, e); + } + } + + @Override + public void destroy() {} + + @Override + public boolean supportFailOver(){ + return false; + } + } +} diff --git a/doriswriter/src/main/java/com/alibaba/datax/plugin/writer/doriswriter/DorisWriterExcetion.java b/doriswriter/src/main/java/com/alibaba/datax/plugin/writer/doriswriter/DorisWriterExcetion.java new file mode 100644 index 00000000..7797d79f --- /dev/null +++ b/doriswriter/src/main/java/com/alibaba/datax/plugin/writer/doriswriter/DorisWriterExcetion.java @@ -0,0 +1,29 @@ +package com.alibaba.datax.plugin.writer.doriswriter; + +import java.io.IOException; +import java.util.Map; + +public class DorisWriterExcetion extends IOException { + + private final Map response; + private boolean reCreateLabel; + + public DorisWriterExcetion ( String message, Map response) { + super(message); + this.response = response; + } + + public DorisWriterExcetion ( String message, Map response, boolean reCreateLabel) { + super(message); + this.response = response; + this.reCreateLabel = reCreateLabel; + } + + public Map getFailedResponse() { + return response; + } + + public boolean needReCreateLabel() { + return reCreateLabel; + } +} diff --git a/doriswriter/src/main/java/com/alibaba/datax/plugin/writer/doriswriter/DorisWriterManager.java b/doriswriter/src/main/java/com/alibaba/datax/plugin/writer/doriswriter/DorisWriterManager.java new file mode 100644 index 00000000..f0ba6b52 --- /dev/null +++ b/doriswriter/src/main/java/com/alibaba/datax/plugin/writer/doriswriter/DorisWriterManager.java @@ -0,0 +1,192 @@ +package com.alibaba.datax.plugin.writer.doriswriter; + +import 
com.google.common.base.Strings; +import org.apache.commons.lang3.concurrent.BasicThreadFactory; +import org.slf4j.Logger; +import org.slf4j.LoggerFactory; + +import java.io.IOException; +import java.nio.charset.StandardCharsets; +import java.util.ArrayList; +import java.util.List; +import java.util.UUID; +import java.util.concurrent.Executors; +import java.util.concurrent.LinkedBlockingDeque; +import java.util.concurrent.ScheduledExecutorService; +import java.util.concurrent.ScheduledFuture; +import java.util.concurrent.TimeUnit; + +public class DorisWriterManager { + + private static final Logger LOG = LoggerFactory.getLogger(DorisWriterManager.class); + + private final DorisStreamLoadObserver visitor; + private final Keys options; + private final List buffer = new ArrayList<> (); + private int batchCount = 0; + private long batchSize = 0; + private volatile boolean closed = false; + private volatile Exception flushException; + private final LinkedBlockingDeque< WriterTuple > flushQueue; + private ScheduledExecutorService scheduler; + private ScheduledFuture scheduledFuture; + + public DorisWriterManager( Keys options) { + this.options = options; + this.visitor = new DorisStreamLoadObserver (options); + flushQueue = new LinkedBlockingDeque<>(options.getFlushQueueLength()); + this.startScheduler(); + this.startAsyncFlushing(); + } + + public void startScheduler() { + stopScheduler(); + this.scheduler = Executors.newScheduledThreadPool(1, new BasicThreadFactory.Builder().namingPattern("Doris-interval-flush").daemon(true).build()); + this.scheduledFuture = this.scheduler.schedule(() -> { + synchronized (DorisWriterManager.this) { + if (!closed) { + try { + String label = createBatchLabel(); + LOG.info(String.format("Doris interval Sinking triggered: label[%s].", label)); + if (batchCount == 0) { + startScheduler(); + } + flush(label, false); + } catch (Exception e) { + flushException = e; + } + } + } + }, options.getFlushInterval(), TimeUnit.MILLISECONDS); + } + + public void stopScheduler() { + if (this.scheduledFuture != null) { + scheduledFuture.cancel(false); + this.scheduler.shutdown(); + } + } + + public final synchronized void writeRecord(String record) throws IOException { + checkFlushException(); + try { + byte[] bts = record.getBytes(StandardCharsets.UTF_8); + buffer.add(bts); + batchCount++; + batchSize += bts.length; + if (batchCount >= options.getBatchRows() || batchSize >= options.getBatchSize()) { + String label = createBatchLabel(); + LOG.debug(String.format("Doris buffer Sinking triggered: rows[%d] label[%s].", batchCount, label)); + flush(label, false); + } + } catch (Exception e) { + throw new IOException("Writing records to Doris failed.", e); + } + } + + public synchronized void flush(String label, boolean waitUtilDone) throws Exception { + checkFlushException(); + if (batchCount == 0) { + if (waitUtilDone) { + waitAsyncFlushingDone(); + } + return; + } + flushQueue.put(new WriterTuple (label, batchSize, new ArrayList<>(buffer))); + if (waitUtilDone) { + // wait the last flush + waitAsyncFlushingDone(); + } + buffer.clear(); + batchCount = 0; + batchSize = 0; + } + + public synchronized void close() { + if (!closed) { + closed = true; + try { + String label = createBatchLabel(); + if (batchCount > 0) LOG.debug(String.format("Doris Sink is about to close: label[%s].", label)); + flush(label, true); + } catch (Exception e) { + throw new RuntimeException("Writing records to Doris failed.", e); + } + } + checkFlushException(); + } + + public String createBatchLabel() { + 
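+        // Build the label as labelPrefix + UUID; Doris uses this label to de-duplicate a retried batch (see the "Label Already Exists" handling in DorisStreamLoadObserver).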
StringBuilder sb = new StringBuilder(); + if (! Strings.isNullOrEmpty(options.getLabelPrefix())) { + sb.append(options.getLabelPrefix()); + } + return sb.append(UUID.randomUUID().toString()) + .toString(); + } + + private void startAsyncFlushing() { + // start flush thread + Thread flushThread = new Thread(new Runnable(){ + public void run() { + while(true) { + try { + asyncFlush(); + } catch (Exception e) { + flushException = e; + } + } + } + }); + flushThread.setDaemon(true); + flushThread.start(); + } + + private void waitAsyncFlushingDone() throws InterruptedException { + // wait previous flushings + for (int i = 0; i <= options.getFlushQueueLength(); i++) { + flushQueue.put(new WriterTuple ("", 0l, null)); + } + checkFlushException(); + } + + private void asyncFlush() throws Exception { + WriterTuple flushData = flushQueue.take(); + if (Strings.isNullOrEmpty(flushData.getLabel())) { + return; + } + stopScheduler(); + LOG.debug(String.format("Async stream load: rows[%d] bytes[%d] label[%s].", flushData.getRows().size(), flushData.getBytes(), flushData.getLabel())); + for (int i = 0; i <= options.getMaxRetries(); i++) { + try { + // flush to Doris with stream load + visitor.streamLoad(flushData); + LOG.info(String.format("Async stream load finished: label[%s].", flushData.getLabel())); + startScheduler(); + break; + } catch (Exception e) { + LOG.warn("Failed to flush batch data to Doris, retry times = {}", i, e); + if (i >= options.getMaxRetries()) { + throw new IOException(e); + } + if (e instanceof DorisWriterExcetion && (( DorisWriterExcetion )e).needReCreateLabel()) { + String newLabel = createBatchLabel(); + LOG.warn(String.format("Batch label changed from [%s] to [%s]", flushData.getLabel(), newLabel)); + flushData.setLabel(newLabel); + } + try { + Thread.sleep(1000l * Math.min(i + 1, 10)); + } catch (InterruptedException ex) { + Thread.currentThread().interrupt(); + throw new IOException("Unable to flush, interrupted while doing another attempt", e); + } + } + } + } + + private void checkFlushException() { + if (flushException != null) { + throw new RuntimeException("Writing records to Doris failed.", flushException); + } + } +} diff --git a/doriswriter/src/main/java/com/alibaba/datax/plugin/writer/doriswriter/Keys.java b/doriswriter/src/main/java/com/alibaba/datax/plugin/writer/doriswriter/Keys.java new file mode 100644 index 00000000..e460e76b --- /dev/null +++ b/doriswriter/src/main/java/com/alibaba/datax/plugin/writer/doriswriter/Keys.java @@ -0,0 +1,177 @@ +package com.alibaba.datax.plugin.writer.doriswriter; + +import com.alibaba.datax.common.exception.DataXException; +import com.alibaba.datax.common.util.Configuration; +import com.alibaba.datax.plugin.rdbms.util.DBUtilErrorCode; + +import java.io.Serializable; +import java.util.List; +import java.util.Map; +import java.util.stream.Collectors; + +public class Keys implements Serializable { + + private static final long serialVersionUID = 1l; + private static final int MAX_RETRIES = 3; + private static final int BATCH_ROWS = 500000; + private static final long DEFAULT_FLUSH_INTERVAL = 30000; + + private static final String LOAD_PROPS_FORMAT = "format"; + public enum StreamLoadFormat { + CSV, JSON; + } + + private static final String USERNAME = "username"; + private static final String PASSWORD = "password"; + private static final String DATABASE = "connection[0].selectedDatabase"; + private static final String TABLE = "connection[0].table[0]"; + private static final String COLUMN = "column"; + private static final String 
PRE_SQL = "preSql"; + private static final String POST_SQL = "postSql"; + private static final String JDBC_URL = "connection[0].jdbcUrl"; + private static final String LABEL_PREFIX = "labelPrefix"; + private static final String MAX_BATCH_ROWS = "maxBatchRows"; + private static final String MAX_BATCH_SIZE = "batchSize"; + private static final String FLUSH_INTERVAL = "flushInterval"; + private static final String LOAD_URL = "loadUrl"; + private static final String FLUSH_QUEUE_LENGTH = "flushQueueLength"; + private static final String LOAD_PROPS = "loadProps"; + + private static final String DEFAULT_LABEL_PREFIX = "datax_doris_writer_"; + + private static final long DEFAULT_MAX_BATCH_SIZE = 90 * 1024 * 1024; //default 90M + + private final Configuration options; + + private List infoSchemaColumns; + private List userSetColumns; + private boolean isWildcardColumn; + + public Keys ( Configuration options) { + this.options = options; + this.userSetColumns = options.getList(COLUMN, String.class).stream().map(str -> str.replace("`", "")).collect(Collectors.toList()); + if (1 == options.getList(COLUMN, String.class).size() && "*".trim().equals(options.getList(COLUMN, String.class).get(0))) { + this.isWildcardColumn = true; + } + } + + public void doPretreatment() { + validateRequired(); + validateStreamLoadUrl(); + } + + public String getJdbcUrl() { + return options.getString(JDBC_URL); + } + + public String getDatabase() { + return options.getString(DATABASE); + } + + public String getTable() { + return options.getString(TABLE); + } + + public String getUsername() { + return options.getString(USERNAME); + } + + public String getPassword() { + return options.getString(PASSWORD); + } + + public String getLabelPrefix() { + String label = options.getString(LABEL_PREFIX); + return null == label ? DEFAULT_LABEL_PREFIX : label; + } + + public List getLoadUrlList() { + return options.getList(LOAD_URL, String.class); + } + + public List getColumns() { + if (isWildcardColumn) { + return this.infoSchemaColumns; + } + return this.userSetColumns; + } + + public boolean isWildcardColumn() { + return this.isWildcardColumn; + } + + public void setInfoCchemaColumns(List cols) { + this.infoSchemaColumns = cols; + } + + public List getPreSqlList() { + return options.getList(PRE_SQL, String.class); + } + + public List getPostSqlList() { + return options.getList(POST_SQL, String.class); + } + + public Map getLoadProps() { + return options.getMap(LOAD_PROPS); + } + + public int getMaxRetries() { + return MAX_RETRIES; + } + + public int getBatchRows() { + Integer rows = options.getInt(MAX_BATCH_ROWS); + return null == rows ? BATCH_ROWS : rows; + } + + public long getBatchSize() { + Long size = options.getLong(MAX_BATCH_SIZE); + return null == size ? DEFAULT_MAX_BATCH_SIZE : size; + } + + public long getFlushInterval() { + Long interval = options.getLong(FLUSH_INTERVAL); + return null == interval ? DEFAULT_FLUSH_INTERVAL : interval; + } + + public int getFlushQueueLength() { + Integer len = options.getInt(FLUSH_QUEUE_LENGTH); + return null == len ? 
1 : len; + } + + public StreamLoadFormat getStreamLoadFormat() { + Map loadProps = getLoadProps(); + if (null == loadProps) { + return StreamLoadFormat.CSV; + } + if (loadProps.containsKey(LOAD_PROPS_FORMAT) + && StreamLoadFormat.JSON.name().equalsIgnoreCase(String.valueOf(loadProps.get(LOAD_PROPS_FORMAT)))) { + return StreamLoadFormat.JSON; + } + return StreamLoadFormat.CSV; + } + + private void validateStreamLoadUrl() { + List urlList = getLoadUrlList(); + for (String host : urlList) { + if (host.split(":").length < 2) { + throw DataXException.asDataXException(DBUtilErrorCode.CONF_ERROR, + "The format of loadUrl is not correct, please enter:[`fe_ip:fe_http_ip;fe_ip:fe_http_ip`]."); + } + } + } + + private void validateRequired() { + final String[] requiredOptionKeys = new String[]{ + USERNAME, + DATABASE, + TABLE, + COLUMN, + LOAD_URL + }; + for (String optionKey : requiredOptionKeys) { + options.getNecessaryValue(optionKey, DBUtilErrorCode.REQUIRED_VALUE); + } + } +} diff --git a/doriswriter/src/main/java/com/alibaba/datax/plugin/writer/doriswriter/WriterTuple.java b/doriswriter/src/main/java/com/alibaba/datax/plugin/writer/doriswriter/WriterTuple.java new file mode 100644 index 00000000..32e0b341 --- /dev/null +++ b/doriswriter/src/main/java/com/alibaba/datax/plugin/writer/doriswriter/WriterTuple.java @@ -0,0 +1,20 @@ +package com.alibaba.datax.plugin.writer.doriswriter; + +import java.util.List; + +public class WriterTuple { + private String label; + private Long bytes; + private List rows; + + public WriterTuple ( String label, Long bytes, List rows){ + this.label = label; + this.rows = rows; + this.bytes = bytes; + } + + public String getLabel() { return label; } + public void setLabel(String label) { this.label = label; } + public Long getBytes() { return bytes; } + public List getRows() { return rows; } +} diff --git a/doriswriter/src/main/resources/plugin.json b/doriswriter/src/main/resources/plugin.json new file mode 100644 index 00000000..69dc31a2 --- /dev/null +++ b/doriswriter/src/main/resources/plugin.json @@ -0,0 +1,6 @@ +{ + "name": "doriswriter", + "class": "com.alibaba.datax.plugin.writer.doriswriter.DorisWriter", + "description": "apache doris writer plugin", + "developer": "apche doris" +} diff --git a/doriswriter/src/main/resources/plugin_job_template.json b/doriswriter/src/main/resources/plugin_job_template.json new file mode 100644 index 00000000..0187e539 --- /dev/null +++ b/doriswriter/src/main/resources/plugin_job_template.json @@ -0,0 +1,20 @@ +{ + "name": "doriswriter", + "parameter": { + "username": "", + "password": "", + "column": [], + "preSql": [], + "postSql": [], + "beLoadUrl": [], + "loadUrl": [], + "loadProps": {}, + "connection": [ + { + "jdbcUrl": "", + "selectedDatabase": "", + "table": [] + } + ] + } +} \ No newline at end of file diff --git a/elasticsearchwriter/pom.xml b/elasticsearchwriter/pom.xml index a60dbd88..8699c6e5 100644 --- a/elasticsearchwriter/pom.xml +++ b/elasticsearchwriter/pom.xml @@ -35,12 +35,12 @@ io.searchbox jest-common - 2.4.0 + 6.3.1 io.searchbox jest - 2.4.0 + 6.3.1 joda-time diff --git a/elasticsearchwriter/src/main/java/com/alibaba/datax/plugin/writer/elasticsearchwriter/ESClient.java b/elasticsearchwriter/src/main/java/com/alibaba/datax/plugin/writer/elasticsearchwriter/ESClient.java deleted file mode 100644 index 34bb7e54..00000000 --- a/elasticsearchwriter/src/main/java/com/alibaba/datax/plugin/writer/elasticsearchwriter/ESClient.java +++ /dev/null @@ -1,236 +0,0 @@ -package 
com.alibaba.datax.plugin.writer.elasticsearchwriter; - -import com.google.gson.Gson; -import com.google.gson.JsonElement; -import com.google.gson.JsonObject; -import com.google.gson.JsonParser; -import io.searchbox.action.Action; -import io.searchbox.client.JestClient; -import io.searchbox.client.JestClientFactory; -import io.searchbox.client.JestResult; -import io.searchbox.client.config.HttpClientConfig; -import io.searchbox.client.config.HttpClientConfig.Builder; -import io.searchbox.core.Bulk; -import io.searchbox.indices.CreateIndex; -import io.searchbox.indices.DeleteIndex; -import io.searchbox.indices.IndicesExists; -import io.searchbox.indices.aliases.*; -import io.searchbox.indices.mapping.PutMapping; -import org.apache.http.HttpHost; -import org.slf4j.Logger; -import org.slf4j.LoggerFactory; - -import java.io.IOException; -import java.util.ArrayList; -import java.util.List; -import java.util.Map; -import java.util.concurrent.TimeUnit; - -/** - * Created by xiongfeng.bxf on 17/2/8. - */ -public class ESClient { - private static final Logger log = LoggerFactory.getLogger(ESClient.class); - - private JestClient jestClient; - - public JestClient getClient() { - return jestClient; - } - - public void createClient(String endpoint, - String user, - String passwd, - boolean multiThread, - int readTimeout, - boolean compression, - boolean discovery) { - - JestClientFactory factory = new JestClientFactory(); - Builder httpClientConfig = new HttpClientConfig - .Builder(endpoint) - .setPreemptiveAuth(new HttpHost(endpoint)) - .multiThreaded(multiThread) - .connTimeout(30000) - .readTimeout(readTimeout) - .maxTotalConnection(200) - .requestCompressionEnabled(compression) - .discoveryEnabled(discovery) - .discoveryFrequency(5l, TimeUnit.MINUTES); - - if (!("".equals(user) || "".equals(passwd))) { - httpClientConfig.defaultCredentials(user, passwd); - } - - factory.setHttpClientConfig(httpClientConfig.build()); - - jestClient = factory.getObject(); - } - - public boolean indicesExists(String indexName) throws Exception { - boolean isIndicesExists = false; - JestResult rst = jestClient.execute(new IndicesExists.Builder(indexName).build()); - if (rst.isSucceeded()) { - isIndicesExists = true; - } else { - switch (rst.getResponseCode()) { - case 404: - isIndicesExists = false; - break; - case 401: - // 无权访问 - default: - log.warn(rst.getErrorMessage()); - break; - } - } - return isIndicesExists; - } - - public boolean deleteIndex(String indexName) throws Exception { - log.info("delete index " + indexName); - if (indicesExists(indexName)) { - JestResult rst = execute(new DeleteIndex.Builder(indexName).build()); - if (!rst.isSucceeded()) { - return false; - } - } else { - log.info("index cannot found, skip delete " + indexName); - } - return true; - } - - public boolean createIndex(String indexName, String typeName, - Object mappings, String settings, boolean dynamic) throws Exception { - JestResult rst = null; - if (!indicesExists(indexName)) { - log.info("create index " + indexName); - rst = jestClient.execute( - new CreateIndex.Builder(indexName) - .settings(settings) - .setParameter("master_timeout", "5m") - .build() - ); - //index_already_exists_exception - if (!rst.isSucceeded()) { - if (getStatus(rst) == 400) { - log.info(String.format("index [%s] already exists", indexName)); - return true; - } else { - log.error(rst.getErrorMessage()); - return false; - } - } else { - log.info(String.format("create [%s] index success", indexName)); - } - } - - int idx = 0; - while (idx < 5) { - if 
(indicesExists(indexName)) { - break; - } - Thread.sleep(2000); - idx ++; - } - if (idx >= 5) { - return false; - } - - if (dynamic) { - log.info("ignore mappings"); - return true; - } - log.info("create mappings for " + indexName + " " + mappings); - rst = jestClient.execute(new PutMapping.Builder(indexName, typeName, mappings) - .setParameter("master_timeout", "5m").build()); - if (!rst.isSucceeded()) { - if (getStatus(rst) == 400) { - log.info(String.format("index [%s] mappings already exists", indexName)); - } else { - log.error(rst.getErrorMessage()); - return false; - } - } else { - log.info(String.format("index [%s] put mappings success", indexName)); - } - return true; - } - - public JestResult execute(Action clientRequest) throws Exception { - JestResult rst = null; - rst = jestClient.execute(clientRequest); - if (!rst.isSucceeded()) { - //log.warn(rst.getErrorMessage()); - } - return rst; - } - - public Integer getStatus(JestResult rst) { - JsonObject jsonObject = rst.getJsonObject(); - if (jsonObject.has("status")) { - return jsonObject.get("status").getAsInt(); - } - return 600; - } - - public boolean isBulkResult(JestResult rst) { - JsonObject jsonObject = rst.getJsonObject(); - return jsonObject.has("items"); - } - - - public boolean alias(String indexname, String aliasname, boolean needClean) throws IOException { - GetAliases getAliases = new GetAliases.Builder().addIndex(aliasname).build(); - AliasMapping addAliasMapping = new AddAliasMapping.Builder(indexname, aliasname).build(); - JestResult rst = jestClient.execute(getAliases); - log.info(rst.getJsonString()); - List list = new ArrayList(); - if (rst.isSucceeded()) { - JsonParser jp = new JsonParser(); - JsonObject jo = (JsonObject)jp.parse(rst.getJsonString()); - for(Map.Entry entry : jo.entrySet()){ - String tindex = entry.getKey(); - if (indexname.equals(tindex)) { - continue; - } - AliasMapping m = new RemoveAliasMapping.Builder(tindex, aliasname).build(); - String s = new Gson().toJson(m.getData()); - log.info(s); - if (needClean) { - list.add(m); - } - } - } - - ModifyAliases modifyAliases = new ModifyAliases.Builder(addAliasMapping).addAlias(list).setParameter("master_timeout", "5m").build(); - rst = jestClient.execute(modifyAliases); - if (!rst.isSucceeded()) { - log.error(rst.getErrorMessage()); - return false; - } - return true; - } - - public JestResult bulkInsert(Bulk.Builder bulk, int trySize) throws Exception { - // es_rejected_execution_exception - // illegal_argument_exception - // cluster_block_exception - JestResult rst = null; - rst = jestClient.execute(bulk.build()); - if (!rst.isSucceeded()) { - log.warn(rst.getErrorMessage()); - } - return rst; - } - - /** - * 关闭JestClient客户端 - * - */ - public void closeJestClient() { - if (jestClient != null) { - jestClient.shutdownClient(); - } - } -} diff --git a/elasticsearchwriter/src/main/java/com/alibaba/datax/plugin/writer/elasticsearchwriter/ESColumn.java b/elasticsearchwriter/src/main/java/com/alibaba/datax/plugin/writer/elasticsearchwriter/ESColumn.java deleted file mode 100644 index 8990d77c..00000000 --- a/elasticsearchwriter/src/main/java/com/alibaba/datax/plugin/writer/elasticsearchwriter/ESColumn.java +++ /dev/null @@ -1,65 +0,0 @@ -package com.alibaba.datax.plugin.writer.elasticsearchwriter; - -/** - * Created by xiongfeng.bxf on 17/3/2. 
- */ -public class ESColumn { - - private String name;//: "appkey", - - private String type;//": "TEXT", - - private String timezone; - - private String format; - - private Boolean array; - - public void setName(String name) { - this.name = name; - } - - public void setType(String type) { - this.type = type; - } - - public void setTimeZone(String timezone) { - this.timezone = timezone; - } - - public void setFormat(String format) { - this.format = format; - } - - public String getName() { - return name; - } - - public String getType() { - return type; - } - - public String getTimezone() { - return timezone; - } - - public String getFormat() { - return format; - } - - public void setTimezone(String timezone) { - this.timezone = timezone; - } - - public Boolean isArray() { - return array; - } - - public void setArray(Boolean array) { - this.array = array; - } - - public Boolean getArray() { - return array; - } -} diff --git a/elasticsearchwriter/src/main/java/com/alibaba/datax/plugin/writer/elasticsearchwriter/ESWriter.java b/elasticsearchwriter/src/main/java/com/alibaba/datax/plugin/writer/elasticsearchwriter/ESWriter.java deleted file mode 100644 index eb0e9a81..00000000 --- a/elasticsearchwriter/src/main/java/com/alibaba/datax/plugin/writer/elasticsearchwriter/ESWriter.java +++ /dev/null @@ -1,460 +0,0 @@ -package com.alibaba.datax.plugin.writer.elasticsearchwriter; - -import com.alibaba.datax.common.element.Column; -import com.alibaba.datax.common.element.Record; -import com.alibaba.datax.common.exception.DataXException; -import com.alibaba.datax.common.plugin.RecordReceiver; -import com.alibaba.datax.common.spi.Writer; -import com.alibaba.datax.common.util.Configuration; -import com.alibaba.datax.common.util.RetryUtil; -import com.alibaba.fastjson.JSON; -import com.alibaba.fastjson.JSONObject; -import com.alibaba.fastjson.TypeReference; -import io.searchbox.client.JestResult; -import io.searchbox.core.Bulk; -import io.searchbox.core.BulkResult; -import io.searchbox.core.Index; -import org.joda.time.DateTime; -import org.joda.time.DateTimeZone; -import org.joda.time.format.DateTimeFormat; -import org.joda.time.format.DateTimeFormatter; -import org.slf4j.Logger; -import org.slf4j.LoggerFactory; - -import java.io.IOException; -import java.net.URLEncoder; -import java.util.*; -import java.util.concurrent.Callable; - -public class ESWriter extends Writer { - private final static String WRITE_COLUMNS = "write_columns"; - - public static class Job extends Writer.Job { - private static final Logger log = LoggerFactory.getLogger(Job.class); - - private Configuration conf = null; - - @Override - public void init() { - this.conf = super.getPluginJobConf(); - } - - @Override - public void prepare() { - /** - * 注意:此方法仅执行一次。 - * 最佳实践:如果 Job 中有需要进行数据同步之前的处理,可以在此处完成,如果没有必要则可以直接去掉。 - */ - ESClient esClient = new ESClient(); - esClient.createClient(Key.getEndpoint(conf), - Key.getAccessID(conf), - Key.getAccessKey(conf), - false, - 300000, - false, - false); - - String indexName = Key.getIndexName(conf); - String typeName = Key.getTypeName(conf); - boolean dynamic = Key.getDynamic(conf); - String mappings = genMappings(typeName); - String settings = JSONObject.toJSONString( - Key.getSettings(conf) - ); - log.info(String.format("index:[%s], type:[%s], mappings:[%s]", indexName, typeName, mappings)); - - try { - boolean isIndicesExists = esClient.indicesExists(indexName); - if (Key.isCleanup(this.conf) && isIndicesExists) { - esClient.deleteIndex(indexName); - } - // 强制创建,内部自动忽略已存在的情况 - if 
(!esClient.createIndex(indexName, typeName, mappings, settings, dynamic)) { - throw new IOException("create index or mapping failed"); - } - } catch (Exception ex) { - throw DataXException.asDataXException(ESWriterErrorCode.ES_MAPPINGS, ex.toString()); - } - esClient.closeJestClient(); - } - - private String genMappings(String typeName) { - String mappings = null; - Map propMap = new HashMap(); - List columnList = new ArrayList(); - - List column = conf.getList("column"); - if (column != null) { - for (Object col : column) { - JSONObject jo = JSONObject.parseObject(col.toString()); - String colName = jo.getString("name"); - String colTypeStr = jo.getString("type"); - if (colTypeStr == null) { - throw DataXException.asDataXException(ESWriterErrorCode.BAD_CONFIG_VALUE, col.toString() + " column must have type"); - } - ESFieldType colType = ESFieldType.getESFieldType(colTypeStr); - if (colType == null) { - throw DataXException.asDataXException(ESWriterErrorCode.BAD_CONFIG_VALUE, col.toString() + " unsupported type"); - } - - ESColumn columnItem = new ESColumn(); - - if (colName.equals(Key.PRIMARY_KEY_COLUMN_NAME)) { - // 兼容已有版本 - colType = ESFieldType.ID; - colTypeStr = "id"; - } - - columnItem.setName(colName); - columnItem.setType(colTypeStr); - - if (colType == ESFieldType.ID) { - columnList.add(columnItem); - // 如果是id,则properties为空 - continue; - } - - Boolean array = jo.getBoolean("array"); - if (array != null) { - columnItem.setArray(array); - } - Map field = new HashMap(); - field.put("type", colTypeStr); - //https://www.elastic.co/guide/en/elasticsearch/reference/5.2/breaking_50_mapping_changes.html#_literal_index_literal_property - // https://www.elastic.co/guide/en/elasticsearch/guide/2.x/_deep_dive_on_doc_values.html#_disabling_doc_values - field.put("doc_values", jo.getBoolean("doc_values")); - field.put("ignore_above", jo.getInteger("ignore_above")); - field.put("index", jo.getBoolean("index")); - - switch (colType) { - case STRING: - // 兼容string类型,ES5之前版本 - break; - case KEYWORD: - // https://www.elastic.co/guide/en/elasticsearch/reference/current/tune-for-search-speed.html#_warm_up_global_ordinals - field.put("eager_global_ordinals", jo.getBoolean("eager_global_ordinals")); - case TEXT: - field.put("analyzer", jo.getString("analyzer")); - // 优化disk使用,也同步会提高index性能 - // https://www.elastic.co/guide/en/elasticsearch/reference/current/tune-for-disk-usage.html - field.put("norms", jo.getBoolean("norms")); - field.put("index_options", jo.getBoolean("index_options")); - break; - case DATE: - columnItem.setTimeZone(jo.getString("timezone")); - columnItem.setFormat(jo.getString("format")); - // 后面时间会处理为带时区的标准时间,所以不需要给ES指定格式 - /* - if (jo.getString("format") != null) { - field.put("format", jo.getString("format")); - } else { - //field.put("format", "strict_date_optional_time||epoch_millis||yyyy-MM-dd HH:mm:ss||yyyy-MM-dd"); - } - */ - break; - case GEO_SHAPE: - field.put("tree", jo.getString("tree")); - field.put("precision", jo.getString("precision")); - default: - break; - } - propMap.put(colName, field); - columnList.add(columnItem); - } - } - - conf.set(WRITE_COLUMNS, JSON.toJSONString(columnList)); - - log.info(JSON.toJSONString(columnList)); - - Map rootMappings = new HashMap(); - Map typeMappings = new HashMap(); - typeMappings.put("properties", propMap); - rootMappings.put(typeName, typeMappings); - - mappings = JSON.toJSONString(rootMappings); - - if (mappings == null || "".equals(mappings)) { - throw DataXException.asDataXException(ESWriterErrorCode.BAD_CONFIG_VALUE, "must 
have mappings"); - } - - return mappings; - } - - @Override - public List split(int mandatoryNumber) { - List configurations = new ArrayList(mandatoryNumber); - for (int i = 0; i < mandatoryNumber; i++) { - configurations.add(conf); - } - return configurations; - } - - @Override - public void post() { - ESClient esClient = new ESClient(); - esClient.createClient(Key.getEndpoint(conf), - Key.getAccessID(conf), - Key.getAccessKey(conf), - false, - 300000, - false, - false); - String alias = Key.getAlias(conf); - if (!"".equals(alias)) { - log.info(String.format("alias [%s] to [%s]", alias, Key.getIndexName(conf))); - try { - esClient.alias(Key.getIndexName(conf), alias, Key.isNeedCleanAlias(conf)); - } catch (IOException e) { - throw DataXException.asDataXException(ESWriterErrorCode.ES_ALIAS_MODIFY, e); - } - } - } - - @Override - public void destroy() { - - } - } - - public static class Task extends Writer.Task { - - private static final Logger log = LoggerFactory.getLogger(Job.class); - - private Configuration conf; - - - ESClient esClient = null; - private List typeList; - private List columnList; - - private int trySize; - private int batchSize; - private String index; - private String type; - private String splitter; - - @Override - public void init() { - this.conf = super.getPluginJobConf(); - index = Key.getIndexName(conf); - type = Key.getTypeName(conf); - - trySize = Key.getTrySize(conf); - batchSize = Key.getBatchSize(conf); - splitter = Key.getSplitter(conf); - columnList = JSON.parseObject(this.conf.getString(WRITE_COLUMNS), new TypeReference>() { - }); - - typeList = new ArrayList(); - - for (ESColumn col : columnList) { - typeList.add(ESFieldType.getESFieldType(col.getType())); - } - - esClient = new ESClient(); - } - - @Override - public void prepare() { - esClient.createClient(Key.getEndpoint(conf), - Key.getAccessID(conf), - Key.getAccessKey(conf), - Key.isMultiThread(conf), - Key.getTimeout(conf), - Key.isCompression(conf), - Key.isDiscovery(conf)); - } - - @Override - public void startWrite(RecordReceiver recordReceiver) { - List writerBuffer = new ArrayList(this.batchSize); - Record record = null; - long total = 0; - while ((record = recordReceiver.getFromReader()) != null) { - writerBuffer.add(record); - if (writerBuffer.size() >= this.batchSize) { - total += doBatchInsert(writerBuffer); - writerBuffer.clear(); - } - } - - if (!writerBuffer.isEmpty()) { - total += doBatchInsert(writerBuffer); - writerBuffer.clear(); - } - - String msg = String.format("task end, write size :%d", total); - getTaskPluginCollector().collectMessage("writesize", String.valueOf(total)); - log.info(msg); - esClient.closeJestClient(); - } - - private String getDateStr(ESColumn esColumn, Column column) { - DateTime date = null; - DateTimeZone dtz = DateTimeZone.getDefault(); - if (esColumn.getTimezone() != null) { - // 所有时区参考 http://www.joda.org/joda-time/timezones.html - dtz = DateTimeZone.forID(esColumn.getTimezone()); - } - if (column.getType() != Column.Type.DATE && esColumn.getFormat() != null) { - DateTimeFormatter formatter = DateTimeFormat.forPattern(esColumn.getFormat()); - date = formatter.withZone(dtz).parseDateTime(column.asString()); - return date.toString(); - } else if (column.getType() == Column.Type.DATE) { - date = new DateTime(column.asLong(), dtz); - return date.toString(); - } else { - return column.asString(); - } - } - - private long doBatchInsert(final List writerBuffer) { - Map data = null; - final Bulk.Builder bulkaction = new 
Bulk.Builder().defaultIndex(this.index).defaultType(this.type); - for (Record record : writerBuffer) { - data = new HashMap(); - String id = null; - for (int i = 0; i < record.getColumnNumber(); i++) { - Column column = record.getColumn(i); - String columnName = columnList.get(i).getName(); - ESFieldType columnType = typeList.get(i); - //如果是数组类型,那它传入的必是字符串类型 - if (columnList.get(i).isArray() != null && columnList.get(i).isArray()) { - String[] dataList = column.asString().split(splitter); - if (!columnType.equals(ESFieldType.DATE)) { - data.put(columnName, dataList); - } else { - for (int pos = 0; pos < dataList.length; pos++) { - dataList[pos] = getDateStr(columnList.get(i), column); - } - data.put(columnName, dataList); - } - } else { - switch (columnType) { - case ID: - if (id != null) { - id += record.getColumn(i).asString(); - } else { - id = record.getColumn(i).asString(); - } - break; - case DATE: - try { - String dateStr = getDateStr(columnList.get(i), column); - data.put(columnName, dateStr); - } catch (Exception e) { - getTaskPluginCollector().collectDirtyRecord(record, String.format("时间类型解析失败 [%s:%s] exception: %s", columnName, column.toString(), e.toString())); - } - break; - case KEYWORD: - case STRING: - case TEXT: - case IP: - case GEO_POINT: - data.put(columnName, column.asString()); - break; - case BOOLEAN: - data.put(columnName, column.asBoolean()); - break; - case BYTE: - case BINARY: - data.put(columnName, column.asBytes()); - break; - case LONG: - data.put(columnName, column.asLong()); - break; - case INTEGER: - data.put(columnName, column.asBigInteger()); - break; - case SHORT: - data.put(columnName, column.asBigInteger()); - break; - case FLOAT: - case DOUBLE: - data.put(columnName, column.asDouble()); - break; - case NESTED: - case OBJECT: - case GEO_SHAPE: - data.put(columnName, JSON.parse(column.asString())); - break; - default: - getTaskPluginCollector().collectDirtyRecord(record, "类型错误:不支持的类型:" + columnType + " " + columnName); - } - } - } - - if (id == null) { - //id = UUID.randomUUID().toString(); - bulkaction.addAction(new Index.Builder(data).build()); - } else { - bulkaction.addAction(new Index.Builder(data).id(id).build()); - } - } - - try { - return RetryUtil.executeWithRetry(new Callable() { - @Override - public Integer call() throws Exception { - JestResult jestResult = esClient.bulkInsert(bulkaction, 1); - if (jestResult.isSucceeded()) { - return writerBuffer.size(); - } - - String msg = String.format("response code: [%d] error :[%s]", jestResult.getResponseCode(), jestResult.getErrorMessage()); - log.warn(msg); - if (esClient.isBulkResult(jestResult)) { - BulkResult brst = (BulkResult) jestResult; - List failedItems = brst.getFailedItems(); - for (BulkResult.BulkResultItem item : failedItems) { - if (item.status != 400) { - // 400 BAD_REQUEST 如果非数据异常,请求异常,则不允许忽略 - throw DataXException.asDataXException(ESWriterErrorCode.ES_INDEX_INSERT, String.format("status:[%d], error: %s", item.status, item.error)); - } else { - // 如果用户选择不忽略解析错误,则抛异常,默认为忽略 - if (!Key.isIgnoreParseError(conf)) { - throw DataXException.asDataXException(ESWriterErrorCode.ES_INDEX_INSERT, String.format("status:[%d], error: %s, config not ignoreParseError so throw this error", item.status, item.error)); - } - } - } - - List items = brst.getItems(); - for (int idx = 0; idx < items.size(); ++idx) { - BulkResult.BulkResultItem item = items.get(idx); - if (item.error != null && !"".equals(item.error)) { - getTaskPluginCollector().collectDirtyRecord(writerBuffer.get(idx), 
String.format("status:[%d], error: %s", item.status, item.error)); - } - } - return writerBuffer.size() - brst.getFailedItems().size(); - } else { - Integer status = esClient.getStatus(jestResult); - switch (status) { - case 429: //TOO_MANY_REQUESTS - log.warn("server response too many requests, so auto reduce speed"); - break; - } - throw DataXException.asDataXException(ESWriterErrorCode.ES_INDEX_INSERT, jestResult.getErrorMessage()); - } - } - }, trySize, 60000L, true); - } catch (Exception e) { - if (Key.isIgnoreWriteError(this.conf)) { - log.warn(String.format("重试[%d]次写入失败,忽略该错误,继续写入!", trySize)); - } else { - throw DataXException.asDataXException(ESWriterErrorCode.ES_INDEX_INSERT, e); - } - } - return 0; - } - - @Override - public void post() { - } - - @Override - public void destroy() { - esClient.closeJestClient(); - } - } -} diff --git a/elasticsearchwriter/src/main/java/com/alibaba/datax/plugin/writer/elasticsearchwriter/ElasticSearchClient.java b/elasticsearchwriter/src/main/java/com/alibaba/datax/plugin/writer/elasticsearchwriter/ElasticSearchClient.java new file mode 100644 index 00000000..08486e1f --- /dev/null +++ b/elasticsearchwriter/src/main/java/com/alibaba/datax/plugin/writer/elasticsearchwriter/ElasticSearchClient.java @@ -0,0 +1,314 @@ +package com.alibaba.datax.plugin.writer.elasticsearchwriter; + +import com.alibaba.datax.common.exception.DataXException; +import com.alibaba.datax.common.util.Configuration; +import com.alibaba.datax.plugin.writer.elasticsearchwriter.jest.ClusterInfo; +import com.alibaba.datax.plugin.writer.elasticsearchwriter.jest.ClusterInfoResult; +import com.alibaba.datax.plugin.writer.elasticsearchwriter.jest.PutMapping7; +import com.alibaba.fastjson2.JSON; +import com.alibaba.fastjson2.JSONObject; +import com.google.gson.Gson; +import com.google.gson.JsonElement; +import com.google.gson.JsonObject; +import com.google.gson.JsonParser; +import io.searchbox.action.Action; +import io.searchbox.client.JestClient; +import io.searchbox.client.JestClientFactory; +import io.searchbox.client.JestResult; +import io.searchbox.client.config.HttpClientConfig; +import io.searchbox.client.config.HttpClientConfig.Builder; +import io.searchbox.core.Bulk; +import io.searchbox.indices.CreateIndex; +import io.searchbox.indices.DeleteIndex; +import io.searchbox.indices.IndicesExists; +import io.searchbox.indices.aliases.*; +import io.searchbox.indices.mapping.GetMapping; +import io.searchbox.indices.mapping.PutMapping; + +import io.searchbox.indices.settings.GetSettings; +import org.apache.commons.lang3.StringUtils; +import org.slf4j.Logger; +import org.slf4j.LoggerFactory; + +import java.io.IOException; +import java.util.ArrayList; +import java.util.Arrays; +import java.util.List; +import java.util.Map; +import java.util.concurrent.TimeUnit; + +/** + * Created by xiongfeng.bxf on 17/2/8. 
+ */ +public class ElasticSearchClient { + private static final Logger LOGGER = LoggerFactory.getLogger(ElasticSearchClient.class); + + private JestClient jestClient; + private Configuration conf; + + public JestClient getClient() { + return jestClient; + } + + public ElasticSearchClient(Configuration conf) { + this.conf = conf; + String endpoint = Key.getEndpoint(conf); + //es是支持集群写入的 + String[] endpoints = endpoint.split(","); + String user = Key.getUsername(conf); + String passwd = Key.getPassword(conf); + boolean multiThread = Key.isMultiThread(conf); + int readTimeout = Key.getTimeout(conf); + boolean compression = Key.isCompression(conf); + boolean discovery = Key.isDiscovery(conf); + String discoveryFilter = Key.getDiscoveryFilter(conf); + int totalConnection = this.conf.getInt("maxTotalConnection", 200); + JestClientFactory factory = new JestClientFactory(); + Builder httpClientConfig = new HttpClientConfig + .Builder(Arrays.asList(endpoints)) +// .setPreemptiveAuth(new HttpHost(endpoint)) + .multiThreaded(multiThread) + .connTimeout(readTimeout) + .readTimeout(readTimeout) + .maxTotalConnection(totalConnection) + .requestCompressionEnabled(compression) + .discoveryEnabled(discovery) + .discoveryFrequency(5L, TimeUnit.MINUTES) + .discoveryFilter(discoveryFilter); + if (!(StringUtils.isBlank(user) || StringUtils.isBlank(passwd))) { + // 匿名登录 + httpClientConfig.defaultCredentials(user, passwd); + } + factory.setHttpClientConfig(httpClientConfig.build()); + this.jestClient = factory.getObject(); + } + + public boolean indicesExists(String indexName) throws Exception { + boolean isIndicesExists = false; + JestResult rst = execute(new IndicesExists.Builder(indexName).build()); + if (rst.isSucceeded()) { + isIndicesExists = true; + } else { + LOGGER.warn("IndicesExists got ResponseCode: {} ErrorMessage: {}", rst.getResponseCode(), rst.getErrorMessage()); + switch (rst.getResponseCode()) { + case 404: + isIndicesExists = false; + break; + case 401: + // 无权访问 + default: + LOGGER.warn(rst.getErrorMessage()); + break; + } + } + return isIndicesExists; + } + + public boolean deleteIndex(String indexName) throws Exception { + LOGGER.info("delete index {}", indexName); + if (indicesExists(indexName)) { + JestResult rst = execute(new DeleteIndex.Builder(indexName).build()); + if (!rst.isSucceeded()) { + LOGGER.warn("DeleteIndex got ResponseCode: {}, ErrorMessage: {}", rst.getResponseCode(), rst.getErrorMessage()); + return false; + } else { + LOGGER.info("delete index {} success", indexName); + } + } else { + LOGGER.info("index cannot found, skip delete index {}", indexName); + } + return true; + } + + public boolean isGreaterOrEqualThan7() throws Exception { + try { + ClusterInfoResult result = execute(new ClusterInfo.Builder().build()); + LOGGER.info("ClusterInfoResult: {}", result.getJsonString()); + return result.isGreaterOrEqualThan7(); + }catch(Exception e) { + LOGGER.warn(e.getMessage()); + return false; + } + } + + /** + * 获取索引的settings + * @param indexName 索引名 + * @return 设置 + */ + public String getIndexSettings(String indexName) { + GetSettings.Builder builder = new GetSettings.Builder(); + builder.addIndex(indexName); + GetSettings getSettings = builder.build(); + try { + LOGGER.info("begin GetSettings for index: {}", indexName); + JestResult result = this.execute(getSettings); + return result.getJsonString(); + } catch (Exception e) { + String message = "GetSettings for index error: " + e.getMessage(); + LOGGER.warn(message, e); + throw 
DataXException.asDataXException(ElasticSearchWriterErrorCode.ES_GET_SETTINGS, e.getMessage(), e); + } + } + + public boolean createIndexIfNotExists(String indexName, String typeName, + Object mappings, String settings, + boolean dynamic, boolean isGreaterOrEqualThan7) throws Exception { + JestResult rst; + if (!indicesExists(indexName)) { + LOGGER.info("create index {}", indexName); + rst = execute( + new CreateIndex.Builder(indexName) + .settings(settings) + .setParameter("master_timeout", Key.getMasterTimeout(this.conf)) + .build() + ); + //index_already_exists_exception + if (!rst.isSucceeded()) { + LOGGER.warn("CreateIndex got ResponseCode: {}, ErrorMessage: {}", rst.getResponseCode(), rst.getErrorMessage()); + if (getStatus(rst) == 400) { + LOGGER.info(String.format("index {} already exists", indexName)); + return true; + } else { + return false; + } + } else { + LOGGER.info("create {} index success", indexName); + } + } + + if (dynamic) { + LOGGER.info("dynamic is true, ignore mappings"); + return true; + } + LOGGER.info("create mappings for {} {}", indexName, mappings); + //如果大于7.x,mapping的PUT请求URI中不能带type,并且mapping设置中不能带有嵌套结构 + if (isGreaterOrEqualThan7) { + rst = execute(new PutMapping7.Builder(indexName, mappings). + setParameter("master_timeout", Key.getMasterTimeout(this.conf)).build()); + } else { + rst = execute(new PutMapping.Builder(indexName, typeName, mappings) + .setParameter("master_timeout", Key.getMasterTimeout(this.conf)).build()); + } + if (!rst.isSucceeded()) { + LOGGER.error("PutMapping got ResponseCode: {}, ErrorMessage: {}", rst.getResponseCode(), rst.getErrorMessage()); + return false; + } else { + LOGGER.info("index {} put mappings success", indexName); + } + return true; + } + + public T execute(Action clientRequest) throws IOException { + T rst = jestClient.execute(clientRequest); + if (!rst.isSucceeded()) { + LOGGER.warn(rst.getJsonString()); + } + return rst; + } + + public Integer getStatus(JestResult rst) { + JsonObject jsonObject = rst.getJsonObject(); + if (jsonObject.has("status")) { + return jsonObject.get("status").getAsInt(); + } + return 600; + } + + public boolean isBulkResult(JestResult rst) { + JsonObject jsonObject = rst.getJsonObject(); + return jsonObject.has("items"); + } + + + public boolean alias(String indexname, String aliasname, boolean needClean) throws IOException { + GetAliases getAliases = new GetAliases.Builder().addIndex(aliasname).build(); + AliasMapping addAliasMapping = new AddAliasMapping.Builder(indexname, aliasname).build(); + JestResult rst = null; + List list = new ArrayList(); + if (needClean) { + rst = execute(getAliases); + if (rst.isSucceeded()) { + JsonParser jp = new JsonParser(); + JsonObject jo = (JsonObject) jp.parse(rst.getJsonString()); + for (Map.Entry entry : jo.entrySet()) { + String tindex = entry.getKey(); + if (indexname.equals(tindex)) { + continue; + } + AliasMapping m = new RemoveAliasMapping.Builder(tindex, aliasname).build(); + String s = new Gson().toJson(m.getData()); + LOGGER.info(s); + list.add(m); + } + } + } + + ModifyAliases modifyAliases = new ModifyAliases.Builder(addAliasMapping).addAlias(list).setParameter("master_timeout", Key.getMasterTimeout(this.conf)).build(); + rst = execute(modifyAliases); + if (!rst.isSucceeded()) { + LOGGER.error(rst.getErrorMessage()); + throw new IOException(rst.getErrorMessage()); + } + return true; + } + + /** + * 获取index的mapping + */ + public String getIndexMapping(String indexName) { + GetMapping.Builder builder = new GetMapping.Builder(); + 
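+        // restrict the GetMapping request to the target index only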
builder.addIndex(indexName); + GetMapping getMapping = builder.build(); + try { + LOGGER.info("begin GetMapping for index: {}", indexName); + JestResult result = this.execute(getMapping); + return result.getJsonString(); + } catch (Exception e) { + String message = "GetMapping for index error: " + e.getMessage(); + LOGGER.warn(message, e); + throw DataXException.asDataXException(ElasticSearchWriterErrorCode.ES_MAPPINGS, e.getMessage(), e); + } + } + + public String getMappingForIndexType(String indexName, String typeName) { + String indexMapping = this.getIndexMapping(indexName); + JSONObject indexMappingInJson = JSON.parseObject(indexMapping); + List paths = Arrays.asList(indexName, "mappings"); + JSONObject properties = JsonPathUtil.getJsonObject(paths, indexMappingInJson); + JSONObject propertiesParent = properties; + if (StringUtils.isNotBlank(typeName) && properties.containsKey(typeName)) { + propertiesParent = (JSONObject) properties.get(typeName); + } + JSONObject mapping = (JSONObject) propertiesParent.get("properties"); + return JSON.toJSONString(mapping); + } + + public JestResult bulkInsert(Bulk.Builder bulk) throws Exception { + // es_rejected_execution_exception + // illegal_argument_exception + // cluster_block_exception + JestResult rst = null; + rst = execute(bulk.build()); + if (!rst.isSucceeded()) { + LOGGER.warn(rst.getErrorMessage()); + } + return rst; + } + + /** + * 关闭JestClient客户端 + * + */ + public void closeJestClient() { + if (jestClient != null) { + try { + // jestClient.shutdownClient(); + jestClient.close(); + } catch (IOException e) { + LOGGER.warn("ignore error: ", e.getMessage()); + } + + } + } +} diff --git a/elasticsearchwriter/src/main/java/com/alibaba/datax/plugin/writer/elasticsearchwriter/ElasticSearchColumn.java b/elasticsearchwriter/src/main/java/com/alibaba/datax/plugin/writer/elasticsearchwriter/ElasticSearchColumn.java new file mode 100644 index 00000000..a27b15b2 --- /dev/null +++ b/elasticsearchwriter/src/main/java/com/alibaba/datax/plugin/writer/elasticsearchwriter/ElasticSearchColumn.java @@ -0,0 +1,126 @@ +package com.alibaba.datax.plugin.writer.elasticsearchwriter; + +import java.util.List; + +/** + * Created by xiongfeng.bxf on 17/3/2. 
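+ *
+ * Describes one column of the writer configuration: the target field name and
+ * ES type, optional date format/timezone handling, and the array / json_array /
+ * combineFields options applied when documents are built for bulk insert.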
+ */ +public class ElasticSearchColumn { + + private String name;//: "appkey", + + private String type;//": "TEXT", + + private String timezone; + + /** + * 源头数据格式化处理,datax做的事情 + */ + private String format; + + /** + * 目标端格式化,es原生支持的格式 + */ + private String dstFormat; + + private boolean array; + + /** + * 是否使用目标端(ES原生)数组类型 + * + * 默认是false + */ + private boolean dstArray = false; + + private boolean jsonArray; + + private boolean origin; + + private List combineFields; + + private String combineFieldsValueSeparator = "-"; + + public String getCombineFieldsValueSeparator() { + return combineFieldsValueSeparator; + } + + public void setCombineFieldsValueSeparator(String combineFieldsValueSeparator) { + this.combineFieldsValueSeparator = combineFieldsValueSeparator; + } + + public List getCombineFields() { + return combineFields; + } + + public void setCombineFields(List combineFields) { + this.combineFields = combineFields; + } + + public void setName(String name) { + this.name = name; + } + + public void setType(String type) { + this.type = type; + } + + public void setTimeZone(String timezone) { + this.timezone = timezone; + } + + public void setFormat(String format) { + this.format = format; + } + + public String getName() { + return name; + } + + public String getType() { + return type; + } + + public boolean isOrigin() { return origin; } + + public void setOrigin(boolean origin) { this.origin = origin; } + + public String getTimezone() { + return timezone; + } + + public String getFormat() { + return format; + } + + public void setTimezone(String timezone) { + this.timezone = timezone; + } + + public boolean isArray() { + return array; + } + + public void setArray(boolean array) { + this.array = array; + } + + public boolean isJsonArray() {return jsonArray;} + + public void setJsonArray(boolean jsonArray) {this.jsonArray = jsonArray;} + + public String getDstFormat() { + return dstFormat; + } + + public void setDstFormat(String dstFormat) { + this.dstFormat = dstFormat; + } + + public boolean isDstArray() { + return dstArray; + } + + public void setDstArray(boolean dstArray) { + this.dstArray = dstArray; + } +} diff --git a/elasticsearchwriter/src/main/java/com/alibaba/datax/plugin/writer/elasticsearchwriter/ESFieldType.java b/elasticsearchwriter/src/main/java/com/alibaba/datax/plugin/writer/elasticsearchwriter/ElasticSearchFieldType.java similarity index 73% rename from elasticsearchwriter/src/main/java/com/alibaba/datax/plugin/writer/elasticsearchwriter/ESFieldType.java rename to elasticsearchwriter/src/main/java/com/alibaba/datax/plugin/writer/elasticsearchwriter/ElasticSearchFieldType.java index 14b09689..22c3ee6b 100644 --- a/elasticsearchwriter/src/main/java/com/alibaba/datax/plugin/writer/elasticsearchwriter/ESFieldType.java +++ b/elasticsearchwriter/src/main/java/com/alibaba/datax/plugin/writer/elasticsearchwriter/ElasticSearchFieldType.java @@ -3,8 +3,11 @@ package com.alibaba.datax.plugin.writer.elasticsearchwriter; /** * Created by xiongfeng.bxf on 17/3/1. 
*/ -public enum ESFieldType { +public enum ElasticSearchFieldType { ID, + PARENT, + ROUTING, + VERSION, STRING, TEXT, KEYWORD, @@ -24,20 +27,18 @@ public enum ESFieldType { DATE_RANGE, GEO_POINT, GEO_SHAPE, - IP, + IP_RANGE, COMPLETION, TOKEN_COUNT, - - ARRAY, OBJECT, NESTED; - public static ESFieldType getESFieldType(String type) { + public static ElasticSearchFieldType getESFieldType(String type) { if (type == null) { return null; } - for (ESFieldType f : ESFieldType.values()) { + for (ElasticSearchFieldType f : ElasticSearchFieldType.values()) { if (f.name().compareTo(type.toUpperCase()) == 0) { return f; } diff --git a/elasticsearchwriter/src/main/java/com/alibaba/datax/plugin/writer/elasticsearchwriter/ElasticSearchWriter.java b/elasticsearchwriter/src/main/java/com/alibaba/datax/plugin/writer/elasticsearchwriter/ElasticSearchWriter.java new file mode 100644 index 00000000..2c8ed2d0 --- /dev/null +++ b/elasticsearchwriter/src/main/java/com/alibaba/datax/plugin/writer/elasticsearchwriter/ElasticSearchWriter.java @@ -0,0 +1,1117 @@ +package com.alibaba.datax.plugin.writer.elasticsearchwriter; + +import com.alibaba.datax.common.element.Column; +import com.alibaba.datax.common.element.Record; +import com.alibaba.datax.common.exception.DataXException; +import com.alibaba.datax.common.plugin.RecordReceiver; +import com.alibaba.datax.common.spi.Writer; +import com.alibaba.datax.common.util.Configuration; +import com.alibaba.datax.common.util.DataXCaseEnvUtil; +import com.alibaba.datax.common.util.RetryUtil; +import com.alibaba.datax.plugin.writer.elasticsearchwriter.Key.ActionType; +import com.alibaba.fastjson2.JSON; +import com.alibaba.fastjson2.JSONArray; +import com.alibaba.fastjson2.JSONObject; +import com.alibaba.fastjson2.TypeReference; +import com.alibaba.fastjson2.JSONWriter; +import com.google.common.base.Joiner; +import io.searchbox.client.JestResult; +import io.searchbox.core.*; +import io.searchbox.params.Parameters; +import org.apache.commons.lang3.StringUtils; +import org.joda.time.DateTime; +import org.joda.time.DateTimeZone; +import org.joda.time.format.DateTimeFormat; +import org.joda.time.format.DateTimeFormatter; +import org.slf4j.Logger; +import org.slf4j.LoggerFactory; + +import java.io.IOException; +import java.util.*; +import java.util.concurrent.Callable; + +public class ElasticSearchWriter extends Writer { + private final static String WRITE_COLUMNS = "write_columns"; + + public static class Job extends Writer.Job { + private static final Logger LOGGER = LoggerFactory.getLogger(Job.class); + + private Configuration conf = null; + int retryTimes = 3; + long sleepTimeInMilliSecond = 10000L; + + private String settingsCache; + + private void setSettings(String settings) { + this.settingsCache = JsonUtil.mergeJsonStr(settings, this.settingsCache); + } + + @Override + public void init() { + this.conf = super.getPluginJobConf(); + //LOGGER.info("conf:{}", conf); + this.retryTimes = this.conf.getInt("retryTimes", 3); + this.sleepTimeInMilliSecond = this.conf.getLong("sleepTimeInMilliSecond", 10000L); + } + + public List getIncludeSettings() { + return this.conf.getList("includeSettingKeys", Arrays.asList("number_of_shards", "number_of_replicas"), String.class); + } + + /** + * 从es中获取的原始settings转为需要的settings + * @param originSettings 原始settings + * @return settings + */ + private String convertSettings(String originSettings) { + if(StringUtils.isBlank(originSettings)) { + return null; + } + JSONObject jsonObject = JSON.parseObject(originSettings); + for(String key : 
jsonObject.keySet()) { + JSONObject settingsObj = jsonObject.getJSONObject(key); + if(settingsObj != null) { + JSONObject indexObj = settingsObj.getJSONObject("settings"); + JSONObject settings = indexObj.getJSONObject("index"); + JSONObject filterSettings = new JSONObject(); + if(settings != null) { + List includeSettings = getIncludeSettings(); + if(includeSettings != null && includeSettings.size() > 0) { + for(String includeSetting : includeSettings) { + Object fieldValue = settings.get(includeSetting); + if(fieldValue != null) { + filterSettings.put(includeSetting, fieldValue); + } + } + return filterSettings.toJSONString(); + } + } + } + } + return null; + } + + @Override + public void prepare() { + /** + * 注意:此方法仅执行一次。 + * 最佳实践:如果 Job 中有需要进行数据同步之前的处理,可以在此处完成,如果没有必要则可以直接去掉。 + * 对于7.x之后的es版本,取消了index设置type的逻辑,因此在prepare阶段,加入了判断是否为7.x及以上版本 + * 如果是7.x及以上版本,需要对于index的type做不同的处理 + * 详见 : https://www.elastic.co/guide/en/elasticsearch/reference/6.8/removal-of-types.html + */ + final ElasticSearchClient esClient = new ElasticSearchClient(this.conf); + final String indexName = Key.getIndexName(conf); + ActionType actionType = Key.getActionType(conf); + final String typeName = Key.getTypeName(conf); + final boolean dynamic = Key.getDynamic(conf); + final String dstDynamic = Key.getDstDynamic(conf); + final String newSettings = JSONObject.toJSONString(Key.getSettings(conf)); + LOGGER.info("conf settings:{}, settingsCache:{}", newSettings, this.settingsCache); + final Integer esVersion = Key.getESVersion(conf); + boolean hasId = this.hasID(); + this.conf.set("hasId", hasId); + if (ActionType.UPDATE.equals(actionType) && !hasId && !hasPrimaryKeyInfo()) { + throw DataXException.asDataXException(ElasticSearchWriterErrorCode.UPDATE_WITH_ID, "Update mode must specify column type with id or primaryKeyInfo config"); + } + + try { + RetryUtil.executeWithRetry(() -> { + boolean isGreaterOrEqualThan7 = esClient.isGreaterOrEqualThan7(); + if (esVersion != null && esVersion >= 7) { + isGreaterOrEqualThan7 = true; + } + String mappings = genMappings(dstDynamic, typeName, isGreaterOrEqualThan7); + conf.set("isGreaterOrEqualThan7", isGreaterOrEqualThan7); + + + LOGGER.info(String.format("index:[%s], type:[%s], mappings:[%s]", indexName, typeName, mappings)); + boolean isIndicesExists = esClient.indicesExists(indexName); + if (isIndicesExists) { + try { + // 将原有的mapping打印出来,便于排查问题 + String oldMappings = esClient.getMappingForIndexType(indexName, typeName); + LOGGER.info("the mappings for old index is: {}", oldMappings); + } catch (Exception e) { + LOGGER.warn("warn message: {}", e.getMessage()); + } + } + + if (Key.isTruncate(conf) && isIndicesExists) { + // 备份老的索引中的settings到缓存 + try { + String oldOriginSettings = esClient.getIndexSettings(indexName); + if (StringUtils.isNotBlank(oldOriginSettings)) { + String includeSettings = convertSettings(oldOriginSettings); + LOGGER.info("merge1 settings:{}, settingsCache:{}, includeSettings:{}", + oldOriginSettings, + this.settingsCache, includeSettings); + this.setSettings(includeSettings); + } + } catch (Exception e) { + LOGGER.warn("get old settings fail, indexName:{}", indexName); + } + esClient.deleteIndex(indexName); + } + + // 更新缓存中的settings + this.setSettings(newSettings); + LOGGER.info("merge2 settings:{}, settingsCache:{}", newSettings, this.settingsCache); + // 强制创建,内部自动忽略已存在的情况 + if (!esClient.createIndexIfNotExists(indexName, typeName, mappings, this.settingsCache, dynamic, + isGreaterOrEqualThan7)) { + throw 
DataXException.asDataXException(ElasticSearchWriterErrorCode.ES_MAPPINGS, ""); + } + + return true; + }, DataXCaseEnvUtil.getRetryTimes(this.retryTimes), DataXCaseEnvUtil.getRetryInterval(this.sleepTimeInMilliSecond), DataXCaseEnvUtil.getRetryExponential(false)); + } catch (Exception ex) { + throw DataXException.asDataXException(ElasticSearchWriterErrorCode.ES_MAPPINGS, ex.getMessage(), ex); + } finally { + try { + esClient.closeJestClient(); + } catch (Exception e) { + LOGGER.warn("ignore close jest client error: {}", e.getMessage()); + } + } + } + + private boolean hasID() { + List column = conf.getList("column"); + if (column != null) { + for (Object col : column) { + JSONObject jo = JSONObject.parseObject(col.toString()); + String colTypeStr = jo.getString("type"); + ElasticSearchFieldType colType = ElasticSearchFieldType.getESFieldType(colTypeStr); + if (ElasticSearchFieldType.ID.equals(colType)) { + return true; + } + } + } + return false; + } + + private boolean hasPrimaryKeyInfo() { + PrimaryKeyInfo primaryKeyInfo = Key.getPrimaryKeyInfo(this.conf); + if (null != primaryKeyInfo && null != primaryKeyInfo.getColumn() && !primaryKeyInfo.getColumn().isEmpty()) { + return true; + } else { + return false; + } + } + + + private String genMappings(String dstDynamic, String typeName, boolean isGreaterOrEqualThan7) { + String mappings; + Map propMap = new HashMap(); + List columnList = new ArrayList(); + ElasticSearchColumn combineItem = null; + + List column = conf.getList("column"); + if (column != null) { + for (Object col : column) { + JSONObject jo = JSONObject.parseObject(col.toString()); + String colName = jo.getString("name"); + String colTypeStr = jo.getString("type"); + if (colTypeStr == null) { + throw DataXException.asDataXException(ElasticSearchWriterErrorCode.BAD_CONFIG_VALUE, col.toString() + " column must have type"); + } + ElasticSearchFieldType colType = ElasticSearchFieldType.getESFieldType(colTypeStr); + if (colType == null) { + throw DataXException.asDataXException(ElasticSearchWriterErrorCode.BAD_CONFIG_VALUE, col.toString() + " unsupported type"); + } + + ElasticSearchColumn columnItem = new ElasticSearchColumn(); + + if (Key.PRIMARY_KEY_COLUMN_NAME.equals(colName)) { + // 兼容已有版本 + colType = ElasticSearchFieldType.ID; + colTypeStr = "id"; + } + + columnItem.setName(colName); + columnItem.setType(colTypeStr); + + JSONArray combineFields = jo.getJSONArray("combineFields"); + if (combineFields != null && !combineFields.isEmpty() && ElasticSearchFieldType.ID.equals(ElasticSearchFieldType.getESFieldType(colTypeStr))) { + List fields = new ArrayList(); + for (Object item : combineFields) { + fields.add((String) item); + } + columnItem.setCombineFields(fields); + combineItem = columnItem; + } + + String combineFieldsValueSeparator = jo.getString("combineFieldsValueSeparator"); + if (StringUtils.isNotBlank(combineFieldsValueSeparator)) { + columnItem.setCombineFieldsValueSeparator(combineFieldsValueSeparator); + } + + // 如果是id,version,routing,不需要创建mapping + if (colType == ElasticSearchFieldType.ID || colType == ElasticSearchFieldType.VERSION || colType == ElasticSearchFieldType.ROUTING) { + columnList.add(columnItem); + continue; + } + + // 如果是组合id中的字段,不需要创建mapping + // 所以组合id的定义必须要在columns最前面 + if (combineItem != null && combineItem.getCombineFields().contains(colName)) { + columnList.add(columnItem); + continue; + } + columnItem.setDstArray(false); + Boolean array = jo.getBoolean("array"); + if (array != null) { + columnItem.setArray(array); + Boolean dstArray = 
jo.getBoolean("dstArray"); + if(dstArray!=null) { + columnItem.setDstArray(dstArray); + } + } else { + columnItem.setArray(false); + } + Boolean jsonArray = jo.getBoolean("json_array"); + if (jsonArray != null) { + columnItem.setJsonArray(jsonArray); + } else { + columnItem.setJsonArray(false); + } + Map field = new HashMap(); + field.put("type", colTypeStr); + //https://www.elastic.co/guide/en/elasticsearch/reference/5.2/breaking_50_mapping_changes.html#_literal_index_literal_property + // https://www.elastic.co/guide/en/elasticsearch/guide/2.x/_deep_dive_on_doc_values.html#_disabling_doc_values + field.put("doc_values", jo.getBoolean("doc_values")); + field.put("ignore_above", jo.getInteger("ignore_above")); + field.put("index", jo.getBoolean("index")); + switch (colType) { + case STRING: + // 兼容string类型,ES5之前版本 + break; + case KEYWORD: + // https://www.elastic.co/guide/en/elasticsearch/reference/current/tune-for-search-speed.html#_warm_up_global_ordinals + field.put("eager_global_ordinals", jo.getBoolean("eager_global_ordinals")); + break; + case TEXT: + field.put("analyzer", jo.getString("analyzer")); + // 优化disk使用,也同步会提高index性能 + // https://www.elastic.co/guide/en/elasticsearch/reference/current/tune-for-disk-usage.html + field.put("norms", jo.getBoolean("norms")); + field.put("index_options", jo.getBoolean("index_options")); + if(jo.getString("fields") != null) { + field.put("fields", jo.getJSONObject("fields")); + } + break; + case DATE: + if (Boolean.TRUE.equals(jo.getBoolean("origin"))) { + if (jo.getString("format") != null) { + field.put("format", jo.getString("format")); + } + // es原生format覆盖原先来的format + if (jo.getString("dstFormat") != null) { + field.put("format", jo.getString("dstFormat")); + } + if(jo.getBoolean("origin") != null) { + columnItem.setOrigin(jo.getBoolean("origin")); + } + } else { + columnItem.setTimeZone(jo.getString("timezone")); + columnItem.setFormat(jo.getString("format")); + } + break; + case GEO_SHAPE: + field.put("tree", jo.getString("tree")); + field.put("precision", jo.getString("precision")); + break; + case OBJECT: + case NESTED: + if (jo.getString("dynamic") != null) { + field.put("dynamic", jo.getString("dynamic")); + } + break; + default: + break; + } + if (jo.containsKey("other_params")) { + field.putAll(jo.getJSONObject("other_params")); + } + propMap.put(colName, field); + columnList.add(columnItem); + } + } + + long version = System.currentTimeMillis(); + LOGGER.info("unified version: {}", version); + conf.set("version", version); + conf.set(WRITE_COLUMNS, JSON.toJSONString(columnList)); + + LOGGER.info(JSON.toJSONString(columnList)); + + Map rootMappings = new HashMap(); + Map typeMappings = new HashMap(); + typeMappings.put("properties", propMap); + rootMappings.put(typeName, typeMappings); + + // 7.x以后版本取消了index中关于type的指定,所以mapping的格式只能支持 + // { + // "properties" : { + // "abc" : { + // "type" : "text" + // } + // } + // } + // properties 外不能再嵌套typeName + + if(StringUtils.isNotBlank(dstDynamic)) { + typeMappings.put("dynamic", dstDynamic); + } + if (isGreaterOrEqualThan7) { + mappings = JSON.toJSONString(typeMappings); + } else { + mappings = JSON.toJSONString(rootMappings); + } + if (StringUtils.isBlank(mappings)) { + throw DataXException.asDataXException(ElasticSearchWriterErrorCode.BAD_CONFIG_VALUE, "must have mappings"); + } + + return mappings; + } + + @Override + public List split(int mandatoryNumber) { + List configurations = new ArrayList(mandatoryNumber); + for (int i = 0; i < mandatoryNumber; i++) { + 
configurations.add(this.conf.clone()); + } + return configurations; + } + + @Override + public void post() { + ElasticSearchClient esClient = new ElasticSearchClient(this.conf); + String alias = Key.getAlias(conf); + if (!"".equals(alias)) { + LOGGER.info(String.format("alias [%s] to [%s]", alias, Key.getIndexName(conf))); + try { + esClient.alias(Key.getIndexName(conf), alias, Key.isNeedCleanAlias(conf)); + } catch (IOException e) { + throw DataXException.asDataXException(ElasticSearchWriterErrorCode.ES_ALIAS_MODIFY, e); + } + } + } + + @Override + public void destroy() { + + } + } + + public static class Task extends Writer.Task { + + private static final Logger LOGGER = LoggerFactory.getLogger(Job.class); + + private Configuration conf; + + + ElasticSearchClient esClient = null; + private List typeList; + private List columnList; + private List> deleteByConditions; + + private int trySize; + private long tryInterval; + private int batchSize; + private String index; + private String type; + private String splitter; + private ActionType actionType; + private ElasticSearchColumn combinedIdColumn; + private Map colNameToIndexMap; + private Map urlParams; + private boolean columnSizeChecked = false; + private boolean enableRedundantColumn = false; + private boolean enableWriteNull = true; + int retryTimes = 3; + long sleepTimeInMilliSecond = 10000L; + boolean isGreaterOrEqualThan7 = false; + private String fieldDelimiter; + private boolean hasId; + private PrimaryKeyInfo primaryKeyInfo; + private boolean hasPrimaryKeyInfo = false; + private List esPartitionColumn; + private boolean hasEsPartitionColumn = false; + + @Override + public void init() { + this.conf = super.getPluginJobConf(); + this.index = Key.getIndexName(conf); + this.type = Key.getTypeName(conf); + this.trySize = Key.getTrySize(conf); + this.tryInterval = Key.getTryInterval(conf); + this.batchSize = Key.getBatchSize(conf); + this.splitter = Key.getSplitter(conf); + this.actionType = Key.getActionType(conf); + this.urlParams = Key.getUrlParams(conf); + this.enableWriteNull = Key.isEnableNullUpdate(conf); + this.retryTimes = this.conf.getInt("retryTimes", 3); + this.sleepTimeInMilliSecond = this.conf.getLong("sleepTimeInMilliSecond", 10000L); + this.isGreaterOrEqualThan7 = this.conf.getBool("isGreaterOrEqualThan7", false); + this.parseDeleteCondition(conf); + this.columnList = JSON.parseObject(this.conf.getString(WRITE_COLUMNS), new TypeReference>() { + }); + LOGGER.info("columnList: {}", JSON.toJSONString(columnList)); + this.hasId = this.conf.getBool("hasId", false); + if (hasId) { + LOGGER.info("Task has id column, will use it to set _id property"); + } else { + LOGGER.info("Task will use elasticsearch auto generated _id property"); + } + this.fieldDelimiter = Key.getFieldDelimiter(this.conf); + this.enableRedundantColumn = this.conf.getBool("enableRedundantColumn", false); + this.typeList = new ArrayList(); + for (ElasticSearchColumn esColumn : this.columnList) { + this.typeList.add(ElasticSearchFieldType.getESFieldType(esColumn.getType())); + if (esColumn.getCombineFields() != null && esColumn.getCombineFields().size() > 0 + && ElasticSearchFieldType.getESFieldType(esColumn.getType()).equals(ElasticSearchFieldType.ID)) { + combinedIdColumn = esColumn; + } + } + this.primaryKeyInfo = Key.getPrimaryKeyInfo(this.conf); + this.esPartitionColumn = Key.getEsPartitionColumn(this.conf); + this.colNameToIndexMap = new HashMap(5); + this.handleMetaKeys(); + this.esClient = new ElasticSearchClient(this.conf); + } + + private void 
handleMetaKeys() { + if (null != this.primaryKeyInfo && null != this.primaryKeyInfo.getColumn() + && !this.primaryKeyInfo.getColumn().isEmpty()) { + this.hasPrimaryKeyInfo = true; + if (null == this.primaryKeyInfo.getFieldDelimiter()) { + if (null != this.fieldDelimiter) { + this.primaryKeyInfo.setFieldDelimiter(this.fieldDelimiter); + } else { + this.primaryKeyInfo.setFieldDelimiter(""); + } + } + + for (String eachPk : this.primaryKeyInfo.getColumn()) { + boolean foundKeyInColumn = false; + for (int i = 0; i < columnList.size(); i++) { + if (StringUtils.equals(eachPk, columnList.get(i).getName())) { + this.colNameToIndexMap.put(eachPk, i); + foundKeyInColumn = true; + break; + } + } + if (!foundKeyInColumn) { + throw DataXException.asDataXException(ElasticSearchWriterErrorCode.RECORD_FIELD_NOT_FOUND, + "primaryKeyInfo has column not exists in column"); + } + } + } + + if (null != this.esPartitionColumn && !this.esPartitionColumn.isEmpty()) { + this.hasEsPartitionColumn = true; + for (PartitionColumn eachPartitionCol : this.esPartitionColumn) { + boolean foundKeyInColumn = false; + for (int i = 0; i < columnList.size(); i++) { + if (StringUtils.equals(eachPartitionCol.getName(), columnList.get(i).getName())) { + this.colNameToIndexMap.put(eachPartitionCol.getName(), i); + foundKeyInColumn = true; + break; + } + } + if (!foundKeyInColumn) { + throw DataXException.asDataXException(ElasticSearchWriterErrorCode.RECORD_FIELD_NOT_FOUND, + "esPartitionColumn has column not exists in column"); + } + } + } + } + + private void parseDeleteCondition(Configuration conf) { + List> list = new ArrayList>(); + String config = Key.getDeleteBy(conf); + if (config != null) { + JSONArray array = JSON.parseArray(config); + for (Object obj : array) { + list.add((Map) obj); + } + deleteByConditions = list; + } + } + + + @Override + public void prepare() { + } + + /** + * 示例:{ + * "deleteBy" : [ + * {"product_status" : [-1,-2], "sub_status" : -3}, + * {"product_status" : -3} + * ] + * } + * + * 表示以下两类数据删除: + * 1. product_status为-1或-2并且sub_status为-3 + * 2. 
product_status为-3 + * + * 注意[{}]返回true + * @param record + * @return + */ + private boolean isDeleteRecord(Record record) { + if (deleteByConditions == null) { + return false; + } + + Map kv = new HashMap(); + for (int i = 0; i < record.getColumnNumber(); i++) { + Column column = record.getColumn(i); + String columnName = columnList.get(i).getName(); + kv.put(columnName, column.asString()); + } + + for (Map delCondition : deleteByConditions) { + if (meetAllCondition(kv, delCondition)) { + return true; + } + } + + return false; + } + + private boolean meetAllCondition(Map kv, Map delCondition) { + for (Map.Entry oneCondition : delCondition.entrySet()) { + if (!checkOneCondition(kv, oneCondition)) { + return false; + } + } + return true; + } + + private boolean checkOneCondition(Map kv, Map.Entry entry) { + Object value = kv.get(entry.getKey()); + if (entry.getValue() instanceof List) { + for (Object obj : (List) entry.getValue()) { + if (obj.toString().equals(value)) { + return true; + } + } + } else { + if (value != null && value.equals(entry.getValue().toString())) { + return true; + } + } + return false; + } + + @Override + public void startWrite(RecordReceiver recordReceiver) { + List writerBuffer = new ArrayList(this.batchSize); + Record record = null; + while ((record = recordReceiver.getFromReader()) != null) { + if (!columnSizeChecked) { + boolean isInvalid = true; + if (enableRedundantColumn) { + isInvalid = this.columnList.size() > record.getColumnNumber(); + } else { + isInvalid = this.columnList.size() != record.getColumnNumber(); + } + if (isInvalid) { + String message = String.format( + "column number not equal error, reader column size is %s, but the writer column size is %s", + record.getColumnNumber(), this.columnList.size()); + throw DataXException.asDataXException(ElasticSearchWriterErrorCode.BAD_CONFIG_VALUE, message); + } + columnSizeChecked = true; + } + writerBuffer.add(record); + if (writerBuffer.size() >= this.batchSize) { + this.doBatchInsert(writerBuffer); + writerBuffer.clear(); + } + } + + if (!writerBuffer.isEmpty()) { + this.doBatchInsert(writerBuffer); + writerBuffer.clear(); + } + } + + private String getDateStr(ElasticSearchColumn esColumn, Column column) { + // 如果保持原样,就直接返回 + if (esColumn.isOrigin()) { + return column.asString(); + } + DateTime date = null; + DateTimeZone dtz = DateTimeZone.getDefault(); + if (esColumn.getTimezone() != null) { + // 所有时区参考 http://www.joda.org/joda-time/timezones.html + // TODO:创建一次多处复用 + dtz = DateTimeZone.forID(esColumn.getTimezone()); + } + if (column.getType() != Column.Type.DATE && esColumn.getFormat() != null) { + // TODO:创建一次多处复用 + DateTimeFormatter formatter = DateTimeFormat.forPattern(esColumn.getFormat()); + date = formatter.withZone(dtz).parseDateTime(column.asString()); + return date.toString(); + } else if (column.getType() == Column.Type.DATE) { + if (null == column.getRawData()) { + return null; + } else { + date = new DateTime(column.asLong(), dtz); + return date.toString(); + } + } else { + return column.asString(); + } + } + + private void doBatchInsert(final List writerBuffer) { + Map data = null; + Bulk.Builder bulkactionTmp = null; + int totalNumber = writerBuffer.size(); + int dirtyDataNumber = 0; + if (this.isGreaterOrEqualThan7) { + bulkactionTmp = new Bulk.Builder().defaultIndex(this.index); + } else { + bulkactionTmp = new Bulk.Builder().defaultIndex(this.index).defaultType(this.type); + } + final Bulk.Builder bulkaction = bulkactionTmp; + // 增加url的参数 + for (Map.Entry entry : urlParams.entrySet()) { 
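+                // copy any user-configured urlParams from the job config onto the bulk request (for example "refresh")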
+ bulkaction.setParameter(entry.getKey(), entry.getValue()); + } + for (Record record : writerBuffer) { + data = new HashMap(); + String id = null; + String parent = null; + String routing = null; + String version = null; + String columnName = null; + Column column = null; + try { + for (int i = 0; i < record.getColumnNumber(); i++) { + column = record.getColumn(i); + columnName = columnList.get(i).getName(); + // 如果组合id不等于null,需要把相关的字段全部忽略 + if (combinedIdColumn != null) { + if (combinedIdColumn.getCombineFields().contains(columnName)) { + continue; + } + } + //如果是json数组,当成对象类型处理 + ElasticSearchFieldType columnType = columnList.get(i).isJsonArray() ? ElasticSearchFieldType.NESTED : typeList.get(i); + + Boolean dstArray = columnList.get(i).isDstArray(); + + //如果是数组类型,那它传入的是字符串类型,也有可能是null + if (columnList.get(i).isArray() && null != column.asString()) { + String[] dataList = column.asString().split(splitter); + if (!columnType.equals(ElasticSearchFieldType.DATE)) { + if (dstArray) { + try { + // 根据客户配置的类型,转换成相应的类型 + switch (columnType) { + case BYTE: + case KEYWORD: + case TEXT: + data.put(columnName, dataList); + break; + case SHORT: + case INTEGER: + if (StringUtils.isBlank(column.asString().trim())) { + data.put(columnName, null); + } else { + Integer[] intDataList = new Integer[dataList.length]; + for (int j = 0; j < dataList.length; j++) { + dataList[j] = dataList[j].trim(); + if (StringUtils.isNotBlank(dataList[j])) { + intDataList[j] = Integer.valueOf(dataList[j]); + } + } + data.put(columnName, intDataList); + } + break; + case LONG: + if (StringUtils.isBlank(column.asString().trim())) { + data.put(columnName, null); + } else { + Long[] longDataList = new Long[dataList.length]; + for (int j = 0; j < dataList.length; j++) { + dataList[j] = dataList[j].trim(); + if (StringUtils.isNotBlank(dataList[j])) { + longDataList[j] = Long.valueOf(dataList[j]); + } + } + data.put(columnName, longDataList); + } + break; + case FLOAT: + case DOUBLE: + if (StringUtils.isBlank(column.asString().trim())) { + data.put(columnName, null); + } else { + Double[] doubleDataList = new Double[dataList.length]; + for (int j = 0; j < dataList.length; j++) { + dataList[j] = dataList[j].trim(); + if (StringUtils.isNotBlank(dataList[j])) { + doubleDataList[j] = Double.valueOf(dataList[j]); + } + } + data.put(columnName, doubleDataList); + } + break; + default: + data.put(columnName, dataList); + break; + } + } catch (Exception e) { + LOGGER.info("脏数据,记录:{}", record.toString()); + continue; + } + } else { + data.put(columnName, dataList); + } + } else { + data.put(columnName, dataList); + } + } else { + // LOGGER.info("columnType: {} integer: {}", columnType, column.asString()); + switch (columnType) { + case ID: + if (id != null) { + id += record.getColumn(i).asString(); + } else { + id = record.getColumn(i).asString(); + } + break; + case PARENT: + if (parent != null) { + parent += record.getColumn(i).asString(); + } else { + parent = record.getColumn(i).asString(); + } + break; + case ROUTING: + if (routing != null) { + routing += record.getColumn(i).asString(); + } else { + routing = record.getColumn(i).asString(); + } + break; + + case VERSION: + if (version != null) { + version += record.getColumn(i).asString(); + } else { + version = record.getColumn(i).asString(); + } + break; + case DATE: + String dateStr = getDateStr(columnList.get(i), column); + data.put(columnName, dateStr); + break; + case KEYWORD: + case STRING: + case TEXT: + case IP: + case GEO_POINT: + case IP_RANGE: + data.put(columnName, 
column.asString()); + break; + case BOOLEAN: + data.put(columnName, column.asBoolean()); + break; + case BYTE: + case BINARY: + // json序列化不支持byte类型,es支持的binary类型,必须传入base64的格式 + data.put(columnName, column.asString()); + break; + case LONG: + data.put(columnName, column.asLong()); + break; + case INTEGER: + data.put(columnName, column.asLong()); + break; + case SHORT: + data.put(columnName, column.asLong()); + break; + case FLOAT: + case DOUBLE: + data.put(columnName, column.asDouble()); + break; + case GEO_SHAPE: + case DATE_RANGE: + case INTEGER_RANGE: + case FLOAT_RANGE: + case LONG_RANGE: + case DOUBLE_RANGE: + if (null == column.asString()) { + data.put(columnName, column.asString()); + } else { + data.put(columnName, JSON.parse(column.asString())); + } + break; + case NESTED: + case OBJECT: + if (null == column.asString()) { + data.put(columnName, column.asString()); + } else { + // 转json格式 + data.put(columnName, JSON.parse(column.asString())); + } + break; + default: + throw DataXException.asDataXException(ElasticSearchWriterErrorCode.BAD_CONFIG_VALUE, String.format( + "Type error: unsupported type %s for column %s", columnType, columnName)); + } + } + } + + + if (this.hasPrimaryKeyInfo) { + List idData = new ArrayList(); + for (String eachCol : this.primaryKeyInfo.getColumn()) { + Column recordColumn = record.getColumn(this.colNameToIndexMap.get(eachCol)); + idData.add(recordColumn.asString()); + } + id = StringUtils.join(idData, this.primaryKeyInfo.getFieldDelimiter()); + } + if (this.hasEsPartitionColumn) { + List idData = new ArrayList(); + for (PartitionColumn eachCol : this.esPartitionColumn) { + Column recordColumn = record.getColumn(this.colNameToIndexMap.get(eachCol.getName())); + idData.add(recordColumn.asString()); + } + routing = StringUtils.join(idData, ""); + } + } catch (Exception e) { + // 脏数据 + super.getTaskPluginCollector().collectDirtyRecord(record, + String.format("parse error for column: %s errorMessage: %s", columnName, e.getMessage())); + dirtyDataNumber++; + // 处理下一个record + continue; + } + + if (LOGGER.isDebugEnabled()) { + LOGGER.debug("id: {} routing: {} data: {}", id, routing, JSON.toJSONString(data)); + } + + + if (isDeleteRecord(record)) { + Delete.Builder builder = new Delete.Builder(id); + bulkaction.addAction(builder.build()); + } else { + // 使用用户自定义组合唯一键 + if (combinedIdColumn != null) { + try { + id = processIDCombineFields(record, combinedIdColumn); + // LOGGER.debug("id: {}", id); + } catch (Exception e) { + // 脏数据 + super.getTaskPluginCollector().collectDirtyRecord(record, + String.format("parse error for column: %s errorMessage: %s", columnName, e.getMessage())); + // 处理下一个record + dirtyDataNumber++; + continue; + } + } + switch (actionType) { + case INDEX: + // 先进行json序列化,jest client的gson序列化会把等号按照html序列化 + Index.Builder builder = null; + if (this.enableWriteNull) { + builder = new Index.Builder( + JSONObject.toJSONString(data, JSONWriter.Feature.WriteMapNullValue, + JSONWriter.Feature.WriteEnumUsingToString)); + } else { + builder = new Index.Builder(JSONObject.toJSONString(data)); + } + if (id != null) { + builder.id(id); + } + if (parent != null) { + builder.setParameter(Parameters.PARENT, parent); + } + if (routing != null) { + builder.setParameter(Parameters.ROUTING, routing); + } + if (version != null) { + builder.setParameter(Parameters.VERSION, version); + builder.setParameter(Parameters.VERSION_TYPE, "external"); + } + bulkaction.addAction(builder.build()); + break; + case UPDATE: + // doc: 
https://www.cnblogs.com/crystaltu/articles/6992935.html + // doc: https://www.elastic.co/guide/en/elasticsearch/reference/current/docs-update.html + Map updateDoc = new HashMap(); + updateDoc.put("doc", data); + updateDoc.put("doc_as_upsert", true); + Update.Builder update = null; + if (this.enableWriteNull) { + // write: {a:"1",b:null} + update = new Update.Builder( + JSONObject.toJSONString(updateDoc, JSONWriter.Feature.WriteMapNullValue, + JSONWriter.Feature.WriteEnumUsingToString)); + // 在DEFAULT_GENERATE_FEATURE基础上,只增加了SerializerFeature.WRITE_MAP_NULL_FEATURES + } else { + // write: {"a":"1"} + update = new Update.Builder(JSONObject.toJSONString(updateDoc)); + } + if (id != null) { + update.id(id); + } + if (parent != null) { + update.setParameter(Parameters.PARENT, parent); + } + if (routing != null) { + update.setParameter(Parameters.ROUTING, routing); + } + // version type [EXTERNAL] is not supported by the update API + if (version != null) { + update.setParameter(Parameters.VERSION, version); + } + bulkaction.addAction(update.build()); + break; + default: + break; + } + } + } + + if (dirtyDataNumber >= totalNumber) { + // all batch is dirty data + LOGGER.warn("all this batch is dirty data, dirtyDataNumber: {} totalDataNumber: {}", dirtyDataNumber, + totalNumber); + return; + } + + BulkResult bulkResult = null; + try { + bulkResult = RetryUtil.executeWithRetry(new Callable() { + @Override + public BulkResult call() throws Exception { + JestResult jestResult = esClient.bulkInsert(bulkaction); + if (jestResult.isSucceeded()) { + return null; + } + String msg = String.format("response code: [%d] error :[%s]", jestResult.getResponseCode(), + jestResult.getErrorMessage()); + LOGGER.warn(msg); + if (esClient.isBulkResult(jestResult)) { + BulkResult brst = (BulkResult) jestResult; + List failedItems = brst.getFailedItems(); + for (BulkResult.BulkResultItem item : failedItems) { + if (item.status != 400) { + // 400 BAD_REQUEST 如果非数据异常,请求异常,则不允许忽略 + throw DataXException.asDataXException(ElasticSearchWriterErrorCode.ES_INDEX_INSERT, + String.format("status:[%d], error: %s", item.status, item.error)); + } else { + // 如果用户选择不忽略解析错误,则抛异常,默认为忽略 + if (!Key.isIgnoreParseError(conf)) { + throw new NoReRunException(ElasticSearchWriterErrorCode.ES_INDEX_INSERT, + String.format( + "status:[%d], error: %s, config not ignoreParseError so throw this error", + item.status, item.error)); + } + } + } + return brst; + } else { + Integer status = esClient.getStatus(jestResult); + switch (status) { + case 429: // TOO_MANY_REQUESTS + LOGGER.warn("server response too many requests, so auto reduce speed"); + break; + default: + break; + } + throw DataXException.asDataXException(ElasticSearchWriterErrorCode.ES_INDEX_INSERT, + jestResult.getErrorMessage()); + } + } + }, this.trySize, this.tryInterval, false, Arrays.asList(DataXException.class)); + } catch (Exception e) { + if (Key.isIgnoreWriteError(this.conf)) { + LOGGER.warn(String.format("Retry [%d] write failed, ignore the error, continue to write!", trySize)); + } else { + throw DataXException.asDataXException(ElasticSearchWriterErrorCode.ES_INDEX_INSERT, e.getMessage(), e); + } + } + + if (null != bulkResult) { + List items = bulkResult.getItems(); + for (int idx = 0; idx < items.size(); ++idx) { + BulkResult.BulkResultItem item = items.get(idx); + if (item.error != null && !"".equals(item.error)) { + super.getTaskPluginCollector().collectDirtyRecord(writerBuffer.get(idx), + String.format("status:[%d], error: %s", item.status, item.error)); + } + } + } + } + 
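+        /*
+         * For reference, a minimal writer job configuration that exercises this bulk path
+         * might look like the sketch below. Only keys that appear in Key.java are used;
+         * the endpoint, index name and column list are illustrative values, not defaults.
+         *
+         * {
+         *   "name": "elasticsearchwriter",
+         *   "parameter": {
+         *     "endpoint": "http://localhost:9200",
+         *     "username": "xxx",
+         *     "password": "xxx",
+         *     "index": "my_index",
+         *     "indexType": "_doc",
+         *     "truncate": true,
+         *     "batchSize": 1024,
+         *     "column": [
+         *       { "name": "pk", "type": "id" },
+         *       { "name": "name", "type": "keyword" },
+         *       { "name": "created", "type": "date", "format": "yyyy-MM-dd HH:mm:ss" },
+         *       { "name": "tags", "type": "keyword", "array": true },
+         *       { "name": "detail", "type": "object" }
+         *     ]
+         *   }
+         * }
+         */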
+ private int getRecordColumnIndex(Record record, String columnName) { + if (colNameToIndexMap.containsKey(columnName)) { + return colNameToIndexMap.get(columnName); + } + + List columns = new ArrayList(); + int index = -1; + for (int i=0; i 1) { + throw DataXException.asDataXException( + ElasticSearchWriterErrorCode.RECORD_FIELD_NOT_FOUND, + "record has multiple columns found by name: " + columnName); + } + + colNameToIndexMap.put(columnName, index); + return index; + } + + private String processIDCombineFields(Record record, ElasticSearchColumn esColumn) { + List values = new ArrayList(esColumn.getCombineFields().size()); + for (String field : esColumn.getCombineFields()) { + int colIndex = getRecordColumnIndex(record, field); + Column col = record.getColumnNumber() <= colIndex ? null : record.getColumn(colIndex); + if (col == null) { + throw DataXException.asDataXException(ElasticSearchWriterErrorCode.RECORD_FIELD_NOT_FOUND, field); + } + values.add(col.asString()); + } + return Joiner.on(esColumn.getCombineFieldsValueSeparator()).join(values); + } + + @Override + public void post() { + } + + @Override + public void destroy() { + try { + this.esClient.closeJestClient(); + } catch (Exception e) { + LOGGER.warn("ignore close jest client error: {}", e.getMessage()); + } + } + + } +} diff --git a/elasticsearchwriter/src/main/java/com/alibaba/datax/plugin/writer/elasticsearchwriter/ElasticSearchWriterErrorCode.java b/elasticsearchwriter/src/main/java/com/alibaba/datax/plugin/writer/elasticsearchwriter/ElasticSearchWriterErrorCode.java new file mode 100644 index 00000000..c9b02532 --- /dev/null +++ b/elasticsearchwriter/src/main/java/com/alibaba/datax/plugin/writer/elasticsearchwriter/ElasticSearchWriterErrorCode.java @@ -0,0 +1,41 @@ +package com.alibaba.datax.plugin.writer.elasticsearchwriter; + +import com.alibaba.datax.common.spi.ErrorCode; + +public enum ElasticSearchWriterErrorCode implements ErrorCode { + BAD_CONFIG_VALUE("ESWriter-00", "The value you configured is not valid."), + ES_INDEX_DELETE("ESWriter-01", "Delete index error."), + ES_INDEX_CREATE("ESWriter-02", "Index creation error."), + ES_MAPPINGS("ESWriter-03", "The mappings error."), + ES_INDEX_INSERT("ESWriter-04", "Insert data error."), + ES_ALIAS_MODIFY("ESWriter-05", "Alias modification error."), + JSON_PARSE("ESWrite-06", "Json format parsing error"), + UPDATE_WITH_ID("ESWrite-07", "Update mode must specify column type with id"), + RECORD_FIELD_NOT_FOUND("ESWrite-08", "Field does not exist in the original table"), + ES_GET_SETTINGS("ESWriter-09", "get settings failed"); + ; + + private final String code; + private final String description; + + ElasticSearchWriterErrorCode(String code, String description) { + this.code = code; + this.description = description; + } + + @Override + public String getCode() { + return this.code; + } + + @Override + public String getDescription() { + return this.description; + } + + @Override + public String toString() { + return String.format("Code:[%s], Description:[%s]. 
", this.code, + this.description); + } +} diff --git a/elasticsearchwriter/src/main/java/com/alibaba/datax/plugin/writer/elasticsearchwriter/JsonPathUtil.java b/elasticsearchwriter/src/main/java/com/alibaba/datax/plugin/writer/elasticsearchwriter/JsonPathUtil.java new file mode 100644 index 00000000..e7619e7c --- /dev/null +++ b/elasticsearchwriter/src/main/java/com/alibaba/datax/plugin/writer/elasticsearchwriter/JsonPathUtil.java @@ -0,0 +1,28 @@ +package com.alibaba.datax.plugin.writer.elasticsearchwriter; + +import java.util.List; + +import com.alibaba.fastjson2.JSONObject; + +public class JsonPathUtil { + + public static JSONObject getJsonObject(List paths, JSONObject data) { + if (null == paths || paths.isEmpty()) { + return data; + } + + if (null == data) { + return null; + } + + JSONObject dataTmp = data; + for (String each : paths) { + if (null != dataTmp) { + dataTmp = dataTmp.getJSONObject(each); + } else { + return null; + } + } + return dataTmp; + } +} \ No newline at end of file diff --git a/elasticsearchwriter/src/main/java/com/alibaba/datax/plugin/writer/elasticsearchwriter/JsonUtil.java b/elasticsearchwriter/src/main/java/com/alibaba/datax/plugin/writer/elasticsearchwriter/JsonUtil.java new file mode 100644 index 00000000..ad6c01be --- /dev/null +++ b/elasticsearchwriter/src/main/java/com/alibaba/datax/plugin/writer/elasticsearchwriter/JsonUtil.java @@ -0,0 +1,54 @@ +package com.alibaba.datax.plugin.writer.elasticsearchwriter; + +import com.alibaba.fastjson2.JSON; +import com.alibaba.fastjson2.JSONException; +import com.alibaba.fastjson2.JSONObject; + +/** + * @author bozu + * @date 2021/01/06 + */ +public class JsonUtil { + + /** + * 合并两个json + * @param source 源json + * @param target 目标json + * @return 合并后的json + * @throws JSONException + */ + public static String mergeJsonStr(String source, String target) throws JSONException { + if(source == null) { + return target; + } + if(target == null) { + return source; + } + return JSON.toJSONString(deepMerge(JSON.parseObject(source), JSON.parseObject(target))); + } + + /** + * 深度合并两个json对象,将source的值,merge到target中 + * @param source 源json + * @param target 目标json + * @return 合并后的json + * @throws JSONException + */ + private static JSONObject deepMerge(JSONObject source, JSONObject target) throws JSONException { + for (String key: source.keySet()) { + Object value = source.get(key); + if (target.containsKey(key)) { + // existing value for "key" - recursively deep merge: + if (value instanceof JSONObject) { + JSONObject valueJson = (JSONObject)value; + deepMerge(valueJson, target.getJSONObject(key)); + } else { + target.put(key, value); + } + } else { + target.put(key, value); + } + } + return target; + } +} diff --git a/elasticsearchwriter/src/main/java/com/alibaba/datax/plugin/writer/elasticsearchwriter/Key.java b/elasticsearchwriter/src/main/java/com/alibaba/datax/plugin/writer/elasticsearchwriter/Key.java index 0f2d3f5c..fcaac935 100644 --- a/elasticsearchwriter/src/main/java/com/alibaba/datax/plugin/writer/elasticsearchwriter/Key.java +++ b/elasticsearchwriter/src/main/java/com/alibaba/datax/plugin/writer/elasticsearchwriter/Key.java @@ -1,9 +1,13 @@ package com.alibaba.datax.plugin.writer.elasticsearchwriter; import com.alibaba.datax.common.util.Configuration; +import com.alibaba.fastjson2.JSON; +import com.alibaba.fastjson2.TypeReference; + import org.apache.commons.lang3.StringUtils; import java.util.HashMap; +import java.util.List; import java.util.Map; public final class Key { @@ -37,31 +41,35 @@ public final class Key { 
public static String getEndpoint(Configuration conf) { - return conf.getNecessaryValue("endpoint", ESWriterErrorCode.BAD_CONFIG_VALUE); + return conf.getNecessaryValue("endpoint", ElasticSearchWriterErrorCode.BAD_CONFIG_VALUE); } - public static String getAccessID(Configuration conf) { - return conf.getString("accessId", ""); + public static String getUsername(Configuration conf) { + return conf.getString("username", conf.getString("accessId")); } - public static String getAccessKey(Configuration conf) { - return conf.getString("accessKey", ""); + public static String getPassword(Configuration conf) { + return conf.getString("password", conf.getString("accessKey")); } public static int getBatchSize(Configuration conf) { - return conf.getInt("batchSize", 1000); + return conf.getInt("batchSize", 1024); } public static int getTrySize(Configuration conf) { return conf.getInt("trySize", 30); } + public static long getTryInterval(Configuration conf) { + return conf.getLong("tryInterval", 60000L); + } + public static int getTimeout(Configuration conf) { return conf.getInt("timeout", 600000); } - public static boolean isCleanup(Configuration conf) { - return conf.getBool("cleanup", false); + public static boolean isTruncate(Configuration conf) { + return conf.getBool("truncate", conf.getBool("cleanup", false)); } public static boolean isDiscovery(Configuration conf) { @@ -69,7 +77,7 @@ public final class Key { } public static boolean isCompression(Configuration conf) { - return conf.getBool("compression", true); + return conf.getBool("compress", conf.getBool("compression", true)); } public static boolean isMultiThread(Configuration conf) { @@ -77,9 +85,17 @@ public final class Key { } public static String getIndexName(Configuration conf) { - return conf.getNecessaryValue("index", ESWriterErrorCode.BAD_CONFIG_VALUE); + return conf.getNecessaryValue("index", ElasticSearchWriterErrorCode.BAD_CONFIG_VALUE); } + public static String getDeleteBy(Configuration conf) { + return conf.getString("deleteBy"); + } + + + /** + * TODO: 在7.0开始,一个索引只能建一个Type为_doc + * */ public static String getTypeName(Configuration conf) { String indexType = conf.getString("indexType"); if(StringUtils.isBlank(indexType)){ @@ -128,4 +144,58 @@ public final class Key { public static boolean getDynamic(Configuration conf) { return conf.getBool("dynamic", false); } + + public static String getDstDynamic(Configuration conf) { + return conf.getString("dstDynamic"); + } + + public static String getDiscoveryFilter(Configuration conf){ + return conf.getString("discoveryFilter","_all"); + } + + public static Boolean getVersioning(Configuration conf) { + return conf.getBool("versioning", false); + } + + public static Long getUnifiedVersion(Configuration conf) { + return conf.getLong("version", System.currentTimeMillis()); + } + + public static Map getUrlParams(Configuration conf) { + return conf.getMap("urlParams", new HashMap()); + } + + public static Integer getESVersion(Configuration conf) { + return conf.getInt("esVersion"); + } + + public static String getMasterTimeout(Configuration conf) { + return conf.getString("masterTimeout", "5m"); + } + + public static boolean isEnableNullUpdate(Configuration conf) { + return conf.getBool("enableWriteNull", true); + } + + public static String getFieldDelimiter(Configuration conf) { + return conf.getString("fieldDelimiter", ""); + } + + public static PrimaryKeyInfo getPrimaryKeyInfo(Configuration conf) { + String primaryKeyInfoString = conf.getString("primaryKeyInfo"); + if 
(StringUtils.isNotBlank(primaryKeyInfoString)) { + return JSON.parseObject(primaryKeyInfoString, new TypeReference() {}); + } else { + return null; + } + } + + public static List getEsPartitionColumn(Configuration conf) { + String esPartitionColumnString = conf.getString("esPartitionColumn"); + if (StringUtils.isNotBlank(esPartitionColumnString)) { + return JSON.parseObject(esPartitionColumnString, new TypeReference>() {}); + } else { + return null; + } + } } diff --git a/elasticsearchwriter/src/main/java/com/alibaba/datax/plugin/writer/elasticsearchwriter/NoReRunException.java b/elasticsearchwriter/src/main/java/com/alibaba/datax/plugin/writer/elasticsearchwriter/NoReRunException.java new file mode 100644 index 00000000..52064e58 --- /dev/null +++ b/elasticsearchwriter/src/main/java/com/alibaba/datax/plugin/writer/elasticsearchwriter/NoReRunException.java @@ -0,0 +1,16 @@ +package com.alibaba.datax.plugin.writer.elasticsearchwriter; + +import com.alibaba.datax.common.exception.DataXException; +import com.alibaba.datax.common.spi.ErrorCode; + +public class NoReRunException extends DataXException { + public NoReRunException(String errorMessage) { + super(errorMessage); + } + + public NoReRunException(ErrorCode errorCode, String errorMessage) { + super(errorCode, errorMessage); + } + + private static final long serialVersionUID = 1L; +} \ No newline at end of file diff --git a/elasticsearchwriter/src/main/java/com/alibaba/datax/plugin/writer/elasticsearchwriter/PartitionColumn.java b/elasticsearchwriter/src/main/java/com/alibaba/datax/plugin/writer/elasticsearchwriter/PartitionColumn.java new file mode 100644 index 00000000..b99829b2 --- /dev/null +++ b/elasticsearchwriter/src/main/java/com/alibaba/datax/plugin/writer/elasticsearchwriter/PartitionColumn.java @@ -0,0 +1,42 @@ +package com.alibaba.datax.plugin.writer.elasticsearchwriter; + +public class PartitionColumn { + private String name; + // like: DATA + private String metaType; + private String comment; + // like: VARCHAR + private String type; + + public String getName() { + return name; + } + + public String getMetaType() { + return metaType; + } + + public String getComment() { + return comment; + } + + public String getType() { + return type; + } + + public void setName(String name) { + this.name = name; + } + + public void setMetaType(String metaType) { + this.metaType = metaType; + } + + public void setComment(String comment) { + this.comment = comment; + } + + public void setType(String type) { + this.type = type; + } +} \ No newline at end of file diff --git a/elasticsearchwriter/src/main/java/com/alibaba/datax/plugin/writer/elasticsearchwriter/PrimaryKeyInfo.java b/elasticsearchwriter/src/main/java/com/alibaba/datax/plugin/writer/elasticsearchwriter/PrimaryKeyInfo.java new file mode 100644 index 00000000..b5821f51 --- /dev/null +++ b/elasticsearchwriter/src/main/java/com/alibaba/datax/plugin/writer/elasticsearchwriter/PrimaryKeyInfo.java @@ -0,0 +1,47 @@ +package com.alibaba.datax.plugin.writer.elasticsearchwriter; + +import java.util.List; + +public class PrimaryKeyInfo { + + /** + * 主键类型:PrimaryKeyTypeEnum + * + * pk: 单个(业务)主键 specific: 联合主键 + */ + private String type; + + /** + * 用户定义的联合主键的连接符号 + */ + private String fieldDelimiter; + + /** + * 主键的列的名称 + */ + private List column; + + public String getType() { + return type; + } + + public String getFieldDelimiter() { + return fieldDelimiter; + } + + public List getColumn() { + return column; + } + + public void setType(String type) { + this.type = type; + } + + public void 
setFieldDelimiter(String fieldDelimiter) { + this.fieldDelimiter = fieldDelimiter; + } + + public void setColumn(List column) { + this.column = column; + } +} \ No newline at end of file diff --git a/elasticsearchwriter/src/main/java/com/alibaba/datax/plugin/writer/elasticsearchwriter/jest/ClusterInfo.java b/elasticsearchwriter/src/main/java/com/alibaba/datax/plugin/writer/elasticsearchwriter/jest/ClusterInfo.java new file mode 100644 index 00000000..173bc9e2 --- /dev/null +++ b/elasticsearchwriter/src/main/java/com/alibaba/datax/plugin/writer/elasticsearchwriter/jest/ClusterInfo.java @@ -0,0 +1,35 @@ +package com.alibaba.datax.plugin.writer.elasticsearchwriter.jest; + +import com.google.gson.Gson; +import io.searchbox.action.AbstractAction; +import io.searchbox.client.config.ElasticsearchVersion; + +public class ClusterInfo extends AbstractAction { + @Override + protected String buildURI(ElasticsearchVersion elasticsearchVersion) { + return ""; + } + + @Override + public String getRestMethodName() { + return "GET"; + } + + @Override + public ClusterInfoResult createNewElasticSearchResult(String responseBody, int statusCode, String reasonPhrase, Gson gson) { + return createNewElasticSearchResult(new ClusterInfoResult(gson), responseBody, statusCode, reasonPhrase, gson); + } + + public static class Builder extends AbstractAction.Builder { + + public Builder() { + setHeader("accept", "application/json"); + setHeader("content-type", "application/json"); + } + + @Override + public ClusterInfo build() { + return new ClusterInfo(); + } + } +} diff --git a/elasticsearchwriter/src/main/java/com/alibaba/datax/plugin/writer/elasticsearchwriter/jest/ClusterInfoResult.java b/elasticsearchwriter/src/main/java/com/alibaba/datax/plugin/writer/elasticsearchwriter/jest/ClusterInfoResult.java new file mode 100644 index 00000000..b4f49a37 --- /dev/null +++ b/elasticsearchwriter/src/main/java/com/alibaba/datax/plugin/writer/elasticsearchwriter/jest/ClusterInfoResult.java @@ -0,0 +1,49 @@ +package com.alibaba.datax.plugin.writer.elasticsearchwriter.jest; + +import com.google.gson.Gson; +import io.searchbox.client.JestResult; + +import java.util.regex.Matcher; +import java.util.regex.Pattern; + +public class ClusterInfoResult extends JestResult { + + private static final Pattern FIRST_NUMBER = Pattern.compile("\\d"); + + private static final int SEVEN = 7; + + public ClusterInfoResult(Gson gson) { + super(gson); + } + + public ClusterInfoResult(JestResult source) { + super(source); + } + + /** + * 判断es集群的部署版本是否大于7.x + * 大于7.x的es对于Index的type有较大改动,需要做额外判定 + * 对于7.x与6.x版本的es都做过测试,返回符合预期;5.x以下版本直接try-catch后返回false,向下兼容 + * @return + */ + public Boolean isGreaterOrEqualThan7() throws Exception { + // 如果是没有权限,直接返回false,兼容老版本 + if (responseCode == 403) { + return false; + } + if (!isSucceeded) { + throw new Exception(getJsonString()); + } + try { + String version = jsonObject.getAsJsonObject("version").get("number").toString(); + Matcher matcher = FIRST_NUMBER.matcher(version); + matcher.find(); + String number = matcher.group(); + Integer versionNum = Integer.valueOf(number); + return versionNum >= SEVEN; + } catch (Exception e) { + //5.x 以下版本不做兼容测试,如果返回json格式解析失败,有可能是以下版本,所以认为不大于7.x + return false; + } + } +} diff --git a/elasticsearchwriter/src/main/java/com/alibaba/datax/plugin/writer/elasticsearchwriter/jest/PutMapping7.java b/elasticsearchwriter/src/main/java/com/alibaba/datax/plugin/writer/elasticsearchwriter/jest/PutMapping7.java new file mode 100644 index 00000000..c9f1d6be --- /dev/null +++ 
b/elasticsearchwriter/src/main/java/com/alibaba/datax/plugin/writer/elasticsearchwriter/jest/PutMapping7.java @@ -0,0 +1,39 @@ +package com.alibaba.datax.plugin.writer.elasticsearchwriter.jest; + +import io.searchbox.action.GenericResultAbstractAction; +import io.searchbox.client.config.ElasticsearchVersion; + +public class PutMapping7 extends GenericResultAbstractAction { + protected PutMapping7(PutMapping7.Builder builder) { + super(builder); + + this.indexName = builder.index; + this.payload = builder.source; + } + + @Override + protected String buildURI(ElasticsearchVersion elasticsearchVersion) { + return super.buildURI(elasticsearchVersion) + "/_mapping"; + } + + @Override + public String getRestMethodName() { + return "PUT"; + } + + public static class Builder extends GenericResultAbstractAction.Builder { + private String index; + private Object source; + + public Builder(String index, Object source) { + this.index = index; + this.source = source; + } + + @Override + public PutMapping7 build() { + return new PutMapping7(this); + } + } + +} diff --git a/elasticsearchwriter/src/main/resources/plugin.json b/elasticsearchwriter/src/main/resources/plugin.json index b6e6384b..b39f1222 100644 --- a/elasticsearchwriter/src/main/resources/plugin.json +++ b/elasticsearchwriter/src/main/resources/plugin.json @@ -1,6 +1,6 @@ { "name": "elasticsearchwriter", - "class": "com.alibaba.datax.plugin.writer.elasticsearchwriter.ESWriter", + "class": "com.alibaba.datax.plugin.writer.elasticsearchwriter.ElasticSearchWriter", "description": "适用于: 生产环境. 原理: TODO", "developer": "alibaba" } \ No newline at end of file diff --git a/ftpreader/pom.xml b/ftpreader/pom.xml index 7778d491..57bf889d 100755 --- a/ftpreader/pom.xml +++ b/ftpreader/pom.xml @@ -45,7 +45,7 @@ com.jcraft jsch - 0.1.51 + 0.1.54 commons-net @@ -89,4 +89,4 @@ - + diff --git a/ftpreader/src/main/java/com/alibaba/datax/plugin/reader/ftpreader/SftpHelper.java b/ftpreader/src/main/java/com/alibaba/datax/plugin/reader/ftpreader/SftpHelper.java index d25b040c..6e42e10c 100644 --- a/ftpreader/src/main/java/com/alibaba/datax/plugin/reader/ftpreader/SftpHelper.java +++ b/ftpreader/src/main/java/com/alibaba/datax/plugin/reader/ftpreader/SftpHelper.java @@ -64,6 +64,8 @@ public class SftpHelper extends FtpHelper { String message = String.format("请确认连接ftp服务器端口是否正确,错误的端口: [%s] ", port); LOG.error(message); throw DataXException.asDataXException(FtpReaderErrorCode.FAIL_LOGIN, message, e); + }else{ + throw DataXException.asDataXException(FtpReaderErrorCode.COMMAND_FTP_IO_EXCEPTION, "", e); } }else { if("Auth fail".equals(e.getMessage())){ diff --git a/ftpwriter/doc/ftpwriter.md b/ftpwriter/doc/ftpwriter.md index 6b1b2687..a38a1052 100644 --- a/ftpwriter/doc/ftpwriter.md +++ b/ftpwriter/doc/ftpwriter.md @@ -24,7 +24,7 @@ FtpWriter实现了从DataX协议转为FTP文件功能,FTP文件本身是无结 我们不能做到: -1. 单个文件不能支持并发写入。 +1. 
单个文件并发写入。 ## 3 功能说明 diff --git a/ftpwriter/pom.xml b/ftpwriter/pom.xml index 69ec4a07..bf7ce83d 100644 --- a/ftpwriter/pom.xml +++ b/ftpwriter/pom.xml @@ -45,7 +45,7 @@ com.jcraft jsch - 0.1.51 + 0.1.54 commons-net diff --git a/ftpwriter/src/main/java/com/alibaba/datax/plugin/writer/ftpwriter/util/SftpHelperImpl.java b/ftpwriter/src/main/java/com/alibaba/datax/plugin/writer/ftpwriter/util/SftpHelperImpl.java index e6d78629..e748f12c 100644 --- a/ftpwriter/src/main/java/com/alibaba/datax/plugin/writer/ftpwriter/util/SftpHelperImpl.java +++ b/ftpwriter/src/main/java/com/alibaba/datax/plugin/writer/ftpwriter/util/SftpHelperImpl.java @@ -14,8 +14,8 @@ import org.slf4j.LoggerFactory; import com.alibaba.datax.common.exception.DataXException; import com.alibaba.datax.plugin.writer.ftpwriter.FtpWriterErrorCode; -import com.alibaba.fastjson.JSON; -import com.alibaba.fastjson.serializer.SerializerFeature; +import com.alibaba.fastjson2.JSON; +import com.alibaba.fastjson2.JSONWriter; import com.jcraft.jsch.ChannelSftp; import com.jcraft.jsch.JSch; import com.jcraft.jsch.JSchException; @@ -251,7 +251,7 @@ public class SftpHelperImpl implements IFtpHelper { @SuppressWarnings("rawtypes") Vector allFiles = this.channelSftp.ls(dir); LOG.debug(String.format("ls: %s", JSON.toJSONString(allFiles, - SerializerFeature.UseSingleQuotes))); + JSONWriter.Feature.UseSingleQuotes))); for (int i = 0; i < allFiles.size(); i++) { LsEntry le = (LsEntry) allFiles.get(i); String strName = le.getFilename(); diff --git a/ftpwriter/src/main/java/com/alibaba/datax/plugin/writer/ftpwriter/util/StandardFtpHelperImpl.java b/ftpwriter/src/main/java/com/alibaba/datax/plugin/writer/ftpwriter/util/StandardFtpHelperImpl.java index 8999b0a8..d5b9a746 100644 --- a/ftpwriter/src/main/java/com/alibaba/datax/plugin/writer/ftpwriter/util/StandardFtpHelperImpl.java +++ b/ftpwriter/src/main/java/com/alibaba/datax/plugin/writer/ftpwriter/util/StandardFtpHelperImpl.java @@ -18,8 +18,8 @@ import org.slf4j.LoggerFactory; import com.alibaba.datax.common.exception.DataXException; import com.alibaba.datax.plugin.writer.ftpwriter.FtpWriterErrorCode; -import com.alibaba.fastjson.JSON; -import com.alibaba.fastjson.serializer.SerializerFeature; +import com.alibaba.fastjson2.JSON; +import com.alibaba.fastjson2.JSONWriter; public class StandardFtpHelperImpl implements IFtpHelper { private static final Logger LOG = LoggerFactory @@ -244,7 +244,7 @@ public class StandardFtpHelperImpl implements IFtpHelper { FTPFile[] fs = this.ftpClient.listFiles(dir); // LOG.debug(JSON.toJSONString(this.ftpClient.listNames(dir))); LOG.debug(String.format("ls: %s", - JSON.toJSONString(fs, SerializerFeature.UseSingleQuotes))); + JSON.toJSONString(fs, JSONWriter.Feature.UseSingleQuotes))); for (FTPFile ff : fs) { String strName = ff.getName(); if (strName.startsWith(prefixFileName)) { diff --git a/gaussdbreader/doc/gaussdbreader.md b/gaussdbreader/doc/gaussdbreader.md new file mode 100644 index 00000000..5caa4d59 --- /dev/null +++ b/gaussdbreader/doc/gaussdbreader.md @@ -0,0 +1,297 @@ + +# GaussDbReader 插件文档 + + +___ + + +## 1 快速介绍 + +GaussDbReader插件实现了从GaussDB读取数据。在底层实现上,GaussDbReader通过JDBC连接远程GaussDB数据库,并执行相应的sql语句将数据从GaussDB库中SELECT出来。 + +## 2 实现原理 + +简而言之,GaussDbReader通过JDBC连接器连接到远程的GaussDB数据库,并根据用户配置的信息生成查询SELECT SQL语句并发送到远程GaussDB数据库,并将该SQL执行返回结果使用DataX自定义的数据类型拼装为抽象的数据集,并传递给下游Writer处理。 + +对于用户配置Table、Column、Where的信息,GaussDbReader将其拼接为SQL语句发送到GaussDB数据库;对于用户配置querySql信息,GaussDbReader直接将其发送到GaussDB数据库。 + + +## 3 功能说明 + +### 3.1 配置样例 + +* 配置一个从GaussDB数据库同步抽取数据到本地的作业: + 
+``` +{ + "job": { + "setting": { + "speed": { + //设置传输速度,单位为byte/s,DataX运行会尽可能达到该速度但是不超过它. + "byte": 1048576 + }, + //出错限制 + "errorLimit": { + //出错的record条数上限,当大于该值即报错。 + "record": 0, + //出错的record百分比上限 1.0表示100%,0.02表示2% + "percentage": 0.02 + } + }, + "content": [ + { + "reader": { + "name": "gaussdbreader", + "parameter": { + // 数据库连接用户名 + "username": "xx", + // 数据库连接密码 + "password": "xx", + "column": [ + "id","name" + ], + //切分主键 + "splitPk": "id", + "connection": [ + { + "table": [ + "table" + ], + "jdbcUrl": [ + "jdbc:opengauss://host:port/database" + ] + } + ] + } + }, + "writer": { + //writer类型 + "name": "streamwriter", + //是否打印内容 + "parameter": { + "print":true, + } + } + } + ] + } +} + +``` + +* 配置一个自定义SQL的数据库同步任务到本地内容的作业: + +```json +{ + "job": { + "setting": { + "speed": 1048576 + }, + "content": [ + { + "reader": { + "name": "gaussdbreader", + "parameter": { + "username": "xx", + "password": "xx", + "where": "", + "connection": [ + { + "querySql": [ + "select db_id,on_line_flag from db_info where db_id < 10;" + ], + "jdbcUrl": [ + "jdbc:opengauss://host:port/database", "jdbc:opengauss://host:port/database" + ] + } + ] + } + }, + "writer": { + "name": "streamwriter", + "parameter": { + "print": false, + "encoding": "UTF-8" + } + } + } + ] + } +} +``` + + +### 3.2 参数说明 + +* **jdbcUrl** + + * 描述:描述的是到对端数据库的JDBC连接信息,使用JSON的数组描述,并支持一个库填写多个连接地址。之所以使用JSON数组描述连接信息,是因为阿里集团内部支持多个IP探测,如果配置了多个,GaussDbReader可以依次探测ip的可连接性,直到选择一个合法的IP。如果全部连接失败,GaussDbReader报错。 注意,jdbcUrl必须包含在connection配置单元中。对于阿里集团外部使用情况,JSON数组填写一个JDBC连接即可。 + + jdbcUrl按照GaussDB官方规范,并可以填写连接附件控制信息。具体请参看[GaussDB官方文档](https://docs.opengauss.org/zh/docs/3.1.0/docs/Developerguide/java-sql-Connection.html)。 + + * 必选:是
+ + * 默认值:无
+ +* **username** + + * 描述:数据源的用户名
+ + * 必选:是
+ + * 默认值:无
+ +* **password** + + * 描述:数据源指定用户名的密码
+ + * 必选:是
+ + * 默认值:无
+ +* **table** + + * 描述:所选取的需要同步的表。使用JSON的数组描述,因此支持多张表同时抽取。当配置为多张表时,用户自己需保证多张表是同一schema结构,GaussDbReader不予检查表是否同一逻辑表。注意,table必须包含在connection配置单元中。
+ + * 必选:是
+ + * 默认值:无
+ +* **column** + + * 描述:所配置的表中需要同步的列名集合,使用JSON的数组描述字段信息。用户使用\*代表默认使用所有列配置,例如['\*']。 + + 支持列裁剪,即列可以挑选部分列进行导出。 + + 支持列换序,即列可以不按照表schema信息进行导出。 + + 支持常量配置,用户需要按照GaussDB语法格式: + ["id", "'hello'::varchar", "true", "2.5::real", "power(2,3)"] + id为普通列名,'hello'::varchar为字符串常量,true为布尔值,2.5为浮点数, power(2,3)为函数。 + + **column必须用户显式指定同步的列集合,不允许为空!** + + * 必选:是
+ + * 默认值:无
+ +* **splitPk** + + * 描述:GaussDbReader进行数据抽取时,如果指定splitPk,表示用户希望使用splitPk代表的字段进行数据分片,DataX因此会启动并发任务进行数据同步,这样可以大大提高数据同步的效能。 + + 推荐splitPk用户使用表主键,因为表主键通常情况下比较均匀,因此切分出来的分片也不容易出现数据热点。 + + 目前splitPk仅支持整形数据切分,`不支持浮点、字符串型、日期等其他类型`。如果用户指定其他非支持类型,GaussDbReader将报错! + + splitPk设置为空,底层将视作用户不允许对单表进行切分,因此使用单通道进行抽取。 + + * 必选:否
+ + * 默认值:空
+ +* **where** + + * 描述:筛选条件,GaussDbReader根据指定的column、table、where条件拼接SQL,并根据这个SQL进行数据抽取。在实际业务场景中,往往会选择当天的数据进行同步,可以将where条件指定为gmt_create > $bizdate 。注意:不可以将where条件指定为limit 10,limit不是SQL的合法where子句。
+ + where条件可以有效地进行业务增量同步。 where条件不配置或者为空,视作全表同步数据。 + + * 必选:否
+ + * 默认值:无
+ +* **querySql** + + * 描述:在有些业务场景下,where这一配置项不足以描述所筛选的条件,用户可以通过该配置型来自定义筛选SQL。当用户配置了这一项之后,DataX系统就会忽略table,column这些配置型,直接使用这个配置项的内容对数据进行筛选,例如需要进行多表join后同步数据,使用select a,b from table_a join table_b on table_a.id = table_b.id
+ + `当用户配置querySql时,GaussDbReader直接忽略table、column、where条件的配置`。 + + * 必选:否
+ + * 默认值:无
+ +* **fetchSize** + + * 描述:该配置项定义了插件和数据库服务器端每次批量数据获取条数,该值决定了DataX和服务器端的网络交互次数,能够较大的提升数据抽取性能。
+ + `注意,该值过大(>2048)可能造成DataX进程OOM。` + + * 必选:否
+ + * 默认值:1024
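下面给出一个将 splitPk、where、fetchSize 组合使用的 reader 参数片段示意(连接串、表名、列名与取值均为占位符,仅用于说明各参数的写法,请按实际环境修改):

```json
"reader": {
    "name": "gaussdbreader",
    "parameter": {
        "username": "xx",
        "password": "xx",
        "column": ["id", "name", "gmt_create"],
        "splitPk": "id",
        "where": "gmt_create > '2023-01-01'",
        "fetchSize": 1024,
        "connection": [
            {
                "table": ["table"],
                "jdbcUrl": ["jdbc:opengauss://host:port/database"]
            }
        ]
    }
}
```

需要注意,按 splitPk 切分只有在 job 的 setting.speed 中配置了多个 channel 时才会体现并发效果,可参考下文性能报告中的对比数据。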
+ + +### 3.3 类型转换 + +目前GaussDbReader支持大部分GaussDB类型,但也存在部分个别类型没有支持的情况,请注意检查你的类型。 + +下面列出GaussDbReader针对GaussDB类型转换列表: + + +| DataX 内部类型| GaussDB 数据类型 | +| -------- | ----- | +| Long |bigint, bigserial, integer, smallint, serial | +| Double |double precision, money, numeric, real | +| String |varchar, char, text, bit, inet| +| Date |date, time, timestamp | +| Boolean |bool| +| Bytes |bytea| + +请注意: + +* `除上述罗列字段类型外,其他类型均不支持; money,inet,bit需用户使用a_inet::varchar类似的语法转换`。 + +## 4 性能报告 + +### 4.1 环境准备 + +#### 4.1.1 数据特征 +建表语句: + +create table pref_test( + id serial, + a_bigint bigint, + a_bit bit(10), + a_boolean boolean, + a_char character(5), + a_date date, + a_double double precision, + a_integer integer, + a_money money, + a_num numeric(10,2), + a_real real, + a_smallint smallint, + a_text text, + a_time time, + a_timestamp timestamp +) + +#### 4.1.2 机器参数 + +* 执行DataX的机器参数为: + 1. cpu: 16核 Intel(R) Xeon(R) CPU E5620 @ 2.40GHz + 2. mem: MemTotal: 24676836kB MemFree: 6365080kB + 3. net: 百兆双网卡 + +* GaussDB数据库机器参数为: + D12 24逻辑核 192G内存 12*480G SSD 阵列 + + +### 4.2 测试报告 + +#### 4.2.1 单表测试报告 + + +| 通道数 | 是否按照主键切分 | DataX速度(Rec/s) | DataX流量(MB/s) | DataX机器运行负载 | +|--------|--------| --------|--------|--------| +|1| 否 | 10211 | 0.63 | 0.2 | +|1| 是 | 10211 | 0.63 | 0.2 | +|4| 否 | 10211 | 0.63 | 0.2 | +|4| 是 | 40000 | 2.48 | 0.5 | +|8| 否 | 10211 | 0.63 | 0.2 | +|8| 是 | 78048 | 4.84 | 0.8 | + + +说明: + +1. 这里的单表,主键类型为 serial,数据分布均匀。 +2. 对单表如果没有按照主键切分,那么配置通道个数不会提升速度,效果与1个通道一样。 diff --git a/gaussdbreader/pom.xml b/gaussdbreader/pom.xml new file mode 100644 index 00000000..ad2e0ba0 --- /dev/null +++ b/gaussdbreader/pom.xml @@ -0,0 +1,86 @@ + + + + datax-all + com.alibaba.datax + 0.0.1-SNAPSHOT + + 4.0.0 + + gaussdbreader + gaussdbreader + jar + + + + com.alibaba.datax + datax-common + ${datax-project-version} + + + slf4j-log4j12 + org.slf4j + + + + + + org.slf4j + slf4j-api + + + + ch.qos.logback + logback-classic + + + + com.alibaba.datax + plugin-rdbms-util + ${datax-project-version} + + + + org.opengauss + opengauss-jdbc + 3.0.0 + + + + + + + + + maven-compiler-plugin + + ${jdk-version} + ${jdk-version} + ${project-sourceEncoding} + + + + + maven-assembly-plugin + + + src/main/assembly/package.xml + + datax + + + + dwzip + package + + single + + + + + + + + \ No newline at end of file diff --git a/gaussdbreader/src/main/assembly/package.xml b/gaussdbreader/src/main/assembly/package.xml new file mode 100755 index 00000000..65601e45 --- /dev/null +++ b/gaussdbreader/src/main/assembly/package.xml @@ -0,0 +1,35 @@ + + + + dir + + false + + + src/main/resources + + plugin.json + plugin_job_template.json + + plugin/reader/gaussdbreader + + + target/ + + gaussdbreader-0.0.1-SNAPSHOT.jar + + plugin/reader/gaussdbreader + + + + + + false + plugin/reader/gaussdbreader/libs + runtime + + + diff --git a/gaussdbreader/src/main/java/com/alibaba/datax/plugin/reader/gaussdbreader/Constant.java b/gaussdbreader/src/main/java/com/alibaba/datax/plugin/reader/gaussdbreader/Constant.java new file mode 100644 index 00000000..33cdd309 --- /dev/null +++ b/gaussdbreader/src/main/java/com/alibaba/datax/plugin/reader/gaussdbreader/Constant.java @@ -0,0 +1,7 @@ +package com.alibaba.datax.plugin.reader.gaussdbreader; + +public class Constant { + + public static final int DEFAULT_FETCH_SIZE = 1000; + +} diff --git a/gaussdbreader/src/main/java/com/alibaba/datax/plugin/reader/gaussdbreader/GaussDbReader.java b/gaussdbreader/src/main/java/com/alibaba/datax/plugin/reader/gaussdbreader/GaussDbReader.java new file mode 100644 index 
00000000..ca158ab7 --- /dev/null +++ b/gaussdbreader/src/main/java/com/alibaba/datax/plugin/reader/gaussdbreader/GaussDbReader.java @@ -0,0 +1,86 @@ +package com.alibaba.datax.plugin.reader.gaussdbreader; + +import com.alibaba.datax.common.exception.DataXException; +import com.alibaba.datax.common.plugin.RecordSender; +import com.alibaba.datax.common.spi.Reader; +import com.alibaba.datax.common.util.Configuration; +import com.alibaba.datax.plugin.rdbms.reader.CommonRdbmsReader; +import com.alibaba.datax.plugin.rdbms.util.DBUtilErrorCode; +import com.alibaba.datax.plugin.rdbms.util.DataBaseType; + +import java.util.List; + +public class GaussDbReader extends Reader { + + private static final DataBaseType DATABASE_TYPE = DataBaseType.GaussDB; + + public static class Job extends Reader.Job { + + private Configuration originalConfig; + private CommonRdbmsReader.Job commonRdbmsReaderMaster; + + @Override + public void init() { + this.originalConfig = super.getPluginJobConf(); + int fetchSize = this.originalConfig.getInt(com.alibaba.datax.plugin.rdbms.reader.Constant.FETCH_SIZE, + Constant.DEFAULT_FETCH_SIZE); + if (fetchSize < 1) { + throw DataXException.asDataXException(DBUtilErrorCode.REQUIRED_VALUE, + String.format("您配置的fetchSize有误,根据DataX的设计,fetchSize : [%d] 设置值不能小于 1.", fetchSize)); + } + this.originalConfig.set(com.alibaba.datax.plugin.rdbms.reader.Constant.FETCH_SIZE, fetchSize); + + this.commonRdbmsReaderMaster = new CommonRdbmsReader.Job(DATABASE_TYPE); + this.commonRdbmsReaderMaster.init(this.originalConfig); + } + + @Override + public List split(int adviceNumber) { + return this.commonRdbmsReaderMaster.split(this.originalConfig, adviceNumber); + } + + @Override + public void post() { + this.commonRdbmsReaderMaster.post(this.originalConfig); + } + + @Override + public void destroy() { + this.commonRdbmsReaderMaster.destroy(this.originalConfig); + } + + } + + public static class Task extends Reader.Task { + + private Configuration readerSliceConfig; + private CommonRdbmsReader.Task commonRdbmsReaderSlave; + + @Override + public void init() { + this.readerSliceConfig = super.getPluginJobConf(); + this.commonRdbmsReaderSlave = new CommonRdbmsReader.Task(DATABASE_TYPE,super.getTaskGroupId(), super.getTaskId()); + this.commonRdbmsReaderSlave.init(this.readerSliceConfig); + } + + @Override + public void startRead(RecordSender recordSender) { + int fetchSize = this.readerSliceConfig.getInt(com.alibaba.datax.plugin.rdbms.reader.Constant.FETCH_SIZE); + + this.commonRdbmsReaderSlave.startRead(this.readerSliceConfig, recordSender, + super.getTaskPluginCollector(), fetchSize); + } + + @Override + public void post() { + this.commonRdbmsReaderSlave.post(this.readerSliceConfig); + } + + @Override + public void destroy() { + this.commonRdbmsReaderSlave.destroy(this.readerSliceConfig); + } + + } + +} diff --git a/gaussdbreader/src/main/resources/plugin.json b/gaussdbreader/src/main/resources/plugin.json new file mode 100755 index 00000000..7d4ac8de --- /dev/null +++ b/gaussdbreader/src/main/resources/plugin.json @@ -0,0 +1,6 @@ +{ + "name": "gaussdbreader", + "class": "com.alibaba.datax.plugin.reader.gaussdbreader.GaussDbReader", + "description": "useScene: prod. mechanism: Jdbc connection using the database, execute select sql, retrieve data from the ResultSet. 
warn: The more you know about the database, the less problems you encounter.", + "developer": "alibaba" +} \ No newline at end of file diff --git a/gaussdbreader/src/main/resources/plugin_job_template.json b/gaussdbreader/src/main/resources/plugin_job_template.json new file mode 100644 index 00000000..e39220eb --- /dev/null +++ b/gaussdbreader/src/main/resources/plugin_job_template.json @@ -0,0 +1,13 @@ +{ + "name": "gaussdbreader", + "parameter": { + "username": "", + "password": "", + "connection": [ + { + "table": [], + "jdbcUrl": [] + } + ] + } +} \ No newline at end of file diff --git a/gaussdbwriter/doc/gaussdbwriter.md b/gaussdbwriter/doc/gaussdbwriter.md new file mode 100644 index 00000000..e65b74d3 --- /dev/null +++ b/gaussdbwriter/doc/gaussdbwriter.md @@ -0,0 +1,267 @@ +# DataX GaussDbWriter + + +--- + + +## 1 快速介绍 + +GaussDbWriter插件实现了写入数据到 GaussDB主库目的表的功能。在底层实现上,GaussDbWriter通过JDBC连接远程 GaussDB 数据库,并执行相应的 insert into ... sql 语句将数据写入 GaussDB,内部会分批次提交入库。 + +GaussDbWriter面向ETL开发工程师,他们使用GaussDbWriter从数仓导入数据到GaussDB。同时 GaussDbWriter亦可以作为数据迁移工具为DBA等用户提供服务。 + + +## 2 实现原理 + +GaussDbWriter通过 DataX 框架获取 Reader 生成的协议数据,根据你配置生成相应的SQL插入语句 + + +* `insert into...`(当主键/唯一性索引冲突时会写不进去冲突的行) + +
+ + 注意: + 1. 目的表所在数据库必须是主库才能写入数据;整个任务至少需具备 insert into...的权限,是否需要其他权限,取决于你任务配置中在 preSql 和 postSql 中指定的语句。 + 2. GaussDbWriter和MysqlWriter不同,不支持配置writeMode参数。 + + +## 3 功能说明 + +### 3.1 配置样例 + +* 这里使用一份从内存产生到 GaussDbWriter导入的数据。 + +```json +{ + "job": { + "setting": { + "speed": { + "channel": 1 + } + }, + "content": [ + { + "reader": { + "name": "streamreader", + "parameter": { + "column" : [ + { + "value": "DataX", + "type": "string" + }, + { + "value": 19880808, + "type": "long" + }, + { + "value": "1988-08-08 08:08:08", + "type": "date" + }, + { + "value": true, + "type": "bool" + }, + { + "value": "test", + "type": "bytes" + } + ], + "sliceRecordCount": 1000 + } + }, + "writer": { + "name": "gaussdbwriter", + "parameter": { + "username": "xx", + "password": "xx", + "column": [ + "id", + "name" + ], + "preSql": [ + "delete from test" + ], + "connection": [ + { + "jdbcUrl": "jdbc:opengauss://127.0.0.1:3002/datax", + "table": [ + "test" + ] + } + ] + } + } + } + ] + } +} + +``` + + +### 3.2 参数说明 + +* **jdbcUrl** + + * 描述:目的数据库的 JDBC 连接信息 ,jdbcUrl必须包含在connection配置单元中。 + + 注意:1、在一个数据库上只能配置一个值。 + 2、jdbcUrl按照GaussDB官方规范,并可以填写连接附加参数信息。具体请参看GaussDB官方文档或者咨询对应 DBA。 + + +* 必选:是
+ +* 默认值:无
+ +* **username** + + * 描述:目的数据库的用户名
+ + * 必选:是
+ + * 默认值:无
+ +* **password** + + * 描述:目的数据库的密码
+ + * 必选:是
+ + * 默认值:无
+ +* **table** + + * 描述:目的表的表名称。支持写入一个或者多个表。当配置为多张表时,必须确保所有表结构保持一致。 + + 注意:table 和 jdbcUrl 必须包含在 connection 配置单元中 + + * 必选:是
+ + * 默认值:无
+ +* **column** + + * 描述:目的表需要写入数据的字段,字段之间用英文逗号分隔。例如: "column": ["id","name","age"]。如果要依次写入全部列,使用\*表示, 例如: "column": ["\*"] + + 注意:1、我们强烈不推荐你这样配置,因为当你目的表字段个数、类型等有改动时,你的任务可能运行不正确或者失败 + 2、此处 column 不能配置任何常量值 + + * 必选:是
+ + * 默认值:无
+ +* **preSql** + + * 描述:写入数据到目的表前,会先执行这里的标准语句。如果 Sql 中有你需要操作到的表名称,请使用 `@table` 表示,这样在实际执行 Sql 语句时,会对变量按照实际表名称进行替换。比如你的任务是要写入到目的端的100个同构分表(表名称为:datax_00,datax_01, ... datax_98,datax_99),并且你希望导入数据前,先对表中数据进行删除操作,那么你可以这样配置:`"preSql":["delete from @table"]`,效果是:在每张表写入数据前,会先对该表执行对应的 delete from 语句
+ + * 必选:否
+ + * 默认值:无
+ +* **postSql** + + * 描述:写入数据到目的表后,会执行这里的标准语句。(原理同 preSql )
+ + * 必选:否
+ + * 默认值:无
+ +* **batchSize** + + * 描述:一次性批量提交的记录数大小,该值可以极大减少DataX与GaussDB的网络交互次数,并提升整体吞吐量。但是该值设置过大可能会造成DataX运行进程OOM情况。
+ + * 必选:否
+ + * 默认值:1024
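下面给出一个示意性的 writer 参数片段,演示 preSql 中 `@table` 占位符与 batchSize 的配合使用(连接串与表名均为占位符,仅用于说明写法):

```json
"writer": {
    "name": "gaussdbwriter",
    "parameter": {
        "username": "xx",
        "password": "xx",
        "column": ["id", "name"],
        "preSql": ["delete from @table"],
        "batchSize": 1024,
        "connection": [
            {
                "jdbcUrl": "jdbc:opengauss://127.0.0.1:3002/datax",
                "table": ["datax_00", "datax_01"]
            }
        ]
    }
}
```

按照上文 preSql 的说明,任务执行时 `@table` 会被依次替换为 datax_00、datax_01,即每张表在写入数据前都会先执行对应的 delete 语句。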
+ +### 3.3 类型转换 + +目前 GaussDbWriter支持大部分 GaussDB类型,但也存在部分没有支持的情况,请注意检查你的类型。 + +下面列出 GaussDbWriter针对 GaussDB类型转换列表: + +| DataX 内部类型| GaussDB 数据类型 | +| -------- | ----- | +| Long |bigint, bigserial, integer, smallint, serial | +| Double |double precision, money, numeric, real | +| String |varchar, char, text, bit| +| Date |date, time, timestamp | +| Boolean |bool| +| Bytes |bytea| + +## 4 性能报告 + +### 4.1 环境准备 + +#### 4.1.1 数据特征 +建表语句: + +create table pref_test( +id serial, +a_bigint bigint, +a_bit bit(10), +a_boolean boolean, +a_char character(5), +a_date date, +a_double double precision, +a_integer integer, +a_money money, +a_num numeric(10,2), +a_real real, +a_smallint smallint, +a_text text, +a_time time, +a_timestamp timestamp +) + +#### 4.1.2 机器参数 + +* 执行DataX的机器参数为: + 1. cpu: 16核 Intel(R) Xeon(R) CPU E5620 @ 2.40GHz + 2. mem: MemTotal: 24676836kB MemFree: 6365080kB + 3. net: 百兆双网卡 + +* GaussDB数据库机器参数为: + D12 24逻辑核 192G内存 12*480G SSD 阵列 + + +### 4.2 测试报告 + +#### 4.2.1 单表测试报告 + +| 通道数| 批量提交batchSize | DataX速度(Rec/s)| DataX流量(M/s) | DataX机器运行负载 +|--------|--------| --------|--------|--------|--------| +|1| 128 | 9259 | 0.55 | 0.3 +|1| 512 | 10869 | 0.653 | 0.3 +|1| 2048 | 9803 | 0.589 | 0.8 +|4| 128 | 30303 | 1.82 | 1 +|4| 512 | 36363 | 2.18 | 1 +|4| 2048 | 36363 | 2.18 | 1 +|8| 128 | 57142 | 3.43 | 2 +|8| 512 | 66666 | 4.01 | 1.5 +|8| 2048 | 66666 | 4.01 | 1.1 +|16| 128 | 88888 | 5.34 | 1.8 +|16| 2048 | 94117 | 5.65 | 2.5 +|32| 512 | 76190 | 4.58 | 3 + +#### 4.2.2 性能测试小结 +1. `channel数对性能影响很大` +2. `通常不建议写入数据库时,通道个数 > 32` + + +## FAQ + +*** + +**Q: GaussDbWriter 执行 postSql 语句报错,那么数据导入到目标数据库了吗?** + +A: DataX 导入过程存在三块逻辑,pre 操作、导入操作、post 操作,其中任意一环报错,DataX 作业报错。由于 DataX 不能保证在同一个事务完成上述几个操作,因此有可能数据已经落入到目标端。 + +*** + +**Q: 按照上述说法,那么有部分脏数据导入数据库,如果影响到线上数据库怎么办?** + +A: 目前有两种解法,第一种配置 pre 语句,该 sql 可以清理当天导入数据, DataX 每次导入时候可以把上次清理干净并导入完整数据。 +第二种,向临时表导入数据,完成后再 rename 到线上表。 + +*** diff --git a/gaussdbwriter/pom.xml b/gaussdbwriter/pom.xml new file mode 100644 index 00000000..9da02eff --- /dev/null +++ b/gaussdbwriter/pom.xml @@ -0,0 +1,86 @@ + + + + datax-all + com.alibaba.datax + 0.0.1-SNAPSHOT + + 4.0.0 + + gaussdbwriter + gaussdbwriter + jar + + + + com.alibaba.datax + datax-common + ${datax-project-version} + + + slf4j-log4j12 + org.slf4j + + + + + + org.slf4j + slf4j-api + + + + ch.qos.logback + logback-classic + + + + com.alibaba.datax + plugin-rdbms-util + ${datax-project-version} + + + + org.opengauss + opengauss-jdbc + 3.0.0 + + + + + + + + + maven-compiler-plugin + + ${jdk-version} + ${jdk-version} + ${project-sourceEncoding} + + + + + maven-assembly-plugin + + + src/main/assembly/package.xml + + datax + + + + dwzip + package + + single + + + + + + + + \ No newline at end of file diff --git a/gaussdbwriter/src/main/assembly/package.xml b/gaussdbwriter/src/main/assembly/package.xml new file mode 100755 index 00000000..7167c89d --- /dev/null +++ b/gaussdbwriter/src/main/assembly/package.xml @@ -0,0 +1,35 @@ + + + + dir + + false + + + src/main/resources + + plugin.json + plugin_job_template.json + + plugin/writer/gaussdbwriter + + + target/ + + gaussdbwriter-0.0.1-SNAPSHOT.jar + + plugin/writer/gaussdbwriter + + + + + + false + plugin/writer/gaussdbwriter/libs + runtime + + + diff --git a/gaussdbwriter/src/main/java/com/alibaba/datax/plugin/reader/gaussdbwriter/GaussDbWriter.java b/gaussdbwriter/src/main/java/com/alibaba/datax/plugin/reader/gaussdbwriter/GaussDbWriter.java new file mode 100644 index 00000000..3f758ee7 --- /dev/null +++ 
b/gaussdbwriter/src/main/java/com/alibaba/datax/plugin/reader/gaussdbwriter/GaussDbWriter.java @@ -0,0 +1,103 @@ +package com.alibaba.datax.plugin.reader.gaussdbwriter; + +import com.alibaba.datax.common.exception.DataXException; +import com.alibaba.datax.common.plugin.RecordReceiver; +import com.alibaba.datax.common.spi.Writer; +import com.alibaba.datax.common.util.Configuration; +import com.alibaba.datax.plugin.rdbms.util.DBUtilErrorCode; +import com.alibaba.datax.plugin.rdbms.util.DataBaseType; +import com.alibaba.datax.plugin.rdbms.writer.CommonRdbmsWriter; +import com.alibaba.datax.plugin.rdbms.writer.Key; + +import java.util.List; + +public class GaussDbWriter extends Writer { + + private static final DataBaseType DATABASE_TYPE = DataBaseType.GaussDB; + + public static class Job extends Writer.Job { + private Configuration originalConfig = null; + private CommonRdbmsWriter.Job commonRdbmsWriterMaster; + + @Override + public void init() { + this.originalConfig = super.getPluginJobConf(); + + // warn:not like mysql, GaussDB only support insert mode, don't use + String writeMode = this.originalConfig.getString(Key.WRITE_MODE); + if (null != writeMode) { + throw DataXException.asDataXException(DBUtilErrorCode.CONF_ERROR, + String.format("写入模式(writeMode)配置有误. 因为GaussDB不支持配置参数项 writeMode: %s, GaussDB仅使用insert sql 插入数据. 请检查您的配置并作出修改.", writeMode)); + } + + this.commonRdbmsWriterMaster = new CommonRdbmsWriter.Job(DATABASE_TYPE); + this.commonRdbmsWriterMaster.init(this.originalConfig); + } + + @Override + public void prepare() { + this.commonRdbmsWriterMaster.prepare(this.originalConfig); + } + + @Override + public List split(int mandatoryNumber) { + return this.commonRdbmsWriterMaster.split(this.originalConfig, mandatoryNumber); + } + + @Override + public void post() { + this.commonRdbmsWriterMaster.post(this.originalConfig); + } + + @Override + public void destroy() { + this.commonRdbmsWriterMaster.destroy(this.originalConfig); + } + + } + + public static class Task extends Writer.Task { + private Configuration writerSliceConfig; + private CommonRdbmsWriter.Task commonRdbmsWriterSlave; + + @Override + public void init() { + this.writerSliceConfig = super.getPluginJobConf(); + this.commonRdbmsWriterSlave = new CommonRdbmsWriter.Task(DATABASE_TYPE){ + @Override + public String calcValueHolder(String columnType){ + if("serial".equalsIgnoreCase(columnType)){ + return "?::int"; + }else if("bigserial".equalsIgnoreCase(columnType)){ + return "?::int8"; + }else if("bit".equalsIgnoreCase(columnType)){ + return "?::bit varying"; + } + return "?::" + columnType; + } + }; + this.commonRdbmsWriterSlave.init(this.writerSliceConfig); + } + + @Override + public void prepare() { + this.commonRdbmsWriterSlave.prepare(this.writerSliceConfig); + } + + public void startWrite(RecordReceiver recordReceiver) { + this.commonRdbmsWriterSlave.startWrite(recordReceiver, this.writerSliceConfig, super.getTaskPluginCollector()); + } + + @Override + public void post() { + this.commonRdbmsWriterSlave.post(this.writerSliceConfig); + } + + @Override + public void destroy() { + this.commonRdbmsWriterSlave.destroy(this.writerSliceConfig); + } + + } + +} diff --git a/gaussdbwriter/src/main/resources/plugin.json b/gaussdbwriter/src/main/resources/plugin.json new file mode 100755 index 00000000..2f52a167 --- /dev/null +++ b/gaussdbwriter/src/main/resources/plugin.json @@ -0,0 +1,6 @@ +{ + "name": "gaussdbwriter", + "class": "com.alibaba.datax.plugin.writer.gaussdbwriter.GaussDbWriter", + "description": "useScene: prod. 
mechanism: Jdbc connection using the database, execute insert sql. warn: The more you know about the database, the less problems you encounter.", + "developer": "alibaba" +} \ No newline at end of file diff --git a/gaussdbwriter/src/main/resources/plugin_job_template.json b/gaussdbwriter/src/main/resources/plugin_job_template.json new file mode 100644 index 00000000..539fa46f --- /dev/null +++ b/gaussdbwriter/src/main/resources/plugin_job_template.json @@ -0,0 +1,16 @@ +{ + "name": "gaussdbwriter", + "parameter": { + "username": "", + "password": "", + "column": [], + "connection": [ + { + "jdbcUrl": "", + "table": [] + } + ], + "preSql": [], + "postSql": [] + } +} \ No newline at end of file diff --git a/gdbwriter/src/main/java/com/alibaba/datax/plugin/writer/gdbwriter/mapping/DefaultGdbMapper.java b/gdbwriter/src/main/java/com/alibaba/datax/plugin/writer/gdbwriter/mapping/DefaultGdbMapper.java index 73a94cf5..2c015879 100644 --- a/gdbwriter/src/main/java/com/alibaba/datax/plugin/writer/gdbwriter/mapping/DefaultGdbMapper.java +++ b/gdbwriter/src/main/java/com/alibaba/datax/plugin/writer/gdbwriter/mapping/DefaultGdbMapper.java @@ -19,8 +19,8 @@ import com.alibaba.datax.plugin.writer.gdbwriter.Key; import com.alibaba.datax.plugin.writer.gdbwriter.model.GdbEdge; import com.alibaba.datax.plugin.writer.gdbwriter.model.GdbElement; import com.alibaba.datax.plugin.writer.gdbwriter.model.GdbVertex; -import com.alibaba.fastjson.JSONArray; -import com.alibaba.fastjson.JSONObject; +import com.alibaba.fastjson2.JSONArray; +import com.alibaba.fastjson2.JSONObject; import lombok.extern.slf4j.Slf4j; diff --git a/gdbwriter/src/main/java/com/alibaba/datax/plugin/writer/gdbwriter/util/ConfigHelper.java b/gdbwriter/src/main/java/com/alibaba/datax/plugin/writer/gdbwriter/util/ConfigHelper.java index 178b5e7c..644f8898 100644 --- a/gdbwriter/src/main/java/com/alibaba/datax/plugin/writer/gdbwriter/util/ConfigHelper.java +++ b/gdbwriter/src/main/java/com/alibaba/datax/plugin/writer/gdbwriter/util/ConfigHelper.java @@ -12,8 +12,8 @@ import org.apache.commons.lang3.StringUtils; import com.alibaba.datax.common.exception.DataXException; import com.alibaba.datax.common.util.Configuration; import com.alibaba.datax.plugin.writer.gdbwriter.GdbWriterErrorCode; -import com.alibaba.fastjson.JSON; -import com.alibaba.fastjson.JSONObject; +import com.alibaba.fastjson2.JSON; +import com.alibaba.fastjson2.JSONObject; /** * @author jerrywang diff --git a/hbase094xreader/src/main/java/com/alibaba/datax/plugin/reader/hbase094xreader/Hbase094xHelper.java b/hbase094xreader/src/main/java/com/alibaba/datax/plugin/reader/hbase094xreader/Hbase094xHelper.java index c3e2a212..b9f16b17 100644 --- a/hbase094xreader/src/main/java/com/alibaba/datax/plugin/reader/hbase094xreader/Hbase094xHelper.java +++ b/hbase094xreader/src/main/java/com/alibaba/datax/plugin/reader/hbase094xreader/Hbase094xHelper.java @@ -2,8 +2,8 @@ package com.alibaba.datax.plugin.reader.hbase094xreader; import com.alibaba.datax.common.exception.DataXException; import com.alibaba.datax.common.util.Configuration; -import com.alibaba.fastjson.JSON; -import com.alibaba.fastjson.TypeReference; +import com.alibaba.fastjson2.JSON; +import com.alibaba.fastjson2.TypeReference; import org.apache.commons.lang3.StringUtils; import org.apache.commons.lang3.Validate; import org.apache.hadoop.fs.Path; diff --git a/hbase094xwriter/src/main/java/com/alibaba/datax/plugin/writer/hbase094xwriter/Hbase094xHelper.java 
b/hbase094xwriter/src/main/java/com/alibaba/datax/plugin/writer/hbase094xwriter/Hbase094xHelper.java index f671d31d..00b128f3 100644 --- a/hbase094xwriter/src/main/java/com/alibaba/datax/plugin/writer/hbase094xwriter/Hbase094xHelper.java +++ b/hbase094xwriter/src/main/java/com/alibaba/datax/plugin/writer/hbase094xwriter/Hbase094xHelper.java @@ -2,8 +2,8 @@ package com.alibaba.datax.plugin.writer.hbase094xwriter; import com.alibaba.datax.common.exception.DataXException; import com.alibaba.datax.common.util.Configuration; -import com.alibaba.fastjson.JSON; -import com.alibaba.fastjson.TypeReference; +import com.alibaba.fastjson2.JSON; +import com.alibaba.fastjson2.TypeReference; import org.apache.commons.lang3.StringUtils; import org.apache.commons.lang3.Validate; import org.apache.hadoop.fs.Path; diff --git a/hbase11xreader/src/main/java/com/alibaba/datax/plugin/reader/hbase11xreader/Hbase11xHelper.java b/hbase11xreader/src/main/java/com/alibaba/datax/plugin/reader/hbase11xreader/Hbase11xHelper.java index 643072a9..82ad7122 100644 --- a/hbase11xreader/src/main/java/com/alibaba/datax/plugin/reader/hbase11xreader/Hbase11xHelper.java +++ b/hbase11xreader/src/main/java/com/alibaba/datax/plugin/reader/hbase11xreader/Hbase11xHelper.java @@ -2,8 +2,8 @@ package com.alibaba.datax.plugin.reader.hbase11xreader; import com.alibaba.datax.common.exception.DataXException; import com.alibaba.datax.common.util.Configuration; -import com.alibaba.fastjson.JSON; -import com.alibaba.fastjson.TypeReference; +import com.alibaba.fastjson2.JSON; +import com.alibaba.fastjson2.TypeReference; import org.apache.commons.lang3.StringUtils; import org.apache.commons.lang3.Validate; import org.apache.hadoop.hbase.HBaseConfiguration; diff --git a/hbase11xsqlreader/doc/hbase11xsqlreader.md b/hbase11xsqlreader/doc/hbase11xsqlreader.md index 03261a1f..9f70077f 100644 --- a/hbase11xsqlreader/doc/hbase11xsqlreader.md +++ b/hbase11xsqlreader/doc/hbase11xsqlreader.md @@ -60,12 +60,16 @@ hbase11xsqlreader插件实现了从Phoenix(HBase SQL)读取数据。在底层实 //填写连接Phoenix的hbase集群zk地址 "hbaseConfig": { "hbase.zookeeper.quorum": "hb-proxy-xxx-002.hbase.rds.aliyuncs.com,hb-proxy-xxx-001.hbase.rds.aliyuncs.com,hb-proxy-xxx-003.hbase.rds.aliyuncs.com" - }, + }, + //填写要读取的phoenix的命名空间 + "schema": "TAG", //填写要读取的phoenix的表名 "table": "US_POPULATION", //填写要读取的列名,不填读取所有列 "column": [ - ] + ], + //查询条件 + "where": "id=" } }, "writer": { @@ -92,11 +96,18 @@ hbase11xsqlreader插件实现了从Phoenix(HBase SQL)读取数据。在底层实 * 必选:是
+ * 默认值:无
+* **schema** + + * 描述:编写Phoenix中的namespace,该值设置为表所在namespace的名称,例如上文样例中的'TAG' + + * 必选:是
+ * 默认值:无
* **table** - * 描述:编写Phoenix中的表名,如果有namespace,该值设置为'namespace.tablename' + * 描述:编写Phoenix中的表名,该值设置为'tablename' * 必选:是
@@ -109,7 +120,13 @@ hbase11xsqlreader插件实现了从Phoenix(HBase SQL)读取数据。在底层实 * 必选:是
* 默认值:无
+* **where** + + * 描述:填写从Phoenix表中读取数据时使用的where过滤条件,不配置则读取全表。 + * 必选:否
+ + * 默认值:无
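下面给出一个包含 schema 与 where 的 reader 参数片段示意(zk 地址为占位符,schema、表名沿用上文样例中的 TAG 与 US_POPULATION,where 中的列名仅为示意):

```json
"reader": {
    "name": "hbase11xsqlreader",
    "parameter": {
        "hbaseConfig": {
            "hbase.zookeeper.quorum": "zk-host1,zk-host2,zk-host3"
        },
        "schema": "TAG",
        "table": "US_POPULATION",
        "column": [],
        "where": "STATE = 'CA'"
    }
}
```

where 中的条件会作为过滤条件附加到 Phoenix 的查询上;不配置 where 时读取全表。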
### 3.3 类型转换 @@ -172,11 +189,14 @@ hbase11xsqlreader插件实现了从Phoenix(HBase SQL)读取数据。在底层实 "hbaseConfig": { "hbase.zookeeper.quorum": "hb-proxy-xxx-002.hbase.rds.aliyuncs.com,hb-proxy-xxx-001.hbase.rds.aliyuncs.com,hb-proxy-xxx-003.hbase.rds.aliyuncs.com" }, + "schema": "TAG", //填写要读取的phoenix的表名 "table": "US_POPULATION", //填写要读取的列名,不填读取所有列 "column": [ - ] + ], + //查询条件 + "where": "id=" } }, "writer": { @@ -204,7 +224,13 @@ hbase11xsqlreader插件实现了从Phoenix(HBase SQL)读取数据。在底层实 * 必选:是
* 默认值:无
- +* **schema** + + * 描述:编写Phoenix中的namespace,该值设置为'' + + * 必选:是
+ + * 默认值:无
* **table** * 描述:编写Phoenix中的表名,如果有namespace,该值设置为'namespace.tablename' @@ -220,7 +246,13 @@ hbase11xsqlreader插件实现了从Phoenix(HBase SQL)读取数据。在底层实 * 必选:是
* 默认值:无
+ * **where** + * 描述:填写需要从phoenix表中读取条件判断。 + + * 可选:是
+ + * 默认值:无
### 3.3 类型转换 diff --git a/hbase11xsqlreader/src/main/java/com/alibaba/datax/plugin/reader/hbase11xsqlreader/HbaseSQLHelper.java b/hbase11xsqlreader/src/main/java/com/alibaba/datax/plugin/reader/hbase11xsqlreader/HbaseSQLHelper.java index 5309d1d9..cf4304ee 100644 --- a/hbase11xsqlreader/src/main/java/com/alibaba/datax/plugin/reader/hbase11xsqlreader/HbaseSQLHelper.java +++ b/hbase11xsqlreader/src/main/java/com/alibaba/datax/plugin/reader/hbase11xsqlreader/HbaseSQLHelper.java @@ -2,8 +2,8 @@ package com.alibaba.datax.plugin.reader.hbase11xsqlreader; import com.alibaba.datax.common.exception.DataXException; import com.alibaba.datax.common.util.Configuration; -import com.alibaba.fastjson.JSON; -import com.alibaba.fastjson.TypeReference; +import com.alibaba.fastjson2.JSON; +import com.alibaba.fastjson2.TypeReference; import org.apache.hadoop.hbase.HConstants; import org.apache.hadoop.hbase.util.Pair; import org.apache.hadoop.mapreduce.InputSplit; @@ -26,9 +26,7 @@ import java.io.IOException; import java.sql.Connection; import java.sql.DriverManager; import java.sql.SQLException; -import java.util.ArrayList; -import java.util.List; -import java.util.Map; +import java.util.*; public class HbaseSQLHelper { @@ -50,11 +48,15 @@ public class HbaseSQLHelper { String zkUrl = readerConfig.getZkUrl(); PhoenixConfigurationUtil.setInputClass(conf, PhoenixRecordWritable.class); - PhoenixConfigurationUtil.setInputTableName(conf, table); + + PhoenixConfigurationUtil.setInputTableName(conf, readerConfig.getSchema()+"."+table); if (!columns.isEmpty()) { PhoenixConfigurationUtil.setSelectColumnNames(conf, columns.toArray(new String[columns.size()])); } + if(Objects.nonNull(readerConfig.getWhere())){ + PhoenixConfigurationUtil.setInputTableConditions(conf,readerConfig.getWhere()); + } PhoenixEmbeddedDriver.ConnectionInfo info = null; try { info = PhoenixEmbeddedDriver.ConnectionInfo.create(zkUrl); @@ -67,15 +69,19 @@ public class HbaseSQLHelper { conf.setInt(HConstants.ZOOKEEPER_CLIENT_PORT, info.getPort()); if (info.getRootNode() != null) conf.set(HConstants.ZOOKEEPER_ZNODE_PARENT, info.getRootNode()); + conf.set(Key.NAME_SPACE_MAPPING_ENABLED,"true"); + conf.set(Key.SYSTEM_TABLES_TO_NAMESPACE,"true"); return conf; } - public static List getPColumnNames(String connectionString, String tableName) throws SQLException { - Connection con = - DriverManager.getConnection(connectionString); + public static List getPColumnNames(String connectionString, String tableName,String schema) throws SQLException { + Properties pro = new Properties(); + pro.put(Key.NAME_SPACE_MAPPING_ENABLED, true); + pro.put(Key.SYSTEM_TABLES_TO_NAMESPACE, true); + Connection con = DriverManager.getConnection(connectionString,pro); PhoenixConnection phoenixConnection = con.unwrap(PhoenixConnection.class); MetaDataClient metaDataClient = new MetaDataClient(phoenixConnection); - PTable table = metaDataClient.updateCache("", tableName).getTable(); + PTable table = metaDataClient.updateCache(schema, tableName).getTable(); List columnNames = new ArrayList(); for (PColumn pColumn : table.getColumns()) { if (!pColumn.getName().getString().equals(SaltingUtil.SALTING_COLUMN_NAME)) diff --git a/hbase11xsqlreader/src/main/java/com/alibaba/datax/plugin/reader/hbase11xsqlreader/HbaseSQLReaderConfig.java b/hbase11xsqlreader/src/main/java/com/alibaba/datax/plugin/reader/hbase11xsqlreader/HbaseSQLReaderConfig.java index ab06f6e1..37060986 100644 --- a/hbase11xsqlreader/src/main/java/com/alibaba/datax/plugin/reader/hbase11xsqlreader/HbaseSQLReaderConfig.java 
+++ b/hbase11xsqlreader/src/main/java/com/alibaba/datax/plugin/reader/hbase11xsqlreader/HbaseSQLReaderConfig.java @@ -9,6 +9,7 @@ import org.slf4j.LoggerFactory; import java.sql.SQLException; import java.util.List; +import java.util.StringJoiner; public class HbaseSQLReaderConfig { private final static Logger LOG = LoggerFactory.getLogger(HbaseSQLReaderConfig.class); @@ -27,6 +28,9 @@ public class HbaseSQLReaderConfig { private String tableName; private List columns; // 目的表的所有列的列名,包括主键和非主键,不包括时间列 + private String where;//条件 + + private String schema;// /** * @return 获取原始的datax配置 */ @@ -96,22 +100,27 @@ public class HbaseSQLReaderConfig { } String zkQuorum = zkCfg.getFirst(); String znode = zkCfg.getSecond(); + if (zkQuorum == null || zkQuorum.isEmpty()) { throw DataXException.asDataXException( HbaseSQLReaderErrorCode.ILLEGAL_VALUE, "HBase的hbase.zookeeper.quorum配置不能为空" ); } // 生成sql使用的连接字符串, 格式: jdbc:hbase:zk_quorum:2181:/znode_parent - cfg.connectionString = "jdbc:phoenix:" + zkQuorum; - cfg.zkUrl = zkQuorum + ":2181"; + StringBuilder connectionString=new StringBuilder("jdbc:phoenix:"); + connectionString.append(zkQuorum); + cfg.connectionString = connectionString.toString(); + StringBuilder zkUrl =new StringBuilder(zkQuorum); + cfg.zkUrl = zkUrl.append(":2181").toString(); if (!znode.isEmpty()) { - cfg.connectionString += cfg.connectionString + ":" + znode; - cfg.zkUrl += cfg.zkUrl + ":" + znode; + cfg.connectionString = connectionString.append(":").append(znode).toString(); + cfg.zkUrl=zkUrl.append(":").append(znode).toString(); } } private static void parseTableConfig(HbaseSQLReaderConfig cfg, Configuration dataxCfg) { // 解析并检查表名 cfg.tableName = dataxCfg.getString(Key.TABLE); + cfg.schema = dataxCfg.getString(Key.SCHEMA); if (cfg.tableName == null || cfg.tableName.isEmpty()) { throw DataXException.asDataXException( HbaseSQLReaderErrorCode.ILLEGAL_VALUE, "HBase的tableName配置不能为空,请检查并修改配置." ); @@ -124,13 +133,14 @@ public class HbaseSQLReaderConfig { HbaseSQLReaderErrorCode.ILLEGAL_VALUE, "您配置的tableName含有非法字符{0},请检查您的配置."); } else if (cfg.columns.isEmpty()) { try { - cfg.columns = HbaseSQLHelper.getPColumnNames(cfg.connectionString, cfg.tableName); + cfg.columns = HbaseSQLHelper.getPColumnNames(cfg.connectionString, cfg.tableName,cfg.schema); dataxCfg.set(Key.COLUMN, cfg.columns); } catch (SQLException e) { throw DataXException.asDataXException( HbaseSQLReaderErrorCode.GET_PHOENIX_COLUMN_ERROR, "HBase的columns配置不能为空,请添加目标表的列名配置." 
+ e.getMessage(), e); } } + cfg.where=dataxCfg.getString(Key.WHERE); } @Override @@ -151,6 +161,8 @@ public class HbaseSQLReaderConfig { ret.append(","); } ret.setLength(ret.length() - 1); + ret.append("[where=]").append(getWhere()); + ret.append("[schema=]").append(getSchema()); ret.append("\n"); return ret.toString(); @@ -161,4 +173,20 @@ public class HbaseSQLReaderConfig { */ private HbaseSQLReaderConfig() { } + + public String getWhere() { + return where; + } + + public void setWhere(String where) { + this.where = where; + } + + public String getSchema() { + return schema; + } + + public void setSchema(String schema) { + this.schema = schema; + } } diff --git a/hbase11xsqlreader/src/main/java/com/alibaba/datax/plugin/reader/hbase11xsqlreader/HbaseSQLReaderTask.java b/hbase11xsqlreader/src/main/java/com/alibaba/datax/plugin/reader/hbase11xsqlreader/HbaseSQLReaderTask.java index 1ca22c6f..461649d1 100644 --- a/hbase11xsqlreader/src/main/java/com/alibaba/datax/plugin/reader/hbase11xsqlreader/HbaseSQLReaderTask.java +++ b/hbase11xsqlreader/src/main/java/com/alibaba/datax/plugin/reader/hbase11xsqlreader/HbaseSQLReaderTask.java @@ -19,10 +19,8 @@ import org.slf4j.LoggerFactory; import java.io.IOException; import java.math.BigDecimal; import java.sql.*; -import java.util.HashMap; -import java.util.LinkedHashMap; -import java.util.List; -import java.util.Map; +import java.sql.Date; +import java.util.*; /** * Created by admin on 1/3/18. @@ -42,11 +40,14 @@ public class HbaseSQLReaderTask { } private void getPColumns() throws SQLException { + Properties pro = new Properties(); + pro.put(Key.NAME_SPACE_MAPPING_ENABLED, true); + pro.put(Key.SYSTEM_TABLES_TO_NAMESPACE, true); Connection con = - DriverManager.getConnection(this.readerConfig.getConnectionString()); + DriverManager.getConnection(this.readerConfig.getConnectionString(),pro); PhoenixConnection phoenixConnection = con.unwrap(PhoenixConnection.class); MetaDataClient metaDataClient = new MetaDataClient(phoenixConnection); - PTable table = metaDataClient.updateCache("", this.readerConfig.getTableName()).getTable(); + PTable table = metaDataClient.updateCache(this.readerConfig.getSchema(), this.readerConfig.getTableName()).getTable(); List columnNames = this.readerConfig.getColumns(); for (PColumn pColumn : table.getColumns()) { if (columnNames.contains(pColumn.getName().getString())) { diff --git a/hbase11xsqlreader/src/main/java/com/alibaba/datax/plugin/reader/hbase11xsqlreader/Key.java b/hbase11xsqlreader/src/main/java/com/alibaba/datax/plugin/reader/hbase11xsqlreader/Key.java index 7987d6c8..f8453add 100644 --- a/hbase11xsqlreader/src/main/java/com/alibaba/datax/plugin/reader/hbase11xsqlreader/Key.java +++ b/hbase11xsqlreader/src/main/java/com/alibaba/datax/plugin/reader/hbase11xsqlreader/Key.java @@ -24,5 +24,18 @@ public final class Key { * 【必选】列配置 */ public final static String COLUMN = "column"; + /** + * + */ + public static final String WHERE = "where"; + + /** + * 【可选】Phoenix表所属schema,默认为空 + */ + public static final String SCHEMA = "schema"; + + public static final String NAME_SPACE_MAPPING_ENABLED = "phoenix.schema.isNamespaceMappingEnabled"; + + public static final String SYSTEM_TABLES_TO_NAMESPACE = "phoenix.schema.mapSystemTablesToNamespace"; } diff --git a/hbase11xsqlwriter/src/main/java/com/alibaba/datax/plugin/writer/hbase11xsqlwriter/HbaseSQLHelper.java b/hbase11xsqlwriter/src/main/java/com/alibaba/datax/plugin/writer/hbase11xsqlwriter/HbaseSQLHelper.java index 41e57d4e..d1b23fdf 100644 --- 
a/hbase11xsqlwriter/src/main/java/com/alibaba/datax/plugin/writer/hbase11xsqlwriter/HbaseSQLHelper.java +++ b/hbase11xsqlwriter/src/main/java/com/alibaba/datax/plugin/writer/hbase11xsqlwriter/HbaseSQLHelper.java @@ -2,8 +2,8 @@ package com.alibaba.datax.plugin.writer.hbase11xsqlwriter; import com.alibaba.datax.common.exception.DataXException; import com.alibaba.datax.common.util.Configuration; -import com.alibaba.fastjson.JSON; -import com.alibaba.fastjson.TypeReference; +import com.alibaba.fastjson2.JSON; +import com.alibaba.fastjson2.TypeReference; import org.apache.hadoop.hbase.TableName; import org.apache.hadoop.hbase.client.Admin; import org.apache.hadoop.hbase.util.Pair; diff --git a/hbase11xwriter/src/main/java/com/alibaba/datax/plugin/writer/hbase11xwriter/Hbase11xHelper.java b/hbase11xwriter/src/main/java/com/alibaba/datax/plugin/writer/hbase11xwriter/Hbase11xHelper.java index 94b13b60..2889b647 100644 --- a/hbase11xwriter/src/main/java/com/alibaba/datax/plugin/writer/hbase11xwriter/Hbase11xHelper.java +++ b/hbase11xwriter/src/main/java/com/alibaba/datax/plugin/writer/hbase11xwriter/Hbase11xHelper.java @@ -2,8 +2,8 @@ package com.alibaba.datax.plugin.writer.hbase11xwriter; import com.alibaba.datax.common.exception.DataXException; import com.alibaba.datax.common.util.Configuration; -import com.alibaba.fastjson.JSON; -import com.alibaba.fastjson.TypeReference; +import com.alibaba.fastjson2.JSON; +import com.alibaba.fastjson2.TypeReference; import org.apache.commons.lang3.StringUtils; import org.apache.commons.lang3.Validate; import org.apache.hadoop.hbase.HBaseConfiguration; diff --git a/hbase20xsqlreader/src/main/java/com/alibaba/datax/plugin/reader/hbase20xsqlreader/HBase20SQLReaderHelper.java b/hbase20xsqlreader/src/main/java/com/alibaba/datax/plugin/reader/hbase20xsqlreader/HBase20SQLReaderHelper.java index 0edc993f..11bbf734 100644 --- a/hbase20xsqlreader/src/main/java/com/alibaba/datax/plugin/reader/hbase20xsqlreader/HBase20SQLReaderHelper.java +++ b/hbase20xsqlreader/src/main/java/com/alibaba/datax/plugin/reader/hbase20xsqlreader/HBase20SQLReaderHelper.java @@ -175,7 +175,7 @@ public class HBase20SQLReaderHelper { if (querySql == null || querySql.isEmpty()) { // 如果splitPoints为空,则根据splitKey自动切分,不过这种切分方式无法保证数据均分,且只支持整形和字符型列 if (splitPoints == null || splitPoints.isEmpty()) { - LOG.info("Split accoring min and max value of splitColumn..."); + LOG.info("Split according min and max value of splitColumn..."); Pair minMaxPK = getPkRange(configuration); if (null == minMaxPK) { throw DataXException.asDataXException(HBase20xSQLReaderErrorCode.ILLEGAL_SPLIT_PK, @@ -208,7 +208,7 @@ public class HBase20SQLReaderHelper { } } else { - LOG.info("Split accoring splitPoints..."); + LOG.info("Split according splitPoints..."); // 根据指定splitPoints进行切分 rangeList = buildSplitRange(); } diff --git a/hbase20xsqlreader/src/main/resources/plugin.json b/hbase20xsqlreader/src/main/resources/plugin.json index 45856411..4a7b4edf 100644 --- a/hbase20xsqlreader/src/main/resources/plugin.json +++ b/hbase20xsqlreader/src/main/resources/plugin.json @@ -2,6 +2,6 @@ "name": "hbase20xsqlreader", "class": "com.alibaba.datax.plugin.reader.hbase20xsqlreader.HBase20xSQLReader", "description": "useScene: prod. 
mechanism: read data from phoenix through queryserver.", - "developer": "bake" + "developer": "alibaba" } diff --git a/hbase20xsqlwriter/src/main/resources/plugin.json b/hbase20xsqlwriter/src/main/resources/plugin.json index 91b7069f..93d3002a 100755 --- a/hbase20xsqlwriter/src/main/resources/plugin.json +++ b/hbase20xsqlwriter/src/main/resources/plugin.json @@ -2,6 +2,6 @@ "name": "hbase20xsqlwriter", "class": "com.alibaba.datax.plugin.writer.hbase20xsqlwriter.HBase20xSQLWriter", "description": "useScene: prod. mechanism: use hbase sql UPSERT to put data, index tables will be updated too.", - "developer": "bake" + "developer": "alibaba" } diff --git a/hdfsreader/doc/hdfsreader.md b/hdfsreader/doc/hdfsreader.md index cd83c530..ca9a021f 100644 --- a/hdfsreader/doc/hdfsreader.md +++ b/hdfsreader/doc/hdfsreader.md @@ -166,20 +166,20 @@ HdfsReader实现了从Hadoop分布式文件系统Hdfs中读取文件数据并转 默认情况下,用户可以全部按照String类型读取数据,配置如下: ```json - "column": ["*"] + "column": ["*"] ``` 用户可以指定Column字段信息,配置如下: ```json -{ - "type": "long", - "index": 0 //从本地文件文本第一列获取int字段 -}, -{ - "type": "string", - "value": "alibaba" //HdfsReader内部生成alibaba的字符串字段作为当前字段 -} + { + "type": "long", + "index": 0 //从本地文件文本第一列获取int字段 + }, + { + "type": "string", + "value": "alibaba" //HdfsReader内部生成alibaba的字符串字段作为当前字段 + } ``` 对于用户指定Column信息,type必须填写,index/value必须选择其一。 diff --git a/hdfsreader/pom.xml b/hdfsreader/pom.xml index a5c2da2c..de7c0e21 100644 --- a/hdfsreader/pom.xml +++ b/hdfsreader/pom.xml @@ -1,5 +1,6 @@ - + datax-all com.alibaba.datax @@ -111,6 +112,42 @@ ${datax-project-version}
+ + org.apache.parquet + parquet-column + 1.12.0 + + + org.apache.parquet + parquet-avro + 1.12.0 + + + org.apache.parquet + parquet-common + 1.12.0 + + + org.apache.parquet + parquet-format + 2.3.0 + + + org.apache.parquet + parquet-jackson + 1.12.0 + + + org.apache.parquet + parquet-encoding + 1.12.0 + + + org.apache.parquet + parquet-hadoop + 1.12.0 + + diff --git a/hdfsreader/src/main/assembly/package.xml b/hdfsreader/src/main/assembly/package.xml index 3f1393b7..a5f28e5c 100644 --- a/hdfsreader/src/main/assembly/package.xml +++ b/hdfsreader/src/main/assembly/package.xml @@ -37,6 +37,28 @@ + + + + + + + + + + src/main/libs + + *.* + + plugin/reader/ossreader/libs + + + src/main/libs + + *.* + + plugin/reader/hivereader/libs + diff --git a/hdfsreader/src/main/java/com/alibaba/datax/plugin/reader/hdfsreader/Constant.java b/hdfsreader/src/main/java/com/alibaba/datax/plugin/reader/hdfsreader/Constant.java index 6bfb9bf7..061c55a0 100644 --- a/hdfsreader/src/main/java/com/alibaba/datax/plugin/reader/hdfsreader/Constant.java +++ b/hdfsreader/src/main/java/com/alibaba/datax/plugin/reader/hdfsreader/Constant.java @@ -10,4 +10,5 @@ public class Constant { public static final String CSV = "CSV"; public static final String SEQ = "SEQ"; public static final String RC = "RC"; + public static final String PARQUET = "PARQUET"; } diff --git a/hdfsreader/src/main/java/com/alibaba/datax/plugin/reader/hdfsreader/DFSUtil.java b/hdfsreader/src/main/java/com/alibaba/datax/plugin/reader/hdfsreader/DFSUtil.java index c39d3847..720f8bf6 100644 --- a/hdfsreader/src/main/java/com/alibaba/datax/plugin/reader/hdfsreader/DFSUtil.java +++ b/hdfsreader/src/main/java/com/alibaba/datax/plugin/reader/hdfsreader/DFSUtil.java @@ -8,13 +8,17 @@ import com.alibaba.datax.common.util.Configuration; import com.alibaba.datax.plugin.unstructuredstorage.reader.ColumnEntry; import com.alibaba.datax.plugin.unstructuredstorage.reader.UnstructuredStorageReaderErrorCode; import com.alibaba.datax.plugin.unstructuredstorage.reader.UnstructuredStorageReaderUtil; -import com.alibaba.fastjson.JSON; -import com.alibaba.fastjson.JSONObject; +import com.alibaba.fastjson2.JSON; +import com.alibaba.fastjson2.JSONArray; +import com.alibaba.fastjson2.JSONObject; +import org.apache.commons.lang3.BooleanUtils; import org.apache.commons.lang3.StringUtils; +import org.apache.commons.lang3.exception.ExceptionUtils; import org.apache.hadoop.fs.FSDataInputStream; import org.apache.hadoop.fs.FileStatus; import org.apache.hadoop.fs.FileSystem; import org.apache.hadoop.fs.Path; +import org.apache.hadoop.hive.common.type.HiveDecimal; import org.apache.hadoop.hive.ql.io.RCFile; import org.apache.hadoop.hive.ql.io.RCFileRecordReader; import org.apache.hadoop.hive.ql.io.orc.OrcFile; @@ -29,14 +33,30 @@ import org.apache.hadoop.io.*; import org.apache.hadoop.mapred.*; import org.apache.hadoop.security.UserGroupInformation; import org.apache.hadoop.util.ReflectionUtils; +import org.apache.parquet.example.data.Group; +import org.apache.parquet.hadoop.ParquetReader; +import org.apache.parquet.hadoop.example.GroupReadSupport; +import org.apache.parquet.hadoop.util.HadoopInputFile; +import org.apache.parquet.io.api.Binary; +import org.apache.parquet.schema.MessageType; +import org.apache.parquet.schema.MessageTypeParser; +import org.apache.parquet.schema.PrimitiveType; import org.slf4j.Logger; import org.slf4j.LoggerFactory; import java.io.IOException; import java.io.InputStream; import java.nio.ByteBuffer; +import java.nio.ByteOrder; +import java.sql.Timestamp; 
import java.text.SimpleDateFormat; +import java.time.LocalDate; +import java.time.LocalDateTime; +import java.time.LocalTime; import java.util.*; +import java.util.concurrent.ConcurrentHashMap; +import java.util.concurrent.TimeUnit; +import java.util.stream.Collectors; /** * Created by mingya.wmy on 2015/8/12. @@ -56,6 +76,10 @@ public class DFSUtil { public static final String HDFS_DEFAULTFS_KEY = "fs.defaultFS"; public static final String HADOOP_SECURITY_AUTHENTICATION_KEY = "hadoop.security.authentication"; + private Boolean skipEmptyOrcFile = false; + + private Integer orcFileEmptySize = null; + public DFSUtil(Configuration taskConfig) { hadoopConf = new org.apache.hadoop.conf.Configuration(); @@ -79,6 +103,7 @@ public class DFSUtil { this.hadoopConf.set(HADOOP_SECURITY_AUTHENTICATION_KEY, "kerberos"); } this.kerberosAuthentication(this.kerberosPrincipal, this.kerberosKeytabFilePath); + this.skipEmptyOrcFile = taskConfig.getBool(Key.SKIP_EMPTY_ORCFILE, false); LOG.info(String.format("hadoopConfig details:%s", JSON.toJSONString(this.hadoopConf))); } @@ -102,10 +127,11 @@ public class DFSUtil { * @param srcPaths 路径列表 * @param specifiedFileType 指定文件类型 */ - public HashSet getAllFiles(List srcPaths, String specifiedFileType) { + public HashSet getAllFiles(List srcPaths, String specifiedFileType, Boolean skipEmptyOrcFile, Integer orcFileEmptySize) { this.specifiedFileType = specifiedFileType; - + this.skipEmptyOrcFile = skipEmptyOrcFile; + this.orcFileEmptySize = orcFileEmptySize; if (!srcPaths.isEmpty()) { for (String eachPath : srcPaths) { LOG.info(String.format("get HDFS all files in path = [%s]", eachPath)); @@ -127,9 +153,13 @@ public class DFSUtil { FileStatus stats[] = hdfs.globStatus(path); for (FileStatus f : stats) { if (f.isFile()) { - if (f.getLen() == 0) { + long fileLength = f.getLen(); + if (fileLength == 0) { String message = String.format("文件[%s]长度为0,将会跳过不作处理!", hdfsPath); LOG.warn(message); + } else if (BooleanUtils.isTrue(this.skipEmptyOrcFile) && this.orcFileEmptySize != null && fileLength <= this.orcFileEmptySize) { + String message = String.format("The orc file [%s] is empty, file size: %s, DataX will skip it !", f.getPath().toString(), fileLength); + LOG.warn(message); } else { addSourceFileByType(f.getPath().toString()); } @@ -167,7 +197,16 @@ public class DFSUtil { LOG.info(String.format("[%s] 是目录, 递归获取该目录下的文件", f.getPath().toString())); getHDFSAllFilesNORegex(f.getPath().toString(), hdfs); } else if (f.isFile()) { - + long fileLength = f.getLen(); + if (fileLength == 0) { + String message = String.format("The file [%s] is empty, DataX will skip it !", f.getPath().toString()); + LOG.warn(message); + continue; + } else if (BooleanUtils.isTrue(this.skipEmptyOrcFile) && this.orcFileEmptySize != null && fileLength <= this.orcFileEmptySize) { + String message = String.format("The orc file [%s] is empty, file size: %s, DataX will skip it !", f.getPath().toString(), fileLength); + LOG.warn(message); + continue; + } addSourceFileByType(f.getPath().toString()); } else { String message = String.format("该路径[%s]文件类型既不是目录也不是文件,插件自动忽略。", @@ -331,26 +370,45 @@ public class DFSUtil { //If the network disconnected, will retry 45 times, each time the retry interval for 20 seconds //Each file as a split //TODO multy threads - InputSplit[] splits = in.getSplits(conf, 1); - - RecordReader reader = in.getRecordReader(splits[0], conf, Reporter.NULL); - Object key = reader.createKey(); - Object value = reader.createValue(); - // 获取列信息 - List fields = inspector.getAllStructFieldRefs(); - - 
List recordFields; - while (reader.next(key, value)) { - recordFields = new ArrayList(); - - for (int i = 0; i <= columnIndexMax; i++) { - Object field = inspector.getStructFieldData(value, fields.get(i)); - recordFields.add(field); + // OrcInputFormat getSplits params numSplits not used, splits size = block numbers + InputSplit[] splits; + try { + splits = in.getSplits(conf, 1); + } catch (Exception splitException) { + if (Boolean.TRUE.equals(this.skipEmptyOrcFile)) { + boolean isOrcFileEmptyException = checkIsOrcEmptyFileExecption(splitException); + if (isOrcFileEmptyException) { + LOG.info("skipEmptyOrcFile: true, \"{}\" is an empty orc file, skip it!", sourceOrcFilePath); + return; + } + } + throw splitException; + } + for (InputSplit split : splits) { + { + RecordReader reader = in.getRecordReader(split, conf, Reporter.NULL); + Object key = reader.createKey(); + Object value = reader.createValue(); + // 获取列信息 + List fields = inspector.getAllStructFieldRefs(); + + List recordFields; + while (reader.next(key, value)) { + recordFields = new ArrayList(); + + for (int i = 0; i <= columnIndexMax; i++) { + Object field = inspector.getStructFieldData(value, fields.get(i)); + recordFields.add(field); + } + List hivePartitionColumnEntrys = UnstructuredStorageReaderUtil.getListColumnEntry(readerSliceConfig, com.alibaba.datax.plugin.unstructuredstorage.reader.Key.HIVE_PARTION_COLUMN); + ArrayList hivePartitionColumns = new ArrayList<>(); + hivePartitionColumns = UnstructuredStorageReaderUtil.getHivePartitionColumns(sourceOrcFilePath, hivePartitionColumnEntrys); + transportOneRecord(column, recordFields, recordSender, + taskPluginCollector, isReadAllColumns, nullFormat,hivePartitionColumns); + } + reader.close(); } - transportOneRecord(column, recordFields, recordSender, - taskPluginCollector, isReadAllColumns, nullFormat); } - reader.close(); } catch (Exception e) { String message = String.format("从orcfile文件路径[%s]中读取数据发生异常,请联系系统管理员。" , sourceOrcFilePath); @@ -363,8 +421,20 @@ public class DFSUtil { } } + private boolean checkIsOrcEmptyFileExecption(Exception e) { + if (e == null) { + return false; + } + + String fullStackTrace = ExceptionUtils.getStackTrace(e); + if (fullStackTrace.contains("org.apache.orc.impl.ReaderImpl.getRawDataSizeOfColumn") && fullStackTrace.contains("Caused by: java.lang.IndexOutOfBoundsException: Index: 1, Size: 1")) { + return true; + } + return false; + } + private Record transportOneRecord(List columnConfigs, List recordFields - , RecordSender recordSender, TaskPluginCollector taskPluginCollector, boolean isReadAllColumns, String nullFormat) { + , RecordSender recordSender, TaskPluginCollector taskPluginCollector, boolean isReadAllColumns, String nullFormat, ArrayList hiveParitionColumns) { Record record = recordSender.createRecord(); Column columnGenerated; try { @@ -551,8 +621,9 @@ public class DFSUtil { } else if (StringUtils.equalsIgnoreCase(specifiedFileType, Constant.SEQ)) { return isSequenceFile(filepath, in); + } else if (StringUtils.equalsIgnoreCase(specifiedFileType, Constant.PARQUET)) { + return true; } - } catch (Exception e) { String message = String.format("检查文件[%s]类型失败,目前支持ORC,SEQUENCE,RCFile,TEXT,CSV五种格式的文件," + "请检查您文件类型和文件是否正确。", filepath); @@ -689,4 +760,332 @@ public class DFSUtil { return false; } + public void parquetFileStartRead(String sourceParquetFilePath, Configuration readerSliceConfig, RecordSender recordSender, TaskPluginCollector taskPluginCollector) { + String schemaString = readerSliceConfig.getString(Key.PARQUET_SCHEMA); + if 
(StringUtils.isNotBlank(schemaString)) { + LOG.info("You config parquet schema, use it {}", schemaString); + } else { + schemaString = getParquetSchema(sourceParquetFilePath, hadoopConf); + LOG.info("Parquet schema parsed from: {} , schema is {}", sourceParquetFilePath, schemaString); + if (StringUtils.isBlank(schemaString)) { + throw DataXException.asDataXException("ParquetSchema is required, please check your config"); + } + } + MessageType parquetSchema = null; + List parquetTypes = null; + Map parquetMetaMap = null; + int fieldCount = 0; + try { + parquetSchema = MessageTypeParser.parseMessageType(schemaString); + fieldCount = parquetSchema.getFieldCount(); + parquetTypes = parquetSchema.getFields(); + parquetMetaMap = ParquetMessageHelper.parseParquetTypes(parquetTypes); + } catch (Exception e) { + String message = String.format("Error parsing to MessageType via Schema string [%s]", schemaString); + LOG.error(message); + throw DataXException.asDataXException(HdfsReaderErrorCode.PARSE_MESSAGE_TYPE_FROM_SCHEMA_ERROR, e); + } + List column = UnstructuredStorageReaderUtil.getListColumnEntry(readerSliceConfig, com.alibaba.datax.plugin.unstructuredstorage.reader.Key.COLUMN); + String nullFormat = readerSliceConfig.getString(com.alibaba.datax.plugin.unstructuredstorage.reader.Key.NULL_FORMAT); + boolean isUtcTimestamp = readerSliceConfig.getBool(Key.PARQUET_UTC_TIMESTAMP, false); + boolean isReadAllColumns = (column == null || column.size() == 0) ? true : false; + LOG.info("ReadingAllColums: " + isReadAllColumns); + + /** + * 支持 hive 表中间加列场景 + * + * 开关默认 false,在 hive表存在中间加列的场景打开,需要根据 name排序 + * 不默认打开的原因 + * 1、存量hdfs任务,只根据 index获取字段,无name字段配置 + * 2、中间加列场景比较少 + * 3、存量任务可能存在列错位的问题,不能随意纠正 + */ + boolean supportAddMiddleColumn = readerSliceConfig.getBool(Key.SUPPORT_ADD_MIDDLE_COLUMN, false); + + boolean printNullValueException = readerSliceConfig.getBool("printNullValueException", false); + List ignoreIndex = readerSliceConfig.getList("ignoreIndex", new ArrayList(), Integer.class); + + JobConf conf = new JobConf(hadoopConf); + ParquetReader reader = null; + try { + Path parquetFilePath = new Path(sourceParquetFilePath); + GroupReadSupport readSupport = new GroupReadSupport(); + readSupport.init(conf, null, parquetSchema); + // 这里初始化parquetReader的时候,会getFileSystem,如果是HA集群,期间会根据hadoopConfig中区加载failover类,这里初始化builder带上conf + ParquetReader.Builder parquetReaderBuilder = ParquetReader.builder(readSupport, parquetFilePath); + parquetReaderBuilder.withConf(hadoopConf); + reader = parquetReaderBuilder.build(); + Group g = null; + + // 从文件名中解析分区信息 + List hivePartitionColumnEntrys = UnstructuredStorageReaderUtil.getListColumnEntry(readerSliceConfig, com.alibaba.datax.plugin.unstructuredstorage.reader.Key.HIVE_PARTION_COLUMN); + ArrayList hivePartitionColumns = new ArrayList<>(); + hivePartitionColumns = UnstructuredStorageReaderUtil.getHivePartitionColumns(sourceParquetFilePath, hivePartitionColumnEntrys); + List schemaFieldList = null; + Map colNameIndexMap = null; + Map indexMap = null; + if (supportAddMiddleColumn) { + boolean nonName = column.stream().anyMatch(columnEntry -> StringUtils.isEmpty(columnEntry.getName())); + if (nonName) { + throw new DataXException("You configured column item without name, please correct it"); + } + List parquetFileFields = getParquetFileFields(parquetFilePath, hadoopConf); + schemaFieldList = parquetFileFields.stream().map(org.apache.parquet.schema.Type::getName).collect(Collectors.toList()); + colNameIndexMap = new ConcurrentHashMap<>(); + Map finalColNameIndexMap = 
colNameIndexMap; + column.forEach(columnEntry -> finalColNameIndexMap.put(columnEntry.getIndex(), columnEntry.getName())); + Iterator> iterator = finalColNameIndexMap.entrySet().iterator(); + while (iterator.hasNext()) { + Map.Entry next = iterator.next(); + if (!schemaFieldList.contains(next.getValue())) { + finalColNameIndexMap.remove((next.getKey())); + } + } + LOG.info("SupportAddMiddleColumn is true, fields from parquet file is {}, " + + "colNameIndexMap is {}", JSON.toJSONString(schemaFieldList), JSON.toJSONString(colNameIndexMap)); + fieldCount = column.size(); + indexMap = new HashMap<>(); + for (int j = 0; j < fieldCount; j++) { + if (colNameIndexMap.containsKey(j)) { + int index = findIndex(schemaFieldList, findEleInMap(colNameIndexMap, j)); + indexMap.put(j, index); + } + } + } + while ((g = reader.read()) != null) { + List formattedRecord = new ArrayList(fieldCount); + try { + for (int j = 0; j < fieldCount; j++) { + Object data = null; + try { + if (null != ignoreIndex && !ignoreIndex.isEmpty() && ignoreIndex.contains(j)) { + data = null; + } else { + if (supportAddMiddleColumn) { + if (!colNameIndexMap.containsKey(j)) { + formattedRecord.add(null); + continue; + } else { + data = DFSUtil.this.readFields(g, parquetTypes.get(indexMap.get(j)), indexMap.get(j), parquetMetaMap, isUtcTimestamp); + } + } else { + data = DFSUtil.this.readFields(g, parquetTypes.get(j), j, parquetMetaMap, isUtcTimestamp); + } + } + } catch (RuntimeException e) { + if (printNullValueException) { + LOG.warn(e.getMessage()); + } + } + formattedRecord.add(data); + } + transportOneRecord(column, formattedRecord, recordSender, taskPluginCollector, isReadAllColumns, nullFormat, hivePartitionColumns); + } catch (Exception e) { + throw DataXException.asDataXException(HdfsReaderErrorCode.READ_PARQUET_ERROR, e); + } + } + } catch (Exception e) { + throw DataXException.asDataXException(HdfsReaderErrorCode.READ_PARQUET_ERROR, e); + } finally { + org.apache.commons.io.IOUtils.closeQuietly(reader); + } + } + + private String findEleInMap(Map map, Integer key) { + Iterator> iterator = map.entrySet().iterator(); + while (iterator.hasNext()) { + Map.Entry next = iterator.next(); + if (key.equals(next.getKey())) { + return next.getValue(); + } + } + return null; + } + + private int findIndex(List schemaFieldList, String colName) { + for (int i = 0; i < schemaFieldList.size(); i++) { + if (schemaFieldList.get(i).equals(colName)) { + return i; + } + } + return -1; + } + + private List getParquetFileFields(Path filePath, org.apache.hadoop.conf.Configuration configuration) { + try (org.apache.parquet.hadoop.ParquetFileReader reader = org.apache.parquet.hadoop.ParquetFileReader.open(HadoopInputFile.fromPath(filePath, configuration))) { + org.apache.parquet.schema.MessageType schema = reader.getFooter().getFileMetaData().getSchema(); + List fields = schema.getFields(); + return fields; + } catch (IOException e) { + LOG.error("Fetch parquet field error", e); + throw new DataXException(String.format("Fetch parquet field error, msg is %s", e.getMessage())); + } + } + + private String getParquetSchema(String sourceParquetFilePath, org.apache.hadoop.conf.Configuration hadoopConf) { + GroupReadSupport readSupport = new GroupReadSupport(); + ParquetReader.Builder parquetReaderBuilder = ParquetReader.builder(readSupport, new Path(sourceParquetFilePath)); + ParquetReader reader = null; + try { + parquetReaderBuilder.withConf(hadoopConf); + reader = parquetReaderBuilder.build(); + Group g = null; + if ((g = reader.read()) != null) { + 
return g.getType().toString(); + } + } catch (Throwable e) { + LOG.error("Inner error, getParquetSchema failed, message is {}", e.getMessage()); + } finally { + org.apache.commons.io.IOUtils.closeQuietly(reader); + } + return null; + } + + /** + * parquet 相关 + */ + private static final int JULIAN_EPOCH_OFFSET_DAYS = 2440588; + private static final long MILLIS_IN_DAY = TimeUnit.DAYS.toMillis(1); + private static final long NANOS_PER_MILLISECOND = TimeUnit.MILLISECONDS.toNanos(1); + + private long julianDayToMillis(int julianDay) { + return (julianDay - JULIAN_EPOCH_OFFSET_DAYS) * MILLIS_IN_DAY; + } + + private org.apache.parquet.schema.OriginalType getOriginalType(org.apache.parquet.schema.Type type, Map parquetMetaMap) { + ParquetMeta meta = parquetMetaMap.get(type.getName()); + return meta.getOriginalType(); + } + + private org.apache.parquet.schema.PrimitiveType asPrimitiveType(org.apache.parquet.schema.Type type, Map parquetMetaMap) { + ParquetMeta meta = parquetMetaMap.get(type.getName()); + return meta.getPrimitiveType(); + } + + private Object readFields(Group g, org.apache.parquet.schema.Type type, int index, Map parquetMetaMap, boolean isUtcTimestamp) { + if (this.getOriginalType(type, parquetMetaMap) == org.apache.parquet.schema.OriginalType.MAP) { + Group groupData = g.getGroup(index, 0); + List parquetTypes = groupData.getType().getFields(); + JSONObject data = new JSONObject(); + for (int i = 0; i < parquetTypes.size(); i++) { + int j = groupData.getFieldRepetitionCount(i); + // map key value 的对数 + for (int k = 0; k < j; k++) { + Group groupDataK = groupData.getGroup(0, k); + List parquetTypesK = groupDataK.getType().getFields(); + if (2 != parquetTypesK.size()) { + // warn: 不是key value成对出现 + throw new RuntimeException(String.format("bad parquet map type: %s", groupData.getValueToString(index, 0))); + } + Object subDataKey = this.readFields(groupDataK, parquetTypesK.get(0), 0, parquetMetaMap, isUtcTimestamp); + Object subDataValue = this.readFields(groupDataK, parquetTypesK.get(1), 1, parquetMetaMap, isUtcTimestamp); + if (StringUtils.equalsIgnoreCase("key", parquetTypesK.get(0).getName())) { + ((JSONObject) data).put(subDataKey.toString(), subDataValue); + } else { + ((JSONObject) data).put(subDataValue.toString(), subDataKey); + } + } + } + return data; + } else if (this.getOriginalType(type, parquetMetaMap) == org.apache.parquet.schema.OriginalType.MAP_KEY_VALUE) { + Group groupData = g.getGroup(index, 0); + List parquetTypes = groupData.getType().getFields(); + JSONObject data = new JSONObject(); + for (int i = 0; i < parquetTypes.size(); i++) { + int j = groupData.getFieldRepetitionCount(i); + // map key value 的对数 + for (int k = 0; k < j; k++) { + Group groupDataK = groupData.getGroup(0, k); + List parquetTypesK = groupDataK.getType().getFields(); + if (2 != parquetTypesK.size()) { + // warn: 不是key value成对出现 + throw new RuntimeException(String.format("bad parquet map type: %s", groupData.getValueToString(index, 0))); + } + Object subDataKey = this.readFields(groupDataK, parquetTypesK.get(0), 0, parquetMetaMap, isUtcTimestamp); + Object subDataValue = this.readFields(groupDataK, parquetTypesK.get(1), 1, parquetMetaMap, isUtcTimestamp); + if (StringUtils.equalsIgnoreCase("key", parquetTypesK.get(0).getName())) { + ((JSONObject) data).put(subDataKey.toString(), subDataValue); + } else { + ((JSONObject) data).put(subDataValue.toString(), subDataKey); + } + } + } + return data; + } else if (this.getOriginalType(type, parquetMetaMap) == 
org.apache.parquet.schema.OriginalType.LIST) { + Group groupData = g.getGroup(index, 0); + List parquetTypes = groupData.getType().getFields(); + JSONArray data = new JSONArray(); + for (int i = 0; i < parquetTypes.size(); i++) { + Object subData = this.readFields(groupData, parquetTypes.get(i), i, parquetMetaMap, isUtcTimestamp); + data.add(subData); + } + return data; + } else if (this.getOriginalType(type, parquetMetaMap) == org.apache.parquet.schema.OriginalType.DECIMAL) { + Binary binaryDate = g.getBinary(index, 0); + if (null == binaryDate) { + return null; + } else { + org.apache.hadoop.hive.serde2.io.HiveDecimalWritable decimalWritable = new org.apache.hadoop.hive.serde2.io.HiveDecimalWritable(binaryDate.getBytes(), this.asPrimitiveType(type, parquetMetaMap).getDecimalMetadata().getScale()); + // g.getType().getFields().get(1).asPrimitiveType().getDecimalMetadata().getScale() + HiveDecimal hiveDecimal = decimalWritable.getHiveDecimal(); + if (null == hiveDecimal) { + return null; + } else { + return hiveDecimal.bigDecimalValue(); + } + // return decimalWritable.doubleValue(); + } + } else if (this.getOriginalType(type, parquetMetaMap) == org.apache.parquet.schema.OriginalType.DATE) { + return java.sql.Date.valueOf(LocalDate.ofEpochDay(g.getInteger(index, 0))); + } else if (this.getOriginalType(type, parquetMetaMap) == org.apache.parquet.schema.OriginalType.UTF8) { + return g.getValueToString(index, 0); + } else { + if (type.isPrimitive()) { + PrimitiveType.PrimitiveTypeName primitiveTypeName = this.asPrimitiveType(type, parquetMetaMap).getPrimitiveTypeName(); + if (PrimitiveType.PrimitiveTypeName.BINARY == primitiveTypeName) { + return g.getValueToString(index, 0); + } else if (PrimitiveType.PrimitiveTypeName.BOOLEAN == primitiveTypeName) { + return g.getValueToString(index, 0); + } else if (PrimitiveType.PrimitiveTypeName.DOUBLE == primitiveTypeName) { + return g.getValueToString(index, 0); + } else if (PrimitiveType.PrimitiveTypeName.FIXED_LEN_BYTE_ARRAY == primitiveTypeName) { + return g.getValueToString(index, 0); + } else if (PrimitiveType.PrimitiveTypeName.FLOAT == primitiveTypeName) { + return g.getValueToString(index, 0); + } else if (PrimitiveType.PrimitiveTypeName.INT32 == primitiveTypeName) { + return g.getValueToString(index, 0); + } else if (PrimitiveType.PrimitiveTypeName.INT64 == primitiveTypeName) { + return g.getValueToString(index, 0); + } else if (PrimitiveType.PrimitiveTypeName.INT96 == primitiveTypeName) { + Binary dataInt96 = g.getInt96(index, 0); + if (null == dataInt96) { + return null; + } else { + ByteBuffer buf = dataInt96.toByteBuffer(); + buf.order(ByteOrder.LITTLE_ENDIAN); + long timeOfDayNanos = buf.getLong(); + int julianDay = buf.getInt(); + if (isUtcTimestamp) { + // UTC + LocalDate localDate = LocalDate.ofEpochDay(julianDay - JULIAN_EPOCH_OFFSET_DAYS); + LocalTime localTime = LocalTime.ofNanoOfDay(timeOfDayNanos); + return Timestamp.valueOf(LocalDateTime.of(localDate, localTime)); + } else { + // local time + long mills = julianDayToMillis(julianDay) + (timeOfDayNanos / NANOS_PER_MILLISECOND); + Timestamp timestamp = new Timestamp(mills); + timestamp.setNanos((int) (timeOfDayNanos % TimeUnit.SECONDS.toNanos(1))); + return timestamp; + } + } + } else { + return g.getValueToString(index, 0); + } + } else { + return g.getValueToString(index, 0); + } + } + } + + } diff --git a/hdfsreader/src/main/java/com/alibaba/datax/plugin/reader/hdfsreader/HdfsPathFilter.java b/hdfsreader/src/main/java/com/alibaba/datax/plugin/reader/hdfsreader/HdfsPathFilter.java 
new file mode 100644 index 00000000..88dd1fa7 --- /dev/null +++ b/hdfsreader/src/main/java/com/alibaba/datax/plugin/reader/hdfsreader/HdfsPathFilter.java @@ -0,0 +1,21 @@ +package com.alibaba.datax.plugin.reader.hdfsreader; + +import org.apache.hadoop.fs.Path; +import org.apache.hadoop.fs.PathFilter; + +/** + * Created by wmy on 16/11/29. + */ +public class HdfsPathFilter implements PathFilter { + + private String regex = null; + + public HdfsPathFilter(String regex) { + this.regex = regex; + } + + @Override + public boolean accept(Path path) { + return regex != null ? path.getName().matches(regex) : true; + } +} diff --git a/hdfsreader/src/main/java/com/alibaba/datax/plugin/reader/hdfsreader/HdfsReader.java b/hdfsreader/src/main/java/com/alibaba/datax/plugin/reader/hdfsreader/HdfsReader.java index c953ef16..08c630fc 100644 --- a/hdfsreader/src/main/java/com/alibaba/datax/plugin/reader/hdfsreader/HdfsReader.java +++ b/hdfsreader/src/main/java/com/alibaba/datax/plugin/reader/hdfsreader/HdfsReader.java @@ -41,6 +41,8 @@ public class HdfsReader extends Reader { private String specifiedFileType = null; private DFSUtil dfsUtil = null; private List path = null; + private boolean skipEmptyOrcFile = false; + private Integer orcFileEmptySize = null; @Override public void init() { @@ -81,9 +83,10 @@ public class HdfsReader extends Reader { !specifiedFileType.equalsIgnoreCase(Constant.TEXT) && !specifiedFileType.equalsIgnoreCase(Constant.CSV) && !specifiedFileType.equalsIgnoreCase(Constant.SEQ) && - !specifiedFileType.equalsIgnoreCase(Constant.RC)){ - String message = "HdfsReader插件目前支持ORC, TEXT, CSV, SEQUENCE, RC五种格式的文件," + - "请将fileType选项的值配置为ORC, TEXT, CSV, SEQUENCE 或者 RC"; + !specifiedFileType.equalsIgnoreCase(Constant.RC) && + !specifiedFileType.equalsIgnoreCase(Constant.PARQUET)){ + String message = "HdfsReader插件目前支持ORC, TEXT, CSV, SEQUENCE, RC, PARQUET 六种格式的文件," + + "请将fileType选项的值配置为ORC, TEXT, CSV, SEQUENCE,RC 和 PARQUET"; throw DataXException.asDataXException(HdfsReaderErrorCode.FILE_TYPE_ERROR, message); } @@ -115,6 +118,16 @@ public class HdfsReader extends Reader { UnstructuredStorageReaderUtil.validateCompress(this.readerOriginConfig); UnstructuredStorageReaderUtil.validateCsvReaderConfig(this.readerOriginConfig); } + if (this.specifiedFileType.equalsIgnoreCase(Constant.ORC)) { + skipEmptyOrcFile = this.readerOriginConfig.getBool(Key.SKIP_EMPTY_ORCFILE, false); + orcFileEmptySize = this.readerOriginConfig.getInt(Key.ORCFILE_EMPTYSIZE); + //将orcFileEmptySize必填项检查去掉,仅需要配置skipEmptyOrcFile即可,考虑历史任务兼容性(For中华保险),保留orcFileEmptySize参数配置 + //if (skipEmptyOrcFile && orcFileEmptySize == null) { + // throw new IllegalArgumentException("When \"skipEmptyOrcFile\" is configured, " + // + "parameter \"orcFileEmptySize\" cannot be null."); + //} + } + LOG.info("skipEmptyOrcFile: {}, orcFileEmptySize: {}", skipEmptyOrcFile, orcFileEmptySize); } @@ -166,7 +179,7 @@ public class HdfsReader extends Reader { @Override public void prepare() { LOG.info("prepare(), start to getAllFiles..."); - this.sourceFiles = dfsUtil.getAllFiles(path, specifiedFileType); + this.sourceFiles = dfsUtil.getAllFiles(path, specifiedFileType,skipEmptyOrcFile, orcFileEmptySize); LOG.info(String.format("您即将读取的文件数为: [%s], 列表为: [%s]", this.sourceFiles.size(), StringUtils.join(this.sourceFiles, ","))); @@ -273,7 +286,9 @@ public class HdfsReader extends Reader { }else if(specifiedFileType.equalsIgnoreCase(Constant.RC)){ dfsUtil.rcFileStartRead(sourceFile, this.taskConfig, recordSender, this.getTaskPluginCollector()); - }else { + } else if 
(specifiedFileType.equalsIgnoreCase(Constant.PARQUET)) { + dfsUtil.parquetFileStartRead(sourceFile, this.taskConfig, recordSender, this.getTaskPluginCollector()); + } else { String message = "HdfsReader插件目前支持ORC, TEXT, CSV, SEQUENCE, RC五种格式的文件," + "请将fileType选项的值配置为ORC, TEXT, CSV, SEQUENCE 或者 RC"; diff --git a/hdfsreader/src/main/java/com/alibaba/datax/plugin/reader/hdfsreader/HdfsReaderErrorCode.java b/hdfsreader/src/main/java/com/alibaba/datax/plugin/reader/hdfsreader/HdfsReaderErrorCode.java index 8dd3f370..f2caa1a8 100644 --- a/hdfsreader/src/main/java/com/alibaba/datax/plugin/reader/hdfsreader/HdfsReaderErrorCode.java +++ b/hdfsreader/src/main/java/com/alibaba/datax/plugin/reader/hdfsreader/HdfsReaderErrorCode.java @@ -19,7 +19,12 @@ public enum HdfsReaderErrorCode implements ErrorCode { FILE_TYPE_UNSUPPORT("HdfsReader-12", "文件类型目前不支持"), KERBEROS_LOGIN_ERROR("HdfsReader-13", "KERBEROS认证失败"), READ_SEQUENCEFILE_ERROR("HdfsReader-14", "读取SequenceFile文件出错"), - READ_RCFILE_ERROR("HdfsReader-15", "读取RCFile文件出错"),; + READ_RCFILE_ERROR("HdfsReader-15", "读取RCFile文件出错"), + INIT_RCFILE_SERDE_ERROR("HdfsReader-16", "Deserialize RCFile, initialization failed!"), + PARSE_MESSAGE_TYPE_FROM_SCHEMA_ERROR("HdfsReader-17", "Error parsing ParquetSchema"), + INVALID_PARQUET_SCHEMA("HdfsReader-18", "ParquetSchema is invalid"), + READ_PARQUET_ERROR("HdfsReader-19", "Error reading Parquet file"), + CONNECT_HDFS_IO_ERROR("HdfsReader-20", "I/O exception in establishing connection with HDFS"); private final String code; private final String description; diff --git a/hdfsreader/src/main/java/com/alibaba/datax/plugin/reader/hdfsreader/Key.java b/hdfsreader/src/main/java/com/alibaba/datax/plugin/reader/hdfsreader/Key.java index 7b985a88..7f9b3a0a 100644 --- a/hdfsreader/src/main/java/com/alibaba/datax/plugin/reader/hdfsreader/Key.java +++ b/hdfsreader/src/main/java/com/alibaba/datax/plugin/reader/hdfsreader/Key.java @@ -7,9 +7,60 @@ public final class Key { */ public final static String PATH = "path"; public final static String DEFAULT_FS = "defaultFS"; + public final static String HIVE_VERSION = "hiveVersion"; public static final String FILETYPE = "fileType"; public static final String HADOOP_CONFIG = "hadoopConfig"; public static final String HAVE_KERBEROS = "haveKerberos"; public static final String KERBEROS_KEYTAB_FILE_PATH = "kerberosKeytabFilePath"; + public static final String KERBEROS_CONF_FILE_PATH = "kerberosConfFilePath"; public static final String KERBEROS_PRINCIPAL = "kerberosPrincipal"; + public static final String PATH_FILTER = "pathFilter"; + public static final String PARQUET_SCHEMA = "parquetSchema"; + /** + * hive 3.x 或 cdh高版本,使用UTC时区存储时间戳,如果发现时区偏移,该配置项要配置成 true + */ + public static final String PARQUET_UTC_TIMESTAMP = "parquetUtcTimestamp"; + public static final String SUCCESS_ON_NO_FILE = "successOnNoFile"; + public static final String PROTECTION = "protection"; + + /** + * 用于显示地指定hdfs客户端的用户名 + */ + public static final String HDFS_USERNAME = "hdfsUsername"; + + /** + * ORC FILE空文件大小 + */ + public static final String ORCFILE_EMPTYSIZE = "orcFileEmptySize"; + + /** + * 是否跳过空的OrcFile + */ + public static final String SKIP_EMPTY_ORCFILE = "skipEmptyOrcFile"; + + /** + * 是否跳过 orc meta 信息 + */ + + public static final String SKIP_ORC_META = "skipOrcMetaInfo"; + /** + * 过滤_或者.开头的文件 + */ + public static final String REGEX_PATTERN = "^.*[/][^._].*"; + + public static final String FILTER_TAG_FILE = "filterTagFile"; + + // high level params refs 
https://github.com/aliyun/alibabacloud-jindodata/blob/master/docs/user/4.x/4.4.0/oss/configuration/jindosdk_configuration_list.md + // + public static final String FS_OSS_DOWNLOAD_QUEUE_SIZE = "ossDownloadQueueSize"; + + // + public static final String FS_OSS_DOWNLOAD_THREAD_CONCURRENCY = "ossDownloadThreadConcurrency"; + + public static final String FS_OSS_READ_READAHEAD_BUFFER_COUNT = "ossDownloadBufferCount"; + + public static final String FILE_SYSTEM_TYPE = "fileSystemType"; + public static final String CDH_3_X_HIVE_VERSION = "3.1.3-cdh"; + + public static final String SUPPORT_ADD_MIDDLE_COLUMN = "supportAddMiddleColumn"; } diff --git a/hdfsreader/src/main/java/com/alibaba/datax/plugin/reader/hdfsreader/ParquetMessageHelper.java b/hdfsreader/src/main/java/com/alibaba/datax/plugin/reader/hdfsreader/ParquetMessageHelper.java new file mode 100644 index 00000000..e5838d6e --- /dev/null +++ b/hdfsreader/src/main/java/com/alibaba/datax/plugin/reader/hdfsreader/ParquetMessageHelper.java @@ -0,0 +1,33 @@ +package com.alibaba.datax.plugin.reader.hdfsreader; + +import org.apache.parquet.schema.OriginalType; +import org.apache.parquet.schema.PrimitiveType; + +import java.util.HashMap; +import java.util.List; +import java.util.Map; + +/** + * @author jitongchen + * @date 2023/9/7 10:20 AM + */ +public class ParquetMessageHelper { + public static Map parseParquetTypes(List parquetTypes) { + int fieldCount = parquetTypes.size(); + Map parquetMetaMap = new HashMap(); + for (int i = 0; i < fieldCount; i++) { + org.apache.parquet.schema.Type type = parquetTypes.get(i); + String name = type.getName(); + ParquetMeta parquetMeta = new ParquetMeta(); + parquetMeta.setName(name); + OriginalType originalType = type.getOriginalType(); + parquetMeta.setOriginalType(originalType); + if (type.isPrimitive()) { + PrimitiveType primitiveType = type.asPrimitiveType(); + parquetMeta.setPrimitiveType(primitiveType); + } + parquetMetaMap.put(name, parquetMeta); + } + return parquetMetaMap; + } +} diff --git a/hdfsreader/src/main/java/com/alibaba/datax/plugin/reader/hdfsreader/ParquetMeta.java b/hdfsreader/src/main/java/com/alibaba/datax/plugin/reader/hdfsreader/ParquetMeta.java new file mode 100644 index 00000000..6f99e9b5 --- /dev/null +++ b/hdfsreader/src/main/java/com/alibaba/datax/plugin/reader/hdfsreader/ParquetMeta.java @@ -0,0 +1,38 @@ +package com.alibaba.datax.plugin.reader.hdfsreader; + +import org.apache.parquet.schema.OriginalType; +import org.apache.parquet.schema.PrimitiveType; + +/** + * @author jitongchen + * @date 2023/9/7 10:20 AM + */ +public class ParquetMeta { + private String name; + private OriginalType originalType; + private PrimitiveType primitiveType; + + public String getName() { + return name; + } + + public void setName(String name) { + this.name = name; + } + + public OriginalType getOriginalType() { + return originalType; + } + + public void setOriginalType(OriginalType originalType) { + this.originalType = originalType; + } + + public PrimitiveType getPrimitiveType() { + return primitiveType; + } + + public void setPrimitiveType(PrimitiveType primitiveType) { + this.primitiveType = primitiveType; + } +} \ No newline at end of file diff --git a/hdfswriter/doc/hdfswriter.md b/hdfswriter/doc/hdfswriter.md index 028a544e..1259b253 100644 --- a/hdfswriter/doc/hdfswriter.md +++ b/hdfswriter/doc/hdfswriter.md @@ -231,6 +231,7 @@ HdfsWriter提供向HDFS文件系统指定路径中写入TEXTFile文件和ORCFile * append,写入前不做任何处理,DataX hdfswriter直接使用filename写入,并保证文件名不冲突。 * nonConflict,如果目录下有fileName前缀的文件,直接报错。 + * 
truncate,如果目录下有fileName前缀的文件,先删除后写入。 * 必选:是
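
> 补充说明(非本补丁内容,仅作示意):下面的 Java 片段演示上文 hdfsreader 新增的 `parquetSchema` 配置串如何用 `MessageTypeParser` 解析,并通过本次新增的 `ParquetMessageHelper`/`ParquetMeta` 建立字段名到类型元信息的映射,便于理解 parquet 读取路径里对 OriginalType/PrimitiveType 的判断逻辑。其中 schema 字符串、字段名均为假设值,并假设 parquet-column 依赖与补丁中新增的两个类都在 classpath 上。

```java
import com.alibaba.datax.plugin.reader.hdfsreader.ParquetMessageHelper;
import com.alibaba.datax.plugin.reader.hdfsreader.ParquetMeta;
import org.apache.parquet.schema.MessageType;
import org.apache.parquet.schema.MessageTypeParser;
import org.apache.parquet.schema.Type;

import java.util.List;
import java.util.Map;

public class ParquetSchemaDemo {
    public static void main(String[] args) {
        // 与 parquetSchema 配置项同格式的 schema 字符串(标准 Parquet message 定义语法),内容为假设示例
        String schemaString = "message m { optional int64 id; optional binary name (UTF8); }";
        MessageType schema = MessageTypeParser.parseMessageType(schemaString);
        List<Type> fields = schema.getFields();
        // 复用补丁新增的 ParquetMessageHelper,把字段名映射为 ParquetMeta(含 OriginalType/PrimitiveType)
        Map<String, ParquetMeta> metaMap = ParquetMessageHelper.parseParquetTypes(fields);
        for (Type field : fields) {
            ParquetMeta meta = metaMap.get(field.getName());
            System.out.println(field.getName() + " -> originalType=" + meta.getOriginalType()
                    + ", primitiveType=" + meta.getPrimitiveType());
        }
    }
}
```
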
diff --git a/hdfswriter/src/main/java/com/alibaba/datax/plugin/writer/hdfswriter/HdfsHelper.java b/hdfswriter/src/main/java/com/alibaba/datax/plugin/writer/hdfswriter/HdfsHelper.java index 1ecdb578..09fd2723 100644 --- a/hdfswriter/src/main/java/com/alibaba/datax/plugin/writer/hdfswriter/HdfsHelper.java +++ b/hdfswriter/src/main/java/com/alibaba/datax/plugin/writer/hdfswriter/HdfsHelper.java @@ -8,8 +8,8 @@ import com.alibaba.datax.common.plugin.TaskPluginCollector; import com.alibaba.datax.common.util.Configuration; import com.alibaba.datax.plugin.unstructuredstorage.util.ColumnTypeUtil; import com.alibaba.datax.plugin.unstructuredstorage.util.HdfsUtil; -import com.alibaba.fastjson.JSON; -import com.alibaba.fastjson.JSONObject; +import com.alibaba.fastjson2.JSON; +import com.alibaba.fastjson2.JSONObject; import com.google.common.collect.Lists; import org.apache.commons.lang3.StringUtils; import org.apache.commons.lang3.Validate; @@ -27,9 +27,8 @@ import org.apache.hadoop.mapred.*; import org.apache.hadoop.security.UserGroupInformation; import org.slf4j.Logger; import org.slf4j.LoggerFactory; -import parquet.schema.OriginalType; -import parquet.schema.PrimitiveType; -import parquet.schema.Types; +import parquet.hadoop.metadata.CompressionCodecName; +import parquet.schema.*; import java.io.IOException; import java.text.SimpleDateFormat; @@ -626,4 +625,61 @@ public class HdfsHelper { } return typeBuilder.named("m").toString(); } + + public void parquetFileStartWrite(RecordReceiver lineReceiver, Configuration config, String fileName, TaskPluginCollector taskPluginCollector, Configuration taskConfig) { + MessageType messageType = null; + ParquetFileProccessor proccessor = null; + Path outputPath = new Path(fileName); + String schema = config.getString(Key.PARQUET_SCHEMA); + try { + messageType = MessageTypeParser.parseMessageType(schema); + } catch (Exception e) { + String message = String.format("Error parsing the Schema string [%s] into MessageType", schema); + LOG.error(message); + throw DataXException.asDataXException(HdfsWriterErrorCode.PARSE_MESSAGE_TYPE_FROM_SCHEMA_ERROR, e); + } + + // determine the compression codec + String compress = config.getString(Key.COMPRESS, null); + // be compatible with the old NONE + if ("NONE".equalsIgnoreCase(compress)) { + compress = "UNCOMPRESSED"; + } + CompressionCodecName compressionCodecName = CompressionCodecName.fromConf(compress); + LOG.info("The compression codec used for parquet writing is: {}", compressionCodecName, compress); + try { + proccessor = new ParquetFileProccessor(outputPath, messageType, compressionCodecName, false, taskConfig, taskPluginCollector, hadoopConf); + } catch (Exception e) { + String message = String.format("Initializing ParquetFileProccessor based on Schema[%s] failed.", schema); + LOG.error(message); + throw DataXException.asDataXException(HdfsWriterErrorCode.INIT_PROCCESSOR_FAILURE, e); + } + SimpleDateFormat dateFormat = new SimpleDateFormat("yyyyMMddHHmm"); + String attempt = "attempt_" + dateFormat.format(new Date()) + "_0001_m_000000_0"; + conf.set(JobContext.TASK_ATTEMPT_ID, attempt); + FileOutputFormat outFormat = new TextOutputFormat(); + outFormat.setOutputPath(conf, outputPath); + outFormat.setWorkOutputPath(conf, outputPath); + try { + Record record = null; + while ((record = lineReceiver.getFromReader()) != null) { + proccessor.write(record); + } + } catch (Exception e) { + String message = String.format("An exception occurred while writing the file file [%s]", fileName); + LOG.error(message); + Path path 
= new Path(fileName); + deleteDir(path.getParent()); + throw DataXException.asDataXException(HdfsWriterErrorCode.Write_FILE_IO_ERROR, e); + } finally { + if (proccessor != null) { + try { + proccessor.close(); + } catch (IOException e) { + LOG.error(e.getMessage(), e); + } + } + } + } + } diff --git a/hdfswriter/src/main/java/com/alibaba/datax/plugin/writer/hdfswriter/HdfsWriter.java b/hdfswriter/src/main/java/com/alibaba/datax/plugin/writer/hdfswriter/HdfsWriter.java index 59ec6d18..e7707461 100644 --- a/hdfswriter/src/main/java/com/alibaba/datax/plugin/writer/hdfswriter/HdfsWriter.java +++ b/hdfswriter/src/main/java/com/alibaba/datax/plugin/writer/hdfswriter/HdfsWriter.java @@ -53,8 +53,8 @@ public class HdfsWriter extends Writer { this.defaultFS = this.writerSliceConfig.getNecessaryValue(Key.DEFAULT_FS, HdfsWriterErrorCode.REQUIRED_VALUE); //fileType check this.fileType = this.writerSliceConfig.getNecessaryValue(Key.FILE_TYPE, HdfsWriterErrorCode.REQUIRED_VALUE); - if( !fileType.equalsIgnoreCase("ORC") && !fileType.equalsIgnoreCase("TEXT")){ - String message = "HdfsWriter插件目前只支持ORC和TEXT两种格式的文件,请将filetype选项的值配置为ORC或者TEXT"; + if (!fileType.equalsIgnoreCase("ORC") && !fileType.equalsIgnoreCase("TEXT") && !fileType.equalsIgnoreCase("PARQUET")) { + String message = "HdfsWriter插件目前只支持ORC、TEXT、PARQUET三种格式的文件,请将filetype选项的值配置为ORC、TEXT或PARQUET"; throw DataXException.asDataXException(HdfsWriterErrorCode.ILLEGAL_VALUE, message); } //path @@ -415,6 +415,9 @@ public class HdfsWriter extends Writer { //写ORC FILE hdfsHelper.orcFileStartWrite(lineReceiver,this.writerSliceConfig, this.fileName, this.getTaskPluginCollector()); + } else if (fileType.equalsIgnoreCase("PARQUET")) { + //写PARQUET FILE + hdfsHelper.parquetFileStartWrite(lineReceiver, this.writerSliceConfig, this.fileName, this.getTaskPluginCollector(), this.writerSliceConfig); } LOG.info("end do write"); diff --git a/hdfswriter/src/main/java/com/alibaba/datax/plugin/writer/hdfswriter/HdfsWriterErrorCode.java b/hdfswriter/src/main/java/com/alibaba/datax/plugin/writer/hdfswriter/HdfsWriterErrorCode.java index a9e1cb30..8a729f97 100644 --- a/hdfswriter/src/main/java/com/alibaba/datax/plugin/writer/hdfswriter/HdfsWriterErrorCode.java +++ b/hdfswriter/src/main/java/com/alibaba/datax/plugin/writer/hdfswriter/HdfsWriterErrorCode.java @@ -16,7 +16,11 @@ public enum HdfsWriterErrorCode implements ErrorCode { CONNECT_HDFS_IO_ERROR("HdfsWriter-06", "与HDFS建立连接时出现IO异常."), COLUMN_REQUIRED_VALUE("HdfsWriter-07", "您column配置中缺失了必须填写的参数值."), HDFS_RENAME_FILE_ERROR("HdfsWriter-08", "将文件移动到配置路径失败."), - KERBEROS_LOGIN_ERROR("HdfsWriter-09", "KERBEROS认证失败"); + KERBEROS_LOGIN_ERROR("HdfsWriter-09", "KERBEROS认证失败"), + PARSE_MESSAGE_TYPE_FROM_SCHEMA_ERROR("HdfsWriter-10", "Parse parquet schema error"), + + INIT_PROCCESSOR_FAILURE("HdfsWriter-11", "Init processor failed"); + private final String code; private final String description; diff --git a/hdfswriter/src/main/java/com/alibaba/datax/plugin/writer/hdfswriter/Key.java b/hdfswriter/src/main/java/com/alibaba/datax/plugin/writer/hdfswriter/Key.java index 2b1fab98..05f4cd0a 100644 --- a/hdfswriter/src/main/java/com/alibaba/datax/plugin/writer/hdfswriter/Key.java +++ b/hdfswriter/src/main/java/com/alibaba/datax/plugin/writer/hdfswriter/Key.java @@ -46,4 +46,32 @@ public class Key { public static final String PARQUET_SCHEMA = "parquetSchema"; public static final String PARQUET_MERGE_RESULT = "parquetMergeResult"; + + /** + * hive 3.x 或 cdh高版本,使用UTC时区存储时间戳,如果发现时区偏移,该配置项要配置成 true + */ + public static final String 
PARQUET_UTC_TIMESTAMP = "parquetUtcTimestamp"; + + // Kerberos + public static final String KERBEROS_CONF_FILE_PATH = "kerberosConfFilePath"; + + // PanguFS + public final static String PANGU_FS_CONFIG = "panguFSConfig"; + public final static String PANGU_FS_CONFIG_NUWA_CLUSTER = "nuwaCluster"; + public final static String PANGU_FS_CONFIG_NUWA_SERVERS = "nuwaServers"; + public final static String PANGU_FS_CONFIG_NUWA_PROXIES = "nuwaProxies"; + public final static String PANGU_FS_CONFIG_CAPABILITY = "capability"; + + + public static final String FS_OSS_UPLOAD_THREAD_CONCURRENCY = "ossUploadConcurrency"; + // + public static final String FS_OSS_UPLOAD_QUEUE_SIZE = "ossUploadQueueSize"; + // + public static final String FS_OSS_UPLOAD_MAX_PENDING_TASKS_PER_STREAM = "ossUploadMaxPendingTasksPerStream"; + + public static final String FS_OSS_BLOCKLET_SIZE_MB = "ossBlockSize"; + + public static final String FILE_SYSTEM_TYPE = "fileSystemType"; + public static final String ENABLE_COLUMN_EXCHANGE = "enableColumnExchange"; + public static final String SUPPORT_HIVE_DATETIME = "supportHiveDateTime"; } diff --git a/hdfswriter/src/main/java/com/alibaba/datax/plugin/writer/hdfswriter/ParquetFileProccessor.java b/hdfswriter/src/main/java/com/alibaba/datax/plugin/writer/hdfswriter/ParquetFileProccessor.java new file mode 100644 index 00000000..90d0f6e5 --- /dev/null +++ b/hdfswriter/src/main/java/com/alibaba/datax/plugin/writer/hdfswriter/ParquetFileProccessor.java @@ -0,0 +1,30 @@ +package com.alibaba.datax.plugin.writer.hdfswriter; + +import com.alibaba.datax.common.element.Record; +import com.alibaba.datax.common.plugin.TaskPluginCollector; +import com.alibaba.datax.common.util.Configuration; +import org.apache.hadoop.fs.Path; +import parquet.hadoop.ParquetWriter; +import parquet.hadoop.metadata.CompressionCodecName; +import parquet.schema.MessageType; + +import java.io.IOException; + +/** + * @author jitongchen + * @date 2023/9/7 9:41 AM + */ +public class ParquetFileProccessor extends ParquetWriter { + + public ParquetFileProccessor(Path file, MessageType schema, boolean enableDictionary, Configuration taskConfig, TaskPluginCollector taskPluginCollector, org.apache.hadoop.conf.Configuration configuration) throws IOException { + this(file, schema, CompressionCodecName.UNCOMPRESSED, enableDictionary, taskConfig, taskPluginCollector, configuration); + } + + public ParquetFileProccessor(Path file, MessageType schema, CompressionCodecName codecName, boolean enableDictionary, Configuration taskConfig, TaskPluginCollector taskPluginCollector) throws IOException { + super(file, new ParquetFileSupport(schema, taskConfig, taskPluginCollector), codecName, DEFAULT_BLOCK_SIZE, DEFAULT_PAGE_SIZE, DEFAULT_PAGE_SIZE, enableDictionary, false, DEFAULT_WRITER_VERSION); + } + + public ParquetFileProccessor(Path file, MessageType schema, CompressionCodecName codecName, boolean enableDictionary, Configuration taskConfig, TaskPluginCollector taskPluginCollector, org.apache.hadoop.conf.Configuration configuration) throws IOException { + super(file, new ParquetFileSupport(schema, taskConfig, taskPluginCollector), codecName, DEFAULT_BLOCK_SIZE, DEFAULT_PAGE_SIZE, DEFAULT_PAGE_SIZE, enableDictionary, false, DEFAULT_WRITER_VERSION, configuration); + } +} diff --git a/hdfswriter/src/main/java/com/alibaba/datax/plugin/writer/hdfswriter/ParquetFileSupport.java b/hdfswriter/src/main/java/com/alibaba/datax/plugin/writer/hdfswriter/ParquetFileSupport.java new file mode 100644 index 00000000..410d5231 --- /dev/null +++ 
b/hdfswriter/src/main/java/com/alibaba/datax/plugin/writer/hdfswriter/ParquetFileSupport.java @@ -0,0 +1,642 @@ +package com.alibaba.datax.plugin.writer.hdfswriter; + +import com.alibaba.datax.common.element.*; +import com.alibaba.datax.common.plugin.TaskPluginCollector; +import com.alibaba.datax.common.util.LimitLogger; +import com.alibaba.fastjson2.JSON; +import com.alibaba.fastjson2.JSONArray; +import com.alibaba.fastjson2.JSONObject; +import org.apache.commons.lang3.StringUtils; +import org.apache.hadoop.conf.Configuration; +import org.apache.hadoop.hive.ql.io.parquet.serde.ParquetHiveSerDe; +import org.slf4j.Logger; +import org.slf4j.LoggerFactory; +import parquet.column.ColumnDescriptor; +import parquet.hadoop.api.WriteSupport; +import parquet.io.api.Binary; +import parquet.io.api.RecordConsumer; +import parquet.schema.*; + +import java.math.BigDecimal; +import java.math.RoundingMode; +import java.nio.ByteBuffer; +import java.nio.ByteOrder; +import java.sql.Timestamp; +import java.text.SimpleDateFormat; +import java.time.LocalDateTime; +import java.time.OffsetDateTime; +import java.time.ZoneOffset; +import java.time.temporal.ChronoField; +import java.util.Arrays; +import java.util.Date; +import java.util.HashMap; +import java.util.List; +import java.util.concurrent.TimeUnit; + +/** + * @author jitongchen + * @date 2023/9/7 9:41 AM + */ +public class ParquetFileSupport extends WriteSupport { + public static final Logger LOGGER = LoggerFactory.getLogger(ParquetFileSupport.class); + private MessageType schema; + private List columns; + private RecordConsumer recordConsumer; + private boolean useRawDataTransf = true; + private boolean printStackTrace = true; + + // 不通类型的nullFormat + private String nullFormat; + + private String dateFormat; + private boolean isUtcTimestamp; + private SimpleDateFormat dateParse; + private Binary binaryForNull; + private TaskPluginCollector taskPluginCollector; + private String dataxParquetMode; + + public ParquetFileSupport(MessageType schema, com.alibaba.datax.common.util.Configuration taskConfig, TaskPluginCollector taskPluginCollector) { + this.schema = schema; + this.columns = schema.getColumns(); + this.useRawDataTransf = taskConfig.getBool(Key.PARQUET_FILE_USE_RAW_DATA_TRANSF, true); + + // 不通类型的nullFormat + this.nullFormat = taskConfig.getString(Key.NULL_FORMAT, Constant.DEFAULT_NULL_FORMAT); + this.binaryForNull = Binary.fromString(this.nullFormat); + + this.dateFormat = taskConfig.getString(Key.DATE_FORMAT, null); + if (StringUtils.isNotBlank(this.dateFormat)) { + this.dateParse = new SimpleDateFormat(dateFormat); + } + + this.isUtcTimestamp = taskConfig.getBool(Key.PARQUET_UTC_TIMESTAMP, false); + + this.taskPluginCollector = taskPluginCollector; + if (taskConfig.getKeys().contains("dataxParquetMode")) { + this.dataxParquetMode = taskConfig.getString("dataxParquetMode"); + } else { + // 默认值是columns + this.dataxParquetMode = "columns"; + } + } + + @Override + public WriteContext init(Configuration configuration) { + return new WriteContext(schema, new HashMap()); + } + + @Override + public void prepareForWrite(RecordConsumer recordConsumer) { + this.recordConsumer = recordConsumer; + } + + @Override + public void write(Record values) { + if (dataxParquetMode.equalsIgnoreCase("fields")) { + writeBaseOnFields(values); + return; + } + + // NOTE: 下面的实现其实是不对的,只是看代码注释貌似有用户已经在用 + // 所以暂时不动下面的逻辑。 + // 默认走的就是下面的这条代码路径 + if (values != null && columns != null && values.getColumnNumber() == columns.size()) { + recordConsumer.startMessage(); + for (int i = 0; 
i < columns.size(); i++) { + Column value = values.getColumn(i); + ColumnDescriptor columnDescriptor = columns.get(i); + Type type = this.schema.getFields().get(i); + if (value != null) { + try { + if (this.useRawDataTransf) { + if (value.getRawData() == null) { + continue; + } + recordConsumer.startField(columnDescriptor.getPath()[0], i); + // 原来使用Column->RawData的方法其实是错误的类型转换策略,会将DataX的数据内部表示形象序列化出去 + // 但是 Parquet 已经有用户使用了,故暂时只是配置项切换 + String rawData = value.getRawData().toString(); + switch (columnDescriptor.getType()) { + case BOOLEAN: + recordConsumer.addBoolean(Boolean.parseBoolean(rawData)); + break; + case FLOAT: + recordConsumer.addFloat(Float.parseFloat(rawData)); + break; + case DOUBLE: + recordConsumer.addDouble(Double.parseDouble(rawData)); + break; + case INT32: + OriginalType originalType = type.getOriginalType(); + if (originalType != null && StringUtils.equalsIgnoreCase("DATE", originalType.name())) { + int realVal = (int) (new java.sql.Date(Long.parseLong(rawData)).toLocalDate().toEpochDay()); + recordConsumer.addInteger(realVal); + } else { + recordConsumer.addInteger(Integer.parseInt(rawData)); + } + break; + case INT64: + recordConsumer.addLong(Long.valueOf(rawData)); + break; + case INT96: + recordConsumer.addBinary(timestampColToBinary(value)); + break; + case BINARY: + recordConsumer.addBinary(Binary.fromString(rawData)); + break; + case FIXED_LEN_BYTE_ARRAY: + PrimitiveType primitiveType = type.asPrimitiveType(); + if (primitiveType.getDecimalMetadata() != null) { + // decimal + recordConsumer.addBinary(decimalToBinary(value, primitiveType.getDecimalMetadata().getPrecision(), primitiveType.getDecimalMetadata().getScale())); + break; + } + /* fall through */ + default: + recordConsumer.addBinary(Binary.fromString(rawData)); + break; + } + + recordConsumer.endField(columnDescriptor.getPath()[0], i); + } else { + boolean isNull = null == value.getRawData(); + + if (!isNull) { + recordConsumer.startField(columnDescriptor.getPath()[0], i); + + // no skip: empty fields are illegal, the field should be ommited completely instead + switch (columnDescriptor.getType()) { + case BOOLEAN: + recordConsumer.addBoolean(value.asBoolean()); + break; + case FLOAT: + recordConsumer.addFloat(value.asDouble().floatValue()); + break; + case DOUBLE: + recordConsumer.addDouble(value.asDouble()); + break; + case INT32: + OriginalType originalType = type.getOriginalType(); + if (originalType != null && StringUtils.equalsIgnoreCase("DATE", originalType.name())) { + int realVal = (int) (new java.sql.Date(value.asLong()).toLocalDate().toEpochDay()); + recordConsumer.addInteger(realVal); + } else { + recordConsumer.addInteger(value.asLong().intValue()); + } + break; + case INT64: + recordConsumer.addLong(value.asLong()); + break; + case INT96: + recordConsumer.addBinary(timestampColToBinary(value)); + break; + case BINARY: + String valueAsString2Write = null; + if (Column.Type.DATE == value.getType() && null != this.dateParse) { + valueAsString2Write = dateParse.format(value.asDate()); + } else { + valueAsString2Write = value.asString(); + } + recordConsumer.addBinary(Binary.fromString(valueAsString2Write)); + break; + case FIXED_LEN_BYTE_ARRAY: + PrimitiveType primitiveType = type.asPrimitiveType(); + if (primitiveType.getDecimalMetadata() != null) { + // decimal + recordConsumer.addBinary(decimalToBinary(value, primitiveType.getDecimalMetadata().getPrecision(), primitiveType.getDecimalMetadata().getScale())); + break; + } + /* fall through */ + default: + 
recordConsumer.addBinary(Binary.fromString(value.asString())); + break; + } + recordConsumer.endField(columnDescriptor.getPath()[0], i); + } + } + } catch (Exception e) { + if (printStackTrace) { + printStackTrace = false; + LOGGER.warn("write to parquet error: {}", e.getMessage(), e); + } + // dirty data + if (null != this.taskPluginCollector) { + // job post 里面的merge taskPluginCollector 为null + this.taskPluginCollector.collectDirtyRecord(values, e, e.getMessage()); + } + } + } else { + recordConsumer.addBinary(this.binaryForNull); + } + } + recordConsumer.endMessage(); + } + } + + private Binary decimalToBinary(Column value, int precision, int scale) { + BigDecimal bigDecimal = value.asBigDecimal(); + bigDecimal = bigDecimal.setScale(scale, RoundingMode.HALF_UP); + byte[] decimalBytes = bigDecimal.unscaledValue().toByteArray(); + + int precToBytes = ParquetHiveSerDe.PRECISION_TO_BYTE_COUNT[precision - 1]; + if (precToBytes == decimalBytes.length) { + // No padding needed. + return Binary.fromByteArray(decimalBytes); + } + + byte[] tgt = new byte[precToBytes]; + + // padding -1 for negative number + if (bigDecimal.compareTo(new BigDecimal("0")) < 0) { + Arrays.fill(tgt, 0, precToBytes - decimalBytes.length, (byte) -1); + } + + System.arraycopy(decimalBytes, 0, tgt, precToBytes - decimalBytes.length, decimalBytes.length); + return Binary.fromByteArray(tgt); + } + + private static final int JULIAN_EPOCH_OFFSET_DAYS = 2_440_588; + private static final long MILLIS_IN_DAY = TimeUnit.DAYS.toMillis(1); + private static final long MILLS_PER_SECOND = TimeUnit.SECONDS.toMillis(1); + private static final long NANOS_PER_DAY = TimeUnit.DAYS.toNanos(1); + private static final long NANOS_PER_SECOND = TimeUnit.SECONDS.toNanos(1); + private static final ZoneOffset defaultOffset = OffsetDateTime.now().getOffset(); + + /** + * int 96 is timestamp in parquet + * + * @param valueColumn + * @return + */ + private Binary timestampColToBinary(Column valueColumn) { + if (valueColumn.getRawData() == null) { + return Binary.EMPTY; + } + long mills; + long nanos = 0; + if (valueColumn instanceof DateColumn) { + DateColumn dateColumn = (DateColumn) valueColumn; + mills = dateColumn.asLong(); + nanos = dateColumn.getNanos(); + } else { + mills = valueColumn.asLong(); + } + int julianDay; + long nanosOfDay; + if (isUtcTimestamp) { + // utc ignore current timezone (task should set timezone same as hive/hdfs) + long seconds = mills >= 0 ? 
mills / MILLS_PER_SECOND : (mills / MILLS_PER_SECOND - 1); + LocalDateTime localDateTime = LocalDateTime.ofEpochSecond(seconds, (int) nanos, defaultOffset); + julianDay = (int) (localDateTime.getLong(ChronoField.EPOCH_DAY) + JULIAN_EPOCH_OFFSET_DAYS); + nanosOfDay = localDateTime.getLong(ChronoField.NANO_OF_DAY); + } else { + // local date + julianDay = (int) ((mills / MILLIS_IN_DAY) + JULIAN_EPOCH_OFFSET_DAYS); + if (mills >= 0) { + nanosOfDay = ((mills % MILLIS_IN_DAY) / MILLS_PER_SECOND) * NANOS_PER_SECOND + nanos; + } else { + julianDay--; + nanosOfDay = (((mills % MILLIS_IN_DAY) / MILLS_PER_SECOND) - 1) * NANOS_PER_SECOND + nanos; + nanosOfDay += NANOS_PER_DAY; + } + } + + ByteBuffer buf = ByteBuffer.allocate(12); + buf.order(ByteOrder.LITTLE_ENDIAN); + buf.putLong(nanosOfDay); + buf.putInt(julianDay); + buf.flip(); + return Binary.fromByteBuffer(buf); + } + + private void writeBaseOnFields(Record values) { + //LOGGER.info("Writing parquet data using fields mode(The correct mode.)"); + List types = this.schema.getFields(); + + if (values != null && types != null && values.getColumnNumber() == types.size()) { + recordConsumer.startMessage(); + writeFields(types, values); + recordConsumer.endMessage(); + } + } + + private void writeFields(List types, Record values) { + for (int i = 0; i < types.size(); i++) { + Type type = types.get(i); + Column value = values.getColumn(i); + if (value != null) { + try { + if (type.isPrimitive()) { + writePrimitiveType(type, value, i); + } else { + writeGroupType(type, (JSON) JSON.parse(value.asString()), i); + } + } catch (Exception e) { + if (printStackTrace) { + printStackTrace = false; + LOGGER.warn("write to parquet error: {}", e.getMessage(), e); + } + // dirty data + if (null != this.taskPluginCollector) { + // job post 里面的merge taskPluginCollector 为null + this.taskPluginCollector.collectDirtyRecord(values, e, e.getMessage()); + } + } + } + } + } + + private void writeFields(List types, JSONObject values) { + for (int i = 0; i < types.size(); i++) { + Type type = types.get(i); + Object value = values.get(type.getName()); + + if (value != null) { + try { + if (type.isPrimitive()) { + writePrimitiveType(type, value, i); + } else { + writeGroupType(type, (JSON) value, i); + } + } catch (Exception e) { + if (printStackTrace) { + printStackTrace = false; + LOGGER.warn("write to parquet error: {}", e.getMessage(), e); + } + } + } else { + recordConsumer.addBinary(this.binaryForNull); + } + } + } + + private void writeGroupType(Type type, JSON value, int index) { + GroupType groupType = type.asGroupType(); + OriginalType originalType = groupType.getOriginalType(); + if (originalType != null) { + switch (originalType) { + case MAP: + writeMap(groupType, value, index); + break; + case LIST: + writeList(groupType, value, index); + break; + default: + break; + } + } else { + // struct + writeStruct(groupType, value, index); + } + } + + private void writeMap(GroupType groupType, JSON value, int index) { + if (value == null) { + return; + } + + JSONObject json = (JSONObject) value; + + if (json.isEmpty()) { + return; + } + + recordConsumer.startField(groupType.getName(), index); + + recordConsumer.startGroup(); + + // map + // key_value start + recordConsumer.startField("key_value", 0); + recordConsumer.startGroup(); + + List keyValueFields = groupType.getFields().get(0).asGroupType().getFields(); + Type keyType = keyValueFields.get(0); + Type valueType = keyValueFields.get(1); + for (String key : json.keySet()) { + // key + writePrimitiveType(keyType, key, 
0); + + // value + if (valueType.isPrimitive()) { + writePrimitiveType(valueType, json.get(key), 1); + } else { + writeGroupType(valueType, (JSON) json.get(key), 1); + } + } + + recordConsumer.endGroup(); + recordConsumer.endField("key_value", 0); + // key_value end + + recordConsumer.endGroup(); + recordConsumer.endField(groupType.getName(), index); + } + + private void writeList(GroupType groupType, JSON value, int index) { + if (value == null) { + return; + } + + JSONArray json = (JSONArray) value; + + if (json.isEmpty()) { + return; + } + + recordConsumer.startField(groupType.getName(), index); + // list + recordConsumer.startGroup(); + + // list start + recordConsumer.startField("list", 0); + recordConsumer.startGroup(); + + Type elementType = groupType.getFields().get(0).asGroupType().getFields().get(0); + + if (elementType.isPrimitive()) { + for (Object elementValue : json) { + writePrimitiveType(elementType, elementValue, 0); + } + } else { + for (Object elementValue : json) { + writeGroupType(elementType, (JSON) elementValue, 0); + } + } + + recordConsumer.endGroup(); + recordConsumer.endField("list", 0); + // list end + recordConsumer.endGroup(); + + recordConsumer.endField(groupType.getName(), index); + } + + private void writeStruct(GroupType groupType, JSON value, int index) { + if (value == null) { + return; + } + JSONObject json = (JSONObject) value; + if (json.isEmpty()) { + return; + } + + recordConsumer.startField(groupType.getName(), index); + // struct start + recordConsumer.startGroup(); + + writeFields(groupType.getFields(), json); + recordConsumer.endGroup(); + // struct end + recordConsumer.endField(groupType.getName(), index); + } + + private void writePrimitiveType(Type type, Object value, int index) { + if (value == null) { + return; + } + + recordConsumer.startField(type.getName(), index); + PrimitiveType primitiveType = type.asPrimitiveType(); + + switch (primitiveType.getPrimitiveTypeName()) { + case BOOLEAN: + recordConsumer.addBoolean((Boolean) value); + break; + case FLOAT: + if (value instanceof Float) { + recordConsumer.addFloat(((Float) value).floatValue()); + } else if (value instanceof Double) { + recordConsumer.addFloat(((Double) value).floatValue()); + } else if (value instanceof Long) { + recordConsumer.addFloat(((Long) value).floatValue()); + } else if (value instanceof Integer) { + recordConsumer.addFloat(((Integer) value).floatValue()); + } + break; + case DOUBLE: + if (value instanceof Float) { + recordConsumer.addDouble(((Float) value).doubleValue()); + } else if (value instanceof Double) { + recordConsumer.addDouble(((Double) value).doubleValue()); + } else if (value instanceof Long) { + recordConsumer.addDouble(((Long) value).doubleValue()); + } else if (value instanceof Integer) { + recordConsumer.addDouble(((Integer) value).doubleValue()); + } + break; + case INT32: + if (value instanceof Integer) { + recordConsumer.addInteger((Integer) value); + } else if (value instanceof Long) { + recordConsumer.addInteger(((Long) value).intValue()); + } else { + // 之前代码写的有问题,导致这里丢列了没抛异常,先收集,后续看看有没有任务命中在决定怎么改 + LimitLogger.limit("dirtyDataHiveWriterParquet", TimeUnit.MINUTES.toMillis(1), () -> LOGGER.warn("dirtyDataHiveWriterParquet {}", String.format("Invalid value: %s(clazz: %s) for field: %s", value, value.getClass(), type.getName()))); + } + break; + case INT64: + if (value instanceof Integer) { + recordConsumer.addLong(((Integer) value).longValue()); + } else if (value instanceof Long) { + recordConsumer.addInteger(((Long) value).intValue()); + } 
else { + // 之前代码写的有问题,导致这里丢列了没抛异常,先收集,后续看看有没有任务命中在决定怎么改 + LimitLogger.limit("dirtyDataHiveWriterParquet", TimeUnit.MINUTES.toMillis(1), () -> LOGGER.warn("dirtyDataHiveWriterParquet {}", String.format("Invalid value: %s(clazz: %s) for field: %s", value, value.getClass(), type.getName()))); + } + break; + case INT96: + if (value instanceof Integer) { + recordConsumer.addBinary(timestampColToBinary(new LongColumn((Integer) value))); + } else if (value instanceof Long) { + recordConsumer.addBinary(timestampColToBinary(new LongColumn((Long) value))); + } else if (value instanceof Timestamp) { + recordConsumer.addBinary(timestampColToBinary(new DateColumn((Timestamp) value))); + } else if (value instanceof Date) { + recordConsumer.addBinary(timestampColToBinary(new DateColumn((Date) value))); + } else { + recordConsumer.addBinary(timestampColToBinary(new StringColumn(value.toString()))); + } + break; + case FIXED_LEN_BYTE_ARRAY: + if (primitiveType.getDecimalMetadata() != null) { + // decimal + Column column; + if (value instanceof Integer) { + column = new LongColumn((Integer) value); + } else if (value instanceof Long) { + column = new LongColumn((Long) value); + } else if (value instanceof Double) { + column = new DoubleColumn((Double) value); + } else if (value instanceof BigDecimal) { + column = new DoubleColumn((BigDecimal) value); + } else { + column = new StringColumn(value.toString()); + } + recordConsumer.addBinary(decimalToBinary(column, primitiveType.getDecimalMetadata().getPrecision(), primitiveType.getDecimalMetadata().getScale())); + break; + } + /* fall through */ + case BINARY: + default: + recordConsumer.addBinary(Binary.fromString((String) value)); + break; + } + recordConsumer.endField(type.getName(), index); + } + + private void writePrimitiveType(Type type, Column value, int index) { + if (value == null || value.getRawData() == null) { + return; + } + + recordConsumer.startField(type.getName(), index); + PrimitiveType primitiveType = type.asPrimitiveType(); + switch (primitiveType.getPrimitiveTypeName()) { + case BOOLEAN: + recordConsumer.addBoolean(value.asBoolean()); + break; + case FLOAT: + recordConsumer.addFloat(value.asDouble().floatValue()); + break; + case DOUBLE: + recordConsumer.addDouble(value.asDouble()); + break; + case INT32: + OriginalType originalType = type.getOriginalType(); + if (OriginalType.DATE.equals(originalType)) { + int realVal = (int) (new java.sql.Date(value.asLong()).toLocalDate().toEpochDay()); + recordConsumer.addInteger(realVal); + } else { + recordConsumer.addInteger(value.asLong().intValue()); + } + break; + case INT64: + recordConsumer.addLong(value.asLong()); + break; + case INT96: + recordConsumer.addBinary(timestampColToBinary(value)); + break; + case BINARY: + String valueAsString2Write = null; + if (Column.Type.DATE == value.getType() && null != this.dateParse) { + valueAsString2Write = dateParse.format(value.asDate()); + } else { + valueAsString2Write = value.asString(); + } + recordConsumer.addBinary(Binary.fromString(valueAsString2Write)); + break; + case FIXED_LEN_BYTE_ARRAY: + if (primitiveType.getDecimalMetadata() != null) { + // decimal + recordConsumer.addBinary(decimalToBinary(value, primitiveType.getDecimalMetadata().getPrecision(), primitiveType.getDecimalMetadata().getScale())); + break; + } + /* fall through */ + default: + recordConsumer.addBinary(Binary.fromString(value.asString())); + break; + } + recordConsumer.endField(type.getName(), index); + } +} diff --git 
a/hologresjdbcwriter/src/main/java/com/alibaba/datax/plugin/writer/hologresjdbcwriter/BaseWriter.java b/hologresjdbcwriter/src/main/java/com/alibaba/datax/plugin/writer/hologresjdbcwriter/BaseWriter.java index 2c390bcb..03349e37 100644 --- a/hologresjdbcwriter/src/main/java/com/alibaba/datax/plugin/writer/hologresjdbcwriter/BaseWriter.java +++ b/hologresjdbcwriter/src/main/java/com/alibaba/datax/plugin/writer/hologresjdbcwriter/BaseWriter.java @@ -15,8 +15,8 @@ import com.alibaba.datax.plugin.rdbms.util.DataBaseType; import com.alibaba.datax.plugin.writer.hologresjdbcwriter.util.ConfLoader; import com.alibaba.datax.plugin.writer.hologresjdbcwriter.util.OriginalConfPretreatmentUtil; import com.alibaba.datax.plugin.writer.hologresjdbcwriter.util.WriterUtil; -import com.alibaba.fastjson.JSONArray; -import com.alibaba.fastjson.JSONObject; +import com.alibaba.fastjson2.JSONArray; +import com.alibaba.fastjson2.JSONObject; import com.alibaba.hologres.client.HoloClient; import com.alibaba.hologres.client.HoloConfig; import com.alibaba.hologres.client.Put; @@ -167,7 +167,7 @@ public class BaseWriter { if (null != renderedPreSqls && !renderedPreSqls.isEmpty()) { // 说明有 preSql 配置,则此处删除掉 originalConfig.remove(Key.PRE_SQL); - String tempJdbcUrl = jdbcUrl.replace("postgresql", "hologres"); + String tempJdbcUrl = jdbcUrl.replace("jdbc:postgresql://", "jdbc:hologres://"); try (Connection conn = DriverManager.getConnection( tempJdbcUrl, username, password)) { LOG.info("Begin to execute preSqls:[{}]. context info:{}.", @@ -191,32 +191,34 @@ public class BaseWriter { // 一般来说,是需要推迟到 task 中进行post 的执行(单表情况例外) public void post(Configuration originalConfig) { - String username = originalConfig.getString(Key.USERNAME); - String password = originalConfig.getString(Key.PASSWORD); + try { + String username = originalConfig.getString(Key.USERNAME); + String password = originalConfig.getString(Key.PASSWORD); - String jdbcUrl = originalConfig.getString(Key.JDBC_URL); + String jdbcUrl = originalConfig.getString(Key.JDBC_URL); - String table = originalConfig.getString(Key.TABLE); + String table = originalConfig.getString(Key.TABLE); - List postSqls = originalConfig.getList(Key.POST_SQL, - String.class); - List renderedPostSqls = WriterUtil.renderPreOrPostSqls( - postSqls, table); + List postSqls = originalConfig.getList(Key.POST_SQL, + String.class); + List renderedPostSqls = WriterUtil.renderPreOrPostSqls( + postSqls, table); - if (null != renderedPostSqls && !renderedPostSqls.isEmpty()) { - // 说明有 postSql 配置,则此处删除掉 - originalConfig.remove(Key.POST_SQL); - String tempJdbcUrl = jdbcUrl.replace("postgresql", "hologres"); - Connection conn = DBUtil.getConnection(this.dataBaseType, - tempJdbcUrl, username, password); - - LOG.info( - "Begin to execute postSqls:[{}]. context info:{}.", - StringUtils.join(renderedPostSqls, ";"), tempJdbcUrl); - WriterUtil.executeSqls(conn, renderedPostSqls, tempJdbcUrl, dataBaseType); - DBUtil.closeDBResources(null, null, conn); + if (null != renderedPostSqls && !renderedPostSqls.isEmpty()) { + // 说明有 postSql 配置,则此处删除掉 + originalConfig.remove(Key.POST_SQL); + String tempJdbcUrl = jdbcUrl.replace("jdbc:postgresql://", "jdbc:hologres://"); + try (Connection conn = DriverManager.getConnection( + tempJdbcUrl, username, password)) { + LOG.info( + "Begin to execute postSqls:[{}]. 
context info:{}.", + StringUtils.join(renderedPostSqls, ";"), tempJdbcUrl); + WriterUtil.executeSqls(conn, renderedPostSqls, tempJdbcUrl, dataBaseType); + } + } + } catch (SQLException e) { + throw DataXException.asDataXException(DBUtilErrorCode.SQL_EXECUTE_FAIL, e); } - } public void destroy(Configuration originalConfig) { diff --git a/hologresjdbcwriter/src/main/resources/plugin.json b/hologresjdbcwriter/src/main/resources/plugin.json index d46f216b..a9f93996 100644 --- a/hologresjdbcwriter/src/main/resources/plugin.json +++ b/hologresjdbcwriter/src/main/resources/plugin.json @@ -1,6 +1,6 @@ { - "name": "hologreswriter", - "class": "com.alibaba.datax.plugin.writer.hologreswriter.HologresWriter", + "name": "hologresjdbcwriter", + "class": "com.alibaba.datax.plugin.writer.hologresjdbcwriter.HologresJdbcWriter", "description": "", "developer": "alibaba" -} +} \ No newline at end of file diff --git a/hologresjdbcwriter/src/main/resources/plugin_job_template.json b/hologresjdbcwriter/src/main/resources/plugin_job_template.json index 656971c3..f509ccc0 100644 --- a/hologresjdbcwriter/src/main/resources/plugin_job_template.json +++ b/hologresjdbcwriter/src/main/resources/plugin_job_template.json @@ -1,5 +1,5 @@ { - "name": "hologreswriter", + "name": "hologresjdbcwriter", "parameter": { "url": "", "username": "", diff --git a/kuduwriter/src/main/java/com/q1/datax/plugin/writer/kudu11xwriter/Kudu11xHelper.java b/kuduwriter/src/main/java/com/q1/datax/plugin/writer/kudu11xwriter/Kudu11xHelper.java index cf1b0f8f..558693ff 100644 --- a/kuduwriter/src/main/java/com/q1/datax/plugin/writer/kudu11xwriter/Kudu11xHelper.java +++ b/kuduwriter/src/main/java/com/q1/datax/plugin/writer/kudu11xwriter/Kudu11xHelper.java @@ -3,7 +3,7 @@ package com.q1.datax.plugin.writer.kudu11xwriter; import com.alibaba.datax.common.element.Column; import com.alibaba.datax.common.exception.DataXException; import com.alibaba.datax.common.util.Configuration; -import com.alibaba.fastjson.JSON; +import com.alibaba.fastjson2.JSON; import org.apache.commons.lang3.StringUtils; import org.apache.commons.lang3.Validate; import org.apache.kudu.ColumnSchema; diff --git a/kuduwriter/src/main/java/com/q1/datax/plugin/writer/kudu11xwriter/KuduWriterTask.java b/kuduwriter/src/main/java/com/q1/datax/plugin/writer/kudu11xwriter/KuduWriterTask.java index bff3509f..df872842 100644 --- a/kuduwriter/src/main/java/com/q1/datax/plugin/writer/kudu11xwriter/KuduWriterTask.java +++ b/kuduwriter/src/main/java/com/q1/datax/plugin/writer/kudu11xwriter/KuduWriterTask.java @@ -134,7 +134,7 @@ public class KuduWriterTask { break; case BOOLEAN: synchronized (lock) { - row.addBoolean(name, Boolean.getBoolean(rawData)); + row.addBoolean(name, Boolean.parseBoolean(rawData)); } break; case STRING: diff --git a/loghubreader/pom.xml b/loghubreader/pom.xml new file mode 100644 index 00000000..b2f52f3d --- /dev/null +++ b/loghubreader/pom.xml @@ -0,0 +1,73 @@ + + + + datax-all + com.alibaba.datax + 0.0.1-SNAPSHOT + + 4.0.0 + + loghubreader + + 0.0.1-SNAPSHOT + + + + com.alibaba.datax + datax-common + ${datax-project-version} + + + slf4j-log4j12 + org.slf4j + + + + + org.slf4j + slf4j-api + + + ch.qos.logback + logback-classic + + + com.aliyun.openservices + aliyun-log + 0.6.22 + + + + + + + + maven-compiler-plugin + + ${jdk-version} + ${jdk-version} + ${project-sourceEncoding} + + + + + maven-assembly-plugin + + + src/main/assembly/package.xml + + datax + + + + dwzip + package + + single + + + + + + + diff --git a/loghubreader/src/main/assembly/package.xml 
b/loghubreader/src/main/assembly/package.xml new file mode 100644 index 00000000..e1d8d739 --- /dev/null +++ b/loghubreader/src/main/assembly/package.xml @@ -0,0 +1,34 @@ + + + + dir + + false + + + src/main/resources + + plugin.json + + plugin/reader/loghubreader + + + target/ + + loghubreader-0.0.1-SNAPSHOT.jar + + plugin/reader/loghubreader + + + + + + false + plugin/reader/loghubreader/libs + runtime + + + diff --git a/loghubreader/src/main/java/com/alibaba/datax/plugin/reader/loghubreader/Constant.java b/loghubreader/src/main/java/com/alibaba/datax/plugin/reader/loghubreader/Constant.java new file mode 100644 index 00000000..fd9e88dc --- /dev/null +++ b/loghubreader/src/main/java/com/alibaba/datax/plugin/reader/loghubreader/Constant.java @@ -0,0 +1,26 @@ +package com.alibaba.datax.plugin.reader.loghubreader; + +public class Constant { + + public static String DATETIME_FORMAT = "yyyyMMddHHmmss"; + public static String DATE_FORMAT = "yyyyMMdd"; + + static String META_COL_SOURCE = "__source__"; + static String META_COL_TOPIC = "__topic__"; + static String META_COL_CATEGORY = "__category__"; + static String META_COL_MACHINEUUID = "__machineUUID__"; + static String META_COL_HOSTNAME = "__hostname__"; + static String META_COL_PATH = "__path__"; + static String META_COL_LOGTIME = "__logtime__"; + public static String META_COL_RECEIVE_TIME = "__receive_time__"; + + /** + * 除用户手动配置的列之外,其余数据列作为一个 json 读取到一列 + */ + static String COL_EXTRACT_OTHERS = "C__extract_others__"; + + /** + * 将所有元数据列作为一个 json 读取到一列 + */ + static String COL_EXTRACT_ALL_META = "C__extract_all_meta__"; +} diff --git a/loghubreader/src/main/java/com/alibaba/datax/plugin/reader/loghubreader/Key.java b/loghubreader/src/main/java/com/alibaba/datax/plugin/reader/loghubreader/Key.java new file mode 100644 index 00000000..9067cc68 --- /dev/null +++ b/loghubreader/src/main/java/com/alibaba/datax/plugin/reader/loghubreader/Key.java @@ -0,0 +1,38 @@ +package com.alibaba.datax.plugin.reader.loghubreader; + +public final class Key { + + /** + * 此处声明插件用到的需要插件使用者提供的配置项 + */ + public static final String ENDPOINT = "endpoint"; + + public static final String ACCESSKEYID = "accessId"; + + public static final String ACCESSKEYSECRET = "accessKey"; + + public static final String PROJECT = "project"; + + public static final String LOGSTORE = "logstore"; + + public static final String TOPIC = "topic"; + + public static final String COLUMN = "column"; + + public static final String BATCHSIZE = "batchSize"; + + public static final String BEGINTIMESTAMPMILLIS = "beginTimestampMillis"; + + public static final String ENDTIMESTAMPMILLIS = "endTimestampMillis"; + + public static final String BEGINDATETIME = "beginDateTime"; + + public static final String ENDDATETIME = "endDateTime"; + + public static final String TIMEFORMAT = "timeformat"; + + public static final String SOURCE = "source"; + + public static final String SHARD = "shard"; + +} diff --git a/loghubreader/src/main/java/com/alibaba/datax/plugin/reader/loghubreader/LogHubReader.java b/loghubreader/src/main/java/com/alibaba/datax/plugin/reader/loghubreader/LogHubReader.java new file mode 100644 index 00000000..c52ef62d --- /dev/null +++ b/loghubreader/src/main/java/com/alibaba/datax/plugin/reader/loghubreader/LogHubReader.java @@ -0,0 +1,482 @@ +package com.alibaba.datax.plugin.reader.loghubreader; + +import com.alibaba.datax.common.element.Record; +import com.alibaba.datax.common.element.StringColumn; +import com.alibaba.datax.common.exception.DataXException; +import 
com.alibaba.datax.common.plugin.RecordSender; +import com.alibaba.datax.common.spi.Reader; +import com.alibaba.datax.common.util.Configuration; +import com.alibaba.datax.common.util.DataXCaseEnvUtil; +import com.alibaba.datax.common.util.RetryUtil; +import com.alibaba.fastjson2.JSONObject; +import com.aliyun.openservices.log.Client; +import com.aliyun.openservices.log.common.Consts.CursorMode; +import com.aliyun.openservices.log.common.*; +import com.aliyun.openservices.log.exception.LogException; +import com.aliyun.openservices.log.response.BatchGetLogResponse; +import com.aliyun.openservices.log.response.GetCursorResponse; +import org.apache.commons.lang3.StringUtils; +import org.slf4j.Logger; +import org.slf4j.LoggerFactory; + +import java.text.ParseException; +import java.text.SimpleDateFormat; +import java.util.*; +import java.util.concurrent.Callable; + +public class LogHubReader extends Reader { + public static class Job extends Reader.Job { + + private static final Logger LOG = LoggerFactory.getLogger(Job.class); + + private Client client; + private Configuration originalConfig; + + private Long beginTimestampMillis; + private Long endTimestampMillis; + + @Override + public void init() { + LOG.info("loghub reader job init begin ..."); + this.originalConfig = super.getPluginJobConf(); + validateParameter(originalConfig); + + String endPoint = this.originalConfig.getString(Key.ENDPOINT); + String accessKeyId = this.originalConfig.getString(Key.ACCESSKEYID); + String accessKeySecret = this.originalConfig.getString(Key.ACCESSKEYSECRET); + + client = new Client(endPoint, accessKeyId, accessKeySecret); + LOG.info("loghub reader job init end."); + } + + private void validateParameter(Configuration conf){ + conf.getNecessaryValue(Key.ENDPOINT,LogHubReaderErrorCode.REQUIRE_VALUE); + conf.getNecessaryValue(Key.ACCESSKEYID,LogHubReaderErrorCode.REQUIRE_VALUE); + conf.getNecessaryValue(Key.ACCESSKEYSECRET,LogHubReaderErrorCode.REQUIRE_VALUE); + conf.getNecessaryValue(Key.PROJECT,LogHubReaderErrorCode.REQUIRE_VALUE); + conf.getNecessaryValue(Key.LOGSTORE,LogHubReaderErrorCode.REQUIRE_VALUE); + conf.getNecessaryValue(Key.COLUMN,LogHubReaderErrorCode.REQUIRE_VALUE); + + int batchSize = this.originalConfig.getInt(Key.BATCHSIZE); + if (batchSize > 1000) { + throw DataXException.asDataXException(LogHubReaderErrorCode.BAD_CONFIG_VALUE, + "Invalid batchSize[" + batchSize + "] value (0,1000]!"); + } + + beginTimestampMillis = this.originalConfig.getLong(Key.BEGINTIMESTAMPMILLIS); + String beginDateTime = this.originalConfig.getString(Key.BEGINDATETIME); + + if (beginDateTime != null) { + try { + beginTimestampMillis = getUnixTimeFromDateTime(beginDateTime); + } catch (ParseException e) { + throw DataXException.asDataXException(LogHubReaderErrorCode.BAD_CONFIG_VALUE, + "Invalid beginDateTime[" + beginDateTime + "], format [yyyyMMddHHmmss or yyyyMMdd]!"); + } + } + + if (beginTimestampMillis != null && beginTimestampMillis <= 0) { + throw DataXException.asDataXException(LogHubReaderErrorCode.BAD_CONFIG_VALUE, + "Invalid beginTimestampMillis[" + beginTimestampMillis + "]!"); + } + + endTimestampMillis = this.originalConfig.getLong(Key.ENDTIMESTAMPMILLIS); + String endDateTime = this.originalConfig.getString(Key.ENDDATETIME); + + if (endDateTime != null) { + try { + endTimestampMillis = getUnixTimeFromDateTime(endDateTime); + } catch (ParseException e) { + throw DataXException.asDataXException(LogHubReaderErrorCode.BAD_CONFIG_VALUE, + "Invalid beginDateTime[" + endDateTime + "], format [yyyyMMddHHmmss or 
yyyyMMdd]!"); + } + } + + if (endTimestampMillis != null && endTimestampMillis <= 0) { + throw DataXException.asDataXException(LogHubReaderErrorCode.BAD_CONFIG_VALUE, + "Invalid endTimestampMillis[" + endTimestampMillis + "]!"); + } + + if (beginTimestampMillis != null && endTimestampMillis != null + && endTimestampMillis <= beginTimestampMillis) { + throw DataXException.asDataXException(LogHubReaderErrorCode.BAD_CONFIG_VALUE, + "endTimestampMillis[" + endTimestampMillis + "] must bigger than beginTimestampMillis[" + beginTimestampMillis + "]!"); + } + } + + private long getUnixTimeFromDateTime(String dateTime) throws ParseException { + try { + String format = Constant.DATETIME_FORMAT; + SimpleDateFormat simpleDateFormat = new SimpleDateFormat(format); + return simpleDateFormat.parse(dateTime).getTime() / 1000; + } catch (ParseException ignored) { + throw DataXException.asDataXException(LogHubReaderErrorCode.BAD_CONFIG_VALUE, + "Invalid DateTime[" + dateTime + "]!"); + } + } + + @Override + public void prepare() { + } + + @Override + public List split(int adviceNumber) { + LOG.info("split() begin..."); + + List readerSplitConfigs = new ArrayList(); + + final String project = this.originalConfig.getString(Key.PROJECT); + final String logstore = this.originalConfig.getString(Key.LOGSTORE); + + List logStore = null; + try { + logStore = RetryUtil.executeWithRetry(new Callable>() { + @Override + public List call() throws Exception { + return client.ListShard(project, logstore).GetShards(); + } + }, DataXCaseEnvUtil.getRetryTimes(7), DataXCaseEnvUtil.getRetryInterval(1000L), DataXCaseEnvUtil.getRetryExponential(true)); + } catch (Exception e) { + throw DataXException.asDataXException(LogHubReaderErrorCode.BAD_CONFIG_VALUE, + "get LogStore[" + logstore + "] error, please check ! detail error messsage: " + e.toString()); + } + + if (logStore == null) { + throw DataXException.asDataXException(LogHubReaderErrorCode.BAD_CONFIG_VALUE, + "LogStore[" + logstore + "] isn't exists, please check !"); + } + + int splitNumber = logStore.size(); + if (0 == splitNumber) { + throw DataXException.asDataXException(LogHubReaderErrorCode.EMPTY_LOGSTORE_VALUE, + "LogStore[" + logstore + "] has 0 shard, please check !"); + } + + Collections.shuffle(logStore); + for (int i = 0; i < logStore.size(); i++) { + if (beginTimestampMillis != null && endTimestampMillis != null) { + try { + String beginCursor = getCursorWithRetry(client, project, logstore, logStore.get(i).GetShardId(), beginTimestampMillis).GetCursor(); + String endCursor = getCursorWithRetry(client, project, logstore, logStore.get(i).GetShardId(), endTimestampMillis).GetCursor(); + if (beginCursor.equals(endCursor)) { + if ((i == logStore.size() - 1) && (readerSplitConfigs.size() == 0)) { + + } else { + LOG.info("skip empty shard[" + logStore.get(i) + "]!"); + continue; + } + } + } catch (Exception e) { + LOG.error("Check Shard[" + logStore.get(i) + "] Error, please check !" 
+ e.toString()); + throw DataXException.asDataXException(LogHubReaderErrorCode.LOG_HUB_ERROR, e); + } + } + Configuration splitedConfig = this.originalConfig.clone(); + splitedConfig.set(Key.SHARD, logStore.get(i).GetShardId()); + readerSplitConfigs.add(splitedConfig); + } + + if (splitNumber < adviceNumber) { + // LOG.info(MESSAGE_SOURCE.message("hdfsreader.12", + // splitNumber, adviceNumber, splitNumber, splitNumber)); + } + LOG.info("split() ok and end..."); + + return readerSplitConfigs; + } + + @Override + public void post() { + } + + @Override + public void destroy() { + } + + private GetCursorResponse getCursorWithRetry(final Client client, final String project, final String logstore, final int shard, final long fromTime) throws Exception { + return + RetryUtil.executeWithRetry(new Callable() { + @Override + public GetCursorResponse call() throws Exception { + LOG.info("loghug get cursor with project: {} logstore: {} shard: {} time: {}", project, logstore, shard, fromTime); + return client.GetCursor(project, logstore, shard, fromTime); + } + }, 7, 1000L, true); + } + + } + + public static class Task extends Reader.Task { + + private static final Logger LOG = LoggerFactory.getLogger(Task.class); + + private Configuration taskConfig; + private Client client; + private String endPoint; + private String accessKeyId; + private String accessKeySecret; + private String project; + private String logstore; + private long beginTimestampMillis; + private long endTimestampMillis; + private int batchSize; + private int shard; + private List columns; + + @Override + public void init() { + this.taskConfig = super.getPluginJobConf(); + + endPoint = this.taskConfig.getString(Key.ENDPOINT); + accessKeyId = this.taskConfig.getString(Key.ACCESSKEYID); + accessKeySecret = this.taskConfig.getString(Key.ACCESSKEYSECRET); + project = this.taskConfig.getString(Key.PROJECT); + logstore = this.taskConfig.getString(Key.LOGSTORE); + batchSize = this.taskConfig.getInt(Key.BATCHSIZE, 128); + + this.beginTimestampMillis = this.taskConfig.getLong(Key.BEGINTIMESTAMPMILLIS, -1); + String beginDateTime = this.taskConfig.getString(Key.BEGINDATETIME); + + if (beginDateTime != null) { + try { + beginTimestampMillis = getUnixTimeFromDateTime(beginDateTime); + } catch (ParseException e) { + } + } + + this.endTimestampMillis = this.taskConfig.getLong(Key.ENDTIMESTAMPMILLIS, -1); + String endDateTime = this.taskConfig.getString(Key.ENDDATETIME); + + if (endDateTime != null) { + try { + endTimestampMillis = getUnixTimeFromDateTime(endDateTime); + } catch (ParseException e) { + } + } + + columns = this.taskConfig.getList(Key.COLUMN, String.class); + + shard = this.taskConfig.getInt(Key.SHARD); + + client = new Client(endPoint, accessKeyId, accessKeySecret); + LOG.info("init loghub reader task finished.project:{} logstore:{} batchSize:{}", project, logstore, batchSize); + } + + @Override + public void prepare() { + } + + private long getUnixTimeFromDateTime(String dateTime) throws ParseException { + try { + String format = Constant.DATETIME_FORMAT; + SimpleDateFormat simpleDateFormat = new SimpleDateFormat(format); + return simpleDateFormat.parse(dateTime).getTime() / 1000; + } catch (ParseException ignored) { + } + String format = Constant.DATE_FORMAT; + SimpleDateFormat simpleDateFormat = new SimpleDateFormat(format); + return simpleDateFormat.parse(dateTime).getTime() / 1000; + } + + private GetCursorResponse getCursorWithRetry(final Client client, final String project, final String logstore, final int shard, final long 
fromTime) throws Exception { + return + RetryUtil.executeWithRetry(new Callable() { + @Override + public GetCursorResponse call() throws Exception { + LOG.info("loghug get cursor with project: {} logstore: {} shard: {} time: {}", project, logstore, shard, fromTime); + return client.GetCursor(project, logstore, shard, fromTime); + } + }, 7, 1000L, true); + } + + private GetCursorResponse getCursorWithRetry(final Client client, final String project, final String logstore, final int shard, final CursorMode mode) throws Exception { + return + RetryUtil.executeWithRetry(new Callable() { + @Override + public GetCursorResponse call() throws Exception { + LOG.info("loghug get cursor with project: {} logstore: {} shard: {} mode: {}", project, logstore, shard, mode); + return client.GetCursor(project, logstore, shard, mode); + } + }, 7, 1000L, true); + } + + private BatchGetLogResponse batchGetLogWithRetry(final Client client, final String project, final String logstore, final int shard, final int batchSize, + final String curCursor, final String endCursor) throws Exception { + return + RetryUtil.executeWithRetry(new Callable() { + @Override + public BatchGetLogResponse call() throws Exception { + return client.BatchGetLog(project, logstore, shard, batchSize, curCursor, endCursor); + } + }, 7, 1000L, true); + } + + @Override + public void startRead(RecordSender recordSender) { + LOG.info("read start"); + + try { + GetCursorResponse cursorRes; + if (this.beginTimestampMillis != -1) { + cursorRes = getCursorWithRetry(client, project, logstore, this.shard, beginTimestampMillis); + } else { + cursorRes = getCursorWithRetry(client, project, logstore, this.shard, CursorMode.BEGIN); + } + String beginCursor = cursorRes.GetCursor(); + + LOG.info("the begin cursor, loghub requestId: {} cursor: {}", cursorRes.GetRequestId(), cursorRes.GetCursor()); + + if (this.endTimestampMillis != -1) { + cursorRes = getCursorWithRetry(client, project, logstore, this.shard, endTimestampMillis); + } else { + cursorRes = getCursorWithRetry(client, project, logstore, this.shard, CursorMode.END); + } + String endCursor = cursorRes.GetCursor(); + LOG.info("the end cursor, loghub requestId: {} cursor: {}", cursorRes.GetRequestId(), cursorRes.GetCursor()); + + if (StringUtils.equals(beginCursor, endCursor)) { + LOG.info("beginCursor:{} equals endCursor:{}, end directly!", beginCursor, endCursor); + return; + } + + String currentCursor = null; + String nextCursor = beginCursor; + + HashMap metaMap = new HashMap(); + HashMap dataMap = new HashMap(); + JSONObject allMetaJson = new JSONObject(); + while (!StringUtils.equals(currentCursor, nextCursor)) { + currentCursor = nextCursor; + BatchGetLogResponse logDataRes = batchGetLogWithRetry(client, project, logstore, this.shard, this.batchSize, currentCursor, endCursor); + + List logGroups = logDataRes.GetLogGroups(); + + for(LogGroupData logGroup: logGroups) { + metaMap.clear(); + allMetaJson.clear(); + FastLogGroup flg = logGroup.GetFastLogGroup(); + + metaMap.put("C_Category", flg.getCategory()); + metaMap.put(Constant.META_COL_CATEGORY, flg.getCategory()); + allMetaJson.put(Constant.META_COL_CATEGORY, flg.getCategory()); + + metaMap.put("C_Source", flg.getSource()); + metaMap.put(Constant.META_COL_SOURCE, flg.getSource()); + allMetaJson.put(Constant.META_COL_SOURCE, flg.getSource()); + + metaMap.put("C_Topic", flg.getTopic()); + metaMap.put(Constant.META_COL_TOPIC, flg.getTopic()); + allMetaJson.put(Constant.META_COL_TOPIC, flg.getTopic()); + + metaMap.put("C_MachineUUID", 
flg.getMachineUUID()); + metaMap.put(Constant.META_COL_MACHINEUUID, flg.getMachineUUID()); + allMetaJson.put(Constant.META_COL_MACHINEUUID, flg.getMachineUUID()); + + for (int tagIdx = 0; tagIdx < flg.getLogTagsCount(); ++tagIdx) { + FastLogTag logtag = flg.getLogTags(tagIdx); + String tagKey = logtag.getKey(); + String tagValue = logtag.getValue(); + if (tagKey.equals(Constant.META_COL_HOSTNAME)) { + metaMap.put("C_HostName", logtag.getValue()); + } else if (tagKey.equals(Constant.META_COL_PATH)) { + metaMap.put("C_Path", logtag.getValue()); + } + metaMap.put(tagKey, tagValue); + allMetaJson.put(tagKey, tagValue); + } + + for (int lIdx = 0; lIdx < flg.getLogsCount(); ++lIdx) { + dataMap.clear(); + FastLog log = flg.getLogs(lIdx); + + String logTime = String.valueOf(log.getTime()); + metaMap.put("C_LogTime", logTime); + metaMap.put(Constant.META_COL_LOGTIME, logTime); + allMetaJson.put(Constant.META_COL_LOGTIME, logTime); + + for (int cIdx = 0; cIdx < log.getContentsCount(); ++cIdx) { + FastLogContent content = log.getContents(cIdx); + dataMap.put(content.getKey(), content.getValue()); + } + + Record record = recordSender.createRecord(); + + JSONObject extractOthers = new JSONObject(); + if(columns.contains(Constant.COL_EXTRACT_OTHERS)){ + List keyList = Arrays.asList(dataMap.keySet().toArray(new String[dataMap.keySet().size()])); + for (String otherKey:keyList) { + if (!columns.contains(otherKey)){ + extractOthers.put(otherKey,dataMap.get(otherKey)); + } + } + } + if (null != this.columns && 1 == this.columns.size()) { + String columnsInStr = columns.get(0).toString(); + if ("\"*\"".equals(columnsInStr) || "*".equals(columnsInStr)) { + List keyList = Arrays.asList(dataMap.keySet().toArray(new String[dataMap.keySet().size()])); + Collections.sort(keyList); + + for (String key : keyList) { + record.addColumn(new StringColumn(key + ":" + dataMap.get(key))); + } + } else { + if (dataMap.containsKey(columnsInStr)) { + record.addColumn(new StringColumn(dataMap.get(columnsInStr))); + } else if (metaMap.containsKey(columnsInStr)) { + record.addColumn(new StringColumn(metaMap.get(columnsInStr))); + } else if (Constant.COL_EXTRACT_OTHERS.equals(columnsInStr)){ + record.addColumn(new StringColumn(extractOthers.toJSONString())); + } else if (Constant.COL_EXTRACT_ALL_META.equals(columnsInStr)) { + record.addColumn(new StringColumn(allMetaJson.toJSONString())); + } + } + } else { + for (String col : this.columns) { + if (dataMap.containsKey(col)) { + record.addColumn(new StringColumn(dataMap.get(col))); + } else if (metaMap.containsKey(col)) { + record.addColumn(new StringColumn(metaMap.get(col))); + } else if (col != null && col.startsWith("'") && col.endsWith("'")){ + String constant = col.substring(1, col.length()-1); + record.addColumn(new StringColumn(constant)); + }else if (Constant.COL_EXTRACT_OTHERS.equals(col)){ + record.addColumn(new StringColumn(extractOthers.toJSONString())); + } else if (Constant.COL_EXTRACT_ALL_META.equals(col)) { + record.addColumn(new StringColumn(allMetaJson.toJSONString())); + } else { + record.addColumn(new StringColumn(null)); + } + } + } + + recordSender.sendToWriter(record); + } + } + + nextCursor = logDataRes.GetNextCursor(); + } + } catch (LogException e) { + if (e.GetErrorCode().equals("LogStoreNotExist")) { + LOG.info("logStore[" + logstore +"] Not Exits! detail error messsage: " + e.toString()); + } else { + LOG.error("read LogStore[" + logstore + "] error, please check ! 
detail error messsage: " + e.toString()); + throw DataXException.asDataXException(LogHubReaderErrorCode.LOG_HUB_ERROR, e); + } + + } catch (Exception e) { + LOG.error("read LogStore[" + logstore + "] error, please check ! detail error messsage: " + e.toString()); + throw DataXException.asDataXException(LogHubReaderErrorCode.LOG_HUB_ERROR, e); + } + + LOG.info("end read loghub shard..."); + } + + @Override + public void post() { + } + + @Override + public void destroy() { + } + } +} diff --git a/loghubreader/src/main/java/com/alibaba/datax/plugin/reader/loghubreader/LogHubReaderErrorCode.java b/loghubreader/src/main/java/com/alibaba/datax/plugin/reader/loghubreader/LogHubReaderErrorCode.java new file mode 100644 index 00000000..d9ee4c8e --- /dev/null +++ b/loghubreader/src/main/java/com/alibaba/datax/plugin/reader/loghubreader/LogHubReaderErrorCode.java @@ -0,0 +1,34 @@ +package com.alibaba.datax.plugin.reader.loghubreader; + +import com.alibaba.datax.common.spi.ErrorCode; + +public enum LogHubReaderErrorCode implements ErrorCode { + BAD_CONFIG_VALUE("LogHuReader-00", "The value you configured is invalid."), + LOG_HUB_ERROR("LogHubReader-01","LogHub access encounter exception"), + REQUIRE_VALUE("LogHubReader-02","Missing parameters"), + EMPTY_LOGSTORE_VALUE("LogHubReader-03","There is no shard in this LogStore"); + + private final String code; + private final String description; + + private LogHubReaderErrorCode(String code, String description) { + this.code = code; + this.description = description; + } + + @Override + public String getCode() { + return this.code; + } + + @Override + public String getDescription() { + return this.description; + } + + @Override + public String toString() { + return String.format("Code:[%s], Description:[%s]. ", this.code, + this.description); + } +} diff --git a/loghubreader/src/main/resources/plugin.json b/loghubreader/src/main/resources/plugin.json new file mode 100644 index 00000000..31403dd6 --- /dev/null +++ b/loghubreader/src/main/resources/plugin.json @@ -0,0 +1,6 @@ +{ + "name": "loghubreader", + "class": "com.alibaba.datax.plugin.reader.loghubreader.LogHubReader", + "description": "适用于: 从SLS LogHub中读取数据", + "developer": "alibaba" +} \ No newline at end of file diff --git a/loghubreader/src/main/resources/plugin_job_template.json b/loghubreader/src/main/resources/plugin_job_template.json new file mode 100644 index 00000000..4d536eb9 --- /dev/null +++ b/loghubreader/src/main/resources/plugin_job_template.json @@ -0,0 +1,12 @@ +{ + "name": "loghubreader", + "parameter": { + "endpoint": "", + "accessId": "", + "accessKey": "", + "project": "", + "logstore": "", + "batchSize":1024, + "column": [] + } +} \ No newline at end of file diff --git a/loghubwriter/pom.xml b/loghubwriter/pom.xml new file mode 100644 index 00000000..d43b7286 --- /dev/null +++ b/loghubwriter/pom.xml @@ -0,0 +1,73 @@ + + + + datax-all + com.alibaba.datax + 0.0.1-SNAPSHOT + + 4.0.0 + + loghubwriter + + 0.0.1-SNAPSHOT + + + + com.alibaba.datax + datax-common + ${datax-project-version} + + + slf4j-log4j12 + org.slf4j + + + + + org.slf4j + slf4j-api + + + ch.qos.logback + logback-classic + + + com.aliyun.openservices + aliyun-log + 0.6.12 + + + + + + + + maven-compiler-plugin + + ${jdk-version} + ${jdk-version} + ${project-sourceEncoding} + + + + + maven-assembly-plugin + + + src/main/assembly/package.xml + + datax + + + + dwzip + package + + single + + + + + + + diff --git a/loghubwriter/src/main/assembly/package.xml b/loghubwriter/src/main/assembly/package.xml new file mode 100644 
index 00000000..44d25a48 --- /dev/null +++ b/loghubwriter/src/main/assembly/package.xml @@ -0,0 +1,34 @@ + + + + dir + + false + + + src/main/resources + + plugin.json + + plugin/writer/loghubwriter + + + target/ + + loghubwriter-0.0.1-SNAPSHOT.jar + + plugin/writer/loghubwriter + + + + + + false + plugin/writer/loghubwriter/libs + runtime + + + diff --git a/loghubwriter/src/main/java/com/alibaba/datax/plugin/writer/loghubwriter/Key.java b/loghubwriter/src/main/java/com/alibaba/datax/plugin/writer/loghubwriter/Key.java new file mode 100644 index 00000000..bdfe3fa5 --- /dev/null +++ b/loghubwriter/src/main/java/com/alibaba/datax/plugin/writer/loghubwriter/Key.java @@ -0,0 +1,35 @@ +package com.alibaba.datax.plugin.writer.loghubwriter; + +/** + * 配置关键字 + * @author + */ +public final class Key { + + /** + * 此处声明插件用到的需要插件使用者提供的配置项 + */ + public static final String ENDPOINT = "endpoint"; + + public static final String ACCESS_KEY_ID = "accessId"; + + public static final String ACCESS_KEY_SECRET = "accessKey"; + + public static final String PROJECT = "project"; + + public static final String LOG_STORE = "logstore"; + + public static final String TOPIC = "topic"; + + public static final String COLUMN = "column"; + + public static final String BATCH_SIZE = "batchSize"; + + public static final String TIME = "time"; + + public static final String TIME_FORMAT = "timeformat"; + + public static final String SOURCE = "source"; + + public static final String HASH_BY_KEY = "hashKey"; +} diff --git a/loghubwriter/src/main/java/com/alibaba/datax/plugin/writer/loghubwriter/LogHubWriter.java b/loghubwriter/src/main/java/com/alibaba/datax/plugin/writer/loghubwriter/LogHubWriter.java new file mode 100644 index 00000000..bf60d08c --- /dev/null +++ b/loghubwriter/src/main/java/com/alibaba/datax/plugin/writer/loghubwriter/LogHubWriter.java @@ -0,0 +1,315 @@ +package com.alibaba.datax.plugin.writer.loghubwriter; + +import com.alibaba.datax.common.element.Record; +import com.alibaba.datax.common.exception.DataXException; +import com.alibaba.datax.common.plugin.RecordReceiver; +import com.alibaba.datax.common.spi.Writer; +import com.alibaba.datax.common.util.Configuration; +import com.alibaba.datax.common.util.RetryUtil; +import com.alibaba.datax.common.util.StrUtil; +import com.aliyun.openservices.log.Client; +import com.aliyun.openservices.log.common.LogItem; +import com.aliyun.openservices.log.common.Shard; +import com.aliyun.openservices.log.exception.LogException; +import com.aliyun.openservices.log.request.ListShardRequest; +import com.aliyun.openservices.log.request.PutLogsRequest; +import com.aliyun.openservices.log.response.ListShardResponse; +import com.aliyun.openservices.log.response.PutLogsResponse; + +import org.apache.commons.codec.digest.Md5Crypt; +import org.slf4j.Logger; +import org.slf4j.LoggerFactory; +import sun.security.provider.MD5; + +import java.text.DateFormat; +import java.text.SimpleDateFormat; +import java.util.ArrayList; +import java.util.Date; +import java.util.HashMap; +import java.util.List; +import java.util.Map; +import java.util.concurrent.Callable; + +/** + * SLS 写插件 + * @author + */ +public class LogHubWriter extends Writer { + + public static class Job extends Writer.Job { + private static final Logger LOG = LoggerFactory.getLogger(Job.class); + + private Configuration jobConfig = null; + + @Override + public void init() { + info(LOG, "loghub writer job init begin ..."); + this.jobConfig = super.getPluginJobConf(); + validateParameter(jobConfig); + info(LOG, "loghub writer job 
init end."); + } + + private void validateParameter(Configuration conf){ + conf.getNecessaryValue(Key.ENDPOINT,LogHubWriterErrorCode.REQUIRE_VALUE); + conf.getNecessaryValue(Key.ACCESS_KEY_ID,LogHubWriterErrorCode.REQUIRE_VALUE); + conf.getNecessaryValue(Key.ACCESS_KEY_SECRET,LogHubWriterErrorCode.REQUIRE_VALUE); + conf.getNecessaryValue(Key.PROJECT,LogHubWriterErrorCode.REQUIRE_VALUE); + conf.getNecessaryValue(Key.LOG_STORE,LogHubWriterErrorCode.REQUIRE_VALUE); + conf.getNecessaryValue(Key.COLUMN,LogHubWriterErrorCode.REQUIRE_VALUE); + } + + @Override + public List split(int mandatoryNumber) { + info(LOG, "split begin..."); + List configurationList = new ArrayList(); + for (int i = 0; i < mandatoryNumber; i++) { + configurationList.add(this.jobConfig.clone()); + } + info(LOG, "split end..."); + return configurationList; + } + + @Override + public void post() { + } + + @Override + public void destroy() { + } + } + + public static class Task extends Writer.Task { + private static final Logger LOG = LoggerFactory.getLogger(Task.class); + private Configuration taskConfig; + private com.aliyun.openservices.log.Client logHubClient; + private String logStore; + private String topic; + private String project; + private List columnList; + private int batchSize; + private String timeCol; + private String timeFormat; + private String source; + private boolean isHashKey; + private List shards; + public void init() { + this.taskConfig = super.getPluginJobConf(); + String endpoint = taskConfig.getString(Key.ENDPOINT); + String accessKeyId = taskConfig.getString(Key.ACCESS_KEY_ID); + String accessKeySecret = taskConfig.getString(Key.ACCESS_KEY_SECRET); + project = taskConfig.getString(Key.PROJECT); + logStore = taskConfig.getString(Key.LOG_STORE); + topic = taskConfig.getString(Key.TOPIC,""); + columnList = taskConfig.getList(Key.COLUMN,String.class); + batchSize = taskConfig.getInt(Key.BATCH_SIZE,1024); + timeCol = taskConfig.getString(Key.TIME,""); + timeFormat = taskConfig.getString(Key.TIME_FORMAT,""); + source = taskConfig.getString(Key.SOURCE,""); + isHashKey = taskConfig.getBool(Key.HASH_BY_KEY,false); + logHubClient = new Client(endpoint, accessKeyId, accessKeySecret); + if (isHashKey) { + listShard(); + info(LOG, "init loghub writer with hash key mode."); + } + if (LOG.isInfoEnabled()) { + LOG.info("init loghub writer task finished.project:{} logstore:{} topic:{} batchSize:{}",project,logStore,topic,batchSize); + } + } + + /** + * 获取通道的分片信息 + */ + private void listShard() { + try { + ListShardResponse response = logHubClient.ListShard(new ListShardRequest(project,logStore)); + shards = response.GetShards(); + if (LOG.isInfoEnabled()) { + LOG.info("Get shard count:{}", shards.size()); + } + } catch (LogException e) { + info(LOG, "Get shard failed!"); + throw new RuntimeException("Get shard failed!", e); + } + } + + @Override + public void prepare() { + } + + private int getTime(String v) { + try { + if ("bigint".equalsIgnoreCase(timeFormat)) { + return Integer.valueOf(v); + } + + DateFormat sdf = new SimpleDateFormat(timeFormat); + Date date = sdf.parse(v); + return (int)(date.getTime()/1000); + } catch (Exception e) { + LOG.warn("Format time failed!", e); + } + return (int)(((new Date())).getTime()/1000); + } + + @Override + public void startWrite(RecordReceiver recordReceiver) { + info(LOG, "start to write....................."); + // 按照shared做hash处理 + if (isHashKey) { + processDataWithHashKey(recordReceiver); + } else { + processDataWithoutHashKey(recordReceiver); + } + info(LOG, "finish to 
write........."); + } + + private void processDataWithHashKey(RecordReceiver receiver) { + Record record; + Map> logMap = new HashMap>(shards.size()); + int count = 0; + try { + while ((record = receiver.getFromReader()) != null) { + LogItem logItem = new LogItem(); + if (record.getColumnNumber() != columnList.size()) { + this.getTaskPluginCollector().collectDirtyRecord(record, "column not match"); + } + + String id = ""; + for (int i = 0; i < record.getColumnNumber(); i++) { + String colName = columnList.get(i); + String colValue = record.getColumn(i).asString(); + if (colName.endsWith("_id")) { + id = colValue; + } + + logItem.PushBack(colName, colValue); + if (colName.equals(timeCol)) { + logItem.SetTime(getTime(colValue)); + } + } + + String hashKey = getShardHashKey(StrUtil.getMd5(id), shards); + if (!logMap.containsKey(hashKey)) { + info(LOG, "Hash key:" + hashKey); + logMap.put(hashKey, new ArrayList()); + } + logMap.get(hashKey).add(logItem); + + if (logMap.get(hashKey).size() % batchSize == 0) { + PutLogsRequest request = new PutLogsRequest(project, logStore, topic, source, logMap.get(hashKey), hashKey); + PutLogsResponse response = putLog(request); + count += logMap.get(hashKey).size(); + if (LOG.isDebugEnabled()) { + LOG.debug("record count:{}, request id:{}", logMap.get(hashKey).size(), response.GetRequestId()); + } + logMap.get(hashKey).clear(); + } + } + + for (Map.Entry> entry : logMap.entrySet()) { + if (!entry.getValue().isEmpty()) { + // 将剩余的数据发送 + PutLogsRequest request = new PutLogsRequest(project, logStore, topic, source, entry.getValue(), entry.getKey()); + PutLogsResponse response = putLog(request); + count += entry.getValue().size(); + if (LOG.isDebugEnabled()) { + LOG.debug("record count:{}, request id:{}", entry.getValue().size(), response.GetRequestId()); + } + entry.getValue().clear(); + } + } + LOG.info("{} records have been sent", count); + } catch (LogException ex) { + throw DataXException.asDataXException(LogHubWriterErrorCode.LOG_HUB_ERROR, ex.getMessage(), ex); + } catch (Exception e) { + throw DataXException.asDataXException(LogHubWriterErrorCode.LOG_HUB_ERROR, e.getMessage(), e); + } + } + + private void processDataWithoutHashKey(RecordReceiver receiver) { + Record record; + ArrayList logGroup = new ArrayList(); + int count = 0; + try { + while ((record = receiver.getFromReader()) != null) { + LogItem logItem = new LogItem(); + if(record.getColumnNumber() != columnList.size()){ + this.getTaskPluginCollector().collectDirtyRecord(record,"column not match"); + } + for (int i = 0; i < record.getColumnNumber(); i++) { + String colName = columnList.get(i); + String colValue = record.getColumn(i).asString(); + logItem.PushBack(colName, colValue); + if(colName.equals(timeCol)){ + logItem.SetTime(getTime(colValue)); + } + } + + logGroup.add(logItem); + count++; + if (count % batchSize == 0) { + PutLogsRequest request = new PutLogsRequest(project, logStore, topic, source, logGroup); + PutLogsResponse response = putLog(request); + logGroup.clear(); + if (LOG.isDebugEnabled()) { + LOG.debug("record count:{}, request id:{}", count, response.GetRequestId()); + } + } + } + if (!logGroup.isEmpty()) { + //将剩余的数据发送 + PutLogsRequest request = new PutLogsRequest(project, logStore, topic, source, logGroup); + PutLogsResponse response = putLog(request); + logGroup.clear(); + if (LOG.isDebugEnabled()) { + LOG.debug("record count:{}, request id:{}", count, response.GetRequestId()); + } + } + LOG.info("{} records have been sent", count); + } catch (LogException ex) { + throw 
DataXException.asDataXException(LogHubWriterErrorCode.LOG_HUB_ERROR, ex.getMessage(), ex); + } catch (Exception e) { + throw DataXException.asDataXException(LogHubWriterErrorCode.LOG_HUB_ERROR, e.getMessage(), e); + } + } + + private PutLogsResponse putLog(final PutLogsRequest request) throws Exception{ + final Client client = this.logHubClient; + + return RetryUtil.executeWithRetry(new Callable() { + public PutLogsResponse call() throws LogException{ + return client.PutLogs(request); + } + }, 3, 1000L, false); + } + + private String getShardHashKey(String hashKey, List shards) { + for (Shard shard : shards) { + if (hashKey.compareTo(shard.getExclusiveEndKey()) < 0 && hashKey.compareTo(shard.getInclusiveBeginKey()) >= 0) { + return shard.getInclusiveBeginKey(); + } + } + return shards.get(0).getInclusiveBeginKey(); + } + + @Override + public void post() { + } + + @Override + public void destroy() { + } + } + + /** + * 日志打印控制 + * + * @param logger + * @param message + */ + public static void info(Logger logger, String message) { + if (logger.isInfoEnabled()) { + logger.info(message); + } + } +} diff --git a/loghubwriter/src/main/java/com/alibaba/datax/plugin/writer/loghubwriter/LogHubWriterErrorCode.java b/loghubwriter/src/main/java/com/alibaba/datax/plugin/writer/loghubwriter/LogHubWriterErrorCode.java new file mode 100644 index 00000000..98c5e16f --- /dev/null +++ b/loghubwriter/src/main/java/com/alibaba/datax/plugin/writer/loghubwriter/LogHubWriterErrorCode.java @@ -0,0 +1,33 @@ +package com.alibaba.datax.plugin.writer.loghubwriter; + +import com.alibaba.datax.common.spi.ErrorCode; + +public enum LogHubWriterErrorCode implements ErrorCode { + BAD_CONFIG_VALUE("LogHubWriter-00", "The value you configured is invalid."), + LOG_HUB_ERROR("LogHubWriter-01","LogHub access encounter exception"), + REQUIRE_VALUE("LogHubWriter-02","Missing parameters"); + + private final String code; + private final String description; + + private LogHubWriterErrorCode(String code, String description) { + this.code = code; + this.description = description; + } + + @Override + public String getCode() { + return this.code; + } + + @Override + public String getDescription() { + return this.description; + } + + @Override + public String toString() { + return String.format("Code:[%s], Description:[%s]. 
", this.code, + this.description); + } +} \ No newline at end of file diff --git a/loghubwriter/src/main/resources/plugin.json b/loghubwriter/src/main/resources/plugin.json new file mode 100644 index 00000000..2a913b14 --- /dev/null +++ b/loghubwriter/src/main/resources/plugin.json @@ -0,0 +1,6 @@ +{ + "name": "loghubwriter", + "class": "com.alibaba.datax.plugin.writer.loghubwriter.LogHubWriter", + "description": "适用于: 将数据导入到SLS LogHub中", + "developer": "alibaba" +} \ No newline at end of file diff --git a/loghubwriter/src/main/resources/plugin_job_template.json b/loghubwriter/src/main/resources/plugin_job_template.json new file mode 100644 index 00000000..ac0d3b2a --- /dev/null +++ b/loghubwriter/src/main/resources/plugin_job_template.json @@ -0,0 +1,13 @@ +{ + "name": "loghubwriter", + "parameter": { + "endpoint": "", + "accessId": "", + "accessKey": "", + "project": "", + "logstore": "", + "topic": "", + "batchSize":1024, + "column": [] + } +} \ No newline at end of file diff --git a/mongodbreader/doc/mongodbreader.md b/mongodbreader/doc/mongodbreader.md index 99d25731..297e598c 100644 --- a/mongodbreader/doc/mongodbreader.md +++ b/mongodbreader/doc/mongodbreader.md @@ -114,8 +114,7 @@ MongoDBReader通过Datax框架从MongoDB并行的读取数据,通过主控的J "accessKey": "********************", "truncate": true, "odpsServer": "xxx/api", - "tunnelServer": "xxx", - "accountType": "aliyun" + "tunnelServer": "xxx" } } } diff --git a/mongodbreader/src/main/java/com/alibaba/datax/plugin/reader/mongodbreader/MongoDBReader.java b/mongodbreader/src/main/java/com/alibaba/datax/plugin/reader/mongodbreader/MongoDBReader.java index ba7f07f4..4d129a5a 100644 --- a/mongodbreader/src/main/java/com/alibaba/datax/plugin/reader/mongodbreader/MongoDBReader.java +++ b/mongodbreader/src/main/java/com/alibaba/datax/plugin/reader/mongodbreader/MongoDBReader.java @@ -18,9 +18,9 @@ import com.alibaba.datax.common.spi.Reader; import com.alibaba.datax.common.util.Configuration; import com.alibaba.datax.plugin.reader.mongodbreader.util.CollectionSplitUtil; import com.alibaba.datax.plugin.reader.mongodbreader.util.MongoUtil; -import com.alibaba.fastjson.JSON; -import com.alibaba.fastjson.JSONArray; -import com.alibaba.fastjson.JSONObject; +import com.alibaba.fastjson2.JSON; +import com.alibaba.fastjson2.JSONArray; +import com.alibaba.fastjson2.JSONObject; import com.google.common.base.Joiner; import com.google.common.base.Strings; diff --git a/mongodbwriter/src/main/java/com/alibaba/datax/plugin/writer/mongodbwriter/MongoDBWriter.java b/mongodbwriter/src/main/java/com/alibaba/datax/plugin/writer/mongodbwriter/MongoDBWriter.java index 66c75078..76f35a40 100644 --- a/mongodbwriter/src/main/java/com/alibaba/datax/plugin/writer/mongodbwriter/MongoDBWriter.java +++ b/mongodbwriter/src/main/java/com/alibaba/datax/plugin/writer/mongodbwriter/MongoDBWriter.java @@ -7,9 +7,9 @@ import com.alibaba.datax.common.spi.Writer; import com.alibaba.datax.common.util.Configuration; import com.alibaba.datax.plugin.rdbms.writer.Key; import com.alibaba.datax.plugin.writer.mongodbwriter.util.MongoUtil; -import com.alibaba.fastjson.JSON; -import com.alibaba.fastjson.JSONArray; -import com.alibaba.fastjson.JSONObject; +import com.alibaba.fastjson2.JSON; +import com.alibaba.fastjson2.JSONArray; +import com.alibaba.fastjson2.JSONObject; import com.google.common.base.Strings; import com.mongodb.*; import com.mongodb.client.MongoCollection; diff --git a/mysqlreader/doc/mysqlreader.md b/mysqlreader/doc/mysqlreader.md index 24589579..bae4bce0 100644 --- 
a/mysqlreader/doc/mysqlreader.md +++ b/mysqlreader/doc/mysqlreader.md @@ -197,9 +197,9 @@ MysqlReader插件实现了从Mysql读取数据。在底层实现上,MysqlReade * **querySql** - * 描述:在有些业务场景下,where这一配置项不足以描述所筛选的条件,用户可以通过该配置型来自定义筛选SQL。当用户配置了这一项之后,DataX系统就会忽略column这些配置型,直接使用这个配置项的内容对数据进行筛选,例如需要进行多表join后同步数据,使用select a,b from table_a join table_b on table_a.id = table_b.id
+ * 描述:在有些业务场景下,where这一配置项不足以描述所筛选的条件,用户可以通过该配置项来自定义筛选SQL。当用户配置了这一项之后,DataX系统就会忽略table、column这些配置项,直接使用这个配置项的内容对数据进行筛选,例如需要进行多表join后同步数据,使用select a,b from table_a join table_b on table_a.id = table_b.id
- `当用户配置querySql时,MysqlReader直接忽略column、where条件的配置`,querySql优先级大于column、where选项。querySql和table不能同时存在 + `当用户配置querySql时,MysqlReader直接忽略table、column、where条件的配置`,querySql优先级大于table、column、where选项。 * 必选:否
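
下面是一个仅作示意的 reader 配置片段,演示 querySql 与 connection 的配合方式(其中 jdbcUrl、账号、SQL 均为占位的假设值,并非真实环境配置);配置了 querySql 之后,table、column、where 均无需再填写:

```json
{
  "reader": {
    "name": "mysqlreader",
    "parameter": {
      "username": "root",
      "password": "******",
      "connection": [
        {
          "jdbcUrl": ["jdbc:mysql://127.0.0.1:3306/database"],
          "querySql": [
            "select a,b from table_a join table_b on table_a.id = table_b.id"
          ]
        }
      ]
    }
  }
}
```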
diff --git a/neo4jwriter/doc/neo4jwriter.md b/neo4jwriter/doc/neo4jwriter.md new file mode 100644 index 00000000..0c6e356c --- /dev/null +++ b/neo4jwriter/doc/neo4jwriter.md @@ -0,0 +1,193 @@ +# DataX neo4jWriter 插件文档 + +## 功能简介 + +本目前市面上的neo4j 批量导入主要有Cypher Create,Load CSV,第三方或者官方提供的Batch Import。Load CSV支持节点10W级别一下,Batch Import 需要对数据库进行停机。要想实现不停机的数据写入,Cypher是最好的方式。 + +## 支持版本 + +支持Neo4j 4 和Neo4j 5,如果是Neo4j 3,需要自行将驱动降低至相对应的版本进行编译。 + +## 实现原理 + +将datax的数据转换成了neo4j驱动能识别的对象,利用 unwind 语法进行批量插入。 + +## 如何配置 + +### 配置项介绍 + +| 配置 | 说明 | 是否必须 | 默认值 | 示例 | +|:-------------------------------|--------------------| -------- | ------ | ---------------------------------------------------- | +| database | 数据库名字 | 是 | - | neo4j | +| uri | 数据库访问链接 | 是 | - | bolt://localhost:7687 | +| username | 访问用户名 | 是 | - | neo4j | +| password | 访问密码 | 是 | - | neo4j | +| bearerToken | 权限相关 | 否 | - | - | +| kerberosTicket | 权限相关 | 否 | - | - | +| cypher | 同步语句 | 是 | - | unwind $batch as row create(p) set p.name = row.name | +| batchDataVariableName | unwind 携带的数据变量名 | | | batch | +| properties | 定义neo4j中数据的属性名字和类型 | 是 | - | 见后续案例 | +| batchSize | 一批写入数据量 | 否 | 1000 | | +| maxTransactionRetryTimeSeconds | 事务运行最长时间 | 否 | 30秒 | 30 | +| maxConnectionTimeoutSeconds | 驱动最长链接时间 | 否 | 30秒 | 30 | +| retryTimes | 发生错误的重试次数 | 否 | 3次 | 3 | +| retrySleepMills | 重试失败后的等待时间 | 否 | 3秒 | 3 | + +### 支持的数据类型 +> 配置时均忽略大小写 +``` +BOOLEAN, +STRING, +LONG, +SHORT, +INTEGER, +DOUBLE, +FLOAT, +LOCAL_DATE, +LOCAL_TIME, +LOCAL_DATE_TIME, +LIST, +//map类型支持 . 属性表达式取值 +MAP, +CHAR_ARRAY, +BYTE_ARRAY, +BOOLEAN_ARRAY, +STRING_ARRAY, +LONG_ARRAY, +INT_ARRAY, +SHORT_ARRAY, +DOUBLE_ARRAY, +FLOAT_ARRAY, +Object_ARRAY +``` + +### 写节点 + +这里提供了一个写节点包含很多类型属性的例子。你可以在我的测试方法中运行。 + +```json +"writer": { + "name": "neo4jWriter", + "parameter": { + "uri": "neo4j://localhost:7687", + "username": "neo4j", + "password": "Test@12343", + "database": "neo4j", + "cypher": "unwind $batch as row create(p:Person) set p.pbool = row.pbool,p.pstring = row.pstring,p.plong = row.plong,p.pshort = row.pshort,p.pdouble=row.pdouble,p.pstringarr=row.pstringarr,p.plocaldate=row.plocaldate", + "batchDataVariableName": "batch", + "batchSize": "33", + "properties": [ + { + "name": "pbool", + "type": "BOOLEAN" + }, + { + "name": "pstring", + "type": "STRING" + }, + { + "name": "plong", + "type": "LONG" + }, + { + "name": "pshort", + "type": "SHORT" + }, + { + "name": "pdouble", + "type": "DOUBLE" + }, + { + "name": "pstringarr", + "type": "STRING_ARRAY", + "split": "," + }, + { + "name": "plocaldate", + "type": "LOCAL_DATE", + "dateFormat": "yyyy-MM-dd" + } + ] + } + } +``` + +### 写关系 + +```json +"writer": { + "name": "neo4jWriter", + "parameter": { + "uri": "neo4j://localhost:7687", + "username": "neo4j", + "password": "Test@12343", + "database": "neo4j", + "cypher": "unwind $batch as row match(p1:Person) where p1.id = row.startNodeId match(p2:Person) where p2.id = row.endNodeId create (p1)-[:LINK]->(p2)", + "batchDataVariableName": "batch", + "batch_size": "33", + "properties": [ + { + "name": "startNodeId", + "type": "STRING" + }, + { + "name": "endNodeId", + "type": "STRING" + } + ] + } + } +``` + +### 节点/关系类型动态写 + +> 需要使用AOPC函数拓展,如果你的数据库没有,请安装APOC函数拓展 + +```json + "writer": { + "name": "neo4jWriter", + "parameter": { + "uri": "bolt://localhost:7687", + "username": "yourUserName", + "password": "yourPassword", + "database": "yourDataBase", + "cypher": "unwind $batch as row CALL apoc.cypher.doIt( 'create (n:`' + row.Label + '`{id:$id})' ,{id: row.id} ) YIELD value RETURN 1 ", + 
"batchDataVariableName": "batch", + "batch_size": "1", + "properties": [ + { + "name": "Label", + "type": "STRING" + }, + { + "name": "id", + "type": "STRING" + } + ] + } + } +``` + +## 注意事项 + +* properties定义的顺序需要与reader端顺序一一对应。 +* 灵活使用map类型,可以免去很多数据加工的烦恼。在cypher中,可以根据 . 属性访问符号一直取值。比如 unwind $batch as row create (p) set p.name = row.prop.name,set p.age = row.prop.age,在这个例子中,prop是map类型,包含name和age两个属性。 +* 如果提示事务超时,建议调大事务运行时间或者调小batchSize +* 如果用于更新场景,遇到死锁问题影响写入,建议二开源码加入死锁异常检测,并进行重试。 + +## 性能报告 + +**JVM参数** + +16G G1垃圾收集器 8核心 + +**Neo4j数据库配置** + +32核心,256G + +**datax 配置** + +* Channel 20 batchsize = 1000 +* 任务平均流量:15.23MB/s +* 记录写入速度:44440 rec/s +* 读出记录总数:2222013 diff --git a/neo4jwriter/pom.xml b/neo4jwriter/pom.xml new file mode 100644 index 00000000..2ff0f550 --- /dev/null +++ b/neo4jwriter/pom.xml @@ -0,0 +1,100 @@ + + + + com.alibaba.datax + datax-all + 0.0.1-SNAPSHOT + + 4.0.0 + + neo4jwriter + neo4jwriter + jar + + + 8 + 8 + UTF-8 + 4.4.9 + 4.13.2 + 1.17.6 + + + + org.slf4j + slf4j-api + + + ch.qos.logback + logback-classic + + + org.neo4j.driver + neo4j-java-driver + ${neo4j-java-driver.version} + + + com.alibaba.datax + datax-common + ${datax-project-version} + + + + org.testcontainers + testcontainers + ${test.container.version} + + + + junit + junit + ${junit4.version} + test + + + + + + + src/main/resources + + **/*.* + + true + + + + + + + maven-compiler-plugin + + ${jdk-version} + ${jdk-version} + ${project-sourceEncoding} + + + + + maven-assembly-plugin + + + src/main/assembly/package.xml + + datax + + + + dwzip + package + + single + + + + + + + diff --git a/neo4jwriter/src/main/assembly/package.xml b/neo4jwriter/src/main/assembly/package.xml new file mode 100644 index 00000000..3acbe674 --- /dev/null +++ b/neo4jwriter/src/main/assembly/package.xml @@ -0,0 +1,35 @@ + + + + dir + + false + + + src/main/resources + + plugin.json + plugin_job_template.json + + plugin/writer/neo4jwriter + + + target/ + + neo4jwriter-0.0.1-SNAPSHOT.jar + + plugin/writer/neo4jwriter + + + + + + false + plugin/writer/neo4jwriter/libs + runtime + + + diff --git a/neo4jwriter/src/main/java/com/alibaba/datax/plugin/writer/neo4jwriter/Neo4jClient.java b/neo4jwriter/src/main/java/com/alibaba/datax/plugin/writer/neo4jwriter/Neo4jClient.java new file mode 100644 index 00000000..4451bbdf --- /dev/null +++ b/neo4jwriter/src/main/java/com/alibaba/datax/plugin/writer/neo4jwriter/Neo4jClient.java @@ -0,0 +1,256 @@ +package com.alibaba.datax.plugin.writer.neo4jwriter; + + +import com.alibaba.datax.common.element.Column; +import com.alibaba.datax.common.element.Record; +import com.alibaba.datax.common.exception.DataXException; +import com.alibaba.datax.common.plugin.TaskPluginCollector; +import com.alibaba.datax.common.util.Configuration; +import com.alibaba.datax.common.util.RetryUtil; +import com.alibaba.datax.plugin.writer.neo4jwriter.adapter.DateAdapter; +import com.alibaba.datax.plugin.writer.neo4jwriter.adapter.ValueAdapter; +import com.alibaba.datax.plugin.writer.neo4jwriter.config.Neo4jProperty; +import com.alibaba.datax.plugin.writer.neo4jwriter.exception.Neo4jErrorCode; +import com.alibaba.fastjson2.JSON; +import org.apache.commons.lang3.StringUtils; +import org.neo4j.driver.*; +import org.neo4j.driver.exceptions.Neo4jException; +import org.neo4j.driver.internal.value.MapValue; +import org.slf4j.Logger; +import org.slf4j.LoggerFactory; + +import java.util.*; +import java.util.concurrent.TimeUnit; + +import static com.alibaba.datax.plugin.writer.neo4jwriter.config.ConfigConstants.*; +import static 
com.alibaba.datax.plugin.writer.neo4jwriter.exception.Neo4jErrorCode.DATABASE_ERROR; + +public class Neo4jClient { + private static final Logger LOGGER = LoggerFactory.getLogger(Neo4jClient.class); + private Driver driver; + + private WriteConfig writeConfig; + private RetryConfig retryConfig; + private TaskPluginCollector taskPluginCollector; + + private Session session; + + private List writerBuffer; + + + public Neo4jClient(Driver driver, + WriteConfig writeConfig, + RetryConfig retryConfig, + TaskPluginCollector taskPluginCollector) { + this.driver = driver; + this.writeConfig = writeConfig; + this.retryConfig = retryConfig; + this.taskPluginCollector = taskPluginCollector; + this.writerBuffer = new ArrayList<>(writeConfig.batchSize); + } + + public void init() { + String database = writeConfig.database; + //neo4j 3.x 没有数据库 + if (null != database && !"".equals(database)) { + this.session = driver.session(SessionConfig.forDatabase(database)); + } else { + this.session = driver.session(); + } + } + + public static Neo4jClient build(Configuration config, TaskPluginCollector taskPluginCollector) { + + Driver driver = buildNeo4jDriver(config); + String cypher = checkCypher(config); + String database = config.getString(DATABASE.getKey()); + String batchVariableName = config.getString(BATCH_DATA_VARIABLE_NAME.getKey(), + BATCH_DATA_VARIABLE_NAME.getDefaultValue()); + List neo4jProperties = JSON.parseArray(config.getString(NEO4J_PROPERTIES.getKey()), Neo4jProperty.class); + int batchSize = config.getInt(BATCH_SIZE.getKey(), BATCH_SIZE.getDefaultValue()); + int retryTimes = config.getInt(RETRY_TIMES.getKey(), RETRY_TIMES.getDefaultValue()); + + return new Neo4jClient(driver, + new WriteConfig(cypher, database, batchVariableName, neo4jProperties, batchSize), + new RetryConfig(retryTimes, config.getLong(RETRY_SLEEP_MILLS.getKey(), RETRY_SLEEP_MILLS.getDefaultValue())), + taskPluginCollector + ); + } + + private static String checkCypher(Configuration config) { + String cypher = config.getString(CYPHER.getKey()); + if (StringUtils.isBlank(cypher)) { + throw DataXException.asDataXException(Neo4jErrorCode.CONFIG_INVALID, "cypher must not null or empty"); + } + return cypher; + } + + private static Driver buildNeo4jDriver(Configuration config) { + + Config.ConfigBuilder configBuilder = Config.builder().withMaxConnectionPoolSize(1); + String uri = checkUriConfig(config); + + //connection timeout + //连接超时时间 + Long maxConnTime = config.getLong(MAX_CONNECTION_TIMEOUT_SECONDS.getKey(), MAX_TRANSACTION_RETRY_TIME.getDefaultValue()); + configBuilder + .withConnectionAcquisitionTimeout( + maxConnTime * 2, TimeUnit.SECONDS) + .withConnectionTimeout(maxConnTime, TimeUnit.SECONDS); + + + //transaction timeout + //事务运行超时时间 + Long txRetryTime = config.getLong(MAX_TRANSACTION_RETRY_TIME.getKey(), MAX_TRANSACTION_RETRY_TIME.getDefaultValue()); + configBuilder.withMaxTransactionRetryTime(txRetryTime, TimeUnit.SECONDS); + String username = config.getString(USERNAME.getKey()); + String password = config.getString(PASSWORD.getKey()); + String bearerToken = config.getString(BEARER_TOKEN.getKey()); + String kerberosTicket = config.getString(KERBEROS_TICKET.getKey()); + + if (StringUtils.isNotBlank(username) && StringUtils.isNotBlank(password)) { + + return GraphDatabase.driver(uri, AuthTokens.basic(username, password), configBuilder.build()); + + } else if (StringUtils.isNotBlank(bearerToken)) { + + return GraphDatabase.driver(uri, AuthTokens.bearer(bearerToken), configBuilder.build()); + + } else if 
(StringUtils.isNotBlank(kerberosTicket)) { + + return GraphDatabase.driver(uri, AuthTokens.kerberos(kerberosTicket), configBuilder.build()); + + } + + throw DataXException.asDataXException(Neo4jErrorCode.CONFIG_INVALID, "Invalid Auth config."); + } + + private static String checkUriConfig(Configuration config) { + String uri = config.getString(URI.getKey()); + if (null == uri || uri.length() == 0) { + throw DataXException.asDataXException(Neo4jErrorCode.CONFIG_INVALID, "Invalid uri configuration"); + } + return uri; + } + + public void destroy() { + tryFlushBuffer(); + if (driver != null) { + driver.close(); + } + if (session != null) { + session.close(); + } + DateAdapter.destroy(); + } + + private void tryFlushBuffer() { + if (!writerBuffer.isEmpty()) { + doWrite(writerBuffer); + writerBuffer.clear(); + } + } + + private void tryBatchWrite() { + if (!writerBuffer.isEmpty() && writerBuffer.size() >= writeConfig.batchSize) { + doWrite(writerBuffer); + writerBuffer.clear(); + } + } + + private void doWrite(List values) { + Value batchValues = Values.parameters(this.writeConfig.batchVariableName, values); + Query query = new Query(this.writeConfig.cypher, batchValues); +// LOGGER.debug("query:{}", query.text()); +// LOGGER.debug("batch:{}", toUnwindStr(values)); + try { + RetryUtil.executeWithRetry(() -> { + session.writeTransaction(tx -> tx.run(query)); + return null; + }, this.retryConfig.retryTimes, retryConfig.retrySleepMills, true, + Collections.singletonList(Neo4jException.class)); + } catch (Exception e) { + LOGGER.error("an exception occurred while writing to the database,message:{}", e.getMessage()); + throw DataXException.asDataXException(DATABASE_ERROR, e.getMessage()); + } + + + } + + private String toUnwindStr(List values) { + StringJoiner joiner = new StringJoiner(","); + for (MapValue value : values) { + joiner.add(value.toString()); + } + return "[" + joiner + "]"; + } + + public void tryWrite(Record record) { + MapValue neo4jValue = checkAndConvert(record); + writerBuffer.add(neo4jValue); + tryBatchWrite(); + } + + private MapValue checkAndConvert(Record record) { + int sourceColNum = record.getColumnNumber(); + List neo4jProperties = writeConfig.neo4jProperties; + if (neo4jProperties == null || neo4jProperties.size() != sourceColNum) { + throw new DataXException(Neo4jErrorCode.CONFIG_INVALID, "the read and write columns do not match!"); + } + Map data = new HashMap<>(sourceColNum * 4 / 3); + for (int i = 0; i < sourceColNum; i++) { + Column column = record.getColumn(i); + Neo4jProperty neo4jProperty = neo4jProperties.get(i); + try { + + Value value = ValueAdapter.column2Value(column, neo4jProperty); + data.put(neo4jProperty.getName(), value); + } catch (Exception e) { + LOGGER.info("dirty record:{},message :{}", column, e.getMessage()); + this.taskPluginCollector.collectDirtyRecord(record, e.getMessage()); + } + } + return new MapValue(data); + } + + public List getNeo4jFields() { + return this.writeConfig.neo4jProperties; + } + + + static class RetryConfig { + int retryTimes; + long retrySleepMills; + + RetryConfig(int retryTimes, long retrySleepMills) { + this.retryTimes = retryTimes; + this.retrySleepMills = retrySleepMills; + } + } + + static class WriteConfig { + String cypher; + + String database; + + String batchVariableName; + + List neo4jProperties; + + int batchSize; + + public WriteConfig(String cypher, + String database, + String batchVariableName, + List neo4jProperties, + int batchSize) { + this.cypher = cypher; + this.database = database; + 
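// e.g. with batchDataVariableName = "batch" and batchSize = 33 (as in plugin_job_template.json),
// a cypher such as "unwind $batch as row create (p:Person) set p.id = row.id" is executed once
// per flushed buffer and receives the converted records under the $batch parameter.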
this.batchVariableName = batchVariableName; + this.neo4jProperties = neo4jProperties; + this.batchSize = batchSize; + } + + + } +} diff --git a/neo4jwriter/src/main/java/com/alibaba/datax/plugin/writer/neo4jwriter/Neo4jWriter.java b/neo4jwriter/src/main/java/com/alibaba/datax/plugin/writer/neo4jwriter/Neo4jWriter.java new file mode 100644 index 00000000..6a589c1d --- /dev/null +++ b/neo4jwriter/src/main/java/com/alibaba/datax/plugin/writer/neo4jwriter/Neo4jWriter.java @@ -0,0 +1,64 @@ +package com.alibaba.datax.plugin.writer.neo4jwriter; + +import com.alibaba.datax.common.plugin.RecordReceiver; +import com.alibaba.datax.common.spi.Writer; +import com.alibaba.datax.common.util.Configuration; +import com.alibaba.datax.common.element.Record; +import org.slf4j.Logger; +import org.slf4j.LoggerFactory; + +import java.util.ArrayList; +import java.util.List; + +public class Neo4jWriter extends Writer { + public static class Job extends Writer.Job { + private static final Logger LOGGER = LoggerFactory.getLogger(Job.class); + + private Configuration jobConf = null; + @Override + public void init() { + LOGGER.info("Neo4jWriter Job init success"); + this.jobConf = getPluginJobConf(); + } + + @Override + public void destroy() { + LOGGER.info("Neo4jWriter Job destroyed"); + } + + @Override + public List split(int mandatoryNumber) { + List configurations = new ArrayList(mandatoryNumber); + for (int i = 0; i < mandatoryNumber; i++) { + configurations.add(this.jobConf.clone()); + } + return configurations; + } + } + + public static class Task extends Writer.Task { + private static final Logger TASK_LOGGER = LoggerFactory.getLogger(Task.class); + private Neo4jClient neo4jClient; + @Override + public void init() { + Configuration taskConf = super.getPluginJobConf(); + this.neo4jClient = Neo4jClient.build(taskConf,getTaskPluginCollector()); + this.neo4jClient.init(); + TASK_LOGGER.info("neo4j writer task init success."); + } + + @Override + public void destroy() { + this.neo4jClient.destroy(); + TASK_LOGGER.info("neo4j writer task destroyed."); + } + + @Override + public void startWrite(RecordReceiver receiver) { + Record record; + while ((record = receiver.getFromReader()) != null){ + this.neo4jClient.tryWrite(record); + } + } + } +} diff --git a/neo4jwriter/src/main/java/com/alibaba/datax/plugin/writer/neo4jwriter/adapter/DateAdapter.java b/neo4jwriter/src/main/java/com/alibaba/datax/plugin/writer/neo4jwriter/adapter/DateAdapter.java new file mode 100644 index 00000000..51b214bd --- /dev/null +++ b/neo4jwriter/src/main/java/com/alibaba/datax/plugin/writer/neo4jwriter/adapter/DateAdapter.java @@ -0,0 +1,70 @@ +package com.alibaba.datax.plugin.writer.neo4jwriter.adapter; + + +import com.alibaba.datax.plugin.writer.neo4jwriter.config.Neo4jProperty; +import org.testcontainers.shaded.com.google.common.base.Supplier; + +import java.time.LocalDate; +import java.time.LocalDateTime; +import java.time.LocalTime; +import java.time.format.DateTimeFormatter; + +/** + * @author fuyouj + */ +public class DateAdapter { + private static final ThreadLocal LOCAL_DATE_FORMATTER_MAP = new ThreadLocal<>(); + private static final ThreadLocal LOCAL_TIME_FORMATTER_MAP = new ThreadLocal<>(); + private static final ThreadLocal LOCAL_DATE_TIME_FORMATTER_MAP = new ThreadLocal<>(); + private static final String DEFAULT_LOCAL_DATE_FORMATTER = "yyyy-MM-dd"; + private static final String DEFAULT_LOCAL_TIME_FORMATTER = "HH:mm:ss"; + private static final String DEFAULT_LOCAL_DATE_TIME_FORMATTER = "yyyy-MM-dd HH:mm:ss"; + + + public static 
LocalDate localDate(String text, Neo4jProperty neo4jProperty) { + if (LOCAL_DATE_FORMATTER_MAP.get() != null) { + return LocalDate.parse(text, LOCAL_DATE_FORMATTER_MAP.get()); + } + + String format = getOrDefault(neo4jProperty::getDateFormat, DEFAULT_LOCAL_DATE_FORMATTER); + DateTimeFormatter dateTimeFormatter = DateTimeFormatter.ofPattern(format); + LOCAL_DATE_FORMATTER_MAP.set(dateTimeFormatter); + return LocalDate.parse(text, dateTimeFormatter); + } + + public static String getOrDefault(Supplier dateFormat, String defaultFormat) { + String format = dateFormat.get(); + if (null == format || "".equals(format)) { + return defaultFormat; + } else { + return format; + } + } + + public static void destroy() { + LOCAL_DATE_FORMATTER_MAP.remove(); + LOCAL_TIME_FORMATTER_MAP.remove(); + LOCAL_DATE_TIME_FORMATTER_MAP.remove(); + } + + public static LocalTime localTime(String text, Neo4jProperty neo4JProperty) { + if (LOCAL_TIME_FORMATTER_MAP.get() != null) { + return LocalTime.parse(text, LOCAL_TIME_FORMATTER_MAP.get()); + } + + String format = getOrDefault(neo4JProperty::getDateFormat, DEFAULT_LOCAL_TIME_FORMATTER); + DateTimeFormatter dateTimeFormatter = DateTimeFormatter.ofPattern(format); + LOCAL_TIME_FORMATTER_MAP.set(dateTimeFormatter); + return LocalTime.parse(text, dateTimeFormatter); + } + + public static LocalDateTime localDateTime(String text, Neo4jProperty neo4JProperty) { + if (LOCAL_DATE_TIME_FORMATTER_MAP.get() != null){ + return LocalDateTime.parse(text,LOCAL_DATE_TIME_FORMATTER_MAP.get()); + } + String format = getOrDefault(neo4JProperty::getDateFormat, DEFAULT_LOCAL_DATE_TIME_FORMATTER); + DateTimeFormatter dateTimeFormatter = DateTimeFormatter.ofPattern(format); + LOCAL_DATE_TIME_FORMATTER_MAP.set(dateTimeFormatter); + return LocalDateTime.parse(text, dateTimeFormatter); + } +} diff --git a/neo4jwriter/src/main/java/com/alibaba/datax/plugin/writer/neo4jwriter/adapter/ValueAdapter.java b/neo4jwriter/src/main/java/com/alibaba/datax/plugin/writer/neo4jwriter/adapter/ValueAdapter.java new file mode 100644 index 00000000..d0f4044d --- /dev/null +++ b/neo4jwriter/src/main/java/com/alibaba/datax/plugin/writer/neo4jwriter/adapter/ValueAdapter.java @@ -0,0 +1,95 @@ +package com.alibaba.datax.plugin.writer.neo4jwriter.adapter; + + +import com.alibaba.datax.common.element.Column; +import com.alibaba.datax.plugin.writer.neo4jwriter.config.Neo4jProperty; +import com.alibaba.datax.plugin.writer.neo4jwriter.element.PropertyType; +import com.alibaba.fastjson2.JSON; +import org.neo4j.driver.Value; +import org.neo4j.driver.Values; +import org.neo4j.driver.internal.value.NullValue; + +import java.util.ArrayList; +import java.util.Collections; +import java.util.List; +import java.util.Map; +import java.util.function.Function; + +/** + * @author fuyouj + */ +public class ValueAdapter { + + + public static Value column2Value(final Column column, final Neo4jProperty neo4JProperty) { + + String typeStr = neo4JProperty.getType(); + PropertyType type = PropertyType.fromStrIgnoreCase(typeStr); + if (column.asString() == null) { + return NullValue.NULL; + } + + switch (type) { + case NULL: + return NullValue.NULL; + case MAP: + return Values.value(JSON.parseObject(column.asString(), Map.class)); + case BOOLEAN: + return Values.value(column.asBoolean()); + case STRING: + return Values.value(column.asString()); + case INTEGER: + case LONG: + return Values.value(column.asLong()); + case SHORT: + return Values.value(Short.valueOf(column.asString())); + case FLOAT: + case DOUBLE: + return 
Values.value(column.asDouble()); + case BYTE_ARRAY: + return Values.value(parseArrayType(neo4JProperty, column.asString(), Byte::valueOf)); + case CHAR_ARRAY: + return Values.value(parseArrayType(neo4JProperty, column.asString(), (s) -> s.charAt(0))); + case BOOLEAN_ARRAY: + return Values.value(parseArrayType(neo4JProperty, column.asString(), Boolean::valueOf)); + case STRING_ARRAY: + case Object_ARRAY: + case LIST: + return Values.value(parseArrayType(neo4JProperty, column.asString(), Function.identity())); + case LONG_ARRAY: + return Values.value(parseArrayType(neo4JProperty, column.asString(), Long::valueOf)); + case INT_ARRAY: + return Values.value(parseArrayType(neo4JProperty, column.asString(), Integer::valueOf)); + case SHORT_ARRAY: + return Values.value(parseArrayType(neo4JProperty, column.asString(), Short::valueOf)); + case DOUBLE_ARRAY: + case FLOAT_ARRAY: + return Values.value(parseArrayType(neo4JProperty, column.asString(), Double::valueOf)); + case LOCAL_DATE: + return Values.value(DateAdapter.localDate(column.asString(), neo4JProperty)); + case LOCAL_TIME: + return Values.value(DateAdapter.localTime(column.asString(), neo4JProperty)); + case LOCAL_DATE_TIME: + return Values.value(DateAdapter.localDateTime(column.asString(), neo4JProperty)); + default: + return Values.value(column.getRawData()); + + } + } + + + private static List parseArrayType(final Neo4jProperty neo4JProperty, + final String strValue, + final Function convertFunc) { + if (null == strValue || "".equals(strValue)) { + return Collections.emptyList(); + } + String split = neo4JProperty.getSplitOrDefault(); + String[] strArr = strValue.split(split); + List ans = new ArrayList<>(); + for (String s : strArr) { + ans.add(convertFunc.apply(s)); + } + return ans; + } +} diff --git a/neo4jwriter/src/main/java/com/alibaba/datax/plugin/writer/neo4jwriter/config/ConfigConstants.java b/neo4jwriter/src/main/java/com/alibaba/datax/plugin/writer/neo4jwriter/config/ConfigConstants.java new file mode 100644 index 00000000..eed3588e --- /dev/null +++ b/neo4jwriter/src/main/java/com/alibaba/datax/plugin/writer/neo4jwriter/config/ConfigConstants.java @@ -0,0 +1,116 @@ +package com.alibaba.datax.plugin.writer.neo4jwriter.config; + + +import java.util.List; + +/** + * @author fuyouj + */ +public final class ConfigConstants { + + public static final Long DEFAULT_MAX_TRANSACTION_RETRY_SECONDS = 30L; + + public static final Long DEFAULT_MAX_CONNECTION_SECONDS = 30L; + + + + public static final Option RETRY_TIMES = + Option.builder() + .key("retryTimes") + .defaultValue(3) + .desc("The number of overwrites when an error occurs") + .build(); + + public static final Option RETRY_SLEEP_MILLS = + Option.builder() + .key("retrySleepMills") + .defaultValue(3000L) + .build(); + + /** + * cluster mode please reference + * how to connect cluster mode + */ + public static final Option URI = + Option.builder() + .key("uri") + .noDefaultValue() + .desc("uir of neo4j database") + .build(); + + public static final Option USERNAME = + Option.builder() + .key("username") + .noDefaultValue() + .desc("username for accessing the neo4j database") + .build(); + + public static final Option PASSWORD = + Option.builder() + .key("password") + .noDefaultValue() + .desc("password for accessing the neo4j database") + .build(); + + public static final Option BEARER_TOKEN = + Option.builder() + .key("bearerToken") + .noDefaultValue() + .desc("base64 encoded bearer token of the Neo4j. 
for Auth.") + .build(); + + public static final Option KERBEROS_TICKET = + Option.builder() + .key("kerberosTicket") + .noDefaultValue() + .desc("base64 encoded kerberos ticket of the Neo4j. for Auth.") + .build(); + + public static final Option DATABASE = + Option.builder() + .key("database") + .noDefaultValue() + .desc("database name.") + .build(); + + public static final Option CYPHER = + Option.builder() + .key("cypher") + .noDefaultValue() + .desc("cypher query.") + .build(); + + public static final Option MAX_TRANSACTION_RETRY_TIME = + Option.builder() + .key("maxTransactionRetryTimeSeconds") + .defaultValue(DEFAULT_MAX_TRANSACTION_RETRY_SECONDS) + .desc("maximum transaction retry time(seconds). transaction fail if exceeded.") + .build(); + public static final Option MAX_CONNECTION_TIMEOUT_SECONDS = + Option.builder() + .key("maxConnectionTimeoutSeconds") + .defaultValue(DEFAULT_MAX_CONNECTION_SECONDS) + .desc("The maximum amount of time to wait for a TCP connection to be established (seconds).") + .build(); + + public static final Option BATCH_DATA_VARIABLE_NAME = + Option.builder() + .key("batchDataVariableName") + .defaultValue("batch") + .desc("in a cypher statement, a variable name that represents a batch of data") + .build(); + + public static final Option> NEO4J_PROPERTIES = + Option.>builder() + .key("properties") + .noDefaultValue() + .desc("neo4j node or relation`s props") + .build(); + + public static final Option BATCH_SIZE = + Option.builder(). + key("batchSize") + .defaultValue(1000) + .desc("max batch size") + .build(); +} diff --git a/neo4jwriter/src/main/java/com/alibaba/datax/plugin/writer/neo4jwriter/config/Neo4jProperty.java b/neo4jwriter/src/main/java/com/alibaba/datax/plugin/writer/neo4jwriter/config/Neo4jProperty.java new file mode 100644 index 00000000..5c5867b3 --- /dev/null +++ b/neo4jwriter/src/main/java/com/alibaba/datax/plugin/writer/neo4jwriter/config/Neo4jProperty.java @@ -0,0 +1,82 @@ +package com.alibaba.datax.plugin.writer.neo4jwriter.config; + +/** + * 由于dataX并不能传输数据的元数据,所以只能在writer端定义每列数据的名字 + * datax does not support data metadata, + * only the name of each column of data can be defined on neo4j writer + * + * @author fuyouj + */ +public class Neo4jProperty { + public static final String DEFAULT_SPLIT = ","; + + /** + * name of neo4j field + */ + private String name; + + /** + * neo4j type + * reference by org.neo4j.driver.Values + */ + private String type; + + /** + * for date + */ + private String dateFormat; + + /** + * for array type + */ + private String split; + + public Neo4jProperty() { + } + + public Neo4jProperty(String name, String type, String format, String split) { + this.name = name; + this.type = type; + this.dateFormat = format; + this.split = split; + } + + public String getName() { + return name; + } + + public void setName(String name) { + this.name = name; + } + + public String getType() { + return type; + } + + public void setType(String type) { + this.type = type; + } + + public String getDateFormat() { + return dateFormat; + } + + public void setDateFormat(String dateFormat) { + this.dateFormat = dateFormat; + } + + public String getSplit() { + return getSplitOrDefault(); + } + + public String getSplitOrDefault() { + if (split == null || "".equals(split)) { + return DEFAULT_SPLIT; + } + return split; + } + + public void setSplit(String split) { + this.split = split; + } +} diff --git a/neo4jwriter/src/main/java/com/alibaba/datax/plugin/writer/neo4jwriter/config/Option.java 
b/neo4jwriter/src/main/java/com/alibaba/datax/plugin/writer/neo4jwriter/config/Option.java new file mode 100644 index 00000000..f22bd205 --- /dev/null +++ b/neo4jwriter/src/main/java/com/alibaba/datax/plugin/writer/neo4jwriter/config/Option.java @@ -0,0 +1,65 @@ +package com.alibaba.datax.plugin.writer.neo4jwriter.config; + + +public class Option { + + public static class Builder { + private String key; + private String desc; + + private T defaultValue; + + public Builder key(String key) { + this.key = key; + return this; + } + + public Builder desc(String desc) { + this.desc = desc; + return this; + } + + public Builder defaultValue(T defaultValue) { + this.defaultValue = defaultValue; + return this; + } + + public Builder noDefaultValue() { + return this; + } + + public Option build() { + return new Option<>(this.key, this.desc, this.defaultValue); + } + } + + private final String key; + private final String desc; + + private final T defaultValue; + + public Option(String key, String desc, T defaultValue) { + this.key = key; + this.desc = desc; + this.defaultValue = defaultValue; + } + + public static Builder builder(){ + return new Builder<>(); + } + + public String getKey() { + return key; + } + + public String getDesc() { + return desc; + } + + public T getDefaultValue() { + if (defaultValue == null){ + throw new IllegalStateException(key + ":defaultValue is null"); + } + return defaultValue; + } +} diff --git a/neo4jwriter/src/main/java/com/alibaba/datax/plugin/writer/neo4jwriter/element/PropertyType.java b/neo4jwriter/src/main/java/com/alibaba/datax/plugin/writer/neo4jwriter/element/PropertyType.java new file mode 100644 index 00000000..b3446de7 --- /dev/null +++ b/neo4jwriter/src/main/java/com/alibaba/datax/plugin/writer/neo4jwriter/element/PropertyType.java @@ -0,0 +1,40 @@ +package com.alibaba.datax.plugin.writer.neo4jwriter.element; + +import java.util.Arrays; + +/** + * @see org.neo4j.driver.Values + * @author fuyouj + */ +public enum PropertyType { + NULL, + BOOLEAN, + STRING, + LONG, + SHORT, + INTEGER, + DOUBLE, + FLOAT, + LOCAL_DATE, + LOCAL_TIME, + LOCAL_DATE_TIME, + LIST, + MAP, + CHAR_ARRAY, + BYTE_ARRAY, + BOOLEAN_ARRAY, + STRING_ARRAY, + LONG_ARRAY, + INT_ARRAY, + SHORT_ARRAY, + DOUBLE_ARRAY, + FLOAT_ARRAY, + Object_ARRAY; + + public static PropertyType fromStrIgnoreCase(String typeStr) { + return Arrays.stream(PropertyType.values()) + .filter(e -> e.name().equalsIgnoreCase(typeStr)) + .findFirst() + .orElse(PropertyType.STRING); + } +} diff --git a/neo4jwriter/src/main/java/com/alibaba/datax/plugin/writer/neo4jwriter/exception/Neo4jErrorCode.java b/neo4jwriter/src/main/java/com/alibaba/datax/plugin/writer/neo4jwriter/exception/Neo4jErrorCode.java new file mode 100644 index 00000000..d7df79ff --- /dev/null +++ b/neo4jwriter/src/main/java/com/alibaba/datax/plugin/writer/neo4jwriter/exception/Neo4jErrorCode.java @@ -0,0 +1,37 @@ +package com.alibaba.datax.plugin.writer.neo4jwriter.exception; + +import com.alibaba.datax.common.spi.ErrorCode; + + +public enum Neo4jErrorCode implements ErrorCode { + + /** + * Invalid configuration + * 配置校验异常 + */ + CONFIG_INVALID("NEO4J_ERROR_01","invalid configuration"), + /** + * database error + * 在执行写入到数据库时抛出的异常,可能是权限异常,也可能是连接超时,或者是配置到了从节点。 + * 如果是更新操作,还会有死锁异常。具体原因根据报错信息确定,但是这与dataX无关。 + */ + DATABASE_ERROR("NEO4J_ERROR_02","database error"); + + private final String code; + private final String description; + + @Override + public String getCode() { + return code; + } + + @Override + public String getDescription() { + return 
description; + } + + Neo4jErrorCode(String code, String description) { + this.code = code; + this.description = description; + } +} diff --git a/neo4jwriter/src/main/resources/plugin.json b/neo4jwriter/src/main/resources/plugin.json new file mode 100644 index 00000000..3c8878f6 --- /dev/null +++ b/neo4jwriter/src/main/resources/plugin.json @@ -0,0 +1,6 @@ +{ + "name": "neo4jWriter", + "class": "com.alibaba.datax.plugin.writer.neo4jwriter.Neo4jWriter", + "description": "dataX neo4j 写插件", + "developer": "付有杰" +} \ No newline at end of file diff --git a/neo4jwriter/src/main/resources/plugin_job_template.json b/neo4jwriter/src/main/resources/plugin_job_template.json new file mode 100644 index 00000000..45bf3c88 --- /dev/null +++ b/neo4jwriter/src/main/resources/plugin_job_template.json @@ -0,0 +1,42 @@ +{ + "uri": "neo4j://localhost:7687", + "username": "neo4j", + "password": "Test@12343", + "database": "neo4j", + "cypher": "unwind $batch as row create(p:Person) set p.pbool = row.pbool,p.pstring = row.pstring,p.plong = row.plong,p.pshort = row.pshort,p.pdouble=row.pdouble,p.pstringarr=row.pstringarr,p.plocaldate=row.plocaldate", + "batchDataVariableName": "batch", + "batchSize": "33", + "properties": [ + { + "name": "pbool", + //type 忽略大小写 + "type": "BOOLEAN" + }, + { + "name": "pstring", + "type": "STRING" + }, + { + "name": "plong", + "type": "LONG" + }, + { + "name": "pshort", + "type": "SHORT" + }, + { + "name": "pdouble", + "type": "DOUBLE" + }, + { + "name": "pstringarr", + "type": "STRING_ARRAY", + "split": "," + }, + { + "name": "plocaldate", + "type": "LOCAL_DATE", + "dateFormat": "yyyy-MM-dd" + } + ] +} \ No newline at end of file diff --git a/neo4jwriter/src/test/java/com/alibaba/datax/plugin/writer/Neo4jWriterTest.java b/neo4jwriter/src/test/java/com/alibaba/datax/plugin/writer/Neo4jWriterTest.java new file mode 100644 index 00000000..53c9235e --- /dev/null +++ b/neo4jwriter/src/test/java/com/alibaba/datax/plugin/writer/Neo4jWriterTest.java @@ -0,0 +1,257 @@ +package com.alibaba.datax.plugin.writer; + + +import com.alibaba.datax.common.element.Record; +import com.alibaba.datax.common.element.StringColumn; +import com.alibaba.datax.common.util.Configuration; +import com.alibaba.datax.plugin.writer.mock.MockRecord; +import com.alibaba.datax.plugin.writer.mock.MockUtil; +import com.alibaba.datax.plugin.writer.neo4jwriter.Neo4jClient; +import com.alibaba.datax.plugin.writer.neo4jwriter.config.Neo4jProperty; +import com.alibaba.datax.plugin.writer.neo4jwriter.element.PropertyType; +import org.junit.After; +import org.junit.Before; +import org.junit.Test; +import org.neo4j.driver.*; +import org.neo4j.driver.types.Node; +import org.slf4j.Logger; +import org.slf4j.LoggerFactory; +import org.testcontainers.containers.GenericContainer; +import org.testcontainers.containers.Network; +import org.testcontainers.containers.output.Slf4jLogConsumer; +import org.testcontainers.lifecycle.Startables; +import org.testcontainers.shaded.org.awaitility.Awaitility; +import org.testcontainers.utility.DockerImageName; +import org.testcontainers.utility.DockerLoggerFactory; + +import java.io.File; +import java.net.URI; +import java.util.ArrayList; +import java.util.Arrays; +import java.util.List; +import java.util.concurrent.TimeUnit; +import java.util.stream.Stream; + +import static org.junit.Assert.assertEquals; +import static org.junit.Assert.assertTrue; + + +public class Neo4jWriterTest { + + private static final Logger LOGGER = LoggerFactory.getLogger(Neo4jWriterTest.class); + private static final int 
MOCK_NUM = 100; + private static final String CONTAINER_IMAGE = "neo4j:5.9.0"; + + private static final String CONTAINER_HOST = "neo4j-host"; + private static final int HTTP_PORT = 7474; + private static final int BOLT_PORT = 7687; + private static final String CONTAINER_NEO4J_USERNAME = "neo4j"; + private static final String CONTAINER_NEO4J_PASSWORD = "Test@12343"; + private static final URI CONTAINER_URI = URI.create("neo4j://localhost:" + BOLT_PORT); + + protected static final Network NETWORK = Network.newNetwork(); + + private GenericContainer container; + private Driver neo4jDriver; + private Session neo4jSession; + + @Before + public void init() { + DockerImageName imageName = DockerImageName.parse(CONTAINER_IMAGE); + container = + new GenericContainer<>(imageName) + .withNetwork(NETWORK) + .withNetworkAliases(CONTAINER_HOST) + .withExposedPorts(HTTP_PORT, BOLT_PORT) + .withEnv( + "NEO4J_AUTH", + CONTAINER_NEO4J_USERNAME + "/" + CONTAINER_NEO4J_PASSWORD) + .withEnv("apoc.export.file.enabled", "true") + .withEnv("apoc.import.file.enabled", "true") + .withEnv("apoc.import.file.use_neo4j_config", "true") + .withEnv("NEO4J_PLUGINS", "[\"apoc\"]") + .withLogConsumer( + new Slf4jLogConsumer( + DockerLoggerFactory.getLogger(CONTAINER_IMAGE))); + container.setPortBindings( + Arrays.asList( + String.format("%s:%s", HTTP_PORT, HTTP_PORT), + String.format("%s:%s", BOLT_PORT, BOLT_PORT))); + Startables.deepStart(Stream.of(container)).join(); + LOGGER.info("container started"); + Awaitility.given() + .ignoreExceptions() + .await() + .atMost(30, TimeUnit.SECONDS) + .untilAsserted(this::initConnection); + } + + @Test + public void testCreateNodeAllTypeField() { + final Result checkExists = neo4jSession.run("MATCH (p:Person) RETURN p limit 1"); + if (checkExists.hasNext()) { + neo4jSession.run("MATCH (p:Person) delete p"); + } + + Configuration configuration = Configuration.from(new File("src/test/resources/allTypeFieldNode.json")); + Neo4jClient neo4jClient = Neo4jClient.build(configuration, null); + + neo4jClient.init(); + for (int i = 0; i < MOCK_NUM; i++) { + neo4jClient.tryWrite(mockAllTypeFieldTestNode(neo4jClient.getNeo4jFields())); + } + neo4jClient.destroy(); + + + Result result = neo4jSession.run("MATCH (p:Person) return p"); + // nodes + assertTrue(result.hasNext()); + int cnt = 0; + while (result.hasNext()) { + org.neo4j.driver.Record record = result.next(); + record.get("p").get("pbool").asBoolean(); + record.get("p").get("pstring").asString(); + record.get("p").get("plong").asLong(); + record.get("p").get("pshort").asInt(); + record.get("p").get("pdouble").asDouble(); + List list = (List) record.get("p").get("pstringarr").asObject(); + record.get("p").get("plocaldate").asLocalDate(); + cnt++; + + } + assertEquals(cnt, MOCK_NUM); + } + + + /** + * 创建关系 必须先有节点 + * 所以先创建节点再模拟关系 + */ + @Test + public void testCreateRelation() { + final Result checkExists = neo4jSession.run("MATCH (p1:Person)-[r:LINK]->(p1:Person) return r limit 1"); + if (checkExists.hasNext()) { + neo4jSession.run("MATCH (p1:Person)-[r:LINK]->(p1:Person) delete r,p1,p2"); + } + + String createNodeCql = "create (p:Person) set p.id = '%s'"; + Configuration configuration = Configuration.from(new File("src/test/resources/relationship.json")); + + Neo4jClient neo4jClient = Neo4jClient.build(configuration, null); + neo4jClient.init(); + //创建节点为后续写关系做准备 + //Create nodes to prepare for subsequent write relationships + for (int i = 0; i < MOCK_NUM; i++) { + neo4jSession.run(String.format(createNodeCql, i + "start")); + 
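// nodes with ids "<i>start" and "<i>end" are created so the cypher in relationship.json can bind them:
// unwind $batch as row match(p1:Person) where p1.id = row.startNodeId
// match(p2:Person) where p2.id = row.endNodeId create (p1)-[:LINK]->(p2)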
neo4jSession.run(String.format(createNodeCql, i + "end")); + Record record = new MockRecord(); + record.addColumn(new StringColumn(i + "start")); + record.addColumn(new StringColumn(i + "end")); + neo4jClient.tryWrite(record); + + } + neo4jClient.destroy(); + + Result result = neo4jSession.run("MATCH (start:Person)-[r:LINK]->(end:Person) return r,start,end"); + // relationships + assertTrue(result.hasNext()); + int cnt = 0; + while (result.hasNext()) { + org.neo4j.driver.Record record = result.next(); + + Node startNode = record.get("start").asNode(); + assertTrue(startNode.hasLabel("Person")); + assertTrue(startNode.asMap().containsKey("id")); + + Node endNode = record.get("end").asNode(); + assertTrue(startNode.hasLabel("Person")); + assertTrue(endNode.asMap().containsKey("id")); + + + String name = record.get("r").type().name(); + assertEquals("RELATIONSHIP", name); + cnt++; + } + assertEquals(cnt, MOCK_NUM); + } + + /** + * neo4j中,Label和关系类型,想动态的写,需要借助于apoc函数 + */ + @Test + public void testUseApocCreateDynamicLabel() { + List dynamicLabel = new ArrayList<>(); + for (int i = 0; i < MOCK_NUM; i++) { + dynamicLabel.add("Label" + i); + } + //删除原有数据 + //remove test data if exist + //这种占位符的方式不支持批量动态写,当然可以使用union拼接,但是性能不好 + String query = "match (p:%s) return p"; + String delete = "match (p:%s) delete p"; + for (String label : dynamicLabel) { + Result result = neo4jSession.run(String.format(query, label)); + if (result.hasNext()) { + neo4jSession.run(String.format(delete, label)); + } + } + + Configuration configuration = Configuration.from(new File("src/test/resources/dynamicLabel.json")); + Neo4jClient neo4jClient = Neo4jClient.build(configuration, null); + + neo4jClient.init(); + for (int i = 0; i < dynamicLabel.size(); i++) { + Record record = new MockRecord(); + record.addColumn(new StringColumn(dynamicLabel.get(i))); + record.addColumn(new StringColumn(String.valueOf(i))); + neo4jClient.tryWrite(record); + } + neo4jClient.destroy(); + + //校验脚本的批量写入是否正确 + int cnt = 0; + for (int i = 0; i < dynamicLabel.size(); i++) { + String label = dynamicLabel.get(i); + Result result = neo4jSession.run(String.format(query, label)); + while (result.hasNext()) { + org.neo4j.driver.Record record = result.next(); + Node node = record.get("p").asNode(); + assertTrue(node.hasLabel(label)); + assertEquals(node.asMap().get("id"), i + ""); + cnt++; + } + } + assertEquals(cnt, MOCK_NUM); + + } + + + private Record mockAllTypeFieldTestNode(List neo4JProperties) { + Record mock = new MockRecord(); + for (Neo4jProperty field : neo4JProperties) { + mock.addColumn(MockUtil.mockColumnByType(PropertyType.fromStrIgnoreCase(field.getType()))); + } + return mock; + } + + @After + public void destroy() { + if (neo4jSession != null) { + neo4jSession.close(); + } + if (neo4jDriver != null) { + neo4jDriver.close(); + } + if (container != null) { + container.close(); + } + } + + private void initConnection() { + neo4jDriver = + GraphDatabase.driver( + CONTAINER_URI, + AuthTokens.basic(CONTAINER_NEO4J_USERNAME, CONTAINER_NEO4J_PASSWORD)); + neo4jSession = neo4jDriver.session(SessionConfig.forDatabase("neo4j")); + } +} diff --git a/neo4jwriter/src/test/java/com/alibaba/datax/plugin/writer/mock/MockRecord.java b/neo4jwriter/src/test/java/com/alibaba/datax/plugin/writer/mock/MockRecord.java new file mode 100644 index 00000000..77d3f500 --- /dev/null +++ b/neo4jwriter/src/test/java/com/alibaba/datax/plugin/writer/mock/MockRecord.java @@ -0,0 +1,104 @@ +package com.alibaba.datax.plugin.writer.mock; + + +import 
com.alibaba.datax.common.element.Column; +import com.alibaba.datax.common.element.Record; +import com.alibaba.fastjson2.JSON; + +import java.util.ArrayList; +import java.util.HashMap; +import java.util.List; +import java.util.Map; + +public class MockRecord implements Record { + private static final int RECORD_AVERGAE_COLUMN_NUMBER = 16; + + private List columns; + + private int byteSize; + + + private Map meta; + + public MockRecord() { + this.columns = new ArrayList<>(RECORD_AVERGAE_COLUMN_NUMBER); + } + + @Override + public void addColumn(Column column) { + columns.add(column); + incrByteSize(column); + } + + @Override + public Column getColumn(int i) { + if (i < 0 || i >= columns.size()) { + return null; + } + return columns.get(i); + } + + @Override + public void setColumn(int i, final Column column) { + if (i < 0) { + throw new IllegalArgumentException("不能给index小于0的column设置值"); + } + + if (i >= columns.size()) { + expandCapacity(i + 1); + } + + decrByteSize(getColumn(i)); + this.columns.set(i, column); + incrByteSize(getColumn(i)); + } + + @Override + public String toString() { + Map json = new HashMap(); + json.put("size", this.getColumnNumber()); + json.put("data", this.columns); + return JSON.toJSONString(json); + } + + @Override + public int getColumnNumber() { + return this.columns.size(); + } + + @Override + public int getByteSize() { + return byteSize; + } + + public int getMemorySize() { + throw new UnsupportedOperationException(); + } + + @Override + public void setMeta(Map meta) { + + } + + @Override + public Map getMeta() { + return null; + } + + private void decrByteSize(final Column column) { + } + + private void incrByteSize(final Column column) { + } + + private void expandCapacity(int totalSize) { + if (totalSize <= 0) { + return; + } + + int needToExpand = totalSize - columns.size(); + while (needToExpand-- > 0) { + this.columns.add(null); + } + } +} diff --git a/neo4jwriter/src/test/java/com/alibaba/datax/plugin/writer/mock/MockUtil.java b/neo4jwriter/src/test/java/com/alibaba/datax/plugin/writer/mock/MockUtil.java new file mode 100644 index 00000000..8f05f1e8 --- /dev/null +++ b/neo4jwriter/src/test/java/com/alibaba/datax/plugin/writer/mock/MockUtil.java @@ -0,0 +1,50 @@ +package com.alibaba.datax.plugin.writer.mock; + + +import com.alibaba.datax.common.element.*; +import com.alibaba.datax.plugin.writer.neo4jwriter.element.PropertyType; +import com.alibaba.fastjson2.JSON; + +import java.time.LocalDate; +import java.time.format.DateTimeFormatter; +import java.util.HashMap; +import java.util.Map; +import java.util.Random; + +public class MockUtil { + + public static Column mockColumnByType(PropertyType type) { + Random random = new Random(); + switch (type) { + case SHORT: + return new StringColumn("1"); + case BOOLEAN: + return new BoolColumn(random.nextInt() % 2 == 0); + case INTEGER: + case LONG: + return new LongColumn(random.nextInt(Integer.MAX_VALUE)); + case FLOAT: + case DOUBLE: + return new DoubleColumn(random.nextDouble()); + case NULL: + return null; + case BYTE_ARRAY: + return new BytesColumn(new byte[]{(byte) (random.nextInt() % 2)}); + case LOCAL_DATE: + return new StringColumn(LocalDate.now().format(DateTimeFormatter.ofPattern("yyyy-MM-dd"))); + case MAP: + return new StringColumn(JSON.toJSONString(propmap())); + case STRING_ARRAY: + return new StringColumn("[1,1,1,1,1,1,1]"); + default: + return new StringColumn("randomStr" + random.nextInt(Integer.MAX_VALUE)); + } + } + + public static Map propmap() { + Map prop = new HashMap<>(); + prop.put("name", 
"neo4jWriter"); + prop.put("age", "1"); + return prop; + } +} diff --git a/neo4jwriter/src/test/resources/allTypeFieldNode.json b/neo4jwriter/src/test/resources/allTypeFieldNode.json new file mode 100644 index 00000000..6d504d79 --- /dev/null +++ b/neo4jwriter/src/test/resources/allTypeFieldNode.json @@ -0,0 +1,41 @@ +{ + "uri": "neo4j://localhost:7687", + "username":"neo4j", + "password":"Test@12343", + "database":"neo4j", + "cypher": "unwind $batch as row create(p:Person) set p.pbool = row.pbool,p.pstring = row.pstring,p.plong = row.plong,p.pshort = row.pshort,p.pdouble=row.pdouble,p.pstringarr=row.pstringarr,p.plocaldate=row.plocaldate", + "batchDataVariableName": "batch", + "batchSize": "33", + "properties": [ + { + "name": "pbool", + "type": "BOOLEAN" + }, + { + "name": "pstring", + "type": "STRING" + }, + { + "name": "plong", + "type": "LONG" + }, + { + "name": "pshort", + "type": "SHORT" + }, + { + "name": "pdouble", + "type": "DOUBLE" + }, + { + "name": "pstringarr", + "type": "STRING_ARRAY", + "split": "," + }, + { + "name": "plocaldate", + "type": "LOCAL_DATE", + "dateFormat": "yyyy-MM-dd" + } + ] +} \ No newline at end of file diff --git a/neo4jwriter/src/test/resources/dynamicLabel.json b/neo4jwriter/src/test/resources/dynamicLabel.json new file mode 100644 index 00000000..05ed3e76 --- /dev/null +++ b/neo4jwriter/src/test/resources/dynamicLabel.json @@ -0,0 +1,19 @@ +{ + "uri": "bolt://localhost:7687", + "username":"neo4j", + "password":"Test@12343", + "database":"neo4j", + "cypher": "unwind $batch as row CALL apoc.cypher.doIt( 'create (n:`' + row.Label + '`{id:$id})' ,{id: row.id} ) YIELD value RETURN 1 ", + "batchDataVariableName": "batch", + "batchSize": "33", + "properties": [ + { + "name": "Label", + "type": "string" + }, + { + "name": "id", + "type": "STRING" + } + ] +} \ No newline at end of file diff --git a/neo4jwriter/src/test/resources/relationship.json b/neo4jwriter/src/test/resources/relationship.json new file mode 100644 index 00000000..cb9bbdf4 --- /dev/null +++ b/neo4jwriter/src/test/resources/relationship.json @@ -0,0 +1,19 @@ +{ + "uri": "neo4j://localhost:7687", + "username":"neo4j", + "password":"Test@12343", + "database":"neo4j", + "cypher": "unwind $batch as row match(p1:Person) where p1.id = row.startNodeId match(p2:Person) where p2.id = row.endNodeId create (p1)-[:LINK]->(p2)", + "batchDataVariableName": "batch", + "batchSize": "33", + "properties": [ + { + "name": "startNodeId", + "type": "STRING" + }, + { + "name": "endNodeId", + "type": "STRING" + } + ] +} \ No newline at end of file diff --git a/neo4jwriter/src/test/resources/streamreader2neo4j.json b/neo4jwriter/src/test/resources/streamreader2neo4j.json new file mode 100644 index 00000000..3d543ce3 --- /dev/null +++ b/neo4jwriter/src/test/resources/streamreader2neo4j.json @@ -0,0 +1,51 @@ +{ + "job": { + "content": [ + { + "reader": { + "name": "streamreader", + "parameter": { + "sliceRecordCount": 10, + "column": [ + { + "type": "string", + "value": "StreamReader" + }, + { + "type": "string", + "value": "1997" + } + ] + } + }, + "writer": { + "name": "neo4jWriter", + "parameter": { + "uri": "bolt://localhost:7687", + "username":"neo4j", + "password":"Test@12343", + "database":"neo4j", + "cypher": "unwind $batch as row CALL apoc.cypher.doIt( 'create (n:`' + row.Label + '`{id:$id})' ,{id: row.id} ) YIELD value RETURN 1 ", + "batchDataVariableName": "batch", + "batchSize": "3", + "properties": [ + { + "name": "Label", + "type": "string" + }, + { + "name": "id", + "type": "STRING" + } + ] + } + } + } 
+ ], + "setting": { + "speed": { + "channel": 5 + } + } + } +} \ No newline at end of file diff --git a/oceanbasev10reader/src/main/java/com/alibaba/datax/plugin/reader/oceanbasev10reader/ext/ReaderJob.java b/oceanbasev10reader/src/main/java/com/alibaba/datax/plugin/reader/oceanbasev10reader/ext/ReaderJob.java index 291dc785..2d60d0c6 100644 --- a/oceanbasev10reader/src/main/java/com/alibaba/datax/plugin/reader/oceanbasev10reader/ext/ReaderJob.java +++ b/oceanbasev10reader/src/main/java/com/alibaba/datax/plugin/reader/oceanbasev10reader/ext/ReaderJob.java @@ -1,6 +1,5 @@ package com.alibaba.datax.plugin.reader.oceanbasev10reader.ext; -import java.util.Arrays; import java.util.List; import com.alibaba.datax.common.constant.CommonConstant; @@ -11,7 +10,7 @@ import com.alibaba.datax.plugin.rdbms.reader.Constant; import com.alibaba.datax.plugin.reader.oceanbasev10reader.OceanBaseReader; import com.alibaba.datax.plugin.reader.oceanbasev10reader.util.ObReaderUtils; import com.alibaba.datax.plugin.reader.oceanbasev10reader.util.PartitionSplitUtil; -import com.alibaba.fastjson.JSONObject; +import com.alibaba.fastjson2.JSONObject; import org.slf4j.Logger; import org.slf4j.LoggerFactory; @@ -27,7 +26,7 @@ public class ReaderJob extends CommonRdbmsReader.Job { public void init(Configuration originalConfig) { //将config中的column和table中的关键字进行转义 List columns = originalConfig.getList(Key.COLUMN, String.class); - ObReaderUtils.escapeDatabaseKeywords(columns); + ObReaderUtils.escapeDatabaseKeyword(columns); originalConfig.set(Key.COLUMN, columns); List conns = originalConfig.getList(Constant.CONN_MARK, JSONObject.class); @@ -38,7 +37,7 @@ public class ReaderJob extends CommonRdbmsReader.Job { // tables will be null when querySql is configured if (tables != null) { - ObReaderUtils.escapeDatabaseKeywords(tables); + ObReaderUtils.escapeDatabaseKeyword(tables); originalConfig.set(String.format("%s[%d].%s", Constant.CONN_MARK, i, Key.TABLE), tables); } @@ -79,7 +78,8 @@ public class ReaderJob extends CommonRdbmsReader.Job { final String obJdbcDelimiter = com.alibaba.datax.plugin.rdbms.writer.Constant.OB10_SPLIT_STRING; if (jdbcUrl.startsWith(obJdbcDelimiter)) { String[] ss = jdbcUrl.split(obJdbcDelimiter); - if (ss.length >= 2) { + int elementCount = 2; + if (ss.length >= elementCount) { String tenant = ss[1].trim(); String[] sss = tenant.split(":"); return sss[0]; diff --git a/oceanbasev10reader/src/main/java/com/alibaba/datax/plugin/reader/oceanbasev10reader/util/ObReaderUtils.java b/oceanbasev10reader/src/main/java/com/alibaba/datax/plugin/reader/oceanbasev10reader/util/ObReaderUtils.java index 6356b97b..d7b8f2ed 100644 --- a/oceanbasev10reader/src/main/java/com/alibaba/datax/plugin/reader/oceanbasev10reader/util/ObReaderUtils.java +++ b/oceanbasev10reader/src/main/java/com/alibaba/datax/plugin/reader/oceanbasev10reader/util/ObReaderUtils.java @@ -1,6 +1,7 @@ package com.alibaba.datax.plugin.reader.oceanbasev10reader.util; import com.alibaba.datax.common.element.*; +import com.alibaba.datax.plugin.rdbms.reader.util.ObVersion; import com.alibaba.datax.plugin.rdbms.reader.util.SingleTableSplitUtil; import com.alibaba.datax.plugin.rdbms.util.DBUtil; import com.alibaba.datax.plugin.rdbms.util.DataBaseType; @@ -37,12 +38,15 @@ public class ObReaderUtils { public static final DataBaseType databaseType = DataBaseType.OceanBase; + private static final String TABLE_SCHEMA_DELIMITER = "."; + + private static final Pattern JDBC_PATTERN = Pattern.compile("jdbc:(oceanbase|mysql)://([\\w\\.-]+:\\d+)/([\\w\\.-]+)"); private 
static Set keywordsFromString2HashSet(final String keywords) { return new HashSet(Arrays.asList(keywords.split(","))); } - public static String escapeDatabaseKeywords(String keyword) { + public static String escapeDatabaseKeyword(String keyword) { if (databaseKeywords == null) { if (isOracleMode(compatibleMode)) { databaseKeywords = keywordsFromString2HashSet(ORACLE_KEYWORDS); @@ -57,10 +61,10 @@ public class ObReaderUtils { return keyword; } - public static void escapeDatabaseKeywords(List ids) { + public static void escapeDatabaseKeyword(List ids) { if (ids != null && ids.size() > 0) { for (int i = 0; i < ids.size(); i++) { - ids.set(i, escapeDatabaseKeywords(ids.get(i))); + ids.set(i, escapeDatabaseKeyword(ids.get(i))); } } } @@ -144,7 +148,7 @@ public class ObReaderUtils { if (isOracleMode(context.getCompatibleMode())) { tableName = tableName.toUpperCase(); String schema; - if (tableName.contains(".")) { + if (tableName.contains(TABLE_SCHEMA_DELIMITER)) { schema = String.format("'%s'", tableName.substring(0, tableName.indexOf("."))); tableName = tableName.substring(tableName.indexOf(".") + 1); } else { @@ -170,7 +174,7 @@ public class ObReaderUtils { while (rs.next()) { hasPk = true; String columnName = rs.getString("Column_name"); - columnName = escapeDatabaseKeywords(columnName); + columnName = escapeDatabaseKeyword(columnName); if (!realIndex.contains(columnName)) { realIndex.add(columnName); } @@ -462,7 +466,7 @@ public class ObReaderUtils { if (isOracleMode(compatibleMode)) { String schema; tableName = tableName.toUpperCase(); - if (tableName.contains(".")) { + if (tableName.contains(TABLE_SCHEMA_DELIMITER)) { schema = String.format("'%s'", tableName.substring(0, tableName.indexOf("."))); tableName = tableName.substring(tableName.indexOf(".") + 1); } else { @@ -513,7 +517,7 @@ public class ObReaderUtils { Iterator>> iterator = allIndex.entrySet().iterator(); while (iterator.hasNext()) { Map.Entry> entry = iterator.next(); - if (entry.getKey().equals("PRIMARY")) { + if ("PRIMARY".equals(entry.getKey())) { continue; } @@ -770,9 +774,7 @@ public class ObReaderUtils { } public static String getDbNameFromJdbcUrl(String jdbcUrl) { - final Pattern pattern = Pattern.compile("jdbc:(oceanbase|mysql)://([\\w\\.-]+:\\d+)/([\\w\\.-]+)"); - - Matcher matcher = pattern.matcher(jdbcUrl); + Matcher matcher = JDBC_PATTERN.matcher(jdbcUrl); if (matcher.find()) { return matcher.group(3); } else { @@ -814,18 +816,52 @@ public class ObReaderUtils { if (version1 == null || version2 == null) { throw new RuntimeException("can not compare null version"); } + ObVersion v1 = new ObVersion(version1); + ObVersion v2 = new ObVersion(version2); + return v1.compareTo(v2); + } - String[] ver1Part = version1.split("\\."); - String[] ver2Part = version2.split("\\."); - for (int i = 0; i < ver1Part.length; i++) { - int v1 = Integer.parseInt(ver1Part[i]), v2 = Integer.parseInt(ver2Part[i]); - if (v1 > v2) { - return 1; - } else if (v1 < v2) { - return -1; + /** + * + * @param conn + * @param sql + * @return + */ + public static List getResultsFromSql(Connection conn, String sql) { + List list = new ArrayList(); + Statement stmt = null; + ResultSet rs = null; + + LOG.info("executing sql: " + sql); + + try { + stmt = conn.createStatement(); + rs = stmt.executeQuery(sql); + while (rs.next()) { + list.add(rs.getString(1)); } + } catch (Exception e) { + LOG.error("error when executing sql: " + e.getMessage()); + } finally { + DBUtil.closeDBResources(rs, stmt, null); } - return 0; + return list; + } + + /** + * get 
obversion, try ob_version first, and then try version if failed + * @param conn + * @return + */ + public static ObVersion getObVersion(Connection conn) { + List results = getResultsFromSql(conn, "select ob_version()"); + if (results.size() == 0) { + results = getResultsFromSql(conn, "select version()"); + } + ObVersion obVersion = new ObVersion(results.get(0)); + + LOG.info("obVersion: " + obVersion); + return obVersion; } } diff --git a/oceanbasev10reader/src/main/java/com/alibaba/datax/plugin/reader/oceanbasev10reader/util/PartType.java b/oceanbasev10reader/src/main/java/com/alibaba/datax/plugin/reader/oceanbasev10reader/util/PartType.java index be190755..05c23d6f 100644 --- a/oceanbasev10reader/src/main/java/com/alibaba/datax/plugin/reader/oceanbasev10reader/util/PartType.java +++ b/oceanbasev10reader/src/main/java/com/alibaba/datax/plugin/reader/oceanbasev10reader/util/PartType.java @@ -5,8 +5,13 @@ package com.alibaba.datax.plugin.reader.oceanbasev10reader.util; */ public enum PartType { + // Non partitioned table NONPARTITION("NONPARTITION"), + + // Partitioned table PARTITION("PARTITION"), + + // Subpartitioned table SUBPARTITION("SUBPARTITION"); private String typeString; diff --git a/oceanbasev10reader/src/main/java/com/alibaba/datax/plugin/reader/oceanbasev10reader/util/PartitionSplitUtil.java b/oceanbasev10reader/src/main/java/com/alibaba/datax/plugin/reader/oceanbasev10reader/util/PartitionSplitUtil.java index 3bf2320a..ad165d99 100644 --- a/oceanbasev10reader/src/main/java/com/alibaba/datax/plugin/reader/oceanbasev10reader/util/PartitionSplitUtil.java +++ b/oceanbasev10reader/src/main/java/com/alibaba/datax/plugin/reader/oceanbasev10reader/util/PartitionSplitUtil.java @@ -3,7 +3,7 @@ package com.alibaba.datax.plugin.reader.oceanbasev10reader.util; import com.alibaba.datax.common.util.Configuration; import com.alibaba.datax.plugin.rdbms.reader.Constant; import com.alibaba.datax.plugin.rdbms.reader.Key; -import com.alibaba.datax.plugin.rdbms.reader.util.HintUtil; +import com.alibaba.datax.plugin.rdbms.reader.util.ObVersion; import com.alibaba.datax.plugin.rdbms.util.DBUtil; import com.alibaba.datax.plugin.rdbms.util.DataBaseType; import com.alibaba.datax.plugin.reader.oceanbasev10reader.ext.ObReaderKey; @@ -11,8 +11,6 @@ import org.slf4j.Logger; import org.slf4j.LoggerFactory; import java.sql.Connection; -import java.sql.ResultSet; -import java.sql.Statement; import java.util.ArrayList; import java.util.List; @@ -22,12 +20,76 @@ import java.util.List; public class PartitionSplitUtil { private static final Logger LOG = LoggerFactory.getLogger(PartitionSplitUtil.class); + private static final String ORACLE_GET_SUBPART_TEMPLATE = + "select subpartition_name " + + "from dba_tab_subpartitions " + + "where table_name = '%s' and table_owner = '%s'"; + + private static final String ORACLE_GET_PART_TEMPLATE = + "select partition_name " + + "from dba_tab_partitions " + + "where table_name = '%s' and table_owner = '%s'"; + + private static final String MYSQL_GET_PART_TEMPLATE = + "select p.part_name " + + "from oceanbase.__all_part p, oceanbase.%s t, oceanbase.__all_database d " + + "where p.table_id = t.table_id " + + "and d.database_id = t.database_id " + + "and d.database_name = '%s' " + + "and t.table_name = '%s'"; + + private static final String MYSQL_GET_SUBPART_TEMPLATE = + "select p.sub_part_name " + + "from oceanbase.__all_sub_part p, oceanbase.%s t, oceanbase.__all_database d " + + "where p.table_id = t.table_id " + + "and d.database_id = t.database_id " + + "and d.database_name 
= '%s' " + + "and t.table_name = '%s'"; + + /** + * get partition info from data dictionary in ob oracle mode + * @param config + * @param tableName + * @return + */ + public static PartInfo getObOraclePartInfoBySQL(Configuration config, String tableName) { + PartInfo partInfo; + DataBaseType dbType = ObReaderUtils.databaseType; + String jdbcUrl = config.getString(Key.JDBC_URL); + String username = config.getString(Key.USERNAME); + String password = config.getString(Key.PASSWORD); + String dbname = ObReaderUtils.getDbNameFromJdbcUrl(jdbcUrl).toUpperCase(); + Connection conn = DBUtil.getConnection(dbType, jdbcUrl, username, password); + tableName = tableName.toUpperCase(); + + // check if the table has subpartitions or not + String getSubPartSql = String.format(ORACLE_GET_SUBPART_TEMPLATE, tableName, dbname); + List partList = ObReaderUtils.getResultsFromSql(conn, getSubPartSql); + if (partList != null && partList.size() > 0) { + partInfo = new PartInfo(PartType.SUBPARTITION); + partInfo.addPart(partList); + return partInfo; + } + + String getPartSql = String.format(ORACLE_GET_PART_TEMPLATE, tableName, dbname); + partList = ObReaderUtils.getResultsFromSql(conn, getPartSql); + if (partList != null && partList.size() > 0) { + partInfo = new PartInfo(PartType.PARTITION); + partInfo.addPart(partList); + return partInfo; + } + + // table is not partitioned + partInfo = new PartInfo(PartType.NONPARTITION); + return partInfo; + } + public static List splitByPartition (Configuration configuration) { List allSlices = new ArrayList<>(); - List conns = configuration.getList(Constant.CONN_MARK, Object.class); - for (int i = 0, len = conns.size(); i < len; i++) { + List connections = configuration.getList(Constant.CONN_MARK, Object.class); + for (int i = 0, len = connections.size(); i < len; i++) { Configuration sliceConfig = configuration.clone(); - Configuration connConf = Configuration.from(conns.get(i).toString()); + Configuration connConf = Configuration.from(connections.get(i).toString()); String jdbcUrl = connConf.getString(Key.JDBC_URL); sliceConfig.set(Key.JDBC_URL, jdbcUrl); sliceConfig.remove(Constant.CONN_MARK); @@ -64,7 +126,7 @@ public class PartitionSplitUtil { slices.add(slice); } } else { - LOG.info("fail to get table part info or table is not partitioned, proceed as non-partitioned table."); + LOG.info("table is not partitioned."); Configuration slice = configuration.clone(); slice.set(Key.QUERY_SQL, ObReaderUtils.buildQuerySql(weakRead, column, table, where)); @@ -74,7 +136,16 @@ public class PartitionSplitUtil { return slices; } - private static PartInfo getObPartInfoBySQL(Configuration config, String table) { + public static PartInfo getObPartInfoBySQL(Configuration config, String table) { + boolean isOracleMode = config.getString(ObReaderKey.OB_COMPATIBILITY_MODE).equals("ORACLE"); + if (isOracleMode) { + return getObOraclePartInfoBySQL(config, table); + } else { + return getObMySQLPartInfoBySQL(config, table); + } + } + + public static PartInfo getObMySQLPartInfoBySQL(Configuration config, String table) { PartInfo partInfo = new PartInfo(PartType.NONPARTITION); List partList; Connection conn = null; @@ -86,45 +157,23 @@ public class PartitionSplitUtil { String allTable = "__all_table"; conn = DBUtil.getConnection(DataBaseType.OceanBase, jdbcUrl, username, password); - String obVersion = getResultsFromSql(conn, "select version()").get(0); - - LOG.info("obVersion: " + obVersion); - - if (ObReaderUtils.compareObVersion("2.2.76", obVersion) < 0) { + ObVersion obVersion = 
ObReaderUtils.getObVersion(conn); + if (obVersion.compareTo(ObVersion.V2276) >= 0 && + obVersion.compareTo(ObVersion.V4000) < 0) { allTable = "__all_table_v2"; } - String queryPart = String.format( - "select p.part_name " + - "from oceanbase.__all_part p, oceanbase.%s t, oceanbase.__all_database d " + - "where p.table_id = t.table_id " + - "and d.database_id = t.database_id " + - "and d.database_name = '%s' " + - "and t.table_name = '%s'", allTable, dbname, table); - String querySubPart = String.format( - "select p.sub_part_name " + - "from oceanbase.__all_sub_part p, oceanbase.%s t, oceanbase.__all_database d " + - "where p.table_id = t.table_id " + - "and d.database_id = t.database_id " + - "and d.database_name = '%s' " + - "and t.table_name = '%s'", allTable, dbname, table); - if (config.getString(ObReaderKey.OB_COMPATIBILITY_MODE).equals("ORACLE")) { - queryPart = String.format( - "select partition_name from all_tab_partitions where TABLE_OWNER = '%s' and table_name = '%s'", - dbname.toUpperCase(), table.toUpperCase()); - querySubPart = String.format( - "select subpartition_name from all_tab_subpartitions where TABLE_OWNER = '%s' and table_name = '%s'", - dbname.toUpperCase(), table.toUpperCase()); - } + String querySubPart = String.format(MYSQL_GET_SUBPART_TEMPLATE, allTable, dbname, table); PartType partType = PartType.SUBPARTITION; // try subpartition first - partList = getResultsFromSql(conn, querySubPart); + partList = ObReaderUtils.getResultsFromSql(conn, querySubPart); // if table is not sub-partitioned, the try partition if (partList.isEmpty()) { - partList = getResultsFromSql(conn, queryPart); + String queryPart = String.format(MYSQL_GET_PART_TEMPLATE, allTable, dbname, table); + partList = ObReaderUtils.getResultsFromSql(conn, queryPart); partType = PartType.PARTITION; } @@ -140,26 +189,4 @@ public class PartitionSplitUtil { return partInfo; } - - private static List getResultsFromSql(Connection conn, String sql) { - List list = new ArrayList(); - Statement stmt = null; - ResultSet rs = null; - - LOG.info("executing sql: " + sql); - - try { - stmt = conn.createStatement(); - rs = stmt.executeQuery(sql); - while (rs.next()) { - list.add(rs.getString(1)); - } - } catch (Exception e) { - LOG.error("error when executing sql: " + e.getMessage()); - } finally { - DBUtil.closeDBResources(rs, stmt, null); - } - - return list; - } } diff --git a/oceanbasev10reader/src/main/java/com/alibaba/datax/plugin/reader/oceanbasev10reader/util/TaskContext.java b/oceanbasev10reader/src/main/java/com/alibaba/datax/plugin/reader/oceanbasev10reader/util/TaskContext.java index d482232a..17655a52 100644 --- a/oceanbasev10reader/src/main/java/com/alibaba/datax/plugin/reader/oceanbasev10reader/util/TaskContext.java +++ b/oceanbasev10reader/src/main/java/com/alibaba/datax/plugin/reader/oceanbasev10reader/util/TaskContext.java @@ -19,6 +19,15 @@ public class TaskContext { private boolean weakRead = true; private String userSavePoint; private String compatibleMode = ObReaderUtils.OB_COMPATIBLE_MODE_MYSQL; + + public String getPartitionName() { + return partitionName; + } + + public void setPartitionName(String partitionName) { + this.partitionName = partitionName; + } + private String partitionName; // 断点续读的保存点 @@ -165,12 +174,4 @@ public class TaskContext { public void setCompatibleMode(String compatibleMode) { this.compatibleMode = compatibleMode; } - - public String getPartitionName() { - return partitionName; - } - - public void setPartitionName(String partitionName) { - this.partitionName = 
partitionName; - } } diff --git a/oceanbasev10reader/src/test/java/com/alibaba/datax/plugin/reader/oceanbasev10reader/util/ObReaderUtilsTest.java b/oceanbasev10reader/src/test/java/com/alibaba/datax/plugin/reader/oceanbasev10reader/util/ObReaderUtilsTest.java index bc387767..35966595 100644 --- a/oceanbasev10reader/src/test/java/com/alibaba/datax/plugin/reader/oceanbasev10reader/util/ObReaderUtilsTest.java +++ b/oceanbasev10reader/src/test/java/com/alibaba/datax/plugin/reader/oceanbasev10reader/util/ObReaderUtilsTest.java @@ -18,5 +18,7 @@ public class ObReaderUtilsTest { assert ObReaderUtils.compareObVersion("2.2.70", "2.2.50") == 1; assert ObReaderUtils.compareObVersion("2.2.70", "3.1.2") == -1; assert ObReaderUtils.compareObVersion("3.1.2", "3.1.2") == 0; + assert ObReaderUtils.compareObVersion("3.2.3.0", "3.2.3.0") == 0; + assert ObReaderUtils.compareObVersion("3.2.3.0-CE", "3.2.3.0") == 0; } } diff --git a/oceanbasev10writer/pom.xml b/oceanbasev10writer/pom.xml index cbe19732..11997a1e 100644 --- a/oceanbasev10writer/pom.xml +++ b/oceanbasev10writer/pom.xml @@ -64,8 +64,16 @@ + + com.oceanbase + shade-ob-partition-calculator + 1.0-SNAPSHOT + system + ${pom.basedir}/src/main/libs/shade-ob-partition-calculator-1.0-SNAPSHOT.jar + - + + log4j log4j 1.2.16 diff --git a/oceanbasev10writer/src/main/java/com/alibaba/datax/plugin/writer/oceanbasev10writer/Config.java b/oceanbasev10writer/src/main/java/com/alibaba/datax/plugin/writer/oceanbasev10writer/Config.java index 9fa3cd9a..6776196b 100644 --- a/oceanbasev10writer/src/main/java/com/alibaba/datax/plugin/writer/oceanbasev10writer/Config.java +++ b/oceanbasev10writer/src/main/java/com/alibaba/datax/plugin/writer/oceanbasev10writer/Config.java @@ -6,6 +6,7 @@ public interface Config { double DEFAULT_MEMSTORE_THRESHOLD = 0.9d; + double DEFAULT_SLOW_MEMSTORE_THRESHOLD = 0.75d; String MEMSTORE_CHECK_INTERVAL_SECOND = "memstoreCheckIntervalSecond"; long DEFAULT_MEMSTORE_CHECK_INTERVAL_SECOND = 30; diff --git a/oceanbasev10writer/src/main/java/com/alibaba/datax/plugin/writer/oceanbasev10writer/OceanBaseV10Writer.java b/oceanbasev10writer/src/main/java/com/alibaba/datax/plugin/writer/oceanbasev10writer/OceanBaseV10Writer.java index ede2eb01..06292db5 100644 --- a/oceanbasev10writer/src/main/java/com/alibaba/datax/plugin/writer/oceanbasev10writer/OceanBaseV10Writer.java +++ b/oceanbasev10writer/src/main/java/com/alibaba/datax/plugin/writer/oceanbasev10writer/OceanBaseV10Writer.java @@ -12,7 +12,7 @@ import com.alibaba.datax.plugin.rdbms.writer.util.WriterUtil; import com.alibaba.datax.plugin.writer.oceanbasev10writer.task.ConcurrentTableWriterTask; import com.alibaba.datax.plugin.writer.oceanbasev10writer.util.DbUtils; import com.alibaba.datax.plugin.writer.oceanbasev10writer.util.ObWriterUtils; -import com.alibaba.fastjson.JSONObject; +import com.alibaba.fastjson2.JSONObject; import org.apache.commons.lang3.StringUtils; import org.slf4j.Logger; import org.slf4j.LoggerFactory; @@ -61,7 +61,7 @@ public class OceanBaseV10Writer extends Writer { checkCompatibleMode(originalConfig); //将config中的column和table中的关键字进行转义 List columns = originalConfig.getList(Key.COLUMN, String.class); - ObWriterUtils.escapeDatabaseKeywords(columns); + ObWriterUtils.escapeDatabaseKeyword(columns); originalConfig.set(Key.COLUMN, columns); List conns = originalConfig.getList(Constant.CONN_MARK, JSONObject.class); @@ -69,7 +69,7 @@ public class OceanBaseV10Writer extends Writer { JSONObject conn = conns.get(i); Configuration connConfig = Configuration.from(conn.toString()); List 
tables = connConfig.getList(Key.TABLE, String.class); - ObWriterUtils.escapeDatabaseKeywords(tables); + ObWriterUtils.escapeDatabaseKeyword(tables); originalConfig.set(String.format("%s[%d].%s", Constant.CONN_MARK, i, Key.TABLE), tables); } this.commonJob = new CommonRdbmsWriter.Job(DATABASE_TYPE); @@ -86,6 +86,7 @@ public class OceanBaseV10Writer extends Writer { if (tableNumber == 1) { this.commonJob.prepare(this.originalConfig); final String version = fetchServerVersion(originalConfig); + ObWriterUtils.setObVersion(version); originalConfig.set(Config.OB_VERSION, version); } @@ -187,8 +188,9 @@ public class OceanBaseV10Writer extends Writer { } private String fetchServerVersion(Configuration config) { - final String fetchVersionSql = "show variables like 'version'"; - return DbUtils.fetchSingleValueWithRetry(config, fetchVersionSql); + final String fetchVersionSql = "show variables like 'version_comment'"; + String versionComment = DbUtils.fetchSingleValueWithRetry(config, fetchVersionSql); + return versionComment.split(" ")[1]; } private void checkCompatibleMode(Configuration configure) { diff --git a/oceanbasev10writer/src/main/java/com/alibaba/datax/plugin/writer/oceanbasev10writer/ext/AbstractConnHolder.java b/oceanbasev10writer/src/main/java/com/alibaba/datax/plugin/writer/oceanbasev10writer/ext/AbstractConnHolder.java new file mode 100644 index 00000000..c8630cd0 --- /dev/null +++ b/oceanbasev10writer/src/main/java/com/alibaba/datax/plugin/writer/oceanbasev10writer/ext/AbstractConnHolder.java @@ -0,0 +1,48 @@ +package com.alibaba.datax.plugin.writer.oceanbasev10writer.ext; + +import com.alibaba.datax.common.util.Configuration; +import com.alibaba.datax.plugin.rdbms.util.DBUtil; + +import org.slf4j.Logger; +import org.slf4j.LoggerFactory; + +import java.sql.Connection; + +public abstract class AbstractConnHolder { + private static final Logger LOG = LoggerFactory.getLogger(AbstractConnHolder.class); + + protected final Configuration config; + protected Connection conn; + + public AbstractConnHolder(Configuration config) { + this.config = config; + } + + public abstract Connection initConnection(); + + public Configuration getConfig() { + return config; + } + + public Connection getConn() { + try { + if (conn != null && !conn.isClosed()) { + return conn; + } + } catch (Exception e) { + LOG.warn("judge connection is closed or not failed. 
try to reconnect.", e); + } + return reconnect(); + } + + public Connection reconnect() { + DBUtil.closeDBResources(null, conn); + return initConnection(); + } + + public abstract String getJdbcUrl(); + + public abstract String getUserName(); + + public abstract void destroy(); +} diff --git a/oceanbasev10writer/src/main/java/com/alibaba/datax/plugin/writer/oceanbasev10writer/ext/DataBaseWriterBuffer.java b/oceanbasev10writer/src/main/java/com/alibaba/datax/plugin/writer/oceanbasev10writer/ext/DataBaseWriterBuffer.java index 53172495..b8ae259a 100644 --- a/oceanbasev10writer/src/main/java/com/alibaba/datax/plugin/writer/oceanbasev10writer/ext/DataBaseWriterBuffer.java +++ b/oceanbasev10writer/src/main/java/com/alibaba/datax/plugin/writer/oceanbasev10writer/ext/DataBaseWriterBuffer.java @@ -23,7 +23,7 @@ import org.slf4j.LoggerFactory; public class DataBaseWriterBuffer { private static final Logger LOG = LoggerFactory.getLogger(DataBaseWriterBuffer.class); - private final ConnHolder connHolder; + private final AbstractConnHolder connHolder; private final String dbName; private Map> tableBuffer = new HashMap>(); private long lastCheckMemstoreTime; @@ -33,7 +33,7 @@ public class DataBaseWriterBuffer { this.dbName=dbName; } - public ConnHolder getConnHolder(){ + public AbstractConnHolder getConnHolder(){ return connHolder; } diff --git a/oceanbasev10writer/src/main/java/com/alibaba/datax/plugin/writer/oceanbasev10writer/ext/OCJConnHolder.java b/oceanbasev10writer/src/main/java/com/alibaba/datax/plugin/writer/oceanbasev10writer/ext/OCJConnHolder.java index 10de5615..262fb1cb 100644 --- a/oceanbasev10writer/src/main/java/com/alibaba/datax/plugin/writer/oceanbasev10writer/ext/OCJConnHolder.java +++ b/oceanbasev10writer/src/main/java/com/alibaba/datax/plugin/writer/oceanbasev10writer/ext/OCJConnHolder.java @@ -3,15 +3,13 @@ package com.alibaba.datax.plugin.writer.oceanbasev10writer.ext; import java.sql.Connection; import com.alibaba.datax.common.util.Configuration; -import com.alibaba.datax.plugin.rdbms.util.DBUtil; -import com.alibaba.datax.plugin.rdbms.util.DataBaseType; /** * wrap oceanbase java client * @author oceanbase */ -public class OCJConnHolder extends ConnHolder { +public class OCJConnHolder extends AbstractConnHolder { private ServerConnectInfo connectInfo; private String dataSourceKey; @@ -28,17 +26,6 @@ public class OCJConnHolder extends ConnHolder { return conn; } - @Override - public Connection reconnect() { - DBUtil.closeDBResources(null, conn); - return initConnection(); - } - - @Override - public Connection getConn() { - return conn; - } - @Override public String getJdbcUrl() { return connectInfo.jdbcUrl; diff --git a/oceanbasev10writer/src/main/java/com/alibaba/datax/plugin/writer/oceanbasev10writer/ext/ObClientConnHolder.java b/oceanbasev10writer/src/main/java/com/alibaba/datax/plugin/writer/oceanbasev10writer/ext/ObClientConnHolder.java index 8ff53039..ac75d359 100644 --- a/oceanbasev10writer/src/main/java/com/alibaba/datax/plugin/writer/oceanbasev10writer/ext/ObClientConnHolder.java +++ b/oceanbasev10writer/src/main/java/com/alibaba/datax/plugin/writer/oceanbasev10writer/ext/ObClientConnHolder.java @@ -16,7 +16,7 @@ import com.alibaba.datax.plugin.writer.oceanbasev10writer.util.ObWriterUtils; * @author oceanbase * */ -public class ObClientConnHolder extends ConnHolder { +public class ObClientConnHolder extends AbstractConnHolder { private final String jdbcUrl; private final String userName; private final String password; diff --git 
a/oceanbasev10writer/src/main/java/com/alibaba/datax/plugin/writer/oceanbasev10writer/ext/ServerConnectInfo.java b/oceanbasev10writer/src/main/java/com/alibaba/datax/plugin/writer/oceanbasev10writer/ext/ServerConnectInfo.java index b0611642..fe8889e1 100644 --- a/oceanbasev10writer/src/main/java/com/alibaba/datax/plugin/writer/oceanbasev10writer/ext/ServerConnectInfo.java +++ b/oceanbasev10writer/src/main/java/com/alibaba/datax/plugin/writer/oceanbasev10writer/ext/ServerConnectInfo.java @@ -1,5 +1,7 @@ package com.alibaba.datax.plugin.writer.oceanbasev10writer.ext; +import static org.apache.commons.lang3.StringUtils.EMPTY; + import java.util.regex.Matcher; import java.util.regex.Pattern; @@ -12,40 +14,19 @@ public class ServerConnectInfo { public String databaseName; public String ipPort; public String jdbcUrl; + public boolean publicCloud; + /** + * + * @param jdbcUrl format is jdbc:oceanbase//ip:port + * @param username format is cluster:tenant:username or username@tenant#cluster or user@tenant or user + * @param password + */ public ServerConnectInfo(final String jdbcUrl, final String username, final String password) { - if (jdbcUrl.startsWith(com.alibaba.datax.plugin.rdbms.writer.Constant.OB10_SPLIT_STRING)) { - String[] ss = jdbcUrl.split(com.alibaba.datax.plugin.rdbms.writer.Constant.OB10_SPLIT_STRING_PATTERN); - if (ss.length != 3) { - throw new RuntimeException("jdbc url format is not correct: " + jdbcUrl); - } - this.userName = username; - this.clusterName = ss[1].trim().split(":")[0]; - this.tenantName = ss[1].trim().split(":")[1]; - this.jdbcUrl = ss[2].replace("jdbc:mysql:", "jdbc:oceanbase:"); - } else { - this.jdbcUrl = jdbcUrl.replace("jdbc:mysql:", "jdbc:oceanbase:"); - if (username.contains("@") && username.contains("#")) { - this.userName = username.substring(0, username.indexOf("@")); - this.tenantName = username.substring(username.indexOf("@") + 1, username.indexOf("#")); - this.clusterName = username.substring(username.indexOf("#") + 1); - } else if (username.contains(":")) { - String[] config = username.split(":"); - if (config.length != 3) { - throw new RuntimeException ("username format is not correct: " + username); - } - this.clusterName = config[0]; - this.tenantName = config[1]; - this.userName = config[2]; - } else { - this.clusterName = null; - this.tenantName = null; - this.userName = username; - } - } - + this.jdbcUrl = jdbcUrl; this.password = password; parseJdbcUrl(jdbcUrl); + parseFullUserName(username); } private void parseJdbcUrl(final String jdbcUrl) { @@ -56,11 +37,42 @@ public class ServerConnectInfo { String dbName = matcher.group(2); this.ipPort = ipPort; this.databaseName = dbName; + this.publicCloud = ipPort.split(":")[0].endsWith("aliyuncs.com"); } else { throw new RuntimeException("Invalid argument:" + jdbcUrl); } } + private void parseFullUserName(final String fullUserName) { + int tenantIndex = fullUserName.indexOf("@"); + int clusterIndex = fullUserName.indexOf("#"); + if (fullUserName.contains(":") && tenantIndex < 0) { + String[] names = fullUserName.split(":"); + if (names.length != 3) { + throw new RuntimeException("invalid argument: " + fullUserName); + } else { + this.clusterName = names[0]; + this.tenantName = names[1]; + this.userName = names[2]; + } + } else if (!publicCloud || tenantIndex < 0) { + this.userName = tenantIndex < 0 ? fullUserName : fullUserName.substring(0, tenantIndex); + this.clusterName = clusterIndex < 0 ? EMPTY : fullUserName.substring(clusterIndex + 1); + this.tenantName = tenantIndex < 0 ? 
EMPTY : fullUserName.substring(tenantIndex + 1, clusterIndex); + } else { + // If in public cloud, the username with format user@tenant#cluster should be parsed, otherwise, connection can't be created. + this.userName = fullUserName.substring(0, tenantIndex); + if (clusterIndex > tenantIndex) { + this.tenantName = fullUserName.substring(tenantIndex + 1, clusterIndex); + this.clusterName = fullUserName.substring(clusterIndex + 1); + } else { + this.tenantName = fullUserName.substring(tenantIndex + 1); + this.clusterName = EMPTY; + } + } + } + + @Override public String toString() { StringBuffer strBuffer = new StringBuffer(); return strBuffer.append("clusterName:").append(clusterName).append(", tenantName:").append(tenantName) @@ -69,11 +81,18 @@ public class ServerConnectInfo { } public String getFullUserName() { - StringBuilder builder = new StringBuilder(userName); - if (tenantName != null && clusterName != null) { - builder.append("@").append(tenantName).append("#").append(clusterName); + StringBuilder builder = new StringBuilder(); + builder.append(userName); + if (!EMPTY.equals(tenantName)) { + builder.append("@").append(tenantName); } + if (!EMPTY.equals(clusterName)) { + builder.append("#").append(clusterName); + } + if (EMPTY.equals(this.clusterName) && EMPTY.equals(this.tenantName)) { + return this.userName; + } return builder.toString(); } } diff --git a/oceanbasev10writer/src/main/java/com/alibaba/datax/plugin/writer/oceanbasev10writer/part/IObPartCalculator.java b/oceanbasev10writer/src/main/java/com/alibaba/datax/plugin/writer/oceanbasev10writer/part/IObPartCalculator.java new file mode 100644 index 00000000..b49ade02 --- /dev/null +++ b/oceanbasev10writer/src/main/java/com/alibaba/datax/plugin/writer/oceanbasev10writer/part/IObPartCalculator.java @@ -0,0 +1,19 @@ +package com.alibaba.datax.plugin.writer.oceanbasev10writer.part; + +import com.alibaba.datax.common.element.Record; + +/** + * @author cjyyz + * @date 2023/02/07 + * @since + */ +public interface IObPartCalculator { + + /** + * 计算 Partition Id + * + * @param record + * @return Long + */ + Long calculate(Record record); +} \ No newline at end of file diff --git a/oceanbasev10writer/src/main/java/com/alibaba/datax/plugin/writer/oceanbasev10writer/part/ObPartitionCalculatorV1.java b/oceanbasev10writer/src/main/java/com/alibaba/datax/plugin/writer/oceanbasev10writer/part/ObPartitionCalculatorV1.java new file mode 100644 index 00000000..96985588 --- /dev/null +++ b/oceanbasev10writer/src/main/java/com/alibaba/datax/plugin/writer/oceanbasev10writer/part/ObPartitionCalculatorV1.java @@ -0,0 +1,109 @@ +package com.alibaba.datax.plugin.writer.oceanbasev10writer.part; + +import com.alibaba.datax.common.element.Record; +import com.alibaba.datax.plugin.writer.oceanbasev10writer.ext.ServerConnectInfo; +import com.alipay.oceanbase.obproxy.data.TableEntryKey; +import com.alipay.oceanbase.obproxy.util.ObPartitionIdCalculator; +import java.util.ArrayList; +import java.util.List; +import java.util.Objects; +import java.util.concurrent.TimeUnit; +import org.slf4j.Logger; +import org.slf4j.LoggerFactory; + +/** + * OceanBase 1.x和2.x的分区计算 + * + * @author cjyyz + * @date 2023/02/07 + * @since + */ +public class ObPartitionCalculatorV1 implements IObPartCalculator { + + private static final Logger LOG = LoggerFactory.getLogger(ObPartitionCalculatorV1.class); + + /** + * 分区键的位置 + */ + private List partIndexes; + + /** + * 表的全部字段名 + */ + private List columnNames; + + /** + * ocj partition calculator + */ + private ObPartitionIdCalculator 
calculator; + + /** + * @param connectInfo + * @param table + * @param columns + */ + public ObPartitionCalculatorV1(ServerConnectInfo connectInfo, String table, List columns) { + + initCalculator(connectInfo, table); + + if (Objects.isNull(calculator)) { + LOG.warn("partCalculator is null"); + return; + } + + this.partIndexes = new ArrayList<>(columns.size()); + this.columnNames = new ArrayList<>(columns); + + for (int i = 0; i < columns.size(); ++i) { + String columnName = columns.get(i); + if (calculator.isPartitionKeyColumn(columnName)) { + LOG.info(columnName + " is partition key."); + partIndexes.add(i); + } + } + } + + /** + * @param record + * @return Long + */ + @Override + public Long calculate(Record record) { + if (Objects.isNull(calculator)) { + return null; + } + + for (Integer i : partIndexes) { + calculator.addColumn(columnNames.get(i), record.getColumn(i).asString()); + } + return calculator.calculate(); + } + + /** + * @param connectInfo + * @param table + */ + private void initCalculator(ServerConnectInfo connectInfo, String table) { + + LOG.info(String.format("create tableEntryKey with clusterName %s, tenantName %s, databaseName %s, tableName %s", + connectInfo.clusterName, connectInfo.tenantName, connectInfo.databaseName, table)); + TableEntryKey tableEntryKey = new TableEntryKey(connectInfo.clusterName, connectInfo.tenantName, + connectInfo.databaseName, table); + + int retry = 0; + + do { + try { + if (retry > 0) { + TimeUnit.SECONDS.sleep(1); + LOG.info("retry create new part calculator {} times", retry); + } + LOG.info("create partCalculator with address: " + connectInfo.ipPort); + calculator = new ObPartitionIdCalculator(connectInfo.ipPort, tableEntryKey); + } catch (Exception ex) { + ++retry; + LOG.warn("create new part calculator failed, retry: {}", ex.getMessage()); + } + } while (calculator == null && retry < 3); + } +} \ No newline at end of file diff --git a/oceanbasev10writer/src/main/java/com/alibaba/datax/plugin/writer/oceanbasev10writer/part/ObPartitionCalculatorV2.java b/oceanbasev10writer/src/main/java/com/alibaba/datax/plugin/writer/oceanbasev10writer/part/ObPartitionCalculatorV2.java new file mode 100644 index 00000000..11b7b25c --- /dev/null +++ b/oceanbasev10writer/src/main/java/com/alibaba/datax/plugin/writer/oceanbasev10writer/part/ObPartitionCalculatorV2.java @@ -0,0 +1,169 @@ +package com.alibaba.datax.plugin.writer.oceanbasev10writer.part; + +import com.alibaba.datax.common.element.Record; +import com.alibaba.datax.plugin.rdbms.util.DBUtil; +import com.alibaba.datax.plugin.rdbms.util.DataBaseType; +import com.alibaba.datax.plugin.writer.oceanbasev10writer.ext.ServerConnectInfo; +import com.alibaba.datax.plugin.writer.oceanbasev10writer.util.DbUtils; +import com.oceanbase.partition.calculator.ObPartIdCalculator; +import com.oceanbase.partition.calculator.enums.ObPartLevel; +import com.oceanbase.partition.calculator.enums.ObServerMode; +import com.oceanbase.partition.calculator.helper.TableEntryExtractor; +import com.oceanbase.partition.calculator.model.TableEntry; +import com.oceanbase.partition.calculator.model.TableEntryKey; +import com.oceanbase.partition.calculator.model.Version; +import com.oceanbase.partition.metadata.desc.ObPartColumn; +import com.oceanbase.partition.metadata.desc.ObTablePart; +import java.sql.Connection; +import java.util.HashMap; +import java.util.List; +import java.util.Map; +import java.util.Objects; +import org.slf4j.Logger; +import org.slf4j.LoggerFactory; + +/** + * OceanBase 3.x和4.x的分区计算 + * + * @author cjyyz + * 
@date 2023/02/07 + * @since + */ +public class ObPartitionCalculatorV2 implements IObPartCalculator { + + private static final Logger LOG = LoggerFactory.getLogger(ObPartitionCalculatorV2.class); + + /** + * OB的模式以及版本信息 + */ + private ObServerMode mode; + + /** + * ob-partition-calculator 分区计算组件 + */ + private ObPartIdCalculator calculator; + + /** + * 记录columns的字段名和在record中的位置。 + * 当目标表结构的分区键是生成列时,calculator 需要从改结构中获取到生成列所依赖的字段的值 + * e.g. + * create table t1 ( + * c1 varchar(20), + * c2 varchar(20) generated always as (substr(`c1`,1,8)) + * )partition by key(c2) partitions 5 + * + * 此时,columnNameIndexMap包含的元素是 c1:0 + * 需要将c1字段的值从columnNameIndexMap中添加到{@link com.oceanbase.partition.calculator.ObPartIdCalculator#getRefColumnValues()} + */ + private Map columnNameIndexMap; + + /** + * @param connectInfo + * @param table + * @param mode + */ + public ObPartitionCalculatorV2(ServerConnectInfo connectInfo, String table, ObServerMode mode, List columns) { + this.mode = mode; + this.columnNameIndexMap = new HashMap<>(); + for (int i = 0; i < columns.size(); i++) { + columnNameIndexMap.put(columns.get(i).toLowerCase(), i); + } + initCalculator(connectInfo, table); + } + + /** + * @param record + * @return Long + */ + @Override + public Long calculate(Record record) { + if (Objects.isNull(calculator)) { + return null; + } + if (!calculator.getTableEntry().isPartitionTable()) { + return 0L; + } + return calculator.calculatePartId(filterNullableColumns(record)); + } + + /** + * 初始化分区计算组件 + * + * @param connectInfo + * @param table + */ + private void initCalculator(ServerConnectInfo connectInfo, String table) { + TableEntryKey tableEntryKey = new TableEntryKey(connectInfo.clusterName, connectInfo.tenantName, connectInfo.databaseName, table, mode); + boolean subsequentFromV4 = !mode.getVersion().isOlderThan(new Version("4.0.0.0")); + try { + TableEntry tableEntry; + try (Connection conn = getConnection(connectInfo, subsequentFromV4)){ + TableEntryExtractor extractor = new TableEntryExtractor(); + tableEntry = extractor.queryTableEntry(conn, tableEntryKey,subsequentFromV4); + } + this.calculator = new ObPartIdCalculator(false, tableEntry, subsequentFromV4); + } catch (Exception e) { + LOG.warn("create new part calculator failed. 
reason: {}", e.getMessage()); + } + } + + private Connection getConnection(ServerConnectInfo connectInfo, boolean subsequentFromV4) throws Exception { + // OceanBase 4.0.0.0及之后版本均使用业务租户连接计算分区 + if (subsequentFromV4) { + return DBUtil.getConnection(DataBaseType.OceanBase, connectInfo.jdbcUrl, connectInfo.getFullUserName(), connectInfo.password); + } + // OceanBase 4.0.0.0之前版本使用sys租户连接计算分区 + return DbUtils.buildSysConn(connectInfo.jdbcUrl, connectInfo.clusterName); + } + + /** + * 只选择分区字段值传入分区计算组件 + * + * @param record + * @return Object[] + */ + private Object[] filterNullableColumns(Record record) { + final ObTablePart tablePart = calculator.getTableEntry().getTablePart(); + + final Object[] filteredRecords = new Object[record.getColumnNumber()]; + + if (tablePart.getLevel().getIndex() > ObPartLevel.LEVEL_ZERO.getIndex()) { + // 从record中添加非生成列的一级分区值到filteredRecords数组中 + for (ObPartColumn partColumn : tablePart.getPartColumns()) { + if (partColumn.getColumnExpr() == null) { + int metaIndex = partColumn.getColumnIndex(); + String columnName = partColumn.getColumnName().toLowerCase(); + int idxInRecord = columnNameIndexMap.get(columnName); + filteredRecords[metaIndex] = record.getColumn(idxInRecord).asString(); + } + + } + // 从record中添加生成列的一级分区值到calculator的redColumnMap中,ObTablePart.getRefPartColumns中的字段名均为小写 + for (ObPartColumn partColumn : tablePart.getRefPartColumns()) { + String columnName = partColumn.getColumnName(); + int index = columnNameIndexMap.get(columnName); + calculator.addRefColumn(columnName, record.getColumn(index).asString()); + } + } + + if (tablePart.getLevel().getIndex() >= ObPartLevel.LEVEL_TWO.getIndex()) { + // 从record中添加非生成列的二级分区值到filteredRecords数组中 + for (ObPartColumn partColumn : tablePart.getSubPartColumns()) { + if (partColumn.getColumnExpr() == null) { + int metaIndex = partColumn.getColumnIndex(); + String columnName = partColumn.getColumnName().toLowerCase(); + int idxInRecord = columnNameIndexMap.get(columnName); + filteredRecords[metaIndex] = record.getColumn(idxInRecord).asString(); + } + + } + // 从record中添加生成列的二级分区值到calculator的redColumnMap中,ObTablePart.getRefSubPartColumns中的字段名均为小写 + for (ObPartColumn partColumn : tablePart.getRefSubPartColumns()) { + String columnName = partColumn.getColumnName(); + int index = columnNameIndexMap.get(columnName); + calculator.addRefColumn(columnName, record.getColumn(index).asString()); + } + } + return filteredRecords; + } +} \ No newline at end of file diff --git a/oceanbasev10writer/src/main/java/com/alibaba/datax/plugin/writer/oceanbasev10writer/task/ConcurrentTableWriterTask.java b/oceanbasev10writer/src/main/java/com/alibaba/datax/plugin/writer/oceanbasev10writer/task/ConcurrentTableWriterTask.java index e6b4a561..0ad3a1ed 100644 --- a/oceanbasev10writer/src/main/java/com/alibaba/datax/plugin/writer/oceanbasev10writer/task/ConcurrentTableWriterTask.java +++ b/oceanbasev10writer/src/main/java/com/alibaba/datax/plugin/writer/oceanbasev10writer/task/ConcurrentTableWriterTask.java @@ -1,6 +1,5 @@ package com.alibaba.datax.plugin.writer.oceanbasev10writer.task; -import com.alibaba.datax.common.element.Column; import com.alibaba.datax.common.element.Record; import com.alibaba.datax.common.exception.DataXException; import com.alibaba.datax.common.plugin.RecordReceiver; @@ -11,16 +10,14 @@ import com.alibaba.datax.plugin.rdbms.util.DBUtilErrorCode; import com.alibaba.datax.plugin.rdbms.util.DataBaseType; import com.alibaba.datax.plugin.rdbms.writer.CommonRdbmsWriter; import 
com.alibaba.datax.plugin.writer.oceanbasev10writer.Config; -import com.alibaba.datax.plugin.writer.oceanbasev10writer.ext.ConnHolder; +import com.alibaba.datax.plugin.writer.oceanbasev10writer.ext.AbstractConnHolder; import com.alibaba.datax.plugin.writer.oceanbasev10writer.ext.ObClientConnHolder; import com.alibaba.datax.plugin.writer.oceanbasev10writer.ext.ServerConnectInfo; +import com.alibaba.datax.plugin.writer.oceanbasev10writer.part.IObPartCalculator; +import com.alibaba.datax.plugin.writer.oceanbasev10writer.part.ObPartitionCalculatorV1; +import com.alibaba.datax.plugin.writer.oceanbasev10writer.part.ObPartitionCalculatorV2; import com.alibaba.datax.plugin.writer.oceanbasev10writer.util.ObWriterUtils; -import com.alipay.oceanbase.obproxy.data.TableEntryKey; -import com.alipay.oceanbase.obproxy.util.ObPartitionIdCalculator; -import org.apache.commons.lang3.tuple.Pair; -import org.slf4j.Logger; -import org.slf4j.LoggerFactory; - +import com.oceanbase.partition.calculator.enums.ObServerMode; import java.sql.Connection; import java.sql.PreparedStatement; import java.sql.SQLException; @@ -35,8 +32,12 @@ import java.util.concurrent.atomic.AtomicLong; import java.util.concurrent.locks.Condition; import java.util.concurrent.locks.Lock; import java.util.concurrent.locks.ReentrantLock; - -//import java.sql.PreparedStatement; +import org.slf4j.Logger; +import org.slf4j.LoggerFactory; +import static com.alibaba.datax.plugin.writer.oceanbasev10writer.Config.DEFAULT_SLOW_MEMSTORE_THRESHOLD; +import static com.alibaba.datax.plugin.writer.oceanbasev10writer.util.ObWriterUtils.LoadMode.FAST; +import static com.alibaba.datax.plugin.writer.oceanbasev10writer.util.ObWriterUtils.LoadMode.PAUSE; +import static com.alibaba.datax.plugin.writer.oceanbasev10writer.util.ObWriterUtils.LoadMode.SLOW; public class ConcurrentTableWriterTask extends CommonRdbmsWriter.Task { private static final Logger LOG = LoggerFactory.getLogger(ConcurrentTableWriterTask.class); @@ -47,42 +48,31 @@ public class ConcurrentTableWriterTask extends CommonRdbmsWriter.Task { private long memstoreCheckIntervalSecond = Config.DEFAULT_MEMSTORE_CHECK_INTERVAL_SECOND; // 最后一次检查 private long lastCheckMemstoreTime; + + private volatile ObWriterUtils.LoadMode loadMode = FAST; private static AtomicLong totalTask = new AtomicLong(0); private long taskId = -1; - private AtomicBoolean isMemStoreFull = new AtomicBoolean(false); - private ConnHolder checkConnHolder; + private HashMap> groupInsertValues; + private IObPartCalculator obPartCalculator; + private ConcurrentTableWriter concurrentWriter = null; + private AbstractConnHolder connHolder; + private boolean allTaskInQueue = false; + private Lock lock = new ReentrantLock(); + private Condition condition = lock.newCondition(); + private long startTime; + private String obWriteMode = "update"; + private boolean isOracleCompatibleMode = false; + private String obUpdateColumns = null; + private String dbName; + private int calPartFailedCount = 0; - public ConcurrentTableWriterTask(DataBaseType dataBaseType) { + public ConcurrentTableWriterTask(DataBaseType dataBaseType) { super(dataBaseType); taskId = totalTask.getAndIncrement(); } - private ObPartitionIdCalculator partCalculator = null; - - private HashMap> groupInsertValues; - List unknownPartRecords = new ArrayList(); -// private List unknownPartRecords; - private List partitionKeyIndexes; - - private ConcurrentTableWriter concurrentWriter = null; - - private ConnHolder connHolder; - - private boolean allTaskInQueue = false; - - private Lock 
lock = new ReentrantLock(); - private Condition condition = lock.newCondition(); - - private long startTime; - private boolean isOb2 = false; - private String obWriteMode = "update"; - private boolean isOracleCompatibleMode = false; - private String obUpdateColumns = null; - private List> deleteColPos; - private String dbName; - @Override public void init(Configuration config) { super.init(config); @@ -96,15 +86,11 @@ public class ConcurrentTableWriterTask extends CommonRdbmsWriter.Task { this.memstoreThreshold = config.getDouble(Config.MEMSTORE_THRESHOLD, Config.DEFAULT_MEMSTORE_THRESHOLD); this.memstoreCheckIntervalSecond = config.getLong(Config.MEMSTORE_CHECK_INTERVAL_SECOND, Config.DEFAULT_MEMSTORE_CHECK_INTERVAL_SECOND); - this.isOracleCompatibleMode = ObWriterUtils.isOracleMode(); - LOG.info("configure url is unavailable, use obclient for connections."); - this.checkConnHolder = new ObClientConnHolder(config, connectInfo.jdbcUrl, + this.connHolder = new ObClientConnHolder(config, connectInfo.jdbcUrl, connectInfo.getFullUserName(), connectInfo.password); - this.connHolder = new ObClientConnHolder(config, connectInfo.jdbcUrl, - connectInfo.getFullUserName(), connectInfo.password); - checkConnHolder.initConnection(); - if (isOracleCompatibleMode) { + this.isOracleCompatibleMode = ObWriterUtils.isOracleMode(); + if (isOracleCompatibleMode) { connectInfo.databaseName = connectInfo.databaseName.toUpperCase(); //在转义的情况下不翻译 if (!(table.startsWith("\"") && table.endsWith("\""))) { @@ -116,49 +102,36 @@ public class ConcurrentTableWriterTask extends CommonRdbmsWriter.Task { } if (config.getBool(Config.USE_PART_CALCULATOR, Config.DEFAULT_USE_PART_CALCULATOR)) { - initPartCalculator(connectInfo); + this.obPartCalculator = createPartitionCalculator(connectInfo, ObServerMode.from(config.getString(Config.OB_COMPATIBLE_MODE), config.getString(Config.OB_VERSION))); } else { LOG.info("Disable partition calculation feature."); } - obUpdateColumns = config.getString(Config.OB_UPDATE_COLUMNS, null); - groupInsertValues = new HashMap>(); - partitionKeyIndexes = new ArrayList(); - rewriteSql(); + obUpdateColumns = config.getString(Config.OB_UPDATE_COLUMNS, null); + groupInsertValues = new HashMap>(); + rewriteSql(); - if (null == concurrentWriter) { - concurrentWriter = new ConcurrentTableWriter(config, connectInfo, writeRecordSql); - allTaskInQueue = false; - } + if (null == concurrentWriter) { + concurrentWriter = new ConcurrentTableWriter(config, connectInfo, writeRecordSql); + allTaskInQueue = false; + } + } - String version = config.getString(Config.OB_VERSION); - int pIdx = version.lastIndexOf('.'); - if ((Float.valueOf(version.substring(0, pIdx)) >= 2.1f)) { - isOb2 = true; - } - } + /** + * 创建需要的分区计算组件 + * + * @param connectInfo + * @return + */ + private IObPartCalculator createPartitionCalculator(ServerConnectInfo connectInfo, ObServerMode obServerMode) { + if (obServerMode.isSubsequentFrom("3.0.0.0")) { + LOG.info("oceanbase version is {}, use ob-partition-calculator to calculate partition Id.", obServerMode.getVersion()); + return new ObPartitionCalculatorV2(connectInfo, table, obServerMode, columns); + } - private void initPartCalculator(ServerConnectInfo connectInfo) { - int retry = 0; - LOG.info(String.format("create tableEntryKey with clusterName %s, tenantName %s, databaseName %s, tableName %s", - connectInfo.clusterName, connectInfo.tenantName, connectInfo.databaseName, table)); - TableEntryKey tableEntryKey = new TableEntryKey(connectInfo.clusterName, connectInfo.tenantName, - 
connectInfo.databaseName, table); - do { - try { - if (retry > 0) { - int sleep = retry > 8 ? 500 : (1 << retry); - TimeUnit.SECONDS.sleep(sleep); - LOG.info("retry create new part calculator, the {} times", retry); - } - LOG.info("create partCalculator with address: " + connectInfo.ipPort); - partCalculator = new ObPartitionIdCalculator(connectInfo.ipPort, tableEntryKey); - } catch (Exception ex) { - ++retry; - LOG.warn("create new part calculator failed, retry {}: {}", retry, ex.getMessage()); - } - } while (partCalculator == null && retry < 3); // try 3 times - } + LOG.info("oceanbase version is {}, use ocj to calculate partition Id.", obServerMode.getVersion()); + return new ObPartitionCalculatorV1(connectInfo, table, columns); + } public boolean isFinished() { return allTaskInQueue && concurrentWriter.checkFinish(); @@ -181,43 +154,18 @@ public class ConcurrentTableWriterTask extends CommonRdbmsWriter.Task { if (isOracleCompatibleMode && obWriteMode.equalsIgnoreCase("update")) { // change obWriteMode to insert so the insert statement will be generated. obWriteMode = "insert"; - deleteColPos = ObWriterUtils.buildDeleteSql(conn, dbName, table, columns); } this.writeRecordSql = ObWriterUtils.buildWriteSql(table, columns, conn, obWriteMode, obUpdateColumns); LOG.info("writeRecordSql :{}", this.writeRecordSql); } - + + @Override public void prepare(Configuration writerSliceConfig) { super.prepare(writerSliceConfig); - calPartitionKeyIndex(partitionKeyIndexes); concurrentWriter.start(); } - private void calPartitionKeyIndex(List partKeyIndexes) { - partKeyIndexes.clear(); - if (null == partCalculator) { - LOG.error("partCalculator is null"); - return; - } - for (int i = 0; i < columns.size(); ++i) { - if (partCalculator.isPartitionKeyColumn(columns.get(i))) { - LOG.info(columns.get(i) + " is partition key."); - partKeyIndexes.add(i); - } - } - } - - private Long calPartitionId(List partKeyIndexes, Record record) { - if (partCalculator == null) { - return null; - } - for (Integer i : partKeyIndexes) { - partCalculator.addColumn(columns.get(i), record.getColumn(i).asString()); - } - return partCalculator.calculate(); - } - - @Override + @Override public void startWriteWithConnection(RecordReceiver recordReceiver, TaskPluginCollector taskPluginCollector, Connection connection) { this.taskPluginCollector = taskPluginCollector; @@ -278,21 +226,6 @@ public class ConcurrentTableWriterTask extends CommonRdbmsWriter.Task { return fillPreparedStatement(preparedStatement, record); } - public PreparedStatement fillStatementIndex(PreparedStatement preparedStatement, - int prepIdx, int columnIndex, Column column) throws SQLException { - int columnSqltype = this.resultSetMetaData.getMiddle().get(columnIndex); - String typeName = this.resultSetMetaData.getRight().get(columnIndex); - return fillPreparedStatementColumnType(preparedStatement, prepIdx, columnSqltype, typeName, column); - } - - public void collectDirtyRecord(Record record, SQLException e) { - taskPluginCollector.collectDirtyRecord(record, e); - } - - public void insertOneRecord(Connection connection, List buffer) { - doOneInsert(connection, buffer); - } - private void addLeftRecords() { //不需要刷新Cache,已经是最后一批数据了 for (List groupValues : groupInsertValues.values()) { @@ -300,42 +233,28 @@ public class ConcurrentTableWriterTask extends CommonRdbmsWriter.Task { addRecordsToWriteQueue(groupValues); } } - if (unknownPartRecords.size() > 0) { - addRecordsToWriteQueue(unknownPartRecords); - } } private void addRecordToCache(final Record record) { Long 
partId =null; try { - partId = calPartitionId(partitionKeyIndexes, record); + partId = obPartCalculator == null ? Long.MAX_VALUE : obPartCalculator.calculate(record); } catch (Exception e1) { - LOG.warn("fail to get partition id: " + e1.getMessage() + ", record: " + record); + if (calPartFailedCount++ < 10) { + LOG.warn("fail to get partition id: " + e1.getMessage() + ", record: " + record); + } } - if (partId == null && isOb2) { + if (partId == null) { LOG.debug("fail to calculate parition id, just put into the default buffer."); partId = Long.MAX_VALUE; } - if (partId != null) { - List groupValues = groupInsertValues.get(partId); - if (groupValues == null) { - groupValues = new ArrayList(batchSize); - groupInsertValues.put(partId, groupValues); - } - groupValues.add(record); - if (groupValues.size() >= batchSize) { - groupValues = addRecordsToWriteQueue(groupValues); - groupInsertValues.put(partId, groupValues); - } - } else { - LOG.debug("add unknown part record {}", record); - unknownPartRecords.add(record); - if (unknownPartRecords.size() >= batchSize) { - unknownPartRecords = addRecordsToWriteQueue(unknownPartRecords); - } - + List groupValues = groupInsertValues.computeIfAbsent(partId, k -> new ArrayList(batchSize)); + groupValues.add(record); + if (groupValues.size() >= batchSize) { + groupValues = addRecordsToWriteQueue(groupValues); + groupInsertValues.put(partId, groupValues); } } @@ -361,15 +280,25 @@ public class ConcurrentTableWriterTask extends CommonRdbmsWriter.Task { return new ArrayList(batchSize); } private void checkMemStore() { - Connection checkConn = checkConnHolder.reconnect(); + Connection checkConn = connHolder.getConn(); + try { + if (checkConn == null || checkConn.isClosed()) { + checkConn = connHolder.reconnect(); + } + } catch (Exception e) { + LOG.warn("Failed to check connection status."); + } + long now = System.currentTimeMillis(); if (now - lastCheckMemstoreTime < 1000 * memstoreCheckIntervalSecond) { return; } - boolean isFull = ObWriterUtils.isMemstoreFull(checkConn, memstoreThreshold); - this.isMemStoreFull.set(isFull); - if (isFull) { - LOG.warn("OB memstore is full,sleep 30 seconds, threshold=" + memstoreThreshold); + double memUsedRatio = ObWriterUtils.queryMemUsedRatio(checkConn); + if (memUsedRatio >= DEFAULT_SLOW_MEMSTORE_THRESHOLD) { + this.loadMode = memUsedRatio >= memstoreThreshold ? PAUSE : SLOW; + LOG.info("Memstore used ratio is {}. 
Load data {}", memUsedRatio, loadMode.name()); + }else { + this.loadMode = FAST; } lastCheckMemstoreTime = now; } @@ -377,21 +306,23 @@ public class ConcurrentTableWriterTask extends CommonRdbmsWriter.Task { public boolean isMemStoreFull() { return isMemStoreFull.get(); } - - public void printEveryTime() { - long cost = System.currentTimeMillis() - startTime; - if (cost > 10000) { //10s - print(); - startTime = System.currentTimeMillis(); - } + + public boolean isShouldPause() { + return this.loadMode.equals(PAUSE); + } + + public boolean isShouldSlow() { + return this.loadMode.equals(SLOW); } public void print() { - LOG.debug("Statistic total task {}, finished {}, queue Size {}", - concurrentWriter.getTotalTaskCount(), - concurrentWriter.getFinishTaskCount(), - concurrentWriter.getTaskQueueSize()); - concurrentWriter.printStatistics(); + if (LOG.isDebugEnabled()) { + LOG.debug("Statistic total task {}, finished {}, queue Size {}", + concurrentWriter.getTotalTaskCount(), + concurrentWriter.getFinishTaskCount(), + concurrentWriter.getTaskQueueSize()); + concurrentWriter.printStatistics(); + } } public void waitTaskFinish() { @@ -424,8 +355,6 @@ public class ConcurrentTableWriterTask extends CommonRdbmsWriter.Task { } // 把本级持有的conn关闭掉 DBUtil.closeDBResources(null, connHolder.getConn()); - DBUtil.closeDBResources(null, checkConnHolder.getConn()); - checkConnHolder.destroy(); super.destroy(writerSliceConfig); } @@ -476,7 +405,7 @@ public class ConcurrentTableWriterTask extends CommonRdbmsWriter.Task { public synchronized void start() { for (int i = 0; i < threadCount; ++i) { LOG.info("start {} insert task.", (i+1)); - InsertTask insertTask = new InsertTask(taskId, queue, config, connectInfo, rewriteRecordSql, deleteColPos); + InsertTask insertTask = new InsertTask(taskId, queue, config, connectInfo, rewriteRecordSql); insertTask.setWriterTask(ConcurrentTableWriterTask.this); insertTask.setWriter(this); insertTasks.add(insertTask); @@ -502,7 +431,7 @@ public class ConcurrentTableWriterTask extends CommonRdbmsWriter.Task { public void addBatchRecords(final List records) throws InterruptedException { boolean isSucc = false; while (!isSucc) { - isSucc = queue.offer(records, 5, TimeUnit.SECONDS); + isSucc = queue.offer(records, 5, TimeUnit.MILLISECONDS); checkMemStore(); } totalTaskCount.incrementAndGet(); diff --git a/oceanbasev10writer/src/main/java/com/alibaba/datax/plugin/writer/oceanbasev10writer/task/InsertTask.java b/oceanbasev10writer/src/main/java/com/alibaba/datax/plugin/writer/oceanbasev10writer/task/InsertTask.java index 968908ca..df80cf7f 100644 --- a/oceanbasev10writer/src/main/java/com/alibaba/datax/plugin/writer/oceanbasev10writer/task/InsertTask.java +++ b/oceanbasev10writer/src/main/java/com/alibaba/datax/plugin/writer/oceanbasev10writer/task/InsertTask.java @@ -1,286 +1,204 @@ package com.alibaba.datax.plugin.writer.oceanbasev10writer.task; -import java.sql.Connection; -import java.sql.PreparedStatement; -import java.sql.SQLException; -import java.util.ArrayList; -import java.util.List; -import java.util.Queue; -import java.util.concurrent.TimeUnit; - -import com.alibaba.datax.common.exception.DataXException; -import com.alibaba.datax.plugin.rdbms.util.DBUtil; -import com.alibaba.datax.plugin.rdbms.util.DBUtilErrorCode; -import com.alibaba.datax.plugin.writer.oceanbasev10writer.ext.ObClientConnHolder; -import org.apache.commons.lang3.tuple.Pair; -import org.slf4j.Logger; -import org.slf4j.LoggerFactory; import com.alibaba.datax.common.element.Record; import 
com.alibaba.datax.common.util.Configuration; +import com.alibaba.datax.plugin.rdbms.util.DBUtil; import com.alibaba.datax.plugin.writer.oceanbasev10writer.Config; -import com.alibaba.datax.plugin.writer.oceanbasev10writer.ext.ConnHolder; +import com.alibaba.datax.plugin.writer.oceanbasev10writer.ext.AbstractConnHolder; +import com.alibaba.datax.plugin.writer.oceanbasev10writer.ext.ObClientConnHolder; import com.alibaba.datax.plugin.writer.oceanbasev10writer.ext.ServerConnectInfo; import com.alibaba.datax.plugin.writer.oceanbasev10writer.task.ConcurrentTableWriterTask.ConcurrentTableWriter; import com.alibaba.datax.plugin.writer.oceanbasev10writer.util.ObWriterUtils; +import org.slf4j.Logger; +import org.slf4j.LoggerFactory; + +import java.sql.Connection; +import java.sql.PreparedStatement; +import java.sql.SQLException; +import java.util.List; +import java.util.concurrent.BlockingQueue; +import java.util.concurrent.TimeUnit; + public class InsertTask implements Runnable { private static final Logger LOG = LoggerFactory.getLogger(InsertTask.class); - private ConcurrentTableWriterTask writerTask; - private ConcurrentTableWriter writer; + private ConcurrentTableWriterTask writerTask; + private ConcurrentTableWriter writer; - private String writeRecordSql; - private long totalCost = 0; - private long insertCount = 0; + private String writeRecordSql; + private long totalCost = 0; + private long insertCount = 0; - private Queue> queue; - private boolean isStop; - private ConnHolder connHolder; + private BlockingQueue> queue; + private boolean isStop; + private AbstractConnHolder connHolder; - private final long taskId; - private ServerConnectInfo connInfo; + private final long taskId; + private ServerConnectInfo connInfo; - // 失败重试次数 - private int failTryCount = Config.DEFAULT_FAIL_TRY_COUNT; - private boolean printCost = Config.DEFAULT_PRINT_COST; - private long costBound = Config.DEFAULT_COST_BOUND; - private List> deleteMeta; + // 失败重试次数 + private int failTryCount = Config.DEFAULT_FAIL_TRY_COUNT; + private boolean printCost = Config.DEFAULT_PRINT_COST; + private long costBound = Config.DEFAULT_COST_BOUND; - public InsertTask( - final long taskId, - Queue> recordsQueue, - Configuration config, - ServerConnectInfo connectInfo, - String writeRecordSql, - List> deleteMeta) { - this.taskId = taskId; - this.queue = recordsQueue; - this.connInfo = connectInfo; - failTryCount = config.getInt(Config.FAIL_TRY_COUNT, Config.DEFAULT_FAIL_TRY_COUNT); - printCost = config.getBool(Config.PRINT_COST, Config.DEFAULT_PRINT_COST); - costBound = config.getLong(Config.COST_BOUND, Config.DEFAULT_COST_BOUND); - this.connHolder = new ObClientConnHolder(config, connInfo.jdbcUrl, - connInfo.getFullUserName(), connInfo.password); - this.writeRecordSql = writeRecordSql; - this.isStop = false; - this.deleteMeta = deleteMeta; - connHolder.initConnection(); - } - - void setWriterTask(ConcurrentTableWriterTask writerTask) { - this.writerTask = writerTask; - } - - void setWriter(ConcurrentTableWriter writer) { - this.writer = writer; - } + public InsertTask( + final long taskId, + BlockingQueue> recordsQueue, + Configuration config, + ServerConnectInfo connectInfo, + String writeRecordSql) { + this.taskId = taskId; + this.queue = recordsQueue; + this.connInfo = connectInfo; + failTryCount = config.getInt(Config.FAIL_TRY_COUNT, Config.DEFAULT_FAIL_TRY_COUNT); + printCost = config.getBool(Config.PRINT_COST, Config.DEFAULT_PRINT_COST); + costBound = config.getLong(Config.COST_BOUND, Config.DEFAULT_COST_BOUND); + this.connHolder 
= new ObClientConnHolder(config, connInfo.jdbcUrl, + connInfo.getFullUserName(), connInfo.password); + this.writeRecordSql = writeRecordSql; + this.isStop = false; + connHolder.initConnection(); + } - private boolean isStop() { return isStop; } - public void setStop() { isStop = true; } - public long getTotalCost() { return totalCost; } - public long getInsertCount() { return insertCount; } - - @Override - public void run() { - Thread.currentThread().setName(String.format("%d-insertTask-%d", taskId, Thread.currentThread().getId())); - LOG.debug("Task {} start to execute...", taskId); - while (!isStop()) { - try { - List records = queue.poll(); - if (null != records) { - doMultiInsert(records, this.printCost, this.costBound); + void setWriterTask(ConcurrentTableWriterTask writerTask) { + this.writerTask = writerTask; + } - } else if (writerTask.isFinished()) { - writerTask.singalTaskFinish(); - LOG.debug("not more task, thread exist ..."); - break; - } else { - TimeUnit.MILLISECONDS.sleep(5); - } - } catch (InterruptedException e) { - LOG.debug("TableWriter is interrupt"); - } catch (Exception e) { - LOG.warn("ERROR UNEXPECTED {}", e); - } - } - LOG.debug("Thread exist..."); - } - - public void destroy() { - connHolder.destroy(); - }; - - public void calStatistic(final long cost) { - writer.increFinishCount(); - ++insertCount; - totalCost += cost; - if (this.printCost && cost > this.costBound) { - LOG.info("slow multi insert cost {}ms", cost); - } - } + void setWriter(ConcurrentTableWriter writer) { + this.writer = writer; + } - private void doDelete(Connection conn, final List buffer) throws SQLException { - if(deleteMeta == null || deleteMeta.size() == 0) { - return; - } - for (int i = 0; i < deleteMeta.size(); i++) { - String deleteSql = deleteMeta.get(i).getKey(); - int[] valueIdx = deleteMeta.get(i).getValue(); - PreparedStatement ps = null; - try { - ps = conn.prepareStatement(deleteSql); - StringBuilder builder = new StringBuilder(); - for (Record record : buffer) { - int bindIndex = 0; - for (int idx : valueIdx) { - writerTask.fillStatementIndex(ps, bindIndex++, idx, record.getColumn(idx)); - builder.append(record.getColumn(idx).asString()).append(","); - } - ps.addBatch(); - } - LOG.debug("delete values: " + builder.toString()); - ps.executeBatch(); - } catch (SQLException ex) { - LOG.error("SQL Exception when delete records with {}", deleteSql, ex); - throw ex; - } finally { - DBUtil.closeDBResources(ps, null); - } - } - } + private boolean isStop() { + return isStop; + } - public void doMultiInsert(final List buffer, final boolean printCost, final long restrict) { - checkMemstore(); - Connection conn = connHolder.getConn(); - boolean success = false; - long cost = 0; - long startTime = 0; - try { - for (int i = 0; i < failTryCount; ++i) { - if (i > 0) { - try { - int sleep = i >= 9 ? 
500 : 1 << i;//不明白为什么要sleep 500s - TimeUnit.SECONDS.sleep(sleep); - } catch (InterruptedException e) { - LOG.info("thread interrupted ..., ignore"); - } - conn = connHolder.getConn(); - LOG.info("retry {}, start do batch insert, size={}", i, buffer.size()); - checkMemstore(); - } - startTime = System.currentTimeMillis(); - PreparedStatement ps = null; - try { - conn.setAutoCommit(false); + public void setStop() { + isStop = true; + } - // do delete if necessary - doDelete(conn, buffer); + public long getTotalCost() { + return totalCost; + } - ps = conn.prepareStatement(writeRecordSql); - for (Record record : buffer) { - ps = writerTask.fillStatement(ps, record); - ps.addBatch(); - } - ps.executeBatch(); - conn.commit(); - success = true; - cost = System.currentTimeMillis() - startTime; - calStatistic(cost); - break; - } catch (SQLException e) { - LOG.warn("Insert fatal error SqlState ={}, errorCode = {}, {}", e.getSQLState(), e.getErrorCode(), e); - if (i == 0 || i > 10 ) { - for (Record record : buffer) { - LOG.warn("ERROR : record {}", record); - } - } - // 按照错误码分类,分情况处理 - // 如果是OB系统级异常,则需要重建连接 - boolean fatalFail = ObWriterUtils.isFatalError(e); - if (fatalFail) { - ObWriterUtils.sleep(300000); - connHolder.reconnect(); - // 如果是可恢复的异常,则重试 - } else if (ObWriterUtils.isRecoverableError(e)) { - conn.rollback(); - ObWriterUtils.sleep(60000); - } else {// 其它异常直接退出,采用逐条写入方式 - conn.rollback(); - ObWriterUtils.sleep(1000); - break; - } - } catch (Exception e) { - e.printStackTrace(); - LOG.warn("Insert error unexpected {}", e); - } finally { - DBUtil.closeDBResources(ps, null); - } - } - } catch (SQLException e) { - LOG.warn("ERROR:retry failSql State ={}, errorCode = {}, {}", e.getSQLState(), e.getErrorCode(), e); - } + public long getInsertCount() { + return insertCount; + } - if (!success) { - try { - LOG.info("do one insert"); - conn = connHolder.reconnect(); - doOneInsert(conn, buffer); - cost = System.currentTimeMillis() - startTime; - calStatistic(cost); - } finally { - } - } - } + @Override + public void run() { + Thread.currentThread().setName(String.format("%d-insertTask-%d", taskId, Thread.currentThread().getId())); + LOG.debug("Task {} start to execute...", taskId); + while (!isStop()) { + try { + List records = queue.poll(5, TimeUnit.MILLISECONDS); + if (null != records) { + doMultiInsert(records, this.printCost, this.costBound); + } else if (writerTask.isFinished()) { + writerTask.singalTaskFinish(); + LOG.debug("no more tasks, thread exits ..."); + break; + } + } catch (InterruptedException e) { + LOG.debug("TableWriter is interrupted"); + } catch (Exception e) { + LOG.warn("Unexpected error ", e); + } + } + LOG.debug("Thread exits..."); + } - // process one row, delete before insert - private void doOneInsert(Connection connection, List buffer) { - List deletePstmtList = new ArrayList(); - PreparedStatement preparedStatement = null; - try { - connection.setAutoCommit(false); - if (deleteMeta != null && deleteMeta.size() > 0) { - for (int i = 0; i < deleteMeta.size(); i++) { - String deleteSql = deleteMeta.get(i).getKey(); - deletePstmtList.add(connection.prepareStatement(deleteSql)); - } - } + public void destroy() { + connHolder.destroy(); + } - preparedStatement = connection.prepareStatement(this.writeRecordSql); - for (Record record : buffer) { - try { - for (int i = 0; i < deletePstmtList.size(); i++) { - PreparedStatement deleteStmt = deletePstmtList.get(i); - int[] valueIdx = deleteMeta.get(i).getValue(); - int bindIndex = 0; - for (int idx : valueIdx) { - 
writerTask.fillStatementIndex(deleteStmt, bindIndex++, idx, record.getColumn(idx)); - } - deleteStmt.execute(); - } - preparedStatement = writerTask.fillStatement(preparedStatement, record); - preparedStatement.execute(); - connection.commit(); - } catch (SQLException e) { - writerTask.collectDirtyRecord(record, e); - } finally { - // 此处不应该关闭statement,后续的数据还需要用到 - } - } - } catch (Exception e) { - throw DataXException.asDataXException( - DBUtilErrorCode.WRITE_DATA_ERROR, e); - } finally { - DBUtil.closeDBResources(preparedStatement, null); - for (PreparedStatement pstmt : deletePstmtList) { - DBUtil.closeDBResources(pstmt, null); - } - } - } + public void calStatistic(final long cost) { + writer.increFinishCount(); + ++insertCount; + totalCost += cost; + if (this.printCost && cost > this.costBound) { + LOG.info("slow multi insert cost {}ms", cost); + } + } - private void checkMemstore() { - while (writerTask.isMemStoreFull()) { - ObWriterUtils.sleep(30000); - } - } + public void doMultiInsert(final List buffer, final boolean printCost, final long restrict) { + checkMemstore(); + Connection conn = connHolder.getConn(); + boolean success = false; + long cost = 0; + long startTime = 0; + try { + for (int i = 0; i < failTryCount; ++i) { + if (i > 0) { + conn = connHolder.getConn(); + LOG.info("retry {}, start do batch insert, size={}", i, buffer.size()); + checkMemstore(); + } + startTime = System.currentTimeMillis(); + PreparedStatement ps = null; + try { + conn.setAutoCommit(false); + ps = conn.prepareStatement(writeRecordSql); + for (Record record : buffer) { + ps = writerTask.fillStatement(ps, record); + ps.addBatch(); + } + ps.executeBatch(); + conn.commit(); + success = true; + cost = System.currentTimeMillis() - startTime; + calStatistic(cost); + break; + } catch (SQLException e) { + LOG.warn("Insert fatal error SqlState ={}, errorCode = {}, {}", e.getSQLState(), e.getErrorCode(), e); + if (LOG.isDebugEnabled() && (i == 0 || i > 10)) { + for (Record record : buffer) { + LOG.warn("ERROR : record {}", record); + } + } + // 按照错误码分类,分情况处理 + // 如果是OB系统级异常,则需要重建连接 + boolean fatalFail = ObWriterUtils.isFatalError(e); + if (fatalFail) { + ObWriterUtils.sleep(300000); + connHolder.reconnect(); + // 如果是可恢复的异常,则重试 + } else if (ObWriterUtils.isRecoverableError(e)) { + conn.rollback(); + ObWriterUtils.sleep(60000); + } else {// 其它异常直接退出,采用逐条写入方式 + conn.rollback(); + ObWriterUtils.sleep(1000); + break; + } + } catch (Exception e) { + e.printStackTrace(); + LOG.warn("Insert error unexpected {}", e); + } finally { + DBUtil.closeDBResources(ps, null); + } + } + } catch (SQLException e) { + LOG.warn("ERROR:retry failSql State ={}, errorCode = {}, {}", e.getSQLState(), e.getErrorCode(), e); + } + + if (!success) { + LOG.info("do one insert"); + conn = connHolder.reconnect(); + writerTask.doOneInsert(conn, buffer); + cost = System.currentTimeMillis() - startTime; + calStatistic(cost); + } + } + + private void checkMemstore() { + if (writerTask.isShouldSlow()) { + ObWriterUtils.sleep(100); + } else { + while (writerTask.isShouldPause()) { + ObWriterUtils.sleep(100); + } + } + } } diff --git a/oceanbasev10writer/src/main/java/com/alibaba/datax/plugin/writer/oceanbasev10writer/task/SingleTableWriterTask.java b/oceanbasev10writer/src/main/java/com/alibaba/datax/plugin/writer/oceanbasev10writer/task/SingleTableWriterTask.java index 637a3be4..d2f42de5 100644 --- a/oceanbasev10writer/src/main/java/com/alibaba/datax/plugin/writer/oceanbasev10writer/task/SingleTableWriterTask.java +++ 
b/oceanbasev10writer/src/main/java/com/alibaba/datax/plugin/writer/oceanbasev10writer/task/SingleTableWriterTask.java @@ -12,7 +12,7 @@ import com.alibaba.datax.plugin.rdbms.util.DataBaseType; import com.alibaba.datax.plugin.rdbms.writer.CommonRdbmsWriter; import com.alibaba.datax.plugin.rdbms.writer.Key; import com.alibaba.datax.plugin.writer.oceanbasev10writer.Config; -import com.alibaba.datax.plugin.writer.oceanbasev10writer.ext.ConnHolder; +import com.alibaba.datax.plugin.writer.oceanbasev10writer.ext.AbstractConnHolder; import com.alibaba.datax.plugin.writer.oceanbasev10writer.ext.ObClientConnHolder; import com.alibaba.datax.plugin.writer.oceanbasev10writer.util.ObWriterUtils; @@ -30,7 +30,7 @@ public class SingleTableWriterTask extends CommonRdbmsWriter.Task { // 失败重试次数 private int failTryCount = Config.DEFAULT_FAIL_TRY_COUNT; - private ConnHolder connHolder; + private AbstractConnHolder connHolder; private String obWriteMode = "update"; private boolean isOracleCompatibleMode = false; private String obUpdateColumns = null; diff --git a/oceanbasev10writer/src/main/java/com/alibaba/datax/plugin/writer/oceanbasev10writer/util/DbUtils.java b/oceanbasev10writer/src/main/java/com/alibaba/datax/plugin/writer/oceanbasev10writer/util/DbUtils.java index 5138c9cb..adffc6f7 100644 --- a/oceanbasev10writer/src/main/java/com/alibaba/datax/plugin/writer/oceanbasev10writer/util/DbUtils.java +++ b/oceanbasev10writer/src/main/java/com/alibaba/datax/plugin/writer/oceanbasev10writer/util/DbUtils.java @@ -3,18 +3,17 @@ package com.alibaba.datax.plugin.writer.oceanbasev10writer.util; import com.alibaba.datax.common.util.Configuration; import com.alibaba.datax.plugin.rdbms.util.DBUtil; import com.alibaba.datax.plugin.rdbms.util.DataBaseType; -import com.alibaba.datax.plugin.rdbms.writer.CommonRdbmsWriter; import com.alibaba.datax.plugin.rdbms.writer.Constant; import com.alibaba.datax.plugin.rdbms.writer.Key; -import org.slf4j.Logger; -import org.slf4j.LoggerFactory; - +import com.alibaba.datax.plugin.writer.oceanbasev10writer.Config; import java.sql.Connection; import java.sql.PreparedStatement; import java.sql.ResultSet; import java.sql.SQLException; import java.util.List; import java.util.concurrent.TimeUnit; +import org.slf4j.Logger; +import org.slf4j.LoggerFactory; public class DbUtils { @@ -25,7 +24,7 @@ public class DbUtils { final String password = config.getString(Key.PASSWORD); String jdbcUrl = config.getString(Key.JDBC_URL); - if(jdbcUrl == null) { + if (jdbcUrl == null) { List conns = config.getList(Constant.CONN_MARK, Object.class); Configuration connConf = Configuration.from(conns.get(0).toString()); jdbcUrl = connConf.getString(Key.JDBC_URL); @@ -34,9 +33,9 @@ public class DbUtils { Connection conn = null; PreparedStatement stmt = null; ResultSet result = null; - boolean need_retry = false; String value = null; int retry = 0; + int failTryCount = config.getInt(Config.FAIL_TRY_COUNT, Config.DEFAULT_FAIL_TRY_COUNT); do { try { if (retry > 0) { @@ -58,14 +57,57 @@ public class DbUtils { LOG.info("value for query [{}] is [{}]", query, value); break; } catch (SQLException e) { - need_retry = true; ++retry; LOG.warn("fetch value with {} error {}", query, e); } finally { - DBUtil.closeDBResources(result, stmt, null); + DBUtil.closeDBResources(result, stmt, conn); } - } while (need_retry); + } while (retry < failTryCount); return value; } + + /** + * build sys connection from ordinary jdbc url + * + * @param jdbcUrl + * @param clusterName + * @return + * @throws Exception + */ + public static 
Connection buildSysConn(String jdbcUrl, String clusterName) throws Exception { + jdbcUrl = jdbcUrl.replace("jdbc:mysql://", "jdbc:oceanbase://"); + int startIdx = jdbcUrl.indexOf('/', "jdbc:oceanbase://".length()); + int endIdx = jdbcUrl.lastIndexOf('?'); + String prefix = jdbcUrl.substring(0, startIdx + 1); + final String postfix = jdbcUrl.substring(endIdx); + String sysJDBCUrl = prefix + "oceanbase" + postfix; + + String tenantName = "sys"; + String[][] userConfigs = { + {"monitor", "monitor"} + }; + + Connection conn = null; + for (String[] userConfig : userConfigs) { + try { + conn = DBUtil.getConnectionWithoutRetry(DataBaseType.OceanBase, sysJDBCUrl, String.format("%s@%s#%s", userConfig[0], + tenantName, clusterName), userConfig[1]); + } catch (Exception e) { + LOG.warn("fail connecting to ob: " + e.getMessage()); + + } + if (conn == null) { + LOG.warn("fail to get connection with user " + userConfig[0] + ", try alternative user."); + } else { + break; + } + } + + if (conn == null) { + throw new Exception("fail to get connection with sys tenant."); + } + + return conn; + } } diff --git a/oceanbasev10writer/src/main/java/com/alibaba/datax/plugin/writer/oceanbasev10writer/util/ObWriterUtils.java b/oceanbasev10writer/src/main/java/com/alibaba/datax/plugin/writer/oceanbasev10writer/util/ObWriterUtils.java index ff1648a1..a5d6b0ea 100644 --- a/oceanbasev10writer/src/main/java/com/alibaba/datax/plugin/writer/oceanbasev10writer/util/ObWriterUtils.java +++ b/oceanbasev10writer/src/main/java/com/alibaba/datax/plugin/writer/oceanbasev10writer/util/ObWriterUtils.java @@ -1,8 +1,10 @@ package com.alibaba.datax.plugin.writer.oceanbasev10writer.util; +import com.alibaba.datax.plugin.rdbms.reader.util.ObVersion; import com.alibaba.datax.plugin.rdbms.util.DBUtil; import com.alibaba.datax.plugin.rdbms.writer.CommonRdbmsWriter.Task; import com.alibaba.datax.plugin.writer.oceanbasev10writer.Config; +import org.apache.commons.lang3.RandomUtils; import org.apache.commons.lang3.StringUtils; import org.apache.commons.lang3.tuple.ImmutablePair; import org.apache.commons.lang3.tuple.Pair; @@ -11,6 +13,7 @@ import org.slf4j.LoggerFactory; import java.sql.*; import java.util.*; +import static com.alibaba.datax.plugin.writer.oceanbasev10writer.Config.DEFAULT_SLOW_MEMSTORE_THRESHOLD; public class ObWriterUtils { @@ -18,14 +21,20 @@ public class ObWriterUtils { private static final String ORACLE_KEYWORDS = "ACCESS,ADD,ALL,ALTER,AND,ANY,ARRAYLEN,AS,ASC,AUDIT,BETWEEN,BY,CHAR,CHECK,CLUSTER,COLUMN,COMMENT,COMPRESS,CONNECT,CREATE,CURRENT,DATE,DECIMAL,DEFAULT,DELETE,DESC,DISTINCT,DROP,ELSE,EXCLUSIVE,EXISTS,FILE,FLOAT,FOR,FROM,GRANT,GROUP,HAVING,IDENTIFIED,IMMEDIATE,IN,INCREMENT,INDEX,INITIAL,INSERT,INTEGER,INTERSECT,INTO,IS,LEVEL,LIKE,LOCK,LONG,MAXEXTENTS,MINUS,MODE,MODIFY,NOAUDIT,NOCOMPRESS,NOT,NOTFOUND,NOWAIT,NULL,NUMBER,OF,OFFLINE,ON,ONLINE,OPTION,OR,ORDER,PCTFREE,PRIOR,PRIVILEGES,PUBLIC,RAW,RENAME,RESOURCE,REVOKE,ROW,ROWID,ROWLABEL,ROWNUM,ROWS,SELECT,SESSION,SET,SHARE,SIZE,SMALLINT,SQLBUF,START,SUCCESSFUL,SYNONYM,TABLE,THEN,TO,TRIGGER,UID,UNION,UNIQUE,UPDATE,USER,VALIDATE,VALUES,VARCHAR,VARCHAR2,VIEW,WHENEVER,WHERE,WITH"; private static String CHECK_MEMSTORE = "select 1 from %s.gv$memstore t where t.total>t.mem_limit * ?"; + private static final String CHECK_MEMSTORE_4_0 = "select 1 from %s.gv$ob_memstore t where t.MEMSTORE_USED>t.MEMSTORE_LIMIT * ?"; + + private static String CHECK_MEMSTORE_RATIO = "select min(t.total/t.mem_limit) from %s.gv$memstore t"; + private static final String CHECK_MEMSTORE_RATIO_4_0 = 
"select min(t.MEMSTORE_USED/t.MEMSTORE_LIMIT) from %s.gv$ob_memstore t"; + private static Set databaseKeywords; private static String compatibleMode = null; + private static String obVersion = null; protected static final Logger LOG = LoggerFactory.getLogger(Task.class); private static Set keywordsFromString2HashSet(final String keywords) { return new HashSet(Arrays.asList(keywords.split(","))); } - public static String escapeDatabaseKeywords(String keyword) { + public static String escapeDatabaseKeyword(String keyword) { if (databaseKeywords == null) { if (isOracleMode()) { databaseKeywords = keywordsFromString2HashSet(ORACLE_KEYWORDS); @@ -40,9 +49,9 @@ public class ObWriterUtils { return keyword; } - public static void escapeDatabaseKeywords(List keywords) { + public static void escapeDatabaseKeyword(List keywords) { for (int i = 0; i < keywords.size(); i++) { - keywords.set(i, escapeDatabaseKeywords(keywords.get(i))); + keywords.set(i, escapeDatabaseKeyword(keywords.get(i))); } } public static Boolean isEscapeMode(String keyword){ @@ -61,7 +70,7 @@ public class ObWriterUtils { if (isOracleMode()) { sysDbName = "sys"; } - ps = conn.prepareStatement(String.format(CHECK_MEMSTORE, sysDbName)); + ps = conn.prepareStatement(String.format(getMemStoreSql(), sysDbName)); ps.setDouble(1, memstoreThreshold); rs = ps.executeQuery(); // 只要有满足条件的,则表示当前租户 有个机器的memstore即将满 @@ -77,10 +86,50 @@ public class ObWriterUtils { return result; } + public static double queryMemUsedRatio (Connection conn) { + PreparedStatement ps = null; + ResultSet rs = null; + double result = 0; + try { + String sysDbName = "oceanbase"; + if (isOracleMode()) { + sysDbName = "sys"; + } + ps = conn.prepareStatement(String.format(getMemStoreRatioSql(), sysDbName)); + rs = ps.executeQuery(); + // 只要有满足条件的,则表示当前租户 有个机器的memstore即将满 + if (rs.next()) { + result = rs.getDouble(1); + } + } catch (Throwable e) { + LOG.warn("Check memstore fail, reason: {}. 
Use a random value instead.", e.getMessage()); + result = RandomUtils.nextDouble(0.3D, DEFAULT_SLOW_MEMSTORE_THRESHOLD + 0.2D); + } finally { + //do not need to close the statment in ob1.0 + } + return result; + } + public static boolean isOracleMode(){ return (compatibleMode.equals(Config.OB_COMPATIBLE_MODE_ORACLE)); } + private static String getMemStoreSql() { + if (ObVersion.valueOf(obVersion).compareTo(ObVersion.V4000) >= 0) { + return CHECK_MEMSTORE_4_0; + } else { + return CHECK_MEMSTORE; + } + } + + private static String getMemStoreRatioSql() { + if (ObVersion.valueOf(obVersion).compareTo(ObVersion.V4000) >= 0) { + return CHECK_MEMSTORE_RATIO_4_0; + } else { + return CHECK_MEMSTORE_RATIO; + } + } + public static String getCompatibleMode() { return compatibleMode; } @@ -89,6 +138,10 @@ public class ObWriterUtils { compatibleMode = mode; } + public static void setObVersion(String version) { + obVersion = version; + } + private static String buildDeleteSql (String tableName, List columns) { StringBuilder builder = new StringBuilder("DELETE FROM "); builder.append(tableName).append(" WHERE "); @@ -159,13 +212,13 @@ public class ObWriterUtils { while (rs.next()) { String keyName = rs.getString("Key_name"); String columnName = rs.getString("Column_name"); - columnName=escapeDatabaseKeywords(columnName); + columnName= escapeDatabaseKeyword(columnName); if(!ObWriterUtils.isEscapeMode(columnName)){ columnName = columnName.toUpperCase(); } List s = uniqueKeys.get(keyName); if (s == null) { - s = new ArrayList(); + s = new ArrayList<>(); uniqueKeys.put(keyName, s); } s.add(columnName); @@ -237,7 +290,7 @@ public class ObWriterUtils { String columnName = StringUtils.upperCase(rs.getString("Column_name")); Set s = uniqueKeys.get(keyName); if (s == null) { - s = new HashSet(); + s = new HashSet<>(); uniqueKeys.put(keyName, s); } s.add(columnName); @@ -399,7 +452,7 @@ public class ObWriterUtils { private static Set white = new HashSet(); static { - int[] errList = { 1213, 1047, 1041, 1094, 4000, 4012 }; + int[] errList = { 1213, 1047, 1041, 1094, 4000, 4012, 4013 }; for (int err : errList) { white.add(err); } @@ -429,4 +482,26 @@ public class ObWriterUtils { t.setDaemon(true); t.start(); } + + /** + * + */ + public static enum LoadMode { + + /** + * Fast insert + */ + FAST, + + /** + * Insert slowly + */ + SLOW, + + /** + * Pause to insert + */ + PAUSE + } + } diff --git a/oceanbasev10writer/src/main/libs/shade-ob-partition-calculator-1.0-SNAPSHOT.jar b/oceanbasev10writer/src/main/libs/shade-ob-partition-calculator-1.0-SNAPSHOT.jar new file mode 100644 index 00000000..34453ce6 Binary files /dev/null and b/oceanbasev10writer/src/main/libs/shade-ob-partition-calculator-1.0-SNAPSHOT.jar differ diff --git a/odpsreader/src/main/java/com/alibaba/datax/plugin/reader/odpsreader/ColumnType.java b/odpsreader/src/main/java/com/alibaba/datax/plugin/reader/odpsreader/ColumnType.java index eb674a7f..1c771d3e 100644 --- a/odpsreader/src/main/java/com/alibaba/datax/plugin/reader/odpsreader/ColumnType.java +++ b/odpsreader/src/main/java/com/alibaba/datax/plugin/reader/odpsreader/ColumnType.java @@ -3,20 +3,6 @@ package com.alibaba.datax.plugin.reader.odpsreader; public enum ColumnType { PARTITION, NORMAL, CONSTANT, UNKNOWN, ; - @Override - public String toString() { - switch (this) { - case PARTITION: - return "partition"; - case NORMAL: - return "normal"; - case CONSTANT: - return "constant"; - default: - return "unknown"; - } - } - public static ColumnType asColumnType(String columnTypeString) { if 
("partition".equals(columnTypeString)) { return PARTITION; diff --git a/odpsreader/src/main/java/com/alibaba/datax/plugin/reader/odpsreader/Constant.java b/odpsreader/src/main/java/com/alibaba/datax/plugin/reader/odpsreader/Constant.java index dee2ef5c..cf34762d 100755 --- a/odpsreader/src/main/java/com/alibaba/datax/plugin/reader/odpsreader/Constant.java +++ b/odpsreader/src/main/java/com/alibaba/datax/plugin/reader/odpsreader/Constant.java @@ -14,20 +14,9 @@ public class Constant { public static final String PARTITION_SPLIT_MODE = "partition"; - public static final String DEFAULT_ACCOUNT_TYPE = "aliyun"; - - public static final String TAOBAO_ACCOUNT_TYPE = "taobao"; - // 常量字段用COLUMN_CONSTANT_FLAG 首尾包住即可 public final static String COLUMN_CONSTANT_FLAG = "'"; - /** - * 以下是获取accesskey id 需要用到的常量值 - */ - public static final String SKYNET_ACCESSID = "SKYNET_ACCESSID"; - - public static final String SKYNET_ACCESSKEY = "SKYNET_ACCESSKEY"; - public static final String PARTITION_COLUMNS = "partitionColumns"; public static final String PARSED_COLUMNS = "parsedColumns"; diff --git a/odpsreader/src/main/java/com/alibaba/datax/plugin/reader/odpsreader/InternalColumnInfo.java b/odpsreader/src/main/java/com/alibaba/datax/plugin/reader/odpsreader/InternalColumnInfo.java new file mode 100644 index 00000000..b5a15f1d --- /dev/null +++ b/odpsreader/src/main/java/com/alibaba/datax/plugin/reader/odpsreader/InternalColumnInfo.java @@ -0,0 +1,24 @@ +package com.alibaba.datax.plugin.reader.odpsreader; + +public class InternalColumnInfo { + + private String columnName; + + private ColumnType columnType; + + public String getColumnName() { + return columnName; + } + + public void setColumnName(String columnName) { + this.columnName = columnName; + } + + public ColumnType getColumnType() { + return columnType; + } + + public void setColumnType(ColumnType columnType) { + this.columnType = columnType; + } +} diff --git a/odpsreader/src/main/java/com/alibaba/datax/plugin/reader/odpsreader/Key.java b/odpsreader/src/main/java/com/alibaba/datax/plugin/reader/odpsreader/Key.java index 2cee65d1..6f8c7d92 100755 --- a/odpsreader/src/main/java/com/alibaba/datax/plugin/reader/odpsreader/Key.java +++ b/odpsreader/src/main/java/com/alibaba/datax/plugin/reader/odpsreader/Key.java @@ -24,9 +24,6 @@ public class Key { // 当值为:partition 则只切分到分区;当值为:record,则当按照分区切分后达不到adviceNum时,继续按照record切分 public final static String SPLIT_MODE = "splitMode"; - // 账号类型,默认为aliyun,也可能为taobao等其他类型 - public final static String ACCOUNT_TYPE = "accountType"; - public final static String PACKAGE_AUTHORIZED_PROJECT = "packageAuthorizedProject"; public final static String IS_COMPRESS = "isCompress"; diff --git a/odpsreader/src/main/java/com/alibaba/datax/plugin/reader/odpsreader/OdpsReader.java b/odpsreader/src/main/java/com/alibaba/datax/plugin/reader/odpsreader/OdpsReader.java index 8cb7ba31..615cee50 100755 --- a/odpsreader/src/main/java/com/alibaba/datax/plugin/reader/odpsreader/OdpsReader.java +++ b/odpsreader/src/main/java/com/alibaba/datax/plugin/reader/odpsreader/OdpsReader.java @@ -7,7 +7,7 @@ import com.alibaba.datax.common.util.Configuration; import com.alibaba.datax.common.util.FilterUtil; import com.alibaba.datax.common.util.MessageSource; import com.alibaba.datax.plugin.reader.odpsreader.util.*; -import com.alibaba.fastjson.JSON; +import com.alibaba.fastjson2.JSON; import com.aliyun.odps.Column; import com.aliyun.odps.Odps; import com.aliyun.odps.Table; @@ -15,8 +15,6 @@ import com.aliyun.odps.TableSchema; import 
com.aliyun.odps.tunnel.TableTunnel.DownloadSession; import com.aliyun.odps.type.TypeInfo; import org.apache.commons.lang3.StringUtils; -import org.apache.commons.lang3.tuple.MutablePair; -import org.apache.commons.lang3.tuple.Pair; import org.slf4j.Logger; import org.slf4j.LoggerFactory; @@ -44,12 +42,6 @@ public class OdpsReader extends Reader { this.originalConfig = super.getPluginJobConf(); this.successOnNoPartition = this.originalConfig.getBool(Key.SUCCESS_ON_NO_PATITION, false); - //如果用户没有配置accessId/accessKey,尝试从环境变量获取 - String accountType = originalConfig.getString(Key.ACCOUNT_TYPE, Constant.DEFAULT_ACCOUNT_TYPE); - if (Constant.DEFAULT_ACCOUNT_TYPE.equalsIgnoreCase(accountType)) { - this.originalConfig = IdAndKeyUtil.parseAccessIdAndKey(this.originalConfig); - } - //检查必要的参数配置 OdpsUtil.checkNecessaryConfig(this.originalConfig); //重试次数的配置检查 @@ -311,7 +303,7 @@ public class OdpsReader extends Reader { */ List allPartitionColumns = this.originalConfig.getList( Constant.PARTITION_COLUMNS, String.class); - List> parsedColumns = OdpsUtil + List parsedColumns = OdpsUtil .parseColumns(allNormalColumns, allPartitionColumns, userConfiguredColumns); @@ -320,13 +312,15 @@ public class OdpsReader extends Reader { StringBuilder sb = new StringBuilder(); sb.append("[ "); for (int i = 0, len = parsedColumns.size(); i < len; i++) { - Pair pair = parsedColumns.get(i); - sb.append(String.format(" %s : %s", pair.getLeft(), - pair.getRight())); + InternalColumnInfo pair = parsedColumns.get(i); + sb.append(String.format(" %s : %s", pair.getColumnName(), + pair.getColumnType())); if (i != len - 1) { sb.append(","); } } + + sb.append(" ]"); LOG.info("parsed column details: {} .", sb.toString()); } @@ -500,22 +494,11 @@ public class OdpsReader extends Reader { } try { - List parsedColumnsTmp = this.readerSliceConf - .getListConfiguration(Constant.PARSED_COLUMNS); - List> parsedColumns = new ArrayList>(); - for (int i = 0; i < parsedColumnsTmp.size(); i++) { - Configuration eachColumnConfig = parsedColumnsTmp.get(i); - String columnName = eachColumnConfig.getString("left"); - ColumnType columnType = ColumnType - .asColumnType(eachColumnConfig.getString("right")); - parsedColumns.add(new MutablePair( - columnName, columnType)); - - } + List parsedColumns = this.readerSliceConf.getListWithJson(Constant.PARSED_COLUMNS, + InternalColumnInfo.class); ReaderProxy readerProxy = new ReaderProxy(recordSender, downloadSession, columnTypeMap, parsedColumns, partition, this.isPartitionedTable, start, count, this.isCompress, this.readerSliceConf); - readerProxy.doRead(); } catch (Exception e) { throw DataXException.asDataXException(OdpsReaderErrorCode.READ_DATA_FAIL, diff --git a/odpsreader/src/main/java/com/alibaba/datax/plugin/reader/odpsreader/ReaderProxy.java b/odpsreader/src/main/java/com/alibaba/datax/plugin/reader/odpsreader/ReaderProxy.java index 31d0d605..c2e88eba 100755 --- a/odpsreader/src/main/java/com/alibaba/datax/plugin/reader/odpsreader/ReaderProxy.java +++ b/odpsreader/src/main/java/com/alibaba/datax/plugin/reader/odpsreader/ReaderProxy.java @@ -6,7 +6,7 @@ import com.alibaba.datax.common.plugin.RecordSender; import com.alibaba.datax.common.util.Configuration; import com.alibaba.datax.common.util.MessageSource; import com.alibaba.datax.plugin.reader.odpsreader.util.OdpsUtil; -import com.alibaba.fastjson.JSON; +import com.alibaba.fastjson2.JSON; import com.aliyun.odps.Column; import com.aliyun.odps.OdpsType; import com.aliyun.odps.data.*; @@ -17,7 +17,6 @@ import com.aliyun.odps.type.MapTypeInfo; import 
com.aliyun.odps.type.TypeInfo; import org.apache.commons.codec.binary.Base64; import org.apache.commons.lang3.StringUtils; -import org.apache.commons.lang3.tuple.Pair; import org.slf4j.Logger; import org.slf4j.LoggerFactory; @@ -34,7 +33,7 @@ public class ReaderProxy { private RecordSender recordSender; private TableTunnel.DownloadSession downloadSession; private Map columnTypeMap; - private List> parsedColumns; + private List parsedColumns; private String partition; private boolean isPartitionTable; @@ -71,7 +70,7 @@ public class ReaderProxy { public ReaderProxy(RecordSender recordSender, TableTunnel.DownloadSession downloadSession, Map columnTypeMap, - List> parsedColumns, String partition, + List parsedColumns, String partition, boolean isPartitionTable, long start, long count, boolean isCompress, Configuration taskConfig) { this.recordSender = recordSender; this.downloadSession = downloadSession; @@ -136,9 +135,9 @@ public class ReaderProxy { // warn: for PARTITION||NORMAL columnTypeMap's key // sets(columnName) is big than parsedColumns's left // sets(columnName), always contain - for (Pair pair : this.parsedColumns) { - String columnName = pair.getLeft(); - switch (pair.getRight()) { + for (InternalColumnInfo pair : this.parsedColumns) { + String columnName = pair.getColumnName(); + switch (pair.getColumnType()) { case PARTITION: String partitionColumnValue = this .getPartitionColumnValue(partitionMap, @@ -201,7 +200,7 @@ public class ReaderProxy { } if (IS_DEBUG) { LOG.debug(String.format("partition value details: %s", - com.alibaba.fastjson.JSON.toJSONString(partitionMap))); + com.alibaba.fastjson2.JSON.toJSONString(partitionMap))); } return partitionMap; } @@ -213,7 +212,7 @@ public class ReaderProxy { // it's will never happen, but add this checking if (!partitionMap.containsKey(partitionColumnName)) { String errorMessage = MESSAGE_SOURCE.message("readerproxy.3", - com.alibaba.fastjson.JSON.toJSONString(partitionMap), + com.alibaba.fastjson2.JSON.toJSONString(partitionMap), partitionColumnName); throw DataXException.asDataXException( OdpsReaderErrorCode.READ_DATA_FAIL, errorMessage); diff --git a/odpsreader/src/main/java/com/alibaba/datax/plugin/reader/odpsreader/util/IdAndKeyUtil.java b/odpsreader/src/main/java/com/alibaba/datax/plugin/reader/odpsreader/util/IdAndKeyUtil.java deleted file mode 100644 index 05722b59..00000000 --- a/odpsreader/src/main/java/com/alibaba/datax/plugin/reader/odpsreader/util/IdAndKeyUtil.java +++ /dev/null @@ -1,65 +0,0 @@ -/** - * (C) 2010-2022 Alibaba Group Holding Limited. - * - * Licensed under the Apache License, Version 2.0 (the "License"); - * you may not use this file except in compliance with the License. - * You may obtain a copy of the License at - * - * http://www.apache.org/licenses/LICENSE-2.0 - * - * Unless required by applicable law or agreed to in writing, software - * distributed under the License is distributed on an "AS IS" BASIS, - * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. - * See the License for the specific language governing permissions and - * limitations under the License. 
- */ - -package com.alibaba.datax.plugin.reader.odpsreader.util; - -import com.alibaba.datax.common.exception.DataXException; -import com.alibaba.datax.common.util.Configuration; -import com.alibaba.datax.common.util.IdAndKeyRollingUtil; -import com.alibaba.datax.common.util.MessageSource; -import com.alibaba.datax.plugin.reader.odpsreader.Key; -import com.alibaba.datax.plugin.reader.odpsreader.OdpsReaderErrorCode; - -import org.apache.commons.lang3.StringUtils; -import org.slf4j.Logger; -import org.slf4j.LoggerFactory; - -import java.util.Map; - -public class IdAndKeyUtil { - private static Logger LOG = LoggerFactory.getLogger(IdAndKeyUtil.class); - private static MessageSource MESSAGE_SOURCE = MessageSource.loadResourceBundle(IdAndKeyUtil.class); - - public static Configuration parseAccessIdAndKey(Configuration originalConfig) { - String accessId = originalConfig.getString(Key.ACCESS_ID); - String accessKey = originalConfig.getString(Key.ACCESS_KEY); - - // 只要 accessId,accessKey 二者配置了一个,就理解为是用户本意是要直接手动配置其 accessid/accessKey - if (StringUtils.isNotBlank(accessId) || StringUtils.isNotBlank(accessKey)) { - LOG.info("Try to get accessId/accessKey from your config."); - //通过如下语句,进行检查是否确实配置了 - accessId = originalConfig.getNecessaryValue(Key.ACCESS_ID, OdpsReaderErrorCode.REQUIRED_VALUE); - accessKey = originalConfig.getNecessaryValue(Key.ACCESS_KEY, OdpsReaderErrorCode.REQUIRED_VALUE); - //检查完毕,返回即可 - return originalConfig; - } else { - Map envProp = System.getenv(); - return getAccessIdAndKeyFromEnv(originalConfig, envProp); - } - } - - private static Configuration getAccessIdAndKeyFromEnv(Configuration originalConfig, - Map envProp) { - // 如果获取到ak,在getAccessIdAndKeyFromEnv中已经设置到originalConfig了 - String accessKey = IdAndKeyRollingUtil.getAccessIdAndKeyFromEnv(originalConfig); - if (StringUtils.isBlank(accessKey)) { - // 无处获取(既没有配置在作业中,也没用在环境变量中) - throw DataXException.asDataXException(OdpsReaderErrorCode.GET_ID_KEY_FAIL, - MESSAGE_SOURCE.message("idandkeyutil.2")); - } - return originalConfig; - } -} diff --git a/odpsreader/src/main/java/com/alibaba/datax/plugin/reader/odpsreader/util/OdpsUtil.java b/odpsreader/src/main/java/com/alibaba/datax/plugin/reader/odpsreader/util/OdpsUtil.java index 0103a383..0ff34a81 100755 --- a/odpsreader/src/main/java/com/alibaba/datax/plugin/reader/odpsreader/util/OdpsUtil.java +++ b/odpsreader/src/main/java/com/alibaba/datax/plugin/reader/odpsreader/util/OdpsUtil.java @@ -7,6 +7,7 @@ import com.alibaba.datax.common.util.MessageSource; import com.alibaba.datax.common.util.RetryUtil; import com.alibaba.datax.plugin.reader.odpsreader.ColumnType; import com.alibaba.datax.plugin.reader.odpsreader.Constant; +import com.alibaba.datax.plugin.reader.odpsreader.InternalColumnInfo; import com.alibaba.datax.plugin.reader.odpsreader.Key; import com.alibaba.datax.plugin.reader.odpsreader.OdpsReaderErrorCode; import com.aliyun.odps.*; @@ -75,19 +76,12 @@ public final class OdpsUtil { defaultProject = packageAuthorizedProject; } - String accountType = originalConfig.getString(Key.ACCOUNT_TYPE, - Constant.DEFAULT_ACCOUNT_TYPE); Account account = null; - if (accountType.equalsIgnoreCase(Constant.DEFAULT_ACCOUNT_TYPE)) { - if (StringUtils.isNotBlank(securityToken)) { - account = new StsAccount(accessId, accessKey, securityToken); - } else { - account = new AliyunAccount(accessId, accessKey); - } + if (StringUtils.isNotBlank(securityToken)) { + account = new StsAccount(accessId, accessKey, securityToken); } else { - throw 
DataXException.asDataXException(OdpsReaderErrorCode.ACCOUNT_TYPE_ERROR, - MESSAGE_SOURCE.message("odpsutil.3", accountType)); + account = new AliyunAccount(accessId, accessKey); } Odps odps = new Odps(account); @@ -215,19 +209,18 @@ public final class OdpsUtil { return userConfiguredPartitionClassification; } - public static List> parseColumns( + public static List parseColumns( List allNormalColumns, List allPartitionColumns, List userConfiguredColumns) { - List> parsededColumns = new ArrayList>(); + List parsededColumns = new ArrayList(); // warn: upper & lower case for (String column : userConfiguredColumns) { - MutablePair pair = new MutablePair(); - + InternalColumnInfo pair = new InternalColumnInfo(); // if constant column if (OdpsUtil.checkIfConstantColumn(column)) { // remove first and last ' - pair.setLeft(column.substring(1, column.length() - 1)); - pair.setRight(ColumnType.CONSTANT); + pair.setColumnName(column.substring(1, column.length() - 1)); + pair.setColumnType(ColumnType.CONSTANT); parsededColumns.add(pair); continue; } @@ -236,8 +229,8 @@ public final class OdpsUtil { // repeated in partitioning columns int index = OdpsUtil.indexOfIgnoreCase(allNormalColumns, column); if (0 <= index) { - pair.setLeft(allNormalColumns.get(index)); - pair.setRight(ColumnType.NORMAL); + pair.setColumnName(allNormalColumns.get(index)); + pair.setColumnType(ColumnType.NORMAL); parsededColumns.add(pair); continue; } @@ -245,8 +238,8 @@ public final class OdpsUtil { // if partition column index = OdpsUtil.indexOfIgnoreCase(allPartitionColumns, column); if (0 <= index) { - pair.setLeft(allPartitionColumns.get(index)); - pair.setRight(ColumnType.PARTITION); + pair.setColumnName(allPartitionColumns.get(index)); + pair.setColumnType(ColumnType.PARTITION); parsededColumns.add(pair); continue; } @@ -431,13 +424,13 @@ public final class OdpsUtil { MESSAGE_SOURCE.message("odpsutil.12", tableName), e); } - public static List getNormalColumns(List> parsedColumns, + public static List getNormalColumns(List parsedColumns, Map columnTypeMap) { List userConfigNormalColumns = new ArrayList(); Set columnNameSet = new HashSet(); - for (Pair columnInfo : parsedColumns) { - if (columnInfo.getValue() == ColumnType.NORMAL) { - String columnName = columnInfo.getKey(); + for (InternalColumnInfo columnInfo : parsedColumns) { + if (columnInfo.getColumnType() == ColumnType.NORMAL) { + String columnName = columnInfo.getColumnName(); if (!columnNameSet.contains(columnName)) { Column column = new Column(columnName, columnTypeMap.get(columnName)); userConfigNormalColumns.add(column); diff --git a/odpswriter/doc/odpswriter.md b/odpswriter/doc/odpswriter.md index d81672b0..845dd1d3 100644 --- a/odpswriter/doc/odpswriter.md +++ b/odpswriter/doc/odpswriter.md @@ -71,8 +71,7 @@ ODPSWriter插件用于实现往ODPS插入或者更新数据,主要提供给etl "accessKey": "xxxx", "truncate": true, "odpsServer": "http://sxxx/api", - "tunnelServer": "http://xxx", - "accountType": "aliyun" + "tunnelServer": "http://xxx" } } } diff --git a/odpswriter/src/main/java/com/alibaba/datax/plugin/writer/odpswriter/Constant.java b/odpswriter/src/main/java/com/alibaba/datax/plugin/writer/odpswriter/Constant.java index f4d9734b..efedfea9 100755 --- a/odpswriter/src/main/java/com/alibaba/datax/plugin/writer/odpswriter/Constant.java +++ b/odpswriter/src/main/java/com/alibaba/datax/plugin/writer/odpswriter/Constant.java @@ -2,13 +2,6 @@ package com.alibaba.datax.plugin.writer.odpswriter; public class Constant { - public static final String SKYNET_ACCESSID = "SKYNET_ACCESSID"; - - public 
static final String SKYNET_ACCESSKEY = "SKYNET_ACCESSKEY"; - - public static final String DEFAULT_ACCOUNT_TYPE = "aliyun"; - - public static final String TAOBAO_ACCOUNT_TYPE = "taobao"; public static final String COLUMN_POSITION = "columnPosition"; diff --git a/odpswriter/src/main/java/com/alibaba/datax/plugin/writer/odpswriter/Key.java b/odpswriter/src/main/java/com/alibaba/datax/plugin/writer/odpswriter/Key.java index 7ee11128..8dff8a4c 100755 --- a/odpswriter/src/main/java/com/alibaba/datax/plugin/writer/odpswriter/Key.java +++ b/odpswriter/src/main/java/com/alibaba/datax/plugin/writer/odpswriter/Key.java @@ -30,8 +30,6 @@ public final class Key { //boolean 类型,default:false public final static String EMPTY_AS_NULL = "emptyAsNull"; - public final static String ACCOUNT_TYPE = "accountType"; - public final static String IS_COMPRESS = "isCompress"; // preSql diff --git a/odpswriter/src/main/java/com/alibaba/datax/plugin/writer/odpswriter/OdpsWriter.java b/odpswriter/src/main/java/com/alibaba/datax/plugin/writer/odpswriter/OdpsWriter.java index c82fcef4..9b7276fa 100755 --- a/odpswriter/src/main/java/com/alibaba/datax/plugin/writer/odpswriter/OdpsWriter.java +++ b/odpswriter/src/main/java/com/alibaba/datax/plugin/writer/odpswriter/OdpsWriter.java @@ -12,9 +12,9 @@ import com.alibaba.datax.common.util.MessageSource; import com.alibaba.datax.plugin.writer.odpswriter.model.PartitionInfo; import com.alibaba.datax.plugin.writer.odpswriter.model.UserDefinedFunction; import com.alibaba.datax.plugin.writer.odpswriter.util.*; -import com.alibaba.fastjson.JSON; -import com.alibaba.fastjson.JSONArray; -import com.alibaba.fastjson.JSONObject; +import com.alibaba.fastjson2.JSON; +import com.alibaba.fastjson2.JSONArray; +import com.alibaba.fastjson2.JSONObject; import com.aliyun.odps.Odps; import com.aliyun.odps.Table; import com.aliyun.odps.TableSchema; @@ -62,7 +62,6 @@ public class OdpsWriter extends Writer { private String tableName; private String tunnelServer; private String partition; - private String accountType; private boolean truncate; private String uploadId; private TableTunnel.UploadSession masterUpload; @@ -104,8 +103,6 @@ public class OdpsWriter extends Writer { this.tableName = this.originalConfig.getString(Key.TABLE); this.tunnelServer = this.originalConfig.getString(Key.TUNNEL_SERVER, null); - this.dealAK(); - // init odps config this.odps = OdpsUtil.initOdpsProject(this.originalConfig); @@ -153,31 +150,6 @@ public class OdpsWriter extends Writer { } } - private void dealAK() { - this.accountType = this.originalConfig.getString(Key.ACCOUNT_TYPE, - Constant.DEFAULT_ACCOUNT_TYPE); - - if (!Constant.DEFAULT_ACCOUNT_TYPE.equalsIgnoreCase(this.accountType) && - !Constant.TAOBAO_ACCOUNT_TYPE.equalsIgnoreCase(this.accountType)) { - throw DataXException.asDataXException(OdpsWriterErrorCode.ACCOUNT_TYPE_ERROR, - MESSAGE_SOURCE.message("odpswriter.1", accountType)); - } - this.originalConfig.set(Key.ACCOUNT_TYPE, this.accountType); - - //检查accessId,accessKey配置 - if (Constant.DEFAULT_ACCOUNT_TYPE - .equalsIgnoreCase(this.accountType)) { - this.originalConfig = IdAndKeyUtil.parseAccessIdAndKey(this.originalConfig); - String accessId = this.originalConfig.getString(Key.ACCESS_ID); - String accessKey = this.originalConfig.getString(Key.ACCESS_KEY); - if (IS_DEBUG) { - LOG.debug("accessId:[{}], accessKey:[{}] .", accessId, - accessKey); - } - LOG.info("accessId:[{}] .", accessId); - } - } - private void dealDynamicPartition() { /* * 如果显示配置了 supportDynamicPartition,则以配置为准 @@ -241,20 +213,6 @@ public 
class OdpsWriter extends Writer { @Override public void prepare() { - String accessId = null; - String accessKey = null; - if (Constant.DEFAULT_ACCOUNT_TYPE - .equalsIgnoreCase(this.accountType)) { - this.originalConfig = IdAndKeyUtil.parseAccessIdAndKey(this.originalConfig); - accessId = this.originalConfig.getString(Key.ACCESS_ID); - accessKey = this.originalConfig.getString(Key.ACCESS_KEY); - if (IS_DEBUG) { - LOG.debug("accessId:[{}], accessKey:[{}] .", accessId, - accessKey); - } - LOG.info("accessId:[{}] .", accessId); - } - // init odps config this.odps = OdpsUtil.initOdpsProject(this.originalConfig); diff --git a/odpswriter/src/main/java/com/alibaba/datax/plugin/writer/odpswriter/OdpsWriterProxy.java b/odpswriter/src/main/java/com/alibaba/datax/plugin/writer/odpswriter/OdpsWriterProxy.java index 221aca79..e7c95be1 100755 --- a/odpswriter/src/main/java/com/alibaba/datax/plugin/writer/odpswriter/OdpsWriterProxy.java +++ b/odpswriter/src/main/java/com/alibaba/datax/plugin/writer/odpswriter/OdpsWriterProxy.java @@ -6,9 +6,9 @@ import com.alibaba.datax.common.plugin.TaskPluginCollector; import com.alibaba.datax.common.util.Configuration; import com.alibaba.datax.common.util.MessageSource; import com.alibaba.datax.plugin.writer.odpswriter.util.OdpsUtil; -import com.alibaba.fastjson.JSON; -import com.alibaba.fastjson.JSONArray; -import com.alibaba.fastjson.JSONObject; +import com.alibaba.fastjson2.JSON; +import com.alibaba.fastjson2.JSONArray; +import com.alibaba.fastjson2.JSONObject; import com.aliyun.odps.OdpsType; import com.aliyun.odps.TableSchema; import com.aliyun.odps.data.ArrayRecord; diff --git a/odpswriter/src/main/java/com/alibaba/datax/plugin/writer/odpswriter/util/CustomPartitionUtils.java b/odpswriter/src/main/java/com/alibaba/datax/plugin/writer/odpswriter/util/CustomPartitionUtils.java index 51ad45a1..6153a820 100644 --- a/odpswriter/src/main/java/com/alibaba/datax/plugin/writer/odpswriter/util/CustomPartitionUtils.java +++ b/odpswriter/src/main/java/com/alibaba/datax/plugin/writer/odpswriter/util/CustomPartitionUtils.java @@ -4,7 +4,7 @@ import com.alibaba.datax.common.element.Record; import com.alibaba.datax.common.util.Configuration; import com.alibaba.datax.plugin.writer.odpswriter.model.PartitionInfo; import com.alibaba.datax.plugin.writer.odpswriter.model.UserDefinedFunction; -import com.alibaba.fastjson.JSON; +import com.alibaba.fastjson2.JSON; import com.google.common.base.Joiner; import org.apache.commons.lang3.StringUtils; import org.slf4j.Logger; diff --git a/odpswriter/src/main/java/com/alibaba/datax/plugin/writer/odpswriter/util/IdAndKeyUtil.java b/odpswriter/src/main/java/com/alibaba/datax/plugin/writer/odpswriter/util/IdAndKeyUtil.java deleted file mode 100755 index 98c9afdd..00000000 --- a/odpswriter/src/main/java/com/alibaba/datax/plugin/writer/odpswriter/util/IdAndKeyUtil.java +++ /dev/null @@ -1,65 +0,0 @@ -/** - * (C) 2010-2022 Alibaba Group Holding Limited. - * - * Licensed under the Apache License, Version 2.0 (the "License"); - * you may not use this file except in compliance with the License. - * You may obtain a copy of the License at - * - * http://www.apache.org/licenses/LICENSE-2.0 - * - * Unless required by applicable law or agreed to in writing, software - * distributed under the License is distributed on an "AS IS" BASIS, - * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. - * See the License for the specific language governing permissions and - * limitations under the License. 
- */ - -package com.alibaba.datax.plugin.writer.odpswriter.util; - -import com.alibaba.datax.common.exception.DataXException; -import com.alibaba.datax.common.util.Configuration; -import com.alibaba.datax.common.util.IdAndKeyRollingUtil; -import com.alibaba.datax.common.util.MessageSource; -import com.alibaba.datax.plugin.writer.odpswriter.Key; -import com.alibaba.datax.plugin.writer.odpswriter.OdpsWriterErrorCode; - -import org.apache.commons.lang3.StringUtils; -import org.slf4j.Logger; -import org.slf4j.LoggerFactory; - -import java.util.Map; - -public class IdAndKeyUtil { - private static Logger LOG = LoggerFactory.getLogger(IdAndKeyUtil.class); - private static final MessageSource MESSAGE_SOURCE = MessageSource.loadResourceBundle(IdAndKeyUtil.class); - - public static Configuration parseAccessIdAndKey(Configuration originalConfig) { - String accessId = originalConfig.getString(Key.ACCESS_ID); - String accessKey = originalConfig.getString(Key.ACCESS_KEY); - - // 只要 accessId,accessKey 二者配置了一个,就理解为是用户本意是要直接手动配置其 accessid/accessKey - if (StringUtils.isNotBlank(accessId) || StringUtils.isNotBlank(accessKey)) { - LOG.info("Try to get accessId/accessKey from your config."); - //通过如下语句,进行检查是否确实配置了 - accessId = originalConfig.getNecessaryValue(Key.ACCESS_ID, OdpsWriterErrorCode.REQUIRED_VALUE); - accessKey = originalConfig.getNecessaryValue(Key.ACCESS_KEY, OdpsWriterErrorCode.REQUIRED_VALUE); - //检查完毕,返回即可 - return originalConfig; - } else { - Map envProp = System.getenv(); - return getAccessIdAndKeyFromEnv(originalConfig, envProp); - } - } - - private static Configuration getAccessIdAndKeyFromEnv(Configuration originalConfig, - Map envProp) { - // 如果获取到ak,在getAccessIdAndKeyFromEnv中已经设置到originalConfig了 - String accessKey = IdAndKeyRollingUtil.getAccessIdAndKeyFromEnv(originalConfig); - if (StringUtils.isBlank(accessKey)) { - // 无处获取(既没有配置在作业中,也没用在环境变量中) - throw DataXException.asDataXException(OdpsWriterErrorCode.GET_ID_KEY_FAIL, - MESSAGE_SOURCE.message("idandkeyutil.2")); - } - return originalConfig; - } -} diff --git a/odpswriter/src/main/java/com/alibaba/datax/plugin/writer/odpswriter/util/OdpsUtil.java b/odpswriter/src/main/java/com/alibaba/datax/plugin/writer/odpswriter/util/OdpsUtil.java index a663da85..a3a372af 100755 --- a/odpswriter/src/main/java/com/alibaba/datax/plugin/writer/odpswriter/util/OdpsUtil.java +++ b/odpswriter/src/main/java/com/alibaba/datax/plugin/writer/odpswriter/util/OdpsUtil.java @@ -79,7 +79,6 @@ public class OdpsUtil { public static Odps initOdpsProject(Configuration originalConfig) { - String accountType = originalConfig.getString(Key.ACCOUNT_TYPE); String accessId = originalConfig.getString(Key.ACCESS_ID); String accessKey = originalConfig.getString(Key.ACCESS_KEY); @@ -88,15 +87,10 @@ public class OdpsUtil { String securityToken = originalConfig.getString(Key.SECURITY_TOKEN); Account account; - if (accountType.equalsIgnoreCase(Constant.DEFAULT_ACCOUNT_TYPE)) { - if (StringUtils.isNotBlank(securityToken)) { - account = new com.aliyun.odps.account.StsAccount(accessId, accessKey, securityToken); - } else { - account = new AliyunAccount(accessId, accessKey); - } + if (StringUtils.isNotBlank(securityToken)) { + account = new com.aliyun.odps.account.StsAccount(accessId, accessKey, securityToken); } else { - throw DataXException.asDataXException(OdpsWriterErrorCode.ACCOUNT_TYPE_ERROR, - MESSAGE_SOURCE.message("odpsutil.4", accountType)); + account = new AliyunAccount(accessId, accessKey); } Odps odps = new Odps(account); diff --git a/opentsdbreader/pom.xml 
b/opentsdbreader/pom.xml index f2263726..b10fba02 100644 --- a/opentsdbreader/pom.xml +++ b/opentsdbreader/pom.xml @@ -24,9 +24,6 @@ 4.5 2.4 - - 1.2.28 - 2.3.2 @@ -47,10 +44,6 @@ slf4j-log4j12 org.slf4j - - fastjson - com.alibaba - commons-math3 org.apache.commons @@ -92,9 +85,8 @@ - com.alibaba - fastjson - ${fastjson.version} + com.alibaba.fastjson2 + fastjson2 diff --git a/opentsdbreader/src/main/java/com/alibaba/datax/plugin/reader/conn/DataPoint4TSDB.java b/opentsdbreader/src/main/java/com/alibaba/datax/plugin/reader/conn/DataPoint4TSDB.java index 64c124ae..e8a84fb2 100644 --- a/opentsdbreader/src/main/java/com/alibaba/datax/plugin/reader/conn/DataPoint4TSDB.java +++ b/opentsdbreader/src/main/java/com/alibaba/datax/plugin/reader/conn/DataPoint4TSDB.java @@ -1,6 +1,6 @@ package com.alibaba.datax.plugin.reader.conn; -import com.alibaba.fastjson.JSON; +import com.alibaba.fastjson2.JSON; import java.util.Map; diff --git a/opentsdbreader/src/main/java/com/alibaba/datax/plugin/reader/conn/OpenTSDBConnection.java b/opentsdbreader/src/main/java/com/alibaba/datax/plugin/reader/conn/OpenTSDBConnection.java index 939a856f..49ba5fb3 100644 --- a/opentsdbreader/src/main/java/com/alibaba/datax/plugin/reader/conn/OpenTSDBConnection.java +++ b/opentsdbreader/src/main/java/com/alibaba/datax/plugin/reader/conn/OpenTSDBConnection.java @@ -2,7 +2,7 @@ package com.alibaba.datax.plugin.reader.conn; import com.alibaba.datax.common.plugin.RecordSender; import com.alibaba.datax.plugin.reader.util.TSDBUtils; -import com.alibaba.fastjson.JSON; +import com.alibaba.fastjson2.JSON; import org.apache.commons.lang3.StringUtils; import java.util.List; diff --git a/opentsdbreader/src/main/java/com/alibaba/datax/plugin/reader/conn/OpenTSDBDump.java b/opentsdbreader/src/main/java/com/alibaba/datax/plugin/reader/conn/OpenTSDBDump.java index 009aa100..6f3c551a 100644 --- a/opentsdbreader/src/main/java/com/alibaba/datax/plugin/reader/conn/OpenTSDBDump.java +++ b/opentsdbreader/src/main/java/com/alibaba/datax/plugin/reader/conn/OpenTSDBDump.java @@ -1,7 +1,7 @@ package com.alibaba.datax.plugin.reader.conn; import com.alibaba.datax.common.plugin.RecordSender; -import com.alibaba.fastjson.JSON; +import com.alibaba.fastjson2.JSON; import net.opentsdb.core.TSDB; import net.opentsdb.utils.Config; diff --git a/opentsdbreader/src/main/java/com/alibaba/datax/plugin/reader/opentsdbreader/OpenTSDBReader.java b/opentsdbreader/src/main/java/com/alibaba/datax/plugin/reader/opentsdbreader/OpenTSDBReader.java index 4cd0476e..7790a2b1 100755 --- a/opentsdbreader/src/main/java/com/alibaba/datax/plugin/reader/opentsdbreader/OpenTSDBReader.java +++ b/opentsdbreader/src/main/java/com/alibaba/datax/plugin/reader/opentsdbreader/OpenTSDBReader.java @@ -6,7 +6,7 @@ import com.alibaba.datax.common.spi.Reader; import com.alibaba.datax.common.util.Configuration; import com.alibaba.datax.plugin.reader.conn.OpenTSDBConnection; import com.alibaba.datax.plugin.reader.util.TimeUtils; -import com.alibaba.fastjson.JSON; +import com.alibaba.fastjson2.JSON; import org.apache.commons.lang3.StringUtils; import org.joda.time.DateTime; import org.slf4j.Logger; diff --git a/opentsdbreader/src/main/java/com/alibaba/datax/plugin/reader/util/HttpUtils.java b/opentsdbreader/src/main/java/com/alibaba/datax/plugin/reader/util/HttpUtils.java index cbd0d7ca..fa82b634 100644 --- a/opentsdbreader/src/main/java/com/alibaba/datax/plugin/reader/util/HttpUtils.java +++ b/opentsdbreader/src/main/java/com/alibaba/datax/plugin/reader/util/HttpUtils.java @@ -1,6 +1,6 @@ package 
com.alibaba.datax.plugin.reader.util; -import com.alibaba.fastjson.JSON; +import com.alibaba.fastjson2.JSON; import org.apache.http.client.fluent.Content; import org.apache.http.client.fluent.Request; import org.apache.http.entity.ContentType; diff --git a/opentsdbreader/src/main/java/com/alibaba/datax/plugin/reader/util/TSDBUtils.java b/opentsdbreader/src/main/java/com/alibaba/datax/plugin/reader/util/TSDBUtils.java index bbfb75cb..9f1e38d5 100644 --- a/opentsdbreader/src/main/java/com/alibaba/datax/plugin/reader/util/TSDBUtils.java +++ b/opentsdbreader/src/main/java/com/alibaba/datax/plugin/reader/util/TSDBUtils.java @@ -1,7 +1,7 @@ package com.alibaba.datax.plugin.reader.util; import com.alibaba.datax.plugin.reader.conn.DataPoint4TSDB; -import com.alibaba.fastjson.JSON; +import com.alibaba.fastjson2.JSON; import org.slf4j.Logger; import org.slf4j.LoggerFactory; diff --git a/opentsdbreader/src/main/resources/plugin.json b/opentsdbreader/src/main/resources/plugin.json index 692a9853..5c9cbed9 100755 --- a/opentsdbreader/src/main/resources/plugin.json +++ b/opentsdbreader/src/main/resources/plugin.json @@ -6,5 +6,5 @@ "mechanism": "根据时间和 metric 直连底层 HBase 存储,从而 Scan 出符合条件的数据点", "warn": "指定起止时间会自动忽略分钟和秒,转为整点时刻,例如 2019-4-18 的 [3:35, 4:55) 会被转为 [3:00, 4:00)" }, - "developer": "Benedict Jin" + "developer": "alibaba" } diff --git a/oraclereader/pom.xml b/oraclereader/pom.xml index ae8e06fa..d60e5ebf 100755 --- a/oraclereader/pom.xml +++ b/oraclereader/pom.xml @@ -44,8 +44,6 @@ com.oracle ojdbc6 11.2.0.3 - system - ${basedir}/src/main/lib/ojdbc6-11.2.0.3.jar diff --git a/oraclereader/src/main/assembly/package.xml b/oraclereader/src/main/assembly/package.xml index a0c9fd1c..a954a30d 100755 --- a/oraclereader/src/main/assembly/package.xml +++ b/oraclereader/src/main/assembly/package.xml @@ -15,13 +15,6 @@ plugin_job_template.json plugin/reader/oraclereader - - - src/main/lib - - ojdbc6-11.2.0.3.jar - - plugin/reader/oraclereader/libs target/ diff --git a/oraclereader/src/main/lib/ojdbc6-11.2.0.3.jar b/oraclereader/src/main/lib/ojdbc6-11.2.0.3.jar deleted file mode 100644 index 01da074d..00000000 Binary files a/oraclereader/src/main/lib/ojdbc6-11.2.0.3.jar and /dev/null differ diff --git a/oraclewriter/pom.xml b/oraclewriter/pom.xml index 95b78caf..1e8d0274 100755 --- a/oraclewriter/pom.xml +++ b/oraclewriter/pom.xml @@ -42,8 +42,6 @@ com.oracle ojdbc6 11.2.0.3 - system - ${basedir}/src/main/lib/ojdbc6-11.2.0.3.jar diff --git a/oraclewriter/src/main/assembly/package.xml b/oraclewriter/src/main/assembly/package.xml index 09a25d1a..9dab0c8e 100755 --- a/oraclewriter/src/main/assembly/package.xml +++ b/oraclewriter/src/main/assembly/package.xml @@ -16,13 +16,6 @@ plugin/writer/oraclewriter - - src/main/lib - - ojdbc6-11.2.0.3.jar - - plugin/writer/oraclewriter/libs - target/ diff --git a/oraclewriter/src/main/lib/ojdbc6-11.2.0.3.jar b/oraclewriter/src/main/lib/ojdbc6-11.2.0.3.jar deleted file mode 100644 index 01da074d..00000000 Binary files a/oraclewriter/src/main/lib/ojdbc6-11.2.0.3.jar and /dev/null differ diff --git a/oscarwriter/src/main/resources/plugin.json b/oscarwriter/src/main/resources/plugin.json index f1a99fec..43adfbfe 100644 --- a/oscarwriter/src/main/resources/plugin.json +++ b/oscarwriter/src/main/resources/plugin.json @@ -2,5 +2,5 @@ "name": "oscarwriter", "class": "com.alibaba.datax.plugin.writer.oscarwriter.OscarWriter", "description": "useScene: prod. mechanism: Jdbc connection using the database, execute insert sql. 
warn: The more you know about the database, the less problems you encounter.", - "developer": "linjiayu" + "developer": "alibaba" } \ No newline at end of file diff --git a/ossreader/doc/ossreader.md b/ossreader/doc/ossreader.md index e0259a2a..51d757bc 100644 --- a/ossreader/doc/ossreader.md +++ b/ossreader/doc/ossreader.md @@ -26,6 +26,8 @@ OSSReader实现了从OSS读取数据并转为DataX协议的功能,OSS本身是 6. 多个object可以支持并发读取。 +7. 支持读取 parquet orc 文件 + 我们暂时不能做到: 1. 单个Object(File)支持多线程并发读取,这里涉及到单个Object内部切分算法。二期考虑支持。 @@ -37,7 +39,7 @@ OSSReader实现了从OSS读取数据并转为DataX协议的功能,OSS本身是 ### 3.1 配置样例 - +读取 txt, csv 格式样例 ```json { "job": { @@ -80,6 +82,63 @@ OSSReader实现了从OSS读取数据并转为DataX协议的功能,OSS本身是 } } ``` +读取 orc 格式样例 +```json +{ + "stepType": "oss", + "parameter": { + "endpoint": "http://oss.aliyuncs.com", + "accessId": "", + "accessKey": "", + "bucket": "myBucket", + "fileFormat": "orc", + "path": "/tests/case61/orc__691b6815_9260_4037_9899_****", + "column": [ + { + "index": 0, + "type": "long" + }, + { + "index": "1", + "type": "string" + }, + { + "index": "2", + "type": "string" + } + ] + } +} +``` +读取 parquet 格式样例 +```json +{ + "stepType": "oss", + "parameter": { + "endpoint": "http://oss.aliyuncs.com", + "accessId": "", + "accessKey": "", + "bucket": "myBucket", + "fileFormat": "parquet", + "path": "/parquet", + "parquetSchema":"message m { optional BINARY registration_dttm (UTF8); optional Int64 id; optional BINARY first_name (UTF8); optional BINARY last_name (UTF8); optional BINARY email (UTF8); optional BINARY gender (UTF8); optional BINARY ip_address (UTF8); optional BINARY cc (UTF8); optional BINARY country (UTF8); optional BINARY birthdate (UTF8); optional DOUBLE salary; optional BINARY title (UTF8); optional BINARY comments (UTF8); }", + "column": [ + { + "index": 0, + "type": "long" + }, + { + "index": "1", + "type": "string" + }, + { + "index": "2", + "type": "string" + } + ] + } +} +``` ### 3.2 参数说明 diff --git a/ossreader/src/main/java/com/alibaba/datax/plugin/reader/ossreader/OssReader.java b/ossreader/src/main/java/com/alibaba/datax/plugin/reader/ossreader/OssReader.java index 62a1f81f..9b76c53e 100755 --- a/ossreader/src/main/java/com/alibaba/datax/plugin/reader/ossreader/OssReader.java +++ b/ossreader/src/main/java/com/alibaba/datax/plugin/reader/ossreader/OssReader.java @@ -12,8 +12,8 @@ import com.alibaba.datax.plugin.unstructuredstorage.FileFormat; import com.alibaba.datax.plugin.unstructuredstorage.reader.UnstructuredStorageReaderUtil; import com.alibaba.datax.plugin.unstructuredstorage.reader.binaryFileUtil.BinaryFileReaderUtil; import com.alibaba.datax.plugin.unstructuredstorage.reader.split.StartEndPair; -import com.alibaba.fastjson.JSON; -import com.alibaba.fastjson.TypeReference; +import com.alibaba.fastjson2.JSON; +import com.alibaba.fastjson2.TypeReference; import com.aliyun.oss.ClientException; import com.aliyun.oss.OSSClient; import com.aliyun.oss.OSSException; diff --git a/ossreader/src/main/java/com/alibaba/datax/plugin/reader/ossreader/util/HdfsParquetUtil.java b/ossreader/src/main/java/com/alibaba/datax/plugin/reader/ossreader/util/HdfsParquetUtil.java index f332bb95..3012c84a 100644 --- a/ossreader/src/main/java/com/alibaba/datax/plugin/reader/ossreader/util/HdfsParquetUtil.java +++ b/ossreader/src/main/java/com/alibaba/datax/plugin/reader/ossreader/util/HdfsParquetUtil.java @@ -2,8 +2,8 @@ package com.alibaba.datax.plugin.reader.ossreader.util; import com.alibaba.datax.common.util.Configuration; import com.alibaba.datax.plugin.reader.ossreader.Key; -import com.alibaba.fastjson.JSON; 
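The fastjson → fastjson2 changes in this and the surrounding hunks are a package-level migration: only the import prefix changes, and the call sites stay as they are. A minimal illustrative sketch, not part of this patch, assuming the `com.alibaba.fastjson2:fastjson2` dependency that the updated poms in this patch declare:

```java
// Illustrative only — not part of the patch. The basic fastjson2 API mirrors fastjson,
// so these plugins only need the package prefix updated.
import com.alibaba.fastjson2.JSON;          // was: com.alibaba.fastjson.JSON
import com.alibaba.fastjson2.JSONObject;    // was: com.alibaba.fastjson.JSONObject

public class Fastjson2MigrationSketch {
    public static void main(String[] args) {
        // Parsing keeps the same call shape as before the migration.
        JSONObject conf = JSON.parseObject("{\"fileFormat\":\"parquet\",\"path\":\"/parquet\"}");
        // Serialization calls such as JSON.toJSONString(...) are likewise unchanged.
        System.out.println(JSON.toJSONString(conf));
    }
}
```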
-import com.alibaba.fastjson.JSONObject; +import com.alibaba.fastjson2.JSON; +import com.alibaba.fastjson2.JSONObject; /** * @Author: guxuan diff --git a/ossreader/src/main/java/com/alibaba/datax/plugin/reader/ossreader/util/OssSplitUtil.java b/ossreader/src/main/java/com/alibaba/datax/plugin/reader/ossreader/util/OssSplitUtil.java index 760d8d5f..6ba80999 100644 --- a/ossreader/src/main/java/com/alibaba/datax/plugin/reader/ossreader/util/OssSplitUtil.java +++ b/ossreader/src/main/java/com/alibaba/datax/plugin/reader/ossreader/util/OssSplitUtil.java @@ -7,8 +7,8 @@ import com.alibaba.datax.plugin.unstructuredstorage.reader.Key; import com.alibaba.datax.plugin.unstructuredstorage.reader.UnstructuredStorageReaderErrorCode; import com.alibaba.datax.plugin.unstructuredstorage.reader.split.StartEndPair; import com.alibaba.datax.plugin.unstructuredstorage.reader.split.UnstructuredSplitUtil; -import com.alibaba.fastjson.JSONArray; -import com.alibaba.fastjson.JSONObject; +import com.alibaba.fastjson2.JSONArray; +import com.alibaba.fastjson2.JSONObject; import com.aliyun.oss.OSSClient; import com.aliyun.oss.model.GetObjectRequest; import com.aliyun.oss.model.OSSObject; diff --git a/osswriter/doc/osswriter.md b/osswriter/doc/osswriter.md index 1a3d3e47..0c23e698 100644 --- a/osswriter/doc/osswriter.md +++ b/osswriter/doc/osswriter.md @@ -18,7 +18,7 @@ OSSWriter提供了向OSS写入类CSV格式的一个或者多个表文件。 OSSWriter实现了从DataX协议转为OSS中的TXT文件功能,OSS本身是无结构化数据存储,OSSWriter需要在如下几个方面增加: -1. 支持且仅支持写入 TXT的文件,且要求TXT中shema为一张二维表。 +1. 支持写入 TXT的文件,且要求TXT中shema为一张二维表。 2. 支持类CSV格式文件,自定义分隔符。 @@ -28,6 +28,8 @@ OSSWriter实现了从DataX协议转为OSS中的TXT文件功能,OSS本身是无 7. 文件支持滚动,当文件大于某个size值或者行数值,文件需要切换。 [暂不支持] +8. 支持写 PARQUET、ORC 文件 + 我们不能做到: 1. 单个文件不能支持并发写入。 @@ -37,7 +39,7 @@ OSSWriter实现了从DataX协议转为OSS中的TXT文件功能,OSS本身是无 ### 3.1 配置样例 - +写 txt文件样例 ```json { "job": { @@ -65,7 +67,90 @@ OSSWriter实现了从DataX协议转为OSS中的TXT文件功能,OSS本身是无 } } ``` +写 orc 文件样例 +```json +{ + "job": { + "setting": {}, + "content": [ + { + "reader": {}, + "writer": { + "name": "osswriter", + "parameter": { + "endpoint": "http://oss.aliyuncs.com", + "accessId": "", + "accessKey": "", + "bucket": "myBucket", + "fileName": "test", + "encoding": "UTF-8", + "column": [ + { + "name": "col1", + "type": "BIGINT" + }, + { + "name": "col2", + "type": "DOUBLE" + }, + { + "name": "col3", + "type": "STRING" + } + ], + "fileFormat": "orc", + "path": "/tests/case61", + "writeMode": "append" + } + } + } + ] + } +} +``` +写 parquet 文件样例 +```json +{ + "job": { + "setting": {}, + "content": [ + { + "reader": {}, + "writer": { + "name": "osswriter", + "parameter": { + "endpoint": "http://oss.aliyuncs.com", + "accessId": "", + "accessKey": "", + "bucket": "myBucket", + "fileName": "test", + "encoding": "UTF-8", + "column": [ + { + "name": "col1", + "type": "BIGINT" + }, + { + "name": "col2", + "type": "DOUBLE" + }, + { + "name": "col3", + "type": "STRING" + } + ], + "parquetSchema": "message test { required int64 int64_col;\n required binary str_col (UTF8);\nrequired group params (MAP) {\nrepeated group key_value {\nrequired binary key (UTF8);\nrequired binary value (UTF8);\n}\n}\nrequired group params_arr (LIST) {\n repeated group list {\n required binary element (UTF8);\n }\n}\nrequired group params_struct {\n required int64 id;\n required binary name (UTF8);\n }\nrequired group params_arr_complex (LIST) {\n repeated group list {\n required group element {\n required int64 id;\n required binary name (UTF8);\n}\n }\n}\nrequired group params_complex (MAP) {\nrepeated group key_value {\nrequired binary key 
(UTF8);\nrequired group value {\n required int64 id;\n required binary name (UTF8);\n }\n}\n}\nrequired group params_struct_complex {\n required int64 id;\n required group detail {\n required int64 id;\n required binary name (UTF8);\n }\n }\n}", + "fileFormat": "parquet", + "path": "/tests/case61", + "writeMode": "append" + } + } + } + ] + } +} +``` ### 3.2 参数说明 * **endpoint** diff --git a/osswriter/src/main/java/com/alibaba/datax/plugin/writer/osswriter/OssWriter.java b/osswriter/src/main/java/com/alibaba/datax/plugin/writer/osswriter/OssWriter.java index a8aec0e6..f96a8e01 100644 --- a/osswriter/src/main/java/com/alibaba/datax/plugin/writer/osswriter/OssWriter.java +++ b/osswriter/src/main/java/com/alibaba/datax/plugin/writer/osswriter/OssWriter.java @@ -14,7 +14,7 @@ import com.alibaba.datax.plugin.unstructuredstorage.writer.binaryFileUtil.Binary import com.alibaba.datax.plugin.writer.hdfswriter.HdfsWriter; import com.alibaba.datax.plugin.writer.osswriter.util.HandlerUtil; import com.alibaba.datax.plugin.writer.osswriter.util.HdfsParquetUtil; -import com.alibaba.fastjson.JSON; +import com.alibaba.fastjson2.JSON; import com.aliyun.oss.model.*; import org.apache.commons.io.IOUtils; import org.apache.commons.lang3.StringUtils; diff --git a/osswriter/src/main/java/com/alibaba/datax/plugin/writer/osswriter/parquet/ParquetFileSupport.java b/osswriter/src/main/java/com/alibaba/datax/plugin/writer/osswriter/parquet/ParquetFileSupport.java index 9daa5a7f..c3ff777c 100644 --- a/osswriter/src/main/java/com/alibaba/datax/plugin/writer/osswriter/parquet/ParquetFileSupport.java +++ b/osswriter/src/main/java/com/alibaba/datax/plugin/writer/osswriter/parquet/ParquetFileSupport.java @@ -5,9 +5,9 @@ import com.alibaba.datax.common.element.Record; import com.alibaba.datax.common.plugin.TaskPluginCollector; import com.alibaba.datax.plugin.unstructuredstorage.writer.Key; import com.alibaba.datax.plugin.writer.osswriter.Constant; -import com.alibaba.fastjson.JSON; -import com.alibaba.fastjson.JSONArray; -import com.alibaba.fastjson.JSONObject; +import com.alibaba.fastjson2.JSON; +import com.alibaba.fastjson2.JSONArray; +import com.alibaba.fastjson2.JSONObject; import org.apache.commons.lang3.StringUtils; import org.apache.hadoop.conf.Configuration; import org.slf4j.Logger; diff --git a/osswriter/src/main/java/com/alibaba/datax/plugin/writer/osswriter/util/HdfsParquetUtil.java b/osswriter/src/main/java/com/alibaba/datax/plugin/writer/osswriter/util/HdfsParquetUtil.java index ccd3aa35..dc102dac 100644 --- a/osswriter/src/main/java/com/alibaba/datax/plugin/writer/osswriter/util/HdfsParquetUtil.java +++ b/osswriter/src/main/java/com/alibaba/datax/plugin/writer/osswriter/util/HdfsParquetUtil.java @@ -3,8 +3,8 @@ package com.alibaba.datax.plugin.writer.osswriter.util; import com.alibaba.datax.common.util.Configuration; import com.alibaba.datax.plugin.writer.hdfswriter.HdfsWriter; import com.alibaba.datax.plugin.writer.osswriter.Key; -import com.alibaba.fastjson.JSON; -import com.alibaba.fastjson.JSONObject; +import com.alibaba.fastjson2.JSON; +import com.alibaba.fastjson2.JSONObject; import org.apache.commons.lang3.StringUtils; import org.apache.commons.lang3.Validate; import org.apache.hadoop.fs.FileSystem; diff --git a/otsreader/doc/otsreader.md b/otsreader/doc/otsreader.md index 1297dbd6..77b4edfe 100644 --- a/otsreader/doc/otsreader.md +++ b/otsreader/doc/otsreader.md @@ -13,7 +13,7 @@ OTSReader插件实现了从OTS读取数据,并可以通过用户指定抽取 * 范围抽取 * 指定分片抽取 -OTS是构建在阿里云飞天分布式系统之上的 NoSQL数据库服务,提供海量结构化数据的存储和实时访问。OTS 
以实例和表的形式组织数据,通过数据分片和负载均衡技术,实现规模上的无缝扩展。 +本版本的OTSReader新增了支持多版本数据的读取功能,同时兼容旧版本的配置文件 ## 2 实现原理 @@ -25,201 +25,425 @@ OTSReader会根据OTS的表范围,按照Datax并发的数目N,将范围等 ### 3.1 配置样例 -* 配置一个从OTS全表同步抽取数据到本地的作业: +#### 3.1.1 +* 配置一个从OTS表读取单版本数据的reader: ``` { - "job": { - "setting": { - }, - "content": [ - { - "reader": { - "name": "otsreader", - "parameter": { - /* ----------- 必填 --------------*/ - "endpoint":"", - "accessId":"", - "accessKey":"", - "instanceName":"", - - // 导出数据表的表名 - "table":"", - - // 需要导出的列名,支持重复列和常量列,区分大小写 - // 常量列:类型支持STRING,INT,DOUBLE,BOOL和BINARY - // 备注:BINARY需要通过Base64转换为对应的字符串传入插件 - "column":[ - {"name":"col1"}, // 普通列 - {"name":"col2"}, // 普通列 - {"name":"col3"}, // 普通列 - {"type":"STRING", "value" : "bazhen"}, // 常量列(字符串) - {"type":"INT", "value" : ""}, // 常量列(整形) - {"type":"DOUBLE", "value" : ""}, // 常量列(浮点) - {"type":"BOOL", "value" : ""}, // 常量列(布尔) - {"type":"BINARY", "value" : "Base64(bin)"} // 常量列(二进制),使用Base64编码完成 - ], - "range":{ - // 导出数据的起始范围 - // 支持INF_MIN, INF_MAX, STRING, INT - "begin":[ - {"type":"INF_MIN"}, - ], - // 导出数据的结束范围 - // 支持INF_MIN, INF_MAX, STRING, INT - "end":[ - {"type":"INF_MAX"}, - ] - } - } - }, - "writer": {} - } - ] - } -} -``` - -* 配置一个定义抽取范围的OTSReader: - -``` -{ - "job": { - "setting": { - "speed": { - "byte":10485760 + "job": { + "setting": { + "speed": { + //设置传输速度,单位为byte/s,DataX运行会尽可能达到该速度但是不超过它. + "byte": 1048576 + } + //出错限制 + "errorLimit": { + //出错的record条数上限,当大于该值即报错。 + "record": 0, + //出错的record百分比上限 1.0表示100%,0.02表示2% + "percentage": 0.02 + } + }, + "content": [ + { + "reader": { + "name": "otsreader-internal", + "parameter": { + "endpoint":"", + "accessId":"", + "accessKey":"", + "instanceName":"", + "table": "", + //version定义了是否使用新版本插件 可选值:false || true + "newVersion":"false", + //mode定义了读取数据的格式(普通数据/多版本数据),可选值:normal || multiversion + "mode": "normal", + + // 导出的范围,读取的范围是[begin,end),左闭右开的区间 + // begin小于end,表示正序读取数据 + // begin大于end,表示反序读取数据 + // begin和end不能相等 + // type支持的类型有如下几类: + // string、int、binary + // binary输入的方式采用二进制的Base64字符串形式传入 + // INF_MIN 表示无限小 + // INF_MAX 表示无限大 + "range":{ + // 可选,默认表示从无限小开始读取 + // 这个值的输入可以填写空数组,或者PK前缀,亦或者完整的PK,在正序读取数据时,默认填充PK后缀为INF_MIN,反序为INF_MAX + // 例子: + // 如果用户的表有2个PK,类型分别为string、int,那么如下3种输入都是合法,如: + // 1. [] --> 表示从表的开始位置读取 + // 2. [{"type":"string", "value":"a"}] --> 表示从[{"type":"string", "value":"a"},{"type":"INF_MIN"}] + // 3. [{"type":"string", "value":"a"},{"type":"INF_MIN"}] + // + // binary类型的PK列比较特殊,因为Json不支持直接输入二进制数,所以系统定义:用户如果要传入 + // 二进制,必须使用(Java)Base64.encodeBase64String方法,将二进制转换为一个可视化的字符串,然后将这个字符串填入value中 + // 例子(Java): + // byte[] bytes = "hello".getBytes(); # 构造一个二进制数据,这里使用字符串hello的byte值 + // String inputValue = Base64.encodeBase64String(bytes) # 调用Base64方法,将二进制转换为可视化的字符串 + // 上面的代码执行之后,可以获得inputValue为"aGVsbG8=" + // 最终写入配置:{"type":"binary","value" : "aGVsbG8="} + + "begin":[{"type":"string", "value":"a"},{"type":"INF_MIN"}], + + // 默认表示读取到无限大结束 + // 这个值得输入可以填写空数组,或者PK前缀,亦或者完整的PK,在正序读取数据时,默认填充PK后缀为INF_MAX,反序为INF_MIN + // 可选 + "end":[{"type":"string", "value":"a"},{"type":"INF_MAX"}], + + // 当前用户数据较多时,需要开启并发导出,Split可以将当前范围的的数据按照切分点切分为多个并发任务 + // 可选 + // 1. split中的输入值只能PK的第一列(分片建),且值的类型必须和PartitionKey一致 + // 2. 值的范围必须在begin和end之间 + // 3. split内部的值必须根据begin和end的正反序关系而递增或者递减 + "split":[{"type":"string", "value":"b"}, {"type":"string", "value":"c"}] }, - "errorLimit":0.0 + + + // 指定要导出的列,支持普通列和常量列 + // 格式 + // 普通列格式:{"name":"{your column name}"} + // 常量列格式:{"type":"", "value":""} , type支持string、int、binary、bool、double + // binary类型需要使用base64转换成对应的字符串传入 + // 注意: + // 1. 
PK列也是需要用户在下面单独指定 + "column": [ + {"name":"pk1"}, // 普通列,下同 + {"name":"pk2"}, + {"name":"attr1"}, + {"type":"string","value" : ""} // 指定常量列,下同 + {"type":"int","value" : ""} + {"type":"double","value" : ""} + // binary类型的常量列比较特殊,因为Json不支持直接输入二进制数,所以系统定义:用户如果要传入 + // 二进制,必须使用(Java)Base64.encodeBase64String方法,将二进制转换为一个可视化的字符串,然后将这个字符串填入value中 + // 例子(Java): + // byte[] bytes = "hello".getBytes(); # 构造一个二进制数据,这里使用字符串hello的byte值 + // String inputValue = Base64.encodeBase64String(bytes) # 调用Base64方法,将二进制转换为可视化的字符串 + // 上面的代码执行之后,可以获得inputValue为"aGVsbG8=" + // 最终写入配置:{"type":"binary","value" : "aGVsbG8="} + + {"type":"binary","value" : "aGVsbG8="} + ], + } }, - "content": [ - { - "reader": { - "name": "otsreader", - "parameter": { - "endpoint":"", - "accessId":"", - "accessKey":"", - "instanceName":"", - - // 导出数据表的表名 - "table":"", - - // 需要导出的列名,支持重复类和常量列,区分大小写 - // 常量列:类型支持STRING,INT,DOUBLE,BOOL和BINARY - // 备注:BINARY需要通过Base64转换为对应的字符串传入插件 - "column":[ - {"name":"col1"}, // 普通列 - {"name":"col2"}, // 普通列 - {"name":"col3"}, // 普通列 - {"type":"STRING","value" : ""}, // 常量列(字符串) - {"type":"INT","value" : ""}, // 常量列(整形) - {"type":"DOUBLE","value" : ""}, // 常量列(浮点) - {"type":"BOOL","value" : ""}, // 常量列(布尔) - {"type":"BINARY","value" : "Base64(bin)"} // 常量列(二进制) - ], - "range":{ - // 导出数据的起始范围 - // 支持INF_MIN, INF_MAX, STRING, INT - "begin":[ - {"type":"INF_MIN"}, - {"type":"INF_MAX"}, - {"type":"STRING", "value":"hello"}, - {"type":"INT", "value":"2999"}, - ], - // 导出数据的结束范围 - // 支持INF_MIN, INF_MAX, STRING, INT - "end":[ - {"type":"INF_MAX"}, - {"type":"INF_MIN"}, - {"type":"STRING", "value":"hello"}, - {"type":"INT", "value":"2999"}, - ] - } - } - }, - "writer": {} - } - ] - } + "writer": { + //writer类型 + "name": "streamwriter", + //是否打印内容 + "parameter": { + "print": true + } + } + } + ] + } } ``` +#### 3.1.2 +* 配置一个从OTS表读取多版本数据的reader(仅在newVersion == true时支持): + +``` +{ + "job": { + "setting": { + "speed": { + //设置传输速度,单位为byte/s,DataX运行会尽可能达到该速度但是不超过它. + "byte": 1048576 + } + //出错限制 + "errorLimit": { + //出错的record条数上限,当大于该值即报错。 + "record": 0, + //出错的record百分比上限 1.0表示100%,0.02表示2% + "percentage": 0.02 + } + }, + "content": [ + { + "reader": { + "name": "otsreader-internal", + "parameter": { + "endpoint":"", + "accessId":"", + "accessKey":"", + "instanceName":"", + "table": "", + //version定义了是否使用新版本插件 可选值:false || true + "newVersion":"true", + //mode定义了读取数据的格式(普通数据/多版本数据),可选值:normal || multiversion + "mode": "multiversion", + + // 导出的范围,,读取的范围是[begin,end),左闭右开的区间 + // begin小于end,表示正序读取数据 + // begin大于end,表示反序读取数据 + // begin和end不能相等 + // type支持的类型有如下几类: + // string、int、binary + // binary输入的方式采用二进制的Base64字符串形式传入 + // INF_MIN 表示无限小 + // INF_MAX 表示无限大 + "range":{ + // 可选,默认表示从无限小开始读取 + // 这个值的输入可以填写空数组,或者PK前缀,亦或者完整的PK,在正序读取数据时,默认填充PK后缀为INF_MIN,反序为INF_MAX + // 例子: + // 如果用户的表有2个PK,类型分别为string、int,那么如下3种输入都是合法,如: + // 1. [] --> 表示从表的开始位置读取 + // 2. [{"type":"string", "value":"a"}] --> 表示从[{"type":"string", "value":"a"},{"type":"INF_MIN"}] + // 3. 
[{"type":"string", "value":"a"},{"type":"INF_MIN"}] + // + // binary类型的PK列比较特殊,因为Json不支持直接输入二进制数,所以系统定义:用户如果要传入 + // 二进制,必须使用(Java)Base64.encodeBase64String方法,将二进制转换为一个可视化的字符串,然后将这个字符串填入value中 + // 例子(Java): + // byte[] bytes = "hello".getBytes(); # 构造一个二进制数据,这里使用字符串hello的byte值 + // String inputValue = Base64.encodeBase64String(bytes) # 调用Base64方法,将二进制转换为可视化的字符串 + // 上面的代码执行之后,可以获得inputValue为"aGVsbG8=" + // 最终写入配置:{"type":"binary","value" : "aGVsbG8="} + + "begin":[{"type":"string", "value":"a"},{"type":"INF_MIN"}], + + // 默认表示读取到无限大结束 + // 这个值得输入可以填写空数组,或者PK前缀,亦或者完整的PK,在正序读取数据时,默认填充PK后缀为INF_MAX,反序为INF_MIN + // 可选 + "end":[{"type":"string", "value":"g"},{"type":"INF_MAX"}], + + // 当前用户数据较多时,需要开启并发导出,Split可以将当前范围的的数据按照切分点切分为多个并发任务 + // 可选 + // 1. split中的输入值只能PK的第一列(分片建),且值的类型必须和PartitionKey一致 + // 2. 值的范围必须在begin和end之间 + // 3. split内部的值必须根据begin和end的正反序关系而递增或者递减 + "split":[{"type":"string", "value":"b"}, {"type":"string", "value":"c"}] + }, + + // 指定要导出的列,在多版本模式下只支持普通列 + // 格式: + // 普通列格式:{"name":"{your column name}"} + // 可选,默认导出所有列的所有版本 + // 注意: + // 1.在多版本模式下,不支持常量列 + // 2.PK列不能指定,导出4元组中默认包括完整的PK + // 3.不能重复指定列 + "column": [ + {"name":"attr1"} + ], + + // 请求数据的Time Range,读取的范围是[begin,end),左闭右开的区间 + // 可选,默认读取全部版本 + // 注意:begin必须小于end + "timeRange":{ + // 可选,默认为0 + // 取值范围是0~LONG_MAX + "begin":1400000000, + // 可选,默认为Long Max(9223372036854775807L) + // 取值范围是0~LONG_MAX + "end" :1600000000 + }, + + // 请求的指定Version + // 可选,默认读取所有版本 + // 取值范围是1~INT32_MAX + "maxVersion":10, + } + }, + "writer": { + //writer类型 + "name": "streamwriter", + //是否打印内容 + "parameter": { + "print": true + } + } + } + ] + } +} +``` +#### 3.1.3 +* 配置一个从OTS **时序表**读取数据的reader(仅在newVersion == true时支持): +```json +{ + "job": { + "setting": { + "speed": { + // 读取时序数据的通道数 + "channel": 5 + } + }, + "content": [ + { + "reader": { + "name": "otsreader", + "parameter": { + "endpoint": "", + "accessId": "", + "accessKey": "", + "instanceName": "", + "table": "", + // 读时序数据mode必须为normal + "mode": "normal", + // 读时序数据newVersion必须为true + "newVersion": "true", + // 配置该表为时序表 + "isTimeseriesTable":"true", + // 配置需要读取时间线的measurementName字段,非必需 + // 为空则读取全表数据 + "measurementName":"measurement_5", + // column是一个数组,每个元素表示一列 + // 对于常量列,需要配置以下字段: + // 1. type : 字段值类型,必需 + // 支持类型 : string, int, double, bool, binary + // 2. value : 字段值,必需 + // + // 对于普通列,需要配置以下字段: + // 1. name : 列名,必需 + // 时间线的'度量名称'使用_m_name标识,数据类型为String + // 时间线的'数据源'使用_data_source标识,数据类型为String + // 时间线的'标签'使用_tags标识,数据类型为String + // 时间线的'时间戳'使用_time标识,数据类型为Long + // 2. is_timeseries_tag : 是否为tags字段内部的键值,非必需,默认为false。 + // 3. type : 字段值类型,非必需,默认为string。 + // 支持类型 : string, int, double, bool, binary + "column": [ + { + "name": "_m_name" + }, + { + "name": "tagA", + "is_timeseries_tag":"true" + }, + { + "name": "double_0", + "type":"DOUBLE" + }, + { + "name": "string_0", + "type":"STRING" + }, + { + "name": "long_0", + "type":"int" + }, + { + "name": "binary_0", + "type":"BINARY" + }, + { + "name": "bool_0", + "type":"BOOL" + }, + { + "type":"STRING", + "value":"testString" + } + ] + } + }, + "writer": { + + } + } + ] + } +} + +``` ### 3.2 参数说明 * **endpoint** - * 描述:OTS Server的EndPoint地址,例如http://bazhen.cn−hangzhou.ots.aliyuncs.com。 + * 描述:OTS Server的EndPoint地址,例如http://bazhen.cn−hangzhou.ots.aliyuncs.com。 - * 必选:是
+ * 必选:是
- * 默认值:无
+ * 默认值:无
* **accessId** - * 描述:OTS的accessId
+ * 描述:OTS的accessId
- * 必选:是
+ * 必选:是
- * 默认值:无
+ * 默认值:无
* **accessKey** - * 描述:OTS的accessKey
+ * 描述:OTS的accessKey
- * 必选:是
+ * 必选:是
- * 默认值:无
+ * 默认值:无
* **instanceName** - * 描述:OTS的实例名称,实例是用户使用和管理 OTS 服务的实体,用户在开通 OTS 服务之后,需要通过管理控制台来创建实例,然后在实例内进行表的创建和管理。实例是 OTS 资源管理的基础单元,OTS 对应用程序的访问控制和资源计量都在实例级别完成。
+ * 描述:OTS的实例名称,实例是用户使用和管理 OTS 服务的实体,用户在开通 OTS 服务之后,需要通过管理控制台来创建实例,然后在实例内进行表的创建和管理。实例是 OTS 资源管理的基础单元,OTS 对应用程序的访问控制和资源计量都在实例级别完成。
- * 必选:是
+ * 必选:是
- * 默认值:无
+ * 默认值:无
* **table** - * 描述:所选取的需要抽取的表名称,这里有且只能填写一张表。在OTS不存在多表同步的需求。
+ * 描述:所选取的需要抽取的表名称,这里有且只能填写一张表。在OTS不存在多表同步的需求。
- * 必选:是
+ * 必选:是
- * 默认值:无
+ * 默认值:无
+ +* **newVersion** + + * 描述:newVersion 定义了插件使用的 OTS SDK 版本。
+ * true,新版本插件,使用com.alicloud.openservices.tablestore的依赖(推荐) + * false,旧版本插件,使用com.aliyun.openservices.ots的依赖,**不支持多版本数据的读取** + + * 必选:否
+ + * 默认值:false
+ +* **mode** + + * 描述:mode 定义了读取数据的格式(普通数据或多版本数据),目前有两种模式。
+ * normal,对应普通格式的数据 + * multiVersion,读取多版本格式的数据,多版本模式下配置参数有所不同,详见 3.1.2(多版本输出的四元组格式可参考下方示意) + + * 必选:否
+ + * 默认值:normal
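+
+    多版本模式下,OTSReader 会把每个(主键、列名、版本)组合转换为一条"四元组"记录,依次为:完整 PK、列名、时间戳、该版本的值。下面是一段示意输出(假设表主键为 [pk1, pk2],列 attr1 存在两个版本,数值均为虚构,仅用于说明输出格式):
+
+    ```
+    pk1  pk2  columnName  timestamp      value
+    a    1    attr1       1677000002000  v2
+    a    1    attr1       1677000001000  v1
+    ```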
* **column** - * 描述:所配置的表中需要同步的列名集合,使用JSON的数组描述字段信息。由于OTS本身是NoSQL系统,在OTSReader抽取数据过程中,必须指定相应地字段名称。 + * 描述:所配置的表中需要同步的列名集合,使用JSON的数组描述字段信息。由于OTS本身是NoSQL系统,在OTSReader抽取数据过程中,必须指定相应地字段名称。 - 支持普通的列读取,例如: {"name":"col1"} + 支持普通的列读取,例如: {"name":"col1"} - 支持部分列读取,如用户不配置该列,则OTSReader不予读取。 + 支持部分列读取,如用户不配置该列,则OTSReader不予读取。 - 支持常量列读取,例如: {"type":"STRING", "value" : "DataX"}。使用type描述常量类型,目前支持STRING、INT、DOUBLE、BOOL、BINARY(用户使用Base64编码填写)、INF_MIN(OTS的系统限定最小值,使用该值用户不能填写value属性,否则报错)、INF_MAX(OTS的系统限定最大值,使用该值用户不能填写value属性,否则报错)。 + 支持常量列读取,例如: {"type":"STRING", "value" : "DataX"}。使用type描述常量类型,目前支持STRING、INT、DOUBLE、BOOL、BINARY(用户使用Base64编码填写)、INF_MIN(OTS的系统限定最小值,使用该值用户不能填写value属性,否则报错)、INF_MAX(OTS的系统限定最大值,使用该值用户不能填写value属性,否则报错)。 - 不支持函数或者自定义表达式,由于OTS本身不提供类似SQL的函数或者表达式功能,OTSReader也不能提供函数或表达式列功能。 + 不支持函数或者自定义表达式,由于OTS本身不提供类似SQL的函数或者表达式功能,OTSReader也不能提供函数或表达式列功能。 - * 必选:是
+ * 必选:是
- * 默认值:无
+ * 默认值:无
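+
+    对于 binary 类型的常量列,value 需要填入 Base64 字符串。下面是一段将二进制数据转换为 Base64 字符串的 Java 示意代码(假设工程已引入 commons-codec 依赖,类名 BinaryValueDemo 仅为示例):
+
+    ```java
+    import org.apache.commons.codec.binary.Base64;
+
+    public class BinaryValueDemo {
+        public static void main(String[] args) {
+            // 构造一段二进制数据,这里使用字符串 "hello" 的字节
+            byte[] bytes = "hello".getBytes();
+            // 转换为可视化的 Base64 字符串,结果为 "aGVsbG8="
+            String value = Base64.encodeBase64String(bytes);
+            // 将该字符串填入常量列配置,例如:{"type":"binary", "value":"aGVsbG8="}
+            System.out.println(value);
+        }
+    }
+    ```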
* **begin/end** - * 描述:该配置项必须配对使用,用于支持OTS表范围抽取。begin/end中描述的是OTS **PrimaryKey**的区间分布状态,而且必须保证区间覆盖到所有的PrimaryKey,**需要指定该表下所有的PrimaryKey范围,不能遗漏任意一个PrimaryKey**,对于无限大小的区间,可以使用{"type":"INF_MIN"},{"type":"INF_MAX"}指代。例如对一张主键为 [DeviceID, SellerID]的OTS进行抽取任务,begin/end可以配置为: + * 描述:该配置项必须配对使用,用于支持OTS表范围抽取。begin/end中描述的是OTS **PrimaryKey**的区间分布状态,而且必须保证区间覆盖到所有的PrimaryKey,**需要指定该表下所有的PrimaryKey范围,不能遗漏任意一个PrimaryKey**,对于无限大小的区间,可以使用{"type":"INF_MIN"},{"type":"INF_MAX"}指代。例如对一张主键为 [DeviceID, SellerID]的OTS进行抽取任务,begin/end可以配置为: - ```json - "range": { - "begin": { - {"type":"INF_MIN"}, //指定deviceID最小值 - {"type":"INT", "value":"0"} //指定deviceID最小值 - }, - "end": { - {"type":"INF_MAX"}, //指定deviceID抽取最大值 - {"type":"INT", "value":"9999"} //指定deviceID抽取最大值 - } - } - ``` + ```json + "range": { + "begin": { + {"type":"INF_MIN"}, //指定deviceID最小值 + {"type":"INT", "value":"0"} //指定deviceID最小值 + }, + "end": { + {"type":"INF_MAX"}, //指定deviceID抽取最大值 + {"type":"INT", "value":"9999"} //指定deviceID抽取最大值 + } + } + ``` 如果要对上述表抽取全表,可以使用如下配置: @@ -237,42 +461,42 @@ OTSReader会根据OTS的表范围,按照Datax并发的数目N,将范围等 } ``` - * 必选:是
+ * 必选:否
- * 默认值:空
+ * 默认值:读取全表范围(begin 默认为 INF_MIN,end 默认为 INF_MAX)
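+
+    反序读取示意(假设第一列主键为 string 类型,取值仅作说明):在新版本插件中,begin 大于 end 即表示按逆序导出;未写全的 PK 后缀在正序时默认补 INF_MIN/INF_MAX,反序时则相反。
+
+    ```json
+    "range": {
+        "begin": [{"type":"string", "value":"z"}],
+        "end":   [{"type":"string", "value":"a"}]
+    }
+    ```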
* **split** - * 描述:该配置项属于高级配置项,是用户自己定义切分配置信息,普通情况下不建议用户使用。适用场景通常在OTS数据存储发生热点,使用OTSReader自动切分的策略不能生效情况下,使用用户自定义的切分规则。split指定是的在Begin、End区间内的切分点,且只能是partitionKey的切分点信息,即在split仅配置partitionKey,而不需要指定全部的PrimaryKey。 + * 描述:该配置项属于高级配置项,是用户自己定义切分配置信息,普通情况下不建议用户使用。适用场景通常在OTS数据存储发生热点,使用OTSReader自动切分的策略不能生效情况下,使用用户自定义的切分规则。split指定是的在Begin、End区间内的切分点,且只能是partitionKey的切分点信息,即在split仅配置partitionKey,而不需要指定全部的PrimaryKey。 - 例如对一张主键为 [DeviceID, SellerID]的OTS进行抽取任务,可以配置为: + 例如对一张主键为 [DeviceID, SellerID]的OTS进行抽取任务,可以配置为: - ```json - "range": { - "begin": { - {"type":"INF_MIN"}, //指定deviceID最小值 - {"type":"INF_MIN"} //指定deviceID最小值 - }, - "end": { - {"type":"INF_MAX"}, //指定deviceID抽取最大值 - {"type":"INF_MAX"} //指定deviceID抽取最大值 - }, - // 用户指定的切分点,如果指定了切分点,Job将按照begin、end和split进行Task的切分, - // 切分的列只能是Partition Key(ParimaryKey的第一列) - // 支持INF_MIN, INF_MAX, STRING, INT - "split":[ - {"type":"STRING", "value":"1"}, - {"type":"STRING", "value":"2"}, - {"type":"STRING", "value":"3"}, - {"type":"STRING", "value":"4"}, - {"type":"STRING", "value":"5"} - ] - } - ``` + ```json + "range": { + "begin": { + {"type":"INF_MIN"}, //指定deviceID最小值 + {"type":"INF_MIN"} //指定deviceID最小值 + }, + "end": { + {"type":"INF_MAX"}, //指定deviceID抽取最大值 + {"type":"INF_MAX"} //指定deviceID抽取最大值 + }, + // 用户指定的切分点,如果指定了切分点,Job将按照begin、end和split进行Task的切分, + // 切分的列只能是Partition Key(ParimaryKey的第一列) + // 支持INF_MIN, INF_MAX, STRING, INT + "split":[ + {"type":"STRING", "value":"1"}, + {"type":"STRING", "value":"2"}, + {"type":"STRING", "value":"3"}, + {"type":"STRING", "value":"4"}, + {"type":"STRING", "value":"5"} + ] + } + ``` - * 必选:否
+ * 必选:否
- * 默认值:无
+ * 默认值:无
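+
+    切分效果示意(假设分片键为 string 类型,取值仅作说明):配置 2 个切分点时,整体读取范围会被切成 [INF_MIN, b)、[b, c)、[c, INF_MAX) 共 3 段,每段对应一个并发 Task。
+
+    ```json
+    "range": {
+        "begin": [{"type":"INF_MIN"}],
+        "end":   [{"type":"INF_MAX"}],
+        "split": [
+            {"type":"string", "value":"b"},
+            {"type":"string", "value":"c"}
+        ]
+    }
+    ```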
### 3.3 类型转换 @@ -291,44 +515,14 @@ OTSReader会根据OTS的表范围,按照Datax并发的数目N,将范围等 * 注意,OTS本身不支持日期型类型。应用层一般使用Long报错时间的Unix TimeStamp。 -## 4 性能报告 -### 4.1 环境准备 +## 4 约束限制 -#### 4.1.1 数据特征 - -15列String(10 Byte), 2两列Integer(8 Byte),总计168Byte/r。 - -#### 4.1.2 机器参数 - -OTS端:3台前端机,5台后端机 - -DataX运行端: 24核CPU, 98GB内存 - -#### 4.1.3 DataX jvm 参数 - - -Xms1024m -Xmx1024m -XX:+HeapDumpOnOutOfMemoryError - -### 4.2 测试报告 - -#### 4.2.1 测试报告 - -|并发数|DataX CPU|OTS 流量|DATAX流量 | 前端QPS| 前端延时| -|--------|--------| --------|--------|--------|------| -|2| 36% |6.3M/s |12739 rec/s | 4.7 | 308ms | -|11| 155% | 32M/s |60732 rec/s | 23.9 | 412ms | -|50| 377% | 73M/s |145139 rec/s | 54 | 874ms | -|100| 448% | 82M/s | 156262 rec/s |60 | 1570ms | - - - -## 5 约束限制 - -### 5.1 一致性约束 +### 4.1 一致性约束 OTS是类BigTable的存储系统,OTS本身能够保证单行写事务性,无法提供跨行级别的事务。对于OTSReader而言也无法提供全表的一致性视图。例如对于OTSReader在0点启动的数据同步任务,在整个表数据同步过程中,OTSReader同样会抽取到后续更新的数据,无法提供准确的0点时刻该表一致性视图。 -### 5.2 增量数据同步 +### 4.2 增量数据同步 OTS本质上KV存储,目前只能针对PK进行范围查询,暂不支持按照字段范围抽取数据。因此只能对于增量查询,如果PK能够表示范围信息,例如自增ID,或者时间戳。 @@ -336,5 +530,4 @@ OTS本质上KV存储,目前只能针对PK进行范围查询,暂不支持按 时间戳, OTSReader可以通过PK过滤时间戳,通过制定Range范围进行增量抽取。这样使用的前提是OTS中的PrimaryKey必须包含主键时间列(时间主键需要使用OTS应用方生成。) -## 6 FAQ - +## 5 FAQ diff --git a/otsreader/pom.xml b/otsreader/pom.xml index eaac8804..dad538bf 100644 --- a/otsreader/pom.xml +++ b/otsreader/pom.xml @@ -1,5 +1,5 @@ + xsi:schemaLocation="http://maven.apache.org/POM/4.0.0 http://maven.apache.org/xsd/maven-4.0.0.xsd"> 4.0.0 com.alibaba.datax @@ -10,17 +10,6 @@ otsreader - - org.apache.logging.log4j - log4j-api - 2.17.1 - - - - org.apache.logging.log4j - log4j-core - 2.17.1 - com.alibaba.datax datax-common @@ -47,22 +36,43 @@ 2.2.4 - log4j-api + log4j-core org.apache.logging.log4j + + + + com.aliyun.openservices + tablestore + 5.13.13 + log4j-core org.apache.logging.log4j - + com.google.code.gson gson 2.2.4 + + com.alibaba + fastjson + 1.2.83_noneautotype + compile + + + + src/main/java + + **/*.properties + + + @@ -98,10 +108,6 @@ maven-surefire-plugin 2.5 - all - 10 - true - -Xmx1024m -XX:+HeapDumpOnOutOfMemoryError -XX:HeapDumpPath=. **/unittest/*.java **/functiontest/*.java @@ -111,4 +117,3 @@ - diff --git a/otsreader/src/main/assembly/package.xml b/otsreader/src/main/assembly/package.xml index 7ee305d1..cb90f3e8 100644 --- a/otsreader/src/main/assembly/package.xml +++ b/otsreader/src/main/assembly/package.xml @@ -12,8 +12,8 @@ src/main/resources plugin.json - plugin_job_template.json - + plugin_job_template.json +
plugin/reader/otsreader
diff --git a/otsreader/src/main/java/com/alibaba/datax/plugin/reader/otsreader/IOtsReaderMasterProxy.java b/otsreader/src/main/java/com/alibaba/datax/plugin/reader/otsreader/IOtsReaderMasterProxy.java new file mode 100644 index 00000000..ee622e16 --- /dev/null +++ b/otsreader/src/main/java/com/alibaba/datax/plugin/reader/otsreader/IOtsReaderMasterProxy.java @@ -0,0 +1,15 @@ +package com.alibaba.datax.plugin.reader.otsreader; + +import java.util.List; + +import com.alibaba.datax.common.util.Configuration; + +public interface IOtsReaderMasterProxy { + + public void init(Configuration param) throws Exception; + + public List split(int num) throws Exception; + + public void close(); + +} diff --git a/otsreader/src/main/java/com/alibaba/datax/plugin/reader/otsreader/IOtsReaderSlaveProxy.java b/otsreader/src/main/java/com/alibaba/datax/plugin/reader/otsreader/IOtsReaderSlaveProxy.java new file mode 100644 index 00000000..d1100a2a --- /dev/null +++ b/otsreader/src/main/java/com/alibaba/datax/plugin/reader/otsreader/IOtsReaderSlaveProxy.java @@ -0,0 +1,26 @@ +package com.alibaba.datax.plugin.reader.otsreader; + +import com.alibaba.datax.common.plugin.RecordSender; +import com.alibaba.datax.common.util.Configuration; + +/** + * OTS Reader工作进程接口 + */ +public interface IOtsReaderSlaveProxy { + /** + * 初始化函数,解析配置、初始化相关资源 + */ + public void init(Configuration configuration); + + /** + * 关闭函数,释放资源 + */ + public void close(); + + /** + * 数据导出函数 + * @param recordSender + * @throws Exception + */ + public void startRead(RecordSender recordSender) throws Exception; +} diff --git a/otsreader/src/main/java/com/alibaba/datax/plugin/reader/otsreader/OtsReader.java b/otsreader/src/main/java/com/alibaba/datax/plugin/reader/otsreader/OtsReader.java index 8880c07e..c6bc44b8 100644 --- a/otsreader/src/main/java/com/alibaba/datax/plugin/reader/otsreader/OtsReader.java +++ b/otsreader/src/main/java/com/alibaba/datax/plugin/reader/otsreader/OtsReader.java @@ -1,45 +1,48 @@ package com.alibaba.datax.plugin.reader.otsreader; -import java.util.List; - -import org.slf4j.Logger; -import org.slf4j.LoggerFactory; - import com.alibaba.datax.common.exception.DataXException; import com.alibaba.datax.common.plugin.RecordSender; import com.alibaba.datax.common.spi.Reader; import com.alibaba.datax.common.util.Configuration; -import com.alibaba.datax.plugin.reader.otsreader.utils.Common; +import com.alibaba.datax.plugin.reader.otsreader.model.OTSConf; +import com.alibaba.datax.plugin.reader.otsreader.model.OTSMode; +import com.alibaba.datax.plugin.reader.otsreader.utils.Constant; +import com.alibaba.datax.plugin.reader.otsreader.utils.GsonParser; +import com.alibaba.datax.plugin.reader.otsreader.utils.OtsReaderError; +import com.alicloud.openservices.tablestore.TableStoreException; import com.aliyun.openservices.ots.ClientException; -import com.aliyun.openservices.ots.OTSException; +import org.apache.commons.lang3.StringUtils; +import org.slf4j.Logger; +import org.slf4j.LoggerFactory; + +import java.util.List; + public class OtsReader extends Reader { public static class Job extends Reader.Job { private static final Logger LOG = LoggerFactory.getLogger(Job.class); - private OtsReaderMasterProxy proxy = new OtsReaderMasterProxy(); + //private static final MessageSource MESSAGE_SOURCE = MessageSource.loadResourceBundle(OtsReader.class); + private IOtsReaderMasterProxy proxy = null; + @Override - public void init() { + public void init() { LOG.info("init() begin ..."); + + proxy = new OtsReaderMasterProxy(); try { 
this.proxy.init(getPluginJobConf()); - } catch (OTSException e) { - LOG.error("OTSException. ErrorCode:{}, ErrorMsg:{}, RequestId:{}", - new Object[]{e.getErrorCode(), e.getMessage(), e.getRequestId()}); - LOG.error("Stack", e); - throw DataXException.asDataXException(new OtsReaderError(e.getErrorCode(), "OTS端的错误"), Common.getDetailMessage(e), e); + } catch (TableStoreException e) { + LOG.error("OTSException: {}", e.toString(), e); + throw DataXException.asDataXException(new OtsReaderError(e.getErrorCode(), "OTS ERROR"), e.toString(), e); } catch (ClientException e) { - LOG.error("ClientException. ErrorCode:{}, ErrorMsg:{}", - new Object[]{e.getErrorCode(), e.getMessage()}); - LOG.error("Stack", e); - throw DataXException.asDataXException(new OtsReaderError(e.getErrorCode(), "OTS端的错误"), Common.getDetailMessage(e), e); - } catch (IllegalArgumentException e) { - LOG.error("IllegalArgumentException. ErrorMsg:{}", e.getMessage(), e); - throw DataXException.asDataXException(OtsReaderError.INVALID_PARAM, Common.getDetailMessage(e), e); + LOG.error("ClientException: {}", e.toString(), e); + throw DataXException.asDataXException(OtsReaderError.ERROR, e.toString(), e); } catch (Exception e) { - LOG.error("Exception. ErrorMsg:{}", e.getMessage(), e); - throw DataXException.asDataXException(OtsReaderError.ERROR, Common.getDetailMessage(e), e); + LOG.error("Exception. ErrorMsg:{}", e.toString(), e); + throw DataXException.asDataXException(OtsReaderError.ERROR, e.toString(), e); } + LOG.info("init() end ..."); } @@ -60,22 +63,9 @@ public class OtsReader extends Reader { try { confs = this.proxy.split(adviceNumber); - } catch (OTSException e) { - LOG.error("OTSException. ErrorCode:{}, ErrorMsg:{}, RequestId:{}", - new Object[]{e.getErrorCode(), e.getMessage(), e.getRequestId()}); - LOG.error("Stack", e); - throw DataXException.asDataXException(new OtsReaderError(e.getErrorCode(), "OTS端的错误"), Common.getDetailMessage(e), e); - } catch (ClientException e) { - LOG.error("ClientException. ErrorCode:{}, ErrorMsg:{}", - new Object[]{e.getErrorCode(), e.getMessage()}); - LOG.error("Stack", e); - throw DataXException.asDataXException(new OtsReaderError(e.getErrorCode(), "OTS端的错误"), Common.getDetailMessage(e), e); - } catch (IllegalArgumentException e) { - LOG.error("IllegalArgumentException. ErrorMsg:{}", e.getMessage(), e); - throw DataXException.asDataXException(OtsReaderError.INVALID_PARAM, Common.getDetailMessage(e), e); } catch (Exception e) { LOG.error("Exception. 
ErrorMsg:{}", e.getMessage(), e); - throw DataXException.asDataXException(OtsReaderError.ERROR, Common.getDetailMessage(e), e); + throw DataXException.asDataXException(OtsReaderError.ERROR, e.toString(), e); } LOG.info("split() end ..."); @@ -85,39 +75,60 @@ public class OtsReader extends Reader { public static class Task extends Reader.Task { private static final Logger LOG = LoggerFactory.getLogger(Task.class); - private OtsReaderSlaveProxy proxy = new OtsReaderSlaveProxy(); + //private static final MessageSource MESSAGE_SOURCE = MessageSource.loadResourceBundle(OtsReader.class); + private IOtsReaderSlaveProxy proxy = null; @Override public void init() { + + OTSConf conf = GsonParser.jsonToConf((String) this.getPluginJobConf().get(Constant.ConfigKey.CONF)); + // 是否使用新接口 + if(conf.isNewVersion()) { + if (conf.getMode() == OTSMode.MULTI_VERSION) { + LOG.info("init OtsReaderSlaveProxyMultiVersion"); + proxy = new OtsReaderSlaveMultiVersionProxy(); + } else { + LOG.info("init OtsReaderSlaveProxyNormal"); + proxy = new OtsReaderSlaveNormalProxy(); + } + + } + else{ + String metaMode = conf.getMetaMode(); + if (StringUtils.isNotBlank(metaMode) && !metaMode.equalsIgnoreCase("false")) { + LOG.info("init OtsMetaReaderSlaveProxy"); + proxy = new OtsReaderSlaveMetaProxy(); + } else { + LOG.info("init OtsReaderSlaveProxyOld"); + proxy = new OtsReaderSlaveProxyOld(); + } + } + + proxy.init(this.getPluginJobConf()); } @Override public void destroy() { + try { + proxy.close(); + } catch (Exception e) { + LOG.error("Exception. ErrorMsg:{}", e.toString(), e); + throw DataXException.asDataXException(OtsReaderError.ERROR, e.toString(), e); + } } @Override public void startRead(RecordSender recordSender) { - LOG.info("startRead() begin ..."); + try { - this.proxy.read(recordSender,getPluginJobConf()); - } catch (OTSException e) { - LOG.error("OTSException. ErrorCode:{}, ErrorMsg:{}, RequestId:{}", - new Object[]{e.getErrorCode(), e.getMessage(), e.getRequestId()}); - LOG.error("Stack", e); - throw DataXException.asDataXException(new OtsReaderError(e.getErrorCode(), "OTS端的错误"), Common.getDetailMessage(e), e); - } catch (ClientException e) { - LOG.error("ClientException. ErrorCode:{}, ErrorMsg:{}", - new Object[]{e.getErrorCode(), e.getMessage()}); - LOG.error("Stack", e); - throw DataXException.asDataXException(new OtsReaderError(e.getErrorCode(), "OTS端的错误"), Common.getDetailMessage(e), e); - } catch (IllegalArgumentException e) { - LOG.error("IllegalArgumentException. ErrorMsg:{}", e.getMessage(), e); - throw DataXException.asDataXException(OtsReaderError.INVALID_PARAM, Common.getDetailMessage(e), e); + proxy.startRead(recordSender); } catch (Exception e) { - LOG.error("Exception. ErrorMsg:{}", e.getMessage(), e); - throw DataXException.asDataXException(OtsReaderError.ERROR, Common.getDetailMessage(e), e); + LOG.error("Exception. 
ErrorMsg:{}", e.toString(), e); + throw DataXException.asDataXException(OtsReaderError.ERROR, e.toString(), e); } - LOG.info("startRead() end ..."); + + + } } diff --git a/otsreader/src/main/java/com/alibaba/datax/plugin/reader/otsreader/OtsReaderMasterProxy.java b/otsreader/src/main/java/com/alibaba/datax/plugin/reader/otsreader/OtsReaderMasterProxy.java index 2b758f06..4ecdd8c1 100644 --- a/otsreader/src/main/java/com/alibaba/datax/plugin/reader/otsreader/OtsReaderMasterProxy.java +++ b/otsreader/src/main/java/com/alibaba/datax/plugin/reader/otsreader/OtsReaderMasterProxy.java @@ -1,221 +1,243 @@ package com.alibaba.datax.plugin.reader.otsreader; -import java.util.ArrayList; -import java.util.List; -import java.util.Map; - -import org.slf4j.Logger; -import org.slf4j.LoggerFactory; - import com.alibaba.datax.common.util.Configuration; import com.alibaba.datax.plugin.reader.otsreader.callable.GetFirstRowPrimaryKeyCallable; -import com.alibaba.datax.plugin.reader.otsreader.callable.GetTableMetaCallable; import com.alibaba.datax.plugin.reader.otsreader.model.OTSConf; -import com.alibaba.datax.plugin.reader.otsreader.model.OTSConst; import com.alibaba.datax.plugin.reader.otsreader.model.OTSRange; -import com.alibaba.datax.plugin.reader.otsreader.utils.ParamChecker; -import com.alibaba.datax.plugin.reader.otsreader.utils.Common; -import com.alibaba.datax.plugin.reader.otsreader.utils.GsonParser; -import com.alibaba.datax.plugin.reader.otsreader.utils.ReaderModelParser; -import com.alibaba.datax.plugin.reader.otsreader.utils.RangeSplit; -import com.alibaba.datax.plugin.reader.otsreader.utils.RetryHelper; -import com.aliyun.openservices.ots.OTSClient; -import com.aliyun.openservices.ots.model.Direction; -import com.aliyun.openservices.ots.model.PrimaryKeyValue; -import com.aliyun.openservices.ots.model.RangeRowQueryCriteria; -import com.aliyun.openservices.ots.model.RowPrimaryKey; -import com.aliyun.openservices.ots.model.TableMeta; +import com.alibaba.datax.plugin.reader.otsreader.utils.*; +import com.alicloud.openservices.tablestore.SyncClientInterface; +import com.alicloud.openservices.tablestore.model.*; +import com.alicloud.openservices.tablestore.model.timeseries.ScanTimeseriesDataResponse; +import com.alicloud.openservices.tablestore.model.timeseries.TimeseriesScanSplitInfo; +import org.slf4j.Logger; +import org.slf4j.LoggerFactory; -public class OtsReaderMasterProxy { +import java.lang.reflect.Field; +import java.util.ArrayList; +import java.util.List; - private OTSConf conf = new OTSConf(); - - private OTSRange range = null; - - private OTSClient ots = null; - - private TableMeta meta = null; - - private Direction direction = null; +public class OtsReaderMasterProxy implements IOtsReaderMasterProxy { private static final Logger LOG = LoggerFactory.getLogger(OtsReaderMasterProxy.class); + private OTSConf conf = null; + private TableMeta meta = null; + private SyncClientInterface ots = null; + private Direction direction = null; - /** - * 1.检查参数是否为 - * null,endpoint,accessid,accesskey,instance-name,table,column,range-begin,range-end,range-split - * 2.检查参数是否为空字符串 - * endpoint,accessid,accesskey,instance-name,table - * 3.检查是否为空数组 - * column - * 4.检查Range的类型个个数是否和PrimaryKey匹配 - * column,range-begin,range-end - * 5.检查Range Split 顺序和类型是否Range一致,类型是否于PartitionKey一致 - * column-split - * @param param - * @throws Exception - */ - public void init(Configuration param) throws Exception { - // 默认参数 - // 每次重试的时间都是上一次的一倍,当sleep时间大于30秒时,Sleep重试时间不在增长。18次能覆盖OTS的Failover时间5分钟 - 
conf.setRetry(param.getInt(OTSConst.RETRY, 18)); - conf.setSleepInMilliSecond(param.getInt(OTSConst.SLEEP_IN_MILLI_SECOND, 100)); - - // 必选参数 - conf.setEndpoint(ParamChecker.checkStringAndGet(param, Key.OTS_ENDPOINT)); - conf.setAccessId(ParamChecker.checkStringAndGet(param, Key.OTS_ACCESSID)); - conf.setAccesskey(ParamChecker.checkStringAndGet(param, Key.OTS_ACCESSKEY)); - conf.setInstanceName(ParamChecker.checkStringAndGet(param, Key.OTS_INSTANCE_NAME)); - conf.setTableName(ParamChecker.checkStringAndGet(param, Key.TABLE_NAME)); - - ots = new OTSClient( - this.conf.getEndpoint(), - this.conf.getAccessId(), - this.conf.getAccesskey(), - this.conf.getInstanceName()); - - meta = getTableMeta(ots, conf.getTableName()); - LOG.info("Table Meta : {}", GsonParser.metaToJson(meta)); - - conf.setColumns(ReaderModelParser.parseOTSColumnList(ParamChecker.checkListAndGet(param, Key.COLUMN, true))); - - Map rangeMap = ParamChecker.checkMapAndGet(param, Key.RANGE, true); - conf.setRangeBegin(ReaderModelParser.parsePrimaryKey(ParamChecker.checkListAndGet(rangeMap, Key.RANGE_BEGIN, false))); - conf.setRangeEnd(ReaderModelParser.parsePrimaryKey(ParamChecker.checkListAndGet(rangeMap, Key.RANGE_END, false))); - - range = ParamChecker.checkRangeAndGet(meta, this.conf.getRangeBegin(), this.conf.getRangeEnd()); - - direction = ParamChecker.checkDirectionAndEnd(meta, range.getBegin(), range.getEnd()); - LOG.info("Direction : {}", direction); - - List points = ReaderModelParser.parsePrimaryKey(ParamChecker.checkListAndGet(rangeMap, Key.RANGE_SPLIT)); - ParamChecker.checkInputSplitPoints(meta, range, direction, points); - conf.setRangeSplit(points); - } - - public List split(int num) throws Exception { - LOG.info("Expect split num : " + num); - - List configurations = new ArrayList(); - - List ranges = null; - - if (this.conf.getRangeSplit() != null) { // 用户显示指定了拆分范围 - LOG.info("Begin userDefinedRangeSplit"); - ranges = userDefinedRangeSplit(meta, range, this.conf.getRangeSplit()); - LOG.info("End userDefinedRangeSplit"); - } else { // 采用默认的切分算法 - LOG.info("Begin defaultRangeSplit"); - ranges = defaultRangeSplit(ots, meta, range, num); - LOG.info("End defaultRangeSplit"); - } - - // 解决大量的Split Point序列化消耗内存的问题 - // 因为slave中不会使用这个配置,所以置为空 - this.conf.setRangeSplit(null); - - for (OTSRange item : ranges) { - Configuration configuration = Configuration.newDefault(); - configuration.set(OTSConst.OTS_CONF, GsonParser.confToJson(this.conf)); - configuration.set(OTSConst.OTS_RANGE, GsonParser.rangeToJson(item)); - configuration.set(OTSConst.OTS_DIRECTION, GsonParser.directionToJson(direction)); - configurations.add(configuration); - } - - LOG.info("Configuration list count : " + configurations.size()); - - return configurations; - } public OTSConf getConf() { return conf; } + public TableMeta getMeta() { + return meta; + } + + public SyncClientInterface getOts() { + return ots; + } + + public void setOts(SyncClientInterface ots) { + this.ots = ots; + } + + /** + * 基于配置传入的配置文件,解析为对应的参数 + * + * @param param + * @throws Exception + */ + public void init(Configuration param) throws Exception { + // 基于预定义的Json格式,检查传入参数是否符合Conf定义规范 + conf = OTSConf.load(param); + + // Init ots + ots = OtsHelper.getOTSInstance(conf); + + // 宽行表init + if (!conf.isTimeseriesTable()) { + // 获取TableMeta + meta = OtsHelper.getTableMeta( + ots, + conf.getTableName(), + conf.getRetry(), + conf.getRetryPauseInMillisecond()); + + // 基于Meta检查Conf是否正确 + ParamChecker.checkAndSetOTSConf(conf, meta); + direction = ParamChecker.checkDirectionAndEnd(meta, 
conf.getRange().getBegin(), conf.getRange().getEnd()); + } + // 时序表 检查tablestore SDK version + if (conf.isTimeseriesTable()){ + Common.checkTableStoreSDKVersion(); + } + + + } + + public List split(int mandatoryNumber) throws Exception { + LOG.info("Expect split num : " + mandatoryNumber); + + List configurations = new ArrayList(); + + if (conf.isTimeseriesTable()) {{ // 时序表全部采用默认切分策略 + LOG.info("Begin timeseries table defaultRangeSplit"); + configurations = getTimeseriesConfigurationBySplit(mandatoryNumber); + LOG.info("End timeseries table defaultRangeSplit"); + }} + else if (this.conf.getRange().getSplit().size() != 0) { // 用户显示指定了拆分范围 + LOG.info("Begin userDefinedRangeSplit"); + configurations = getNormalConfigurationBySplit(); + LOG.info("End userDefinedRangeSplit"); + } else { // 采用默认的切分算法 + LOG.info("Begin defaultRangeSplit"); + configurations = getDefaultConfiguration(mandatoryNumber); + LOG.info("End defaultRangeSplit"); + } + + LOG.info("Expect split num: "+ mandatoryNumber +", and final configuration list count : " + configurations.size()); + return configurations; + } + public void close() { ots.shutdown(); } - // private function - - private TableMeta getTableMeta(OTSClient ots, String tableName) throws Exception { - return RetryHelper.executeWithRetry( - new GetTableMetaCallable(ots, tableName), + /** + * timeseries split信息,根据切分数配置多个Task + */ + private List getTimeseriesConfigurationBySplit(int mandatoryNumber) throws Exception { + List timeseriesScanSplitInfoList = OtsHelper.splitTimeseriesScan( + ots, + conf.getTableName(), + conf.getMeasurementName(), + mandatoryNumber, conf.getRetry(), - conf.getSleepInMilliSecond() - ); + conf.getRetryPauseInMillisecond()); + List configurations = new ArrayList<>(); + + for (int i = 0; i < timeseriesScanSplitInfoList.size(); i++) { + Configuration configuration = Configuration.newDefault(); + configuration.set(Constant.ConfigKey.CONF, GsonParser.confToJson(conf)); + configuration.set(Constant.ConfigKey.SPLIT_INFO, GsonParser.timeseriesScanSplitInfoToString(timeseriesScanSplitInfoList.get(i))); + configurations.add(configuration); + } + return configurations; } - private RowPrimaryKey getPKOfFirstRow( - OTSRange range , Direction direction) throws Exception { + /** + * 根据用户配置的split信息,将配置文件基于Range范围转换为多个Task的配置 + */ + private List getNormalConfigurationBySplit() { + List> primaryKeys = new ArrayList>(); + primaryKeys.add(conf.getRange().getBegin()); + for (PrimaryKeyColumn column : conf.getRange().getSplit()) { + List point = new ArrayList(); + point.add(column); + ParamChecker.fillPrimaryKey(this.meta.getPrimaryKeyList(), point, PrimaryKeyValue.INF_MIN); + primaryKeys.add(point); + } + primaryKeys.add(conf.getRange().getEnd()); - RangeRowQueryCriteria cur = new RangeRowQueryCriteria(this.conf.getTableName()); - cur.setInclusiveStartPrimaryKey(range.getBegin()); - cur.setExclusiveEndPrimaryKey(range.getEnd()); - cur.setLimit(1); - cur.setColumnsToGet(Common.getPrimaryKeyNameList(meta)); - cur.setDirection(direction); + List configurations = new ArrayList(primaryKeys.size() - 1); - return RetryHelper.executeWithRetry( - new GetFirstRowPrimaryKeyCallable(ots, meta, cur), - conf.getRetry(), - conf.getSleepInMilliSecond() - ); + for (int i = 0; i < primaryKeys.size() - 1; i++) { + OTSRange range = new OTSRange(); + range.setBegin(primaryKeys.get(i)); + range.setEnd(primaryKeys.get(i + 1)); + + Configuration configuration = Configuration.newDefault(); + configuration.set(Constant.ConfigKey.CONF, GsonParser.confToJson(conf)); + 
configuration.set(Constant.ConfigKey.RANGE, GsonParser.rangeToJson(range)); + configuration.set(Constant.ConfigKey.META, GsonParser.metaToJson(meta)); + configurations.add(configuration); + } + return configurations; } - private List defaultRangeSplit(OTSClient ots, TableMeta meta, OTSRange range, int num) throws Exception { + private List getDefaultConfiguration(int num) throws Exception { if (num == 1) { List ranges = new ArrayList(); + OTSRange range = new OTSRange(); + range.setBegin(conf.getRange().getBegin()); + range.setEnd(conf.getRange().getEnd()); ranges.add(range); - return ranges; + + return getConfigurationsFromRanges(ranges); } - + OTSRange reverseRange = new OTSRange(); - reverseRange.setBegin(range.getEnd()); - reverseRange.setEnd(range.getBegin()); + reverseRange.setBegin(conf.getRange().getEnd()); + reverseRange.setEnd(conf.getRange().getBegin()); Direction reverseDirection = (direction == Direction.FORWARD ? Direction.BACKWARD : Direction.FORWARD); - RowPrimaryKey realBegin = getPKOfFirstRow(range, direction); - RowPrimaryKey realEnd = getPKOfFirstRow(reverseRange, reverseDirection); - + List realBegin = getPKOfFirstRow(conf.getRange(), direction); + List realEnd = getPKOfFirstRow(reverseRange, reverseDirection); + // 因为如果其中一行为空,表示这个范围内至多有一行数据 // 所以不再细分,直接使用用户定义的范围 if (realBegin == null || realEnd == null) { List ranges = new ArrayList(); - ranges.add(range); - return ranges; + ranges.add(conf.getRange()); + return getConfigurationsFromRanges(ranges); } - + // 如果出现realBegin,realEnd的方向和direction不一致的情况,直接返回range int cmp = Common.compareRangeBeginAndEnd(meta, realBegin, realEnd); Direction realDirection = cmp > 0 ? Direction.BACKWARD : Direction.FORWARD; if (realDirection != direction) { LOG.warn("Expect '" + direction + "', but direction of realBegin and readlEnd is '" + realDirection + "'"); List ranges = new ArrayList(); - ranges.add(range); - return ranges; + ranges.add(conf.getRange()); + return getConfigurationsFromRanges(ranges); } List ranges = RangeSplit.rangeSplitByCount(meta, realBegin, realEnd, num); if (ranges.isEmpty()) { // 当PartitionKey相等时,工具内部不会切分Range - ranges.add(range); + ranges.add(conf.getRange()); } else { // replace first and last OTSRange first = ranges.get(0); OTSRange last = ranges.get(ranges.size() - 1); - first.setBegin(range.getBegin()); - last.setEnd(range.getEnd()); + first.setBegin(conf.getRange().getBegin()); + last.setEnd(conf.getRange().getEnd()); } - - return ranges; + + return getConfigurationsFromRanges(ranges); } - private List userDefinedRangeSplit(TableMeta meta, OTSRange range, List points) { - List ranges = RangeSplit.rangeSplitByPoint(meta, range.getBegin(), range.getEnd(), points); - if (ranges.isEmpty()) { // 当PartitionKey相等时,工具内部不会切分Range - ranges.add(range); + private List getConfigurationsFromRanges(List ranges){ + List configurationList = new ArrayList<>(); + for (OTSRange range:ranges + ) { + Configuration configuration = Configuration.newDefault(); + configuration.set(Constant.ConfigKey.CONF, GsonParser.confToJson(conf)); + configuration.set(Constant.ConfigKey.RANGE, GsonParser.rangeToJson(range)); + configuration.set(Constant.ConfigKey.META, GsonParser.metaToJson(meta)); + configurationList.add(configuration); } - return ranges; + return configurationList; } + + private List getPKOfFirstRow( + OTSRange range , Direction direction) throws Exception { + + RangeRowQueryCriteria cur = new RangeRowQueryCriteria(this.conf.getTableName()); + cur.setInclusiveStartPrimaryKey(new PrimaryKey(range.getBegin())); + 
cur.setExclusiveEndPrimaryKey(new PrimaryKey(range.getEnd())); + cur.setLimit(1); + cur.addColumnsToGet(Common.getPrimaryKeyNameList(meta)); + cur.setDirection(direction); + cur.setMaxVersions(1); + + return RetryHelper.executeWithRetry( + new GetFirstRowPrimaryKeyCallable(ots, meta, cur), + conf.getRetry(), + conf.getRetryPauseInMillisecond() + ); + } + } diff --git a/otsreader/src/main/java/com/alibaba/datax/plugin/reader/otsreader/OtsReaderSlaveMetaProxy.java b/otsreader/src/main/java/com/alibaba/datax/plugin/reader/otsreader/OtsReaderSlaveMetaProxy.java new file mode 100644 index 00000000..f9860194 --- /dev/null +++ b/otsreader/src/main/java/com/alibaba/datax/plugin/reader/otsreader/OtsReaderSlaveMetaProxy.java @@ -0,0 +1,160 @@ +package com.alibaba.datax.plugin.reader.otsreader; + +import java.util.ArrayList; +import java.util.HashMap; +import java.util.List; +import java.util.Map; +import java.util.Map.Entry; + +import com.alibaba.datax.plugin.reader.otsreader.model.OTSConf; +import com.alibaba.datax.plugin.reader.otsreader.model.OTSRange; +import com.alibaba.datax.plugin.reader.otsreader.utils.Constant; +import com.alibaba.datax.plugin.reader.otsreader.utils.Key; +import org.slf4j.Logger; +import org.slf4j.LoggerFactory; + +import com.alibaba.datax.common.element.Record; +import com.alibaba.datax.common.element.StringColumn; +import com.alibaba.datax.common.plugin.RecordSender; +import com.alibaba.datax.common.util.Configuration; +import com.alibaba.datax.plugin.reader.otsreader.utils.ParamCheckerOld; +import com.alibaba.datax.plugin.reader.otsreader.utils.ReaderModelParser; +import com.alibaba.datax.plugin.reader.otsreader.model.OTSColumn; +import com.alibaba.datax.plugin.reader.otsreader.utils.DefaultNoRetry; +import com.alibaba.datax.plugin.reader.otsreader.utils.GsonParser; +import com.alibaba.fastjson.JSON; +import com.aliyun.openservices.ots.OTSClient; +import com.aliyun.openservices.ots.OTSServiceConfiguration; +import com.aliyun.openservices.ots.model.DescribeTableRequest; +import com.aliyun.openservices.ots.model.DescribeTableResult; +import com.aliyun.openservices.ots.model.ListTableResult; +import com.aliyun.openservices.ots.model.PrimaryKeyType; +import com.aliyun.openservices.ots.model.ReservedThroughputDetails; +import com.aliyun.openservices.ots.model.TableMeta; + +public class OtsReaderSlaveMetaProxy implements IOtsReaderSlaveProxy { + + private OTSClient ots = null; + private OTSConf conf = null; + private OTSRange range = null; + private com.alicloud.openservices.tablestore.model.TableMeta meta = null; + private Configuration configuration = null; + private static final Logger LOG = LoggerFactory.getLogger(OtsReaderSlaveMetaProxy.class); + + + @Override + public void init(Configuration configuration) { + OTSServiceConfiguration configure = new OTSServiceConfiguration(); + configure.setRetryStrategy(new DefaultNoRetry()); + + this.configuration = configuration; + conf = GsonParser.jsonToConf((String) configuration.get(Constant.ConfigKey.CONF)); + range = GsonParser.jsonToRange((String) configuration.get(Constant.ConfigKey.RANGE)); + meta = GsonParser.jsonToMeta((String) configuration.get(Constant.ConfigKey.META)); + + String endpoint = conf.getEndpoint(); + String accessId = conf.getAccessId(); + String accessKey = conf.getAccessKey(); + String instanceName = conf.getInstanceName(); + + ots = new OTSClient(endpoint, accessId, accessKey, instanceName, null, configure, null); + } + + @Override + public void close() { + ots.shutdown(); + } + + @Override + public void 
startRead(RecordSender recordSender) throws Exception { + List columns = ReaderModelParser + .parseOTSColumnList(ParamCheckerOld.checkListAndGet(configuration, Key.COLUMN, true)); + String metaMode = conf.getMetaMode(); // column + + + ListTableResult listTableResult = null; + try { + listTableResult = ots.listTable(); + LOG.info(String.format("ots listTable requestId:%s, traceId:%s", listTableResult.getRequestID(), + listTableResult.getTraceId())); + List allTables = listTableResult.getTableNames(); + for (String eachTable : allTables) { + DescribeTableRequest describeTableRequest = new DescribeTableRequest(); + describeTableRequest.setTableName(eachTable); + DescribeTableResult describeTableResult = ots.describeTable(describeTableRequest); + LOG.info(String.format("ots describeTable requestId:%s, traceId:%s", describeTableResult.getRequestID(), + describeTableResult.getTraceId())); + + TableMeta tableMeta = describeTableResult.getTableMeta(); + // table_name: first_table + // table primary key: type, data type: STRING + // table primary key: db_name, data type: STRING + // table primary key: table_name, data type: STRING + // Reserved throughput: read(0), write(0) + // last increase time: 1502881295 + // last decrease time: None + // number of decreases today: 0 + + String tableName = tableMeta.getTableName(); + Map primaryKey = tableMeta.getPrimaryKey(); + ReservedThroughputDetails reservedThroughputDetails = describeTableResult + .getReservedThroughputDetails(); + int reservedThroughputRead = reservedThroughputDetails.getCapacityUnit().getReadCapacityUnit(); + int reservedThroughputWrite = reservedThroughputDetails.getCapacityUnit().getWriteCapacityUnit(); + long lastIncreaseTime = reservedThroughputDetails.getLastIncreaseTime(); + long lastDecreaseTime = reservedThroughputDetails.getLastDecreaseTime(); + int numberOfDecreasesToday = reservedThroughputDetails.getNumberOfDecreasesToday(); + + Map allData = new HashMap(); + allData.put("endpoint", conf.getEndpoint()); + allData.put("instanceName", conf.getInstanceName()); + allData.put("table", tableName); + // allData.put("primaryKey", JSON.toJSONString(primaryKey)); + allData.put("reservedThroughputRead", reservedThroughputRead + ""); + allData.put("reservedThroughputWrite", reservedThroughputWrite + ""); + allData.put("lastIncreaseTime", lastIncreaseTime + ""); + allData.put("lastDecreaseTime", lastDecreaseTime + ""); + allData.put("numberOfDecreasesToday", numberOfDecreasesToday + ""); + + // 可扩展的可配置的形式 + if ("column".equalsIgnoreCase(metaMode)) { + // 如果是列元数据模式并且column中配置的name是primaryKey,映射成多行DataX Record + List primaryKeyRecords = new ArrayList(); + for (Entry eachPk : primaryKey.entrySet()) { + Record line = recordSender.createRecord(); + for (OTSColumn col : columns) { + if (col.getColumnType() == OTSColumn.OTSColumnType.CONST) { + line.addColumn(col.getValue()); + } else if ("primaryKey.name".equalsIgnoreCase(col.getName())) { + line.addColumn(new StringColumn(eachPk.getKey())); + } else if ("primaryKey.type".equalsIgnoreCase(col.getName())) { + line.addColumn(new StringColumn(eachPk.getValue().name())); + } else { + String v = allData.get(col.getName()); + line.addColumn(new StringColumn(v)); + } + } + LOG.debug("Reader send record : {}", line.toString()); + recordSender.sendToWriter(line); + primaryKeyRecords.add(line); + } + } else { + Record line = recordSender.createRecord(); + for (OTSColumn col : columns) { + if (col.getColumnType() == OTSColumn.OTSColumnType.CONST) { + line.addColumn(col.getValue()); + } else { + String 
v = allData.get(col.getName()); + line.addColumn(new StringColumn(v)); + } + } + LOG.debug("Reader send record : {}", line.toString()); + recordSender.sendToWriter(line); + } + } + } catch (Exception e) { + LOG.warn(JSON.toJSONString(listTableResult), e); + } + + } +} diff --git a/otsreader/src/main/java/com/alibaba/datax/plugin/reader/otsreader/OtsReaderSlaveMultiVersionProxy.java b/otsreader/src/main/java/com/alibaba/datax/plugin/reader/otsreader/OtsReaderSlaveMultiVersionProxy.java new file mode 100644 index 00000000..818a507e --- /dev/null +++ b/otsreader/src/main/java/com/alibaba/datax/plugin/reader/otsreader/OtsReaderSlaveMultiVersionProxy.java @@ -0,0 +1,102 @@ +package com.alibaba.datax.plugin.reader.otsreader; + +import com.alibaba.datax.common.element.LongColumn; +import com.alibaba.datax.common.element.Record; +import com.alibaba.datax.common.element.StringColumn; +import com.alibaba.datax.common.plugin.RecordSender; +import com.alibaba.datax.common.util.Configuration; +import com.alibaba.datax.plugin.reader.otsreader.model.OTSConf; +import com.alibaba.datax.plugin.reader.otsreader.model.OTSRange; +import com.alibaba.datax.plugin.reader.otsreader.utils.*; +import com.alicloud.openservices.tablestore.SyncClientInterface; +import com.alicloud.openservices.tablestore.model.*; +import org.slf4j.Logger; +import org.slf4j.LoggerFactory; + +public class OtsReaderSlaveMultiVersionProxy implements IOtsReaderSlaveProxy { + private OTSConf conf = null; + private OTSRange range = null; + private TableMeta meta = null; + private SyncClientInterface ots = null; + + private static final Logger LOG = LoggerFactory.getLogger(OtsReaderSlaveMultiVersionProxy.class); + + @Override + public void init(Configuration configuration) { + conf = GsonParser.jsonToConf((String) configuration.get(Constant.ConfigKey.CONF)); + range = GsonParser.jsonToRange((String) configuration.get(Constant.ConfigKey.RANGE)); + meta = GsonParser.jsonToMeta((String) configuration.get(Constant.ConfigKey.META)); + + this.ots = OtsHelper.getOTSInstance(conf); + } + + @Override + public void close() { + ots.shutdown(); + } + + private void sendToDatax(RecordSender recordSender, PrimaryKey pk, Column c) { + Record line = recordSender.createRecord(); + //------------------------- + // 四元组 pk, column name, timestamp, value + //------------------------- + + // pk + for( PrimaryKeyColumn pkc : pk.getPrimaryKeyColumns()) { + line.addColumn(TranformHelper.otsPrimaryKeyColumnToDataxColumn(pkc)); + } + // column name + line.addColumn(new StringColumn(c.getName())); + // Timestamp + line.addColumn(new LongColumn(c.getTimestamp())); + // Value + line.addColumn(TranformHelper.otsColumnToDataxColumn(c)); + + recordSender.sendToWriter(line); + } + + private void sendToDatax(RecordSender recordSender, Row row) { + PrimaryKey pk = row.getPrimaryKey(); + for (Column c : row.getColumns()) { + sendToDatax(recordSender, pk, c); + } + } + + /** + * 将获取到的数据采用4元组的方式传递给datax + * @param recordSender + * @param result + */ + private void sendToDatax(RecordSender recordSender, GetRangeResponse result) { + LOG.debug("Per request get row count : " + result.getRows().size()); + for (Row row : result.getRows()) { + sendToDatax(recordSender, row); + } + } + + @Override + public void startRead(RecordSender recordSender) throws Exception { + + PrimaryKey inclusiveStartPrimaryKey = new PrimaryKey(range.getBegin()); + PrimaryKey exclusiveEndPrimaryKey = new PrimaryKey(range.getEnd()); + PrimaryKey next = inclusiveStartPrimaryKey; + + RangeRowQueryCriteria 
rangeRowQueryCriteria = new RangeRowQueryCriteria(conf.getTableName()); + rangeRowQueryCriteria.setExclusiveEndPrimaryKey(exclusiveEndPrimaryKey); + rangeRowQueryCriteria.setDirection(Common.getDirection(range.getBegin(), range.getEnd())); + rangeRowQueryCriteria.setTimeRange(conf.getMulti().getTimeRange()); + rangeRowQueryCriteria.setMaxVersions(conf.getMulti().getMaxVersion()); + rangeRowQueryCriteria.addColumnsToGet(Common.toColumnToGet(conf.getColumn(), meta)); + + do{ + rangeRowQueryCriteria.setInclusiveStartPrimaryKey(next); + GetRangeResponse result = OtsHelper.getRange( + ots, + rangeRowQueryCriteria, + conf.getRetry(), + conf.getRetryPauseInMillisecond()); + sendToDatax(recordSender, result); + next = result.getNextStartPrimaryKey(); + } while(next != null); + } +} diff --git a/otsreader/src/main/java/com/alibaba/datax/plugin/reader/otsreader/OtsReaderSlaveNormalProxy.java b/otsreader/src/main/java/com/alibaba/datax/plugin/reader/otsreader/OtsReaderSlaveNormalProxy.java new file mode 100644 index 00000000..f7d89b15 --- /dev/null +++ b/otsreader/src/main/java/com/alibaba/datax/plugin/reader/otsreader/OtsReaderSlaveNormalProxy.java @@ -0,0 +1,256 @@ +package com.alibaba.datax.plugin.reader.otsreader; + +import com.alibaba.datax.common.element.LongColumn; +import com.alibaba.datax.common.element.Record; +import com.alibaba.datax.common.element.StringColumn; +import com.alibaba.datax.common.exception.DataXException; +import com.alibaba.datax.common.plugin.RecordSender; +import com.alibaba.datax.common.util.Configuration; +import com.alibaba.datax.plugin.reader.otsreader.model.OTSColumn; +import com.alibaba.datax.plugin.reader.otsreader.model.OTSConf; +import com.alibaba.datax.plugin.reader.otsreader.model.OTSCriticalException; +import com.alibaba.datax.plugin.reader.otsreader.model.OTSRange; +import com.alibaba.datax.plugin.reader.otsreader.utils.*; +import com.alicloud.openservices.tablestore.SyncClientInterface; +import com.alicloud.openservices.tablestore.core.utils.Pair; +import com.alicloud.openservices.tablestore.model.*; +import com.alicloud.openservices.tablestore.model.timeseries.ScanTimeseriesDataRequest; +import com.alicloud.openservices.tablestore.model.timeseries.ScanTimeseriesDataResponse; +import com.alicloud.openservices.tablestore.model.timeseries.TimeseriesRow; +import com.alicloud.openservices.tablestore.model.timeseries.TimeseriesScanSplitInfo; +import org.slf4j.Logger; +import org.slf4j.LoggerFactory; + +import java.util.ArrayList; +import java.util.List; +import java.util.Map; + +public class OtsReaderSlaveNormalProxy implements IOtsReaderSlaveProxy { + private static final Logger LOG = LoggerFactory.getLogger(OtsReaderSlaveNormalProxy.class); + private OTSConf conf = null; + private OTSRange range = null; + private TableMeta meta = null; + private SyncClientInterface ots = null; + private TimeseriesScanSplitInfo splitInfo = null; + + @Override + public void init(Configuration configuration) { + conf = GsonParser.jsonToConf((String) configuration.get(Constant.ConfigKey.CONF)); + if (!conf.isTimeseriesTable()) { + range = GsonParser.jsonToRange((String) configuration.get(Constant.ConfigKey.RANGE)); + meta = GsonParser.jsonToMeta((String) configuration.get(Constant.ConfigKey.META)); + } else { + splitInfo = GsonParser.stringToTimeseriesScanSplitInfo((String) configuration.get(Constant.ConfigKey.SPLIT_INFO)); + // 时序表 检查tablestore SDK version + try{ + Common.checkTableStoreSDKVersion(); + } + catch (Exception e){ + LOG.error("Exception. 
ErrorMsg:{}", e.getMessage(), e); + throw DataXException.asDataXException(OtsReaderError.ERROR, e.toString(), e); + } + } + + + this.ots = OtsHelper.getOTSInstance(conf); + } + + @Override + public void close() { + ots.shutdown(); + } + + private void sendToDatax(RecordSender recordSender, Row row) { + Record line = recordSender.createRecord(); + + PrimaryKey pk = row.getPrimaryKey(); + for (OTSColumn column : conf.getColumn()) { + if (column.getColumnType() == OTSColumn.OTSColumnType.NORMAL) { + // 获取指定的列 + PrimaryKeyColumn value = pk.getPrimaryKeyColumn(column.getName()); + if (value != null) { + line.addColumn(TranformHelper.otsPrimaryKeyColumnToDataxColumn(value)); + } else { + Column c = row.getLatestColumn(column.getName()); + if (c != null) { + line.addColumn(TranformHelper.otsColumnToDataxColumn(c)); + } else { + // 这里使用StringColumn的无参构造函数构造对象,而不是用null,下 + // 游(writer)应该通过获取Column,然后通过Column的数据接口的返回值 + // 是否是null来判断改Column是否为null + // Datax其他插件的也是使用这种方式,约定俗成,并没有使用直接向record中注入null方式代表空 + line.addColumn(new StringColumn()); + } + } + } else { + line.addColumn(column.getValue()); + } + } + recordSender.sendToWriter(line); + } + + private void sendToDatax(RecordSender recordSender, TimeseriesRow row) { + + + Record line = recordSender.createRecord(); + // 对于配置项中的每一列 + for (int i = 0; i < conf.getColumn().size(); i++) { + OTSColumn column = conf.getColumn().get(i); + // 如果不是常数列 + if (column.getColumnType() == OTSColumn.OTSColumnType.NORMAL) { + // 如果是tags内字段 + if (conf.getColumn().get(i).getTimeseriesTag()) { + String s = row.getTimeseriesKey().getTags().get(column.getName()); + line.addColumn(new StringColumn(s)); + } + // 如果为measurement字段 + else if (column.getName().equals(Constant.ConfigKey.TimeseriesPKColumn.MEASUREMENT_NAME)) { + String s = row.getTimeseriesKey().getMeasurementName(); + line.addColumn(new StringColumn(s)); + } + // 如果为dataSource字段 + else if (column.getName().equals(Constant.ConfigKey.TimeseriesPKColumn.DATA_SOURCE)) { + String s = row.getTimeseriesKey().getDataSource(); + line.addColumn(new StringColumn(s)); + } + // 如果为tags字段 + else if (column.getName().equals(Constant.ConfigKey.TimeseriesPKColumn.TAGS)) { + line.addColumn(new StringColumn(row.getTimeseriesKey().buildTagsString())); + } + else if (column.getName().equals(Constant.ConfigKey.TimeseriesPKColumn.TIME)) { + Long l = row.getTimeInUs(); + line.addColumn(new LongColumn(l)); + } + // 否则为field内字段 + else { + ColumnValue c = row.getFields().get(column.getName()); + if (c == null) { + LOG.warn("Get column {} : type {} failed, use empty string instead", column.getName(), conf.getColumn().get(i).getValueType()); + line.addColumn(new StringColumn()); + } else if (c.getType() != conf.getColumn().get(i).getValueType()) { + LOG.warn("Get column {} failed, expected type: {}, actual type: {}. 
Sending actual type to writer.", column.getName(), conf.getColumn().get(i).getValueType(), c.getType()); + line.addColumn(TranformHelper.otsColumnToDataxColumn(c)); + } else { + line.addColumn(TranformHelper.otsColumnToDataxColumn(c)); + } + } + } + // 如果是常数列 + else { + line.addColumn(column.getValue()); + } + } + recordSender.sendToWriter(line); + } + + /** + * 将获取到的数据根据用户配置Column的方式传递给datax + * + * @param recordSender + * @param result + */ + private void sendToDatax(RecordSender recordSender, GetRangeResponse result) { + for (Row row : result.getRows()) { + sendToDatax(recordSender, row); + } + } + + private void sendToDatax(RecordSender recordSender, ScanTimeseriesDataResponse result) { + for (TimeseriesRow row : result.getRows()) { + sendToDatax(recordSender, row); + } + } + + @Override + public void startRead(RecordSender recordSender) throws Exception { + if (conf.isTimeseriesTable()) { + readTimeseriesTable(recordSender); + } else { + readNormalTable(recordSender); + } + } + + public void readTimeseriesTable(RecordSender recordSender) throws Exception { + + List timeseriesPkName = new ArrayList<>(); + timeseriesPkName.add(Constant.ConfigKey.TimeseriesPKColumn.MEASUREMENT_NAME); + timeseriesPkName.add(Constant.ConfigKey.TimeseriesPKColumn.DATA_SOURCE); + timeseriesPkName.add(Constant.ConfigKey.TimeseriesPKColumn.TAGS); + timeseriesPkName.add(Constant.ConfigKey.TimeseriesPKColumn.TIME); + + ScanTimeseriesDataRequest scanTimeseriesDataRequest = new ScanTimeseriesDataRequest(conf.getTableName()); + List> fieldsToGet = new ArrayList<>(); + for (int i = 0; i < conf.getColumn().size(); i++) { + /** + * 如果所配置列 + * 1. 不是常量列(即列名不为null) + * 2. 列名不在["measurementName","dataSource","tags"]中 + * 3. 不是tags内的字段 + * 则为需要获取的field字段。 + */ + String fieldName = conf.getColumn().get(i).getName(); + if (fieldName != null && !timeseriesPkName.contains(fieldName) && !conf.getColumn().get(i).getTimeseriesTag()) { + Pair pair = new Pair<>(fieldName, conf.getColumn().get(i).getValueType()); + fieldsToGet.add(pair); + } + } + scanTimeseriesDataRequest.setFieldsToGet(fieldsToGet); + scanTimeseriesDataRequest.setSplitInfo(splitInfo); + + while (true) { + ScanTimeseriesDataResponse response = OtsHelper.scanTimeseriesData( + ots, + scanTimeseriesDataRequest, + conf.getRetry(), + conf.getRetryPauseInMillisecond()); + sendToDatax(recordSender, response); + if (response.getNextToken() == null) { + break; + } + scanTimeseriesDataRequest.setNextToken(response.getNextToken()); + } + } + + public void readNormalTable(RecordSender recordSender) throws Exception { + PrimaryKey inclusiveStartPrimaryKey = new PrimaryKey(range.getBegin()); + PrimaryKey exclusiveEndPrimaryKey = new PrimaryKey(range.getEnd()); + PrimaryKey next = inclusiveStartPrimaryKey; + + RangeRowQueryCriteria rangeRowQueryCriteria = new RangeRowQueryCriteria(conf.getTableName()); + rangeRowQueryCriteria.setExclusiveEndPrimaryKey(exclusiveEndPrimaryKey); + rangeRowQueryCriteria.setDirection(Common.getDirection(range.getBegin(), range.getEnd())); + rangeRowQueryCriteria.setMaxVersions(1); + rangeRowQueryCriteria.addColumnsToGet(Common.toColumnToGet(conf.getColumn(), meta)); + + do { + rangeRowQueryCriteria.setInclusiveStartPrimaryKey(next); + GetRangeResponse result = OtsHelper.getRange( + ots, + rangeRowQueryCriteria, + conf.getRetry(), + conf.getRetryPauseInMillisecond()); + sendToDatax(recordSender, result); + next = result.getNextStartPrimaryKey(); + } while (next != null); + } + + + public void setConf(OTSConf conf) { + this.conf = conf; + } + + + public 
void setRange(OTSRange range) { + this.range = range; + } + + + public void setMeta(TableMeta meta) { + this.meta = meta; + } + + + public void setOts(SyncClientInterface ots) { + this.ots = ots; + } +} diff --git a/otsreader/src/main/java/com/alibaba/datax/plugin/reader/otsreader/OtsReaderSlaveProxy.java b/otsreader/src/main/java/com/alibaba/datax/plugin/reader/otsreader/OtsReaderSlaveProxy.java deleted file mode 100644 index e64b4e7e..00000000 --- a/otsreader/src/main/java/com/alibaba/datax/plugin/reader/otsreader/OtsReaderSlaveProxy.java +++ /dev/null @@ -1,135 +0,0 @@ -package com.alibaba.datax.plugin.reader.otsreader; - -import java.util.List; - -import org.slf4j.Logger; -import org.slf4j.LoggerFactory; - -import com.alibaba.datax.common.element.Record; -import com.alibaba.datax.common.plugin.RecordSender; -import com.alibaba.datax.common.util.Configuration; -import com.alibaba.datax.plugin.reader.otsreader.callable.GetRangeCallable; -import com.alibaba.datax.plugin.reader.otsreader.model.OTSColumn; -import com.alibaba.datax.plugin.reader.otsreader.model.OTSConf; -import com.alibaba.datax.plugin.reader.otsreader.model.OTSConst; -import com.alibaba.datax.plugin.reader.otsreader.model.OTSRange; -import com.alibaba.datax.plugin.reader.otsreader.utils.Common; -import com.alibaba.datax.plugin.reader.otsreader.utils.GsonParser; -import com.alibaba.datax.plugin.reader.otsreader.utils.DefaultNoRetry; -import com.alibaba.datax.plugin.reader.otsreader.utils.RetryHelper; -import com.aliyun.openservices.ots.OTSClientAsync; -import com.aliyun.openservices.ots.OTSServiceConfiguration; -import com.aliyun.openservices.ots.model.Direction; -import com.aliyun.openservices.ots.model.GetRangeRequest; -import com.aliyun.openservices.ots.model.GetRangeResult; -import com.aliyun.openservices.ots.model.OTSFuture; -import com.aliyun.openservices.ots.model.RangeRowQueryCriteria; -import com.aliyun.openservices.ots.model.Row; -import com.aliyun.openservices.ots.model.RowPrimaryKey; - -public class OtsReaderSlaveProxy { - - class RequestItem { - private RangeRowQueryCriteria criteria; - private OTSFuture future; - - RequestItem(RangeRowQueryCriteria criteria, OTSFuture future) { - this.criteria = criteria; - this.future = future; - } - - public RangeRowQueryCriteria getCriteria() { - return criteria; - } - - public OTSFuture getFuture() { - return future; - } - } - - private static final Logger LOG = LoggerFactory.getLogger(OtsReaderSlaveProxy.class); - - private void rowsToSender(List rows, RecordSender sender, List columns) { - for (Row row : rows) { - Record line = sender.createRecord(); - line = Common.parseRowToLine(row, columns, line); - - LOG.debug("Reader send record : {}", line.toString()); - - sender.sendToWriter(line); - } - } - - private RangeRowQueryCriteria generateRangeRowQueryCriteria(String tableName, RowPrimaryKey begin, RowPrimaryKey end, Direction direction, List columns) { - RangeRowQueryCriteria criteria = new RangeRowQueryCriteria(tableName); - criteria.setInclusiveStartPrimaryKey(begin); - criteria.setDirection(direction); - criteria.setColumnsToGet(columns); - criteria.setLimit(-1); - criteria.setExclusiveEndPrimaryKey(end); - return criteria; - } - - private RequestItem generateRequestItem( - OTSClientAsync ots, - OTSConf conf, - RowPrimaryKey begin, - RowPrimaryKey end, - Direction direction, - List columns) throws Exception { - RangeRowQueryCriteria criteria = generateRangeRowQueryCriteria(conf.getTableName(), begin, end, direction, columns); - - GetRangeRequest request = new 
GetRangeRequest(); - request.setRangeRowQueryCriteria(criteria); - OTSFuture future = ots.getRange(request); - - return new RequestItem(criteria, future); - } - - public void read(RecordSender sender, Configuration configuration) throws Exception { - LOG.info("read begin."); - - OTSConf conf = GsonParser.jsonToConf(configuration.getString(OTSConst.OTS_CONF)); - OTSRange range = GsonParser.jsonToRange(configuration.getString(OTSConst.OTS_RANGE)); - Direction direction = GsonParser.jsonToDirection(configuration.getString(OTSConst.OTS_DIRECTION)); - - OTSServiceConfiguration configure = new OTSServiceConfiguration(); - configure.setRetryStrategy(new DefaultNoRetry()); - - OTSClientAsync ots = new OTSClientAsync( - conf.getEndpoint(), - conf.getAccessId(), - conf.getAccesskey(), - conf.getInstanceName(), - null, - configure, - null); - - RowPrimaryKey token = range.getBegin(); - List columns = Common.getNormalColumnNameList(conf.getColumns()); - - RequestItem request = null; - - do { - LOG.debug("Next token : {}", GsonParser.rowPrimaryKeyToJson(token)); - if (request == null) { - request = generateRequestItem(ots, conf, token, range.getEnd(), direction, columns); - } else { - RequestItem req = request; - - GetRangeResult result = RetryHelper.executeWithRetry( - new GetRangeCallable(ots, req.getCriteria(), req.getFuture()), - conf.getRetry(), - conf.getSleepInMilliSecond() - ); - if ((token = result.getNextStartPrimaryKey()) != null) { - request = generateRequestItem(ots, conf, token, range.getEnd(), direction, columns); - } - - rowsToSender(result.getRows(), sender, conf.getColumns()); - } - } while (token != null); - ots.shutdown(); - LOG.info("read end."); - } -} diff --git a/otsreader/src/main/java/com/alibaba/datax/plugin/reader/otsreader/OtsReaderSlaveProxyOld.java b/otsreader/src/main/java/com/alibaba/datax/plugin/reader/otsreader/OtsReaderSlaveProxyOld.java new file mode 100644 index 00000000..72eb885e --- /dev/null +++ b/otsreader/src/main/java/com/alibaba/datax/plugin/reader/otsreader/OtsReaderSlaveProxyOld.java @@ -0,0 +1,181 @@ +package com.alibaba.datax.plugin.reader.otsreader; + +import java.util.List; + +import com.alibaba.datax.plugin.reader.otsreader.model.OTSRange; +import com.alibaba.datax.plugin.reader.otsreader.model.OTSColumn; +import com.alibaba.datax.plugin.reader.otsreader.model.OTSConf; +import com.alibaba.datax.plugin.reader.otsreader.utils.*; +import com.alicloud.openservices.tablestore.model.PrimaryKeyColumn; +import com.aliyun.openservices.ots.model.*; +import org.slf4j.Logger; +import org.slf4j.LoggerFactory; + +import com.alibaba.datax.common.element.Record; +import com.alibaba.datax.common.plugin.RecordSender; +import com.alibaba.datax.common.util.Configuration; +import com.alibaba.datax.plugin.reader.otsreader.callable.GetRangeCallableOld; +import com.aliyun.openservices.ots.OTSClientAsync; +import com.aliyun.openservices.ots.OTSServiceConfiguration; + +public class OtsReaderSlaveProxyOld implements IOtsReaderSlaveProxy { + + + private OTSClientAsync ots = null; + private OTSConf conf = null; + private OTSRange range = null; + + class RequestItem { + private RangeRowQueryCriteria criteria; + private OTSFuture future; + + RequestItem(RangeRowQueryCriteria criteria, OTSFuture future) { + this.criteria = criteria; + this.future = future; + } + + public RangeRowQueryCriteria getCriteria() { + return criteria; + } + + public OTSFuture getFuture() { + return future; + } + } + + private static final Logger LOG = LoggerFactory.getLogger(OtsReaderSlaveProxyOld.class); + 
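+ // Converts each OTS Row into a DataX Record via CommonOld.parseRowToLine and hands it to the writer channel.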
+ private void rowsToSender(List rows, RecordSender sender, List columns) { + for (Row row : rows) { + Record line = sender.createRecord(); + line = CommonOld.parseRowToLine(row, columns, line); + + LOG.debug("Reader send record : {}", line.toString()); + + sender.sendToWriter(line); + } + } + + private RangeRowQueryCriteria generateRangeRowQueryCriteria(String tableName, RowPrimaryKey begin, RowPrimaryKey end, Direction direction, List columns) { + RangeRowQueryCriteria criteria = new RangeRowQueryCriteria(tableName); + criteria.setInclusiveStartPrimaryKey(begin); + criteria.setDirection(direction); + criteria.setColumnsToGet(columns); + criteria.setLimit(-1); + criteria.setExclusiveEndPrimaryKey(end); + return criteria; + } + + private RequestItem generateRequestItem( + OTSClientAsync ots, + OTSConf conf, + RowPrimaryKey begin, + RowPrimaryKey end, + Direction direction, + List columns) throws Exception { + RangeRowQueryCriteria criteria = generateRangeRowQueryCriteria(conf.getTableName(), begin, end, direction, columns); + + GetRangeRequest request = new GetRangeRequest(); + request.setRangeRowQueryCriteria(criteria); + OTSFuture future = ots.getRange(request); + + return new RequestItem(criteria, future); + } + + @Override + public void init(Configuration configuration) { + conf = GsonParser.jsonToConf(configuration.getString(Constant.ConfigKey.CONF)); + range = GsonParser.jsonToRange(configuration.getString(Constant.ConfigKey.RANGE)); + + OTSServiceConfiguration configure = new OTSServiceConfiguration(); + configure.setRetryStrategy(new DefaultNoRetry()); + + ots = new OTSClientAsync( + conf.getEndpoint(), + conf.getAccessId(), + conf.getAccessKey(), + conf.getInstanceName(), + null, + configure, + null); + } + + @Override + public void close() { + ots.shutdown(); + } + + @Override + public void startRead(RecordSender recordSender) throws Exception { + RowPrimaryKey token = pKColumnList2RowPrimaryKey(range.getBegin()); + + List columns = CommonOld.getNormalColumnNameList(conf.getColumn()); + Direction direction = null; + switch (Common.getDirection(range.getBegin(), range.getEnd())){ + case FORWARD: + direction = Direction.FORWARD; + break; + case BACKWARD: + default: + direction = Direction.BACKWARD; + } + RequestItem request = null; + + do { + LOG.debug("Next token : {}", GsonParser.rowPrimaryKeyToJson(token)); + if (request == null) { + request = generateRequestItem(ots, conf, token, pKColumnList2RowPrimaryKey(range.getEnd()), direction, columns); + } else { + RequestItem req = request; + + GetRangeResult result = RetryHelperOld.executeWithRetry( + new GetRangeCallableOld(ots, req.getCriteria(), req.getFuture()), + conf.getRetry(), + // TODO + 100 + ); + if ((token = result.getNextStartPrimaryKey()) != null) { + request = generateRequestItem(ots, conf, token, pKColumnList2RowPrimaryKey(range.getEnd()), direction, columns); + } + + rowsToSender(result.getRows(), recordSender, conf.getColumn()); + } + } while (token != null); + } + + /** + * 将 {@link com.alicloud.openservices.tablestore.model.PrimaryKeyColumn}的列表转为{@link com.aliyun.openservices.ots.model.RowPrimaryKey} + * @param list + * @return + */ + public RowPrimaryKey pKColumnList2RowPrimaryKey(List list){ + RowPrimaryKey rowPrimaryKey = new RowPrimaryKey(); + for(PrimaryKeyColumn pk : list){ + PrimaryKeyValue v = null; + if(pk.getValue() == com.alicloud.openservices.tablestore.model.PrimaryKeyValue.INF_MAX){ + v = PrimaryKeyValue.INF_MAX; + } else if (pk.getValue() == 
com.alicloud.openservices.tablestore.model.PrimaryKeyValue.INF_MIN) { + v = PrimaryKeyValue.INF_MIN; + } + // 非INF_MAX 或 INF_MIN + else{ + switch (pk.getValue().getType()){ + case STRING: + v = PrimaryKeyValue.fromString(pk.getValue().asString()); + break; + case INTEGER: + v = PrimaryKeyValue.fromLong(pk.getValue().asLong()); + break; + case BINARY: + v = PrimaryKeyValue.fromBinary(pk.getValue().asBinary()); + break; + default: + throw new IllegalArgumentException("the pKColumnList to RowPrimaryKey conversion failed"); + } + } + + rowPrimaryKey.addPrimaryKeyColumn(pk.getName(),v); + } + return rowPrimaryKey; + } +} diff --git a/otsreader/src/main/java/com/alibaba/datax/plugin/reader/otsreader/adaptor/ColumnAdaptor.java b/otsreader/src/main/java/com/alibaba/datax/plugin/reader/otsreader/adaptor/ColumnAdaptor.java new file mode 100644 index 00000000..b2e14b5c --- /dev/null +++ b/otsreader/src/main/java/com/alibaba/datax/plugin/reader/otsreader/adaptor/ColumnAdaptor.java @@ -0,0 +1,63 @@ +package com.alibaba.datax.plugin.reader.otsreader.adaptor; + +import com.alibaba.datax.common.element.*; +import com.google.gson.*; +import org.apache.commons.codec.binary.Base64; + +import java.lang.reflect.Type; + +public class ColumnAdaptor implements JsonDeserializer, JsonSerializer{ + private final static String TYPE = "type"; + private final static String RAW = "rawData"; + + @Override + public JsonElement serialize(Column obj, Type t, + JsonSerializationContext c) { + JsonObject json = new JsonObject(); + + String rawData = null; + switch (obj.getType()){ + case BOOL: + rawData = String.valueOf(obj.getRawData()); break; + case BYTES: + rawData = Base64.encodeBase64String((byte[]) obj.getRawData()); break; + case DOUBLE: + rawData = String.valueOf(obj.getRawData());break; + case LONG: + rawData = String.valueOf(obj.getRawData());break; + case STRING: + rawData = String.valueOf(obj.getRawData());break; + default: + throw new IllegalArgumentException("Unsupport parse the column type:" + obj.getType().toString()); + + } + json.add(TYPE, new JsonPrimitive(obj.getType().toString())); + json.add(RAW, new JsonPrimitive(rawData)); + return json; + } + + @Override + public Column deserialize(JsonElement ele, Type t, + JsonDeserializationContext c) throws JsonParseException { + JsonObject obj = ele.getAsJsonObject(); + + String strType = obj.getAsJsonPrimitive(TYPE).getAsString(); + String strRaw = obj.getAsJsonPrimitive(RAW).getAsString(); + Column.Type type = Column.Type.valueOf(strType); + switch (type){ + case BOOL: + return new BoolColumn(strRaw); + case BYTES: + return new BytesColumn(Base64.decodeBase64(strRaw)); + case DOUBLE: + return new DoubleColumn(strRaw); + case LONG: + return new LongColumn(strRaw); + case STRING: + return new StringColumn(strRaw); + default: + throw new IllegalArgumentException("Unsupport parse the column type:" + type.toString()); + + } + } +} diff --git a/otsreader/src/main/java/com/alibaba/datax/plugin/reader/otsreader/adaptor/OTSColumnAdaptor.java b/otsreader/src/main/java/com/alibaba/datax/plugin/reader/otsreader/adaptor/OTSColumnAdaptor.java deleted file mode 100644 index 25f9b682..00000000 --- a/otsreader/src/main/java/com/alibaba/datax/plugin/reader/otsreader/adaptor/OTSColumnAdaptor.java +++ /dev/null @@ -1,117 +0,0 @@ -package com.alibaba.datax.plugin.reader.otsreader.adaptor; - -import java.lang.reflect.Type; - -import org.apache.commons.codec.binary.Base64; - -import com.alibaba.datax.plugin.reader.otsreader.model.OTSColumn; -import 
com.aliyun.openservices.ots.model.ColumnType; -import com.google.gson.JsonDeserializationContext; -import com.google.gson.JsonDeserializer; -import com.google.gson.JsonElement; -import com.google.gson.JsonObject; -import com.google.gson.JsonParseException; -import com.google.gson.JsonPrimitive; -import com.google.gson.JsonSerializationContext; -import com.google.gson.JsonSerializer; - -public class OTSColumnAdaptor implements JsonDeserializer, JsonSerializer{ - private final static String NAME = "name"; - private final static String COLUMN_TYPE = "column_type"; - private final static String VALUE_TYPE = "value_type"; - private final static String VALUE = "value"; - - private void serializeConstColumn(JsonObject json, OTSColumn obj) { - switch (obj.getValueType()) { - case STRING : - json.add(VALUE_TYPE, new JsonPrimitive(ColumnType.STRING.toString())); - json.add(VALUE, new JsonPrimitive(obj.getValue().asString())); - break; - case INTEGER : - json.add(VALUE_TYPE, new JsonPrimitive(ColumnType.INTEGER.toString())); - json.add(VALUE, new JsonPrimitive(obj.getValue().asLong())); - break; - case DOUBLE : - json.add(VALUE_TYPE, new JsonPrimitive(ColumnType.DOUBLE.toString())); - json.add(VALUE, new JsonPrimitive(obj.getValue().asDouble())); - break; - case BOOLEAN : - json.add(VALUE_TYPE, new JsonPrimitive(ColumnType.BOOLEAN.toString())); - json.add(VALUE, new JsonPrimitive(obj.getValue().asBoolean())); - break; - case BINARY : - json.add(VALUE_TYPE, new JsonPrimitive(ColumnType.BINARY.toString())); - json.add(VALUE, new JsonPrimitive(Base64.encodeBase64String(obj.getValue().asBytes()))); - break; - default: - throw new IllegalArgumentException("Unsupport serialize the type : " + obj.getValueType() + ""); - } - } - - private OTSColumn deserializeConstColumn(JsonObject obj) { - String strType = obj.getAsJsonPrimitive(VALUE_TYPE).getAsString(); - ColumnType type = ColumnType.valueOf(strType); - - JsonPrimitive jsonValue = obj.getAsJsonPrimitive(VALUE); - - switch (type) { - case STRING : - return OTSColumn.fromConstStringColumn(jsonValue.getAsString()); - case INTEGER : - return OTSColumn.fromConstIntegerColumn(jsonValue.getAsLong()); - case DOUBLE : - return OTSColumn.fromConstDoubleColumn(jsonValue.getAsDouble()); - case BOOLEAN : - return OTSColumn.fromConstBoolColumn(jsonValue.getAsBoolean()); - case BINARY : - return OTSColumn.fromConstBytesColumn(Base64.decodeBase64(jsonValue.getAsString())); - default: - throw new IllegalArgumentException("Unsupport deserialize the type : " + type + ""); - } - } - - private void serializeNormalColumn(JsonObject json, OTSColumn obj) { - json.add(NAME, new JsonPrimitive(obj.getName())); - } - - private OTSColumn deserializeNormarlColumn(JsonObject obj) { - return OTSColumn.fromNormalColumn(obj.getAsJsonPrimitive(NAME).getAsString()); - } - - @Override - public JsonElement serialize(OTSColumn obj, Type t, - JsonSerializationContext c) { - JsonObject json = new JsonObject(); - - switch (obj.getColumnType()) { - case CONST: - json.add(COLUMN_TYPE, new JsonPrimitive(OTSColumn.OTSColumnType.CONST.toString())); - serializeConstColumn(json, obj); - break; - case NORMAL: - json.add(COLUMN_TYPE, new JsonPrimitive(OTSColumn.OTSColumnType.NORMAL.toString())); - serializeNormalColumn(json, obj); - break; - default: - throw new IllegalArgumentException("Unsupport serialize the type : " + obj.getColumnType() + ""); - } - return json; - } - - @Override - public OTSColumn deserialize(JsonElement ele, Type t, - JsonDeserializationContext c) throws JsonParseException { - 
JsonObject obj = ele.getAsJsonObject(); - String strColumnType = obj.getAsJsonPrimitive(COLUMN_TYPE).getAsString(); - OTSColumn.OTSColumnType columnType = OTSColumn.OTSColumnType.valueOf(strColumnType); - - switch(columnType) { - case CONST: - return deserializeConstColumn(obj); - case NORMAL: - return deserializeNormarlColumn(obj); - default: - throw new IllegalArgumentException("Unsupport deserialize the type : " + columnType + ""); - } - } -} diff --git a/otsreader/src/main/java/com/alibaba/datax/plugin/reader/otsreader/adaptor/PrimaryKeyValueAdaptor.java b/otsreader/src/main/java/com/alibaba/datax/plugin/reader/otsreader/adaptor/PrimaryKeyValueAdaptor.java index 1a49ea47..240427ae 100644 --- a/otsreader/src/main/java/com/alibaba/datax/plugin/reader/otsreader/adaptor/PrimaryKeyValueAdaptor.java +++ b/otsreader/src/main/java/com/alibaba/datax/plugin/reader/otsreader/adaptor/PrimaryKeyValueAdaptor.java @@ -1,18 +1,12 @@ package com.alibaba.datax.plugin.reader.otsreader.adaptor; -import java.lang.reflect.Type; +import com.alicloud.openservices.tablestore.model.ColumnType; +import com.alicloud.openservices.tablestore.model.PrimaryKeyType; +import com.alicloud.openservices.tablestore.model.PrimaryKeyValue; +import com.google.gson.*; +import org.apache.commons.codec.binary.Base64; -import com.aliyun.openservices.ots.model.ColumnType; -import com.aliyun.openservices.ots.model.PrimaryKeyType; -import com.aliyun.openservices.ots.model.PrimaryKeyValue; -import com.google.gson.JsonDeserializationContext; -import com.google.gson.JsonDeserializer; -import com.google.gson.JsonElement; -import com.google.gson.JsonObject; -import com.google.gson.JsonParseException; -import com.google.gson.JsonPrimitive; -import com.google.gson.JsonSerializationContext; -import com.google.gson.JsonSerializer; +import java.lang.reflect.Type; /** * {"type":"INF_MIN", "value":""} @@ -31,27 +25,29 @@ public class PrimaryKeyValueAdaptor implements JsonDeserializer JsonSerializationContext c) { JsonObject json = new JsonObject(); - if (obj == PrimaryKeyValue.INF_MIN) { + if (obj.isInfMin()) { json.add(TYPE, new JsonPrimitive(INF_MIN)); - json.add(VALUE, new JsonPrimitive("")); return json; } - if (obj == PrimaryKeyValue.INF_MAX) { + if (obj.isInfMax()) { json.add(TYPE, new JsonPrimitive(INF_MAX)); - json.add(VALUE, new JsonPrimitive("")); return json; } switch (obj.getType()) { case STRING : - json.add(TYPE, new JsonPrimitive(ColumnType.STRING.toString())); + json.add(TYPE, new JsonPrimitive(ColumnType.STRING.toString())); json.add(VALUE, new JsonPrimitive(obj.asString())); break; case INTEGER : json.add(TYPE, new JsonPrimitive(ColumnType.INTEGER.toString())); json.add(VALUE, new JsonPrimitive(obj.asLong())); break; + case BINARY : + json.add(TYPE, new JsonPrimitive(ColumnType.BINARY.toString())); + json.add(VALUE, new JsonPrimitive(Base64.encodeBase64String(obj.asBinary()))); + break; default: throw new IllegalArgumentException("Unsupport serialize the type : " + obj.getType() + ""); } @@ -64,16 +60,17 @@ public class PrimaryKeyValueAdaptor implements JsonDeserializer JsonObject obj = ele.getAsJsonObject(); String strType = obj.getAsJsonPrimitive(TYPE).getAsString(); - JsonPrimitive jsonValue = obj.getAsJsonPrimitive(VALUE); - if (strType.equals(INF_MIN)) { + if (strType.equalsIgnoreCase(INF_MIN)) { return PrimaryKeyValue.INF_MIN; } - if (strType.equals(INF_MAX)) { + if (strType.equalsIgnoreCase(INF_MAX)) { return PrimaryKeyValue.INF_MAX; } + JsonPrimitive jsonValue = obj.getAsJsonPrimitive(VALUE); + PrimaryKeyValue value = 
null; PrimaryKeyType type = PrimaryKeyType.valueOf(strType); switch(type) { @@ -83,6 +80,9 @@ public class PrimaryKeyValueAdaptor implements JsonDeserializer case INTEGER : value = PrimaryKeyValue.fromLong(jsonValue.getAsLong()); break; + case BINARY : + value = PrimaryKeyValue.fromBinary(Base64.decodeBase64(jsonValue.getAsString())); + break; default: throw new IllegalArgumentException("Unsupport deserialize the type : " + type + ""); } diff --git a/otsreader/src/main/java/com/alibaba/datax/plugin/reader/otsreader/callable/GetFirstRowPrimaryKeyCallable.java b/otsreader/src/main/java/com/alibaba/datax/plugin/reader/otsreader/callable/GetFirstRowPrimaryKeyCallable.java index f004c0ff..cdcae91a 100644 --- a/otsreader/src/main/java/com/alibaba/datax/plugin/reader/otsreader/callable/GetFirstRowPrimaryKeyCallable.java +++ b/otsreader/src/main/java/com/alibaba/datax/plugin/reader/otsreader/callable/GetFirstRowPrimaryKeyCallable.java @@ -1,53 +1,42 @@ package com.alibaba.datax.plugin.reader.otsreader.callable; +import com.alicloud.openservices.tablestore.SyncClientInterface; +import com.alicloud.openservices.tablestore.model.*; + +import java.util.ArrayList; import java.util.List; import java.util.Map; import java.util.concurrent.Callable; -import com.aliyun.openservices.ots.OTSClient; -import com.aliyun.openservices.ots.model.ColumnType; -import com.aliyun.openservices.ots.model.ColumnValue; -import com.aliyun.openservices.ots.model.GetRangeRequest; -import com.aliyun.openservices.ots.model.GetRangeResult; -import com.aliyun.openservices.ots.model.PrimaryKeyType; -import com.aliyun.openservices.ots.model.PrimaryKeyValue; -import com.aliyun.openservices.ots.model.RangeRowQueryCriteria; -import com.aliyun.openservices.ots.model.Row; -import com.aliyun.openservices.ots.model.RowPrimaryKey; -import com.aliyun.openservices.ots.model.TableMeta; +public class GetFirstRowPrimaryKeyCallable implements Callable> { -public class GetFirstRowPrimaryKeyCallable implements Callable{ - - private OTSClient ots = null; + private SyncClientInterface ots = null; private TableMeta meta = null; private RangeRowQueryCriteria criteria = null; - - public GetFirstRowPrimaryKeyCallable(OTSClient ots, TableMeta meta, RangeRowQueryCriteria criteria) { + + public GetFirstRowPrimaryKeyCallable(SyncClientInterface ots, TableMeta meta, RangeRowQueryCriteria criteria) { this.ots = ots; this.meta = meta; this.criteria = criteria; } - + @Override - public RowPrimaryKey call() throws Exception { - RowPrimaryKey ret = new RowPrimaryKey(); + public List call() throws Exception { + List ret = new ArrayList<>(); GetRangeRequest request = new GetRangeRequest(); request.setRangeRowQueryCriteria(criteria); - GetRangeResult result = ots.getRange(request); - List rows = result.getRows(); - if(rows.isEmpty()) { + GetRangeResponse response = ots.getRange(request); + List rows = response.getRows(); + if (rows.isEmpty()) { return null;// no data - } + } Row row = rows.get(0); - Map pk = meta.getPrimaryKey(); - for (String key:pk.keySet()) { - ColumnValue v = row.getColumns().get(key); - if (v.getType() == ColumnType.INTEGER) { - ret.addPrimaryKeyColumn(key, PrimaryKeyValue.fromLong(v.asLong())); - } else { - ret.addPrimaryKeyColumn(key, PrimaryKeyValue.fromString(v.asString())); - } + Map pk = meta.getPrimaryKeyMap(); + + for (String key : pk.keySet()) { + PrimaryKeyColumn v = row.getPrimaryKey().getPrimaryKeyColumnsMap().get(key); + ret.add(v); } return ret; } diff --git 
a/otsreader/src/main/java/com/alibaba/datax/plugin/reader/otsreader/callable/GetRangeCallable.java b/otsreader/src/main/java/com/alibaba/datax/plugin/reader/otsreader/callable/GetRangeCallable.java index 2cd1398a..995d491c 100644 --- a/otsreader/src/main/java/com/alibaba/datax/plugin/reader/otsreader/callable/GetRangeCallable.java +++ b/otsreader/src/main/java/com/alibaba/datax/plugin/reader/otsreader/callable/GetRangeCallable.java @@ -1,35 +1,26 @@ package com.alibaba.datax.plugin.reader.otsreader.callable; +import com.alicloud.openservices.tablestore.SyncClientInterface; +import com.alicloud.openservices.tablestore.model.GetRangeRequest; +import com.alicloud.openservices.tablestore.model.GetRangeResponse; +import com.alicloud.openservices.tablestore.model.RangeRowQueryCriteria; + import java.util.concurrent.Callable; -import com.aliyun.openservices.ots.OTSClientAsync; -import com.aliyun.openservices.ots.model.GetRangeRequest; -import com.aliyun.openservices.ots.model.GetRangeResult; -import com.aliyun.openservices.ots.model.OTSFuture; -import com.aliyun.openservices.ots.model.RangeRowQueryCriteria; - -public class GetRangeCallable implements Callable { +public class GetRangeCallable implements Callable { - private OTSClientAsync ots; + private SyncClientInterface ots; private RangeRowQueryCriteria criteria; - private OTSFuture future; - public GetRangeCallable(OTSClientAsync ots, RangeRowQueryCriteria criteria, OTSFuture future) { + public GetRangeCallable(SyncClientInterface ots, RangeRowQueryCriteria criteria) { this.ots = ots; this.criteria = criteria; - this.future = future; } @Override - public GetRangeResult call() throws Exception { - try { - return future.get(); - } catch (Exception e) { - GetRangeRequest request = new GetRangeRequest(); - request.setRangeRowQueryCriteria(criteria); - future = ots.getRange(request); - throw e; - } + public GetRangeResponse call() throws Exception { + GetRangeRequest request = new GetRangeRequest(); + request.setRangeRowQueryCriteria(criteria); + return ots.getRange(request); } - -} +} \ No newline at end of file diff --git a/otsreader/src/main/java/com/alibaba/datax/plugin/reader/otsreader/callable/GetRangeCallableOld.java b/otsreader/src/main/java/com/alibaba/datax/plugin/reader/otsreader/callable/GetRangeCallableOld.java new file mode 100644 index 00000000..c0434126 --- /dev/null +++ b/otsreader/src/main/java/com/alibaba/datax/plugin/reader/otsreader/callable/GetRangeCallableOld.java @@ -0,0 +1,35 @@ +package com.alibaba.datax.plugin.reader.otsreader.callable; + +import java.util.concurrent.Callable; + +import com.aliyun.openservices.ots.OTSClientAsync; +import com.aliyun.openservices.ots.model.GetRangeRequest; +import com.aliyun.openservices.ots.model.GetRangeResult; +import com.aliyun.openservices.ots.model.OTSFuture; +import com.aliyun.openservices.ots.model.RangeRowQueryCriteria; + +public class GetRangeCallableOld implements Callable { + + private OTSClientAsync ots; + private RangeRowQueryCriteria criteria; + private OTSFuture future; + + public GetRangeCallableOld(OTSClientAsync ots, RangeRowQueryCriteria criteria, OTSFuture future) { + this.ots = ots; + this.criteria = criteria; + this.future = future; + } + + @Override + public GetRangeResult call() throws Exception { + try { + return future.get(); + } catch (Exception e) { + GetRangeRequest request = new GetRangeRequest(); + request.setRangeRowQueryCriteria(criteria); + future = ots.getRange(request); + throw e; + } + } + +} diff --git 
a/otsreader/src/main/java/com/alibaba/datax/plugin/reader/otsreader/callable/GetTableMetaCallable.java b/otsreader/src/main/java/com/alibaba/datax/plugin/reader/otsreader/callable/GetTableMetaCallable.java index 2884e12b..36a122c2 100644 --- a/otsreader/src/main/java/com/alibaba/datax/plugin/reader/otsreader/callable/GetTableMetaCallable.java +++ b/otsreader/src/main/java/com/alibaba/datax/plugin/reader/otsreader/callable/GetTableMetaCallable.java @@ -1,18 +1,19 @@ package com.alibaba.datax.plugin.reader.otsreader.callable; +import com.alicloud.openservices.tablestore.SyncClientInterface; +import com.alicloud.openservices.tablestore.model.DescribeTableRequest; +import com.alicloud.openservices.tablestore.model.DescribeTableResponse; +import com.alicloud.openservices.tablestore.model.TableMeta; + import java.util.concurrent.Callable; -import com.aliyun.openservices.ots.OTSClient; -import com.aliyun.openservices.ots.model.DescribeTableRequest; -import com.aliyun.openservices.ots.model.DescribeTableResult; -import com.aliyun.openservices.ots.model.TableMeta; public class GetTableMetaCallable implements Callable{ - private OTSClient ots = null; + private SyncClientInterface ots = null; private String tableName = null; - public GetTableMetaCallable(OTSClient ots, String tableName) { + public GetTableMetaCallable(SyncClientInterface ots, String tableName) { this.ots = ots; this.tableName = tableName; } @@ -21,9 +22,9 @@ public class GetTableMetaCallable implements Callable{ public TableMeta call() throws Exception { DescribeTableRequest describeTableRequest = new DescribeTableRequest(); describeTableRequest.setTableName(tableName); - DescribeTableResult result = ots.describeTable(describeTableRequest); + DescribeTableResponse result = ots.describeTable(describeTableRequest); TableMeta tableMeta = result.getTableMeta(); return tableMeta; } -} +} \ No newline at end of file diff --git a/otsreader/src/main/java/com/alibaba/datax/plugin/reader/otsreader/callable/GetTimeseriesSplitCallable.java b/otsreader/src/main/java/com/alibaba/datax/plugin/reader/otsreader/callable/GetTimeseriesSplitCallable.java new file mode 100644 index 00000000..96521c41 --- /dev/null +++ b/otsreader/src/main/java/com/alibaba/datax/plugin/reader/otsreader/callable/GetTimeseriesSplitCallable.java @@ -0,0 +1,38 @@ +package com.alibaba.datax.plugin.reader.otsreader.callable; + +import com.alicloud.openservices.tablestore.SyncClient; +import com.alicloud.openservices.tablestore.SyncClientInterface; +import com.alicloud.openservices.tablestore.TimeseriesClient; +import com.alicloud.openservices.tablestore.model.timeseries.SplitTimeseriesScanTaskRequest; +import com.alicloud.openservices.tablestore.model.timeseries.SplitTimeseriesScanTaskResponse; +import com.alicloud.openservices.tablestore.model.timeseries.TimeseriesScanSplitInfo; + +import java.util.List; +import java.util.concurrent.Callable; + +public class GetTimeseriesSplitCallable implements Callable> { + + private TimeseriesClient client = null; + private String timeseriesTableName = null; + private String measurementName = null; + private int splitCountHint = 1; + + + public GetTimeseriesSplitCallable(SyncClientInterface ots, String timeseriesTableName, String measurementName, int splitCountHint) { + this.client = ((SyncClient) ots).asTimeseriesClient(); + this.timeseriesTableName = timeseriesTableName; + this.measurementName = measurementName; + this.splitCountHint = splitCountHint; + } + + @Override + public List call() throws Exception { + 
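// Ask the TimeseriesClient to split the scan into roughly splitCountHint parallel ranges; an empty measurementName is simply not set on the request, so the split is not restricted to a single measurement. + 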
SplitTimeseriesScanTaskRequest request = new SplitTimeseriesScanTaskRequest(timeseriesTableName, splitCountHint); + if (measurementName.length() != 0) { + request.setMeasurementName(measurementName); + } + + SplitTimeseriesScanTaskResponse response = client.splitTimeseriesScanTask(request); + return response.getSplitInfos(); + } +} diff --git a/otsreader/src/main/java/com/alibaba/datax/plugin/reader/otsreader/callable/ScanTimeseriesDataCallable.java b/otsreader/src/main/java/com/alibaba/datax/plugin/reader/otsreader/callable/ScanTimeseriesDataCallable.java new file mode 100644 index 00000000..726d0e5d --- /dev/null +++ b/otsreader/src/main/java/com/alibaba/datax/plugin/reader/otsreader/callable/ScanTimeseriesDataCallable.java @@ -0,0 +1,27 @@ +package com.alibaba.datax.plugin.reader.otsreader.callable; + +import com.alicloud.openservices.tablestore.SyncClient; +import com.alicloud.openservices.tablestore.SyncClientInterface; +import com.alicloud.openservices.tablestore.TimeseriesClient; +import com.alicloud.openservices.tablestore.model.timeseries.ScanTimeseriesDataRequest; +import com.alicloud.openservices.tablestore.model.timeseries.ScanTimeseriesDataResponse; +import com.alicloud.openservices.tablestore.model.timeseries.TimeseriesScanSplitInfo; + +import java.util.List; +import java.util.concurrent.Callable; + +public class ScanTimeseriesDataCallable implements Callable { + + private TimeseriesClient client = null; + private ScanTimeseriesDataRequest request = null; + + public ScanTimeseriesDataCallable(SyncClientInterface ots, ScanTimeseriesDataRequest scanTimeseriesDataRequest){ + this.client = ((SyncClient) ots).asTimeseriesClient(); + this.request = scanTimeseriesDataRequest; + } + + @Override + public ScanTimeseriesDataResponse call() throws Exception { + return client.scanTimeseriesData(request); + } +} diff --git a/otsreader/src/main/java/com/alibaba/datax/plugin/reader/otsreader/model/DefaultNoRetry.java b/otsreader/src/main/java/com/alibaba/datax/plugin/reader/otsreader/model/DefaultNoRetry.java new file mode 100644 index 00000000..b286472d --- /dev/null +++ b/otsreader/src/main/java/com/alibaba/datax/plugin/reader/otsreader/model/DefaultNoRetry.java @@ -0,0 +1,32 @@ +package com.alibaba.datax.plugin.reader.otsreader.model; + + +import com.alicloud.openservices.tablestore.model.DefaultRetryStrategy; +import com.alicloud.openservices.tablestore.model.RetryStrategy; + +public class DefaultNoRetry extends DefaultRetryStrategy { + + public DefaultNoRetry() { + super(); + } + + @Override + public RetryStrategy clone() { + return super.clone(); + } + + @Override + public int getRetries() { + return super.getRetries(); + } + + @Override + public boolean shouldRetry(String action, Exception ex) { + return false; + } + + @Override + public long nextPause(String action, Exception ex) { + return super.nextPause(action, ex); + } +} \ No newline at end of file diff --git a/otsreader/src/main/java/com/alibaba/datax/plugin/reader/otsreader/model/OTSColumn.java b/otsreader/src/main/java/com/alibaba/datax/plugin/reader/otsreader/model/OTSColumn.java index 129ccd2f..809f4c38 100644 --- a/otsreader/src/main/java/com/alibaba/datax/plugin/reader/otsreader/model/OTSColumn.java +++ b/otsreader/src/main/java/com/alibaba/datax/plugin/reader/otsreader/model/OTSColumn.java @@ -1,19 +1,18 @@ package com.alibaba.datax.plugin.reader.otsreader.model; -import com.alibaba.datax.common.element.BoolColumn; -import com.alibaba.datax.common.element.BytesColumn; -import com.alibaba.datax.common.element.Column; 
-import com.alibaba.datax.common.element.DoubleColumn; -import com.alibaba.datax.common.element.LongColumn; -import com.alibaba.datax.common.element.StringColumn; -import com.aliyun.openservices.ots.model.ColumnType; +import com.alibaba.datax.common.element.*; +import com.alicloud.openservices.tablestore.model.ColumnType; public class OTSColumn { private String name; private Column value; + private OTSColumnType columnType; + + // 时序数据column配置 private ColumnType valueType; - + private Boolean isTimeseriesTag; + public static enum OTSColumnType { NORMAL, // 普通列 CONST // 常量列 @@ -24,10 +23,9 @@ public class OTSColumn { this.columnType = OTSColumnType.NORMAL; } - private OTSColumn(Column value, ColumnType type) { + private OTSColumn(Column value) { this.value = value; this.columnType = OTSColumnType.CONST; - this.valueType = type; } public static OTSColumn fromNormalColumn(String name) { @@ -39,23 +37,23 @@ public class OTSColumn { } public static OTSColumn fromConstStringColumn(String value) { - return new OTSColumn(new StringColumn(value), ColumnType.STRING); + return new OTSColumn(new StringColumn(value)); } public static OTSColumn fromConstIntegerColumn(long value) { - return new OTSColumn(new LongColumn(value), ColumnType.INTEGER); + return new OTSColumn(new LongColumn(value)); } public static OTSColumn fromConstDoubleColumn(double value) { - return new OTSColumn(new DoubleColumn(value), ColumnType.DOUBLE); + return new OTSColumn(new DoubleColumn(value)); } public static OTSColumn fromConstBoolColumn(boolean value) { - return new OTSColumn(new BoolColumn(value), ColumnType.BOOLEAN); + return new OTSColumn(new BoolColumn(value)); } public static OTSColumn fromConstBytesColumn(byte[] value) { - return new OTSColumn(new BytesColumn(value), ColumnType.BINARY); + return new OTSColumn(new BytesColumn(value)); } public Column getValue() { @@ -65,12 +63,25 @@ public class OTSColumn { public OTSColumnType getColumnType() { return columnType; } - - public ColumnType getValueType() { - return valueType; - } + public String getName() { return name; } -} + + public ColumnType getValueType() { + return valueType; + } + + public void setValueType(ColumnType valueType) { + this.valueType = valueType; + } + + public Boolean getTimeseriesTag() { + return isTimeseriesTag; + } + + public void setTimeseriesTag(Boolean timeseriesTag) { + isTimeseriesTag = timeseriesTag; + } +} \ No newline at end of file diff --git a/otsreader/src/main/java/com/alibaba/datax/plugin/reader/otsreader/model/OTSConf.java b/otsreader/src/main/java/com/alibaba/datax/plugin/reader/otsreader/model/OTSConf.java index 8b109a39..cbfd8f6a 100644 --- a/otsreader/src/main/java/com/alibaba/datax/plugin/reader/otsreader/model/OTSConf.java +++ b/otsreader/src/main/java/com/alibaba/datax/plugin/reader/otsreader/model/OTSConf.java @@ -1,90 +1,245 @@ package com.alibaba.datax.plugin.reader.otsreader.model; +import com.alibaba.datax.common.element.Column; +import com.alibaba.datax.common.util.Configuration; +import com.alibaba.datax.plugin.reader.otsreader.utils.Constant; +import com.alibaba.datax.plugin.reader.otsreader.utils.Key; +import com.alibaba.datax.plugin.reader.otsreader.utils.ParamChecker; +import com.alicloud.openservices.tablestore.model.ColumnType; + import java.util.List; -import com.aliyun.openservices.ots.model.PrimaryKeyValue; - public class OTSConf { - private String endpoint= null; + private String endpoint = null; private String accessId = null; - private String accesskey = null; + private String accessKey = null; private 
String instanceName = null; private String tableName = null; + private OTSRange range = null; + private List column = null; + private OTSMode mode = null; + + @Deprecated + private String metaMode = ""; + + private boolean newVersion = false; + /** + * 以下配置仅用于timeseries数据读取 + */ + private boolean isTimeseriesTable = false; + private String measurementName = null; + /** + * 以上配置仅用于timeseries数据读取 + */ + private OTSMultiVersionConf multi = null; - private List rangeBegin = null; - private List rangeEnd = null; - private List rangeSplit = null; - - private List columns = null; - - private int retry; - private int sleepInMilliSecond; - + private int retry = Constant.ConfigDefaultValue.RETRY; + private int retryPauseInMillisecond = Constant.ConfigDefaultValue.RETRY_PAUSE_IN_MILLISECOND; + private int ioThreadCount = Constant.ConfigDefaultValue.IO_THREAD_COUNT; + private int maxConnectionCount = Constant.ConfigDefaultValue.MAX_CONNECTION_COUNT; + private int socketTimeoutInMillisecond = Constant.ConfigDefaultValue.SOCKET_TIMEOUT_IN_MILLISECOND; + private int connectTimeoutInMillisecond = Constant.ConfigDefaultValue.CONNECT_TIMEOUT_IN_MILLISECOND; + + public int getIoThreadCount() { + return ioThreadCount; + } + + public void setIoThreadCount(int ioThreadCount) { + this.ioThreadCount = ioThreadCount; + } + + public int getMaxConnectCount() { + return maxConnectionCount; + } + + public void setMaxConnectCount(int maxConnectCount) { + this.maxConnectionCount = maxConnectCount; + } + + public int getSocketTimeoutInMillisecond() { + return socketTimeoutInMillisecond; + } + + public void setSocketTimeoutInMillisecond(int socketTimeoutInMillisecond) { + this.socketTimeoutInMillisecond = socketTimeoutInMillisecond; + } + + public int getConnectTimeoutInMillisecond() { + return connectTimeoutInMillisecond; + } + + public void setConnectTimeoutInMillisecond(int connectTimeoutInMillisecond) { + this.connectTimeoutInMillisecond = connectTimeoutInMillisecond; + } + + public int getRetry() { + return retry; + } + + public void setRetry(int retry) { + this.retry = retry; + } + + public int getRetryPauseInMillisecond() { + return retryPauseInMillisecond; + } + + public void setRetryPauseInMillisecond(int sleepInMillisecond) { + this.retryPauseInMillisecond = sleepInMillisecond; + } + public String getEndpoint() { return endpoint; } + public void setEndpoint(String endpoint) { this.endpoint = endpoint; } + public String getAccessId() { return accessId; } + public void setAccessId(String accessId) { this.accessId = accessId; } - public String getAccesskey() { - return accesskey; + + public String getAccessKey() { + return accessKey; } - public void setAccesskey(String accesskey) { - this.accesskey = accesskey; + + public void setAccessKey(String accessKey) { + this.accessKey = accessKey; } + public String getInstanceName() { return instanceName; } + public void setInstanceName(String instanceName) { this.instanceName = instanceName; } + public String getTableName() { return tableName; } + public void setTableName(String tableName) { this.tableName = tableName; } - public List getColumns() { - return columns; + public OTSRange getRange() { + return range; } - public void setColumns(List columns) { - this.columns = columns; + + public void setRange(OTSRange range) { + this.range = range; } - public int getRetry() { - return retry; + + public OTSMode getMode() { + return mode; } - public void setRetry(int retry) { - this.retry = retry; + + public void setMode(OTSMode mode) { + this.mode = mode; } - public int 
getSleepInMilliSecond() { - return sleepInMilliSecond; + + public OTSMultiVersionConf getMulti() { + return multi; } - public void setSleepInMilliSecond(int sleepInMilliSecond) { - this.sleepInMilliSecond = sleepInMilliSecond; + + public void setMulti(OTSMultiVersionConf multi) { + this.multi = multi; } - public List getRangeBegin() { - return rangeBegin; + + public List getColumn() { + return column; } - public void setRangeBegin(List rangeBegin) { - this.rangeBegin = rangeBegin; + + public void setColumn(List column) { + this.column = column; } - public List getRangeEnd() { - return rangeEnd; + + public boolean isNewVersion() { + return newVersion; } - public void setRangeEnd(List rangeEnd) { - this.rangeEnd = rangeEnd; + + public void setNewVersion(boolean newVersion) { + this.newVersion = newVersion; } - public List getRangeSplit() { - return rangeSplit; + + @Deprecated + public String getMetaMode() { + return metaMode; } - public void setRangeSplit(List rangeSplit) { - this.rangeSplit = rangeSplit; + + @Deprecated + public void setMetaMode(String metaMode) { + this.metaMode = metaMode; + } + + public boolean isTimeseriesTable() { + return isTimeseriesTable; + } + + public void setTimeseriesTable(boolean timeseriesTable) { + isTimeseriesTable = timeseriesTable; + } + + public String getMeasurementName() { + return measurementName; + } + + public void setMeasurementName(String measurementName) { + this.measurementName = measurementName; + } + + public static OTSConf load(Configuration param) throws OTSCriticalException { + OTSConf c = new OTSConf(); + + // account + c.setEndpoint(ParamChecker.checkStringAndGet(param, Key.OTS_ENDPOINT, true)); + c.setAccessId(ParamChecker.checkStringAndGet(param, Key.OTS_ACCESSID, true)); + c.setAccessKey(ParamChecker.checkStringAndGet(param, Key.OTS_ACCESSKEY, true)); + c.setInstanceName(ParamChecker.checkStringAndGet(param, Key.OTS_INSTANCE_NAME, true)); + c.setTableName(ParamChecker.checkStringAndGet(param, Key.TABLE_NAME, true)); + + c.setRetry(param.getInt(Constant.ConfigKey.RETRY, Constant.ConfigDefaultValue.RETRY)); + c.setRetryPauseInMillisecond(param.getInt(Constant.ConfigKey.RETRY_PAUSE_IN_MILLISECOND, Constant.ConfigDefaultValue.RETRY_PAUSE_IN_MILLISECOND)); + c.setIoThreadCount(param.getInt(Constant.ConfigKey.IO_THREAD_COUNT, Constant.ConfigDefaultValue.IO_THREAD_COUNT)); + c.setMaxConnectCount(param.getInt(Constant.ConfigKey.MAX_CONNECTION_COUNT, Constant.ConfigDefaultValue.MAX_CONNECTION_COUNT)); + c.setSocketTimeoutInMillisecond(param.getInt(Constant.ConfigKey.SOCKET_TIMEOUTIN_MILLISECOND, Constant.ConfigDefaultValue.SOCKET_TIMEOUT_IN_MILLISECOND)); + c.setConnectTimeoutInMillisecond(param.getInt(Constant.ConfigKey.CONNECT_TIMEOUT_IN_MILLISECOND, Constant.ConfigDefaultValue.CONNECT_TIMEOUT_IN_MILLISECOND)); + + // range + c.setRange(ParamChecker.checkRangeAndGet(param)); + + // mode 可选参数 + c.setMode(ParamChecker.checkModeAndGet(param)); + //isNewVersion 可选参数 + c.setNewVersion(param.getBool(Key.NEW_VERSION, false)); + // metaMode 旧版本配置 + c.setMetaMode(param.getString(Key.META_MODE, "")); + + + + // 读时序表配置项 + c.setTimeseriesTable(param.getBool(Key.IS_TIMESERIES_TABLE, false)); + // column + if(!c.isTimeseriesTable()){ + //非时序表 + c.setColumn(ParamChecker.checkOTSColumnAndGet(param, c.getMode())); + } + else{ + // 时序表 + c.setMeasurementName(param.getString(Key.MEASUREMENT_NAME, "")); + c.setColumn(ParamChecker.checkTimeseriesColumnAndGet(param)); + ParamChecker.checkTimeseriesMode(c.getMode(), c.isNewVersion()); + } + + if (c.getMode() == 
OTSMode.MULTI_VERSION) { + c.setMulti(OTSMultiVersionConf.load(param)); + } + return c; } } diff --git a/otsreader/src/main/java/com/alibaba/datax/plugin/reader/otsreader/model/OTSCriticalException.java b/otsreader/src/main/java/com/alibaba/datax/plugin/reader/otsreader/model/OTSCriticalException.java new file mode 100644 index 00000000..f02346bc --- /dev/null +++ b/otsreader/src/main/java/com/alibaba/datax/plugin/reader/otsreader/model/OTSCriticalException.java @@ -0,0 +1,24 @@ +package com.alibaba.datax.plugin.reader.otsreader.model; + +/** + * 插件错误异常,该异常主要用于描述插件的异常退出 + * @author redchen + */ +public class OTSCriticalException extends Exception{ + + private static final long serialVersionUID = 5820460098894295722L; + + public OTSCriticalException() {} + + public OTSCriticalException(String message) { + super(message); + } + + public OTSCriticalException(Throwable a) { + super(a); + } + + public OTSCriticalException(String message, Throwable a) { + super(message, a); + } +} diff --git a/otsreader/src/main/java/com/alibaba/datax/plugin/reader/otsreader/model/OTSErrorCode.java b/otsreader/src/main/java/com/alibaba/datax/plugin/reader/otsreader/model/OTSErrorCode.java new file mode 100644 index 00000000..0c537fce --- /dev/null +++ b/otsreader/src/main/java/com/alibaba/datax/plugin/reader/otsreader/model/OTSErrorCode.java @@ -0,0 +1,115 @@ +/** + * Copyright (C) Alibaba Cloud Computing + * All rights reserved. + * + * 版权所有 (C)阿里云计算有限公司 + */ + +package com.alibaba.datax.plugin.reader.otsreader.model; + +/** + * 表示来自开放结构化数据服务(Open Table Service,OTS)的错误代码。 + * + */ +public class OTSErrorCode { + /** + * 用户身份验证失败。 + */ + public static final String AUTHORIZATION_FAILURE = "OTSAuthFailed"; + + /** + * 服务器内部错误。 + */ + public static final String INTERNAL_SERVER_ERROR = "OTSInternalServerError"; + + /** + * 参数错误。 + */ + public static final String INVALID_PARAMETER = "OTSParameterInvalid"; + + /** + * 整个请求过大。 + */ + public static final String REQUEST_TOO_LARGE = "OTSRequestBodyTooLarge"; + + /** + * 客户端请求超时。 + */ + public static final String REQUEST_TIMEOUT = "OTSRequestTimeout"; + + /** + * 用户的配额已经用满。 + */ + public static final String QUOTA_EXHAUSTED = "OTSQuotaExhausted"; + + /** + * 内部服务器发生failover,导致表的部分分区不可服务。 + */ + public static final String PARTITION_UNAVAILABLE = "OTSPartitionUnavailable"; + + /** + * 表刚被创建还无法立马提供服务。 + */ + public static final String TABLE_NOT_READY = "OTSTableNotReady"; + + /** + * 请求的表不存在。 + */ + public static final String OBJECT_NOT_EXIST = "OTSObjectNotExist"; + + /** + * 请求创建的表已经存在。 + */ + public static final String OBJECT_ALREADY_EXIST = "OTSObjectAlreadyExist"; + + /** + * 多个并发的请求写同一行数据,导致冲突。 + */ + public static final String ROW_OPEARTION_CONFLICT = "OTSRowOperationConflict"; + + /** + * 主键不匹配。 + */ + public static final String INVALID_PK = "OTSInvalidPK"; + + /** + * 读写能力调整过于频繁。 + */ + public static final String TOO_FREQUENT_RESERVED_THROUGHPUT_ADJUSTMENT = "OTSTooFrequentReservedThroughputAdjustment"; + + /** + * 该行总列数超出限制。 + */ + public static final String OUT_OF_COLUMN_COUNT_LIMIT = "OTSOutOfColumnCountLimit"; + + /** + * 该行所有列数据大小总和超出限制。 + */ + public static final String OUT_OF_ROW_SIZE_LIMIT = "OTSOutOfRowSizeLimit"; + + /** + * 剩余预留读写能力不足。 + */ + public static final String NOT_ENOUGH_CAPACITY_UNIT = "OTSNotEnoughCapacityUnit"; + + /** + * 预查条件检查失败。 + */ + public static final String CONDITION_CHECK_FAIL = "OTSConditionCheckFail"; + + /** + * 在OTS内部操作超时。 + */ + public static final String STORAGE_TIMEOUT = "OTSTimeout"; + + /** + * 在OTS内部有服务器不可访问。 + */ + public 
static final String SERVER_UNAVAILABLE = "OTSServerUnavailable"; + + /** + * OTS内部服务器繁忙。 + */ + public static final String SERVER_BUSY = "OTSServerBusy"; + +} \ No newline at end of file diff --git a/otsreader/src/main/java/com/alibaba/datax/plugin/reader/otsreader/model/OTSMode.java b/otsreader/src/main/java/com/alibaba/datax/plugin/reader/otsreader/model/OTSMode.java new file mode 100644 index 00000000..88c6ee67 --- /dev/null +++ b/otsreader/src/main/java/com/alibaba/datax/plugin/reader/otsreader/model/OTSMode.java @@ -0,0 +1,6 @@ +package com.alibaba.datax.plugin.reader.otsreader.model; + +public enum OTSMode { + NORMAL, + MULTI_VERSION +} diff --git a/otsreader/src/main/java/com/alibaba/datax/plugin/reader/otsreader/model/OTSMultiVersionConf.java b/otsreader/src/main/java/com/alibaba/datax/plugin/reader/otsreader/model/OTSMultiVersionConf.java new file mode 100644 index 00000000..72a8e1b7 --- /dev/null +++ b/otsreader/src/main/java/com/alibaba/datax/plugin/reader/otsreader/model/OTSMultiVersionConf.java @@ -0,0 +1,35 @@ +package com.alibaba.datax.plugin.reader.otsreader.model; + +import com.alibaba.datax.common.util.Configuration; +import com.alibaba.datax.plugin.reader.otsreader.utils.Constant; +import com.alibaba.datax.plugin.reader.otsreader.utils.ParamChecker; +import com.alicloud.openservices.tablestore.model.TimeRange; + +public class OTSMultiVersionConf { + + private TimeRange timeRange = null; + private int maxVersion = -1; + + public TimeRange getTimeRange() { + return timeRange; + } + + public void setTimeRange(TimeRange timeRange) { + this.timeRange = timeRange; + } + + public int getMaxVersion() { + return maxVersion; + } + + public void setMaxVersion(int maxVersion) { + this.maxVersion = maxVersion; + } + + public static OTSMultiVersionConf load(Configuration param) throws OTSCriticalException { + OTSMultiVersionConf conf = new OTSMultiVersionConf(); + conf.setTimeRange(ParamChecker.checkTimeRangeAndGet(param)); + conf.setMaxVersion(param.getInt(Constant.ConfigKey.MAX_VERSION, Constant.ConfigDefaultValue.MAX_VERSION)); + return conf; + } +} diff --git a/otsreader/src/main/java/com/alibaba/datax/plugin/reader/otsreader/model/OTSPrimaryKeyColumn.java b/otsreader/src/main/java/com/alibaba/datax/plugin/reader/otsreader/model/OTSPrimaryKeyColumn.java index eaec50ce..44a37c0c 100644 --- a/otsreader/src/main/java/com/alibaba/datax/plugin/reader/otsreader/model/OTSPrimaryKeyColumn.java +++ b/otsreader/src/main/java/com/alibaba/datax/plugin/reader/otsreader/model/OTSPrimaryKeyColumn.java @@ -15,8 +15,41 @@ public class OTSPrimaryKeyColumn { public PrimaryKeyType getType() { return type; } + + public com.alicloud.openservices.tablestore.model.PrimaryKeyType getType(Boolean newVersion) { + com.alicloud.openservices.tablestore.model.PrimaryKeyType res = null; + switch (this.type){ + case BINARY: + res = com.alicloud.openservices.tablestore.model.PrimaryKeyType.BINARY; + break; + case INTEGER: + res = com.alicloud.openservices.tablestore.model.PrimaryKeyType.INTEGER; + break; + case STRING: + default: + res = com.alicloud.openservices.tablestore.model.PrimaryKeyType.STRING; + break; + } + return res; + } + public void setType(PrimaryKeyType type) { this.type = type; } + + public void setType(com.alicloud.openservices.tablestore.model.PrimaryKeyType type) { + switch (type){ + case BINARY: + this.type = PrimaryKeyType.BINARY; + break; + case INTEGER: + this.type = PrimaryKeyType.INTEGER; + break; + case STRING: + default: + this.type = PrimaryKeyType.STRING; + break; + } + } } diff --git 
a/otsreader/src/main/java/com/alibaba/datax/plugin/reader/otsreader/model/OTSRange.java b/otsreader/src/main/java/com/alibaba/datax/plugin/reader/otsreader/model/OTSRange.java index 8ebfcf7e..eb3095e6 100644 --- a/otsreader/src/main/java/com/alibaba/datax/plugin/reader/otsreader/model/OTSRange.java +++ b/otsreader/src/main/java/com/alibaba/datax/plugin/reader/otsreader/model/OTSRange.java @@ -1,29 +1,31 @@ package com.alibaba.datax.plugin.reader.otsreader.model; -import com.aliyun.openservices.ots.model.RowPrimaryKey; +import com.alicloud.openservices.tablestore.model.PrimaryKeyColumn; + +import java.util.List; + public class OTSRange { + private List begin = null; + private List end = null; + private List split = null; - private RowPrimaryKey begin = null; - private RowPrimaryKey end = null; - - public OTSRange() {} - - public OTSRange(RowPrimaryKey begin, RowPrimaryKey end) { - this.begin = begin; - this.end = end; - } - - public RowPrimaryKey getBegin() { + public List getBegin() { return begin; } - public void setBegin(RowPrimaryKey begin) { + public void setBegin(List begin) { this.begin = begin; } - public RowPrimaryKey getEnd() { + public List getEnd() { return end; } - public void setEnd(RowPrimaryKey end) { + public void setEnd(List end) { this.end = end; } + public List getSplit() { + return split; + } + public void setSplit(List split) { + this.split = split; + } } diff --git a/otsreader/src/main/java/com/alibaba/datax/plugin/reader/otsreader/utils/Common.java b/otsreader/src/main/java/com/alibaba/datax/plugin/reader/otsreader/utils/Common.java index 7bb3f52e..90065d5d 100644 --- a/otsreader/src/main/java/com/alibaba/datax/plugin/reader/otsreader/utils/Common.java +++ b/otsreader/src/main/java/com/alibaba/datax/plugin/reader/otsreader/utils/Common.java @@ -1,26 +1,85 @@ package com.alibaba.datax.plugin.reader.otsreader.utils; +import com.alibaba.datax.plugin.reader.otsreader.model.OTSColumn; +import com.alibaba.datax.plugin.reader.otsreader.model.OTSCriticalException; +import com.alibaba.datax.plugin.reader.otsreader.model.OTSPrimaryKeyColumn; +import com.alicloud.openservices.tablestore.model.*; +import com.alicloud.openservices.tablestore.model.timeseries.ScanTimeseriesDataResponse; + +import java.lang.reflect.Field; import java.util.ArrayList; +import java.util.HashMap; import java.util.List; import java.util.Map; -import com.alibaba.datax.common.element.BoolColumn; -import com.alibaba.datax.common.element.BytesColumn; -import com.alibaba.datax.common.element.DoubleColumn; -import com.alibaba.datax.common.element.LongColumn; -import com.alibaba.datax.common.element.Record; -import com.alibaba.datax.common.element.StringColumn; -import com.alibaba.datax.plugin.reader.otsreader.model.OTSColumn; -import com.alibaba.datax.plugin.reader.otsreader.model.OTSPrimaryKeyColumn; -import com.aliyun.openservices.ots.ClientException; -import com.aliyun.openservices.ots.OTSException; -import com.aliyun.openservices.ots.model.ColumnValue; -import com.aliyun.openservices.ots.model.PrimaryKeyValue; -import com.aliyun.openservices.ots.model.Row; -import com.aliyun.openservices.ots.model.RowPrimaryKey; -import com.aliyun.openservices.ots.model.TableMeta; - public class Common { + public static List toColumnToGet(List columns, TableMeta meta) { + Map pk = meta.getPrimaryKeyMap(); + List names = new ArrayList(); + for (OTSColumn c : columns) { + if (c.getColumnType() == OTSColumn.OTSColumnType.NORMAL && !pk.containsKey(c.getName())) { + names.add(c.getName()); + } + } + return names; + } + + 
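// Schema helpers: getPrimaryKeyNameList collects the primary key column names declared in the table meta, and getPartitionKey below picks the first primary key column, which serves as the partition key. + 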
public static List getPrimaryKeyNameList(TableMeta meta) { + List names = new ArrayList(); + names.addAll(meta.getPrimaryKeyMap().keySet()); + return names; + } + + public static OTSPrimaryKeyColumn getPartitionKey(TableMeta meta) { + List keys = new ArrayList(); + keys.addAll(meta.getPrimaryKeyMap().keySet()); + + String key = keys.get(0); + + OTSPrimaryKeyColumn col = new OTSPrimaryKeyColumn(); + col.setName(key); + col.setType(meta.getPrimaryKeyMap().get(key)); + return col; + } + + public static Direction getDirection(List begin, List end) throws OTSCriticalException { + int cmp = CompareHelper.comparePrimaryKeyColumnList(begin, end); + if (cmp < 0) { + return Direction.FORWARD; + } else if (cmp > 0) { + return Direction.BACKWARD; + } else { + throw new OTSCriticalException("Bug branch, the begin of range equals end of range."); + } + } + + public static int compareRangeBeginAndEnd(TableMeta meta, List begin, List end) { + if (begin.size() != end.size()) { + throw new IllegalArgumentException("Input size of begin not equal size of end, begin size : " + begin.size() + + ", end size : " + end.size() + "."); + } + + Map beginMap = new HashMap<>(); + Map endMap = new HashMap<>(); + + for(PrimaryKeyColumn primaryKeyColumn : begin){ + beginMap.put(primaryKeyColumn.getName(), primaryKeyColumn.getValue()); + } + for(PrimaryKeyColumn primaryKeyColumn : end){ + endMap.put(primaryKeyColumn.getName(), primaryKeyColumn.getValue()); + } + + for (String key : meta.getPrimaryKeyMap().keySet()) { + PrimaryKeyValue v1 = beginMap.get(key); + PrimaryKeyValue v2 = endMap.get(key); + int cmp = primaryKeyValueCmp(v1, v2); + if (cmp != 0) { + return cmp; + } + } + return 0; + } + public static int primaryKeyValueCmp(PrimaryKeyValue v1, PrimaryKeyValue v2) { if (v1.getType() != null && v2.getType() != null) { @@ -29,14 +88,14 @@ public class Common { "Not same column type, column1:" + v1.getType() + ", column2:" + v2.getType()); } switch (v1.getType()) { - case INTEGER: - Long l1 = Long.valueOf(v1.asLong()); - Long l2 = Long.valueOf(v2.asLong()); - return l1.compareTo(l2); - case STRING: - return v1.asString().compareTo(v2.asString()); - default: - throw new IllegalArgumentException("Unsuporrt compare the type: " + v1.getType() + "."); + case INTEGER: + Long l1 = Long.valueOf(v1.asLong()); + Long l2 = Long.valueOf(v2.asLong()); + return l1.compareTo(l2); + case STRING: + return v1.asString().compareTo(v2.asString()); + default: + throw new IllegalArgumentException("Unsuporrt compare the type: " + v1.getType() + "."); } } else { if (v1 == v2) { @@ -46,116 +105,31 @@ public class Common { return -1; } else if (v1 == PrimaryKeyValue.INF_MAX) { return 1; - } + } if (v2 == PrimaryKeyValue.INF_MAX) { return -1; } else if (v2 == PrimaryKeyValue.INF_MIN) { return 1; - } - } - } - return 0; - } - - public static OTSPrimaryKeyColumn getPartitionKey(TableMeta meta) { - List keys = new ArrayList(); - keys.addAll(meta.getPrimaryKey().keySet()); - - String key = keys.get(0); - - OTSPrimaryKeyColumn col = new OTSPrimaryKeyColumn(); - col.setName(key); - col.setType(meta.getPrimaryKey().get(key)); - return col; - } - - public static List getPrimaryKeyNameList(TableMeta meta) { - List names = new ArrayList(); - names.addAll(meta.getPrimaryKey().keySet()); - return names; - } - - public static int compareRangeBeginAndEnd(TableMeta meta, RowPrimaryKey begin, RowPrimaryKey end) { - if (begin.getPrimaryKey().size() != end.getPrimaryKey().size()) { - throw new IllegalArgumentException("Input size of begin not equal size of end, 
begin size : " + begin.getPrimaryKey().size() + - ", end size : " + end.getPrimaryKey().size() + "."); - } - for (String key : meta.getPrimaryKey().keySet()) { - PrimaryKeyValue v1 = begin.getPrimaryKey().get(key); - PrimaryKeyValue v2 = end.getPrimaryKey().get(key); - int cmp = primaryKeyValueCmp(v1, v2); - if (cmp != 0) { - return cmp; - } - } - return 0; - } - - public static List getNormalColumnNameList(List columns) { - List normalColumns = new ArrayList(); - for (OTSColumn col : columns) { - if (col.getColumnType() == OTSColumn.OTSColumnType.NORMAL) { - normalColumns.add(col.getName()); - } - } - return normalColumns; - } - - public static Record parseRowToLine(Row row, List columns, Record line) { - Map values = row.getColumns(); - for (OTSColumn col : columns) { - if (col.getColumnType() == OTSColumn.OTSColumnType.CONST) { - line.addColumn(col.getValue()); - } else { - ColumnValue v = values.get(col.getName()); - if (v == null) { - line.addColumn(new StringColumn(null)); - } else { - switch(v.getType()) { - case STRING: line.addColumn(new StringColumn(v.asString())); break; - case INTEGER: line.addColumn(new LongColumn(v.asLong())); break; - case DOUBLE: line.addColumn(new DoubleColumn(v.asDouble())); break; - case BOOLEAN: line.addColumn(new BoolColumn(v.asBoolean())); break; - case BINARY: line.addColumn(new BytesColumn(v.asBinary())); break; - default: - throw new IllegalArgumentException("Unsuporrt tranform the type: " + col.getValue().getType() + "."); - } } } } - return line; + return 0; } - - public static String getDetailMessage(Exception exception) { - if (exception instanceof OTSException) { - OTSException e = (OTSException) exception; - return "OTSException[ErrorCode:" + e.getErrorCode() + ", ErrorMessage:" + e.getMessage() + ", RequestId:" + e.getRequestId() + "]"; - } else if (exception instanceof ClientException) { - ClientException e = (ClientException) exception; - return "ClientException[ErrorCode:" + e.getErrorCode() + ", ErrorMessage:" + e.getMessage() + "]"; - } else if (exception instanceof IllegalArgumentException) { - IllegalArgumentException e = (IllegalArgumentException) exception; - return "IllegalArgumentException[ErrorMessage:" + e.getMessage() + "]"; - } else { - return "Exception[ErrorMessage:" + exception.getMessage() + "]"; - } - } - - public static long getDelaySendMillinSeconds(int hadRetryTimes, int initSleepInMilliSecond) { - if (hadRetryTimes <= 0) { - return 0; - } - - int sleepTime = initSleepInMilliSecond; - for (int i = 1; i < hadRetryTimes; i++) { - sleepTime += sleepTime; - if (sleepTime > 30000) { - sleepTime = 30000; + public static void checkTableStoreSDKVersion() throws OTSCriticalException { + Field[] fields = ScanTimeseriesDataResponse.class.getFields(); + String sdkVersion = null; + for (Field f : fields){ + if (f.getName().equals("_VERSION_")){ + sdkVersion = ScanTimeseriesDataResponse._VERSION_; break; - } + } + } + if (sdkVersion == null){ + throw new OTSCriticalException("Check ots java SDK failed. Please check the version of tableStore maven dependency."); + }else if (Integer.parseInt(sdkVersion) < 20230111){ + throw new OTSCriticalException("Check tableStore java SDK failed. 
The expected version number is greater than 20230111, actually version : " + sdkVersion + "."); } - return sleepTime; } } diff --git a/otsreader/src/main/java/com/alibaba/datax/plugin/reader/otsreader/utils/CommonOld.java b/otsreader/src/main/java/com/alibaba/datax/plugin/reader/otsreader/utils/CommonOld.java new file mode 100644 index 00000000..d5c565f4 --- /dev/null +++ b/otsreader/src/main/java/com/alibaba/datax/plugin/reader/otsreader/utils/CommonOld.java @@ -0,0 +1,112 @@ +package com.alibaba.datax.plugin.reader.otsreader.utils; + +import java.util.ArrayList; +import java.util.List; +import java.util.Map; + +import com.alibaba.datax.common.element.BoolColumn; +import com.alibaba.datax.common.element.BytesColumn; +import com.alibaba.datax.common.element.DoubleColumn; +import com.alibaba.datax.common.element.LongColumn; +import com.alibaba.datax.common.element.Record; +import com.alibaba.datax.common.element.StringColumn; +import com.alibaba.datax.plugin.reader.otsreader.model.OTSColumn; +import com.alibaba.datax.plugin.reader.otsreader.model.OTSPrimaryKeyColumn; +import com.aliyun.openservices.ots.ClientException; +import com.aliyun.openservices.ots.OTSException; +import com.aliyun.openservices.ots.model.ColumnValue; +import com.aliyun.openservices.ots.model.PrimaryKeyValue; +import com.aliyun.openservices.ots.model.Row; +import com.aliyun.openservices.ots.model.RowPrimaryKey; +import com.aliyun.openservices.ots.model.TableMeta; + +public class CommonOld { + public static int primaryKeyValueCmp(PrimaryKeyValue v1, PrimaryKeyValue v2) { + if (v1.getType() != null && v2.getType() != null) { + if (v1.getType() != v2.getType()) { + throw new IllegalArgumentException( + "Not same column type, column1:" + v1.getType() + ", column2:" + v2.getType()); + } + switch (v1.getType()) { + case INTEGER: + Long l1 = Long.valueOf(v1.asLong()); + Long l2 = Long.valueOf(v2.asLong()); + return l1.compareTo(l2); + case STRING: + return v1.asString().compareTo(v2.asString()); + default: + throw new IllegalArgumentException("Unsuporrt compare the type: " + v1.getType() + "."); + } + } else { + if (v1 == v2) { + return 0; + } else { + if (v1 == PrimaryKeyValue.INF_MIN) { + return -1; + } else if (v1 == PrimaryKeyValue.INF_MAX) { + return 1; + } + + if (v2 == PrimaryKeyValue.INF_MAX) { + return -1; + } else if (v2 == PrimaryKeyValue.INF_MIN) { + return 1; + } + } + } + return 0; + } + + + public static List getNormalColumnNameList(List columns) { + List normalColumns = new ArrayList(); + for (OTSColumn col : columns) { + if (col.getColumnType() == OTSColumn.OTSColumnType.NORMAL) { + normalColumns.add(col.getName()); + } + } + return normalColumns; + } + + public static Record parseRowToLine(Row row, List columns, Record line) { + Map values = row.getColumns(); + for (OTSColumn col : columns) { + if (col.getColumnType() == OTSColumn.OTSColumnType.CONST) { + line.addColumn(col.getValue()); + } else { + ColumnValue v = values.get(col.getName()); + if (v == null) { + line.addColumn(new StringColumn(null)); + } else { + switch(v.getType()) { + case STRING: line.addColumn(new StringColumn(v.asString())); break; + case INTEGER: line.addColumn(new LongColumn(v.asLong())); break; + case DOUBLE: line.addColumn(new DoubleColumn(v.asDouble())); break; + case BOOLEAN: line.addColumn(new BoolColumn(v.asBoolean())); break; + case BINARY: line.addColumn(new BytesColumn(v.asBinary())); break; + default: + throw new IllegalArgumentException("Unsuporrt tranform the type: " + col.getValue().getType() + "."); + } + } + } + } + 
return line; + } + + public static long getDelaySendMillinSeconds(int hadRetryTimes, int initSleepInMilliSecond) { + + if (hadRetryTimes <= 0) { + return 0; + } + + int sleepTime = initSleepInMilliSecond; + for (int i = 1; i < hadRetryTimes; i++) { + sleepTime += sleepTime; + if (sleepTime > 30000) { + sleepTime = 30000; + break; + } + } + return sleepTime; + } +} diff --git a/otsreader/src/main/java/com/alibaba/datax/plugin/reader/otsreader/utils/CompareHelper.java b/otsreader/src/main/java/com/alibaba/datax/plugin/reader/otsreader/utils/CompareHelper.java new file mode 100644 index 00000000..19e06421 --- /dev/null +++ b/otsreader/src/main/java/com/alibaba/datax/plugin/reader/otsreader/utils/CompareHelper.java @@ -0,0 +1,37 @@ +package com.alibaba.datax.plugin.reader.otsreader.utils; + +import com.alicloud.openservices.tablestore.model.PrimaryKeyColumn; + +import java.util.List; + + +public class CompareHelper { + /** + * 比较PrimaryKeyColumn List的大小 + * 返回 + * -1 表示before小于after + * 0 表示before等于after + * 1 表示before大于after + * + * @param before + * @param after + * @return + */ + public static int comparePrimaryKeyColumnList(List before, List after) { + int size = before.size() < after.size() ? before.size() : after.size(); + + for (int i = 0; i < size; i++) { + int cmp = before.get(i).compareTo(after.get(i)); + if (cmp != 0) { + return cmp; + } + } + + if (before.size() < after.size() ) { + return -1; + } else if (before.size() > after.size() ) { + return 1; + } + return 0; + } +} diff --git a/otsreader/src/main/java/com/alibaba/datax/plugin/reader/otsreader/utils/Constant.java b/otsreader/src/main/java/com/alibaba/datax/plugin/reader/otsreader/utils/Constant.java new file mode 100644 index 00000000..90273bfb --- /dev/null +++ b/otsreader/src/main/java/com/alibaba/datax/plugin/reader/otsreader/utils/Constant.java @@ -0,0 +1,92 @@ +package com.alibaba.datax.plugin.reader.otsreader.utils; + +public class Constant { + /** + * Json中的Key名字定义 + */ +public class ConfigKey { + public static final String CONF = "conf"; + public static final String RANGE = "range"; + public static final String META = "meta"; + public static final String SPLIT_INFO = "splitInfo"; + + public static final String TIME_RANGE = "timeRange"; + public static final String MAX_VERSION = "maxVersion"; + + public static final String RETRY = "maxRetryTime"; + public static final String RETRY_PAUSE_IN_MILLISECOND = "retryPauseInMillisecond"; + public static final String IO_THREAD_COUNT = "ioThreadCount"; + public static final String MAX_CONNECTION_COUNT = "maxConnectionCount"; + public static final String SOCKET_TIMEOUTIN_MILLISECOND = "socketTimeoutInMillisecond"; + public static final String CONNECT_TIMEOUT_IN_MILLISECOND = "connectTimeoutInMillisecond"; + + public class Range { + public static final String BEGIN = "begin"; + public static final String END = "end"; + public static final String SPLIT = "split"; + }; + + public class PrimaryKeyColumn { + public static final String TYPE = "type"; + public static final String VALUE = "value"; + }; + + public class TimeseriesPKColumn { + public static final String MEASUREMENT_NAME = "_m_name"; + public static final String DATA_SOURCE = "_data_source"; + public static final String TAGS = "_tags"; + public static final String TIME = "_time"; + } + + public class Column { + public static final String NAME = "name"; + public static final String TYPE = "type"; + public static final String VALUE = "value"; + public static final String IS_TAG = "is_timeseries_tag"; + }; + + public class 
TimeRange { + public static final String BEGIN = "begin"; + public static final String END = "end"; + } + }; + + /** + * 定义的配置文件中value type中可取的值 + */ + public class ValueType { + public static final String INF_MIN = "INF_MIN"; + public static final String INF_MAX = "INF_MAX"; + public static final String STRING = "string"; + public static final String INTEGER = "int"; + public static final String BINARY = "binary"; + public static final String DOUBLE = "double"; + public static final String BOOLEAN = "bool"; + }; + + /** + * 全局默认常量定义 + */ + public class ConfigDefaultValue { + public static final int RETRY = 18; + public static final int RETRY_PAUSE_IN_MILLISECOND = 100; + public static final int IO_THREAD_COUNT = 1; + public static final int MAX_CONNECTION_COUNT = 1; + public static final int SOCKET_TIMEOUT_IN_MILLISECOND = 10000; + public static final int CONNECT_TIMEOUT_IN_MILLISECOND = 10000; + + public static final int MAX_VERSION = Integer.MAX_VALUE; + + public static final String DEFAULT_NAME = "DEFAULT_NAME"; + + public class Mode { + public static final String NORMAL = "normal"; + public static final String MULTI_VERSION = "multiVersion"; + } + + public class TimeRange { + public static final long MIN = 0; + public static final long MAX = Long.MAX_VALUE; + } + } +} diff --git a/otsreader/src/main/java/com/alibaba/datax/plugin/reader/otsreader/utils/GsonParser.java b/otsreader/src/main/java/com/alibaba/datax/plugin/reader/otsreader/utils/GsonParser.java index a82f3350..205f536d 100644 --- a/otsreader/src/main/java/com/alibaba/datax/plugin/reader/otsreader/utils/GsonParser.java +++ b/otsreader/src/main/java/com/alibaba/datax/plugin/reader/otsreader/utils/GsonParser.java @@ -1,23 +1,26 @@ package com.alibaba.datax.plugin.reader.otsreader.utils; -import com.alibaba.datax.plugin.reader.otsreader.adaptor.OTSColumnAdaptor; +import com.alibaba.datax.common.element.Column; +import com.alibaba.datax.plugin.reader.otsreader.adaptor.ColumnAdaptor; import com.alibaba.datax.plugin.reader.otsreader.adaptor.PrimaryKeyValueAdaptor; -import com.alibaba.datax.plugin.reader.otsreader.model.OTSColumn; import com.alibaba.datax.plugin.reader.otsreader.model.OTSConf; import com.alibaba.datax.plugin.reader.otsreader.model.OTSRange; +import com.alicloud.openservices.tablestore.model.PrimaryKeyValue; +import com.alicloud.openservices.tablestore.model.TableMeta; +import com.alicloud.openservices.tablestore.model.timeseries.TimeseriesScanSplitInfo; import com.aliyun.openservices.ots.model.Direction; -import com.aliyun.openservices.ots.model.PrimaryKeyValue; import com.aliyun.openservices.ots.model.RowPrimaryKey; -import com.aliyun.openservices.ots.model.TableMeta; import com.google.gson.Gson; import com.google.gson.GsonBuilder; +import java.util.Map; + public class GsonParser { private static Gson gsonBuilder() { return new GsonBuilder() - .registerTypeAdapter(OTSColumn.class, new OTSColumnAdaptor()) .registerTypeAdapter(PrimaryKeyValue.class, new PrimaryKeyValueAdaptor()) + .registerTypeAdapter(Column.class, new ColumnAdaptor()) .create(); } @@ -40,24 +43,39 @@ public class GsonParser { Gson g = gsonBuilder(); return g.fromJson(jsonStr, OTSConf.class); } - - public static String directionToJson (Direction direction) { + + public static String metaToJson (TableMeta meta) { Gson g = gsonBuilder(); - return g.toJson(direction); + return g.toJson(meta); + } + + public static TableMeta jsonToMeta (String jsonStr) { + Gson g = gsonBuilder(); + return g.fromJson(jsonStr, TableMeta.class); + } + + public static 
String timeseriesScanSplitInfoToString(TimeseriesScanSplitInfo timeseriesScanSplitInfo){ + Gson g = gsonBuilder(); + return g.toJson(timeseriesScanSplitInfo); + } + + public static TimeseriesScanSplitInfo stringToTimeseriesScanSplitInfo(String jsonStr){ + Gson g = gsonBuilder(); + return g.fromJson(jsonStr, TimeseriesScanSplitInfo.class); } public static Direction jsonToDirection (String jsonStr) { Gson g = gsonBuilder(); return g.fromJson(jsonStr, Direction.class); } - - public static String metaToJson (TableMeta meta) { - Gson g = gsonBuilder(); - return g.toJson(meta); - } - + public static String rowPrimaryKeyToJson (RowPrimaryKey row) { Gson g = gsonBuilder(); return g.toJson(row); } + + public static String mapToJson (Map map) { + Gson g = gsonBuilder(); + return g.toJson(map); + } } diff --git a/otsreader/src/main/java/com/alibaba/datax/plugin/reader/otsreader/Key.java b/otsreader/src/main/java/com/alibaba/datax/plugin/reader/otsreader/utils/Key.java similarity index 81% rename from otsreader/src/main/java/com/alibaba/datax/plugin/reader/otsreader/Key.java rename to otsreader/src/main/java/com/alibaba/datax/plugin/reader/otsreader/utils/Key.java index da6d4a5f..6628e4d3 100644 --- a/otsreader/src/main/java/com/alibaba/datax/plugin/reader/otsreader/Key.java +++ b/otsreader/src/main/java/com/alibaba/datax/plugin/reader/otsreader/utils/Key.java @@ -14,7 +14,7 @@ * limitations under the License. */ -package com.alibaba.datax.plugin.reader.otsreader; +package com.alibaba.datax.plugin.reader.otsreader.utils; public final class Key { /* ots account configuration */ @@ -46,5 +46,13 @@ public final class Key { public final static String RANGE_END = "end"; public final static String RANGE_SPLIT = "split"; + + public final static String META_MODE = "metaMode"; + + public final static String MODE = "mode"; + public final static String NEW_VERSION = "newVersion"; + + public final static String IS_TIMESERIES_TABLE = "isTimeseriesTable"; + public final static String MEASUREMENT_NAME = "measurementName"; } diff --git a/otsreader/src/main/java/com/alibaba/datax/plugin/reader/otsreader/utils/OtsHelper.java b/otsreader/src/main/java/com/alibaba/datax/plugin/reader/otsreader/utils/OtsHelper.java new file mode 100644 index 00000000..060507b6 --- /dev/null +++ b/otsreader/src/main/java/com/alibaba/datax/plugin/reader/otsreader/utils/OtsHelper.java @@ -0,0 +1,82 @@ +package com.alibaba.datax.plugin.reader.otsreader.utils; + +import com.alibaba.datax.plugin.reader.otsreader.callable.GetRangeCallable; +import com.alibaba.datax.plugin.reader.otsreader.callable.GetTableMetaCallable; +import com.alibaba.datax.plugin.reader.otsreader.callable.GetTimeseriesSplitCallable; +import com.alibaba.datax.plugin.reader.otsreader.callable.ScanTimeseriesDataCallable; +import com.alibaba.datax.plugin.reader.otsreader.model.DefaultNoRetry; +import com.alibaba.datax.plugin.reader.otsreader.model.OTSConf; +import com.alicloud.openservices.tablestore.ClientConfiguration; +import com.alicloud.openservices.tablestore.SyncClient; +import com.alicloud.openservices.tablestore.SyncClientInterface; +import com.alicloud.openservices.tablestore.core.utils.Pair; +import com.alicloud.openservices.tablestore.model.ColumnType; +import com.alicloud.openservices.tablestore.model.GetRangeResponse; +import com.alicloud.openservices.tablestore.model.RangeRowQueryCriteria; +import com.alicloud.openservices.tablestore.model.TableMeta; +import com.alicloud.openservices.tablestore.model.timeseries.ScanTimeseriesDataRequest; +import 
com.alicloud.openservices.tablestore.model.timeseries.ScanTimeseriesDataResponse;
+import com.alicloud.openservices.tablestore.model.timeseries.TimeseriesScanSplitInfo;
+
+import java.util.HashMap;
+import java.util.List;
+import java.util.Map;
+
+public class OtsHelper {
+
+    public static SyncClientInterface getOTSInstance(OTSConf conf) {
+        ClientConfiguration clientConfigure = new ClientConfiguration();
+        clientConfigure.setIoThreadCount(conf.getIoThreadCount());
+        clientConfigure.setMaxConnections(conf.getMaxConnectCount());
+        clientConfigure.setSocketTimeoutInMillisecond(conf.getSocketTimeoutInMillisecond());
+        clientConfigure.setConnectionTimeoutInMillisecond(conf.getConnectTimeoutInMillisecond());
+        clientConfigure.setRetryStrategy(new DefaultNoRetry());
+
+        SyncClient ots = new SyncClient(
+                conf.getEndpoint(),
+                conf.getAccessId(),
+                conf.getAccessKey(),
+                conf.getInstanceName(),
+                clientConfigure);
+
+
+        Map extraHeaders = new HashMap();
+        extraHeaders.put("x-ots-sdk-type", "public");
+        extraHeaders.put("x-ots-request-source", "datax-otsreader");
+        ots.setExtraHeaders(extraHeaders);
+
+        return ots;
+    }
+
+    public static TableMeta getTableMeta(SyncClientInterface ots, String tableName, int retry, int sleepInMillisecond) throws Exception {
+        return RetryHelper.executeWithRetry(
+                new GetTableMetaCallable(ots, tableName),
+                retry,
+                sleepInMillisecond
+        );
+    }
+
+    public static GetRangeResponse getRange(SyncClientInterface ots, RangeRowQueryCriteria rangeRowQueryCriteria, int retry, int sleepInMillisecond) throws Exception {
+        return RetryHelper.executeWithRetry(
+                new GetRangeCallable(ots, rangeRowQueryCriteria),
+                retry,
+                sleepInMillisecond
+        );
+    }
+
+    public static List splitTimeseriesScan(SyncClientInterface ots, String tableName, String measurementName, int splitCountHint, int retry, int sleepInMillisecond) throws Exception {
+        return RetryHelper.executeWithRetry(
+                new GetTimeseriesSplitCallable(ots, tableName, measurementName, splitCountHint),
+                retry,
+                sleepInMillisecond
+        );
+    }
+
+    public static ScanTimeseriesDataResponse scanTimeseriesData(SyncClientInterface ots, ScanTimeseriesDataRequest scanTimeseriesDataRequest, int retry, int sleepInMillisecond) throws Exception {
+        return RetryHelper.executeWithRetry(
+                new ScanTimeseriesDataCallable(ots, scanTimeseriesDataRequest),
+                retry,
+                sleepInMillisecond
+        );
+    }
+}
diff --git a/otsreader/src/main/java/com/alibaba/datax/plugin/reader/otsreader/OtsReaderError.java b/otsreader/src/main/java/com/alibaba/datax/plugin/reader/otsreader/utils/OtsReaderError.java
similarity index 76%
rename from otsreader/src/main/java/com/alibaba/datax/plugin/reader/otsreader/OtsReaderError.java
rename to otsreader/src/main/java/com/alibaba/datax/plugin/reader/otsreader/utils/OtsReaderError.java
index 05a13c1a..b578dcde 100644
--- a/otsreader/src/main/java/com/alibaba/datax/plugin/reader/otsreader/OtsReaderError.java
+++ b/otsreader/src/main/java/com/alibaba/datax/plugin/reader/otsreader/utils/OtsReaderError.java
@@ -1,4 +1,4 @@
-package com.alibaba.datax.plugin.reader.otsreader;
+package com.alibaba.datax.plugin.reader.otsreader.utils;

 import com.alibaba.datax.common.spi.ErrorCode;

@@ -14,10 +14,10 @@ public class OtsReaderError implements ErrorCode {

    public final static OtsReaderError ERROR = new OtsReaderError(
            "OtsReaderError",
-            "该错误表示插件的内部错误,表示系统没有处理到的异常");
+            "This error indicates an internal error of the otsreader plugin, that is, an exception the plugin did not handle.");

    public final static OtsReaderError INVALID_PARAM = new
OtsReaderError(
            "OtsReaderInvalidParameter",
-            "该错误表示参数错误,表示用户输入了错误的参数格式等");
+            "This error indicates an invalid parameter, for example a parameter the user provided in the wrong format.");

    public OtsReaderError (String code) {
        this.code = code;
diff --git a/otsreader/src/main/java/com/alibaba/datax/plugin/reader/otsreader/utils/ParamChecker.java b/otsreader/src/main/java/com/alibaba/datax/plugin/reader/otsreader/utils/ParamChecker.java
index fbcdc972..b2139fc1 100644
--- a/otsreader/src/main/java/com/alibaba/datax/plugin/reader/otsreader/utils/ParamChecker.java
+++ b/otsreader/src/main/java/com/alibaba/datax/plugin/reader/otsreader/utils/ParamChecker.java
@@ -1,162 +1,40 @@
 package com.alibaba.datax.plugin.reader.otsreader.utils;

-import java.util.List;
-import java.util.Map;
-import java.util.Map.Entry;
-
+import com.alibaba.datax.common.element.Column;
 import com.alibaba.datax.common.util.Configuration;
-import com.alibaba.datax.plugin.reader.otsreader.model.OTSPrimaryKeyColumn;
-import com.alibaba.datax.plugin.reader.otsreader.model.OTSRange;
-import com.aliyun.openservices.ots.model.Direction;
-import com.aliyun.openservices.ots.model.PrimaryKeyType;
-import com.aliyun.openservices.ots.model.PrimaryKeyValue;
-import com.aliyun.openservices.ots.model.RowPrimaryKey;
-import com.aliyun.openservices.ots.model.TableMeta;
+import com.alibaba.datax.plugin.reader.otsreader.model.*;
+import com.alicloud.openservices.tablestore.model.*;
+
+import java.util.*;

 public class ParamChecker {
-    private static void throwNotExistException(String key) {
-        throw new IllegalArgumentException("The param '" + key + "' is not exist.");
+    private static void throwNotExistException() {
+        throw new IllegalArgumentException("missing the key.");
     }

-    private static void throwStringLengthZeroException(String key) {
-        throw new IllegalArgumentException("The param length of '" + key + "' is zero.");
+    private static void throwStringLengthZeroException() {
+        throw new IllegalArgumentException("input the key is empty string.");
     }

-    private static void throwEmptyException(String key) {
-        throw new IllegalArgumentException("The param '" + key + "' is empty.");
-    }
-
-    private static void throwNotListException(String key) {
-        throw new IllegalArgumentException("The param '" + key + "' is not a json array.");
-    }
-
-    private static void throwNotMapException(String key) {
-        throw new IllegalArgumentException("The param '" + key + "' is not a json map.");
-    }
-
-    public static String checkStringAndGet(Configuration param, String key) {
-        String value = param.getString(key);
-        if (null == value) {
-            throwNotExistException(key);
-        } else if (value.length() == 0) {
-            throwStringLengthZeroException(key);
-        }
-        return value;
-    }
-
-    public static List checkListAndGet(Configuration param, String key, boolean isCheckEmpty) {
-        List value = null;
+    public static String checkStringAndGet(Configuration param, String key, boolean isTrim) throws OTSCriticalException {
         try {
-            value = param.getList(key);
-        } catch (ClassCastException e) {
-            throwNotListException(key);
-        }
-        if (null == value) {
-            throwNotExistException(key);
-        } else if (isCheckEmpty && value.isEmpty()) {
-            throwEmptyException(key);
-        }
-        return value;
-    }
-
-    public static List checkListAndGet(Map range, String key) {
-        Object obj = range.get(key);
-        if (null == obj) {
-            return null;
-        }
-        return checkListAndGet(range, key, false);
-    }
-
-    public static List checkListAndGet(Map range, String key, boolean isCheckEmpty) {
-        Object obj = range.get(key);
-        if (null == obj) {
-
throwNotExistException(key); - } - if (obj instanceof List) { - @SuppressWarnings("unchecked") - List value = (List)obj; - if (isCheckEmpty && value.isEmpty()) { - throwEmptyException(key); + String value = param.getString(key); + if (isTrim) { + value = value != null ? value.trim() : null; + } + if (null == value) { + throwNotExistException(); + } else if (value.length() == 0) { + throwStringLengthZeroException(); } return value; - } else { - throw new IllegalArgumentException("Can not parse list of '" + key + "' from map."); + } catch(RuntimeException e) { + throw new OTSCriticalException("Parse '"+ key +"' fail, " + e.getMessage(), e); } } - public static List checkListAndGet(Map range, String key, List defaultList) { - Object obj = range.get(key); - if (null == obj) { - return defaultList; - } - if (obj instanceof List) { - @SuppressWarnings("unchecked") - List value = (List)obj; - return value; - } else { - throw new IllegalArgumentException("Can not parse list of '" + key + "' from map."); - } - } - - public static Map checkMapAndGet(Configuration param, String key, boolean isCheckEmpty) { - Map value = null; - try { - value = param.getMap(key); - } catch (ClassCastException e) { - throwNotMapException(key); - } - if (null == value) { - throwNotExistException(key); - } else if (isCheckEmpty && value.isEmpty()) { - throwEmptyException(key); - } - return value; - } - - public static RowPrimaryKey checkInputPrimaryKeyAndGet(TableMeta meta, List range) { - if (meta.getPrimaryKey().size() != range.size()) { - throw new IllegalArgumentException(String.format( - "Input size of values not equal size of primary key. input size:%d, primary key size:%d .", - range.size(), meta.getPrimaryKey().size())); - } - RowPrimaryKey pk = new RowPrimaryKey(); - int i = 0; - for (Entry e: meta.getPrimaryKey().entrySet()) { - PrimaryKeyValue value = range.get(i); - if (e.getValue() != value.getType() && value != PrimaryKeyValue.INF_MIN && value != PrimaryKeyValue.INF_MAX) { - throw new IllegalArgumentException( - "Input range type not match primary key. 
Input type:" + value.getType() + ", Primary Key Type:"+ e.getValue() +", Index:" + i - ); - } else { - pk.addPrimaryKeyColumn(e.getKey(), value); - } - i++; - } - return pk; - } - - public static OTSRange checkRangeAndGet(TableMeta meta, List begin, List end) { - OTSRange range = new OTSRange(); - if (begin.size() == 0 && end.size() == 0) { - RowPrimaryKey beginRow = new RowPrimaryKey(); - RowPrimaryKey endRow = new RowPrimaryKey(); - for (String name : meta.getPrimaryKey().keySet()) { - beginRow.addPrimaryKeyColumn(name, PrimaryKeyValue.INF_MIN); - endRow.addPrimaryKeyColumn(name, PrimaryKeyValue.INF_MAX); - } - range.setBegin(beginRow); - range.setEnd(endRow); - } else { - RowPrimaryKey beginRow = checkInputPrimaryKeyAndGet(meta, begin); - RowPrimaryKey endRow = checkInputPrimaryKeyAndGet(meta, end); - range.setBegin(beginRow); - range.setEnd(endRow); - } - return range; - } - - public static Direction checkDirectionAndEnd(TableMeta meta, RowPrimaryKey begin, RowPrimaryKey end) { + public static Direction checkDirectionAndEnd(TableMeta meta, List begin, List end) { Direction direction = null; int cmp = Common.compareRangeBeginAndEnd(meta, begin, end) ; @@ -170,76 +48,420 @@ public class ParamChecker { return direction; } - /** - * 检查类型是否一致,是否重复,方向是否一致 - * @param direction - * @param before - * @param after - */ - private static void checkDirection(Direction direction, PrimaryKeyValue before, PrimaryKeyValue after) { - int cmp = Common.primaryKeyValueCmp(before, after); - if (cmp > 0) { // 反向 - if (direction == Direction.FORWARD) { - throw new IllegalArgumentException("Input direction of 'range-split' is FORWARD, but direction of 'range' is BACKWARD."); + public static List checkInputPrimaryKeyAndGet(TableMeta meta, List range) { + if (meta.getPrimaryKeyMap().size() != range.size()) { + throw new IllegalArgumentException(String.format( + "Input size of values not equal size of primary key. input size:%d, primary key size:%d .", + range.size(), meta.getPrimaryKeyMap().size())); + } + List pk = new ArrayList<>(); + int i = 0; + for (Map.Entry e: meta.getPrimaryKeyMap().entrySet()) { + PrimaryKeyValue value = range.get(i); + if (e.getValue() != value.getType() && value != PrimaryKeyValue.INF_MIN && value != PrimaryKeyValue.INF_MAX) { + throw new IllegalArgumentException( + "Input range type not match primary key. 
Input type:" + value.getType() + ", Primary Key Type:"+ e.getValue() +", Index:" + i + ); + } else { + pk.add(new PrimaryKeyColumn(e.getKey(), value)); } - } else if (cmp < 0) { // 正向 - if (direction == Direction.BACKWARD) { - throw new IllegalArgumentException("Input direction of 'range-split' is BACKWARD, but direction of 'range' is FORWARD."); + i++; + } + return pk; + } + + public static OTSRange checkRangeAndGet(Configuration param) throws OTSCriticalException { + try { + OTSRange range = new OTSRange(); + Map value = param.getMap(Key.RANGE); + // 用户可以不用配置range,默认表示导出全表 + if (value == null) { + return range; } - } else { // 重复列 - throw new IllegalArgumentException("Multi same column in 'range-split'."); + + /** + * Range格式:{ + * "begin":[], + * "end":[] + * } + */ + + // begin + // 如果不存在,表示从表开始位置读取 + Object arrayObj = value.get(Constant.ConfigKey.Range.BEGIN); + if (arrayObj != null) { + range.setBegin(ParamParser.parsePrimaryKeyColumnArray(arrayObj)); + } + + // end + // 如果不存在,表示读取到表的结束位置 + arrayObj = value.get(Constant.ConfigKey.Range.END); + if (arrayObj != null) { + range.setEnd(ParamParser.parsePrimaryKeyColumnArray(arrayObj)); + } + + // split + // 如果不存在,表示不做切分 + arrayObj = value.get(Constant.ConfigKey.Range.SPLIT); + if (arrayObj != null) { + range.setSplit(ParamParser.parsePrimaryKeyColumnArray(arrayObj)); + } + + return range; + } catch (RuntimeException e) { + throw new OTSCriticalException("Parse 'range' fail, " + e.getMessage(), e); + } + + } + + public static TimeRange checkTimeRangeAndGet(Configuration param) throws OTSCriticalException { + try { + + long begin = Constant.ConfigDefaultValue.TimeRange.MIN; + long end = Constant.ConfigDefaultValue.TimeRange.MAX; + + Map value = param.getMap(Constant.ConfigKey.TIME_RANGE); + // 用户可以不用配置time range,默认表示导出全表 + if (value == null) { + return new TimeRange(begin, end); + } + + /** + * TimeRange格式:{ + * "begin":, + * "end": + * } + */ + + // begin + // 如果不存在,表示从表开始位置读取 + Object obj = value.get(Constant.ConfigKey.TimeRange.BEGIN); + if (obj != null) { + begin = ParamParser.parseTimeRangeItem(obj, Constant.ConfigKey.TimeRange.BEGIN); + } + + // end + // 如果不存在,表示读取到表的结束位置 + obj = value.get(Constant.ConfigKey.TimeRange.END); + if (obj != null) { + end = ParamParser.parseTimeRangeItem(obj, Constant.ConfigKey.TimeRange.END); + } + + TimeRange range = new TimeRange(begin, end); + return range; + } catch (RuntimeException e) { + throw new OTSCriticalException("Parse 'timeRange' fail, " + e.getMessage(), e); } } - /** - * 检查 points中的所有点是否是在Begin和end之间 - * @param begin - * @param end - * @param points - */ - private static void checkPointsRange(Direction direction, PrimaryKeyValue begin, PrimaryKeyValue end, List points) { - if (direction == Direction.FORWARD) { - if (!(Common.primaryKeyValueCmp(begin, points.get(0)) < 0 && Common.primaryKeyValueCmp(end, points.get(points.size() - 1)) > 0)) { - throw new IllegalArgumentException("The item of 'range-split' is not within scope of 'range-begin' and 'range-end'."); + private static void checkColumnByMode(List columns , OTSMode mode) { + if (mode == OTSMode.MULTI_VERSION) { + for (OTSColumn c : columns) { + if (c.getColumnType() != OTSColumn.OTSColumnType.NORMAL) { + throw new IllegalArgumentException("in mode:'multiVersion', the 'column' only support specify column_name not const column."); + } } } else { - if (!(Common.primaryKeyValueCmp(begin, points.get(0)) > 0 && Common.primaryKeyValueCmp(end, points.get(points.size() - 1)) < 0)) { - throw new IllegalArgumentException("The item of 
'range-split' is not within scope of 'range-begin' and 'range-end'."); + if (columns.isEmpty()) { + throw new IllegalArgumentException("in mode:'normal', the 'column' must specify at least one column_name or const column."); + } + } + } + + public static List checkOTSColumnAndGet(Configuration param, OTSMode mode) throws OTSCriticalException { + try { + List value = param.getList(Key.COLUMN); + // 用户可以不用配置Column + if (value == null) { + value = Collections.emptyList(); + } + + /** + * Column格式:[ + * {"Name":"pk1"}, + * {"type":"Binary","value" : "base64()"} + * ] + */ + List columns = ParamParser.parseOTSColumnArray(value); + checkColumnByMode(columns, mode); + return columns; + } catch (RuntimeException e) { + throw new OTSCriticalException("Parse 'column' fail, " + e.getMessage(), e); + } + } + + public static List checkTimeseriesColumnAndGet(Configuration param) throws OTSCriticalException { + try { + List value = param.getList(Key.COLUMN); + List columns = ParamParser.parseOTSColumnArray(value); + + List columnTypes = checkColumnTypeAndGet(param); + List isTags = checkColumnIsTagAndGet(param); + + for (int i = 0; i < columns.size(); i++) { + columns.get(i).setValueType(columnTypes.get(i)); + columns.get(i).setTimeseriesTag(isTags.get(i)); + } + + checkColumnByMode(columns, OTSMode.NORMAL); + return columns; + } catch (RuntimeException e) { + throw new OTSCriticalException("Parse 'column' fail, " + e.getMessage(), e); + } + } + + public static List checkColumnTypeAndGet(Configuration param) throws OTSCriticalException { + try { + List value = param.getList(Key.COLUMN); + List columnTypes = ParamParser.parseColumnTypeArray(value); + return columnTypes; + } catch (RuntimeException e) { + throw new OTSCriticalException("Parse 'type of column' fail, " + e.getMessage(), e); + } + } + + public static List checkColumnIsTagAndGet(Configuration param) throws OTSCriticalException { + try { + List value = param.getList(Key.COLUMN); + List columnIsTag = ParamParser.parseColumnIsTagArray(value); + return columnIsTag; + } catch (RuntimeException e) { + throw new OTSCriticalException("Parse 'isTag of column' fail, " + e.getMessage(), e); + } + } + + public static OTSMode checkModeAndGet(Configuration param) throws OTSCriticalException { + try { + String modeValue = param.getString(Key.MODE, "normal"); + if (modeValue.equalsIgnoreCase(Constant.ConfigDefaultValue.Mode.NORMAL)) { + return OTSMode.NORMAL; + } else if (modeValue.equalsIgnoreCase(Constant.ConfigDefaultValue.Mode.MULTI_VERSION)) { + return OTSMode.MULTI_VERSION; + } else { + throw new IllegalArgumentException("the 'mode' only support 'normal' and 'multiVersion' not '"+ modeValue +"'."); + } + } catch(RuntimeException e) { + throw new OTSCriticalException("Parse 'mode' fail, " + e.getMessage(), e); + } + } + + public static void checkTimeseriesMode(OTSMode mode, Boolean isNewVersion) throws OTSCriticalException { + if (mode == OTSMode.MULTI_VERSION){ + throw new OTSCriticalException("Timeseries table do not support mode : multiVersion." ); + } else if (!isNewVersion){ + throw new OTSCriticalException("Timeseries table is only supported in newVersion, please set \"newVersion\": \"true\"." 
); + } + } + + public static List checkAndGetPrimaryKey( + List pk, + List pkSchema, + String jsonKey){ + List result = new ArrayList(); + if(pk != null) { + if (pk.size() > pkSchema.size()) { + throw new IllegalArgumentException("The '"+ jsonKey +"', input primary key column size more than table meta, input size: "+ pk.size() + +", meta pk size:" + pkSchema.size()); + } else { + //类型检查 + for (int i = 0; i < pk.size(); i++) { + PrimaryKeyValue pkc = pk.get(i).getValue(); + PrimaryKeySchema pkcs = pkSchema.get(i); + + if (!pkc.isInfMin() && !pkc.isInfMax() ) { + if (pkc.getType() != pkcs.getType()) { + throw new IllegalArgumentException( + "The '"+ jsonKey +"', input primary key column type mismath table meta, input type:"+ pkc.getType() + +", meta pk type:"+ pkcs.getType() + +", index:" + i); + } + } + result.add(new PrimaryKeyColumn(pkcs.getName(), pkc)); + } + } + return result; + } else { + return new ArrayList(); + } + } + + /** + * 检查split的类型是否和PartitionKey一致 + * @param points + * @param pkSchema + */ + private static List checkAndGetSplit( + List points, + List pkSchema){ + List result = new ArrayList(); + if (points == null) { + return result; + } + + // check 类型是否和PartitionKey一致即可 + PrimaryKeySchema partitionKeySchema = pkSchema.get(0); + for (int i = 0 ; i < points.size(); i++) { + PrimaryKeyColumn p = points.get(i); + if (!p.getValue().isInfMin() && !p.getValue().isInfMax()) { + if (p.getValue().getType() != partitionKeySchema.getType()) { + throw new IllegalArgumentException("The 'split', input primary key column type is mismatch partition key, input type: "+ p.getValue().getType().toString() + +", partition key type:" + partitionKeySchema.getType().toString() + +", index:" + i); + } + } + result.add(new PrimaryKeyColumn(partitionKeySchema.getName(), p.getValue())); + } + + return result; + } + + public static void fillPrimaryKey(List pkSchema, List pk, PrimaryKeyValue fillValue) { + for(int i = pk.size(); i < pkSchema.size(); i++) { + pk.add(new PrimaryKeyColumn(pkSchema.get(i).getName(), fillValue)); + } + } + + private static void fillBeginAndEnd( + List begin, + List end, + List pkSchema) { + if (begin.isEmpty()) { + fillPrimaryKey(pkSchema, begin, PrimaryKeyValue.INF_MIN); + } + if (end.isEmpty()) { + fillPrimaryKey(pkSchema, end, PrimaryKeyValue.INF_MAX); + } + int cmp = CompareHelper.comparePrimaryKeyColumnList(begin, end); + if (cmp == 0) { + // begin.size()和end.size()理论上必然相等,但是考虑到语义的清晰性,显示的给出begin.size() == end.size() + if (begin.size() == end.size() && begin.size() < pkSchema.size()) { + fillPrimaryKey(pkSchema, begin, PrimaryKeyValue.INF_MIN); + fillPrimaryKey(pkSchema, end, PrimaryKeyValue.INF_MAX); + } else { + throw new IllegalArgumentException("The 'begin' can not be equal with 'end'."); + } + } else if (cmp < 0) { // 升序 + fillPrimaryKey(pkSchema, begin, PrimaryKeyValue.INF_MIN); + fillPrimaryKey(pkSchema, end, PrimaryKeyValue.INF_MAX); + } else { // 降序 + fillPrimaryKey(pkSchema, begin, PrimaryKeyValue.INF_MAX); + fillPrimaryKey(pkSchema, end, PrimaryKeyValue.INF_MIN); + } + } + + private static void checkBeginAndEndAndSplit( + List begin, + List end, + List split) { + int cmp = CompareHelper.comparePrimaryKeyColumnList(begin, end); + + if (!split.isEmpty()) { + if (cmp < 0) { // 升序 + // 检查是否是升序 + for (int i = 0 ; i < split.size() - 1; i++) { + PrimaryKeyColumn before = split.get(i); + PrimaryKeyColumn after = split.get(i + 1); + if (before.compareTo(after) >=0) { // 升序 + throw new IllegalArgumentException("In 'split', the item value is not increasing, index: " + 
i); + } + } + if (begin.get(0).compareTo(split.get(0)) >= 0) { + throw new IllegalArgumentException("The 'begin' must be less than head of 'split'."); + } + if (split.get(split.size() - 1).compareTo(end.get(0)) >= 0) { + throw new IllegalArgumentException("tail of 'split' must be less than 'end'."); + } + } else if (cmp > 0) {// 降序 + // 检查是否是降序 + for (int i = 0 ; i < split.size() - 1; i++) { + PrimaryKeyColumn before = split.get(i); + PrimaryKeyColumn after = split.get(i + 1); + if (before.compareTo(after) <= 0) { // 升序 + throw new IllegalArgumentException("In 'split', the item value is not descending, index: " + i); + } + } + if (begin.get(0).compareTo(split.get(0)) <= 0) { + throw new IllegalArgumentException("The 'begin' must be large than head of 'split'."); + } + if (split.get(split.size() - 1).compareTo(end.get(0)) <= 0) { + throw new IllegalArgumentException("tail of 'split' must be large than 'end'."); + } + } else { + throw new IllegalArgumentException("The 'begin' can not equal with 'end'."); } } } /** - * 1.检测用户的输入类型是否和PartitionKey一致 - * 2.顺序是否和Range一致 - * 3.是否有重复列 - * 4.检查points的范围是否在range内 - * @param meta - * @param points + * 填充不完整的PK + * 检查Begin、End、Split 3者之间的关系是否符合预期 + * @param begin + * @param end + * @param split */ - public static void checkInputSplitPoints(TableMeta meta, OTSRange range, Direction direction, List points) { - if (null == points || points.isEmpty()) { - return; - } + private static void fillAndcheckBeginAndEndAndSplit( + List begin, + List end, + List split, + List pkSchema + ) { - OTSPrimaryKeyColumn part = Common.getPartitionKey(meta); - - // 处理第一个 - PrimaryKeyValue item = points.get(0); - if ( item.getType() != part.getType()) { - throw new IllegalArgumentException("Input type of 'range-split' not match partition key. 
" - + "Item of 'range-split' type:" + item.getType()+ ", Partition type:" + part.getType()); - } - - for (int i = 0 ; i < points.size() - 1; i++) { - PrimaryKeyValue before = points.get(i); - PrimaryKeyValue after = points.get(i + 1); - checkDirection(direction, before, after); - } - - PrimaryKeyValue begin = range.getBegin().getPrimaryKey().get(part.getName()); - PrimaryKeyValue end = range.getEnd().getPrimaryKey().get(part.getName()); - - checkPointsRange(direction, begin, end, points); + fillBeginAndEnd(begin, end, pkSchema); + checkBeginAndEndAndSplit(begin, end, split); } + + public static void checkAndSetOTSRange(OTSRange range, TableMeta meta) throws OTSCriticalException { + try { + List pkSchema = meta.getPrimaryKeyList(); + + // 检查是begin和end否和PK类型一致 + range.setBegin(checkAndGetPrimaryKey(range.getBegin(), pkSchema, Constant.ConfigKey.Range.BEGIN)); + range.setEnd(checkAndGetPrimaryKey(range.getEnd(), pkSchema, Constant.ConfigKey.Range.END)); + range.setSplit(checkAndGetSplit(range.getSplit(), pkSchema)); + + // 1.填充Begin和End + // 2.检查begin,end,split顺序是否正确 + fillAndcheckBeginAndEndAndSplit(range.getBegin(), range.getEnd(), range.getSplit(), pkSchema); + } catch(RuntimeException e) { + throw new OTSCriticalException("Parse 'range' fail, " + e.getMessage(), e); + } + } + + public static void checkAndSetColumn(List columns, TableMeta meta, OTSMode mode) throws OTSCriticalException { + try { + if (mode == OTSMode.MULTI_VERSION) { + Set uniqueColumn = new HashSet(); + Map pk = meta.getPrimaryKeyMap(); + for (OTSColumn c : columns) { + // 是否包括PK列 + if (pk.get(c.getName()) != null) { + throw new IllegalArgumentException("in mode:'multiVersion', the 'column' can not include primary key column, input:"+ c.getName() +"."); + } + // 是否有重复列 + if (uniqueColumn.contains(c.getName())) { + throw new IllegalArgumentException("in mode:'multiVersion', the 'column' can not include same column, input:"+ c.getName() +"."); + } else { + uniqueColumn.add(c.getName()); + } + } + } + + } catch(RuntimeException e) { + throw new OTSCriticalException("Parse 'column' fail, " + e.getMessage(), e); + } + } + + public static void normalCheck(OTSConf conf) { + // 旧版本不支持multiVersion模式 + if(!conf.isNewVersion() && conf.getMode() == OTSMode.MULTI_VERSION){ + throw new IllegalArgumentException("in mode:'multiVersion' :The old version do not support multiVersion mode. 
Please add config in otsreader: \"newVersion\":\"true\" ."); + } + } + + public static void checkAndSetOTSConf(OTSConf conf, TableMeta meta) throws OTSCriticalException { + normalCheck(conf); + checkAndSetOTSRange(conf.getRange(), meta); + checkAndSetColumn(conf.getColumn(), meta, conf.getMode()); + } + } diff --git a/otsreader/src/main/java/com/alibaba/datax/plugin/reader/otsreader/utils/ParamCheckerOld.java b/otsreader/src/main/java/com/alibaba/datax/plugin/reader/otsreader/utils/ParamCheckerOld.java new file mode 100644 index 00000000..3489ab35 --- /dev/null +++ b/otsreader/src/main/java/com/alibaba/datax/plugin/reader/otsreader/utils/ParamCheckerOld.java @@ -0,0 +1,36 @@ +package com.alibaba.datax.plugin.reader.otsreader.utils; + +import com.alibaba.datax.common.util.Configuration; + +import java.util.List; + +public class ParamCheckerOld { + + private static void throwNotExistException(String key) { + throw new IllegalArgumentException("The param '" + key + "' is not exist."); + } + + private static void throwEmptyException(String key) { + throw new IllegalArgumentException("The param '" + key + "' is empty."); + } + + private static void throwNotListException(String key) { + throw new IllegalArgumentException("The param '" + key + "' is not a json array."); + } + + public static List checkListAndGet(Configuration param, String key, boolean isCheckEmpty) { + List value = null; + try { + value = param.getList(key); + } catch (ClassCastException e) { + throwNotListException(key); + } + if (null == value) { + throwNotExistException(key); + } else if (isCheckEmpty && value.isEmpty()) { + throwEmptyException(key); + } + return value; + } + +} diff --git a/otsreader/src/main/java/com/alibaba/datax/plugin/reader/otsreader/utils/ParamParser.java b/otsreader/src/main/java/com/alibaba/datax/plugin/reader/otsreader/utils/ParamParser.java new file mode 100644 index 00000000..862b915c --- /dev/null +++ b/otsreader/src/main/java/com/alibaba/datax/plugin/reader/otsreader/utils/ParamParser.java @@ -0,0 +1,255 @@ +package com.alibaba.datax.plugin.reader.otsreader.utils; + +import com.alibaba.datax.common.element.Column; +import com.alibaba.datax.plugin.reader.otsreader.model.OTSColumn; +import com.alibaba.datax.plugin.reader.otsreader.model.OTSCriticalException; +import com.alicloud.openservices.tablestore.model.ColumnType; +import com.alicloud.openservices.tablestore.model.PrimaryKeyColumn; +import com.alicloud.openservices.tablestore.model.PrimaryKeyValue; +import org.apache.commons.codec.binary.Base64; + +import java.util.ArrayList; +import java.util.List; +import java.util.Map; + +public class ParamParser { + + // ------------------------------------------------------------------------ + // Range解析相关的逻辑 + // ------------------------------------------------------------------------ + + private static PrimaryKeyValue parsePrimaryKeyValue(String type) { + return parsePrimaryKeyValue(type, null); + } + + private static PrimaryKeyValue parsePrimaryKeyValue(String type, String value) { + if (type.equalsIgnoreCase(Constant.ValueType.INF_MIN)) { + return PrimaryKeyValue.INF_MIN; + } else if (type.equalsIgnoreCase(Constant.ValueType.INF_MAX)) { + return PrimaryKeyValue.INF_MAX; + } else { + if (value != null) { + if (type.equalsIgnoreCase(Constant.ValueType.STRING)) { + return PrimaryKeyValue.fromString(value); + } else if (type.equalsIgnoreCase(Constant.ValueType.INTEGER)) { + return PrimaryKeyValue.fromLong(Long.valueOf(value)); + } else if (type.equalsIgnoreCase(Constant.ValueType.BINARY)) { + return 
PrimaryKeyValue.fromBinary(Base64.decodeBase64(value)); + } else { + throw new IllegalArgumentException("the column type only support :['INF_MIN', 'INF_MAX', 'string', 'int', 'binary']"); + } + } else { + throw new IllegalArgumentException("the column is missing the field 'value', input 'type':" + type); + } + } + } + + private static PrimaryKeyColumn parsePrimaryKeyColumn(Map item) { + Object typeObj = item.get(Constant.ConfigKey.PrimaryKeyColumn.TYPE); + Object valueObj = item.get(Constant.ConfigKey.PrimaryKeyColumn.VALUE); + + if (typeObj != null && valueObj != null) { + if (typeObj instanceof String && valueObj instanceof String) { + return new PrimaryKeyColumn( + Constant.ConfigDefaultValue.DEFAULT_NAME, + parsePrimaryKeyValue((String)typeObj, (String)valueObj) + ); + } else { + throw new IllegalArgumentException( + "the column's 'type' and 'value' must be string value, " + + "but type of 'type' is :" + typeObj.getClass() + + ", type of 'value' is :" + valueObj.getClass() + ); + } + } else if (typeObj != null) { + if (typeObj instanceof String) { + return new PrimaryKeyColumn( + Constant.ConfigDefaultValue.DEFAULT_NAME, + parsePrimaryKeyValue((String)typeObj) + ); + } else { + throw new IllegalArgumentException( + "the column's 'type' must be string value, " + + "but type of 'type' is :" + typeObj.getClass() + ); + } + } else { + throw new IllegalArgumentException("the column must include 'type' and 'value'."); + } + } + + @SuppressWarnings("unchecked") + public static List parsePrimaryKeyColumnArray(Object arrayObj) throws OTSCriticalException { + try { + List columns = new ArrayList(); + if (arrayObj instanceof List) { + List array = (List) arrayObj; + for (Object o : array) { + if (o instanceof Map) { + Map column = (Map) o; + columns.add(parsePrimaryKeyColumn(column)); + } else { + throw new IllegalArgumentException("input primary key column must be map object, but input type:" + o.getClass()); + } + } + } else { + throw new IllegalArgumentException("input 'begin','end','split' must be list object, but input type:" + arrayObj.getClass()); + } + return columns; + } catch (RuntimeException e) { + // 因为基础模块本身可能抛出一些错误,为了方便定位具体的出错位置,在此把Range加入到Error Message中 + throw new OTSCriticalException("Parse 'range' fail, " + e.getMessage(), e); + } + } + + // ------------------------------------------------------------------------ + // Column解析相关的逻辑 + // ------------------------------------------------------------------------ + + private static OTSColumn parseOTSColumn(Object obj) { + if (obj instanceof String) { + return OTSColumn.fromNormalColumn((String)obj); + } else { + throw new IllegalArgumentException("the 'name' must be string, but input:" + obj.getClass()); + } + } + + private static OTSColumn parseOTSColumn(Object typeObj, Object valueObj) { + if (typeObj instanceof String && valueObj instanceof String) { + String type = (String)typeObj; + String value = (String)valueObj; + + if (type.equalsIgnoreCase(Constant.ValueType.STRING)) { + return OTSColumn.fromConstStringColumn(value); + } else if (type.equalsIgnoreCase(Constant.ValueType.INTEGER)) { + return OTSColumn.fromConstIntegerColumn(Long.valueOf(value)); + } else if (type.equalsIgnoreCase(Constant.ValueType.DOUBLE)) { + return OTSColumn.fromConstDoubleColumn(Double.valueOf(value)); + } else if (type.equalsIgnoreCase(Constant.ValueType.BOOLEAN)) { + return OTSColumn.fromConstBoolColumn(Boolean.valueOf(value)); + } else if (type.equalsIgnoreCase(Constant.ValueType.BINARY)) { + return 
OTSColumn.fromConstBytesColumn(Base64.decodeBase64(value)); + } else { + throw new IllegalArgumentException("the const column type only support :['string', 'int', 'double', 'bool', 'binary']"); + } + } else { + throw new IllegalArgumentException("the 'type' and 'value' must be string, but 'type''s type:" + typeObj.getClass() + " 'value''s type:" + valueObj.getClass()); + } + } + + private static OTSColumn parseOTSColumn(Map column) { + Object typeObj = column.get(Constant.ConfigKey.Column.TYPE); + Object valueObj = column.get(Constant.ConfigKey.Column.VALUE); + Object nameObj = column.get(Constant.ConfigKey.Column.NAME); + + if (nameObj != null) { + return parseOTSColumn(nameObj); + } else if (typeObj != null && valueObj != null) { + return parseOTSColumn(typeObj, valueObj); + } else { + throw new IllegalArgumentException("the item of column format support '{\"name\":\"\"}' or '{\"type\":\"\", \"value\":\"\"}'."); + } + } + + @SuppressWarnings("unchecked") + public static List parseOTSColumnArray(List value) throws OTSCriticalException { + try { + List result = new ArrayList(); + for (Object item:value) { + if (item instanceof Map){ + Map column = (Map) item; + result.add(ParamParser.parseOTSColumn(column)); + } else { + throw new IllegalArgumentException("the item of column must be map object, but input: " + item.getClass()); + } + } + return result; + } catch (RuntimeException e) { + // 因为基础模块本身可能抛出一些错误,为了方便定位具体的出错位置,在此把Column加入到Error Message中 + throw new OTSCriticalException("Parse 'column' fail. " + e.getMessage(), e); + } + } + + private static ColumnType parseTimeseriesColumnType(Map column) { + Object typeObj = column.getOrDefault(Constant.ConfigKey.Column.TYPE, ""); + if (typeObj instanceof String) { + String type = (String)typeObj; + + if (type.equalsIgnoreCase(Constant.ValueType.STRING)) { + return ColumnType.STRING; + } else if (type.equalsIgnoreCase(Constant.ValueType.INTEGER)) { + return ColumnType.INTEGER; + } else if (type.equalsIgnoreCase(Constant.ValueType.DOUBLE)) { + return ColumnType.DOUBLE; + } else if (type.equalsIgnoreCase(Constant.ValueType.BOOLEAN)) { + return ColumnType.BOOLEAN; + } else if (type.equalsIgnoreCase(Constant.ValueType.BINARY)) { + return ColumnType.BINARY; + } else if (type.length() == 0){ + return ColumnType.STRING; + }else { + throw new IllegalArgumentException("the timeseries column type only support :['string', 'int', 'double', 'bool', 'binary']"); + } + } else { + throw new IllegalArgumentException("the 'type' must be string, but 'type''s type:" + typeObj.getClass()); + } + } + + public static List parseColumnTypeArray(List value) throws OTSCriticalException { + try { + List result = new ArrayList(); + for (Object item:value) { + if (item instanceof Map){ + Map column = (Map) item; + result.add(ParamParser.parseTimeseriesColumnType(column)); + } else { + throw new IllegalArgumentException("the item of column must be map object, but input: " + item.getClass()); + } + } + return result; + } catch (RuntimeException e) { + throw new OTSCriticalException("Parse 'timeseries column type' fail. 
" + e.getMessage(), e); + } + } + + private static Boolean parseTimeseriesColumnIsTag(Map column) { + Object isTagParameter = column.getOrDefault(Constant.ConfigKey.Column.IS_TAG, ""); + if (isTagParameter instanceof String) { + String isTag = (String)isTagParameter; + return Boolean.valueOf(isTag); + } else { + throw new IllegalArgumentException("the 'isTag' must be string, but 'isTag''s type:" + isTagParameter.getClass()); + } + } + + public static List parseColumnIsTagArray(List value) throws OTSCriticalException { + try { + List result = new ArrayList(); + for (Object item:value) { + if (item instanceof Map){ + Map column = (Map) item; + result.add(ParamParser.parseTimeseriesColumnIsTag(column)); + } else { + throw new IllegalArgumentException("the item of column must be map object, but input: " + item.getClass()); + } + } + return result; + } catch (RuntimeException e) { + throw new OTSCriticalException("Parse 'timeseries column isTag' fail. " + e.getMessage(), e); + } + } + + // ------------------------------------------------------------------------ + // TimeRange解析相关的逻辑 + // ------------------------------------------------------------------------ + + public static long parseTimeRangeItem(Object obj, String key) { + if (obj instanceof Integer) { + return (Integer)obj; + } else if (obj instanceof Long) { + return (Long)obj; + } else { + throw new IllegalArgumentException("the '"+ key +"' must be int, but input:" + obj.getClass()); + } + } +} diff --git a/otsreader/src/main/java/com/alibaba/datax/plugin/reader/otsreader/utils/RangeSplit.java b/otsreader/src/main/java/com/alibaba/datax/plugin/reader/otsreader/utils/RangeSplit.java index 74caac3f..fbef9279 100644 --- a/otsreader/src/main/java/com/alibaba/datax/plugin/reader/otsreader/utils/RangeSplit.java +++ b/otsreader/src/main/java/com/alibaba/datax/plugin/reader/otsreader/utils/RangeSplit.java @@ -1,17 +1,15 @@ package com.alibaba.datax.plugin.reader.otsreader.utils; -import java.math.BigInteger; -import java.util.ArrayList; -import java.util.Collections; -import java.util.Comparator; -import java.util.List; - import com.alibaba.datax.plugin.reader.otsreader.model.OTSPrimaryKeyColumn; import com.alibaba.datax.plugin.reader.otsreader.model.OTSRange; -import com.aliyun.openservices.ots.model.PrimaryKeyType; -import com.aliyun.openservices.ots.model.PrimaryKeyValue; -import com.aliyun.openservices.ots.model.RowPrimaryKey; -import com.aliyun.openservices.ots.model.TableMeta; +import com.alicloud.openservices.tablestore.model.PrimaryKeyColumn; +import com.alicloud.openservices.tablestore.model.PrimaryKeyType; +import com.alicloud.openservices.tablestore.model.PrimaryKeyValue; +import com.alicloud.openservices.tablestore.model.TableMeta; + + +import java.math.BigInteger; +import java.util.*; /** * 主要提供对范围的解析 @@ -35,8 +33,8 @@ public class RangeSplit { * * 注意:该方法只支持begin小于end * - * @param beginStr - * @param endStr + * @param begin + * @param end * @param count * @return */ @@ -88,7 +86,6 @@ public class RangeSplit { * @return */ public static List splitStringRange(String begin, String end, int count) { - if (count <= 1) { throw new IllegalArgumentException("Input count <= 1 ."); } @@ -136,15 +133,14 @@ public class RangeSplit { } results.add(end); - return results; } /** * begin 一定要小于 end - * @param begin - * @param end - * @param count + * @param bigBegin + * @param bigEnd + * @param bigCount * @return */ private static List splitIntegerRange(BigInteger bigBegin, BigInteger bigEnd, BigInteger bigCount) { @@ -228,20 +224,31 @@ public 
class RangeSplit { } public static List rangeSplitByCount(TableMeta meta, - RowPrimaryKey begin, RowPrimaryKey end, int count) { + List begin, List end, int count) { List results = new ArrayList(); OTSPrimaryKeyColumn partitionKey = Common.getPartitionKey(meta); - PrimaryKeyValue beginPartitionKey = begin.getPrimaryKey().get( + Map beginMap = new HashMap<>(); + Map endMap = new HashMap<>(); + + for(PrimaryKeyColumn primaryKeyColumn : begin){ + beginMap.put(primaryKeyColumn.getName(), primaryKeyColumn.getValue()); + } + for(PrimaryKeyColumn primaryKeyColumn : end){ + endMap.put(primaryKeyColumn.getName(), primaryKeyColumn.getValue()); + } + + + PrimaryKeyValue beginPartitionKey = beginMap.get( partitionKey.getName()); - PrimaryKeyValue endPartitionKey = end.getPrimaryKey().get( + PrimaryKeyValue endPartitionKey = endMap.get( partitionKey.getName()); // 第一,先对PartitionKey列进行拆分 List ranges = RangeSplit.splitRangeByPrimaryKeyType( - partitionKey.getType(), beginPartitionKey, endPartitionKey, + partitionKey.getType(true), beginPartitionKey, endPartitionKey, count); if (ranges.isEmpty()) { @@ -250,130 +257,44 @@ public class RangeSplit { int size = ranges.size(); for (int i = 0; i < size - 1; i++) { - RowPrimaryKey bPk = new RowPrimaryKey(); - RowPrimaryKey ePk = new RowPrimaryKey(); + List bPk = new ArrayList<>(); + List ePk = new ArrayList<>(); - bPk.addPrimaryKeyColumn(partitionKey.getName(), ranges.get(i)); - ePk.addPrimaryKeyColumn(partitionKey.getName(), ranges.get(i + 1)); + bPk.add(new PrimaryKeyColumn(partitionKey.getName(), ranges.get(i))); + ePk.add(new PrimaryKeyColumn(partitionKey.getName(), ranges.get(i + 1))); - results.add(new OTSRange(bPk, ePk)); + OTSRange range = new OTSRange(); + range.setBegin(bPk); + range.setEnd(ePk); + results.add(range); } // 第二,填充非PartitionKey的ParimaryKey列 // 注意:在填充过程中,需要使用用户给定的Begin和End来替换切分出来的第一个Range // 的Begin和最后一个Range的End - List keys = new ArrayList(meta.getPrimaryKey().size()); - keys.addAll(meta.getPrimaryKey().keySet()); + List keys = new ArrayList(meta.getPrimaryKeyMap().size()); + keys.addAll(meta.getPrimaryKeyMap().keySet()); for (int i = 0; i < results.size(); i++) { for (int j = 1; j < keys.size(); j++) { OTSRange c = results.get(i); - RowPrimaryKey beginPK = c.getBegin(); - RowPrimaryKey endPK = c.getEnd(); + List beginPK = c.getBegin(); + List endPK = c.getEnd(); String key = keys.get(j); if (i == 0) { // 第一行 - beginPK.addPrimaryKeyColumn(key, - begin.getPrimaryKey().get(key)); - endPK.addPrimaryKeyColumn(key, PrimaryKeyValue.INF_MIN); + beginPK.add(new PrimaryKeyColumn(key, + beginMap.get(key))); + endPK.add(new PrimaryKeyColumn(key, PrimaryKeyValue.INF_MIN)); } else if (i == results.size() - 1) {// 最后一行 - beginPK.addPrimaryKeyColumn(key, PrimaryKeyValue.INF_MIN); - endPK.addPrimaryKeyColumn(key, end.getPrimaryKey().get(key)); + beginPK.add(new PrimaryKeyColumn(key, PrimaryKeyValue.INF_MIN)); + endPK.add(new PrimaryKeyColumn(key, endMap.get(key))); } else { - beginPK.addPrimaryKeyColumn(key, PrimaryKeyValue.INF_MIN); - endPK.addPrimaryKeyColumn(key, PrimaryKeyValue.INF_MIN); + beginPK.add(new PrimaryKeyColumn(key, PrimaryKeyValue.INF_MIN)); + endPK.add(new PrimaryKeyColumn(key, PrimaryKeyValue.INF_MIN)); } } } return results; } - - private static List getCompletePK(int num, - PrimaryKeyValue value) { - List values = new ArrayList(); - for (int j = 0; j < num; j++) { - if (j == 0) { - values.add(value); - } else { - // 这里在填充PK时,系统需要选择特定的值填充于此 - // 系统默认填充INF_MIN - values.add(PrimaryKeyValue.INF_MIN); - } - } - return values; - } - - /** - * 
根据输入的范围begin和end,从target中取得对应的point - * @param begin - * @param end - * @param target - * @return - */ - public static List getSplitPoint(PrimaryKeyValue begin, PrimaryKeyValue end, List target) { - List result = new ArrayList(); - - int cmp = Common.primaryKeyValueCmp(begin, end); - - if (cmp == 0) { - return result; - } - - result.add(begin); - - Comparator comparator = new Comparator(){ - public int compare(PrimaryKeyValue arg0, PrimaryKeyValue arg1) { - return Common.primaryKeyValueCmp(arg0, arg1); - } - }; - - if (cmp > 0) { // 如果是逆序,则 reverse Comparator - comparator = Collections.reverseOrder(comparator); - } - - Collections.sort(target, comparator); - - for (PrimaryKeyValue value:target) { - if (comparator.compare(value, begin) > 0 && comparator.compare(value, end) < 0) { - result.add(value); - } - } - result.add(end); - - return result; - } - - public static List rangeSplitByPoint(TableMeta meta, RowPrimaryKey beginPK, RowPrimaryKey endPK, - List splits) { - - List results = new ArrayList(); - - int pkCount = meta.getPrimaryKey().size(); - - String partName = Common.getPartitionKey(meta).getName(); - PrimaryKeyValue begin = beginPK.getPrimaryKey().get(partName); - PrimaryKeyValue end = endPK.getPrimaryKey().get(partName); - - List newSplits = getSplitPoint(begin, end, splits); - - if (newSplits.isEmpty()) { - return results; - } - - for (int i = 0; i < newSplits.size() - 1; i++) { - OTSRange item = new OTSRange( - ParamChecker.checkInputPrimaryKeyAndGet(meta, - getCompletePK(pkCount, newSplits.get(i))), - ParamChecker.checkInputPrimaryKeyAndGet(meta, - getCompletePK(pkCount, newSplits.get(i + 1)))); - results.add(item); - } - // replace first and last - OTSRange first = results.get(0); - OTSRange last = results.get(results.size() - 1); - - first.setBegin(beginPK); - last.setEnd(endPK); - return results; - } } diff --git a/otsreader/src/main/java/com/alibaba/datax/plugin/reader/otsreader/utils/ReaderModelParser.java b/otsreader/src/main/java/com/alibaba/datax/plugin/reader/otsreader/utils/ReaderModelParser.java index 8e1dfd41..081532a6 100644 --- a/otsreader/src/main/java/com/alibaba/datax/plugin/reader/otsreader/utils/ReaderModelParser.java +++ b/otsreader/src/main/java/com/alibaba/datax/plugin/reader/otsreader/utils/ReaderModelParser.java @@ -55,7 +55,7 @@ public class ReaderModelParser { } public static OTSColumn parseOTSColumn(Map item) { - if (item.containsKey(OTSConst.NAME) && item.size() == 1) { + if (item.containsKey(OTSConst.NAME)) { Object name = item.get(OTSConst.NAME); if (name instanceof String) { String nameStr = (String) name; diff --git a/otsreader/src/main/java/com/alibaba/datax/plugin/reader/otsreader/utils/RetryHelper.java b/otsreader/src/main/java/com/alibaba/datax/plugin/reader/otsreader/utils/RetryHelper.java index 8ed41267..318b7b51 100644 --- a/otsreader/src/main/java/com/alibaba/datax/plugin/reader/otsreader/utils/RetryHelper.java +++ b/otsreader/src/main/java/com/alibaba/datax/plugin/reader/otsreader/utils/RetryHelper.java @@ -1,16 +1,15 @@ package com.alibaba.datax.plugin.reader.otsreader.utils; +import com.alibaba.datax.plugin.reader.otsreader.model.OTSErrorCode; +import com.alicloud.openservices.tablestore.ClientException; +import com.alicloud.openservices.tablestore.TableStoreException; +import org.slf4j.Logger; +import org.slf4j.LoggerFactory; + import java.util.HashSet; import java.util.Set; import java.util.concurrent.Callable; -import org.slf4j.Logger; -import org.slf4j.LoggerFactory; - -import com.aliyun.openservices.ots.ClientException; -import 
com.aliyun.openservices.ots.OTSErrorCode; -import com.aliyun.openservices.ots.OTSException; - public class RetryHelper { private static final Logger LOG = LoggerFactory.getLogger(RetryHelper.class); @@ -19,7 +18,7 @@ public class RetryHelper { public static V executeWithRetry(Callable callable, int maxRetryTimes, int sleepInMilliSecond) throws Exception { int retryTimes = 0; while (true){ - Thread.sleep(Common.getDelaySendMillinSeconds(retryTimes, sleepInMilliSecond)); + Thread.sleep(getDelaySendMillinSeconds(retryTimes, sleepInMilliSecond)); try { return callable.call(); } catch (Exception e) { @@ -60,9 +59,9 @@ public class RetryHelper { } public static boolean canRetry(Exception exception) { - OTSException e = null; - if (exception instanceof OTSException) { - e = (OTSException) exception; + TableStoreException e = null; + if (exception instanceof TableStoreException) { + e = (TableStoreException) exception; LOG.warn( "OTSException:ErrorCode:{}, ErrorMsg:{}, RequestId:{}", new Object[]{e.getErrorCode(), e.getMessage(), e.getRequestId()} @@ -72,12 +71,29 @@ public class RetryHelper { } else if (exception instanceof ClientException) { ClientException ce = (ClientException) exception; LOG.warn( - "ClientException:{}, ErrorMsg:{}", - new Object[]{ce.getErrorCode(), ce.getMessage()} + "ClientException:{}", + new Object[]{ce.getMessage()} ); return true; } else { return false; } } + + public static long getDelaySendMillinSeconds(int hadRetryTimes, int initSleepInMilliSecond) { + + if (hadRetryTimes <= 0) { + return 0; + } + + int sleepTime = initSleepInMilliSecond; + for (int i = 1; i < hadRetryTimes; i++) { + sleepTime += sleepTime; + if (sleepTime > 30000) { + sleepTime = 30000; + break; + } + } + return sleepTime; + } } diff --git a/otsreader/src/main/java/com/alibaba/datax/plugin/reader/otsreader/utils/RetryHelperOld.java b/otsreader/src/main/java/com/alibaba/datax/plugin/reader/otsreader/utils/RetryHelperOld.java new file mode 100644 index 00000000..28ad4ee3 --- /dev/null +++ b/otsreader/src/main/java/com/alibaba/datax/plugin/reader/otsreader/utils/RetryHelperOld.java @@ -0,0 +1,83 @@ +package com.alibaba.datax.plugin.reader.otsreader.utils; + +import java.util.HashSet; +import java.util.Set; +import java.util.concurrent.Callable; + +import org.slf4j.Logger; +import org.slf4j.LoggerFactory; + +import com.aliyun.openservices.ots.ClientException; +import com.aliyun.openservices.ots.OTSErrorCode; +import com.aliyun.openservices.ots.OTSException; + +public class RetryHelperOld { + + private static final Logger LOG = LoggerFactory.getLogger(RetryHelperOld.class); + private static final Set noRetryErrorCode = prepareNoRetryErrorCode(); + + public static V executeWithRetry(Callable callable, int maxRetryTimes, int sleepInMilliSecond) throws Exception { + int retryTimes = 0; + while (true){ + Thread.sleep(CommonOld.getDelaySendMillinSeconds(retryTimes, sleepInMilliSecond)); + try { + return callable.call(); + } catch (Exception e) { + LOG.warn("Call callable fail, {}", e.getMessage()); + if (!canRetry(e)){ + LOG.error("Can not retry for Exception.", e); + throw e; + } else if (retryTimes >= maxRetryTimes) { + LOG.error("Retry times more than limition. 
maxRetryTimes : {}", maxRetryTimes); + throw e; + } + retryTimes++; + LOG.warn("Retry time : {}", retryTimes); + } + } + } + + private static Set prepareNoRetryErrorCode() { + Set pool = new HashSet(); + pool.add(OTSErrorCode.AUTHORIZATION_FAILURE); + pool.add(OTSErrorCode.INVALID_PARAMETER); + pool.add(OTSErrorCode.REQUEST_TOO_LARGE); + pool.add(OTSErrorCode.OBJECT_NOT_EXIST); + pool.add(OTSErrorCode.OBJECT_ALREADY_EXIST); + pool.add(OTSErrorCode.INVALID_PK); + pool.add(OTSErrorCode.OUT_OF_COLUMN_COUNT_LIMIT); + pool.add(OTSErrorCode.OUT_OF_ROW_SIZE_LIMIT); + pool.add(OTSErrorCode.CONDITION_CHECK_FAIL); + return pool; + } + + public static boolean canRetry(String otsErrorCode) { + if (noRetryErrorCode.contains(otsErrorCode)) { + return false; + } else { + return true; + } + } + + public static boolean canRetry(Exception exception) { + OTSException e = null; + if (exception instanceof OTSException) { + e = (OTSException) exception; + LOG.warn( + "OTSException:ErrorCode:{}, ErrorMsg:{}, RequestId:{}", + new Object[]{e.getErrorCode(), e.getMessage(), e.getRequestId()} + ); + return canRetry(e.getErrorCode()); + + } else if (exception instanceof ClientException) { + ClientException ce = (ClientException) exception; + LOG.warn( + "ClientException:{}, ErrorMsg:{}", + new Object[]{ce.getErrorCode(), ce.getMessage()} + ); + return true; + } else { + return false; + } + } +} diff --git a/otsreader/src/main/java/com/alibaba/datax/plugin/reader/otsreader/utils/TranformHelper.java b/otsreader/src/main/java/com/alibaba/datax/plugin/reader/otsreader/utils/TranformHelper.java new file mode 100644 index 00000000..b082e658 --- /dev/null +++ b/otsreader/src/main/java/com/alibaba/datax/plugin/reader/otsreader/utils/TranformHelper.java @@ -0,0 +1,42 @@ +package com.alibaba.datax.plugin.reader.otsreader.utils; + +import com.alibaba.datax.common.element.*; +import com.alicloud.openservices.tablestore.model.PrimaryKeyColumn; + +public class TranformHelper { + + public static Column otsPrimaryKeyColumnToDataxColumn(PrimaryKeyColumn pkc) { + switch (pkc.getValue().getType()) { + case STRING:return new StringColumn(pkc.getValue().asString()); + case INTEGER:return new LongColumn(pkc.getValue().asLong()); + case BINARY:return new BytesColumn(pkc.getValue().asBinary()); + default: + throw new IllegalArgumentException("PrimaryKey unsuporrt tranform the type: " + pkc.getValue().getType() + "."); + } + } + + public static Column otsColumnToDataxColumn(com.alicloud.openservices.tablestore.model.Column c) { + switch (c.getValue().getType()) { + case STRING:return new StringColumn(c.getValue().asString()); + case INTEGER:return new LongColumn(c.getValue().asLong()); + case BINARY:return new BytesColumn(c.getValue().asBinary()); + case BOOLEAN:return new BoolColumn(c.getValue().asBoolean()); + case DOUBLE:return new DoubleColumn(c.getValue().asDouble()); + default: + throw new IllegalArgumentException("Column unsuporrt tranform the type: " + c.getValue().getType() + "."); + + } + } + + public static Column otsColumnToDataxColumn(com.alicloud.openservices.tablestore.model.ColumnValue c) { + switch (c.getType()) { + case STRING:return new StringColumn(c.asString()); + case INTEGER:return new LongColumn(c.asLong()); + case BINARY:return new BytesColumn(c.asBinary()); + case BOOLEAN:return new BoolColumn(c.asBoolean()); + case DOUBLE:return new DoubleColumn(c.asDouble()); + default: + throw new IllegalArgumentException("Column unsuporrt tranform the type: " + c.getType() + "."); + } + } +} diff --git 
a/otsreader/src/main/resources/plugin.json b/otsreader/src/main/resources/plugin.json index bfd95627..4b55e374 100644 --- a/otsreader/src/main/resources/plugin.json +++ b/otsreader/src/main/resources/plugin.json @@ -3,4 +3,4 @@ "class": "com.alibaba.datax.plugin.reader.otsreader.OtsReader", "description": "", "developer": "alibaba" -} \ No newline at end of file +} diff --git a/otsstreamreader/README.md b/otsstreamreader/README.md index c861a737..5e68f1eb 100644 --- a/otsstreamreader/README.md +++ b/otsstreamreader/README.md @@ -1,127 +1,152 @@ ## TableStore增量数据导出通道:TableStoreStreamReader +本文为您介绍OTSStream Reader支持的数据类型、读取方式、字段映射和数据源等参数及配置示例。 +## 列模式 -### 快速介绍 -TableStoreStreamReader插件主要用于TableStore的增量数据导出,增量数据可以看作操作日志,除了数据本身外还附有操作信息。 +### 背景信息 -与全量导出插件不同,增量导出插件只有多版本模式,同时不支持指定列。这是与增量导出的原理有关的,导出的格式下面有详细介绍。 +OTSStream Reader插件主要用于导出Table Store的增量数据。您可以将增量数据看作操作日志,除数据本身外还附有操作信息。 -使用插件前必须确保表上已经开启Stream功能,可以在建表的时候指定开启,或者使用SDK的UpdateTable接口开启。 +与全量导出插件不同,增量导出插件只有多版本模式,且不支持指定列。使用插件前,您必须确保表上已经开启Stream功能。您可以在建表时指定开启,也可以使用SDK的UpdateTable接口开启。 - 开启Stream的方法: - SyncClient client = new SyncClient("", "", "", ""); - 1. 建表的时候开启: - CreateTableRequest createTableRequest = new CreateTableRequest(tableMeta); - createTableRequest.setStreamSpecification(new StreamSpecification(true, 24)); // 24代表增量数据保留24小时 - client.createTable(createTableRequest); - - 2. 如果建表时未开启,可以通过UpdateTable开启: - UpdateTableRequest updateTableRequest = new UpdateTableRequest("tableName"); - updateTableRequest.setStreamSpecification(new StreamSpecification(true, 24)); - client.updateTable(updateTableRequest); +开启Stream的方法,如下所示。 +```java +SyncClient client = new SyncClient("", "", "", ""); +#建表的时候开启: +CreateTableRequest createTableRequest = new CreateTableRequest(tableMeta); +createTableRequest.setStreamSpecification(new StreamSpecification(true, 24)); // 24代表增量数据保留24小时。 +client.createTable(createTableRequest); +#如果建表时未开启,您可以通过UpdateTable开启: +UpdateTableRequest updateTableRequest = new UpdateTableRequest("tableName"); +updateTableRequest.setStreamSpecification(new StreamSpecification(true, 24)); +client.updateTable(updateTableRequest); +``` +您使用SDK的UpdateTable功能,指定开启Stream并设置过期时间,即开启了Table Store增量数据导出功能。开启后,Table Store服务端就会将您的操作日志额外保存起来,每个分区有一个有序的操作日志队列,每条操作日志会在一定时间后被垃圾回收,该时间即为您指定的过期时间。 -### 实现原理 +Table Store的SDK提供了几个Stream相关的API用于读取这部分的操作日志,增量插件也是通过Table Store SDK的接口获取到增量数据,默认情况下会将增量数据转化为多个6元组的形式(pk、colName、version、colValue、opType和sequenceInfo)导入至MaxCompute中。 -首先用户使用SDK的UpdateTable功能,指定开启Stream并设置过期时间,即开启了增量功能。 +### 列模式 -开启后,TableStore服务端就会将用户的操作日志额外保存起来, -每个分区有一个有序的操作日志队列,每条操作日志会在一定时间后被垃圾回收,这个时间即用户指定的过期时间。 +在Table Store多版本模型下,表中的数据组织为行>列>版本三级的模式, 一行可以有任意列,列名并不是固定的,每一列可以含有多个版本,每个版本都有一个特定的时间戳(版本号)。 -TableStore的SDK提供了几个Stream相关的API用于将这部分操作日志读取出来,增量插件也是通过TableStore SDK的接口获取到增量数据的,并将 -增量数据转化为多个6元组的形式(pk, colName, version, colValue, opType, sequenceInfo)导入到ODPS中。 +您可以通过Table Store的API进行一系列读写操作,Table Store通过记录您最近对表的一系列写操作(或数据更改操作)来实现记录增量数据的目的,所以您也可以把增量数据看作一批操作记录。 + +Table Store支持**PutRow**、**UpdateRow**和**DeleteRow**操作: +- **PutRow**:写入一行,如果该行已存在即覆盖该行。 +- **UpdateRow**:更新一行,不更改原行的其它数据。更新包括新增或覆盖(如果对应列的对应版本已存在)一些列值、删除某一列的全部版本、删除某一列的某个版本。 +- **DeleteRow**:删除一行。 + +Table Store会根据每种操作生成对应的增量数据记录,Reader插件会读出这些记录,并导出为数据集成的数据格式。 + +同时,由于Table Store具有动态列、多版本的特性,所以Reader插件导出的一行不对应Table Store中的一行,而是对应Table Store中的一列的一个版本。即Table Store中的一行可能会导出很多行,每行包含主键值、该列的列名、该列下该版本的时间戳(版本号)、该版本的值、操作类型。如果设置isExportSequenceInfo为true,还会包括时序信息。 + +转换为数据集成的数据格式后,定义了以下四种操作类型: +- **U(UPDATE)**:写入一列的一个版本。 +- **DO(DELETE_ONE_VERSION)**:删除某一列的某个版本。 +- 
**DA(DELETE_ALL_VERSION)**:删除某一列的全部版本,此时需要根据主键和列名,删除对应列的全部版本。 +- **DR(DELETE_ROW)**:删除某一行,此时需要根据主键,删除该行数据。 + +假设该表有两个主键列,主键列名分别为pkName1, pkName2,示例如下。 + +| **pkName1** | **pkName2** | **columnName** | **timestamp** | **columnValue** | **opType** | +| --- | --- | --- | --- | --- | --- | +| pk1_V1 | pk2_V1 | col_a | 1441803688001 | col_val1 | U | +| pk1_V1 | pk2_V1 | col_a | 1441803688002 | col_val2 | U | +| pk1_V1 | pk2_V1 | col_b | 1441803688003 | col_val3 | U | +| pk1_V2 | pk2_V2 | col_a | 1441803688000 | — | DO | +| pk1_V2 | pk2_V2 | col_b | — | — | DA | +| pk1_V3 | pk2_V3 | — | — | — | DR | +| pk1_V3 | pk2_V3 | col_a | 1441803688005 | col_val1 | U | + +假设导出的数据如上,共7行,对应Table Store表内的3行,主键分别是(pk1_V1,pk2_V1),(pk1_V2, pk2_V2),(pk1_V3, pk2_V3): +- 对于主键为(pk1_V1,pk2_V1)的一行,包括写入col_a列的两个版本和col_b列的一个版本等操作。 +- 对于主键为(pk1_V2,pk2_V2)的一行,包括删除col_a列的一个版本和删除col_b列的全部版本等操作。 +- 对于主键为(pk1_V3,pk2_V3)的一行,包括删除整行和写入col_a列的一个版本等操作。 + +### 行模式 +#### 宽行表 +您可以通过行模式导出数据,该模式将用户每次更新的记录,抽取成行的形式导出,需要设置mode属性并配置列名。 +```json +"parameter": { + #parameter中配置下面三项配置(例如datasource、table等其它配置项照常配置)。 + "mode": "single_version_and_update_only", # 配置导出模式。 + "column":[ #按照需求添加需要导出TableStore中的列,您可以自定义设置配置个数。 + { + "name": "uid" #列名示例,可以是主键或属性列。 + }, + { + "name": "name" #列名示例,可以是主键或属性列。 + }, + ], + "isExportSequenceInfo": false, #single_version_and_update_only模式下只能是false。 +} +``` +#### 时序表 +`otsstreamreader`支持导出时序表中的增量数据,当表为时序表时,需要配置的信息如下: +```json +"parameter": { + #parameter中配置下面四项配置(例如datasource、table等其它配置项照常配置)。 + "mode": "single_version_and_update_only", # 配置导出模式。 + "isTimeseriesTable":"true", # 配置导出为时序表。 + "column":[ #按照需求添加需要导出TableStore中的列,您可以自定义设置配置个数。 + { + "name": "_m_name" #度量名称字段。 + }, + { + "name": "_data_source" #数据源字段。 + }, + { + "name": "_tags" #标签字段,将tags转换为string类型。 + }, + { + "name": "tag1_1", #标签内部字段键名称。 + "is_timeseries_tag":"true" #表明改字段为tags内部字段。 + }, + { + "name": "time" #时间戳字段。 + }, + { + "name": "name" #属性列名称。 + }, + ], + "isExportSequenceInfo": false, #single_version_and_update_only模式下只能是false。 +} +``` + +行模式导出的数据更接近于原始的行,易于后续处理,但需要注意以下问题: +- 每次导出的行是从用户每次更新的记录中抽取,每一行数据与用户的写入或更新操作一一对应。如果用户存在单独更新某些列的行为,则会出现有一些记录只有被更新的部分列,其它列为空的情况。 +- 行模式不会导出数据的版本号(即每列的时间戳),也无法进行删除操作。 + +### 数据类型转换列表 +目前OTSStream Reader支持所有的Table Store类型,其针对Table Store类型的转换列表,如下所示。 + +| **类型分类** | **OTSStream数据类型** | +| --- | --- | +| 整数类 | INTEGER | +| 浮点类 | DOUBLE | +| 字符串类 | STRING | +| 布尔类 | BOOLEAN | +| 二进制类 | BINARY | -### Reader的配置模版: - "reader": { - "name" : "otsstreamreader", - "parameter" : { - "endpoint" : "", - "accessId" : "", - "accessKey" : "", - "instanceName" : "", - //dataTable即需要导出数据的表。 - "dataTable" : "", - //statusTable是Reader用于保存状态的表,若该表不存在,Reader会自动创建该表。 - //一次离线导出任务完成后,用户不应删除该表,该表中记录的状态可用于下次导出任务中。 - "statusTable" : "TableStoreStreamReaderStatusTable", - //增量数据的时间范围(左闭右开)的左边界。 - "startTimestampMillis" : "", - //增量数据的时间范围(左闭右开)的右边界。 - "endTimestampMillis" : "", - //采云间调度只支持天级别,所以提供该配置,作用与startTimestampMillis和endTimestampMillis类似。 - "date": "", - //是否导出时序信息。 - "isExportSequenceInfo": true, - //从TableStore中读增量数据时,每次请求的最大重试次数,默认为30。 - "maxRetries" : 30 - } - } ### 参数说明 -| 名称 | 说明 | 类型 | 必选 | -| ---- | ---- | ---- | ---- | -| endpoint | TableStoreServer的Endpoint地址。| String | 是 | -| accessId | 用于访问TableStore服务的accessId。| String | 是 | -| accessKey | 用于访问TableStore服务的accessKey。 | String | 是 | -| instanceName | TableStore的实例名称。 | String | 是 | -| dataTable | 需要导出增量数据的表的名称。该表需要开启Stream,可以在建表时开启,或者使用UpdateTable接口开启。 | String | 是 | -| statusTable | Reader插件用于记录状态的表的名称,这些状态可用于减少对非目标范围内的数据的扫描,从而加快导出速度。
1. 用户不需要创建该表,只需要给出一个表名。Reader插件会尝试在用户的instance下创建该表,若该表不存在即创建新表,若该表已存在,会判断该表的Meta是否与期望一致,若不一致会抛出异常。
2. 在一次导出完成之后,用户不应删除该表,该表的状态可用于下次导出任务。
3. 该表会开启TTL,数据自动过期,因此可认为其数据量很小。
4. 针对同一个instance下的多个不同的dataTable的Reader配置,可以使用同一个statusTable,记录的状态信息互不影响。
综上,用户配置一个类似TableStoreStreamReaderStatusTable之类的名称即可,注意不要与业务相关的表重名。| String | 是 | -| startTimestampMillis | 增量数据的时间范围(左闭右开)的左边界,单位毫秒。
1. Reader插件会从statusTable中找对应startTimestampMillis的位点,从该点开始读取开始导出数据。
2. 若statusTable中找不到对应的位点,则从系统保留的增量数据的第一条开始读取,并跳过写入时间小于startTimestampMillis的数据。| Long | 否 | -| endTimestampMillis | 增量数据的时间范围(左闭右开)的右边界,单位毫秒。
1. Reader插件从startTimestampMillis位置开始导出数据后,当遇到第一条时间戳大于等于endTimestampMillis的数据时,结束导出数据,导出完成。
2. 当读取完当前全部的增量数据时,结束读取,即使未达到endTimestampMillis。 | Long | 否 | -| date | 日期格式为yyyyMMdd,如20151111,表示导出该日的数据。
若没有指定date,则必须指定startTimestampMillis和endTimestampMillis,反之也成立。 | String | 否 | -| isExportSequenceInfo | 是否导出时序信息,时序信息包含了数据的写入时间等。默认该值为false,即不导出。 | Boolean | 否 | -| maxRetries | 从TableStore中读增量数据时,每次请求的最大重试次数,默认为30,重试之间有间隔,30次重试总时间约为5分钟,一般无需更改。| Int | 否 | +| **参数** | **描述** | **是否必选** | **默认值** | +| --- |---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------| --- |---------| +| **dataSource** | 数据源名称,脚本模式支持添加数据源,该配置项填写的内容必须与添加的数据源名称保持一致。 | 是 | 无 | +| **dataTable** | 导出增量数据的表的名称。该表需要开启Stream,可以在建表时开启,或者使用UpdateTable接口开启。 | 是 | 无 | +| **statusTable** | Reader插件用于记录状态的表的名称,这些状态可用于减少对非目标范围内的数据的扫描,从而加快导出速度。statusTable是Reader用于保存状态的表,如果该表不存在,Reader会自动创建该表。一次离线导出任务完成后,您无需删除该表,该表中记录的状态可用于下次导出任务中:
  • 您无需创建该表,只需要给出一个表名。Reader插件会尝试在您的instance下创建该表,如果该表不存在即创建新表。如果该表已存在,会判断该表的Meta是否与期望一致,如果不一致会抛出异常。
  • 在一次导出完成之后,您无需删除该表,该表的状态可以用于下次的导出任务。
  • 该表会开启TTL,数据自动过期,因此可认为其数据量很小。
  • 针对同一个instance下的多个不同的dataTable的Reader配置,可以使用同一个statusTable,记录的状态信息互不影响。您配置一个类似**TableStoreStreamReaderStatusTable**的名称即可,请注意不要与业务相关的表重名。 | 是 | 无 | +| **startTimestampMillis** | 增量数据的时间范围(左闭右开)的左边界,单位为毫秒:
  • Reader插件会从statusTable中找对应**startTimestampMillis**的位点,从该点开始读取并导出数据。
  • 如果statusTable中找不到对应的位点,则从系统保留的增量数据的第一条开始读取,并跳过写入时间小于**startTimestampMillis**的数据。 | 否 | 无 | +| **endTimestampMillis** | 增量数据的时间范围(左闭右开)的右边界,单位为毫秒:
  • Reader插件从**startTimestampMillis**位置开始导出数据后,当遇到第一条时间戳大于等于**endTimestampMillis**的数据时,结束导出数据,导出完成。
  • 当读取完当前全部的增量数据时,即使未达到**endTimestampMillis**,也会结束读取。 | 否 | 无 | +| **date** | 日期格式为**yyyyMMdd**,例如20151111,表示导出该日的数据。如果没有指定**date**,则需要指定**startTimestampMillis**和**endTimestampMillis**或**startTimeString**和**endTimeString**,反之也成立。例如,采云间调度仅支持天级别,所以提供该配置,作用与**startTimestampMillis**和**endTimestampMillis**或**startTimeString**和**endTimeString**类似。 | 否 | 无 | +| **isExportSequenceInfo** | 是否导出时序信息,时序信息包含了数据的写入时间等。默认该值为false,即不导出。 | 否 | false | +| **maxRetries** | 从TableStore中读增量数据时,每次请求的最大重试次数,默认为30次。重试之间有间隔,重试30次的总时间约为5分钟,通常无需更改。 | 否 | 30 | +| **startTimeString** | 任务的开始时间,即增量数据的时间范围(左闭右开)的左边界,格式为**yyyymmddhh24miss**,单位为秒。 | 否 | 无 | +| **endTimeString** | 任务的结束时间,即增量数据的时间范围(左闭右开)的右边界,格式为**yyyymmddhh24miss**,单位为秒。 | 否 | 无 | +| **enableSeekIterator** | Reader插件需要先确定增量位点,然后再拉取数据,如果是经常运行的任务,插件会根据之前扫描的位点来确定位置。如果之前没运行过这个插件,将会从增量开始位置(默认增量保留7天,即7天前)开始扫描,因此当还没有扫描到设置的开始时间之后的数据时,会存在开始一段时间没有数据导出的情况,您可以在reader的配置参数里增加** "enableSeekIterator": true**的配置,帮助您加快位点定位。 | 否 | false | +| **mode** | 导出模式,设置为**single_version_and_update_only**时为行模式,默认不设置为列模式。 | 否 | 无 | +| **isTimeseriesTable** | 是否为时序表,只有在行模式,即**mode**为**single_version_and_update_only**时配置生效。 | 否 | false | -### 导出的数据格式 -首先,在TableStore多版本模型下,表中的数据组织为“行-列-版本”三级的模式, -一行可以有任意列,列名也并非固定的,每一列可以含有多个版本,每个版本都有一个特定的时间戳(版本号)。 -用户可以通过TableStore的API进行一系列读写操作, -TableStore通过记录用户最近对表的一系列写操作(或称为数据更改操作)来实现记录增量数据的目的, -所以也可以把增量数据看作一批操作记录。 -TableStore有三类数据更改操作:PutRow、UpdateRow、DeleteRow。 - - + PutRow的语义是写入一行,若该行已存在即覆盖该行。 - - + UpdateRow的语义是更新一行,对原行其他数据不做更改, - 更新可能包括新增或覆盖(若对应列的对应版本已存在)一些列值、删除某一列的全部版本、删除某一列的某个版本。 - - + DeleteRow的语义是删除一行。 - -TableStore会根据每种操作生成对应的增量数据记录,Reader插件会读出这些记录,并导出成Datax的数据格式。 - -同时,由于TableStore具有动态列、多版本的特性,所以Reader插件导出的一行不对应TableStore中的一行,而是对应TableStore中的一列的一个版本。 -即TableStore中的一行可能会导出很多行,每行包含主键值、该列的列名、该列下该版本的时间戳(版本号)、该版本的值、操作类型。若设置isExportSequenceInfo为true,还会包括时序信息。 - -转换为Datax的数据格式后,我们定义了四种操作类型,分别为: - - + U(UPDATE): 写入一列的一个版本 - - + DO(DELETE_ONE_VERSION): 删除某一列的某个版本 - - + DA(DELETE_ALL_VERSION): 删除某一列的全部版本,此时需要根据主键和列名,将对应列的全部版本删除 - - + DR(DELETE_ROW): 删除某一行,此时需要根据主键,将该行数据全部删除 - - -举例如下,假设该表有两个主键列,主键列名分别为pkName1, pkName2: - -| pkName1 | pkName2 | columnName | timestamp | columnValue | opType | -| ------- | ------- | ---------- | --------- | ----------- | ------ | -| pk1_V1 | pk2_V1 | col_a | 1441803688001 | col_val1 | U | -| pk1_V1 | pk2_V1 | col_a | 1441803688002 | col_val2 | U | -| pk1_V1 | pk2_V1 | col_b | 1441803688003 | col_val3 | U | -| pk1_V2 | pk2_V2 | col_a | 1441803688000 | | DO | -| pk1_V2 | pk2_V2 | col_b | | | DA | -| pk1_V3 | pk2_V3 | | | | DR | -| pk1_V3 | pk2_V3 | col_a | 1441803688005 | col_val1 | U | - -假设导出的数据如上,共7行,对应TableStore表内的3行,主键分别是(pk1_V1,pk2_V1), (pk1_V2, pk2_V2), (pk1_V3, pk2_V3)。 - -对于主键为(pk1_V1, pk2_V1)的一行,包含三个操作,分别是写入col_a列的两个版本和col_b列的一个版本。 - -对于主键为(pk1_V2, pk2_V2)的一行,包含两个操作,分别是删除col_a列的一个版本、删除col_b列的全部版本。 - -对于主键为(pk1_V3, pk2_V3)的一行,包含两个操作,分别是删除整行、写入col_a列的一个版本。 diff --git a/otsstreamreader/pom.xml b/otsstreamreader/pom.xml index cb4a6206..db75ba1e 100644 --- a/otsstreamreader/pom.xml +++ b/otsstreamreader/pom.xml @@ -10,19 +10,20 @@ com.alibaba.datax otsstreamreader - 0.0.1 + 0.0.1-SNAPSHOT + - org.apache.logging.log4j - log4j-api - 2.17.1 - - - - org.apache.logging.log4j - log4j-core - 2.17.1 + com.aliyun.openservices + tablestore-streamclient + 1.0.0 + + + com.aliyun.openservices + tablestore + + com.alibaba.datax @@ -33,22 +34,28 @@ slf4j-log4j12 org.slf4j - - logback-classic - ch.qos.logback - + + org.slf4j + slf4j-api + + + ch.qos.logback + logback-classic + + + com.alibaba + fastjson + 1.2.83_noneautotype + 
compile + com.aliyun.openservices - tablestore-streamclient - 1.0.0 + tablestore + 5.13.12 - - log4j-api - org.apache.logging.log4j - log4j-core org.apache.logging.log4j @@ -60,12 +67,6 @@ gson 2.2.4 - - com.google.guava - guava - 18.0 - test - @@ -106,6 +107,18 @@ + + + org.apache.maven.plugins + maven-surefire-plugin + 2.5 + + + **/unittest/*.java + **/functiontest/*.java + + + diff --git a/otsstreamreader/src/main/java/com/alibaba/datax/plugin/reader/otsstreamreader/internal/LocalStrings.properties b/otsstreamreader/src/main/java/com/alibaba/datax/plugin/reader/otsstreamreader/internal/LocalStrings.properties new file mode 100644 index 00000000..e69de29b diff --git a/otsstreamreader/src/main/java/com/alibaba/datax/plugin/reader/otsstreamreader/internal/LocalStrings_en_US.properties b/otsstreamreader/src/main/java/com/alibaba/datax/plugin/reader/otsstreamreader/internal/LocalStrings_en_US.properties new file mode 100644 index 00000000..e69de29b diff --git a/otsstreamreader/src/main/java/com/alibaba/datax/plugin/reader/otsstreamreader/internal/LocalStrings_ja_JP.properties b/otsstreamreader/src/main/java/com/alibaba/datax/plugin/reader/otsstreamreader/internal/LocalStrings_ja_JP.properties new file mode 100644 index 00000000..e69de29b diff --git a/otsstreamreader/src/main/java/com/alibaba/datax/plugin/reader/otsstreamreader/internal/LocalStrings_zh_CN.properties b/otsstreamreader/src/main/java/com/alibaba/datax/plugin/reader/otsstreamreader/internal/LocalStrings_zh_CN.properties new file mode 100644 index 00000000..e69de29b diff --git a/otsstreamreader/src/main/java/com/alibaba/datax/plugin/reader/otsstreamreader/internal/LocalStrings_zh_HK.properties b/otsstreamreader/src/main/java/com/alibaba/datax/plugin/reader/otsstreamreader/internal/LocalStrings_zh_HK.properties new file mode 100644 index 00000000..e69de29b diff --git a/otsstreamreader/src/main/java/com/alibaba/datax/plugin/reader/otsstreamreader/internal/LocalStrings_zh_TW.properties b/otsstreamreader/src/main/java/com/alibaba/datax/plugin/reader/otsstreamreader/internal/LocalStrings_zh_TW.properties new file mode 100644 index 00000000..e69de29b diff --git a/otsstreamreader/src/main/java/com/alibaba/datax/plugin/reader/otsstreamreader/internal/OTSStreamReader.java b/otsstreamreader/src/main/java/com/alibaba/datax/plugin/reader/otsstreamreader/internal/OTSStreamReader.java index 67313467..a41b19d4 100644 --- a/otsstreamreader/src/main/java/com/alibaba/datax/plugin/reader/otsstreamreader/internal/OTSStreamReader.java +++ b/otsstreamreader/src/main/java/com/alibaba/datax/plugin/reader/otsstreamreader/internal/OTSStreamReader.java @@ -4,17 +4,27 @@ import com.alibaba.datax.common.exception.DataXException; import com.alibaba.datax.common.plugin.RecordSender; import com.alibaba.datax.common.spi.Reader; import com.alibaba.datax.common.util.Configuration; +import com.alibaba.datax.common.util.RetryUtil; import com.alibaba.datax.plugin.reader.otsstreamreader.internal.config.OTSStreamReaderConfig; import com.alibaba.datax.plugin.reader.otsstreamreader.internal.config.OTSStreamReaderConstants; +import com.alibaba.datax.plugin.reader.otsstreamreader.internal.core.CheckpointTimeTracker; +import com.alibaba.datax.plugin.reader.otsstreamreader.internal.model.OTSStreamJobShard; import com.alibaba.datax.plugin.reader.otsstreamreader.internal.model.StreamJob; import com.alibaba.datax.plugin.reader.otsstreamreader.internal.utils.GsonParser; +import com.alibaba.datax.plugin.reader.otsstreamreader.internal.utils.OTSHelper; +import 
com.alibaba.datax.plugin.reader.otsstreamreader.internal.utils.OTSStreamJobShardUtil; +import com.alicloud.openservices.tablestore.SyncClientInterface; import com.alicloud.openservices.tablestore.TableStoreException; import com.alicloud.openservices.tablestore.model.StreamShard; +import java.util.ArrayList; import java.util.HashSet; import java.util.List; +import java.util.concurrent.Callable; import java.util.concurrent.ConcurrentSkipListSet; +import static com.alibaba.datax.plugin.reader.otsstreamreader.internal.config.OTSStreamReaderConstants.*; + public class OTSStreamReader { public static class Job extends Reader.Job { @@ -46,20 +56,34 @@ public class OTSStreamReader { private OTSStreamReaderSlaveProxy proxy = new OTSStreamReaderSlaveProxy(); @Override - public void startRead(RecordSender recordSender) { - proxy.startRead(recordSender); - } - public void init() { try { OTSStreamReaderConfig config = GsonParser.jsonToConfig( (String) this.getPluginJobConf().get(OTSStreamReaderConstants.CONF)); - StreamJob streamJob = StreamJob.fromJson( - (String) this.getPluginJobConf().get(OTSStreamReaderConstants.STREAM_JOB)); List ownedShards = GsonParser.jsonToList( - (String) this.getPluginJobConf().get(OTSStreamReaderConstants.OWNED_SHARDS)); - List allShards = GsonParser.fromJson( - (String) this.getPluginJobConf().get(OTSStreamReaderConstants.ALL_SHARDS)); + (String) this.getPluginJobConf().get(OTSStreamReaderConstants.OWNED_SHARDS)); + + boolean confSimplifyEnable = this.getPluginJobConf().getBool(CONF_SIMPLIFY_ENABLE, + DEFAULT_CONF_SIMPLIFY_ENABLE_VALUE); + + StreamJob streamJob; + List allShards; + + if (confSimplifyEnable) { + //不要从conf里获取, 避免分布式模式下Job Split切分出来的Config膨胀过大 + String version = this.getPluginJobConf().getString(OTSStreamReaderConstants.VERSION); + OTSStreamJobShard otsStreamJobShard = OTSStreamJobShardUtil.getOTSStreamJobShard(config, version); + + streamJob = otsStreamJobShard.getStreamJob(); + allShards = otsStreamJobShard.getAllShards(); + + } else { + streamJob = StreamJob.fromJson( + (String) this.getPluginJobConf().get(OTSStreamReaderConstants.STREAM_JOB)); + allShards = GsonParser.fromJson( + (String) this.getPluginJobConf().get(OTSStreamReaderConstants.ALL_SHARDS)); + } + proxy.init(config, streamJob, allShards, new HashSet(ownedShards)); } catch (TableStoreException ex) { throw DataXException.asDataXException(new OTSReaderError(ex.getErrorCode(), "OTS ERROR"), ex.toString(), ex); @@ -68,6 +92,11 @@ public class OTSStreamReader { } } + @Override + public void startRead(RecordSender recordSender) { + proxy.startRead(recordSender); + } + public void destroy() { proxy.close(); } diff --git a/otsstreamreader/src/main/java/com/alibaba/datax/plugin/reader/otsstreamreader/internal/OTSStreamReaderMasterProxy.java b/otsstreamreader/src/main/java/com/alibaba/datax/plugin/reader/otsstreamreader/internal/OTSStreamReaderMasterProxy.java index 473e2c81..5c6a5b4b 100644 --- a/otsstreamreader/src/main/java/com/alibaba/datax/plugin/reader/otsstreamreader/internal/OTSStreamReaderMasterProxy.java +++ b/otsstreamreader/src/main/java/com/alibaba/datax/plugin/reader/otsstreamreader/internal/OTSStreamReaderMasterProxy.java @@ -15,6 +15,8 @@ import org.slf4j.LoggerFactory; import java.util.*; +import static com.alibaba.datax.plugin.reader.otsstreamreader.internal.config.OTSStreamReaderConstants.CONF_SIMPLIFY_ENABLE; + public class OTSStreamReaderMasterProxy { private OTSStreamReaderConfig conf = null; @@ -22,6 +24,7 @@ public class OTSStreamReaderMasterProxy { private StreamJob streamJob; 
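上面 OTSStreamReader.Task.init 的改动是本次分布式瘦身的消费端:confSimplifyEnable 开启时,任务配置里不再携带 STREAM_JOB 与 ALL_SHARDS,而只带一个 version,由 Task 端自行还原。下面是一段示意代码(其中 pluginJobConf 代指 this.getPluginJobConf(),类名与常量均取自上文 diff,仅用于说明两条取值路径,并非插件源码):

```java
// 示意:Task 端还原 StreamJob 与 allShards 的两条路径(细节以插件源码为准)
OTSStreamReaderConfig config = GsonParser.jsonToConfig(
        (String) pluginJobConf.get(OTSStreamReaderConstants.CONF));

StreamJob streamJob;
List<StreamShard> allShards;

if (pluginJobConf.getBool(OTSStreamReaderConstants.CONF_SIMPLIFY_ENABLE, false)) {
    // 瘦身模式:配置里只有 version,StreamJob 与全量 shard 列表按 version 重新查询
    String version = pluginJobConf.getString(OTSStreamReaderConstants.VERSION);
    OTSStreamJobShard jobShard = OTSStreamJobShardUtil.getOTSStreamJobShard(config, version);
    streamJob = jobShard.getStreamJob();
    allShards = jobShard.getAllShards();
} else {
    // 默认模式:StreamJob 与全量 shard 列表直接从任务配置反序列化
    streamJob = StreamJob.fromJson((String) pluginJobConf.get(OTSStreamReaderConstants.STREAM_JOB));
    allShards = GsonParser.fromJson((String) pluginJobConf.get(OTSStreamReaderConstants.ALL_SHARDS));
}
```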
private List allShards; + private String version; private static final Logger LOG = LoggerFactory.getLogger(OTSStreamReaderConfig.class); @@ -41,19 +44,20 @@ public class OTSStreamReaderMasterProxy { checker.checkAndCreateStatusTableIfNotExist(); // 删除StatusTable记录的对应EndTime时刻的Checkpoint信息。防止本次任务受到之前导出任务的影响。 - String streamId = OTSHelper.getStreamDetails(ots, config.getDataTable()).getStreamId(); + String streamId = OTSHelper.getStreamResponse(ots, config.getDataTable(), config.isTimeseriesTable()).getStreamId(); CheckpointTimeTracker checkpointInfoTracker = new CheckpointTimeTracker(ots, config.getStatusTable(), streamId); checkpointInfoTracker.clearAllCheckpoints(config.getEndTimestampMillis()); SyncClientInterface ots = OTSHelper.getOTSInstance(config); - allShards = OTSHelper.getOrderedShardList(ots, streamId); + allShards = OTSHelper.getOrderedShardList(ots, streamId, conf.isTimeseriesTable()); List shardIds = new ArrayList(); for (StreamShard shard : allShards) { shardIds.add(shard.getShardId()); } - String version = "" + System.currentTimeMillis() + "-" + UUID.randomUUID(); + this.version = "" + System.currentTimeMillis() + "-" + UUID.randomUUID(); + LOG.info("version is: {}", this.version); streamJob = new StreamJob(conf.getDataTable(), streamId, version, new HashSet(shardIds), conf.getStartTimestampMillis(), conf.getEndTimestampMillis()); @@ -97,8 +101,16 @@ public class OTSStreamReaderMasterProxy { Configuration configuration = Configuration.newDefault(); configuration.set(OTSStreamReaderConstants.CONF, GsonParser.configToJson(conf)); - configuration.set(OTSStreamReaderConstants.STREAM_JOB, streamJob.toJson()); - configuration.set(OTSStreamReaderConstants.ALL_SHARDS, GsonParser.toJson(allShards)); + + // Fix #39430646 [离线同步分布式]DataX OTSStreamReader插件分布式模式优化瘦身 + if (conf.isConfSimplifyEnable()) { + configuration.set(OTSStreamReaderConstants.VERSION, this.version); + configuration.set(CONF_SIMPLIFY_ENABLE, true); + } else { + configuration.set(OTSStreamReaderConstants.STREAM_JOB, streamJob.toJson()); + configuration.set(OTSStreamReaderConstants.ALL_SHARDS, GsonParser.toJson(allShards)); + } + configuration.set(OTSStreamReaderConstants.OWNED_SHARDS, GsonParser.listToJson(shardIds.subList(start, end))); configurations.add(configuration); } diff --git a/otsstreamreader/src/main/java/com/alibaba/datax/plugin/reader/otsstreamreader/internal/OTSStreamReaderSlaveProxy.java b/otsstreamreader/src/main/java/com/alibaba/datax/plugin/reader/otsstreamreader/internal/OTSStreamReaderSlaveProxy.java index 22035851..cdfbed28 100644 --- a/otsstreamreader/src/main/java/com/alibaba/datax/plugin/reader/otsstreamreader/internal/OTSStreamReaderSlaveProxy.java +++ b/otsstreamreader/src/main/java/com/alibaba/datax/plugin/reader/otsstreamreader/internal/OTSStreamReaderSlaveProxy.java @@ -36,16 +36,18 @@ public class OTSStreamReaderSlaveProxy { private boolean findCheckpoints; // whether find checkpoint for last job, if so, we should read from checkpoint and skip nothing. 
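与上面的 Task 端对应,OTSStreamReaderMasterProxy.split 是生产端,按 confSimplifyEnable 决定向每个 Task 配置写入哪些键。下面这段示意代码汇总了两种模式下下发的配置项(其中 ownedShardIds 代指 shardIds.subList(start, end),键名与方法名取自上文 diff,并非源码原文):

```java
// 示意:Job split 阶段为单个 Task 组装配置
Configuration taskConf = Configuration.newDefault();
taskConf.set(OTSStreamReaderConstants.CONF, GsonParser.configToJson(conf));
// 每个 Task 只拿到自己负责的 shard 子集
taskConf.set(OTSStreamReaderConstants.OWNED_SHARDS, GsonParser.listToJson(ownedShardIds));

if (conf.isConfSimplifyEnable()) {
    // 瘦身模式:只写入 version 标识,避免 StreamJob 与全量 shard 列表随 Task 数成倍膨胀
    taskConf.set(OTSStreamReaderConstants.VERSION, version);
    taskConf.set(OTSStreamReaderConstants.CONF_SIMPLIFY_ENABLE, true);
} else {
    // 默认模式:StreamJob 与全量 shard 列表整体序列化进每个 Task 的配置
    taskConf.set(OTSStreamReaderConstants.STREAM_JOB, streamJob.toJson());
    taskConf.set(OTSStreamReaderConstants.ALL_SHARDS, GsonParser.toJson(allShards));
}
```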
private String slaveId = UUID.randomUUID().toString(); private StreamDetails streamDetails; + private boolean enableSeekIteratorByTimestamp; public void init(final OTSStreamReaderConfig otsStreamReaderConfig, StreamJob streamJob, List allShards, Set ownedShardIds) { slaveNumber.getAndIncrement(); this.config = otsStreamReaderConfig; this.ots = OTSHelper.getOTSInstance(config); this.streamJob = streamJob; - this.streamDetails = OTSHelper.getStreamDetails(ots, this.streamJob.getTableName()); + this.streamDetails = OTSHelper.getStreamDetails(ots, this.streamJob.getTableName(),config.isTimeseriesTable()); this.checkpointInfoTracker = new CheckpointTimeTracker(ots, config.getStatusTable(), this.streamJob.getStreamId()); this.checker = new OTSStreamReaderChecker(ots, config); this.allShardsMap = OTSHelper.toShardMap(allShards); + this.enableSeekIteratorByTimestamp = otsStreamReaderConfig.getEnableSeekIteratorByTimestamp(); LOG.info("SlaveId: {}, ShardIds: {}, OwnedShards: {}.", slaveId, allShards, ownedShardIds); this.ownedShards = new HashMap(); @@ -58,12 +60,12 @@ public class OTSStreamReaderSlaveProxy { } findCheckpoints = checker.checkAndSetCheckpoints(checkpointInfoTracker, allShardsMap, streamJob, shardToCheckpointMap); - if (!findCheckpoints) { - LOG.info("Checkpoint for stream '{}' in timestamp '{}' is not found.", streamJob.getStreamId(), streamJob.getStartTimeInMillis()); + if (!findCheckpoints && !enableSeekIteratorByTimestamp) { + LOG.info("Checkpoint for stream '{}' in timestamp '{}' is not found. EnableSeekIteratorByTimestamp: {}", streamJob.getStreamId(), streamJob.getStartTimeInMillis(), this.enableSeekIteratorByTimestamp); setWithNearestCheckpoint(); } - LOG.info("Find checkpoints: {}.", findCheckpoints); + LOG.info("Find checkpoints: {}, EnableSeekIteratorByTimestamp: {}", findCheckpoints, enableSeekIteratorByTimestamp); for (Map.Entry shard : ownedShards.entrySet()) { LOG.info("Shard to process, ShardInfo: [{}], StartCheckpoint: [{}].", shard.getValue(), shardToCheckpointMap.get(shard.getKey())); } diff --git a/otsstreamreader/src/main/java/com/alibaba/datax/plugin/reader/otsstreamreader/internal/config/LocalStrings.properties b/otsstreamreader/src/main/java/com/alibaba/datax/plugin/reader/otsstreamreader/internal/config/LocalStrings.properties new file mode 100644 index 00000000..e69de29b diff --git a/otsstreamreader/src/main/java/com/alibaba/datax/plugin/reader/otsstreamreader/internal/config/LocalStrings_en_US.properties b/otsstreamreader/src/main/java/com/alibaba/datax/plugin/reader/otsstreamreader/internal/config/LocalStrings_en_US.properties new file mode 100644 index 00000000..e69de29b diff --git a/otsstreamreader/src/main/java/com/alibaba/datax/plugin/reader/otsstreamreader/internal/config/LocalStrings_ja_JP.properties b/otsstreamreader/src/main/java/com/alibaba/datax/plugin/reader/otsstreamreader/internal/config/LocalStrings_ja_JP.properties new file mode 100644 index 00000000..e69de29b diff --git a/otsstreamreader/src/main/java/com/alibaba/datax/plugin/reader/otsstreamreader/internal/config/LocalStrings_zh_CN.properties b/otsstreamreader/src/main/java/com/alibaba/datax/plugin/reader/otsstreamreader/internal/config/LocalStrings_zh_CN.properties new file mode 100644 index 00000000..e69de29b diff --git a/otsstreamreader/src/main/java/com/alibaba/datax/plugin/reader/otsstreamreader/internal/config/LocalStrings_zh_HK.properties b/otsstreamreader/src/main/java/com/alibaba/datax/plugin/reader/otsstreamreader/internal/config/LocalStrings_zh_HK.properties new file mode 100644 
index 00000000..e69de29b diff --git a/otsstreamreader/src/main/java/com/alibaba/datax/plugin/reader/otsstreamreader/internal/config/LocalStrings_zh_TW.properties b/otsstreamreader/src/main/java/com/alibaba/datax/plugin/reader/otsstreamreader/internal/config/LocalStrings_zh_TW.properties new file mode 100644 index 00000000..e69de29b diff --git a/otsstreamreader/src/main/java/com/alibaba/datax/plugin/reader/otsstreamreader/internal/config/OTSStreamReaderConfig.java b/otsstreamreader/src/main/java/com/alibaba/datax/plugin/reader/otsstreamreader/internal/config/OTSStreamReaderConfig.java index c89d7a37..bef910e3 100644 --- a/otsstreamreader/src/main/java/com/alibaba/datax/plugin/reader/otsstreamreader/internal/config/OTSStreamReaderConfig.java +++ b/otsstreamreader/src/main/java/com/alibaba/datax/plugin/reader/otsstreamreader/internal/config/OTSStreamReaderConfig.java @@ -13,6 +13,9 @@ import java.util.ArrayList; import java.util.List; import java.util.Map; +import static com.alibaba.datax.plugin.reader.otsstreamreader.internal.config.OTSStreamReaderConstants.CONF_SIMPLIFY_ENABLE; +import static com.alibaba.datax.plugin.reader.otsstreamreader.internal.config.OTSStreamReaderConstants.DEFAULT_CONF_SIMPLIFY_ENABLE_VALUE; + public class OTSStreamReaderConfig { private static final Logger LOG = LoggerFactory.getLogger(OTSStreamReaderConfig.class); @@ -33,6 +36,11 @@ public class OTSStreamReaderConfig { private static final String KEY_MODE = "mode"; private static final String KEY_COLUMN = "column"; private static final String KEY_THREAD_NUM = "threadNum"; + private static final String KEY_ENABLE_TABLE_GROUP_SUPPORT = "enableTableGroupSupport"; + + private static final String ENABLE_SEEK_SHARD_ITERATOR = "enableSeekIterator"; + + private static final String IS_TIMESERIES_TABLE = "isTimeseriesTable"; private static final int DEFAULT_MAX_RETRIES = 30; private static final long DEFAULT_SLAVE_LOOP_INTERVAL = 10 * TimeUtils.SECOND_IN_MILLIS; @@ -51,12 +59,19 @@ public class OTSStreamReaderConfig { private int threadNum = 32; private long slaveLoopInterval = DEFAULT_SLAVE_LOOP_INTERVAL; private long slaveLoggingStatusInterval = DEFAULT_SLAVE_LOGGING_STATUS_INTERVAL; + private boolean enableSeekIteratorByTimestamp; + private boolean enableTableGroupSupport; private Mode mode; private List columns; + private List columnsIsTimeseriesTags; private transient SyncClientInterface otsForTest; + private boolean confSimplifyEnable; + + private boolean isTimeseriesTable; + public String getEndpoint() { return endpoint; } @@ -129,6 +144,22 @@ public class OTSStreamReaderConfig { this.isExportSequenceInfo = isExportSequenceInfo; } + public boolean isEnableTableGroupSupport() { + return enableTableGroupSupport; + } + + public void setEnableTableGroupSupport(boolean enableTableGroupSupport) { + this.enableTableGroupSupport = enableTableGroupSupport; + } + + public boolean getEnableSeekIteratorByTimestamp() { + return enableSeekIteratorByTimestamp; + } + + public void setEnableSeekIteratorByTimestamp(boolean enableSeekIteratorByTimestamp) { + this.enableSeekIteratorByTimestamp = enableSeekIteratorByTimestamp; + } + public Mode getMode() { return mode; } @@ -145,24 +176,62 @@ public class OTSStreamReaderConfig { this.columns = columns; } + public List getColumnsIsTimeseriesTags() { + return columnsIsTimeseriesTags; + } + + public void setColumnsIsTimeseriesTags(List columnsIsTimeseriesTags) { + this.columnsIsTimeseriesTags = columnsIsTimeseriesTags; + } + + public boolean isTimeseriesTable() { + return isTimeseriesTable; 
+ } + + public void setTimeseriesTable(boolean timeseriesTable) { + isTimeseriesTable = timeseriesTable; + } + private static void parseConfigForSingleVersionAndUpdateOnlyMode(OTSStreamReaderConfig config, Configuration param) { + try { + Boolean isTimeseriesTable = param.getBool(IS_TIMESERIES_TABLE); + if (isTimeseriesTable != null) { + config.setTimeseriesTable(isTimeseriesTable); + } else { + config.setTimeseriesTable(false); + } + } catch (RuntimeException ex) { + throw new OTSStreamReaderException("Parse timeseries stream settings fail, please check your config.", ex); + } + try { List values = param.getList(KEY_COLUMN); if (values == null) { config.setColumns(new ArrayList()); + config.setColumnsIsTimeseriesTags(new ArrayList()); return; } List columns = new ArrayList(); + List columnsIsTimeseriesTags = new ArrayList(); + Boolean isTimeseriesTable = config.isTimeseriesTable(); + for (Object item : values) { if (item instanceof Map) { String columnName = (String) ((Map) item).get("name"); columns.add(columnName); + + boolean columnsIsTimeseriesTag = false; + if (isTimeseriesTable && Boolean.parseBoolean((String) ((Map) item).getOrDefault("is_timeseries_tag", "false"))) { + columnsIsTimeseriesTag = true; + } + columnsIsTimeseriesTags.add(columnsIsTimeseriesTag); } else { throw new IllegalArgumentException("The item of column must be map object, please check your input."); } } config.setColumns(columns); + config.setColumnsIsTimeseriesTags(columnsIsTimeseriesTags); } catch (RuntimeException ex) { throw new OTSStreamReaderException("Parse column fail, please check your config.", ex); } @@ -178,56 +247,59 @@ public class OTSStreamReaderConfig { config.setDataTable(ParamChecker.checkStringAndGet(param, KEY_DATA_TABLE_NAME, true)); config.setStatusTable(ParamChecker.checkStringAndGet(param, KEY_STATUS_TABLE_NAME, true)); config.setIsExportSequenceInfo(param.getBool(KEY_IS_EXPORT_SEQUENCE_INFO, false)); + config.setEnableSeekIteratorByTimestamp(param.getBool(ENABLE_SEEK_SHARD_ITERATOR, false)); + config.setConfSimplifyEnable(param.getBool(CONF_SIMPLIFY_ENABLE, DEFAULT_CONF_SIMPLIFY_ENABLE_VALUE)); + config.setEnableTableGroupSupport(param.getBool(KEY_ENABLE_TABLE_GROUP_SUPPORT, false)); if (param.getInt(KEY_THREAD_NUM) != null) { config.setThreadNum(param.getInt(KEY_THREAD_NUM)); } if (param.getString(KEY_DATE) == null && - (param.getLong(KEY_START_TIMESTAMP_MILLIS) == null || param.getLong(KEY_END_TIMESTAMP_MILLIS) == null) && + (param.getLong(KEY_START_TIMESTAMP_MILLIS) == null || param.getLong(KEY_END_TIMESTAMP_MILLIS) == null) && (param.getLong(KEY_START_TIME_STRING) == null || param.getLong(KEY_END_TIME_STRING) == null)) { throw new OTSStreamReaderException("Must set date or time range millis or time range string, please check your config."); } - + if (param.get(KEY_DATE) != null && (param.getLong(KEY_START_TIMESTAMP_MILLIS) != null || param.getLong(KEY_END_TIMESTAMP_MILLIS) != null) && (param.getLong(KEY_START_TIME_STRING) != null || param.getLong(KEY_END_TIME_STRING) != null)) { throw new OTSStreamReaderException("Can't set date and time range millis and time range string, please check your config."); } - + if (param.get(KEY_DATE) != null && (param.getLong(KEY_START_TIMESTAMP_MILLIS) != null || param.getLong(KEY_END_TIMESTAMP_MILLIS) != null)) { throw new OTSStreamReaderException("Can't set date and time range both, please check your config."); } - + if (param.get(KEY_DATE) != null && (param.getLong(KEY_START_TIME_STRING) != null || param.getLong(KEY_END_TIME_STRING) != null)) { throw 
new OTSStreamReaderException("Can't set date and time range string both, please check your config."); } - - if ((param.getLong(KEY_START_TIMESTAMP_MILLIS) != null || param.getLong(KEY_END_TIMESTAMP_MILLIS) != null)&& + + if ((param.getLong(KEY_START_TIMESTAMP_MILLIS) != null || param.getLong(KEY_END_TIMESTAMP_MILLIS) != null) && (param.getLong(KEY_START_TIME_STRING) != null || param.getLong(KEY_END_TIME_STRING) != null)) { - throw new OTSStreamReaderException("Can't set time range millis and time range string both, please check your config."); + throw new OTSStreamReaderException("Can't set time range millis and time range string both, expect timestamp like '1516010400000'."); } if (param.getString(KEY_START_TIME_STRING) != null && param.getString(KEY_END_TIME_STRING) != null) { - String startTime=ParamChecker.checkStringAndGet(param, KEY_START_TIME_STRING, true); - String endTime=ParamChecker.checkStringAndGet(param, KEY_END_TIME_STRING, true); + String startTime = ParamChecker.checkStringAndGet(param, KEY_START_TIME_STRING, true); + String endTime = ParamChecker.checkStringAndGet(param, KEY_END_TIME_STRING, true); try { long startTimestampMillis = TimeUtils.parseTimeStringToTimestampMillis(startTime); config.setStartTimestampMillis(startTimestampMillis); } catch (Exception ex) { - throw new OTSStreamReaderException("Can't parse startTimeString: " + startTime); + throw new OTSStreamReaderException("Can't parse startTimeString: " + startTime + ", expect format date like '201801151612'."); } try { long endTimestampMillis = TimeUtils.parseTimeStringToTimestampMillis(endTime); config.setEndTimestampMillis(endTimestampMillis); } catch (Exception ex) { - throw new OTSStreamReaderException("Can't parse startTimeString: " + startTime); - } - - }else if (param.getString(KEY_DATE) == null) { + throw new OTSStreamReaderException("Can't parse endTimeString: " + endTime + ", expect format date like '201801151612'."); + } + + } else if (param.getString(KEY_DATE) == null) { config.setStartTimestampMillis(param.getLong(KEY_START_TIMESTAMP_MILLIS)); config.setEndTimestampMillis(param.getLong(KEY_END_TIMESTAMP_MILLIS)); } else { @@ -241,8 +313,6 @@ public class OTSStreamReaderConfig { } } - - if (config.getStartTimestampMillis() >= config.getEndTimestampMillis()) { throw new OTSStreamReaderException("EndTimestamp must be larger than startTimestamp."); @@ -262,15 +332,21 @@ public class OTSStreamReaderConfig { config.setMode(Mode.MULTI_VERSION); List values = param.getList(KEY_COLUMN); if (values != null) { - throw new OTSStreamReaderException("The multi version mode doesn't support setting columns."); + LOG.warn("The multi version mode doesn't support setting columns, column config will ignore."); + } + Boolean isTimeseriesTable = param.getBool(IS_TIMESERIES_TABLE); + if (isTimeseriesTable != null) { + LOG.warn("The multi version mode doesn't support setting Timeseries stream, stream config will ignore."); } } - LOG.info("endpoint: {}, accessId: {}, accessKey: {}, instanceName: {}, dataTableName: {}, statusTableName: {}," + - " isExportSequenceInfo: {}, startTimestampMillis: {}, endTimestampMillis:{}, maxRetries:{}.", config.getEndpoint(), + LOG.info("endpoint: {}, accessKeyId: {}, accessKeySecret: {}, instanceName: {}, dataTableName: {}, statusTableName: {}," + + " isExportSequenceInfo: {}, startTimestampMillis: {}, endTimestampMillis:{}, maxRetries:{}, enableSeekIteratorByTimestamp: {}, " + + "confSimplifyEnable: {}, isTimeseriesTable: {}.", config.getEndpoint(), config.getAccessId(), 
config.getAccessKey(), config.getInstanceName(), config.getDataTable(), config.getStatusTable(), config.isExportSequenceInfo(), config.getStartTimestampMillis(), - config.getEndTimestampMillis(), config.getMaxRetries()); + config.getEndTimestampMillis(), config.getMaxRetries(), config.getEnableSeekIteratorByTimestamp(), + config.isConfSimplifyEnable(), config.isTimeseriesTable()); return config; } @@ -282,7 +358,6 @@ public class OTSStreamReaderConfig { public SyncClientInterface getOtsForTest() { return otsForTest; } - /** * test use * @param otsForTest @@ -290,36 +365,36 @@ public class OTSStreamReaderConfig { public void setOtsForTest(SyncClientInterface otsForTest) { this.otsForTest = otsForTest; } - public int getMaxRetries() { return maxRetries; } - public void setMaxRetries(int maxRetries) { this.maxRetries = maxRetries; } - public int getThreadNum() { return threadNum; } - public void setSlaveLoopInterval(long slaveLoopInterval) { this.slaveLoopInterval = slaveLoopInterval; } - public void setSlaveLoggingStatusInterval(long slaveLoggingStatusInterval) { this.slaveLoggingStatusInterval = slaveLoggingStatusInterval; } - public long getSlaveLoopInterval() { return slaveLoopInterval; } - public long getSlaveLoggingStatusInterval() { return slaveLoggingStatusInterval; } - public void setThreadNum(int threadNum) { this.threadNum = threadNum; } + + public boolean isConfSimplifyEnable() { + return confSimplifyEnable; + } + + public void setConfSimplifyEnable(boolean confSimplifyEnable) { + this.confSimplifyEnable = confSimplifyEnable; + } } diff --git a/otsstreamreader/src/main/java/com/alibaba/datax/plugin/reader/otsstreamreader/internal/config/OTSStreamReaderConstants.java b/otsstreamreader/src/main/java/com/alibaba/datax/plugin/reader/otsstreamreader/internal/config/OTSStreamReaderConstants.java index 19db148a..c95fdf2c 100644 --- a/otsstreamreader/src/main/java/com/alibaba/datax/plugin/reader/otsstreamreader/internal/config/OTSStreamReaderConstants.java +++ b/otsstreamreader/src/main/java/com/alibaba/datax/plugin/reader/otsstreamreader/internal/config/OTSStreamReaderConstants.java @@ -21,7 +21,20 @@ public class OTSStreamReaderConstants { public static final String STREAM_JOB = "STREAM_JOB"; public static final String OWNED_SHARDS = "OWNED_SHARDS"; public static final String ALL_SHARDS = "ALL_SHARDS"; + public static final String VERSION = "STREAM_VERSION"; + /** + * 是否开启OTS分布式模式降低Job Split阶段切分的Task Conf大小启动优化, + * 新增该参数的目的是为了保证DataX灰度过程,避免因为OTS分布式任务运行部分子进程运行在老版本、部分运行在新版本导致任务失败问题, + * 当DataX版本集群粒度已全量升级到新版本以后,再开启该参数为"true",默认值是"false" + */ + public static final String CONF_SIMPLIFY_ENABLE = "confSimplifyEnable"; + + public static final Integer RETRY_TIMES = 3; + + public static final Long DEFAULT_SLEEP_TIME_IN_MILLS = 500l; + + public static final boolean DEFAULT_CONF_SIMPLIFY_ENABLE_VALUE = false; static { String beforeOffsetMillis = System.getProperty("BEFORE_OFFSET_TIME_MILLIS"); diff --git a/otsstreamreader/src/main/java/com/alibaba/datax/plugin/reader/otsstreamreader/internal/core/LocalStrings.properties b/otsstreamreader/src/main/java/com/alibaba/datax/plugin/reader/otsstreamreader/internal/core/LocalStrings.properties new file mode 100644 index 00000000..e69de29b diff --git a/otsstreamreader/src/main/java/com/alibaba/datax/plugin/reader/otsstreamreader/internal/core/LocalStrings_en_US.properties b/otsstreamreader/src/main/java/com/alibaba/datax/plugin/reader/otsstreamreader/internal/core/LocalStrings_en_US.properties new file mode 100644 index 00000000..e69de29b diff --git 
a/otsstreamreader/src/main/java/com/alibaba/datax/plugin/reader/otsstreamreader/internal/core/LocalStrings_ja_JP.properties b/otsstreamreader/src/main/java/com/alibaba/datax/plugin/reader/otsstreamreader/internal/core/LocalStrings_ja_JP.properties new file mode 100644 index 00000000..e69de29b diff --git a/otsstreamreader/src/main/java/com/alibaba/datax/plugin/reader/otsstreamreader/internal/core/LocalStrings_zh_CN.properties b/otsstreamreader/src/main/java/com/alibaba/datax/plugin/reader/otsstreamreader/internal/core/LocalStrings_zh_CN.properties new file mode 100644 index 00000000..e69de29b diff --git a/otsstreamreader/src/main/java/com/alibaba/datax/plugin/reader/otsstreamreader/internal/core/LocalStrings_zh_HK.properties b/otsstreamreader/src/main/java/com/alibaba/datax/plugin/reader/otsstreamreader/internal/core/LocalStrings_zh_HK.properties new file mode 100644 index 00000000..e69de29b diff --git a/otsstreamreader/src/main/java/com/alibaba/datax/plugin/reader/otsstreamreader/internal/core/LocalStrings_zh_TW.properties b/otsstreamreader/src/main/java/com/alibaba/datax/plugin/reader/otsstreamreader/internal/core/LocalStrings_zh_TW.properties new file mode 100644 index 00000000..e69de29b diff --git a/otsstreamreader/src/main/java/com/alibaba/datax/plugin/reader/otsstreamreader/internal/core/OTSStreamReaderChecker.java b/otsstreamreader/src/main/java/com/alibaba/datax/plugin/reader/otsstreamreader/internal/core/OTSStreamReaderChecker.java index 086d0159..560dcb7c 100644 --- a/otsstreamreader/src/main/java/com/alibaba/datax/plugin/reader/otsstreamreader/internal/core/OTSStreamReaderChecker.java +++ b/otsstreamreader/src/main/java/com/alibaba/datax/plugin/reader/otsstreamreader/internal/core/OTSStreamReaderChecker.java @@ -40,11 +40,11 @@ public class OTSStreamReaderChecker { * 为了避免时间误差影响,允许导出的范围为: [now - expirationTime + beforeOffset, now - afterOffset] */ public void checkStreamEnabledAndTimeRangeOK() { - boolean exists = OTSHelper.checkTableExists(ots, config.getDataTable()); + boolean exists = OTSHelper.checkTableExists(ots, config.getDataTable(), config.isTimeseriesTable()); if (!exists) { throw new OTSStreamReaderException("The data table is not exist."); } - StreamDetails streamDetails = OTSHelper.getStreamDetails(ots, config.getDataTable()); + StreamDetails streamDetails = OTSHelper.getStreamDetails(ots, config.getDataTable(), config.isTimeseriesTable()); if (streamDetails == null || !streamDetails.isEnableStream()) { throw new OTSStreamReaderException("The stream of data table is not enabled."); } @@ -81,7 +81,7 @@ public class OTSStreamReaderChecker { * 检查statusTable是否存在,如果不存在就创建statusTable,并等待表ready。 */ public void checkAndCreateStatusTableIfNotExist() { - boolean tableExist = OTSHelper.checkTableExists(ots, config.getStatusTable()); + boolean tableExist = OTSHelper.checkTableExists(ots, config.getStatusTable(), false); if (tableExist) { DescribeTableResponse describeTableResult = OTSHelper.describeTable(ots, config.getStatusTable()); checkTableMetaOfStatusTable(describeTableResult.getTableMeta()); @@ -135,23 +135,6 @@ public class OTSStreamReaderChecker { } } - // 检查是否有丢失的shard - for (Map.Entry entry : allShardsMap.entrySet()) { - StreamShard shard = entry.getValue(); - String parentId = shard.getParentId(); - // shard不在本次任务中,且shard也不在上一次任务中 - if (parentId != null && !allShardsMap.containsKey(parentId) && !allCheckpoints.containsKey(parentId)) { - LOG.error("Shard is lost: {}.", shard); - throw new OTSStreamReaderException("Can't find checkpoint for shard: " + parentId); - } - - 
parentId = shard.getParentSiblingId(); - if (parentId != null && !allShardsMap.containsKey(parentId) && !allCheckpoints.containsKey(parentId)) { - LOG.error("Shard is lost: {}.", shard); - throw new OTSStreamReaderException("Can't find checkpoint for shard: " + parentId); - } - } - return true; } } diff --git a/otsstreamreader/src/main/java/com/alibaba/datax/plugin/reader/otsstreamreader/internal/core/RecordProcessor.java b/otsstreamreader/src/main/java/com/alibaba/datax/plugin/reader/otsstreamreader/internal/core/RecordProcessor.java index ba17bd9c..feb99722 100644 --- a/otsstreamreader/src/main/java/com/alibaba/datax/plugin/reader/otsstreamreader/internal/core/RecordProcessor.java +++ b/otsstreamreader/src/main/java/com/alibaba/datax/plugin/reader/otsstreamreader/internal/core/RecordProcessor.java @@ -1,5 +1,6 @@ package com.alibaba.datax.plugin.reader.otsstreamreader.internal.core; +import com.alibaba.datax.common.exception.DataXException; import com.alibaba.datax.common.plugin.RecordSender; import com.alibaba.datax.plugin.reader.otsstreamreader.internal.config.Mode; import com.alibaba.datax.plugin.reader.otsstreamreader.internal.config.OTSStreamReaderConfig; @@ -48,6 +49,9 @@ public class RecordProcessor implements Runnable { private AtomicBoolean stop; private AtomicLong sendRecordCount; + //enable seek shardIterator by timestamp + private boolean enableSeekShardIteratorByTimestamp; + public enum State { READY, // initialized but not start RUNNING, // start to read and process records @@ -78,6 +82,7 @@ public class RecordProcessor implements Runnable { this.recordSender = recordSender; this.isExportSequenceInfo = config.isExportSequenceInfo(); this.lastRecordCheckpointTime = 0; + this.enableSeekShardIteratorByTimestamp = config.getEnableSeekIteratorByTimestamp(); // set init state startTime = 0; @@ -107,22 +112,31 @@ public class RecordProcessor implements Runnable { if (readerConfig.getMode().equals(Mode.MULTI_VERSION)) { this.otsStreamRecordSender = new MultiVerModeRecordSender(recordSender, shard.getShardId(), isExportSequenceInfo); } else if (readerConfig.getMode().equals(Mode.SINGLE_VERSION_AND_UPDATE_ONLY)) { - this.otsStreamRecordSender = new SingleVerAndUpOnlyModeRecordSender(recordSender, shard.getShardId(), isExportSequenceInfo, readerConfig.getColumns()); + this.otsStreamRecordSender = new SingleVerAndUpOnlyModeRecordSender(recordSender, shard.getShardId(), isExportSequenceInfo, readerConfig.getColumns(), readerConfig.getColumnsIsTimeseriesTags()); } else { throw new OTSStreamReaderException("Internal Error. 
Unhandled Mode: " + readerConfig.getMode()); } if (startCheckpoint.getCheckpoint().equals(CheckpointPosition.TRIM_HORIZON)) { lastShardIterator = null; - nextShardIterator = ots.getShardIterator(new GetShardIteratorRequest(stream.getStreamId(), shard.getShardId())).getShardIterator(); + if (enableSeekShardIteratorByTimestamp) { + long beginTimeStamp = startTimestampMillis - 10 * 60 * 1000; + if (beginTimeStamp > 0) { + nextShardIterator = getShardIteratorWithBeginTime((startTimestampMillis - 10 * 60 * 1000) * 1000); + } else { + nextShardIterator = ots.getShardIterator(new GetShardIteratorRequest(stream.getStreamId(), shard.getShardId())).getShardIterator(); + } + } else { + nextShardIterator = ots.getShardIterator(new GetShardIteratorRequest(stream.getStreamId(), shard.getShardId())).getShardIterator(); + } skipCount = startCheckpoint.getSkipCount(); } else { lastShardIterator = null; nextShardIterator = startCheckpoint.getCheckpoint(); skipCount = startCheckpoint.getSkipCount(); } - LOG.info("Initialize record processor. Mode: {}, StartCheckpoint: [{}], ShardId: {}, ShardIterator: {}, SkipCount: {}.", - readerConfig.getMode(), startCheckpoint, shard.getShardId(), nextShardIterator, skipCount); + LOG.info("Initialize record processor. Mode: {}, StartCheckpoint: [{}], ShardId: {}, ShardIterator: {}, SkipCount: {}, enableSeekShardIteratorByTimestamp: {}, startTimestamp: {}.", + readerConfig.getMode(), startCheckpoint, shard.getShardId(), nextShardIterator, skipCount, enableSeekShardIteratorByTimestamp, startTimestampMillis); } private long getTimestamp(StreamRecord record) { @@ -181,15 +195,32 @@ public class RecordProcessor implements Runnable { * * @param records * @param nextShardIterator + * @param mayMoreRecord * @return */ - boolean process(List records, String nextShardIterator) { + boolean process(List records, String nextShardIterator, Boolean mayMoreRecord) { if (records.isEmpty() && nextShardIterator != null) { - LOG.info("ProcessFinished: No more data in shard, shardId: {}.", shard.getShardId()); - ShardCheckpoint checkpoint = new ShardCheckpoint(shard.getShardId(), stream.getVersion(), nextShardIterator, 0); - checkpointTimeTracker.writeCheckpoint(endTimestampMillis, checkpoint, sendRecordCount.get()); - checkpointTimeTracker.setShardTimeCheckpoint(shard.getShardId(), endTimestampMillis, nextShardIterator); - return true; + // 没有读到更多数据 + if (!readerConfig.isEnableTableGroupSupport()) { + LOG.info("ProcessFinished: No more data in shard, shardId: {}.", shard.getShardId()); + ShardCheckpoint checkpoint = new ShardCheckpoint(shard.getShardId(), stream.getVersion(), nextShardIterator, 0); + checkpointTimeTracker.writeCheckpoint(endTimestampMillis, checkpoint, sendRecordCount.get()); + checkpointTimeTracker.setShardTimeCheckpoint(shard.getShardId(), endTimestampMillis, nextShardIterator); + return true; + } else { + if (mayMoreRecord == null) { + LOG.error("mayMoreRecord can not be null when tablegroup is true"); + throw DataXException.asDataXException("mayMoreRecord can not be null when tablegroup is true"); + } else if (mayMoreRecord) { + return false; + } else { + LOG.info("ProcessFinished: No more data in shard, shardId: {}.", shard.getShardId()); + ShardCheckpoint checkpoint = new ShardCheckpoint(shard.getShardId(), stream.getVersion(), nextShardIterator, 0); + checkpointTimeTracker.writeCheckpoint(endTimestampMillis, checkpoint, sendRecordCount.get()); + checkpointTimeTracker.setShardTimeCheckpoint(shard.getShardId(), endTimestampMillis, nextShardIterator); + return true; + } 
+ } } int size = records.size(); @@ -212,17 +243,19 @@ public class RecordProcessor implements Runnable { continue; } shouldSkip = false; - if (skipCount > 0) { - LOG.debug("Skip record. Timestamp: {}, SkipCount: {}.", timestamp, skipCount); - skipCount -= 1; - continue; - } LOG.debug("Send record. Timestamp: {}.", timestamp); sendRecord(records.get(i)); } else { LOG.info("ProcessFinished: Record in shard reach boundary of endTime, shardId: {}. Timestamp: {}, EndTime: {}", shard.getShardId(), timestamp, endTimestampMillis); - ShardCheckpoint checkpoint = new ShardCheckpoint(shard.getShardId(), stream.getVersion(), lastShardIterator, i); + + String newIterator = lastShardIterator; + if (i > 0) { + newIterator = GetStreamRecordWithLimitRowCount(lastShardIterator, i); + } + + ShardCheckpoint checkpoint = new ShardCheckpoint(shard.getShardId(), stream.getVersion(), newIterator, 0); + checkpointTimeTracker.writeCheckpoint(endTimestampMillis, checkpoint, sendRecordCount.get()); return true; } @@ -240,14 +273,35 @@ public class RecordProcessor implements Runnable { private boolean readAndProcessRecords() { LOG.debug("Read and process records. ShardId: {}, ShardIterator: {}.", shard.getShardId(), nextShardIterator); + if (enableSeekShardIteratorByTimestamp && nextShardIterator == null) { + LOG.info("ProcessFinished: Shard has reach to end, shardId: {}.", shard.getShardId()); + ShardCheckpoint checkpoint = new ShardCheckpoint(shard.getShardId(), stream.getVersion(), CheckpointPosition.SHARD_END, 0); + checkpointTimeTracker.writeCheckpoint(endTimestampMillis, checkpoint, sendRecordCount.get()); + return true; + } + GetStreamRecordRequest request = new GetStreamRecordRequest(nextShardIterator); + if (readerConfig.isEnableTableGroupSupport()) { + request.setTableName(stream.getTableName()); + } + if (readerConfig.isTimeseriesTable()){ + request.setParseInTimeseriesDataFormat(true); + } GetStreamRecordResponse response = ots.getStreamRecord(request); lastShardIterator = nextShardIterator; nextShardIterator = response.getNextShardIterator(); - return processRecords(response.getRecords(), nextShardIterator); + return processRecords(response.getRecords(), nextShardIterator, response.getMayMoreRecord()); } - public boolean processRecords(List records, String nextShardIterator) { + private String GetStreamRecordWithLimitRowCount(String beginIterator, int expectedRowCount) { + LOG.debug("Read and process records. 
ShardId: {}, ShardIterator: {}, expectedRowCount: {}..", shard.getShardId(), beginIterator, expectedRowCount); + GetStreamRecordRequest request = new GetStreamRecordRequest(beginIterator); + request.setLimit(expectedRowCount); + GetStreamRecordResponse response = ots.getStreamRecord(request); + return response.getNextShardIterator(); + } + + public boolean processRecords(List records, String nextShardIterator, Boolean mayMoreRecord) { long startTime = System.currentTimeMillis(); if (records.isEmpty()) { @@ -256,7 +310,7 @@ public class RecordProcessor implements Runnable { LOG.debug("StartProcessRecords: size: {}, recordTime: {}.", records.size(), getTimestamp(records.get(0))); } - if (process(records, nextShardIterator)) { + if (process(records, nextShardIterator, mayMoreRecord)) { return true; } @@ -264,4 +318,27 @@ public class RecordProcessor implements Runnable { shard.getShardId(), System.currentTimeMillis() - startTime, records.size(), nextShardIterator); return false; } -} + + private String getShardIteratorWithBeginTime(long timestamp){ + LOG.info("Begin to seek shard iterator with timestamp, shardId: {}, timestamp: {}.", shard.getShardId(), timestamp); + GetShardIteratorRequest getShardIteratorRequest = new GetShardIteratorRequest(stream.getStreamId(), shard.getShardId()); + getShardIteratorRequest.setTimestamp(timestamp); + + GetShardIteratorResponse response = ots.getShardIterator(getShardIteratorRequest); + String nextToken = response.getNextToken(); + + if (nextToken == null) { + return response.getShardIterator(); + } + + while (nextToken != null) { + getShardIteratorRequest = new GetShardIteratorRequest(stream.getStreamId(), shard.getShardId()); + getShardIteratorRequest.setTimestamp(timestamp); + getShardIteratorRequest.setToken(nextToken); + + response = ots.getShardIterator(getShardIteratorRequest); + nextToken = response.getNextToken(); + } + return response.getShardIterator(); + } +} \ No newline at end of file diff --git a/otsstreamreader/src/main/java/com/alibaba/datax/plugin/reader/otsstreamreader/internal/core/SingleVerAndUpOnlyModeRecordSender.java b/otsstreamreader/src/main/java/com/alibaba/datax/plugin/reader/otsstreamreader/internal/core/SingleVerAndUpOnlyModeRecordSender.java index 1cc32bad..d962af76 100644 --- a/otsstreamreader/src/main/java/com/alibaba/datax/plugin/reader/otsstreamreader/internal/core/SingleVerAndUpOnlyModeRecordSender.java +++ b/otsstreamreader/src/main/java/com/alibaba/datax/plugin/reader/otsstreamreader/internal/core/SingleVerAndUpOnlyModeRecordSender.java @@ -5,6 +5,7 @@ import com.alibaba.datax.common.element.StringColumn; import com.alibaba.datax.common.plugin.RecordSender; import com.alibaba.datax.plugin.reader.otsstreamreader.internal.OTSStreamReaderException; import com.alibaba.datax.plugin.reader.otsstreamreader.internal.utils.ColumnValueTransformHelper; +import com.alicloud.openservices.tablestore.core.protocol.timeseries.TimeseriesResponseFactory; import com.alicloud.openservices.tablestore.model.*; import java.util.HashMap; @@ -17,21 +18,23 @@ import java.util.Map; * | pk1 | pk2 | col1 | col2 | col3 | sequence id | * | --- | --- | ---- | ---- | ---- | ----------- | * | a | b | c1 | null | null | 001 | - * + *

    * 注意:删除整行,删除某列(某个版本或所有),这些增量信息都会被忽略。 */ public class SingleVerAndUpOnlyModeRecordSender implements IStreamRecordSender { private final RecordSender dataxRecordSender; - private String shardId; private final boolean isExportSequenceInfo; + private String shardId; private List columnNames; + private List columnsIsTimeseriesTags; - public SingleVerAndUpOnlyModeRecordSender(RecordSender dataxRecordSender, String shardId, boolean isExportSequenceInfo, List columnNames) { + public SingleVerAndUpOnlyModeRecordSender(RecordSender dataxRecordSender, String shardId, boolean isExportSequenceInfo, List columnNames, List columnsIsTimeseriesTags) { this.dataxRecordSender = dataxRecordSender; this.shardId = shardId; this.isExportSequenceInfo = isExportSequenceInfo; this.columnNames = columnNames; + this.columnsIsTimeseriesTags = columnsIsTimeseriesTags; } @Override @@ -57,25 +60,49 @@ public class SingleVerAndUpOnlyModeRecordSender implements IStreamRecordSender { map.put(pkCol.getName(), pkCol.getValue()); } + /** + * 将时序数据中tags字段的字符串转化为Map + */ + Map tagsMap = new HashMap<>(); + if (columnsIsTimeseriesTags != null && columnsIsTimeseriesTags.contains(true)) { + try{ + tagsMap = TimeseriesResponseFactory.parseTagsOrAttrs(String.valueOf(map.get("_tags"))); + } + catch (Exception ex){ + throw new OTSStreamReaderException("Parse \"_tags\" fail, please check your config.", ex); + } + + } + for (RecordColumn recordColumn : columns) { if (recordColumn.getColumnType().equals(RecordColumn.ColumnType.PUT)) { map.put(recordColumn.getColumn().getName(), recordColumn.getColumn().getValue()); } } - boolean findColumn = false; + boolean findColumn = false; - for (String colName : columnNames) { - Object value = map.get(colName); - if (value != null) { - findColumn = true; - if (value instanceof ColumnValue) { - line.addColumn(ColumnValueTransformHelper.otsColumnValueToDataxColumn((ColumnValue) value)); + for (int i = 0; i < columnNames.size(); i++) { + if (columnsIsTimeseriesTags != null && columnsIsTimeseriesTags.get(i)) { + String value = tagsMap.get(columnNames.get(i)); + if (value != null) { + findColumn = true; + line.addColumn(new StringColumn(value)); } else { - line.addColumn(ColumnValueTransformHelper.otsPrimaryKeyValueToDataxColumn((PrimaryKeyValue) value)); + line.addColumn(new StringColumn(null)); } } else { - line.addColumn(new StringColumn(null)); + Object value = map.get(columnNames.get(i)); + if (value != null) { + findColumn = true; + if (value instanceof ColumnValue) { + line.addColumn(ColumnValueTransformHelper.otsColumnValueToDataxColumn((ColumnValue) value)); + } else { + line.addColumn(ColumnValueTransformHelper.otsPrimaryKeyValueToDataxColumn((PrimaryKeyValue) value)); + } + } else { + line.addColumn(new StringColumn(null)); + } } } diff --git a/otsstreamreader/src/main/java/com/alibaba/datax/plugin/reader/otsstreamreader/internal/model/LocalStrings.properties b/otsstreamreader/src/main/java/com/alibaba/datax/plugin/reader/otsstreamreader/internal/model/LocalStrings.properties new file mode 100644 index 00000000..e69de29b diff --git a/otsstreamreader/src/main/java/com/alibaba/datax/plugin/reader/otsstreamreader/internal/model/LocalStrings_en_US.properties b/otsstreamreader/src/main/java/com/alibaba/datax/plugin/reader/otsstreamreader/internal/model/LocalStrings_en_US.properties new file mode 100644 index 00000000..e69de29b diff --git a/otsstreamreader/src/main/java/com/alibaba/datax/plugin/reader/otsstreamreader/internal/model/LocalStrings_ja_JP.properties 
b/otsstreamreader/src/main/java/com/alibaba/datax/plugin/reader/otsstreamreader/internal/model/LocalStrings_ja_JP.properties new file mode 100644 index 00000000..e69de29b diff --git a/otsstreamreader/src/main/java/com/alibaba/datax/plugin/reader/otsstreamreader/internal/model/LocalStrings_zh_CN.properties b/otsstreamreader/src/main/java/com/alibaba/datax/plugin/reader/otsstreamreader/internal/model/LocalStrings_zh_CN.properties new file mode 100644 index 00000000..e69de29b diff --git a/otsstreamreader/src/main/java/com/alibaba/datax/plugin/reader/otsstreamreader/internal/model/LocalStrings_zh_HK.properties b/otsstreamreader/src/main/java/com/alibaba/datax/plugin/reader/otsstreamreader/internal/model/LocalStrings_zh_HK.properties new file mode 100644 index 00000000..e69de29b diff --git a/otsstreamreader/src/main/java/com/alibaba/datax/plugin/reader/otsstreamreader/internal/model/LocalStrings_zh_TW.properties b/otsstreamreader/src/main/java/com/alibaba/datax/plugin/reader/otsstreamreader/internal/model/LocalStrings_zh_TW.properties new file mode 100644 index 00000000..e69de29b diff --git a/otsstreamreader/src/main/java/com/alibaba/datax/plugin/reader/otsstreamreader/internal/model/OTSStreamJobShard.java b/otsstreamreader/src/main/java/com/alibaba/datax/plugin/reader/otsstreamreader/internal/model/OTSStreamJobShard.java new file mode 100644 index 00000000..d5d5f971 --- /dev/null +++ b/otsstreamreader/src/main/java/com/alibaba/datax/plugin/reader/otsstreamreader/internal/model/OTSStreamJobShard.java @@ -0,0 +1,42 @@ +package com.alibaba.datax.plugin.reader.otsstreamreader.internal.model; + +import com.alicloud.openservices.tablestore.model.StreamShard; + +import java.util.List; + +/** + * OTS streamJob & allShards model + * + * @author mingya.wmy (云时) + */ +public class OTSStreamJobShard { + + private StreamJob streamJob; + + private List allShards; + + public OTSStreamJobShard() { + } + + public OTSStreamJobShard(StreamJob streamJob, List allShards) { + this.streamJob = streamJob; + this.allShards = allShards; + } + + public StreamJob getStreamJob() { + return streamJob; + } + + public void setStreamJob(StreamJob streamJob) { + this.streamJob = streamJob; + } + + public List getAllShards() { + return allShards; + } + + public void setAllShards(List allShards) { + this.allShards = allShards; + } + +} diff --git a/otsstreamreader/src/main/java/com/alibaba/datax/plugin/reader/otsstreamreader/internal/utils/LocalStrings.properties b/otsstreamreader/src/main/java/com/alibaba/datax/plugin/reader/otsstreamreader/internal/utils/LocalStrings.properties new file mode 100644 index 00000000..e69de29b diff --git a/otsstreamreader/src/main/java/com/alibaba/datax/plugin/reader/otsstreamreader/internal/utils/LocalStrings_en_US.properties b/otsstreamreader/src/main/java/com/alibaba/datax/plugin/reader/otsstreamreader/internal/utils/LocalStrings_en_US.properties new file mode 100644 index 00000000..e69de29b diff --git a/otsstreamreader/src/main/java/com/alibaba/datax/plugin/reader/otsstreamreader/internal/utils/LocalStrings_ja_JP.properties b/otsstreamreader/src/main/java/com/alibaba/datax/plugin/reader/otsstreamreader/internal/utils/LocalStrings_ja_JP.properties new file mode 100644 index 00000000..e69de29b diff --git a/otsstreamreader/src/main/java/com/alibaba/datax/plugin/reader/otsstreamreader/internal/utils/LocalStrings_zh_CN.properties b/otsstreamreader/src/main/java/com/alibaba/datax/plugin/reader/otsstreamreader/internal/utils/LocalStrings_zh_CN.properties new file mode 100644 index 00000000..e69de29b 
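The SingleVerAndUpOnlyModeRecordSender change earlier in this patch resolves each configured column in one of two ways: a column whose position in `columnsIsTimeseriesTags` is `true` is read from the map parsed out of the `_tags` string, and every other column is read from the row's own primary-key/attribute values, with `null` emitted when a value is missing. The snippet below is a minimal standalone sketch of that lookup rule only, using plain `java.util` collections; the class, method and sample values are invented for illustration and are not the plugin's own code.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Standalone illustration of the column-resolution rule introduced above:
// a column flagged as a timeseries tag is looked up in the map parsed from "_tags",
// every other column is looked up in the row's own primary-key/attribute map.
public class TagColumnResolutionSketch {

    static List<String> resolveRow(List<String> columnNames,
                                   List<Boolean> columnsIsTimeseriesTags,
                                   Map<String, String> tagsMap,
                                   Map<String, String> rowMap) {
        List<String> line = new ArrayList<>();
        for (int i = 0; i < columnNames.size(); i++) {
            String name = columnNames.get(i);
            boolean isTag = columnsIsTimeseriesTags != null && columnsIsTimeseriesTags.get(i);
            // tag columns come from the parsed "_tags" dictionary, others from the row itself;
            // a missing value stays null, mirroring the StringColumn(null) branch in the patch
            line.add(isTag ? tagsMap.get(name) : rowMap.get(name));
        }
        return line;
    }

    public static void main(String[] args) {
        Map<String, String> tags = new HashMap<>();
        tags.put("tag_a", "host-1");
        Map<String, String> row = new HashMap<>();
        row.put("_m_name", "cpu");
        row.put("column1", "0.75");
        System.out.println(resolveRow(
                Arrays.asList("_m_name", "tag_a", "column1"),
                Arrays.asList(false, true, false),
                tags, row)); // prints [cpu, host-1, 0.75]
    }
}
```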
diff --git a/otsstreamreader/src/main/java/com/alibaba/datax/plugin/reader/otsstreamreader/internal/utils/LocalStrings_zh_HK.properties b/otsstreamreader/src/main/java/com/alibaba/datax/plugin/reader/otsstreamreader/internal/utils/LocalStrings_zh_HK.properties new file mode 100644 index 00000000..e69de29b diff --git a/otsstreamreader/src/main/java/com/alibaba/datax/plugin/reader/otsstreamreader/internal/utils/LocalStrings_zh_TW.properties b/otsstreamreader/src/main/java/com/alibaba/datax/plugin/reader/otsstreamreader/internal/utils/LocalStrings_zh_TW.properties new file mode 100644 index 00000000..e69de29b diff --git a/otsstreamreader/src/main/java/com/alibaba/datax/plugin/reader/otsstreamreader/internal/utils/OTSHelper.java b/otsstreamreader/src/main/java/com/alibaba/datax/plugin/reader/otsstreamreader/internal/utils/OTSHelper.java index 79b6c1d7..24ea732a 100644 --- a/otsstreamreader/src/main/java/com/alibaba/datax/plugin/reader/otsstreamreader/internal/utils/OTSHelper.java +++ b/otsstreamreader/src/main/java/com/alibaba/datax/plugin/reader/otsstreamreader/internal/utils/OTSHelper.java @@ -2,11 +2,19 @@ package com.alibaba.datax.plugin.reader.otsstreamreader.internal.utils; import com.alibaba.datax.plugin.reader.otsstreamreader.internal.config.OTSRetryStrategyForStreamReader; import com.alibaba.datax.plugin.reader.otsstreamreader.internal.config.OTSStreamReaderConfig; +import com.alicloud.openservices.tablestore.ClientConfiguration; +import com.alicloud.openservices.tablestore.SyncClient; +import com.alicloud.openservices.tablestore.SyncClientInterface; +import com.alicloud.openservices.tablestore.TableStoreException; import com.alicloud.openservices.tablestore.model.*; -import com.alicloud.openservices.tablestore.*; +import com.alicloud.openservices.tablestore.model.timeseries.DescribeTimeseriesTableRequest; +import com.alicloud.openservices.tablestore.model.timeseries.DescribeTimeseriesTableResponse; import com.aliyun.openservices.ots.internal.streamclient.utils.TimeUtils; -import java.util.*; +import java.util.ArrayList; +import java.util.HashMap; +import java.util.List; +import java.util.Map; public class OTSHelper { @@ -35,14 +43,55 @@ public class OTSHelper { return ots; } + public static DescribeStreamResponse getStreamResponse(SyncClientInterface ots, String tableName, boolean isTimeseriesTable) { + /** + * 对于时序表,需要通过listStream&describeStream两次交互,获取streamID与expirationTime + */ + ListStreamRequest request = new ListStreamRequest(tableName); + ListStreamResponse response = ots.listStream(request); + String streamID = null; + for (Stream stream : response.getStreams()) { + if (stream.getTableName().equals(tableName)) { + streamID = stream.getStreamId(); + break; + } + } + if (streamID == null) { + throw new RuntimeException(String.format("Did not get any stream from table : (\"%s\") .", tableName)); + } + DescribeStreamRequest describeStreamRequest = new DescribeStreamRequest(streamID); + if (isTimeseriesTable) { + describeStreamRequest.setSupportTimeseriesTable(true); + } + DescribeStreamResponse result = ots.describeStream(describeStreamRequest); + if(isTimeseriesTable && !result.isTimeseriesDataTable()){ + throw new RuntimeException(String.format("The table [%s] is not timeseries data table, please remove the config: {isTimeseriesTable : true}.", tableName)); + } + return result; + } + public static StreamDetails getStreamDetails(SyncClientInterface ots, String tableName) { DescribeTableRequest describeTableRequest = new DescribeTableRequest(tableName); DescribeTableResponse 
result = ots.describeTable(describeTableRequest); return result.getStreamDetails(); } - public static List getOrderedShardList(SyncClientInterface ots, String streamId) { + public static StreamDetails getStreamDetails(SyncClientInterface ots, String tableName, boolean isTimeseriesTable) { + if (!isTimeseriesTable) { + return getStreamDetails(ots, tableName); + } else { + DescribeStreamResponse result = getStreamResponse(ots, tableName, isTimeseriesTable); + //TODO:时序表无法直接获取StreamDetails,需要手动构建。 + // 其中lastEnableTime字段暂时无法获取 + return new StreamDetails(true, result.getStreamId(), result.getExpirationTime(), 0); + } + } + + public static List getOrderedShardList(SyncClientInterface ots, String streamId, boolean isTimeseriesTable) { DescribeStreamRequest describeStreamRequest = new DescribeStreamRequest(streamId); + if (isTimeseriesTable) { + describeStreamRequest.setSupportTimeseriesTable(true); + } DescribeStreamResponse describeStreamResult = ots.describeStream(describeStreamRequest); List shardList = new ArrayList(); shardList.addAll(describeStreamResult.getShards()); @@ -54,10 +103,15 @@ public class OTSHelper { return shardList; } - public static boolean checkTableExists(SyncClientInterface ots, String tableName) { + public static boolean checkTableExists(SyncClientInterface ots, String tableName, boolean isTimeseriesTable) { boolean exist = false; try { - describeTable(ots, tableName); + if (isTimeseriesTable) { + describeTimeseriesTable(ots, tableName); + } else { + describeTable(ots, tableName); + } + exist = true; } catch (TableStoreException ex) { if (!ex.getErrorCode().equals(OBJECT_NOT_EXIST)) { @@ -71,6 +125,10 @@ public class OTSHelper { return ots.describeTable(new DescribeTableRequest(tableName)); } + public static DescribeTimeseriesTableResponse describeTimeseriesTable(SyncClientInterface ots, String tableName) { + return ((SyncClient) ots).asTimeseriesClient().describeTimeseriesTable(new DescribeTimeseriesTableRequest(tableName)); + } + public static void createTable(SyncClientInterface ots, TableMeta tableMeta, TableOptions tableOptions) { CreateTableRequest request = new CreateTableRequest(tableMeta, tableOptions, new ReservedThroughput(CREATE_TABLE_READ_CU, CREATE_TABLE_WRITE_CU)); @@ -109,11 +167,12 @@ public class OTSHelper { return false; } - public static Map toShardMap(List orderedShardList) { + public static Map toShardMap(List orderedShardList) { Map shardsMap = new HashMap(); for (StreamShard shard : orderedShardList) { shardsMap.put(shard.getShardId(), shard); } return shardsMap; } + } diff --git a/otsstreamreader/src/main/java/com/alibaba/datax/plugin/reader/otsstreamreader/internal/utils/OTSStreamJobShardUtil.java b/otsstreamreader/src/main/java/com/alibaba/datax/plugin/reader/otsstreamreader/internal/utils/OTSStreamJobShardUtil.java new file mode 100644 index 00000000..a062b44f --- /dev/null +++ b/otsstreamreader/src/main/java/com/alibaba/datax/plugin/reader/otsstreamreader/internal/utils/OTSStreamJobShardUtil.java @@ -0,0 +1,105 @@ +package com.alibaba.datax.plugin.reader.otsstreamreader.internal.utils; + +import com.alibaba.datax.common.exception.DataXException; +import com.alibaba.datax.common.util.RetryUtil; +import com.alibaba.datax.plugin.reader.otsstreamreader.internal.config.OTSStreamReaderConfig; +import com.alibaba.datax.plugin.reader.otsstreamreader.internal.core.CheckpointTimeTracker; +import com.alibaba.datax.plugin.reader.otsstreamreader.internal.model.OTSStreamJobShard; +import 
com.alibaba.datax.plugin.reader.otsstreamreader.internal.model.StreamJob; +import com.alibaba.fastjson.JSON; +import com.alicloud.openservices.tablestore.SyncClientInterface; +import com.alicloud.openservices.tablestore.model.StreamShard; +import org.apache.commons.lang3.StringUtils; + +import java.util.List; +import java.util.Set; +import java.util.concurrent.Callable; +import java.util.stream.Collectors; + +import static com.alibaba.datax.plugin.reader.otsstreamreader.internal.config.OTSStreamReaderConstants.DEFAULT_SLEEP_TIME_IN_MILLS; +import static com.alibaba.datax.plugin.reader.otsstreamreader.internal.config.OTSStreamReaderConstants.RETRY_TIMES; + +/** + * @author mingya.wmy (云时) + */ +public class OTSStreamJobShardUtil { + + private static OTSStreamJobShard otsStreamJobShard = null; + + /** + * 获取全局OTS StreamJob 和 allShards ,懒汉单例模式,减少对OTS接口交互频次 + * 备注:config 和 version 所有TASK 均一样 + * + * @param config + * @param version + * @return + * @throws Exception + */ + public static OTSStreamJobShard getOTSStreamJobShard(OTSStreamReaderConfig config, String version) throws Exception { + if (otsStreamJobShard == null) { + synchronized (OTSHelper.class) { + if (otsStreamJobShard == null) { + otsStreamJobShard = RetryUtil.executeWithRetry(new Callable() { + @Override + public OTSStreamJobShard call() throws Exception { + return getOTSStreamJobShardByOtsClient(config, version); + } + }, RETRY_TIMES, DEFAULT_SLEEP_TIME_IN_MILLS, true); + } + } + } + + return otsStreamJobShard; + } + + /** + * 获取OTS StreamJob 和 allShards + * + * @param config OTS CONF + * @param version OTS STREAM VERSION + * @return + */ + private static OTSStreamJobShard getOTSStreamJobShardByOtsClient(OTSStreamReaderConfig config, String version) { + // Init ots,Task阶段从OTS中获取 allShards 和 streamJob + SyncClientInterface ots = null; + try { + ots = OTSHelper.getOTSInstance(config); + String streamId = OTSHelper.getStreamResponse(ots, config.getDataTable(), config.isTimeseriesTable()).getStreamId(); + List allShards = OTSHelper.getOrderedShardList(ots, streamId, config.isTimeseriesTable()); + + CheckpointTimeTracker checkpointInfoTracker = new CheckpointTimeTracker(ots, config.getStatusTable(), streamId); + StreamJob streamJobFromCPT = checkpointInfoTracker.readStreamJob(config.getEndTimestampMillis()); + if (!StringUtils.equals(streamJobFromCPT.getVersion(), version)) { + throw new RuntimeException(String.format("streamJob version (\"%s\") is not equal to \"%s\", streamJob: %s", + streamJobFromCPT.getVersion(), version, JSON.toJSONString(streamJobFromCPT))); + } + + Set shardIdSetsFromTracker = streamJobFromCPT.getShardIds(); + + if (shardIdSetsFromTracker == null || shardIdSetsFromTracker.isEmpty()) { + throw new RuntimeException(String.format("StreamJob [statusTable=%s, streamId=%s] shardIds can't be null!", + config.getStatusTable(), streamId)); + } + + Set currentAllStreamShardIdSets = allShards.stream().map(streamShard -> streamShard.getShardId()).collect(Collectors.toSet()); + + for (String shardId: shardIdSetsFromTracker) { + if (!currentAllStreamShardIdSets.contains(shardId)) { + allShards.add(new StreamShard(shardId)); + } + } + + StreamJob streamJob = new StreamJob(config.getDataTable(), streamId, version, shardIdSetsFromTracker, + config.getStartTimestampMillis(), config.getEndTimestampMillis()); + + return new OTSStreamJobShard(streamJob, allShards); + } catch (Throwable e) { + throw new DataXException(String.format("Get ots shards error: %s", e.getMessage())); + } finally { + if (ots != null) { + ots.shutdown(); + } 
+ } + } + +} diff --git a/otswriter/doc/otswriter.md b/otswriter/doc/otswriter.md index cbfaf2a8..43697feb 100644 --- a/otswriter/doc/otswriter.md +++ b/otswriter/doc/otswriter.md @@ -7,13 +7,8 @@ ___ ## 1 快速介绍 -OTSWriter插件实现了向OTS写入数据,目前支持三种写入方式: +OTSWriter插件实现了向OTS写入数据,目前支持了多版本数据的写入、主键自增列的写入等功能。 -* PutRow,对应于OTS API PutRow,插入数据到指定的行,如果该行不存在,则新增一行;若该行存在,则覆盖原有行。 - -* UpdateRow,对应于OTS API UpdateRow,更新指定行的数据,如果该行不存在,则新增一行;若该行存在,则根据请求的内容在这一行中新增、修改或者删除指定列的值。 - -* DeleteRow,对应于OTS API DeleteRow,删除指定行的数据。 OTS是构建在阿里云飞天分布式系统之上的 NoSQL数据库服务,提供海量结构化数据的存储和实时访问。OTS 以实例和表的形式组织数据,通过数据分片和负载均衡技术,实现规模上的无缝扩展。 @@ -28,6 +23,7 @@ OTS是构建在阿里云飞天分布式系统之上的 NoSQL数据库服务, * 配置一个写入OTS作业: +`normal模式` ``` { "job": { @@ -37,48 +33,53 @@ OTS是构建在阿里云飞天分布式系统之上的 NoSQL数据库服务, { "reader": {}, "writer": { - "name": "otswriter", + "name": "otswriter", "parameter": { "endpoint":"", "accessId":"", "accessKey":"", "instanceName":"", - // 导出数据表的表名 "table":"", - - // Writer支持不同类型之间进行相互转换 - // 如下类型转换不支持: - // ================================ - // int -> binary - // double -> bool, binary - // bool -> binary - // bytes -> int, double, bool - // ================================ - + + // 可选 multiVersion||normal,可选配置,默认normal + "mode":"normal", + + //newVersion定义是否使用新版本插件 可选值:true || false + "newVersion":"true", + + //是否允许向包含主键自增列的ots表中写入数据 + //与mode:multiVersion的多版本模式不兼容 + "enableAutoIncrement":"true", + // 需要导入的PK列名,区分大小写 - // 类型支持:STRING,INT + // 类型支持:STRING,INT,BINARY + // 必选 // 1. 支持类型转换,注意类型转换时的精度丢失 // 2. 顺序不要求和表的Meta一致 - "primaryKey" : [ - {"name":"pk1", "type":"string"}, - {"name":"pk2", "type":"int"} + // 3. name全局唯一 + "primaryKey":[ + "userid", + "groupid" ], - + // 需要导入的列名,区分大小写 // 类型支持STRING,INT,DOUBLE,BOOL和BINARY - "column" : [ - {"name":"col2", "type":"INT"}, - {"name":"col3", "type":"STRING"}, - {"name":"col4", "type":"STRING"}, - {"name":"col5", "type":"BINARY"}, - {"name":"col6", "type":"DOUBLE"} + // 必选 + // 1.name全局唯一 + "column":[ + {"name":"addr", "type":"string"}, + {"name":"height", "type":"int"} ], - + + // 如果用户配置了时间戳,系统将使用配置的时间戳,如果没有配置,使用OTS的系统时间戳 + // 可选 + "defaultTimestampInMillionSecond": 142722431, + // 写入OTS的方式 // PutRow : 等同于OTS API中PutRow操作,检查条件是ignore // UpdateRow : 等同于OTS API中UpdateRow操作,检查条件是ignore - // DeleteRow: 等同于OTS API中DeleteRow操作,检查条件是ignore - "writeMode" : "PutRow" + "writeMode":"PutRow" + } } } @@ -92,94 +93,168 @@ OTS是构建在阿里云飞天分布式系统之上的 NoSQL数据库服务, * **endpoint** - * 描述:OTS Server的EndPoint(服务地址),例如http://bazhen.cn−hangzhou.ots.aliyuncs.com。 + * 描述:OTS Server的EndPoint(服务地址),例如http://bazhen.cn−hangzhou.ots.aliyuncs.com。 - * 必选:是
+ * 必选:是
- * 默认值:无
+ * 默认值:无
* **accessId**
- * 描述:OTS的accessId
+ * 描述:OTS的accessId
- * 必选:是
+ * 必选:是
- * 默认值:无
+ * 默认值:无
* **accessKey**
- * 描述:OTS的accessKey
+ * 描述:OTS的accessKey
- * 必选:是
+ * 必选:是
- * 默认值:无
+ * 默认值:无
* **instanceName**
- * 描述:OTS的实例名称,实例是用户使用和管理 OTS 服务的实体,用户在开通 OTS 服务之后,需要通过管理控制台来创建实例,然后在实例内进行表的创建和管理。实例是 OTS 资源管理的基础单元,OTS 对应用程序的访问控制和资源计量都在实例级别完成。
+ * 描述:OTS的实例名称,实例是用户使用和管理 OTS 服务的实体,用户在开通 OTS 服务之后,需要通过管理控制台来创建实例,然后在实例内进行表的创建和管理。实例是 OTS 资源管理的基础单元,OTS 对应用程序的访问控制和资源计量都在实例级别完成。
- * 必选:是
+ * 必选:是
- * 默认值:无
+ * 默认值:无
* **table**
- * 描述:所选取的需要抽取的表名称,这里有且只能填写一张表。在OTS不存在多表同步的需求。
+ * 描述:所选取的需要抽取的表名称,这里有且只能填写一张表。在OTS不存在多表同步的需求。
- * 必选:是
+ * 必选:是
+
+ * 默认值:无
+
+* **newVersion**
+
+ * 描述:version定义了使用的ots SDK版本。
+ * true,新版本插件,使用com.alicloud.openservices.tablestore的依赖(推荐)
+ * false,旧版本插件,使用com.aliyun.openservices.ots的依赖,**不支持多版本数据的读取**
+
+ * 必选:否
+
+ * 默认值:false
+
+* **mode**
+
+ * 描述:是否为多版本数据,目前有两种模式。
+ * normal,对应普通的数据
+ * multiVersion,写入数据为多版本格式的数据,多版本模式下,配置参数有所不同,详见3.4节
+
+ * 必选:否
+
+ * 默认值:normal
+
+
+* **enableAutoIncrement**
+
+ * 描述:是否允许向包含主键自增列的ots表中写入数据。
+ * true,插件会扫描表中的自增列信息,并在写入数据时自动添加自增列
+ * false,写入含主键自增列的表时会报错
+
+ * 必选:否
+
+ * 默认值:false
+
+
+* **isTimeseriesTable**
+
+ * 描述:写入的对应表是否为时序表,仅在mode=normal模式下生效。
+ * true,写入的数据表为时序数据表
+ * false,写入的数据表为普通的宽表
+
+ * 必选:否
+
+ * 默认值:false
+
+ * 在写入时序数据表的模式下,不需要配置`primaryKey`字段,只需要配置`column`字段,配置样例:
+ ```json
+ "column": [
+ {
+ "name": "_m_name", // 表示度量名称(measurement)字段
+ },
+ {
+ "name": "_data_source", // 表示数据源(dataSource)字段
+ },
+ {
+ "name": "_tags", // 表示标签字段,会被解析为Map类型
+ },
+ {
+ "name": "_time", // 表示时间戳字段,会被解析为long类型的值
+ },
+ {
+ "name": "tag_a",
+ "isTag":"true" // 表示标签内部字段,该字段会被解析到标签的字典内部
+ },
+ {
+ "name": "column1", // 属性列名称
+ "type": "string" // 属性列类型,支持 bool string int double binary
+ },
+ {
+ "name": "column2",
+ "type": "int"
+ }
+ ],
+ ```
+
+
+
- * 默认值:无
* **primaryKey**
- * 描述: OTS的主键信息,使用JSON的数组描述字段信息。OTS本身是NoSQL系统,在OTSWriter导入数据过程中,必须指定相应地字段名称。
+ * 描述: OTS的主键信息,使用JSON的数组描述字段信息。OTS本身是NoSQL系统,在OTSWriter导入数据过程中,必须指定相应地字段名称。
- OTS的PrimaryKey只能支持STRING,INT两种类型,因此OTSWriter本身也限定填写上述两种类型。
+ OTS的PrimaryKey只能支持STRING,INT两种类型,因此OTSWriter本身也限定填写上述两种类型。
- DataX本身支持类型转换的,因此对于源头数据非String/Int,OTSWriter会进行数据类型转换。
+ DataX本身支持类型转换的,因此对于源头数据非String/Int,OTSWriter会进行数据类型转换。
- 配置实例:
+ 配置实例:
- ```json
- "primaryKey" : [
- {"name":"pk1", "type":"string"},
- {"name":"pk2", "type":"int"}
- ],
- ```
- * 必选:是
+ ```json
+ "primaryKey":[
+ "userid",
+ "groupid"
+ ]
+ ```
+ * 必选:是
- * 默认值:无
+ * 默认值:无
* **column**
- * 描述:所配置的表中需要同步的列名集合,使用JSON的数组描述字段信息。使用格式为
+ * 描述:所配置的表中需要同步的列名集合,使用JSON的数组描述字段信息。使用格式为
- ```json
- {"name":"col2", "type":"INT"},
- ```
+ ```json
+ {"name":"col2", "type":"INT"},
+ ```
- 其中的name指定写入的OTS列名,type指定写入的类型。OTS类型支持STRING,INT,DOUBLE,BOOL和BINARY几种类型 。
+ 其中的name指定写入的OTS列名,type指定写入的类型。OTS类型支持STRING,INT,DOUBLE,BOOL和BINARY几种类型 。
- 写入过程不支持常量、函数或者自定义表达式。
+ 写入过程不支持常量、函数或者自定义表达式。
- * 必选:是
+ * 必选:是
- * 默认值:无
+ * 默认值:无
* **writeMode**
- * 描述:写入模式,目前支持两种模式,
+ * 描述:写入模式,目前支持两种模式,
- * PutRow,对应于OTS API PutRow,插入数据到指定的行,如果该行不存在,则新增一行;若该行存在,则覆盖原有行。
+ * PutRow,对应于OTS API PutRow,插入数据到指定的行,如果该行不存在,则新增一行;若该行存在,则覆盖原有行。
- * UpdateRow,对应于OTS API UpdateRow,更新指定行的数据,如果该行不存在,则新增一行;若该行存在,则根据请求的内容在这一行中新增、修改或者删除指定列的值。
+ * UpdateRow,对应于OTS API UpdateRow,更新指定行的数据,如果该行不存在,则新增一行;若该行存在,则根据请求的内容在这一行中新增、修改或者删除指定列的值。
- * DeleteRow,对应于OTS API DeleteRow,删除指定行的数据。
- * 必选:是
+* 必选:是
- * 默认值:无
+* 默认值:无
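Putting the options above together, a minimal `writer` block for writing into a timeseries table might look like the sketch below. It is assembled only from the parameters documented in this section; the endpoint, credential and table values are placeholders, and the sample has not been validated against a live instance.

```json
{
    "name": "otswriter",
    "parameter": {
        "endpoint": "",
        "accessId": "",
        "accessKey": "",
        "instanceName": "",
        "table": "",
        "newVersion": "true",
        "mode": "normal",
        "isTimeseriesTable": "true",
        "column": [
            {"name": "_m_name"},
            {"name": "_data_source"},
            {"name": "_tags"},
            {"name": "_time"},
            {"name": "tag_a", "isTag": "true"},
            {"name": "column1", "type": "string"}
        ]
    }
}
```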
    ### 3.3 类型转换 @@ -197,43 +272,79 @@ OTS是构建在阿里云飞天分布式系统之上的 NoSQL数据库服务, * 注意,OTS本身不支持日期型类型。应用层一般使用Long报错时间的Unix TimeStamp。 -## 4 性能报告 +### 3.4 multiVersion模式 -### 4.1 环境准备 +#### 3.4.1 模式介绍 -#### 4.1.1 数据特征 +multiVersion模式解决了ots数据库中多版本数据的导入问题。支持Hbase的全量数据迁移到OTS -2列PK(10 + 8),15列String(10 Byte), 2两列Integer(8 Byte),算上Column Name每行大概327Byte,每次BatchWriteRow写入100行数据,所以当个请求的数据大小是32KB。 - -#### 4.1.2 机器参数 - -OTS端:3台前端机,5台后端机 - -DataX运行端: 24核CPU, 98GB内存 - -### 4.2 测试报告 - -#### 4.2.1 测试报告 - -|并发数|DataX CPU|DATAX流量 |OTS 流量 | BatchWrite前端QPS| BatchWriteRow前端延时| -|--------|--------| --------|--------|--------|------| -|40| 1027% |Speed 22.13MB/s, 112640 records/s|65.8M/s |42|153ms | -|50| 1218% |Speed 24.11MB/s, 122700 records/s|73.5M/s |47|174ms| -|60| 1355% |Speed 25.31MB/s, 128854 records/s|78.1M/s |50|190ms| -|70| 1578% |Speed 26.35MB/s, 134121 records/s|80.8M/s |52|210ms| -|80| 1771% |Speed 26.55MB/s, 135161 records/s|82.7M/s |53|230ms| +* 注意:这种模式的数据格式比较特殊,该writer需要reader也提供版本的输出 +* 当前只有hbase reader 与 ots reader提供这种模式,使用时切记注意 +#### 3.4.2 配置样例 +``` +{ + "job": { + "setting": { + }, + "content": [ + { + "reader": {}, + "writer": { + "name": "otswriter", + "parameter": { + "endpoint":"", + "accessId":"", + "accessKey":"", + "instanceName":"", + "table":"", + + // 多版本模式,插件会按照多版本模式去解析所有配置 + "mode":"multiVersion", + + "newVersion":"true", + + // 配置PK信息 + // 考虑到配置成本,并不需要配置PK在Record(Line)中的位置,要求 + // Record的格式固定,PK一定在行首,PK之后是columnName,格式如下: + // 如:{pk0,pk1,pk2,pk3}, {columnName}, {timestamp}, {value} + "primaryKey":[ + "userid", + "groupid" + ], + + // 列名前缀过滤 + // 描述:hbase导入过来的数据,cf和qulifier共同组成columnName, + // OTS并不支持cf,所以需要将cf过滤掉 + // 注意: + // 1.该参数选填,如果没有填写或者值为空字符串,表示不对列名进行过滤。 + // 2.如果datax传入的数据columnName列不是以前缀开始,则将该Record放入脏数据回收器中 + "columnNamePrefixFilter":"cf:" + } + } + } + ] + } +} +``` +## 4 约束限制 - -## 5 约束限制 - -### 5.1 写入幂等性 +### 4.1 写入幂等性 OTS写入本身是支持幂等性的,也就是使用OTS SDK同一条数据写入OTS系统,一次和多次请求的结果可以理解为一致的。因此对于OTSWriter多次尝试写入同一条数据与写入一条数据结果是等同的。 -### 5.2 单任务FailOver +### 4.2 单任务FailOver 由于OTS写入本身是幂等性的,因此可以支持单任务FailOver。即一旦写入Fail,DataX会重新启动相关子任务进行重试。 -## 6 FAQ +## 5 FAQ + +* 1.如果使用多版本模式,value为null应该怎么解释? + * : 表示删除指定的版本 +* 2.如果ts列为空怎么办? + * :插件记录为垃圾数据 +* 3.Record的count和期望不符? + * : 插件异常终止 +* 4.在普通模式下,采用UpdateRow的方式写入数据,如果不指定TS,相同行数的数据怎么写入到OTS中? 
+ * : 后面的覆盖前面的数据 diff --git a/otswriter/pom.xml b/otswriter/pom.xml index cb255e1f..f393d76c 100644 --- a/otswriter/pom.xml +++ b/otswriter/pom.xml @@ -10,17 +10,6 @@ otswriter - - org.apache.logging.log4j - log4j-api - 2.17.1 - - - - org.apache.logging.log4j - log4j-core - 2.17.1 - com.alibaba.datax datax-common @@ -44,18 +33,25 @@ com.aliyun.openservices ots-public - 2.2.4 + 2.2.6 - - log4j-api - org.apache.logging.log4j - log4j-core org.apache.logging.log4j - + + + com.aliyun.openservices + tablestore + 5.13.10 + + + log4j-core + org.apache.logging.log4j + + + com.google.code.gson gson @@ -63,6 +59,14 @@ + + + src/main/java + + **/*.properties + + + diff --git a/otswriter/src/main/assembly/package.xml b/otswriter/src/main/assembly/package.xml index 5ae7a015..91523025 100644 --- a/otswriter/src/main/assembly/package.xml +++ b/otswriter/src/main/assembly/package.xml @@ -12,8 +12,8 @@ src/main/resources plugin.json - plugin_job_template.json - + plugin_job_template.json + plugin/writer/otswriter diff --git a/otswriter/src/main/java/com/alibaba/datax/plugin/writer/otswriter/IOtsWriterMasterProxy.java b/otswriter/src/main/java/com/alibaba/datax/plugin/writer/otswriter/IOtsWriterMasterProxy.java new file mode 100644 index 00000000..af364b86 --- /dev/null +++ b/otswriter/src/main/java/com/alibaba/datax/plugin/writer/otswriter/IOtsWriterMasterProxy.java @@ -0,0 +1,16 @@ +package com.alibaba.datax.plugin.writer.otswriter; + +import com.alibaba.datax.common.util.Configuration; + +import java.util.List; + +public interface IOtsWriterMasterProxy { + + public void init(Configuration param) throws Exception; + + public void close(); + + public List split(int mandatoryNumber); + + +} diff --git a/otswriter/src/main/java/com/alibaba/datax/plugin/writer/otswriter/IOtsWriterSlaveProxy.java b/otswriter/src/main/java/com/alibaba/datax/plugin/writer/otswriter/IOtsWriterSlaveProxy.java new file mode 100644 index 00000000..1ce78ccb --- /dev/null +++ b/otswriter/src/main/java/com/alibaba/datax/plugin/writer/otswriter/IOtsWriterSlaveProxy.java @@ -0,0 +1,25 @@ +package com.alibaba.datax.plugin.writer.otswriter; + +import com.alibaba.datax.common.plugin.RecordReceiver; +import com.alibaba.datax.common.plugin.TaskPluginCollector; +import com.alibaba.datax.common.util.Configuration; + +public interface IOtsWriterSlaveProxy { + + /** + * Slave的初始化,创建Slave所使用的资源 + */ + public void init(Configuration configuration); + + /** + * 释放Slave的所有资源 + */ + public void close() throws OTSCriticalException; + + /** + * Slave的执行器,将Datax的数据写入到OTS中 + * @param recordReceiver + * @throws OTSCriticalException + */ + public void write(RecordReceiver recordReceiver, TaskPluginCollector taskPluginCollector) throws OTSCriticalException; +} diff --git a/otswriter/src/main/java/com/alibaba/datax/plugin/writer/otswriter/Key.java b/otswriter/src/main/java/com/alibaba/datax/plugin/writer/otswriter/Key.java index 0724b9cf..10dd9cc9 100644 --- a/otswriter/src/main/java/com/alibaba/datax/plugin/writer/otswriter/Key.java +++ b/otswriter/src/main/java/com/alibaba/datax/plugin/writer/otswriter/Key.java @@ -25,7 +25,11 @@ public final class Key { public final static String OTS_ACCESSKEY = "accessKey"; public final static String OTS_INSTANCE_NAME = "instanceName"; - + public final static String ENABLE_AUTO_INCREMENT = "enableAutoIncrement"; + public final static String IS_TIMESERIES_TABLE = "isTimeseriesTable"; + + public final static String TIMEUNIT_FORMAT = "timeunit"; + public final static String TABLE_NAME = "table"; public final static String 
PRIMARY_KEY = "primaryKey"; @@ -33,4 +37,11 @@ public final class Key { public final static String COLUMN = "column"; public final static String WRITE_MODE = "writeMode"; + + public final static String MODE = "mode"; + public final static String NEW_VERISON = "newVersion"; + + public final static String DEFAULT_TIMESTAMP = "defaultTimestampInMillisecond"; + + public final static String COLUMN_NAME_PREFIX_FILTER = "columnNamePrefixFilter"; } diff --git a/otswriter/src/main/java/com/alibaba/datax/plugin/writer/otswriter/OTSCriticalException.java b/otswriter/src/main/java/com/alibaba/datax/plugin/writer/otswriter/OTSCriticalException.java new file mode 100644 index 00000000..b89df008 --- /dev/null +++ b/otswriter/src/main/java/com/alibaba/datax/plugin/writer/otswriter/OTSCriticalException.java @@ -0,0 +1,24 @@ +package com.alibaba.datax.plugin.writer.otswriter; + +/** + * 插件错误异常,该异常主要用于描述插件的异常退出 + * @author redchen + */ +public class OTSCriticalException extends Exception{ + + private static final long serialVersionUID = 5820460098894295722L; + + public OTSCriticalException() {} + + public OTSCriticalException(String message) { + super(message); + } + + public OTSCriticalException(Throwable a) { + super(a); + } + + public OTSCriticalException(String message, Throwable a) { + super(message, a); + } +} diff --git a/otswriter/src/main/java/com/alibaba/datax/plugin/writer/otswriter/OTSErrorCode.java b/otswriter/src/main/java/com/alibaba/datax/plugin/writer/otswriter/OTSErrorCode.java new file mode 100644 index 00000000..86877730 --- /dev/null +++ b/otswriter/src/main/java/com/alibaba/datax/plugin/writer/otswriter/OTSErrorCode.java @@ -0,0 +1,115 @@ +/** + * Copyright (C) Alibaba Cloud Computing + * All rights reserved. + * + * 版权所有 (C)阿里云计算有限公司 + */ + +package com.alibaba.datax.plugin.writer.otswriter; + +/** + * 表示来自开放结构化数据服务(Open Table Service,OTS)的错误代码。 + * + */ +public interface OTSErrorCode { + /** + * 用户身份验证失败。 + */ + static final String AUTHORIZATION_FAILURE = "OTSAuthFailed"; + + /** + * 服务器内部错误。 + */ + static final String INTERNAL_SERVER_ERROR = "OTSInternalServerError"; + + /** + * 参数错误。 + */ + static final String INVALID_PARAMETER = "OTSParameterInvalid"; + + /** + * 整个请求过大。 + */ + static final String REQUEST_TOO_LARGE = "OTSRequestBodyTooLarge"; + + /** + * 客户端请求超时。 + */ + static final String REQUEST_TIMEOUT = "OTSRequestTimeout"; + + /** + * 用户的配额已经用满。 + */ + static final String QUOTA_EXHAUSTED = "OTSQuotaExhausted"; + + /** + * 内部服务器发生failover,导致表的部分分区不可服务。 + */ + static final String PARTITION_UNAVAILABLE = "OTSPartitionUnavailable"; + + /** + * 表刚被创建还无法立马提供服务。 + */ + static final String TABLE_NOT_READY = "OTSTableNotReady"; + + /** + * 请求的表不存在。 + */ + static final String OBJECT_NOT_EXIST = "OTSObjectNotExist"; + + /** + * 请求创建的表已经存在。 + */ + static final String OBJECT_ALREADY_EXIST = "OTSObjectAlreadyExist"; + + /** + * 多个并发的请求写同一行数据,导致冲突。 + */ + static final String ROW_OPEARTION_CONFLICT = "OTSRowOperationConflict"; + + /** + * 主键不匹配。 + */ + static final String INVALID_PK = "OTSInvalidPK"; + + /** + * 读写能力调整过于频繁。 + */ + static final String TOO_FREQUENT_RESERVED_THROUGHPUT_ADJUSTMENT = "OTSTooFrequentReservedThroughputAdjustment"; + + /** + * 该行总列数超出限制。 + */ + static final String OUT_OF_COLUMN_COUNT_LIMIT = "OTSOutOfColumnCountLimit"; + + /** + * 该行所有列数据大小总和超出限制。 + */ + static final String OUT_OF_ROW_SIZE_LIMIT = "OTSOutOfRowSizeLimit"; + + /** + * 剩余预留读写能力不足。 + */ + static final String NOT_ENOUGH_CAPACITY_UNIT = "OTSNotEnoughCapacityUnit"; + + /** + * 预查条件检查失败。 + */ + static final 
String CONDITION_CHECK_FAIL = "OTSConditionCheckFail"; + + /** + * 在OTS内部操作超时。 + */ + static final String STORAGE_TIMEOUT = "OTSTimeout"; + + /** + * 在OTS内部有服务器不可访问。 + */ + static final String SERVER_UNAVAILABLE = "OTSServerUnavailable"; + + /** + * OTS内部服务器繁忙。 + */ + static final String SERVER_BUSY = "OTSServerBusy"; + +} \ No newline at end of file diff --git a/otswriter/src/main/java/com/alibaba/datax/plugin/writer/otswriter/OtsWriter.java b/otswriter/src/main/java/com/alibaba/datax/plugin/writer/otswriter/OtsWriter.java index 4d2ed17b..46227238 100644 --- a/otswriter/src/main/java/com/alibaba/datax/plugin/writer/otswriter/OtsWriter.java +++ b/otswriter/src/main/java/com/alibaba/datax/plugin/writer/otswriter/OtsWriter.java @@ -1,41 +1,44 @@ package com.alibaba.datax.plugin.writer.otswriter; -import java.util.List; - -import org.slf4j.Logger; -import org.slf4j.LoggerFactory; - import com.alibaba.datax.common.exception.DataXException; import com.alibaba.datax.common.plugin.RecordReceiver; import com.alibaba.datax.common.spi.Writer; import com.alibaba.datax.common.util.Configuration; -import com.alibaba.datax.plugin.writer.otswriter.utils.Common; -import com.aliyun.openservices.ots.ClientException; -import com.aliyun.openservices.ots.OTSException; +import com.alibaba.datax.plugin.writer.otswriter.model.OTSConf; +import com.alibaba.datax.plugin.writer.otswriter.model.OTSConst; +import com.alibaba.datax.plugin.writer.otswriter.model.OTSMode; +import com.alibaba.datax.plugin.writer.otswriter.utils.GsonParser; +import com.alicloud.openservices.tablestore.ClientException; +import com.alicloud.openservices.tablestore.TableStoreException; +import org.slf4j.Logger; +import org.slf4j.LoggerFactory; + +import java.util.List; public class OtsWriter { + public static class Job extends Writer.Job { private static final Logger LOG = LoggerFactory.getLogger(Job.class); - private OtsWriterMasterProxy proxy = new OtsWriterMasterProxy(); - + + private IOtsWriterMasterProxy proxy; + @Override public void init() { LOG.info("init() begin ..."); + proxy = new OtsWriterMasterProxy(); try { this.proxy.init(getPluginJobConf()); - } catch (OTSException e) { - LOG.error("OTSException: {}", e.getMessage(), e); - throw DataXException.asDataXException(new OtsWriterError(e.getErrorCode(), "OTS端的错误"), Common.getDetailMessage(e), e); + } catch (TableStoreException e) { + LOG.error("OTSException: {}", e.toString(), e); + throw DataXException.asDataXException(new OtsWriterError(e.getErrorCode(), "OTS Client Error"), e.toString(), e); } catch (ClientException e) { - LOG.error("ClientException: {}", e.getMessage(), e); - throw DataXException.asDataXException(new OtsWriterError(e.getErrorCode(), "OTS端的错误"), Common.getDetailMessage(e), e); - } catch (IllegalArgumentException e) { - LOG.error("IllegalArgumentException. ErrorMsg:{}", e.getMessage(), e); - throw DataXException.asDataXException(OtsWriterError.INVALID_PARAM, Common.getDetailMessage(e), e); + LOG.error("ClientException: {}", e.toString(), e); + throw DataXException.asDataXException(OtsWriterError.ERROR, e.toString(), e); } catch (Exception e) { - LOG.error("Exception. ErrorMsg:{}", e.getMessage(), e); - throw DataXException.asDataXException(OtsWriterError.ERROR, Common.getDetailMessage(e), e); + LOG.error("Exception. 
ErrorMsg:{}", e.toString(), e); + throw DataXException.asDataXException(OtsWriterError.ERROR, e.toString(), e); } + LOG.info("init() end ..."); } @@ -50,42 +53,67 @@ public class OtsWriter { return this.proxy.split(mandatoryNumber); } catch (Exception e) { LOG.error("Exception. ErrorMsg:{}", e.getMessage(), e); - throw DataXException.asDataXException(OtsWriterError.ERROR, Common.getDetailMessage(e), e); + throw DataXException.asDataXException(OtsWriterError.ERROR, e.toString(), e); } } } - + public static class Task extends Writer.Task { private static final Logger LOG = LoggerFactory.getLogger(Task.class); - private OtsWriterSlaveProxy proxy = new OtsWriterSlaveProxy(); - + private IOtsWriterSlaveProxy proxy = null; + + /** + * 基于配置,构建对应的worker代理 + */ @Override - public void init() {} + public void init() { + OTSConf conf = GsonParser.jsonToConf(this.getPluginJobConf().getString(OTSConst.OTS_CONF)); + // 是否使用新接口 + if(conf.isNewVersion()) { + if (conf.getMode() == OTSMode.MULTI_VERSION) { + LOG.info("init OtsWriterSlaveProxyMultiVersion"); + proxy = new OtsWriterSlaveProxyMultiversion(); + } else { + LOG.info("init OtsWriterSlaveProxyNormal"); + proxy = new OtsWriterSlaveProxyNormal(); + } + + } + else{ + proxy = new OtsWriterSlaveProxyOld(); + } + + proxy.init(this.getPluginJobConf()); + + } @Override public void destroy() { - this.proxy.close(); + try { + proxy.close(); + } catch (OTSCriticalException e) { + LOG.error("OTSCriticalException. ErrorMsg:{}", e.getMessage(), e); + throw DataXException.asDataXException(OtsWriterError.ERROR, e.toString(), e); + } } @Override public void startWrite(RecordReceiver lineReceiver) { LOG.info("startWrite() begin ..."); + try { - this.proxy.init(this.getPluginJobConf()); - this.proxy.write(lineReceiver, this.getTaskPluginCollector()); - } catch (OTSException e) { - LOG.error("OTSException: {}", e.getMessage(), e); - throw DataXException.asDataXException(new OtsWriterError(e.getErrorCode(), "OTS端的错误"), Common.getDetailMessage(e), e); + proxy.write(lineReceiver, this.getTaskPluginCollector()); + } catch (TableStoreException e) { + LOG.error("OTSException: {}", e.toString(), e); + throw DataXException.asDataXException(new OtsWriterError(e.getErrorCode(), "OTS Client Error"), e.toString(), e); } catch (ClientException e) { - LOG.error("ClientException: {}", e.getMessage(), e); - throw DataXException.asDataXException(new OtsWriterError(e.getErrorCode(), "OTS端的错误"), Common.getDetailMessage(e), e); - } catch (IllegalArgumentException e) { - LOG.error("IllegalArgumentException. ErrorMsg:{}", e.getMessage(), e); - throw DataXException.asDataXException(OtsWriterError.INVALID_PARAM, Common.getDetailMessage(e), e); + LOG.error("ClientException: {}", e.toString(), e); + throw DataXException.asDataXException(OtsWriterError.ERROR, e.toString(), e); } catch (Exception e) { - LOG.error("Exception. ErrorMsg:{}", e.getMessage(), e); - throw DataXException.asDataXException(OtsWriterError.ERROR, Common.getDetailMessage(e), e); + LOG.error("Exception. 
ErrorMsg:{}", e.toString(), e); + throw DataXException.asDataXException(OtsWriterError.ERROR, e.toString(), e); } + LOG.info("startWrite() end ..."); } } diff --git a/otswriter/src/main/java/com/alibaba/datax/plugin/writer/otswriter/OtsWriterError.java b/otswriter/src/main/java/com/alibaba/datax/plugin/writer/otswriter/OtsWriterError.java index 67d1ee2b..092a7343 100644 --- a/otswriter/src/main/java/com/alibaba/datax/plugin/writer/otswriter/OtsWriterError.java +++ b/otswriter/src/main/java/com/alibaba/datax/plugin/writer/otswriter/OtsWriterError.java @@ -14,10 +14,10 @@ public class OtsWriterError implements ErrorCode { public final static OtsWriterError ERROR = new OtsWriterError( "OtsWriterError", - "该错误表示插件的内部错误,表示系统没有处理到的异常"); + "This error represents an internal error of the ots writer plugin, which indicates that the system is not processed."); public final static OtsWriterError INVALID_PARAM = new OtsWriterError( "OtsWriterInvalidParameter", - "该错误表示参数错误,表示用户输入了错误的参数格式等"); + "This error represents a parameter error, indicating that the user entered the wrong parameter format."); public OtsWriterError (String code) { this.code = code; @@ -41,6 +41,6 @@ public class OtsWriterError implements ErrorCode { @Override public String toString() { - return this.code; + return "[ code:" + this.code + ", message:" + this.description + "]"; } } diff --git a/otswriter/src/main/java/com/alibaba/datax/plugin/writer/otswriter/OtsWriterMasterProxy.java b/otswriter/src/main/java/com/alibaba/datax/plugin/writer/otswriter/OtsWriterMasterProxy.java index 91cf9b12..774aca1e 100644 --- a/otswriter/src/main/java/com/alibaba/datax/plugin/writer/otswriter/OtsWriterMasterProxy.java +++ b/otswriter/src/main/java/com/alibaba/datax/plugin/writer/otswriter/OtsWriterMasterProxy.java @@ -1,110 +1,138 @@ package com.alibaba.datax.plugin.writer.otswriter; -import java.util.ArrayList; -import java.util.List; - -import org.slf4j.Logger; -import org.slf4j.LoggerFactory; - import com.alibaba.datax.common.util.Configuration; import com.alibaba.datax.plugin.writer.otswriter.callable.GetTableMetaCallable; import com.alibaba.datax.plugin.writer.otswriter.model.OTSConf; -import com.alibaba.datax.plugin.writer.otswriter.model.OTSConf.RestrictConf; import com.alibaba.datax.plugin.writer.otswriter.model.OTSConst; +import com.alibaba.datax.plugin.writer.otswriter.model.OTSMode; import com.alibaba.datax.plugin.writer.otswriter.model.OTSOpType; -import com.alibaba.datax.plugin.writer.otswriter.utils.GsonParser; -import com.alibaba.datax.plugin.writer.otswriter.utils.ParamChecker; -import com.alibaba.datax.plugin.writer.otswriter.utils.RetryHelper; -import com.alibaba.datax.plugin.writer.otswriter.utils.WriterModelParser; -import com.aliyun.openservices.ots.OTSClient; -import com.aliyun.openservices.ots.model.TableMeta; +import com.alibaba.datax.plugin.writer.otswriter.utils.*; +import com.alicloud.openservices.tablestore.SyncClientInterface; +import com.alicloud.openservices.tablestore.TimeseriesClient; +import com.alicloud.openservices.tablestore.model.TableMeta; +import org.slf4j.Logger; +import org.slf4j.LoggerFactory; + +import java.util.ArrayList; +import java.util.List; +import java.util.concurrent.TimeUnit; + +public class OtsWriterMasterProxy implements IOtsWriterMasterProxy { -public class OtsWriterMasterProxy { - - private OTSConf conf = new OTSConf(); - - private OTSClient ots = null; - - private TableMeta meta = null; - private static final Logger LOG = LoggerFactory.getLogger(OtsWriterMasterProxy.class); - + private 
OTSConf conf = new OTSConf(); + private SyncClientInterface ots = null; + private TableMeta meta = null; + /** * @param param * @throws Exception */ + @Override public void init(Configuration param) throws Exception { - + // 默认参数 - conf.setRetry(param.getInt(OTSConst.RETRY, 18)); - conf.setSleepInMillisecond(param.getInt(OTSConst.SLEEP_IN_MILLISECOND, 100)); - conf.setBatchWriteCount(param.getInt(OTSConst.BATCH_WRITE_COUNT, 100)); - conf.setConcurrencyWrite(param.getInt(OTSConst.CONCURRENCY_WRITE, 5)); - conf.setIoThreadCount(param.getInt(OTSConst.IO_THREAD_COUNT, 1)); - conf.setSocketTimeout(param.getInt(OTSConst.SOCKET_TIMEOUT, 20000)); - conf.setConnectTimeout(param.getInt(OTSConst.CONNECT_TIMEOUT, 10000)); - conf.setBufferSize(param.getInt(OTSConst.BUFFER_SIZE, 1024)); - - RestrictConf restrictConf = conf.new RestrictConf(); - restrictConf.setRequestTotalSizeLimition(param.getInt(OTSConst.REQUEST_TOTAL_SIZE_LIMITATION, 1024 * 1024)); - restrictConf.setAttributeColumnSize(param.getInt(OTSConst.ATTRIBUTE_COLUMN_SIZE_LIMITATION, 2 * 1024 * 1024)); - restrictConf.setPrimaryKeyColumnSize(param.getInt(OTSConst.PRIMARY_KEY_COLUMN_SIZE_LIMITATION, 1024)); - restrictConf.setMaxColumnsCount(param.getInt(OTSConst.ATTRIBUTE_COLUMN_MAX_COUNT, 1024)); - conf.setRestrictConf(restrictConf); + setStaticParams(param); + + conf.setTimestamp(param.getInt(Key.DEFAULT_TIMESTAMP, -1)); + conf.setRequestTotalSizeLimitation(param.getInt(OTSConst.REQUEST_TOTAL_SIZE_LIMITATION, 1024 * 1024)); // 必选参数 - conf.setEndpoint(ParamChecker.checkStringAndGet(param, Key.OTS_ENDPOINT)); - conf.setAccessId(ParamChecker.checkStringAndGet(param, Key.OTS_ACCESSID)); - conf.setAccessKey(ParamChecker.checkStringAndGet(param, Key.OTS_ACCESSKEY)); - conf.setInstanceName(ParamChecker.checkStringAndGet(param, Key.OTS_INSTANCE_NAME)); - conf.setTableName(ParamChecker.checkStringAndGet(param, Key.TABLE_NAME)); - - conf.setOperation(WriterModelParser.parseOTSOpType(ParamChecker.checkStringAndGet(param, Key.WRITE_MODE))); - - ots = new OTSClient( - this.conf.getEndpoint(), - this.conf.getAccessId(), - this.conf.getAccessKey(), - this.conf.getInstanceName()); - - meta = getTableMeta(ots, conf.getTableName()); - LOG.info("Table Meta : {}", GsonParser.metaToJson(meta)); - - conf.setPrimaryKeyColumn(WriterModelParser.parseOTSPKColumnList(ParamChecker.checkListAndGet(param, Key.PRIMARY_KEY, true))); - ParamChecker.checkPrimaryKey(meta, conf.getPrimaryKeyColumn()); - - conf.setAttributeColumn(WriterModelParser.parseOTSAttrColumnList(ParamChecker.checkListAndGet(param, Key.COLUMN, conf.getOperation() == OTSOpType.UPDATE_ROW ? 
true : false))); - ParamChecker.checkAttribute(conf.getAttributeColumn()); + conf.setEndpoint(ParamChecker.checkStringAndGet(param, Key.OTS_ENDPOINT)); + conf.setAccessId(ParamChecker.checkStringAndGet(param, Key.OTS_ACCESSID)); + conf.setAccessKey(ParamChecker.checkStringAndGet(param, Key.OTS_ACCESSKEY)); + conf.setInstanceName(ParamChecker.checkStringAndGet(param, Key.OTS_INSTANCE_NAME)); + conf.setTableName(ParamChecker.checkStringAndGet(param, Key.TABLE_NAME)); + + ots = Common.getOTSInstance(conf); + + conf.setNewVersion(param.getBool(Key.NEW_VERISON, false)); + conf.setMode(WriterModelParser.parseOTSMode(param.getString(Key.MODE, "normal"))); + conf.setEnableAutoIncrement(param.getBool(Key.ENABLE_AUTO_INCREMENT, false)); + conf.setTimeseriesTable(param.getBool(Key.IS_TIMESERIES_TABLE, false)); + ParamChecker.checkVersion(conf); + + if (!conf.isTimeseriesTable()){ + meta = getTableMeta(ots, conf.getTableName()); + LOG.debug("Table Meta : {}", GsonParser.metaToJson(meta)); + conf.setPrimaryKeyColumn(WriterModelParser.parseOTSPKColumnList(meta, ParamChecker.checkListAndGet(param, Key.PRIMARY_KEY, true))); + } + + if (conf.getMode() == OTSMode.MULTI_VERSION) { + conf.setOperation(OTSOpType.UPDATE_ROW);// 多版本只支持Update模式 + conf.setColumnNamePrefixFilter(param.getString(Key.COLUMN_NAME_PREFIX_FILTER, null)); + } else if (!conf.isTimeseriesTable()){ // 普通模式,写入宽表 + conf.setOperation(WriterModelParser.parseOTSOpType(ParamChecker.checkStringAndGet(param, Key.WRITE_MODE), conf.getMode())); + conf.setAttributeColumn(WriterModelParser.parseOTSAttrColumnList(conf.getPrimaryKeyColumn(), ParamChecker.checkListAndGet(param, Key.COLUMN, false), conf.getMode() + ) + ); + ParamChecker.checkAttribute(conf.getAttributeColumn()); + } else { // 普通模式,写入时序表 + conf.setOperation(OTSOpType.PUT_ROW);// 时序表只支持Put模式 + conf.setAttributeColumn(WriterModelParser.parseOTSTimeseriesRowAttrList(ParamChecker.checkListAndGet(param, Key.COLUMN, true))); + conf.setTimeUnit(ParamChecker.checkTimeUnitAndGet(param.getString(Key.TIMEUNIT_FORMAT, "MICROSECONDS"))); + } + + /** + * 如果配置支持主键列自增 + */ + if (conf.getEnableAutoIncrement()) { + ParamChecker.checkPrimaryKeyWithAutoIncrement(meta, conf.getPrimaryKeyColumn()); + conf.setEncodePkColumnMapping(Common.getEncodePkColumnMappingWithAutoIncrement(meta, conf.getPrimaryKeyColumn())); + } + /** + * 如果配置不支持主键列自增 + */ + else if (!conf.isTimeseriesTable()){ + ParamChecker.checkPrimaryKey(meta, conf.getPrimaryKeyColumn()); + conf.setEncodePkColumnMapping(Common.getEncodePkColumnMapping(meta, conf.getPrimaryKeyColumn())); + } + + } - - public List split(int mandatoryNumber){ + + @Override + public List split(int mandatoryNumber) { LOG.info("Begin split and MandatoryNumber : {}", mandatoryNumber); List configurations = new ArrayList(); + String json = GsonParser.confToJson(this.conf); for (int i = 0; i < mandatoryNumber; i++) { Configuration configuration = Configuration.newDefault(); - configuration.set(OTSConst.OTS_CONF, GsonParser.confToJson(this.conf)); + configuration.set(OTSConst.OTS_CONF, json); configurations.add(configuration); } LOG.info("End split."); - assert(mandatoryNumber == configurations.size()); return configurations; } - + + @Override public void close() { ots.shutdown(); } - + public OTSConf getOTSConf() { return conf; } // private function - private TableMeta getTableMeta(OTSClient ots, String tableName) throws Exception { + private TableMeta getTableMeta(SyncClientInterface ots, String tableName) throws Exception { return RetryHelper.executeWithRetry( new 
GetTableMetaCallable(ots, tableName), conf.getRetry(), conf.getSleepInMillisecond() - ); + ); + } + + public void setStaticParams(Configuration param) { + // 默认参数 + conf.setRetry(param.getInt(OTSConst.RETRY, 18)); + conf.setSleepInMillisecond(param.getInt(OTSConst.SLEEP_IN_MILLISECOND, 100)); + conf.setBatchWriteCount(param.getInt(OTSConst.BATCH_WRITE_COUNT, 100)); + conf.setConcurrencyWrite(param.getInt(OTSConst.CONCURRENCY_WRITE, 5)); + conf.setIoThreadCount(param.getInt(OTSConst.IO_THREAD_COUNT, 1)); + conf.setSocketTimeoutInMillisecond(param.getInt(OTSConst.SOCKET_TIMEOUTIN_MILLISECOND, 10000)); + conf.setConnectTimeoutInMillisecond(param.getInt(OTSConst.CONNECT_TIMEOUT_IN_MILLISECOND, 10000)); + } } diff --git a/otswriter/src/main/java/com/alibaba/datax/plugin/writer/otswriter/OtsWriterSlaveProxyMultiversion.java b/otswriter/src/main/java/com/alibaba/datax/plugin/writer/otswriter/OtsWriterSlaveProxyMultiversion.java new file mode 100644 index 00000000..6db75692 --- /dev/null +++ b/otswriter/src/main/java/com/alibaba/datax/plugin/writer/otswriter/OtsWriterSlaveProxyMultiversion.java @@ -0,0 +1,135 @@ +package com.alibaba.datax.plugin.writer.otswriter; + +import com.alibaba.datax.common.element.Record; +import com.alibaba.datax.common.plugin.RecordReceiver; +import com.alibaba.datax.common.plugin.TaskPluginCollector; +import com.alibaba.datax.common.util.Configuration; +import com.alibaba.datax.plugin.writer.otswriter.model.*; +import com.alibaba.datax.plugin.writer.otswriter.utils.CollectorUtil; +import com.alibaba.datax.plugin.writer.otswriter.utils.Common; +import com.alibaba.datax.plugin.writer.otswriter.utils.GsonParser; +import com.alibaba.datax.plugin.writer.otswriter.utils.ParseRecord; +import com.alicloud.openservices.tablestore.SyncClientInterface; +import com.alicloud.openservices.tablestore.model.PrimaryKey; +import com.alicloud.openservices.tablestore.model.PrimaryKeySchema; +import org.slf4j.Logger; +import org.slf4j.LoggerFactory; + +import java.util.ArrayList; +import java.util.List; +import java.util.Map; + +import static com.alibaba.datax.plugin.writer.otswriter.utils.Common.getOTSInstance; + +public class OtsWriterSlaveProxyMultiversion implements IOtsWriterSlaveProxy { + + private OTSConf conf = null; + private SyncClientInterface ots = null; + private OTSSendBuffer buffer = null; + private Map pkColumnMapping = null; + private static final Logger LOG = LoggerFactory.getLogger(OtsWriterSlaveProxyMultiversion.class); + + @Override + public void init(Configuration configuration) { + LOG.info("OtsWriterSlaveProxyMultiversion init begin"); + this.conf = GsonParser.jsonToConf(configuration.getString(OTSConst.OTS_CONF)); + this.ots = getOTSInstance(conf); + this.pkColumnMapping = Common.getPkColumnMapping(conf.getEncodePkColumnMapping()); + buffer = new OTSSendBuffer(ots, conf); + LOG.info("init end"); + } + + @Override + public void close() throws OTSCriticalException { + LOG.info("close begin"); + ots.shutdown(); + LOG.info("close end"); + } + + @Override + public void write(RecordReceiver recordReceiver, TaskPluginCollector taskPluginCollector) throws OTSCriticalException { + LOG.info("write begin"); + // 初始化全局垃圾回收器 + CollectorUtil.init(taskPluginCollector); + // Record format : {PK1, PK2, ...} {ColumnName} {TimeStamp} {Value} + int expectColumnCount = conf.getPrimaryKeyColumn().size()+ 3;// 3表示{ColumnName} {TimeStamp} {Value} + Record record = null; + PrimaryKey lastCellPk = null; + List rowBuffer = new ArrayList(); + while ((record = recordReceiver.getFromReader()) != 
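/* editor's note: in multiVersion mode each Record carries a single cell ({PKs}{ColumnName}{TimeStamp}{Value}); consecutive records sharing the same primary key are buffered in rowBuffer and flushed as one OTSLine when the key changes, see the lastCellPk check below */ 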
null) { + + LOG.debug("Record Raw: {}", record.toString()); + + int columnCount = record.getColumnNumber(); + if (columnCount != expectColumnCount) { + // 如果Column的个数和预期的个数不一致时,认为是系统故障或者用户配置Column错误,异常退出 + throw new OTSCriticalException(String.format( + OTSErrorMessage.RECORD_AND_COLUMN_SIZE_ERROR, + columnCount, + expectColumnCount, + record.toString() + )); + } + + PrimaryKey curPk = null; + if ((curPk = Common.getPKFromRecord(this.pkColumnMapping, record)) == null) { + continue; + } + + // check same row + if (lastCellPk == null) { + lastCellPk = curPk; + } else if (!lastCellPk.equals(curPk)) { + OTSLine line = ParseRecord.parseMultiVersionRecordToOTSLine( + conf.getTableName(), + conf.getOperation(), + pkColumnMapping, + conf.getColumnNamePrefixFilter(), + lastCellPk, + rowBuffer); + if (line != null) { + buffer.write(line); + } + rowBuffer.clear(); + lastCellPk = curPk; + } + rowBuffer.add(record); + } + // Flush剩余数据 + if (!rowBuffer.isEmpty()) { + OTSLine line = ParseRecord.parseMultiVersionRecordToOTSLine( + conf.getTableName(), + conf.getOperation(), + pkColumnMapping, + conf.getColumnNamePrefixFilter(), + lastCellPk, + rowBuffer); + if (line != null) { + buffer.write(line); + } + } + + buffer.close(); + LOG.info("write end"); + } + + public void setOts(SyncClientInterface ots){ + this.ots = ots; + } + + public OTSConf getConf() { + return conf; + } + + public void setConf(OTSConf conf) { + this.conf = conf; + } + + public void setBuffer(OTSSendBuffer buffer) { + this.buffer = buffer; + } + + public void setPkColumnMapping(Map pkColumnMapping) { + this.pkColumnMapping = pkColumnMapping; + } +} diff --git a/otswriter/src/main/java/com/alibaba/datax/plugin/writer/otswriter/OtsWriterSlaveProxyNormal.java b/otswriter/src/main/java/com/alibaba/datax/plugin/writer/otswriter/OtsWriterSlaveProxyNormal.java new file mode 100644 index 00000000..aaa0ef04 --- /dev/null +++ b/otswriter/src/main/java/com/alibaba/datax/plugin/writer/otswriter/OtsWriterSlaveProxyNormal.java @@ -0,0 +1,153 @@ +package com.alibaba.datax.plugin.writer.otswriter; + +import com.alibaba.datax.common.element.Record; +import com.alibaba.datax.common.plugin.RecordReceiver; +import com.alibaba.datax.common.plugin.TaskPluginCollector; +import com.alibaba.datax.common.util.Configuration; +import com.alibaba.datax.plugin.writer.otswriter.callable.GetTableMetaCallable; +import com.alibaba.datax.plugin.writer.otswriter.model.*; +import com.alibaba.datax.plugin.writer.otswriter.utils.*; +import com.alicloud.openservices.tablestore.SyncClientInterface; +import com.alicloud.openservices.tablestore.model.PrimaryKeySchema; +import com.alicloud.openservices.tablestore.model.TableMeta; +import org.slf4j.Logger; +import org.slf4j.LoggerFactory; + +import java.util.Map; + +import static com.alibaba.datax.plugin.writer.otswriter.utils.Common.getOTSInstance; + +public class OtsWriterSlaveProxyNormal implements IOtsWriterSlaveProxy { + + private OTSConf conf = null; + private SyncClientInterface ots = null; + private OTSSendBuffer buffer = null; + private Map pkColumnMapping = null; + private static final Logger LOG = LoggerFactory.getLogger(OtsWriterSlaveProxyNormal.class); + private PrimaryKeySchema primaryKeySchema =null; + + + @Override + public void init(Configuration configuration) { + LOG.info("init begin"); + this.conf = GsonParser.jsonToConf(configuration.getString(OTSConst.OTS_CONF)); + this.ots = getOTSInstance(conf); + if (!conf.isTimeseriesTable()){ + this.pkColumnMapping = 
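/* editor's note: restores the primary-key column mapping that the master proxy encoded into OTSConf via Common.getEncodePkColumnMapping(); only wide tables need it, hence the isTimeseriesTable() guard */ 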
Common.getPkColumnMapping(conf.getEncodePkColumnMapping()); + } + + buffer = new OTSSendBuffer(ots, conf); + + if(conf.getEnableAutoIncrement()){ + primaryKeySchema = getAutoIncrementKey(); + } + LOG.info("init end"); + } + + @Override + public void close() throws com.alibaba.datax.plugin.writer.otswriter.OTSCriticalException { + LOG.info("close begin"); + ots.shutdown(); + LOG.info("close end"); + } + + @Override + public void write(RecordReceiver recordReceiver, TaskPluginCollector taskPluginCollector) throws com.alibaba.datax.plugin.writer.otswriter.OTSCriticalException { + LOG.info("write begin"); + + // 初始化全局垃圾回收器 + CollectorUtil.init(taskPluginCollector); + int expectColumnCount = conf.getAttributeColumn().size(); + if (!conf.isTimeseriesTable()){ + expectColumnCount += conf.getPrimaryKeyColumn().size(); + } + Record record = null; + + while ((record = recordReceiver.getFromReader()) != null) { + + LOG.debug("Record Raw: {}", record.toString()); + + int columnCount = record.getColumnNumber(); + if (columnCount != expectColumnCount) { + // 如果Column的个数和预期的个数不一致时,认为是系统故障或者用户配置Column错误,异常退出 + throw new OTSCriticalException(String.format( + OTSErrorMessage.RECORD_AND_COLUMN_SIZE_ERROR, + columnCount, + expectColumnCount, + record.toString() + )); + } + OTSLine line; + + if(conf.getEnableAutoIncrement()){ + line = ParseRecord.parseNormalRecordToOTSLineWithAutoIncrement( + conf.getTableName(), + conf.getOperation(), + pkColumnMapping, + conf.getAttributeColumn(), + record, + conf.getTimestamp(), + primaryKeySchema); + } + else if(!conf.isTimeseriesTable()){ + line = ParseRecord.parseNormalRecordToOTSLine( + conf.getTableName(), + conf.getOperation(), + pkColumnMapping, + conf.getAttributeColumn(), + record, + conf.getTimestamp()); + }else{ + line = ParseRecord.parseNormalRecordToOTSLineOfTimeseriesTable(conf.getAttributeColumn(), + record, conf.getTimeUnit()); + } + + + if (line != null) { + buffer.write(line); + } + } + + buffer.close(); + LOG.info("write end"); + } + + private PrimaryKeySchema getAutoIncrementKey() { + TableMeta tableMeta = null; + try { + tableMeta = RetryHelper.executeWithRetry( + new GetTableMetaCallable(ots, conf.getTableName()), + conf.getRetry(), + conf.getSleepInMillisecond() + ); + } catch (Exception e) { + throw new RuntimeException(e); + } + for (PrimaryKeySchema primaryKeySchema : tableMeta.getPrimaryKeyList()) { + if(primaryKeySchema.hasOption()){ + return primaryKeySchema; + } + } + return null; + } + + public void setOts(SyncClientInterface ots){ + this.ots = ots; + } + + public OTSConf getConf() { + return conf; + } + + public void setConf(OTSConf conf) { + this.conf = conf; + } + + public void setBuffer(OTSSendBuffer buffer) { + this.buffer = buffer; + } + + public void setPkColumnMapping(Map pkColumnMapping) { + this.pkColumnMapping = pkColumnMapping; + } +} diff --git a/otswriter/src/main/java/com/alibaba/datax/plugin/writer/otswriter/OtsWriterSlaveProxy.java b/otswriter/src/main/java/com/alibaba/datax/plugin/writer/otswriter/OtsWriterSlaveProxyOld.java similarity index 77% rename from otswriter/src/main/java/com/alibaba/datax/plugin/writer/otswriter/OtsWriterSlaveProxy.java rename to otswriter/src/main/java/com/alibaba/datax/plugin/writer/otswriter/OtsWriterSlaveProxyOld.java index 762edfb4..625925f1 100644 --- a/otswriter/src/main/java/com/alibaba/datax/plugin/writer/otswriter/OtsWriterSlaveProxy.java +++ b/otswriter/src/main/java/com/alibaba/datax/plugin/writer/otswriter/OtsWriterSlaveProxyOld.java @@ -1,7 +1,16 @@ package 
com.alibaba.datax.plugin.writer.otswriter; -import com.alibaba.datax.plugin.writer.otswriter.model.*; -import com.alibaba.datax.plugin.writer.otswriter.utils.Common; +import com.alibaba.datax.common.element.Record; +import com.alibaba.datax.common.plugin.RecordReceiver; +import com.alibaba.datax.common.plugin.TaskPluginCollector; +import com.alibaba.datax.common.util.Configuration; +import com.alibaba.datax.plugin.writer.otswriter.model.OTSConf; +import com.alibaba.datax.plugin.writer.otswriter.model.OTSConst; +import com.alibaba.datax.plugin.writer.otswriter.utils.WriterRetryPolicy; +import com.alibaba.datax.plugin.writer.otswriter.model.OTSErrorMessage; +import com.alibaba.datax.plugin.writer.otswriter.utils.WithRecord; +import com.alibaba.datax.plugin.writer.otswriter.utils.CommonOld; +import com.alibaba.datax.plugin.writer.otswriter.utils.GsonParser; import com.aliyun.openservices.ots.*; import com.aliyun.openservices.ots.internal.OTSCallback; import com.aliyun.openservices.ots.internal.writer.WriterConfig; @@ -10,19 +19,13 @@ import org.apache.commons.math3.util.Pair; import org.slf4j.Logger; import org.slf4j.LoggerFactory; -import com.alibaba.datax.common.element.Record; -import com.alibaba.datax.common.plugin.RecordReceiver; -import com.alibaba.datax.common.plugin.TaskPluginCollector; -import com.alibaba.datax.common.util.Configuration; -import com.alibaba.datax.plugin.writer.otswriter.utils.GsonParser; - import java.util.List; import java.util.concurrent.Executors; -public class OtsWriterSlaveProxy { +public class OtsWriterSlaveProxyOld implements IOtsWriterSlaveProxy { - private static final Logger LOG = LoggerFactory.getLogger(OtsWriterSlaveProxy.class); + private static final Logger LOG = LoggerFactory.getLogger(OtsWriterSlaveProxyOld.class); private OTSConf conf; private OTSAsync otsAsync; private OTSWriter otsWriter; @@ -54,14 +57,16 @@ public class OtsWriterSlaveProxy { } } + @Override public void init(Configuration configuration) { conf = GsonParser.jsonToConf(configuration.getString(OTSConst.OTS_CONF)); - + ClientConfiguration clientConfigure = new ClientConfiguration(); clientConfigure.setIoThreadCount(conf.getIoThreadCount()); clientConfigure.setMaxConnections(conf.getConcurrencyWrite()); clientConfigure.setSocketTimeoutInMillisecond(conf.getSocketTimeout()); - clientConfigure.setConnectionTimeoutInMillisecond(conf.getConnectTimeout()); + // TODO + clientConfigure.setConnectionTimeoutInMillisecond(10000); OTSServiceConfiguration otsConfigure = new OTSServiceConfiguration(); otsConfigure.setRetryStrategy(new WriterRetryPolicy(conf)); @@ -75,39 +80,44 @@ public class OtsWriterSlaveProxy { otsConfigure); } + @Override public void close() { otsAsync.shutdown(); } - - public void write(RecordReceiver recordReceiver, TaskPluginCollector collector) throws Exception { + + @Override + public void write(RecordReceiver recordReceiver, TaskPluginCollector collector) throws OTSCriticalException { LOG.info("Writer slave started."); WriterConfig writerConfig = new WriterConfig(); writerConfig.setConcurrency(conf.getConcurrencyWrite()); writerConfig.setMaxBatchRowsCount(conf.getBatchWriteCount()); - writerConfig.setMaxBatchSize(conf.getRestrictConf().getRequestTotalSizeLimition()); - writerConfig.setBufferSize(conf.getBufferSize()); - writerConfig.setMaxAttrColumnSize(conf.getRestrictConf().getAttributeColumnSize()); - writerConfig.setMaxColumnsCount(conf.getRestrictConf().getMaxColumnsCount()); - writerConfig.setMaxPKColumnSize(conf.getRestrictConf().getPrimaryKeyColumnSize()); + // 
TODO + writerConfig.setMaxBatchSize(1024 * 1024); + writerConfig.setBufferSize(1024); + writerConfig.setMaxAttrColumnSize(2 * 1024 * 1024); + writerConfig.setMaxColumnsCount(1024); + writerConfig.setMaxPKColumnSize(1024); + otsWriter = new DefaultOTSWriter(otsAsync, conf.getTableName(), writerConfig, new WriterCallback(collector), Executors.newFixedThreadPool(3)); int expectColumnCount = conf.getPrimaryKeyColumn().size() + conf.getAttributeColumn().size(); Record record; while ((record = recordReceiver.getFromReader()) != null) { LOG.debug("Record Raw: {}", record.toString()); - + int columnCount = record.getColumnNumber(); if (columnCount != expectColumnCount) { // 如果Column的个数和预期的个数不一致时,认为是系统故障或者用户配置Column错误,异常退出 - throw new IllegalArgumentException(String.format(OTSErrorMessage.RECORD_AND_COLUMN_SIZE_ERROR, columnCount, expectColumnCount)); + throw new IllegalArgumentException(String.format(OTSErrorMessage.RECORD_AND_COLUMN_SIZE_ERROR, columnCount, expectColumnCount, record.toString())); } - + + // 类型转换 try { - RowPrimaryKey primaryKey = Common.getPKFromRecord(conf.getPrimaryKeyColumn(), record); - List> attributes = Common.getAttrFromRecord(conf.getPrimaryKeyColumn().size(), conf.getAttributeColumn(), record); - RowChange rowChange = Common.columnValuesToRowChange(conf.getTableName(), conf.getOperation(), primaryKey, attributes); + RowPrimaryKey primaryKey = CommonOld.getPKFromRecord(conf.getPrimaryKeyColumn(), record); + List> attributes = CommonOld.getAttrFromRecord(conf.getPrimaryKeyColumn().size(), conf.getAttributeColumn(), record); + RowChange rowChange = CommonOld.columnValuesToRowChange(conf.getTableName(), conf.getOperation(), primaryKey, attributes); WithRecord withRecord = (WithRecord)rowChange; withRecord.setRecord(record); otsWriter.addRowChange(rowChange); diff --git a/otswriter/src/main/java/com/alibaba/datax/plugin/writer/otswriter/callable/BatchWriteRowCallable.java b/otswriter/src/main/java/com/alibaba/datax/plugin/writer/otswriter/callable/BatchWriteRowCallable.java new file mode 100644 index 00000000..f7330937 --- /dev/null +++ b/otswriter/src/main/java/com/alibaba/datax/plugin/writer/otswriter/callable/BatchWriteRowCallable.java @@ -0,0 +1,25 @@ +package com.alibaba.datax.plugin.writer.otswriter.callable; + +import com.alicloud.openservices.tablestore.SyncClientInterface; +import com.alicloud.openservices.tablestore.model.BatchWriteRowRequest; +import com.alicloud.openservices.tablestore.model.BatchWriteRowResponse; + +import java.util.concurrent.Callable; + +public class BatchWriteRowCallable implements Callable{ + + private SyncClientInterface ots = null; + private BatchWriteRowRequest batchWriteRowRequest = null; + + public BatchWriteRowCallable(SyncClientInterface ots, BatchWriteRowRequest batchWriteRowRequest) { + this.ots = ots; + this.batchWriteRowRequest = batchWriteRowRequest; + + } + + @Override + public BatchWriteRowResponse call() throws Exception { + return ots.batchWriteRow(batchWriteRowRequest); + } + +} \ No newline at end of file diff --git a/otswriter/src/main/java/com/alibaba/datax/plugin/writer/otswriter/callable/GetTableMetaCallable.java b/otswriter/src/main/java/com/alibaba/datax/plugin/writer/otswriter/callable/GetTableMetaCallable.java index d4128e14..b3b26d76 100644 --- a/otswriter/src/main/java/com/alibaba/datax/plugin/writer/otswriter/callable/GetTableMetaCallable.java +++ b/otswriter/src/main/java/com/alibaba/datax/plugin/writer/otswriter/callable/GetTableMetaCallable.java @@ -1,29 +1,27 @@ package 
com.alibaba.datax.plugin.writer.otswriter.callable; +import com.alicloud.openservices.tablestore.SyncClientInterface; +import com.alicloud.openservices.tablestore.model.DescribeTableRequest; +import com.alicloud.openservices.tablestore.model.DescribeTableResponse; +import com.alicloud.openservices.tablestore.model.TableMeta; + import java.util.concurrent.Callable; -import com.aliyun.openservices.ots.OTSClient; -import com.aliyun.openservices.ots.model.DescribeTableRequest; -import com.aliyun.openservices.ots.model.DescribeTableResult; -import com.aliyun.openservices.ots.model.TableMeta; - public class GetTableMetaCallable implements Callable{ - private OTSClient ots = null; + private SyncClientInterface ots = null; private String tableName = null; - public GetTableMetaCallable(OTSClient ots, String tableName) { + public GetTableMetaCallable(SyncClientInterface ots, String tableName) { this.ots = ots; this.tableName = tableName; } @Override public TableMeta call() throws Exception { - DescribeTableRequest describeTableRequest = new DescribeTableRequest(); - describeTableRequest.setTableName(tableName); - DescribeTableResult result = ots.describeTable(describeTableRequest); - TableMeta tableMeta = result.getTableMeta(); - return tableMeta; + DescribeTableRequest describeTableRequest = new DescribeTableRequest(tableName); + DescribeTableResponse result = ots.describeTable(describeTableRequest); + return result.getTableMeta(); } } diff --git a/otswriter/src/main/java/com/alibaba/datax/plugin/writer/otswriter/callable/GetTableMetaCallableOld.java b/otswriter/src/main/java/com/alibaba/datax/plugin/writer/otswriter/callable/GetTableMetaCallableOld.java new file mode 100644 index 00000000..af7d5088 --- /dev/null +++ b/otswriter/src/main/java/com/alibaba/datax/plugin/writer/otswriter/callable/GetTableMetaCallableOld.java @@ -0,0 +1,29 @@ +package com.alibaba.datax.plugin.writer.otswriter.callable; + +import com.aliyun.openservices.ots.OTSClient; +import com.aliyun.openservices.ots.model.DescribeTableRequest; +import com.aliyun.openservices.ots.model.DescribeTableResult; +import com.aliyun.openservices.ots.model.TableMeta; + +import java.util.concurrent.Callable; + +public class GetTableMetaCallableOld implements Callable{ + + private OTSClient ots = null; + private String tableName = null; + + public GetTableMetaCallableOld(OTSClient ots, String tableName) { + this.ots = ots; + this.tableName = tableName; + } + + @Override + public TableMeta call() throws Exception { + DescribeTableRequest describeTableRequest = new DescribeTableRequest(); + describeTableRequest.setTableName(tableName); + DescribeTableResult result = ots.describeTable(describeTableRequest); + TableMeta tableMeta = result.getTableMeta(); + return tableMeta; + } + +} diff --git a/otswriter/src/main/java/com/alibaba/datax/plugin/writer/otswriter/callable/PutRowChangeCallable.java b/otswriter/src/main/java/com/alibaba/datax/plugin/writer/otswriter/callable/PutRowChangeCallable.java new file mode 100644 index 00000000..b3857094 --- /dev/null +++ b/otswriter/src/main/java/com/alibaba/datax/plugin/writer/otswriter/callable/PutRowChangeCallable.java @@ -0,0 +1,24 @@ +package com.alibaba.datax.plugin.writer.otswriter.callable; + +import com.alicloud.openservices.tablestore.SyncClientInterface; +import com.alicloud.openservices.tablestore.model.PutRowRequest; +import com.alicloud.openservices.tablestore.model.PutRowResponse; + +import java.util.concurrent.Callable; + +public class PutRowChangeCallable implements Callable{ + + private 
SyncClientInterface ots = null; + private PutRowRequest putRowRequest = null; + + public PutRowChangeCallable(SyncClientInterface ots, PutRowRequest putRowRequest) { + this.ots = ots; + this.putRowRequest = putRowRequest; + } + + @Override + public PutRowResponse call() throws Exception { + return ots.putRow(putRowRequest); + } + +} \ No newline at end of file diff --git a/otswriter/src/main/java/com/alibaba/datax/plugin/writer/otswriter/callable/PutTimeseriesDataCallable.java b/otswriter/src/main/java/com/alibaba/datax/plugin/writer/otswriter/callable/PutTimeseriesDataCallable.java new file mode 100644 index 00000000..664f4b41 --- /dev/null +++ b/otswriter/src/main/java/com/alibaba/datax/plugin/writer/otswriter/callable/PutTimeseriesDataCallable.java @@ -0,0 +1,22 @@ +package com.alibaba.datax.plugin.writer.otswriter.callable; + +import com.alicloud.openservices.tablestore.TimeseriesClient; +import com.alicloud.openservices.tablestore.model.timeseries.PutTimeseriesDataRequest; +import com.alicloud.openservices.tablestore.model.timeseries.PutTimeseriesDataResponse; + +import java.util.concurrent.Callable; + +public class PutTimeseriesDataCallable implements Callable { + private TimeseriesClient client = null; + private PutTimeseriesDataRequest putTimeseriesDataRequest = null; + + public PutTimeseriesDataCallable(TimeseriesClient client, PutTimeseriesDataRequest putTimeseriesDataRequest) { + this.client = client; + this.putTimeseriesDataRequest = putTimeseriesDataRequest; + } + + @Override + public PutTimeseriesDataResponse call() throws Exception { + return client.putTimeseriesData(putTimeseriesDataRequest); + } +} diff --git a/otswriter/src/main/java/com/alibaba/datax/plugin/writer/otswriter/callable/UpdateRowChangeCallable.java b/otswriter/src/main/java/com/alibaba/datax/plugin/writer/otswriter/callable/UpdateRowChangeCallable.java new file mode 100644 index 00000000..c302e3a1 --- /dev/null +++ b/otswriter/src/main/java/com/alibaba/datax/plugin/writer/otswriter/callable/UpdateRowChangeCallable.java @@ -0,0 +1,24 @@ +package com.alibaba.datax.plugin.writer.otswriter.callable; + +import com.alicloud.openservices.tablestore.SyncClientInterface; +import com.alicloud.openservices.tablestore.model.UpdateRowRequest; +import com.alicloud.openservices.tablestore.model.UpdateRowResponse; + +import java.util.concurrent.Callable; + +public class UpdateRowChangeCallable implements Callable{ + + private SyncClientInterface ots = null; + private UpdateRowRequest updateRowRequest = null; + + public UpdateRowChangeCallable(SyncClientInterface ots, UpdateRowRequest updateRowRequest ) { + this.ots = ots; + this.updateRowRequest = updateRowRequest; + } + + @Override + public UpdateRowResponse call() throws Exception { + return ots.updateRow(updateRowRequest); + } + +} diff --git a/otswriter/src/main/java/com/alibaba/datax/plugin/writer/otswriter/model/LogExceptionManager.java b/otswriter/src/main/java/com/alibaba/datax/plugin/writer/otswriter/model/LogExceptionManager.java deleted file mode 100644 index 93175ddb..00000000 --- a/otswriter/src/main/java/com/alibaba/datax/plugin/writer/otswriter/model/LogExceptionManager.java +++ /dev/null @@ -1,58 +0,0 @@ -package com.alibaba.datax.plugin.writer.otswriter.model; - -import org.slf4j.Logger; -import org.slf4j.LoggerFactory; - -import com.aliyun.openservices.ots.OTSErrorCode; -import com.aliyun.openservices.ots.OTSException; - -/** - * 添加这个类的主要目的是为了解决当用户遇到CU不够时,打印大量的日志 - * @author redchen - * - */ -public class LogExceptionManager { - - private long count = 0; - 
private long updateTimestamp = 0; - - private static final Logger LOG = LoggerFactory.getLogger(LogExceptionManager.class); - - private synchronized void countAndReset() { - count++; - long cur = System.currentTimeMillis(); - long interval = cur - updateTimestamp; - if (interval >= 10000) { - LOG.warn("Call callable fail, OTSNotEnoughCapacityUnit, total times:"+ count +", time range:"+ (interval/1000) +"s, times per second:" + ((float)count / (interval/1000))); - count = 0; - updateTimestamp = cur; - } - } - - public synchronized void addException(Exception exception) { - if (exception instanceof OTSException) { - OTSException e = (OTSException)exception; - if (e.getErrorCode().equals(OTSErrorCode.NOT_ENOUGH_CAPACITY_UNIT)) { - countAndReset(); - } else { - LOG.warn( - "Call callable fail, OTSException:ErrorCode:{}, ErrorMsg:{}, RequestId:{}", - new Object[]{e.getErrorCode(), e.getMessage(), e.getRequestId()} - ); - } - } else { - LOG.warn("Call callable fail, {}", exception.getMessage()); - } - } - - public synchronized void addException(com.aliyun.openservices.ots.model.Error error, String requestId) { - if (error.getCode().equals(OTSErrorCode.NOT_ENOUGH_CAPACITY_UNIT)) { - countAndReset(); - } else { - LOG.warn( - "OTSException:ErrorCode:{}, ErrorMsg:{}, RequestId:{}", - new Object[]{error.getCode(), error.getMessage(), requestId} - ); - } - } -} diff --git a/otswriter/src/main/java/com/alibaba/datax/plugin/writer/otswriter/model/OTSAttrColumn.java b/otswriter/src/main/java/com/alibaba/datax/plugin/writer/otswriter/model/OTSAttrColumn.java index d37960e0..7564130a 100644 --- a/otswriter/src/main/java/com/alibaba/datax/plugin/writer/otswriter/model/OTSAttrColumn.java +++ b/otswriter/src/main/java/com/alibaba/datax/plugin/writer/otswriter/model/OTSAttrColumn.java @@ -1,16 +1,33 @@ package com.alibaba.datax.plugin.writer.otswriter.model; -import com.aliyun.openservices.ots.model.ColumnType; +import com.alicloud.openservices.tablestore.model.ColumnType; + public class OTSAttrColumn { - private String name; - private ColumnType type; + // 该字段只在多版本中使用,表示多版本中,输入源中columnName的值,由将对应的Cell写入用户配置name的列中 + private String srcName = null; + private String name = null; + private ColumnType type = null; + //该字段只在写入时序表时使用,该字段是否为时序数据的标签内部字段 + private Boolean isTag = false; public OTSAttrColumn(String name, ColumnType type) { this.name = name; this.type = type; } + public OTSAttrColumn(String srcName, String name, ColumnType type) { + this.srcName = srcName; + this.name = name; + this.type = type; + } + + public OTSAttrColumn(String name, ColumnType type, Boolean isTag) { + this.name = name; + this.type = type; + this.isTag = isTag; + } + public String getName() { return name; } @@ -18,4 +35,12 @@ public class OTSAttrColumn { public ColumnType getType() { return type; } + + public String getSrcName() { + return srcName; + } + + public Boolean getTag() { + return isTag; + } } diff --git a/otswriter/src/main/java/com/alibaba/datax/plugin/writer/otswriter/model/OTSBatchWriteRowTaskManager.java b/otswriter/src/main/java/com/alibaba/datax/plugin/writer/otswriter/model/OTSBatchWriteRowTaskManager.java new file mode 100644 index 00000000..fdeb2825 --- /dev/null +++ b/otswriter/src/main/java/com/alibaba/datax/plugin/writer/otswriter/model/OTSBatchWriteRowTaskManager.java @@ -0,0 +1,41 @@ +package com.alibaba.datax.plugin.writer.otswriter.model; + +import com.alicloud.openservices.tablestore.SyncClientInterface; +import org.slf4j.Logger; +import org.slf4j.LoggerFactory; + +import java.util.List; + +/** + * 
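Caps the number of concurrently running BatchWriteRow tasks through OTSBlockingExecutor (editor's gloss of the note that follows). 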
控制Task的并发数目 + * + */ +public class OTSBatchWriteRowTaskManager implements OTSTaskManagerInterface { + + private SyncClientInterface ots = null; + private OTSBlockingExecutor executorService = null; + private OTSConf conf = null; + + private static final Logger LOG = LoggerFactory.getLogger(OTSBatchWriteRowTaskManager.class); + + public OTSBatchWriteRowTaskManager( + SyncClientInterface ots, + OTSConf conf) { + this.ots = ots; + this.conf = conf; + + executorService = new OTSBlockingExecutor(conf.getConcurrencyWrite()); + } + + public void execute(List lines) throws Exception { + LOG.debug("Begin execute."); + executorService.execute(new OTSBatchWriterRowTask(ots, conf, lines)); + LOG.debug("End execute."); + } + + public void close() throws Exception { + LOG.debug("Begin close."); + executorService.shutdown(); + LOG.debug("End close."); + } +} diff --git a/otswriter/src/main/java/com/alibaba/datax/plugin/writer/otswriter/model/OTSBatchWriterRowTask.java b/otswriter/src/main/java/com/alibaba/datax/plugin/writer/otswriter/model/OTSBatchWriterRowTask.java new file mode 100644 index 00000000..416526fd --- /dev/null +++ b/otswriter/src/main/java/com/alibaba/datax/plugin/writer/otswriter/model/OTSBatchWriterRowTask.java @@ -0,0 +1,196 @@ +package com.alibaba.datax.plugin.writer.otswriter.model; + +import com.alibaba.datax.plugin.writer.otswriter.OTSCriticalException; +import com.alibaba.datax.plugin.writer.otswriter.OTSErrorCode; +import com.alibaba.datax.plugin.writer.otswriter.callable.BatchWriteRowCallable; +import com.alibaba.datax.plugin.writer.otswriter.callable.PutRowChangeCallable; +import com.alibaba.datax.plugin.writer.otswriter.callable.UpdateRowChangeCallable; +import com.alibaba.datax.plugin.writer.otswriter.utils.CollectorUtil; +import com.alibaba.datax.plugin.writer.otswriter.utils.Common; +import com.alibaba.datax.plugin.writer.otswriter.utils.LineAndError; +import com.alibaba.datax.plugin.writer.otswriter.utils.RetryHelper; +import com.alicloud.openservices.tablestore.SyncClientInterface; +import com.alicloud.openservices.tablestore.TableStoreException; +import com.alicloud.openservices.tablestore.model.*; +import com.alicloud.openservices.tablestore.model.BatchWriteRowResponse.RowResult; +import org.slf4j.Logger; +import org.slf4j.LoggerFactory; + +import java.util.ArrayList; +import java.util.List; + +public class OTSBatchWriterRowTask implements Runnable { + private SyncClientInterface ots = null; + private OTSConf conf = null; + private List otsLines = new ArrayList(); + + private boolean isDone = false; + private int retryTimes = 0; + + private static final Logger LOG = LoggerFactory.getLogger(OTSBatchWriterRowTask.class); + + public OTSBatchWriterRowTask( + final SyncClientInterface ots, + final OTSConf conf, + final List lines + ) { + this.ots = ots; + this.conf = conf; + + this.otsLines.addAll(lines); + } + + @Override + public void run() { + LOG.debug("Begin run"); + sendAll(otsLines); + LOG.debug("End run"); + } + + public boolean isDone() { + return this.isDone; + } + + private boolean isExceptionForSendOneByOne(TableStoreException ee) { + if (ee.getErrorCode().equals(OTSErrorCode.INVALID_PARAMETER)|| + ee.getErrorCode().equals(OTSErrorCode.REQUEST_TOO_LARGE) + ) { + return true; + } + return false; + } + + private BatchWriteRowRequest createRequest(List lines) { + BatchWriteRowRequest newRequest = new BatchWriteRowRequest(); + switch (conf.getOperation()) { + case PUT_ROW: + case UPDATE_ROW: + for (OTSLine l : lines) { + newRequest.addRowChange(l.getRowChange()); + } 
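+ /* editor's note: PUT_ROW and UPDATE_ROW intentionally share this branch, since both just batch their RowChange objects into a single BatchWriteRowRequest; any other operation (for example the deprecated DELETE_ROW) falls through to the default case below and is rejected */ 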
+ break; + default: + throw new RuntimeException(String.format(OTSErrorMessage.OPERATION_PARSE_ERROR, conf.getOperation())); + } + return newRequest; + } + + /** + * 单行发送数据 + * @param line + */ + public void sendLine(OTSLine line) { + try { + switch (conf.getOperation()) { + case PUT_ROW: + PutRowRequest putRowRequest = new PutRowRequest(); + putRowRequest.setRowChange((RowPutChange) line.getRowChange()); + PutRowResponse putResult = RetryHelper.executeWithRetry( + new PutRowChangeCallable(ots, putRowRequest), + conf.getRetry(), + conf.getSleepInMillisecond()); + LOG.debug("Requst ID : {}", putResult.getRequestId()); + break; + case UPDATE_ROW: + UpdateRowRequest updateRowRequest = new UpdateRowRequest(); + updateRowRequest.setRowChange((RowUpdateChange) line.getRowChange()); + UpdateRowResponse updateResult = RetryHelper.executeWithRetry( + new UpdateRowChangeCallable(ots, updateRowRequest), + conf.getRetry(), + conf.getSleepInMillisecond()); + LOG.debug("Requst ID : {}", updateResult.getRequestId()); + break; + } + } catch (Exception e) { + LOG.warn("sendLine fail. ", e); + CollectorUtil.collect(line.getRecords(), e.getMessage()); + } + } + + private void sendAllOneByOne(List lines) { + for (OTSLine l : lines) { + sendLine(l); + } + } + + /** + * 批量发送数据 + * 如果程序发送失败,BatchWriteRow接口可能整体异常返回或者返回每个子行的操作状态 + * 1.在整体异常的情况下:方法会检查这个异常是否能通过把批量数据拆分成单行发送,如果不行, + * 将会把这一批数据记录到脏数据回收器中,如果可以,方法会调用sendAllOneByOne进行单行数据发送。 + * 2.如果BatchWriteRow成功执行,方法会检查每行的返回状态,如果子行操作失败,方法会收集所有失 + * 败的行,重新调用sendAll,发送失败的数据。 + * @param lines + */ + private void sendAll(List lines) { + try { + Thread.sleep(Common.getDelaySendMillinSeconds(retryTimes, conf.getSleepInMillisecond())); + BatchWriteRowRequest batchWriteRowRequest = createRequest(lines); + BatchWriteRowResponse result = RetryHelper.executeWithRetry( + new BatchWriteRowCallable(ots, batchWriteRowRequest), + conf.getRetry(), + conf.getSleepInMillisecond()); + + LOG.debug("Requst ID : {}", result.getRequestId()); + List errors = getLineAndError(result, lines); + if (!errors.isEmpty()){ + if(retryTimes < conf.getRetry()) { + retryTimes++; + LOG.warn("Retry times : {}", retryTimes); + List newLines = new ArrayList(); + for (LineAndError re : errors) { + LOG.warn("Because: {}", re.getError().getMessage()); + if (RetryHelper.canRetry(re.getError().getCode())) { + newLines.add(re.getLine()); + } else { + LOG.warn("Can not retry, record row to collector. {}", re.getError().getMessage()); + CollectorUtil.collect(re.getLine().getRecords(), re.getError().getMessage()); + } + } + if (!newLines.isEmpty()) { + sendAll(newLines); + } + } else { + LOG.warn("Retry times more than limitation. RetryTime : {}", retryTimes); + CollectorUtil.collect(errors); + } + } + } catch (TableStoreException e) { + LOG.warn("Send data fail. 
{}", e.getMessage()); + if (isExceptionForSendOneByOne(e)) { + if (lines.size() == 1) { + LOG.warn("Can not retry.", e); + CollectorUtil.collect(e.getMessage(), lines); + } else { + // 进入单行发送的分支 + sendAllOneByOne(lines); + } + } else { + LOG.error("Can not send lines to OTS for RuntimeException.", e); + CollectorUtil.collect(e.getMessage(), lines); + } + } catch (Exception e) { + LOG.error("Can not send lines to OTS for Exception.", e); + CollectorUtil.collect(e.getMessage(), lines); + } + } + + private List getLineAndError(BatchWriteRowResponse result, List lines) throws OTSCriticalException { + List errors = new ArrayList(); + + switch(conf.getOperation()) { + case PUT_ROW: + case UPDATE_ROW: { + List status = result.getFailedRows(); + for (RowResult r : status) { + errors.add(new LineAndError(lines.get(r.getIndex()), r.getError())); + } + } + break; + default: + LOG.error("Bug branch."); + throw new OTSCriticalException(String.format(OTSErrorMessage.OPERATION_PARSE_ERROR, conf.getOperation())); + } + return errors; + } +} diff --git a/otswriter/src/main/java/com/alibaba/datax/plugin/writer/otswriter/model/OTSBlockingExecutor.java b/otswriter/src/main/java/com/alibaba/datax/plugin/writer/otswriter/model/OTSBlockingExecutor.java new file mode 100644 index 00000000..059ba338 --- /dev/null +++ b/otswriter/src/main/java/com/alibaba/datax/plugin/writer/otswriter/model/OTSBlockingExecutor.java @@ -0,0 +1,55 @@ +package com.alibaba.datax.plugin.writer.otswriter.model; + +import org.slf4j.Logger; +import org.slf4j.LoggerFactory; + +import java.util.concurrent.*; + +/** + * 单个Channel会多线程并发的写入数据到OTS中,需要使用一个固定的线程池来执行Runnable对象,同时当 + * 线程池满时,阻塞execute方法。原生的Executor并不能做到阻塞execute方法。只是当queue满时, + * 方法抛出默认RejectedExecutionException,或者我们实现RejectedExecutionHandler, + * 这两种方法都无法满足阻塞用户请求的需求,所以我们用信号量来实现了一个阻塞的Executor + * @author redchen + * + */ +public class OTSBlockingExecutor { + private final ExecutorService exec; + private final Semaphore semaphore; + + private static final Logger LOG = LoggerFactory.getLogger(OTSBlockingExecutor.class); + + public OTSBlockingExecutor(int concurrency) { + this.exec = new ThreadPoolExecutor( + concurrency, concurrency, + 0L, TimeUnit.SECONDS, + new LinkedBlockingQueue()); + this.semaphore = new Semaphore(concurrency); + } + + public void execute(final Runnable task) + throws InterruptedException { + LOG.debug("Begin execute"); + try { + semaphore.acquire(); + exec.execute(new Runnable() { + public void run() { + try { + task.run(); + } finally { + semaphore.release(); + } + } + }); + } catch (RejectedExecutionException e) { + semaphore.release(); + throw new RuntimeException(OTSErrorMessage.INSERT_TASK_ERROR); + } + LOG.debug("End execute"); + } + + public void shutdown() throws InterruptedException { + this.exec.shutdown(); + while (!this.exec.awaitTermination(1, TimeUnit.SECONDS)){} + } +} \ No newline at end of file diff --git a/otswriter/src/main/java/com/alibaba/datax/plugin/writer/otswriter/model/OTSConf.java b/otswriter/src/main/java/com/alibaba/datax/plugin/writer/otswriter/model/OTSConf.java index bd7eccc5..fee9ed55 100644 --- a/otswriter/src/main/java/com/alibaba/datax/plugin/writer/otswriter/model/OTSConf.java +++ b/otswriter/src/main/java/com/alibaba/datax/plugin/writer/otswriter/model/OTSConf.java @@ -1,73 +1,51 @@ package com.alibaba.datax.plugin.writer.otswriter.model; +import com.alicloud.openservices.tablestore.model.PrimaryKeySchema; + import java.util.List; +import java.util.Map; +import java.util.concurrent.TimeUnit; public class OTSConf { - private 
String endpoint; - private String accessId; - private String accessKey; - private String instanceName; - private String tableName; + private String endpoint= null; + private String accessId = null; + private String accessKey = null; + private String instanceName = null; + private String tableName = null; + - private List primaryKeyColumn; - private List attributeColumn; - - private int bufferSize = 1024; - private int retry = 18; - private int sleepInMillisecond = 100; - private int batchWriteCount = 10; - private int concurrencyWrite = 5; - private int ioThreadCount = 1; - private int socketTimeout = 20000; - private int connectTimeout = 10000; + private List primaryKeyColumn = null; + private List attributeColumn = null; + + private int retry = -1; + private int sleepInMillisecond = -1; + private int batchWriteCount = -1; + private int concurrencyWrite = -1; + private int ioThreadCount = -1; + private int socketTimeoutInMillisecond = -1; + private int connectTimeoutInMillisecond = -1; - private OTSOpType operation; - private RestrictConf restrictConf; + private OTSOpType operation = null; - //限制项 - public class RestrictConf { - private int requestTotalSizeLimition = 1024 * 1024; - private int primaryKeyColumnSize = 1024; - private int attributeColumnSize = 2 * 1024 * 1024; - private int maxColumnsCount = 1024; + private int requestTotalSizeLimitation = -1; + + private OTSMode mode = null; + private boolean enableAutoIncrement = false; + private boolean isNewVersion = false; + private boolean isTimeseriesTable = false; + private TimeUnit timeUnit = TimeUnit.MICROSECONDS; + private long timestamp = -1; + private Map encodePkColumnMapping = null; + private String columnNamePrefixFilter = null; - public int getRequestTotalSizeLimition() { - return requestTotalSizeLimition; - } - public void setRequestTotalSizeLimition(int requestTotalSizeLimition) { - this.requestTotalSizeLimition = requestTotalSizeLimition; - } - - public void setPrimaryKeyColumnSize(int primaryKeyColumnSize) { - this.primaryKeyColumnSize = primaryKeyColumnSize; - } - - public void setAttributeColumnSize(int attributeColumnSize) { - this.attributeColumnSize = attributeColumnSize; - } - - public void setMaxColumnsCount(int maxColumnsCount) { - this.maxColumnsCount = maxColumnsCount; - } - - public int getAttributeColumnSize() { - return attributeColumnSize; - } - - public int getMaxColumnsCount() { - return maxColumnsCount; - } - - public int getPrimaryKeyColumnSize() { - return primaryKeyColumnSize; - } + public Map getEncodePkColumnMapping() { + return encodePkColumnMapping; } - - public RestrictConf getRestrictConf() { - return restrictConf; + public void setEncodePkColumnMapping(Map encodePkColumnMapping) { + this.encodePkColumnMapping = encodePkColumnMapping; } - public void setRestrictConf(RestrictConf restrictConf) { - this.restrictConf = restrictConf; + public int getSocketTimeoutInMillisecond() { + return socketTimeoutInMillisecond; } public OTSOpType getOperation() { return operation; @@ -75,10 +53,10 @@ public class OTSConf { public void setOperation(OTSOpType operation) { this.operation = operation; } - public List getPrimaryKeyColumn() { + public List getPrimaryKeyColumn() { return primaryKeyColumn; } - public void setPrimaryKeyColumn(List primaryKeyColumn) { + public void setPrimaryKeyColumn(List primaryKeyColumn) { this.primaryKeyColumn = primaryKeyColumn; } @@ -149,24 +127,72 @@ public class OTSConf { this.ioThreadCount = ioThreadCount; } public int getSocketTimeout() { - return socketTimeout; + return 
socketTimeoutInMillisecond; } - public void setSocketTimeout(int socketTimeout) { - this.socketTimeout = socketTimeout; + public void setSocketTimeoutInMillisecond(int socketTimeoutInMillisecond) { + this.socketTimeoutInMillisecond = socketTimeoutInMillisecond; } - public int getConnectTimeout() { - return connectTimeout; + public int getConnectTimeoutInMillisecond() { + return connectTimeoutInMillisecond; + } + public void setConnectTimeoutInMillisecond(int connectTimeoutInMillisecond) { + this.connectTimeoutInMillisecond = connectTimeoutInMillisecond; + } + public OTSMode getMode() { + return mode; + } + public void setMode(OTSMode mode) { + this.mode = mode; + } + public long getTimestamp() { + return timestamp; + } + public void setTimestamp(long timestamp) { + this.timestamp = timestamp; + } + public String getColumnNamePrefixFilter() { + return columnNamePrefixFilter; + } + public void setColumnNamePrefixFilter(String columnNamePrefixFilter) { + this.columnNamePrefixFilter = columnNamePrefixFilter; } - public int getBufferSize() { - return bufferSize; + public boolean getEnableAutoIncrement() { + return enableAutoIncrement; } - public void setBufferSize(int bufferSize) { - this.bufferSize = bufferSize; + public void setEnableAutoIncrement(boolean enableAutoIncrement) { + this.enableAutoIncrement = enableAutoIncrement; + } + public boolean isNewVersion() { + return isNewVersion; } - public void setConnectTimeout(int connectTimeout) { - this.connectTimeout = connectTimeout; + public void setNewVersion(boolean newVersion) { + isNewVersion = newVersion; + } + + public boolean isTimeseriesTable() { + return isTimeseriesTable; + } + + public void setTimeseriesTable(boolean timeseriesTable) { + isTimeseriesTable = timeseriesTable; + } + + public TimeUnit getTimeUnit() { + return timeUnit; + } + + public void setTimeUnit(TimeUnit timeUnit) { + this.timeUnit = timeUnit; + } + + public int getRequestTotalSizeLimitation() { + return requestTotalSizeLimitation; + } + + public void setRequestTotalSizeLimitation(int requestTotalSizeLimitation) { + this.requestTotalSizeLimitation = requestTotalSizeLimitation; } } \ No newline at end of file diff --git a/otswriter/src/main/java/com/alibaba/datax/plugin/writer/otswriter/model/OTSConst.java b/otswriter/src/main/java/com/alibaba/datax/plugin/writer/otswriter/model/OTSConst.java index 1b8f8053..bda736e8 100644 --- a/otswriter/src/main/java/com/alibaba/datax/plugin/writer/otswriter/model/OTSConst.java +++ b/otswriter/src/main/java/com/alibaba/datax/plugin/writer/otswriter/model/OTSConst.java @@ -2,20 +2,27 @@ package com.alibaba.datax.plugin.writer.otswriter.model; public class OTSConst { // Reader support type - public final static String TYPE_STRING = "STRING"; + public final static String TYPE_STRING = "STRING"; public final static String TYPE_INTEGER = "INT"; - public final static String TYPE_DOUBLE = "DOUBLE"; + public final static String TYPE_DOUBLE = "DOUBLE"; public final static String TYPE_BOOLEAN = "BOOL"; - public final static String TYPE_BINARY = "BINARY"; - + public final static String TYPE_BINARY = "BINARY"; + // Column public final static String NAME = "name"; + public final static String SRC_NAME = "srcName"; public final static String TYPE = "type"; - + public final static String IS_TAG = "is_timeseries_tag"; + public final static String OTS_CONF = "OTS_CONF"; - + + public final static String OTS_MODE_NORMAL = "normal"; + public final static String OTS_MODE_MULTI_VERSION = "multiVersion"; + public final static String OTS_MODE_TIME_SERIES 
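/* editor's note: the timeseries path is switched on via the isTimeseriesTable flag on OTSConf rather than a new OTSMode value; the OTSMode enum added in this change defines only NORMAL and MULTI_VERSION */ 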
= "timeseries"; + public final static String OTS_OP_TYPE_PUT = "PutRow"; public final static String OTS_OP_TYPE_UPDATE = "UpdateRow"; + // only support in old version public final static String OTS_OP_TYPE_DELETE = "DeleteRow"; // options @@ -24,13 +31,13 @@ public class OTSConst { public final static String BATCH_WRITE_COUNT = "batchWriteCount"; public final static String CONCURRENCY_WRITE = "concurrencyWrite"; public final static String IO_THREAD_COUNT = "ioThreadCount"; - public final static String SOCKET_TIMEOUT = "socketTimeoutInMillisecond"; - public final static String CONNECT_TIMEOUT = "connectTimeoutInMillisecond"; - public final static String BUFFER_SIZE = "bufferSize"; - - // 限制项 + public final static String MAX_CONNECT_COUNT = "maxConnectCount"; + public final static String SOCKET_TIMEOUTIN_MILLISECOND = "socketTimeoutInMillisecond"; + public final static String CONNECT_TIMEOUT_IN_MILLISECOND = "connectTimeoutInMillisecond"; public final static String REQUEST_TOTAL_SIZE_LIMITATION = "requestTotalSizeLimitation"; - public final static String ATTRIBUTE_COLUMN_SIZE_LIMITATION = "attributeColumnSizeLimitation"; - public final static String PRIMARY_KEY_COLUMN_SIZE_LIMITATION = "primaryKeyColumnSizeLimitation"; - public final static String ATTRIBUTE_COLUMN_MAX_COUNT = "attributeColumnMaxCount"; -} \ No newline at end of file + + public static final String MEASUREMENT_NAME = "_m_name"; + public static final String DATA_SOURCE = "_data_source"; + public static final String TAGS = "_tags"; + public static final String TIME = "_time"; +} diff --git a/otswriter/src/main/java/com/alibaba/datax/plugin/writer/otswriter/model/OTSErrorMessage.java b/otswriter/src/main/java/com/alibaba/datax/plugin/writer/otswriter/model/OTSErrorMessage.java index 9523342f..4bde553a 100644 --- a/otswriter/src/main/java/com/alibaba/datax/plugin/writer/otswriter/model/OTSErrorMessage.java +++ b/otswriter/src/main/java/com/alibaba/datax/plugin/writer/otswriter/model/OTSErrorMessage.java @@ -2,13 +2,19 @@ package com.alibaba.datax.plugin.writer.otswriter.model; public class OTSErrorMessage { - public static final String OPERATION_PARSE_ERROR = "The 'writeMode' only support 'PutRow', 'UpdateRow' or 'DeleteRow', not '%s'."; + public static final String MODE_PARSE_ERROR = "The 'mode' only support 'normal' and 'multiVersion' not '%s'."; + + public static final String OPERATION_PARSE_ERROR = "The 'writeMode' only support 'PutRow' and 'UpdateRow' not '%s'."; + + public static final String MUTLI_MODE_OPERATION_PARSE_ERROR = "When configurion set mode='MultiVersion', the 'writeMode' only support 'UpdateRow' not '%s'."; public static final String UNSUPPORT_PARSE = "Unsupport parse '%s' to '%s'."; - public static final String RECORD_AND_COLUMN_SIZE_ERROR = "Size of record not equal size of config column. record size : %d, config column size : %d."; + public static final String UNSUPPORT = "Unsupport : '%s'."; - public static final String PK_TYPE_ERROR = "Primary key type only support 'string' and 'int', not support '%s'."; + public static final String RECORD_AND_COLUMN_SIZE_ERROR = "Size of record not equal size of config column. 
record size : %d, config column size : %d, record data : %s."; + + public static final String PK_TYPE_ERROR = "Primary key type only support 'string', 'int' and 'binary', not support '%s'."; public static final String ATTR_TYPE_ERROR = "Column type only support 'string','int','double','bool' and 'binary', not support '%s'."; @@ -17,7 +23,9 @@ public class OTSErrorMessage { public static final String INPUT_PK_COUNT_NOT_EQUAL_META_ERROR = "The count of 'primaryKey' not equal meta, input count : %d, primary key count : %d in meta."; public static final String INPUT_PK_TYPE_NOT_MATCH_META_ERROR = "The type of 'primaryKey' not match meta, column name : %s, input type: %s, primary key type : %s in meta."; - + + public static final String INPUT_PK_NAME_NOT_EXIST_IN_META_ERROR = "The input primary column '%s' is not exist in meta."; + public static final String ATTR_REPEAT_COLUMN_ERROR = "Repeat column '%s' in 'column'."; public static final String MISSING_PARAMTER_ERROR = "The param '%s' is not exist."; @@ -36,25 +44,49 @@ public class OTSErrorMessage { public static final String ATTR_MAP_NAME_TYPE_ERROR = "The 'name' and 'type only support string in json map of 'column'."; + public static final String ATTR_MAP_SRCNAME_NAME_TYPE_ERROR = "The 'srcName', 'name' and 'type' only support string in json map of 'column'."; + + public static final String PK_MAP_KEY_TYPE_ERROR = "The '%s' only support string in json map of 'primaryKey'."; + + public static final String ATTR_MAP_KEY_TYPE_ERROR = "The '%s' only support string in json map of 'column'."; + public static final String PK_MAP_INCLUDE_NAME_TYPE_ERROR = "The only support 'name' and 'type' fileds in json map of 'primaryKey'."; public static final String ATTR_MAP_INCLUDE_NAME_TYPE_ERROR = "The only support 'name' and 'type' fileds in json map of 'column'."; - public static final String PK_ITEM_IS_NOT_MAP_ERROR = "The item is not map in 'primaryKey'."; + public static final String PK_MAP_FILED_MISSING_ERROR = "The '%s' fileds is missing in json map of 'primaryKey'."; + + public static final String ATTR_MAP_FILED_MISSING_ERROR = "The '%s' fileds is missing in json map of 'column'."; + + public static final String ATTR_MAP_INCLUDE_SRCNAME_NAME_TYPE_ERROR = "The only support 'srcName', 'name' and 'type' fileds in json map of 'column'."; + + public static final String PK_ITEM_IS_ILLEAGAL_ERROR = "The item is not string or map in 'primaryKey'."; + + public static final String PK_IS_NOT_EXIST_AT_OTS_ERROR = "Can not find the pk('%s') at ots in 'primaryKey'."; public static final String ATTR_ITEM_IS_NOT_MAP_ERROR = "The item is not map in 'column'."; public static final String PK_COLUMN_NAME_IS_EMPTY_ERROR = "The name of item can not be a empty string in 'primaryKey'."; + public static final String PK_COLUMN_TYPE_IS_EMPTY_ERROR = "The type of item can not be a empty string in 'primaryKey'."; + public static final String ATTR_COLUMN_NAME_IS_EMPTY_ERROR = "The name of item can not be a empty string in 'column'."; - public static final String MULTI_ATTR_COLUMN_ERROR = "Multi item in 'column', column name : %s ."; + public static final String ATTR_COLUMN_SRC_NAME_IS_EMPTY_ERROR = "The srcName of item can not be a empty string in 'column'."; + + public static final String ATTR_COLUMN_TYPE_IS_EMPTY_ERROR = "The type of item can not be a empty string in 'column'."; + + public static final String MULTI_PK_ATTR_COLUMN_ERROR = "Duplicate item in 'column' and 'primaryKey', column name : %s ."; + + public static final String MULTI_ATTR_COLUMN_ERROR = "Duplicate item in 
'column', column name : %s ."; + + public static final String MULTI_ATTR_SRC_COLUMN_ERROR = "Duplicate src name in 'column', src name : %s ."; public static final String COLUMN_CONVERSION_ERROR = "Column coversion error, src type : %s, src value: %s, expect type: %s ."; public static final String PK_COLUMN_VALUE_IS_NULL_ERROR = "The column of record is NULL, primary key name : %s ."; - public static final String PK_STRONG_LENGTH_ERROR = "The length of pk string value is more than configuration, conf: %d, input: %d ."; + public static final String PK_STRING_LENGTH_ERROR = "The length of pk string value is more than configuration, conf: %d, input: %d ."; public static final String ATTR_STRING_LENGTH_ERROR = "The length of attr string value is more than configuration, conf: %d, input: %d ."; @@ -63,4 +95,31 @@ public class OTSErrorMessage { public static final String LINE_LENGTH_ERROR = "The length of row is more than length of request configuration, conf: %d, row: %d ."; public static final String INSERT_TASK_ERROR = "Can not execute the task, becase the ExecutorService is shutdown."; + + public static final String COLUMN_NOT_DEFINE = "The column name : '%s' not define in column."; + + public static final String INPUT_RECORDS_IS_EMPTY = "The input records can not be empty."; + + public static final String MULTI_VERSION_TIMESTAMP_IS_EMPTY = "The input timestamp can not be empty in the multiVersion mode."; + + public static final String MULTI_VERSION_VALUE_IS_EMPTY = "The input value can not be empty in the multiVersion mode."; + + public static final String INPUT_COLUMN_COUNT_LIMIT = "The input count(%d) of column more than max(%d)."; + + public static final String PUBLIC_SDK_NO_SUPPORT_MULTI_VERSION = "The old version do not support multi version function. Please add config in otswriter: \"newVersion\":\"true\" ."; + + public static final String PUBLIC_SDK_NO_SUPPORT_AUTO_INCREMENT = "The old version do not support auto increment primary key function. Please add config in otswriter: \"newVersion\":\"true\" ."; + + public static final String NOT_SUPPORT_MULTI_VERSION_AUTO_INCREMENT = "The multi version mode do not support auto increment primary key function."; + + public static final String PUBLIC_SDK_NO_SUPPORT_TIMESERIES_TABLE = "The old version do not support write timeseries table. 
Please add config in otswriter: \"newVersion\":\"true\" ."; + + public static final String NOT_SUPPORT_TIMESERIES_TABLE_AUTO_INCREMENT = "The timeseries table do not support auto increment primary key function."; + + public static final String NO_FOUND_M_NAME_FIELD_ERROR = "The '_m_name' field should be set in columns because 'measurement' is required in timeseries data."; + + public static final String NO_FOUND_TIME_FIELD_ERROR = "The '_time' field should be set in columns because 'time' is required in timeseries data."; + + public static final String TIMEUNIT_FORMAT_ERROR = "The value of param 'timeunit' is '%s', which should be in ['NANOSECONDS', 'MICROSECONDS', 'MILLISECONDS', 'SECONDS', 'MINUTES']."; + } diff --git a/otswriter/src/main/java/com/alibaba/datax/plugin/writer/otswriter/model/OTSLine.java b/otswriter/src/main/java/com/alibaba/datax/plugin/writer/otswriter/model/OTSLine.java new file mode 100644 index 00000000..7be4a1a8 --- /dev/null +++ b/otswriter/src/main/java/com/alibaba/datax/plugin/writer/otswriter/model/OTSLine.java @@ -0,0 +1,85 @@ +package com.alibaba.datax.plugin.writer.otswriter.model; + +import com.alibaba.datax.common.element.Record; +import com.alibaba.datax.plugin.writer.otswriter.OTSCriticalException; +import com.alibaba.datax.plugin.writer.otswriter.utils.CalculateHelper; +import com.alicloud.openservices.tablestore.model.PrimaryKey; +import com.alicloud.openservices.tablestore.model.RowChange; +import com.alicloud.openservices.tablestore.model.RowPutChange; +import com.alicloud.openservices.tablestore.model.RowUpdateChange; +import com.alicloud.openservices.tablestore.model.timeseries.TimeseriesRow; + +import java.util.ArrayList; +import java.util.List; + +public class OTSLine { + private int dataSize = 0; + + private PrimaryKey pk = null; + private RowChange change = null; + private TimeseriesRow timeseriesRow = null; + + private List records = new ArrayList(); + + public OTSLine( + PrimaryKey pk, + List records, + RowChange change) throws OTSCriticalException { + this.pk = pk; + this.change = change; + this.records.addAll(records); + setSize(this.change); + } + + public OTSLine( + PrimaryKey pk, + Record record, + RowChange change) throws OTSCriticalException { + this.pk = pk; + this.change = change; + this.records.add(record); + setSize(this.change); + } + + public OTSLine( + Record record, + TimeseriesRow row) throws OTSCriticalException { + this.timeseriesRow = row; + this.records.add(record); + setSize(this.timeseriesRow); + } + + private void setSize(RowChange change) throws OTSCriticalException { + if (change instanceof RowPutChange) { + this.dataSize = CalculateHelper.getRowPutChangeSize((RowPutChange) change); + } else if (change instanceof RowUpdateChange) { + this.dataSize = CalculateHelper.getRowUpdateChangeSize((RowUpdateChange) change); + } else { + throw new RuntimeException(String.format(OTSErrorMessage.UNSUPPORT_PARSE, change.getClass().toString(), "RowPutChange or RowUpdateChange")); + } + } + + private void setSize(TimeseriesRow row) throws OTSCriticalException { + this.dataSize = CalculateHelper.getTimeseriesRowDataSize(row); + } + + public List getRecords() { + return records; + } + + public PrimaryKey getPk() { + return pk; + } + + public int getDataSize() { + return dataSize; + } + + public RowChange getRowChange() { + return change; + } + + public TimeseriesRow getTimeseriesRow() { + return timeseriesRow; + } +} diff --git a/otswriter/src/main/java/com/alibaba/datax/plugin/writer/otswriter/model/OTSMode.java 
b/otswriter/src/main/java/com/alibaba/datax/plugin/writer/otswriter/model/OTSMode.java new file mode 100644 index 00000000..530ad5de --- /dev/null +++ b/otswriter/src/main/java/com/alibaba/datax/plugin/writer/otswriter/model/OTSMode.java @@ -0,0 +1,6 @@ +package com.alibaba.datax.plugin.writer.otswriter.model; + +public enum OTSMode { + NORMAL, // 普通模式 + MULTI_VERSION // 多版本模式 +} diff --git a/otswriter/src/main/java/com/alibaba/datax/plugin/writer/otswriter/model/OTSOpType.java b/otswriter/src/main/java/com/alibaba/datax/plugin/writer/otswriter/model/OTSOpType.java index 17b65033..80d70d6d 100644 --- a/otswriter/src/main/java/com/alibaba/datax/plugin/writer/otswriter/model/OTSOpType.java +++ b/otswriter/src/main/java/com/alibaba/datax/plugin/writer/otswriter/model/OTSOpType.java @@ -3,5 +3,6 @@ package com.alibaba.datax.plugin.writer.otswriter.model; public enum OTSOpType { PUT_ROW, UPDATE_ROW, + @Deprecated DELETE_ROW } diff --git a/otswriter/src/main/java/com/alibaba/datax/plugin/writer/otswriter/model/OTSPKColumn.java b/otswriter/src/main/java/com/alibaba/datax/plugin/writer/otswriter/model/OTSPKColumn.java deleted file mode 100644 index c873cb96..00000000 --- a/otswriter/src/main/java/com/alibaba/datax/plugin/writer/otswriter/model/OTSPKColumn.java +++ /dev/null @@ -1,22 +0,0 @@ -package com.alibaba.datax.plugin.writer.otswriter.model; - -import com.aliyun.openservices.ots.model.PrimaryKeyType; - -public class OTSPKColumn { - private String name; - private PrimaryKeyType type; - - public OTSPKColumn(String name, PrimaryKeyType type) { - this.name = name; - this.type = type; - } - - public PrimaryKeyType getType() { - return type; - } - - public String getName() { - return name; - } - -} diff --git a/otswriter/src/main/java/com/alibaba/datax/plugin/writer/otswriter/model/OTSRowPrimaryKey.java b/otswriter/src/main/java/com/alibaba/datax/plugin/writer/otswriter/model/OTSRowPrimaryKey.java deleted file mode 100644 index d89d5017..00000000 --- a/otswriter/src/main/java/com/alibaba/datax/plugin/writer/otswriter/model/OTSRowPrimaryKey.java +++ /dev/null @@ -1,61 +0,0 @@ -package com.alibaba.datax.plugin.writer.otswriter.model; - -import java.util.Map; -import java.util.Map.Entry; - -import com.aliyun.openservices.ots.model.PrimaryKeyValue; - -public class OTSRowPrimaryKey { - - private Map columns; - - public OTSRowPrimaryKey(Map columns) { - if (null == columns) { - throw new IllegalArgumentException("Input columns can not be null."); - } - this.columns = columns; - } - - public Map getColumns() { - return columns; - } - - @Override - public int hashCode() { - int result = 31; - for (Entry entry : columns.entrySet()) { - result = result ^ entry.getKey().hashCode() ^ entry.getValue().hashCode(); - } - return result; - } - - @Override - public boolean equals(Object obj) { - if (this == obj) { - return true; - } - if (obj == null) { - return false; - } - if (!(obj instanceof OTSRowPrimaryKey)) { - return false; - } - OTSRowPrimaryKey other = (OTSRowPrimaryKey) obj; - - if (columns.size() != other.columns.size()) { - return false; - } - - for (Entry entry : columns.entrySet()) { - PrimaryKeyValue otherValue = other.columns.get(entry.getKey()); - - if (otherValue == null) { - return false; - } - if (!otherValue.equals(entry.getValue())) { - return false; - } - } - return true; - } -} diff --git a/otswriter/src/main/java/com/alibaba/datax/plugin/writer/otswriter/model/OTSSendBuffer.java b/otswriter/src/main/java/com/alibaba/datax/plugin/writer/otswriter/model/OTSSendBuffer.java new file mode 
100644 index 00000000..f85b2c16 --- /dev/null +++ b/otswriter/src/main/java/com/alibaba/datax/plugin/writer/otswriter/model/OTSSendBuffer.java @@ -0,0 +1,82 @@ +package com.alibaba.datax.plugin.writer.otswriter.model; + +import com.alibaba.datax.plugin.writer.otswriter.OTSCriticalException; +import com.alicloud.openservices.tablestore.SyncClientInterface; +import org.slf4j.Logger; +import org.slf4j.LoggerFactory; + +import java.util.ArrayList; +import java.util.List; + +public class OTSSendBuffer { + + private OTSConf conf = null; + private OTSTaskManagerInterface manager = null; + + private int totalSize = 0; + private List buffer = new ArrayList(); + + + private static final Logger LOG = LoggerFactory.getLogger(OTSSendBuffer.class); + + public OTSSendBuffer( + SyncClientInterface ots, + OTSConf conf) { + this.conf = conf; + if (conf.isTimeseriesTable()){ + this.manager = new OTSTimeseriesRowTaskManager(ots, conf); + } + else { + this.manager = new OTSBatchWriteRowTaskManager(ots, conf); + } + + } + + public void write(OTSLine line) throws OTSCriticalException { + LOG.debug("write begin"); + // 检查是否满足发送条件 + if (buffer.size() >= conf.getBatchWriteCount() || + ((totalSize + line.getDataSize()) > conf.getRequestTotalSizeLimitation() && totalSize > 0) + ) { + try { + manager.execute(new ArrayList(buffer)); + } catch (Exception e) { + LOG.error("OTSBatchWriteRowTaskManager execute fail : {}", e.getMessage(), e); + throw new OTSCriticalException(e); + } + buffer.clear(); + totalSize = 0; + } + buffer.add(line); + totalSize += line.getDataSize(); + LOG.debug("write end"); + } + + public void flush() throws OTSCriticalException { + LOG.debug("flush begin"); + if (!buffer.isEmpty()) { + try { + manager.execute(new ArrayList(buffer)); + } catch (Exception e) { + LOG.error("OTSBatchWriteRowTaskManager flush fail : {}", e.getMessage(), e); + throw new OTSCriticalException(e); + } + } + LOG.debug("flush end"); + } + + public void close() throws OTSCriticalException { + LOG.debug("close begin"); + try { + flush(); + } finally { + try { + manager.close(); + } catch (Exception e) { + LOG.error("OTSBatchWriteRowTaskManager close fail : {}", e.getMessage(), e); + throw new OTSCriticalException(e); + } + } + LOG.debug("close end"); + } +} diff --git a/otswriter/src/main/java/com/alibaba/datax/plugin/writer/otswriter/model/OTSTaskManagerInterface.java b/otswriter/src/main/java/com/alibaba/datax/plugin/writer/otswriter/model/OTSTaskManagerInterface.java new file mode 100644 index 00000000..5db85d7d --- /dev/null +++ b/otswriter/src/main/java/com/alibaba/datax/plugin/writer/otswriter/model/OTSTaskManagerInterface.java @@ -0,0 +1,9 @@ +package com.alibaba.datax.plugin.writer.otswriter.model; + +import java.util.List; + +public interface OTSTaskManagerInterface { + public void execute(List lines) throws Exception; + + public void close() throws Exception; +} diff --git a/otswriter/src/main/java/com/alibaba/datax/plugin/writer/otswriter/model/OTSTimeseriesRowTask.java b/otswriter/src/main/java/com/alibaba/datax/plugin/writer/otswriter/model/OTSTimeseriesRowTask.java new file mode 100644 index 00000000..7cda8e33 --- /dev/null +++ b/otswriter/src/main/java/com/alibaba/datax/plugin/writer/otswriter/model/OTSTimeseriesRowTask.java @@ -0,0 +1,167 @@ +package com.alibaba.datax.plugin.writer.otswriter.model; + +import com.alibaba.datax.plugin.writer.otswriter.OTSCriticalException; +import com.alibaba.datax.plugin.writer.otswriter.OTSErrorCode; +import 
com.alibaba.datax.plugin.writer.otswriter.callable.PutTimeseriesDataCallable; +import com.alibaba.datax.plugin.writer.otswriter.utils.CollectorUtil; +import com.alibaba.datax.plugin.writer.otswriter.utils.Common; +import com.alibaba.datax.plugin.writer.otswriter.utils.LineAndError; +import com.alibaba.datax.plugin.writer.otswriter.utils.RetryHelper; +import com.alicloud.openservices.tablestore.TableStoreException; +import com.alicloud.openservices.tablestore.TimeseriesClient; +import com.alicloud.openservices.tablestore.model.PutRowRequest; +import com.alicloud.openservices.tablestore.model.timeseries.PutTimeseriesDataRequest; +import com.alicloud.openservices.tablestore.model.timeseries.PutTimeseriesDataResponse; +import org.slf4j.Logger; +import org.slf4j.LoggerFactory; + +import java.util.ArrayList; +import java.util.List; + +public class OTSTimeseriesRowTask implements Runnable { + private static final Logger LOG = LoggerFactory.getLogger(OTSTimeseriesRowTask.class); + private TimeseriesClient client = null; + private OTSConf conf = null; + private List otsLines = new ArrayList(); + private boolean isDone = false; + private int retryTimes = 0; + + public OTSTimeseriesRowTask( + final TimeseriesClient client, + final OTSConf conf, + final List lines + ) { + this.client = client; + this.conf = conf; + + this.otsLines.addAll(lines); + } + + @Override + public void run() { + LOG.debug("Begin run"); + sendAll(otsLines); + LOG.debug("End run"); + } + + public boolean isDone() { + return this.isDone; + } + + private boolean isExceptionForSendOneByOne(TableStoreException ee) { + if (ee.getErrorCode().equals(OTSErrorCode.INVALID_PARAMETER) || + ee.getErrorCode().equals(OTSErrorCode.REQUEST_TOO_LARGE) + ) { + return true; + } + return false; + } + + private PutTimeseriesDataRequest createRequest(List lines) { + PutTimeseriesDataRequest newRequest = new PutTimeseriesDataRequest(conf.getTableName()); + for (OTSLine l : lines) { + newRequest.addRow(l.getTimeseriesRow()); + } + return newRequest; + } + + /** + * 单行发送数据 + * + * @param line + */ + public void sendLine(OTSLine line) { + try { + PutTimeseriesDataRequest putTimeseriesDataRequest = new PutTimeseriesDataRequest(conf.getTableName()); + putTimeseriesDataRequest.addRow(line.getTimeseriesRow()); + PutTimeseriesDataResponse result = RetryHelper.executeWithRetry( + new PutTimeseriesDataCallable(client, putTimeseriesDataRequest), + conf.getRetry(), + conf.getSleepInMillisecond()); + + + if (!result.isAllSuccess()){ + String errMsg = result.getFailedRows().get(0).getError().getMessage(); + LOG.warn("sendLine fail. " + errMsg); + CollectorUtil.collect(line.getRecords(), errMsg); + + }else { + LOG.debug("Request ID : {}", result.getRequestId()); + } + + } catch (Exception e) { + LOG.warn("sendLine fail. 
", e); + CollectorUtil.collect(line.getRecords(), e.getMessage()); + } + } + + private void sendAllOneByOne(List lines) { + for (OTSLine l : lines) { + sendLine(l); + } + } + + private void sendAll(List lines) { + try { + Thread.sleep(Common.getDelaySendMillinSeconds(retryTimes, conf.getSleepInMillisecond())); + PutTimeseriesDataRequest putTimeseriesDataRequest = createRequest(lines); + PutTimeseriesDataResponse result = RetryHelper.executeWithRetry( + new PutTimeseriesDataCallable(client, putTimeseriesDataRequest), + conf.getRetry(), + conf.getSleepInMillisecond()); + + LOG.debug("Request ID : {}", result.getRequestId()); + List errors = getLineAndError(result, lines); + if (!errors.isEmpty()) { + if (retryTimes < conf.getRetry()) { + retryTimes++; + LOG.warn("Retry times : {}", retryTimes); + List newLines = new ArrayList(); + for (LineAndError re : errors) { + LOG.warn("Because: {}", re.getError().getMessage()); + if (RetryHelper.canRetry(re.getError().getCode())) { + newLines.add(re.getLine()); + } else { + LOG.warn("Can not retry, record row to collector. {}", re.getError().getMessage()); + CollectorUtil.collect(re.getLine().getRecords(), re.getError().getMessage()); + } + } + if (!newLines.isEmpty()) { + sendAll(newLines); + } + } else { + LOG.warn("Retry times more than limitation. RetryTime : {}", retryTimes); + CollectorUtil.collect(errors); + } + } + } catch (TableStoreException e) { + LOG.warn("Send data fail. {}", e.getMessage()); + if (isExceptionForSendOneByOne(e)) { + if (lines.size() == 1) { + LOG.warn("Can not retry.", e); + CollectorUtil.collect(e.getMessage(), lines); + } else { + // 进入单行发送的分支 + sendAllOneByOne(lines); + } + } else { + LOG.error("Can not send lines to OTS for RuntimeException.", e); + CollectorUtil.collect(e.getMessage(), lines); + } + } catch (Exception e) { + LOG.error("Can not send lines to OTS for Exception.", e); + CollectorUtil.collect(e.getMessage(), lines); + } + } + + private List getLineAndError(PutTimeseriesDataResponse result, List lines) throws OTSCriticalException { + List errors = new ArrayList(); + + List status = result.getFailedRows(); + for (PutTimeseriesDataResponse.FailedRowResult r : status) { + errors.add(new LineAndError(lines.get(r.getIndex()), r.getError())); + } + + return errors; + } +} diff --git a/otswriter/src/main/java/com/alibaba/datax/plugin/writer/otswriter/model/OTSTimeseriesRowTaskManager.java b/otswriter/src/main/java/com/alibaba/datax/plugin/writer/otswriter/model/OTSTimeseriesRowTaskManager.java new file mode 100644 index 00000000..2816d955 --- /dev/null +++ b/otswriter/src/main/java/com/alibaba/datax/plugin/writer/otswriter/model/OTSTimeseriesRowTaskManager.java @@ -0,0 +1,41 @@ +package com.alibaba.datax.plugin.writer.otswriter.model; + +import com.alicloud.openservices.tablestore.SyncClient; +import com.alicloud.openservices.tablestore.SyncClientInterface; +import com.alicloud.openservices.tablestore.TimeseriesClient; +import org.slf4j.Logger; +import org.slf4j.LoggerFactory; + +import java.util.List; + +public class OTSTimeseriesRowTaskManager implements OTSTaskManagerInterface{ + + private TimeseriesClient client = null; + private OTSBlockingExecutor executorService = null; + private OTSConf conf = null; + + private static final Logger LOG = LoggerFactory.getLogger(OTSTimeseriesRowTaskManager.class); + + public OTSTimeseriesRowTaskManager( + SyncClientInterface ots, + OTSConf conf) { + this.client = ((SyncClient)ots).asTimeseriesClient(); + this.conf = conf; + + executorService = new 
OTSBlockingExecutor(conf.getConcurrencyWrite()); + } + + @Override + public void execute(List lines) throws Exception { + LOG.debug("Begin execute."); + executorService.execute(new OTSTimeseriesRowTask(client, conf, lines)); + LOG.debug("End execute."); + } + + @Override + public void close() throws Exception { + LOG.debug("Begin close."); + executorService.shutdown(); + LOG.debug("End close."); + } +} diff --git a/otswriter/src/main/java/com/alibaba/datax/plugin/writer/otswriter/model/RowDeleteChangeWithRecord.java b/otswriter/src/main/java/com/alibaba/datax/plugin/writer/otswriter/model/RowDeleteChangeWithRecord.java index 5d77ad87..1986100a 100644 --- a/otswriter/src/main/java/com/alibaba/datax/plugin/writer/otswriter/model/RowDeleteChangeWithRecord.java +++ b/otswriter/src/main/java/com/alibaba/datax/plugin/writer/otswriter/model/RowDeleteChangeWithRecord.java @@ -1,6 +1,7 @@ package com.alibaba.datax.plugin.writer.otswriter.model; import com.alibaba.datax.common.element.Record; +import com.alibaba.datax.plugin.writer.otswriter.utils.WithRecord; public class RowDeleteChangeWithRecord extends com.aliyun.openservices.ots.model.RowDeleteChange implements WithRecord { diff --git a/otswriter/src/main/java/com/alibaba/datax/plugin/writer/otswriter/model/RowPutChangeWithRecord.java b/otswriter/src/main/java/com/alibaba/datax/plugin/writer/otswriter/model/RowPutChangeWithRecord.java index e97a7d63..2e19dd77 100644 --- a/otswriter/src/main/java/com/alibaba/datax/plugin/writer/otswriter/model/RowPutChangeWithRecord.java +++ b/otswriter/src/main/java/com/alibaba/datax/plugin/writer/otswriter/model/RowPutChangeWithRecord.java @@ -1,6 +1,7 @@ package com.alibaba.datax.plugin.writer.otswriter.model; import com.alibaba.datax.common.element.Record; +import com.alibaba.datax.plugin.writer.otswriter.utils.WithRecord; public class RowPutChangeWithRecord extends com.aliyun.openservices.ots.model.RowPutChange implements WithRecord { diff --git a/otswriter/src/main/java/com/alibaba/datax/plugin/writer/otswriter/model/RowUpdateChangeWithRecord.java b/otswriter/src/main/java/com/alibaba/datax/plugin/writer/otswriter/model/RowUpdateChangeWithRecord.java index f47ca1d2..63f27d65 100644 --- a/otswriter/src/main/java/com/alibaba/datax/plugin/writer/otswriter/model/RowUpdateChangeWithRecord.java +++ b/otswriter/src/main/java/com/alibaba/datax/plugin/writer/otswriter/model/RowUpdateChangeWithRecord.java @@ -1,6 +1,7 @@ package com.alibaba.datax.plugin.writer.otswriter.model; import com.alibaba.datax.common.element.Record; +import com.alibaba.datax.plugin.writer.otswriter.utils.WithRecord; public class RowUpdateChangeWithRecord extends com.aliyun.openservices.ots.model.RowUpdateChange implements WithRecord { diff --git a/otswriter/src/main/java/com/alibaba/datax/plugin/writer/otswriter/utils/CalculateHelper.java b/otswriter/src/main/java/com/alibaba/datax/plugin/writer/otswriter/utils/CalculateHelper.java new file mode 100644 index 00000000..f0d8347d --- /dev/null +++ b/otswriter/src/main/java/com/alibaba/datax/plugin/writer/otswriter/utils/CalculateHelper.java @@ -0,0 +1,171 @@ +package com.alibaba.datax.plugin.writer.otswriter.utils; + +import com.alibaba.datax.plugin.writer.otswriter.OTSCriticalException; +import com.alicloud.openservices.tablestore.core.utils.Pair; +import com.alicloud.openservices.tablestore.model.*; +import com.alicloud.openservices.tablestore.model.timeseries.TimeseriesKey; +import com.alicloud.openservices.tablestore.model.timeseries.TimeseriesRow; + +import java.util.List; +import 
java.util.Map; + +import static com.alicloud.openservices.tablestore.model.PrimaryKeyValue.AUTO_INCREMENT; + +public class CalculateHelper { + private static int getPrimaryKeyValueSize(PrimaryKeyValue primaryKeyValue) throws OTSCriticalException { + int primaryKeySize = 0; + if(primaryKeyValue == AUTO_INCREMENT){ + return primaryKeySize; + } + switch (primaryKeyValue.getType()) { + case INTEGER: + primaryKeySize = 8; + break; + case STRING: + primaryKeySize = primaryKeyValue.asStringInBytes().length; + break; + case BINARY: + primaryKeySize = primaryKeyValue.asBinary().length; + break; + default: + throw new OTSCriticalException("Bug: not support the type : " + primaryKeyValue.getType() + " in getPrimaryKeyValueSize"); + } + return primaryKeySize; + } + + private static int getColumnValueSize(ColumnValue columnValue) throws OTSCriticalException { + int columnSize = 0; + switch (columnValue.getType()) { + case INTEGER: + columnSize += 8; + break; + case DOUBLE: + columnSize += 8; + break; + case STRING: + columnSize += columnValue.asStringInBytes().length; + break; + case BINARY: + columnSize += columnValue.asBinary().length; + break; + case BOOLEAN: + columnSize += 1; + break; + default: + throw new OTSCriticalException("Bug: not support the type : " + columnValue.getType() + " in getColumnValueSize"); + } + return columnSize; + } + + public static int getRowPutChangeSize(RowPutChange change) throws OTSCriticalException { + int primaryKeyTotalSize = 0; + int columnTotalSize = 0; + + // PrimaryKeys Total Size + PrimaryKey primaryKey = change.getPrimaryKey(); + PrimaryKeyColumn[] primaryKeyColumnArray = primaryKey.getPrimaryKeyColumns(); + PrimaryKeyColumn primaryKeyColumn; + byte[] primaryKeyName; + PrimaryKeyValue primaryKeyValue; + for (int i = 0; i < primaryKeyColumnArray.length; i++) { + primaryKeyColumn = primaryKeyColumnArray[i]; + primaryKeyName = primaryKeyColumn.getNameRawData(); + primaryKeyValue = primaryKeyColumn.getValue(); + + // += PrimaryKey Name Data + primaryKeyTotalSize += primaryKeyName.length; + + // += PrimaryKey Value Data + primaryKeyTotalSize += getPrimaryKeyValueSize(primaryKeyValue); + } + + // Columns Total Size + List columnList = change.getColumnsToPut(); + for (Column column : columnList) { + // += Column Name + columnTotalSize += column.getNameRawData().length; + + // += Column Value + ColumnValue columnValue = column.getValue(); + + columnTotalSize += getColumnValueSize(columnValue); + + // += Timestamp + if (column.hasSetTimestamp()) { + columnTotalSize += 8; + } + } + + return primaryKeyTotalSize + columnTotalSize; + } + + public static int getRowUpdateChangeSize(RowUpdateChange change) throws OTSCriticalException { + int primaryKeyTotalSize = 0; + int columnPutSize = 0; + int columnDeleteSize = 0; + + // PrimaryKeys Total Size + PrimaryKey primaryKey = change.getPrimaryKey(); + PrimaryKeyColumn[] primaryKeyColumnArray = primaryKey.getPrimaryKeyColumns(); + PrimaryKeyColumn primaryKeyColumn; + byte[] primaryKeyName; + PrimaryKeyValue primaryKeyValue; + for (int i = 0; i < primaryKeyColumnArray.length; i++) { + primaryKeyColumn = primaryKeyColumnArray[i]; + primaryKeyName = primaryKeyColumn.getNameRawData(); + primaryKeyValue = primaryKeyColumn.getValue(); + + // += PrimaryKey Name Data + primaryKeyTotalSize += primaryKeyName.length; + + // += PrimaryKey Value Data + primaryKeyTotalSize += getPrimaryKeyValueSize(primaryKeyValue); + } + + // Column Total Size + List> updatePairList = change.getColumnsToUpdate(); + Column column; + ColumnValue columnValue; + 
RowUpdateChange.Type type; + for (Pair updatePair : updatePairList) { + column = updatePair.getFirst(); + type = updatePair.getSecond(); + + switch (type) { + case DELETE: + columnDeleteSize += column.getNameRawData().length; + columnDeleteSize += 8;// Timestamp + break; + case DELETE_ALL: + columnDeleteSize += column.getNameRawData().length; + break; + case PUT: + // Name + columnPutSize += column.getNameRawData().length; + + // Value + columnValue = column.getValue(); + columnPutSize += getColumnValueSize(columnValue); + break; + default: + throw new OTSCriticalException("Bug: not support the type : " + type); + } + } + + return primaryKeyTotalSize + columnPutSize + columnDeleteSize; + } + + public static int getTimeseriesRowDataSize(TimeseriesRow row) { + TimeseriesKey timeseriesKey = row.getTimeseriesKey(); + Map fields = row.getFields(); + int totalSize = 0; + totalSize += 8; // time size + totalSize += com.alicloud.openservices.tablestore.core.utils.CalculateHelper.calcStringSizeInBytes(timeseriesKey.getMeasurementName()); + totalSize += com.alicloud.openservices.tablestore.core.utils.CalculateHelper.calcStringSizeInBytes(timeseriesKey.getDataSource()); + totalSize += com.alicloud.openservices.tablestore.core.utils.CalculateHelper.calcStringSizeInBytes(timeseriesKey.buildTagsString()); + for (Map.Entry entry : fields.entrySet()) { + totalSize += entry.getValue().getDataSize() + com.alicloud.openservices.tablestore.core.utils.CalculateHelper.calcStringSizeInBytes(entry.getKey()); + } + return totalSize; + } +} diff --git a/otswriter/src/main/java/com/alibaba/datax/plugin/writer/otswriter/utils/CollectorUtil.java b/otswriter/src/main/java/com/alibaba/datax/plugin/writer/otswriter/utils/CollectorUtil.java new file mode 100644 index 00000000..432ac37f --- /dev/null +++ b/otswriter/src/main/java/com/alibaba/datax/plugin/writer/otswriter/utils/CollectorUtil.java @@ -0,0 +1,40 @@ +package com.alibaba.datax.plugin.writer.otswriter.utils; + +import com.alibaba.datax.common.element.Record; +import com.alibaba.datax.common.plugin.TaskPluginCollector; +import com.alibaba.datax.plugin.writer.otswriter.model.OTSLine; + +import java.util.List; + +public class CollectorUtil { + + private static TaskPluginCollector taskPluginCollector = null; + + public static void init(TaskPluginCollector collector) { + taskPluginCollector = collector; + } + + public static void collect(Record dirtyRecord, String errorMessage) { + if (taskPluginCollector != null) { + taskPluginCollector.collectDirtyRecord(dirtyRecord, errorMessage); + } + } + + public static void collect(List dirtyRecords, String errorMessage) { + for (Record r:dirtyRecords) { + collect(r, errorMessage); + } + } + + public static void collect(List errors) { + for (LineAndError e:errors) { + collect(e.getLine().getRecords(), e.getError().getMessage()); + } + } + + public static void collect(String errorMessage, List lines) { + for (OTSLine l:lines) { + collect(l.getRecords(), errorMessage); + } + } +} diff --git a/otswriter/src/main/java/com/alibaba/datax/plugin/writer/otswriter/utils/ColumnConversion.java b/otswriter/src/main/java/com/alibaba/datax/plugin/writer/otswriter/utils/ColumnConversion.java index 51162b84..5f7c91a5 100644 --- a/otswriter/src/main/java/com/alibaba/datax/plugin/writer/otswriter/utils/ColumnConversion.java +++ b/otswriter/src/main/java/com/alibaba/datax/plugin/writer/otswriter/utils/ColumnConversion.java @@ -2,11 +2,12 @@ package com.alibaba.datax.plugin.writer.otswriter.utils; import com.alibaba.datax.common.element.Column; 
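The per-row byte counts produced by CalculateHelper above are what OTSSendBuffer compares against batchWriteCount and requestTotalSizeLimitation before flushing. Restated outside the plugin as a sketch (the method and parameter names are illustrative and mirror the config getters):

```java
public class FlushRuleSketch {
    // Flush before adding the next line when the batch is already full by row count,
    // or when adding the next line would exceed the request size limit and the
    // buffer is non-empty.
    static boolean shouldFlush(int bufferedLines, int bufferedBytes, int nextLineBytes,
                               int batchWriteCount, int requestTotalSizeLimitation) {
        return bufferedLines >= batchWriteCount
                || (bufferedBytes + nextLineBytes > requestTotalSizeLimitation && bufferedBytes > 0);
    }
}
```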
import com.alibaba.datax.common.exception.DataXException; +import com.alibaba.datax.plugin.writer.otswriter.OTSCriticalException; import com.alibaba.datax.plugin.writer.otswriter.model.OTSAttrColumn; import com.alibaba.datax.plugin.writer.otswriter.model.OTSErrorMessage; -import com.alibaba.datax.plugin.writer.otswriter.model.OTSPKColumn; -import com.aliyun.openservices.ots.model.ColumnValue; -import com.aliyun.openservices.ots.model.PrimaryKeyValue; +import com.alicloud.openservices.tablestore.model.ColumnValue; +import com.alicloud.openservices.tablestore.model.PrimaryKeySchema; +import com.alicloud.openservices.tablestore.model.PrimaryKeyValue; /** @@ -17,45 +18,66 @@ import com.aliyun.openservices.ots.model.PrimaryKeyValue; * 4. long -> binary */ public class ColumnConversion { - public static PrimaryKeyValue columnToPrimaryKeyValue(Column c, OTSPKColumn col) { + public static PrimaryKeyValue columnToPrimaryKeyValue(Column c, PrimaryKeySchema col) throws OTSCriticalException { try { switch (col.getType()) { - case STRING: - return PrimaryKeyValue.fromString(c.asString()); - case INTEGER: - return PrimaryKeyValue.fromLong(c.asLong()); - default: - throw new IllegalArgumentException(String.format(OTSErrorMessage.UNSUPPORT_PARSE, col.getType(), "PrimaryKeyValue")); + case STRING: + return PrimaryKeyValue.fromString(c.asString()); + case INTEGER: + return PrimaryKeyValue.fromLong(c.asLong()); + case BINARY: + return PrimaryKeyValue.fromBinary(c.asBytes()); + default: + throw new OTSCriticalException(String.format(OTSErrorMessage.UNSUPPORT_PARSE, col.getType(), "PrimaryKeyValue")); } } catch (DataXException e) { throw new IllegalArgumentException(String.format( OTSErrorMessage.COLUMN_CONVERSION_ERROR, c.getType(), c.asString(), col.getType().toString() - )); + ), + e); } } - public static ColumnValue columnToColumnValue(Column c, OTSAttrColumn col) { - try { - switch (col.getType()) { + public static ColumnValue columnToColumnValue(Column c) throws OTSCriticalException { + switch (c.getType()) { case STRING: return ColumnValue.fromString(c.asString()); - case INTEGER: + case LONG: return ColumnValue.fromLong(c.asLong()); - case BOOLEAN: + case BOOL: return ColumnValue.fromBoolean(c.asBoolean()); case DOUBLE: return ColumnValue.fromDouble(c.asDouble()); - case BINARY: + case BYTES: return ColumnValue.fromBinary(c.asBytes()); default: - throw new IllegalArgumentException(String.format(OTSErrorMessage.UNSUPPORT_PARSE, col.getType(), "ColumnValue")); + throw new OTSCriticalException(String.format(OTSErrorMessage.UNSUPPORT_PARSE, c.getType(), "ColumnValue")); + } + } + + public static ColumnValue columnToColumnValue(Column c, OTSAttrColumn col) throws OTSCriticalException { + try { + switch (col.getType()) { + case STRING: + return ColumnValue.fromString(c.asString()); + case INTEGER: + return ColumnValue.fromLong(c.asLong()); + case BOOLEAN: + return ColumnValue.fromBoolean(c.asBoolean()); + case DOUBLE: + return ColumnValue.fromDouble(c.asDouble()); + case BINARY: + return ColumnValue.fromBinary(c.asBytes()); + default: + throw new OTSCriticalException(String.format(OTSErrorMessage.UNSUPPORT_PARSE, col.getType(), "ColumnValue")); } } catch (DataXException e) { throw new IllegalArgumentException(String.format( OTSErrorMessage.COLUMN_CONVERSION_ERROR, c.getType(), c.asString(), col.getType().toString() - )); + ), + e); } } } diff --git a/otswriter/src/main/java/com/alibaba/datax/plugin/writer/otswriter/utils/ColumnConversionOld.java 
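A minimal sketch of how the converters above are meant to be called (the sample value, the "id" schema and the class name are invented for illustration): the primary-key conversion is driven by the table schema's type, while the single-argument attribute overload follows the DataX column type.

```java
import com.alibaba.datax.common.element.Column;
import com.alibaba.datax.common.element.LongColumn;
import com.alibaba.datax.plugin.writer.otswriter.OTSCriticalException;
import com.alibaba.datax.plugin.writer.otswriter.utils.ColumnConversion;
import com.alicloud.openservices.tablestore.model.*;

public class ConversionSketch {
    static ColumnValue demo() throws OTSCriticalException {
        Column src = new LongColumn(42L);
        // Primary key conversion follows the table schema (here: INTEGER pk "id").
        PrimaryKeySchema idSchema = new PrimaryKeySchema("id", PrimaryKeyType.INTEGER);
        PrimaryKeyValue pkValue = ColumnConversion.columnToPrimaryKeyValue(src, idSchema);
        // Attribute conversion without an explicit target type follows the DataX column type.
        return ColumnConversion.columnToColumnValue(src);
    }
}
```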
b/otswriter/src/main/java/com/alibaba/datax/plugin/writer/otswriter/utils/ColumnConversionOld.java new file mode 100644 index 00000000..a2920b91 --- /dev/null +++ b/otswriter/src/main/java/com/alibaba/datax/plugin/writer/otswriter/utils/ColumnConversionOld.java @@ -0,0 +1,61 @@ +package com.alibaba.datax.plugin.writer.otswriter.utils; + +import com.alibaba.datax.common.element.Column; +import com.alibaba.datax.common.exception.DataXException; +import com.alibaba.datax.plugin.writer.otswriter.model.OTSAttrColumn; +import com.alibaba.datax.plugin.writer.otswriter.model.OTSErrorMessage; +import com.alicloud.openservices.tablestore.model.PrimaryKeySchema; +import com.aliyun.openservices.ots.model.ColumnValue; +import com.aliyun.openservices.ots.model.PrimaryKeyValue; + + +/** + * 备注:datax提供的转换机制有如下限制,如下规则是不能转换的 + * 1. bool -> binary + * 2. binary -> long, double, bool + * 3. double -> bool, binary + * 4. long -> binary + */ +public class ColumnConversionOld { + public static PrimaryKeyValue columnToPrimaryKeyValue(Column c, PrimaryKeySchema col) { + try { + switch (col.getType()) { + case STRING: + return PrimaryKeyValue.fromString(c.asString()); + case INTEGER: + return PrimaryKeyValue.fromLong(c.asLong()); + default: + throw new IllegalArgumentException(String.format(OTSErrorMessage.UNSUPPORT_PARSE, col.getType(), "PrimaryKeyValue")); + } + } catch (DataXException e) { + throw new IllegalArgumentException(String.format( + OTSErrorMessage.COLUMN_CONVERSION_ERROR, + c.getType(), c.asString(), col.getType().toString() + )); + } + } + + public static ColumnValue columnToColumnValue(Column c, OTSAttrColumn col) { + try { + switch (col.getType()) { + case STRING: + return ColumnValue.fromString(c.asString()); + case INTEGER: + return ColumnValue.fromLong(c.asLong()); + case BOOLEAN: + return ColumnValue.fromBoolean(c.asBoolean()); + case DOUBLE: + return ColumnValue.fromDouble(c.asDouble()); + case BINARY: + return ColumnValue.fromBinary(c.asBytes()); + default: + throw new IllegalArgumentException(String.format(OTSErrorMessage.UNSUPPORT_PARSE, col.getType(), "ColumnValue")); + } + } catch (DataXException e) { + throw new IllegalArgumentException(String.format( + OTSErrorMessage.COLUMN_CONVERSION_ERROR, + c.getType(), c.asString(), col.getType().toString() + )); + } + } +} diff --git a/otswriter/src/main/java/com/alibaba/datax/plugin/writer/otswriter/utils/Common.java b/otswriter/src/main/java/com/alibaba/datax/plugin/writer/otswriter/utils/Common.java index 26eb9329..a48efa69 100644 --- a/otswriter/src/main/java/com/alibaba/datax/plugin/writer/otswriter/utils/Common.java +++ b/otswriter/src/main/java/com/alibaba/datax/plugin/writer/otswriter/utils/Common.java @@ -1,108 +1,124 @@ package com.alibaba.datax.plugin.writer.otswriter.utils; -import java.util.ArrayList; -import java.util.List; - import com.alibaba.datax.common.element.Column; import com.alibaba.datax.common.element.Record; -import com.alibaba.datax.common.plugin.TaskPluginCollector; -import com.alibaba.datax.plugin.writer.otswriter.model.*; -import com.aliyun.openservices.ots.ClientException; -import com.aliyun.openservices.ots.OTSException; -import com.aliyun.openservices.ots.model.ColumnValue; -import com.aliyun.openservices.ots.model.PrimaryKeyValue; -import com.aliyun.openservices.ots.model.RowChange; -import com.aliyun.openservices.ots.model.RowPrimaryKey; -import com.aliyun.openservices.ots.model.RowPutChange; -import com.aliyun.openservices.ots.model.RowUpdateChange; -import org.apache.commons.math3.util.Pair; +import 
com.alibaba.datax.plugin.writer.otswriter.OTSCriticalException; +import com.alibaba.datax.plugin.writer.otswriter.model.OTSAttrColumn; +import com.alibaba.datax.plugin.writer.otswriter.model.OTSConf; +import com.alibaba.datax.plugin.writer.otswriter.model.OTSErrorMessage; +import com.alicloud.openservices.tablestore.ClientConfiguration; +import com.alicloud.openservices.tablestore.SyncClient; +import com.alicloud.openservices.tablestore.core.utils.Pair; +import com.alicloud.openservices.tablestore.model.*; +import org.slf4j.Logger; +import org.slf4j.LoggerFactory; + +import java.util.*; +import java.util.Map.Entry; public class Common { + + private static final Logger LOG = LoggerFactory.getLogger(Common.class); - public static String getDetailMessage(Exception exception) { - if (exception instanceof OTSException) { - OTSException e = (OTSException) exception; - return "OTSException[ErrorCode:" + e.getErrorCode() + ", ErrorMessage:" + e.getMessage() + ", RequestId:" + e.getRequestId() + "]"; - } else if (exception instanceof ClientException) { - ClientException e = (ClientException) exception; - return "ClientException[ErrorCode:" + e.getErrorCode() + ", ErrorMessage:" + e.getMessage() + "]"; - } else if (exception instanceof IllegalArgumentException) { - IllegalArgumentException e = (IllegalArgumentException) exception; - return "IllegalArgumentException[ErrorMessage:" + e.getMessage() + "]"; - } else { - return "Exception[ErrorMessage:" + exception.getMessage() + "]"; + /** + * 从record中分析出PK,如果分析成功,则返回PK,如果分析失败,则返回null,并记录数据到脏数据回收器中 + * @param pkColumns + * @param r + * @return + * @throws OTSCriticalException + */ + public static PrimaryKey getPKFromRecord(Map pkColumns, Record r) throws OTSCriticalException { + if (r.getColumnNumber() < pkColumns.size()) { + throw new OTSCriticalException(String.format("Bug branch, the count(%d) of record < count(%d) of (pk) from config.", r.getColumnNumber(), pkColumns.size())); } - } - - public static RowPrimaryKey getPKFromRecord(List pkColumns, Record r) { - RowPrimaryKey primaryKey = new RowPrimaryKey(); - int pkCount = pkColumns.size(); - for (int i = 0; i < pkCount; i++) { - Column col = r.getColumn(i); - OTSPKColumn expect = pkColumns.get(i); - - if (col.getRawData() == null) { - throw new IllegalArgumentException(String.format(OTSErrorMessage.PK_COLUMN_VALUE_IS_NULL_ERROR, expect.getName())); - } - - PrimaryKeyValue pk = ColumnConversion.columnToPrimaryKeyValue(col, expect); - primaryKey.addPrimaryKeyColumn(expect.getName(), pk); - } - return primaryKey; - } - - public static List> getAttrFromRecord(int pkCount, List attrColumns, Record r) { - List> attr = new ArrayList>(r.getColumnNumber()); - for (int i = 0; i < attrColumns.size(); i++) { - Column col = r.getColumn(i + pkCount); - OTSAttrColumn expect = attrColumns.get(i); - - if (col.getRawData() == null) { - attr.add(new Pair(expect.getName(), null)); - continue; - } - - ColumnValue cv = ColumnConversion.columnToColumnValue(col, expect); - attr.add(new Pair(expect.getName(), cv)); - } - return attr; - } - - public static RowChange columnValuesToRowChange(String tableName, OTSOpType type, RowPrimaryKey pk, List> values) { - switch (type) { - case PUT_ROW: - RowPutChangeWithRecord rowPutChange = new RowPutChangeWithRecord(tableName); - rowPutChange.setPrimaryKey(pk); - - for (Pair en : values) { - if (en.getValue() != null) { - rowPutChange.addAttributeColumn(en.getKey(), en.getValue()); - } + try { + PrimaryKeyBuilder builder = PrimaryKeyBuilder.createPrimaryKeyBuilder(); + for (Entry en : 
pkColumns.entrySet()) { + Column col = r.getColumn(en.getValue()); + PrimaryKeySchema expect = en.getKey(); + + if (col.getRawData() == null) { + throw new IllegalArgumentException(String.format(OTSErrorMessage.PK_COLUMN_VALUE_IS_NULL_ERROR, expect.getName())); } - return rowPutChange; - case UPDATE_ROW: - RowUpdateChangeWithRecord rowUpdateChange = new RowUpdateChangeWithRecord(tableName); - rowUpdateChange.setPrimaryKey(pk); - - for (Pair en : values) { - if (en.getValue() != null) { - rowUpdateChange.addAttributeColumn(en.getKey(), en.getValue()); - } else { - rowUpdateChange.deleteAttributeColumn(en.getKey()); - } - } - return rowUpdateChange; - case DELETE_ROW: - RowDeleteChangeWithRecord rowDeleteChange = new RowDeleteChangeWithRecord(tableName); - rowDeleteChange.setPrimaryKey(pk); - return rowDeleteChange; - default: - throw new IllegalArgumentException(String.format(OTSErrorMessage.UNSUPPORT_PARSE, type, "RowChange")); + PrimaryKeyValue pk = ColumnConversion.columnToPrimaryKeyValue(col, expect); + builder.addPrimaryKeyColumn(new PrimaryKeyColumn(expect.getName(), pk)); + } + return builder.build(); + } catch (IllegalArgumentException e) { + LOG.warn("getPKFromRecord fail : {}", e.getMessage(), e); + CollectorUtil.collect(r, e.getMessage()); + return null; } } - public static long getDelaySendMilliseconds(int hadRetryTimes, int initSleepInMilliSecond) { + public static PrimaryKey getPKFromRecordWithAutoIncrement(Map pkColumns, Record r, PrimaryKeySchema autoIncrementPrimaryKey) throws OTSCriticalException { + if (r.getColumnNumber() < pkColumns.size()) { + throw new OTSCriticalException(String.format("Bug branch, the count(%d) of record < count(%d) of (pk) from config.", r.getColumnNumber(), pkColumns.size())); + } + try { + PrimaryKeyBuilder builder = PrimaryKeyBuilder.createPrimaryKeyBuilder(); + for (Entry en : pkColumns.entrySet()) { + Column col = r.getColumn(en.getValue()); + PrimaryKeySchema expect = en.getKey(); + + if (col.getRawData() == null) { + throw new IllegalArgumentException(String.format(OTSErrorMessage.PK_COLUMN_VALUE_IS_NULL_ERROR, expect.getName())); + } + + PrimaryKeyValue pk = ColumnConversion.columnToPrimaryKeyValue(col, expect); + builder.addPrimaryKeyColumn(new PrimaryKeyColumn(expect.getName(), pk)); + } + if(autoIncrementPrimaryKey != null){ + if(autoIncrementPrimaryKey.getOption()!= PrimaryKeyOption.AUTO_INCREMENT){ + throw new OTSCriticalException(String.format("The auto Increment PrimaryKey [(%s)] option should be PrimaryKeyOption.AUTO_INCREMENT.", autoIncrementPrimaryKey.getName())); + } + builder.addPrimaryKeyColumn(autoIncrementPrimaryKey.getName(),PrimaryKeyValue.AUTO_INCREMENT); + + } + return builder.build(); + } catch (IllegalArgumentException e) { + LOG.warn("getPKFromRecord fail : {}", e.getMessage(), e); + CollectorUtil.collect(r, e.getMessage()); + return null; + } + } + + /** + * 从Record中解析ColumnValue,如果Record转换为ColumnValue失败,方法会返回null + * @param pkCount + * @param attrColumns + * @param r + * @return + * @throws OTSCriticalException + */ + public static List> getAttrFromRecord(int pkCount, List attrColumns, Record r) throws OTSCriticalException { + if (pkCount + attrColumns.size() != r.getColumnNumber()) { + throw new OTSCriticalException(String.format("Bug branch, the count(%d) of record != count(%d) of (pk + column) from config.", r.getColumnNumber(), (pkCount + attrColumns.size()))); + } + try { + List> attr = new ArrayList>(r.getColumnNumber()); + for (int i = 0; i < attrColumns.size(); i++) { + Column col = r.getColumn(i + pkCount); 
+ OTSAttrColumn expect = attrColumns.get(i); + + if (col.getRawData() == null) { + attr.add(new Pair(expect.getName(), null)); + continue; + } + + ColumnValue cv = ColumnConversion.columnToColumnValue(col, expect); + attr.add(new Pair(expect.getName(), cv)); + } + return attr; + } catch (IllegalArgumentException e) { + LOG.warn("getAttrFromRecord fail : {}", e.getMessage(), e); + CollectorUtil.collect(r, e.getMessage()); + return null; + } + } + + public static long getDelaySendMillinSeconds(int hadRetryTimes, int initSleepInMilliSecond) { if (hadRetryTimes <= 0) { return 0; @@ -118,4 +134,83 @@ public class Common { } return sleepTime; } + + public static SyncClient getOTSInstance(OTSConf conf) { + ClientConfiguration clientConfigure = new ClientConfiguration(); + clientConfigure.setIoThreadCount(conf.getIoThreadCount()); + clientConfigure.setMaxConnections(conf.getConcurrencyWrite()); + clientConfigure.setSocketTimeoutInMillisecond(conf.getSocketTimeout()); + clientConfigure.setConnectionTimeoutInMillisecond(conf.getConnectTimeoutInMillisecond()); + clientConfigure.setRetryStrategy(new DefaultNoRetry()); + + SyncClient ots = new SyncClient( + conf.getEndpoint(), + conf.getAccessId(), + conf.getAccessKey(), + conf.getInstanceName(), + clientConfigure); + Map extraHeaders = new HashMap(); + extraHeaders.put("x-ots-sdk-type", "public"); + extraHeaders.put("x-ots-request-source", "datax-otswriter"); + ots.setExtraHeaders(extraHeaders); + return ots; + } + + public static LinkedHashMap getEncodePkColumnMapping(TableMeta meta, List attrColumns) throws OTSCriticalException { + LinkedHashMap attrColumnMapping = new LinkedHashMap(); + for (Entry en : meta.getPrimaryKeyMap().entrySet()) { + // don't care performance + int i = 0; + for (; i < attrColumns.size(); i++) { + if (attrColumns.get(i).getName().equals(en.getKey())) { + attrColumnMapping.put(GsonParser.primaryKeySchemaToJson(attrColumns.get(i)), i); + break; + } + } + if (i == attrColumns.size()) { + // exception branch + throw new OTSCriticalException(String.format(OTSErrorMessage.INPUT_PK_NAME_NOT_EXIST_IN_META_ERROR, en.getKey())); + } + } + return attrColumnMapping; + } + + public static LinkedHashMap getEncodePkColumnMappingWithAutoIncrement(TableMeta meta, List attrColumns) throws OTSCriticalException { + LinkedHashMap attrColumnMapping = new LinkedHashMap(); + for (Entry en : meta.getPrimaryKeySchemaMap().entrySet()) { + // don't care performance + if(en.getValue().hasOption()){ + continue; + } + + int i = 0; + for (; i < attrColumns.size(); i++) { + if (attrColumns.get(i).getName().equals(en.getKey())) { + attrColumnMapping.put(GsonParser.primaryKeySchemaToJson(attrColumns.get(i)), i); + break; + } + } + if (i == attrColumns.size()) { + // exception branch + throw new OTSCriticalException(String.format(OTSErrorMessage.INPUT_PK_NAME_NOT_EXIST_IN_META_ERROR, en.getKey())); + } + } + return attrColumnMapping; + } + + public static Map getPkColumnMapping(Map mapping) { + Map target = new LinkedHashMap(); + for (Entry en : mapping.entrySet()) { + target.put(GsonParser.jsonToPrimaryKeySchema(en.getKey()), en.getValue()); + } + return target; + } + + public static Map getAttrColumnMapping(List attrColumns) { + Map attrColumnMapping = new LinkedHashMap(); + for (OTSAttrColumn c : attrColumns) { + attrColumnMapping.put(c.getSrcName(), c); + } + return attrColumnMapping; + } } diff --git a/otswriter/src/main/java/com/alibaba/datax/plugin/writer/otswriter/utils/CommonOld.java 
b/otswriter/src/main/java/com/alibaba/datax/plugin/writer/otswriter/utils/CommonOld.java new file mode 100644 index 00000000..a62711cc --- /dev/null +++ b/otswriter/src/main/java/com/alibaba/datax/plugin/writer/otswriter/utils/CommonOld.java @@ -0,0 +1,93 @@ +package com.alibaba.datax.plugin.writer.otswriter.utils; + +import com.alibaba.datax.common.element.Column; +import com.alibaba.datax.common.element.Record; +import com.alibaba.datax.plugin.writer.otswriter.model.OTSErrorMessage; +import com.alibaba.datax.plugin.writer.otswriter.model.RowDeleteChangeWithRecord; +import com.alibaba.datax.plugin.writer.otswriter.model.RowPutChangeWithRecord; +import com.alibaba.datax.plugin.writer.otswriter.model.RowUpdateChangeWithRecord; +import com.alicloud.openservices.tablestore.model.PrimaryKeySchema; +import com.aliyun.openservices.ots.ClientException; +import com.aliyun.openservices.ots.OTSException; +import com.aliyun.openservices.ots.model.ColumnValue; +import com.aliyun.openservices.ots.model.PrimaryKeyValue; +import com.aliyun.openservices.ots.model.RowChange; +import com.aliyun.openservices.ots.model.RowPrimaryKey; +import org.apache.commons.math3.util.Pair; + +import java.util.ArrayList; +import java.util.List; + +public class CommonOld { + + public static RowPrimaryKey getPKFromRecord(List pkColumns, Record r) { + RowPrimaryKey primaryKey = new RowPrimaryKey(); + int pkCount = pkColumns.size(); + for (int i = 0; i < pkCount; i++) { + Column col = r.getColumn(i); + PrimaryKeySchema expect = pkColumns.get(i); + + if (col.getRawData() == null) { + throw new IllegalArgumentException(String.format(OTSErrorMessage.PK_COLUMN_VALUE_IS_NULL_ERROR, expect.getName())); + } + + PrimaryKeyValue pk = ColumnConversionOld.columnToPrimaryKeyValue(col, expect); + primaryKey.addPrimaryKeyColumn(expect.getName(), pk); + } + return primaryKey; + } + + public static List> getAttrFromRecord(int pkCount, List attrColumns, Record r) { + List> attr = new ArrayList>(r.getColumnNumber()); + for (int i = 0; i < attrColumns.size(); i++) { + Column col = r.getColumn(i + pkCount); + com.alibaba.datax.plugin.writer.otswriter.model.OTSAttrColumn expect = attrColumns.get(i); + + if (col.getRawData() == null) { + attr.add(new Pair(expect.getName(), null)); + continue; + } + + ColumnValue cv = ColumnConversionOld.columnToColumnValue(col, expect); + attr.add(new Pair(expect.getName(), cv)); + } + return attr; + } + + public static RowChange columnValuesToRowChange(String tableName, + com.alibaba.datax.plugin.writer.otswriter.model.OTSOpType type, + RowPrimaryKey pk, + List> values) { + switch (type) { + case PUT_ROW: + RowPutChangeWithRecord rowPutChange = new RowPutChangeWithRecord(tableName); + rowPutChange.setPrimaryKey(pk); + + for (Pair en : values) { + if (en.getValue() != null) { + rowPutChange.addAttributeColumn(en.getKey(), en.getValue()); + } + } + + return rowPutChange; + case UPDATE_ROW: + RowUpdateChangeWithRecord rowUpdateChange = new RowUpdateChangeWithRecord(tableName); + rowUpdateChange.setPrimaryKey(pk); + + for (Pair en : values) { + if (en.getValue() != null) { + rowUpdateChange.addAttributeColumn(en.getKey(), en.getValue()); + } else { + rowUpdateChange.deleteAttributeColumn(en.getKey()); + } + } + return rowUpdateChange; + case DELETE_ROW: + RowDeleteChangeWithRecord rowDeleteChange = new RowDeleteChangeWithRecord(tableName); + rowDeleteChange.setPrimaryKey(pk); + return rowDeleteChange; + default: + throw new IllegalArgumentException(String.format(OTSErrorMessage.UNSUPPORT_PARSE, type, "RowChange")); + 
} + } +} diff --git a/otswriter/src/main/java/com/alibaba/datax/plugin/writer/otswriter/utils/DefaultNoRetry.java b/otswriter/src/main/java/com/alibaba/datax/plugin/writer/otswriter/utils/DefaultNoRetry.java new file mode 100644 index 00000000..ec000566 --- /dev/null +++ b/otswriter/src/main/java/com/alibaba/datax/plugin/writer/otswriter/utils/DefaultNoRetry.java @@ -0,0 +1,33 @@ +package com.alibaba.datax.plugin.writer.otswriter.utils; + + + +import com.alicloud.openservices.tablestore.model.DefaultRetryStrategy; +import com.alicloud.openservices.tablestore.model.RetryStrategy; + +public class DefaultNoRetry extends DefaultRetryStrategy { + + public DefaultNoRetry() { + super(); + } + + @Override + public RetryStrategy clone() { + return super.clone(); + } + + @Override + public int getRetries() { + return super.getRetries(); + } + + @Override + public boolean shouldRetry(String action, Exception ex) { + return false; + } + + @Override + public long nextPause(String action, Exception ex) { + return super.nextPause(action, ex); + } +} \ No newline at end of file diff --git a/otswriter/src/main/java/com/alibaba/datax/plugin/writer/otswriter/utils/GsonParser.java b/otswriter/src/main/java/com/alibaba/datax/plugin/writer/otswriter/utils/GsonParser.java index 0cae91f2..4e13a327 100644 --- a/otswriter/src/main/java/com/alibaba/datax/plugin/writer/otswriter/utils/GsonParser.java +++ b/otswriter/src/main/java/com/alibaba/datax/plugin/writer/otswriter/utils/GsonParser.java @@ -1,9 +1,10 @@ package com.alibaba.datax.plugin.writer.otswriter.utils; import com.alibaba.datax.plugin.writer.otswriter.model.OTSConf; -import com.aliyun.openservices.ots.model.Direction; -import com.aliyun.openservices.ots.model.RowPrimaryKey; -import com.aliyun.openservices.ots.model.TableMeta; +import com.alicloud.openservices.tablestore.model.Direction; +import com.alicloud.openservices.tablestore.model.PrimaryKey; +import com.alicloud.openservices.tablestore.model.PrimaryKeySchema; +import com.alicloud.openservices.tablestore.model.TableMeta; import com.google.gson.Gson; import com.google.gson.GsonBuilder; @@ -39,8 +40,18 @@ public class GsonParser { return g.toJson(meta); } - public static String rowPrimaryKeyToJson (RowPrimaryKey row) { + public static String primaryKeyToJson (PrimaryKey row) { Gson g = gsonBuilder(); return g.toJson(row); } + + public static String primaryKeySchemaToJson (PrimaryKeySchema schema) { + Gson g = gsonBuilder(); + return g.toJson(schema); + } + + public static PrimaryKeySchema jsonToPrimaryKeySchema (String jsonStr) { + Gson g = gsonBuilder(); + return g.fromJson(jsonStr, PrimaryKeySchema.class); + } } diff --git a/otswriter/src/main/java/com/alibaba/datax/plugin/writer/otswriter/utils/LineAndError.java b/otswriter/src/main/java/com/alibaba/datax/plugin/writer/otswriter/utils/LineAndError.java new file mode 100644 index 00000000..f4e8833e --- /dev/null +++ b/otswriter/src/main/java/com/alibaba/datax/plugin/writer/otswriter/utils/LineAndError.java @@ -0,0 +1,21 @@ +package com.alibaba.datax.plugin.writer.otswriter.utils; + +import com.alibaba.datax.plugin.writer.otswriter.model.OTSLine; + +public class LineAndError { + private OTSLine line; + private com.alicloud.openservices.tablestore.model.Error error; + + public LineAndError(OTSLine record, com.alicloud.openservices.tablestore.model.Error error) { + this.line = record; + this.error = error; + } + + public OTSLine getLine() { + return line; + } + + public com.alicloud.openservices.tablestore.model.Error getError() { + return error; + } 
+} diff --git a/otswriter/src/main/java/com/alibaba/datax/plugin/writer/otswriter/utils/ParamChecker.java b/otswriter/src/main/java/com/alibaba/datax/plugin/writer/otswriter/utils/ParamChecker.java index f9e17af5..b04f8878 100644 --- a/otswriter/src/main/java/com/alibaba/datax/plugin/writer/otswriter/utils/ParamChecker.java +++ b/otswriter/src/main/java/com/alibaba/datax/plugin/writer/otswriter/utils/ParamChecker.java @@ -1,18 +1,24 @@ package com.alibaba.datax.plugin.writer.otswriter.utils; -import java.util.HashMap; +import com.alibaba.datax.common.exception.CommonErrorCode; +import com.alibaba.datax.common.exception.DataXException; +import com.alibaba.datax.common.util.Configuration; +import com.alibaba.datax.plugin.writer.otswriter.model.OTSAttrColumn; +import com.alibaba.datax.plugin.writer.otswriter.model.OTSConf; +import com.alibaba.datax.plugin.writer.otswriter.model.OTSErrorMessage; +import com.alibaba.datax.plugin.writer.otswriter.model.OTSMode; +import com.alicloud.openservices.tablestore.model.PrimaryKeySchema; +import com.alicloud.openservices.tablestore.model.PrimaryKeyType; +import com.alicloud.openservices.tablestore.model.TableMeta; + import java.util.HashSet; import java.util.List; import java.util.Map; -import java.util.Map.Entry; import java.util.Set; +import java.util.concurrent.TimeUnit; + +import static com.alibaba.datax.plugin.writer.otswriter.model.OTSErrorMessage.*; -import com.alibaba.datax.common.util.Configuration; -import com.alibaba.datax.plugin.writer.otswriter.model.OTSAttrColumn; -import com.alibaba.datax.plugin.writer.otswriter.model.OTSErrorMessage; -import com.alibaba.datax.plugin.writer.otswriter.model.OTSPKColumn; -import com.aliyun.openservices.ots.model.PrimaryKeyType; -import com.aliyun.openservices.ots.model.TableMeta; public class ParamChecker { @@ -28,16 +34,13 @@ public class ParamChecker { throw new IllegalArgumentException(String.format(OTSErrorMessage.PARAMETER_LIST_IS_EMPTY_ERROR, key)); } - private static void throwNotListException(String key) { - throw new IllegalArgumentException(String.format(OTSErrorMessage.PARAMETER_IS_NOT_ARRAY_ERROR, key)); - } - - private static void throwNotMapException(String key) { - throw new IllegalArgumentException(String.format(OTSErrorMessage.PARAMETER_IS_NOT_MAP_ERROR, key)); + private static void throwNotListException(String key, Throwable t) { + throw new IllegalArgumentException(String.format(OTSErrorMessage.PARAMETER_IS_NOT_ARRAY_ERROR, key), t); } public static String checkStringAndGet(Configuration param, String key) { String value = param.getString(key); + value = value != null ? 
value.trim() : null; if (null == value) { throwNotExistException(key); } else if (value.length() == 0) { @@ -51,61 +54,7 @@ public class ParamChecker { try { value = param.getList(key); } catch (ClassCastException e) { - throwNotListException(key); - } - if (null == value) { - throwNotExistException(key); - } else if (isCheckEmpty && value.isEmpty()) { - throwEmptyListException(key); - } - return value; - } - - public static List checkListAndGet(Map range, String key) { - Object obj = range.get(key); - if (null == obj) { - return null; - } - return checkListAndGet(range, key, false); - } - - public static List checkListAndGet(Map range, String key, boolean isCheckEmpty) { - Object obj = range.get(key); - if (null == obj) { - throwNotExistException(key); - } - if (obj instanceof List) { - @SuppressWarnings("unchecked") - List value = (List)obj; - if (isCheckEmpty && value.isEmpty()) { - throwEmptyListException(key); - } - return value; - } else { - throw new IllegalArgumentException(String.format(OTSErrorMessage.PARSE_TO_LIST_ERROR, key)); - } - } - - public static List checkListAndGet(Map range, String key, List defaultList) { - Object obj = range.get(key); - if (null == obj) { - return defaultList; - } - if (obj instanceof List) { - @SuppressWarnings("unchecked") - List value = (List)obj; - return value; - } else { - throw new IllegalArgumentException(String.format(OTSErrorMessage.PARSE_TO_LIST_ERROR, key)); - } - } - - public static Map checkMapAndGet(Configuration param, String key, boolean isCheckEmpty) { - Map value = null; - try { - value = param.getMap(key); - } catch (ClassCastException e) { - throwNotMapException(key); + throwNotListException(key, e); } if (null == value) { throwNotExistException(key); @@ -115,26 +64,75 @@ public class ParamChecker { return value; } - public static void checkPrimaryKey(TableMeta meta, List pk) { - Map types = meta.getPrimaryKey(); + public static void checkPrimaryKey(TableMeta meta, List pk) { + Map pkNameAndTypeMapping = meta.getPrimaryKeyMap(); // 个数是否相等 - if (types.size() != pk.size()) { - throw new IllegalArgumentException(String.format(OTSErrorMessage.INPUT_PK_COUNT_NOT_EQUAL_META_ERROR, pk.size(), types.size())); + if (pkNameAndTypeMapping.size() != pk.size()) { + throw new IllegalArgumentException(String.format(OTSErrorMessage.INPUT_PK_COUNT_NOT_EQUAL_META_ERROR, pk.size(), pkNameAndTypeMapping.size())); } // 名字类型是否相等 - Map inputTypes = new HashMap(); - for (OTSPKColumn col : pk) { - inputTypes.put(col.getName(), col.getType()); - } - - for (Entry e : types.entrySet()) { - if (!inputTypes.containsKey(e.getKey())) { - throw new IllegalArgumentException(String.format(OTSErrorMessage.PK_COLUMN_MISSING_ERROR, e.getKey())); + for (PrimaryKeySchema col : pk) { + PrimaryKeyType type = pkNameAndTypeMapping.get(col.getName()); + if (type == null) { + throw new IllegalArgumentException(String.format(OTSErrorMessage.PK_COLUMN_MISSING_ERROR, col.getName())); } - PrimaryKeyType type = inputTypes.get(e.getKey()); - if (type != e.getValue()) { - throw new IllegalArgumentException(String.format(OTSErrorMessage.INPUT_PK_TYPE_NOT_MATCH_META_ERROR, e.getKey(), type, e.getValue())); + if (type != col.getType()) { + throw new IllegalArgumentException(String.format(OTSErrorMessage.INPUT_PK_TYPE_NOT_MATCH_META_ERROR, col.getName(), type, col.getType())); + } + } + } + + public static void checkVersion(OTSConf conf) { + /** + * conf检查遵循以下规则 + * 1. 旧版本插件 不支持 主键自增列 + * 2. 旧版本插件 不支持 多版本模式 + * 3. 多版本模式 不支持 主键自增列 + * 4. 旧版本插件 不支持 时序数据表 + * 5. 
时序数据表 不支持 主键自增列 + */ + if (!conf.isNewVersion() && conf.getEnableAutoIncrement()) { + throw new IllegalArgumentException(PUBLIC_SDK_NO_SUPPORT_AUTO_INCREMENT); + } + if (!conf.isNewVersion() && conf.getMode() == OTSMode.MULTI_VERSION) { + throw new IllegalArgumentException(PUBLIC_SDK_NO_SUPPORT_MULTI_VERSION); + } + if (conf.getMode() == OTSMode.MULTI_VERSION && conf.getEnableAutoIncrement()) { + throw new IllegalArgumentException(NOT_SUPPORT_MULTI_VERSION_AUTO_INCREMENT); + } + if (!conf.isNewVersion() && conf.isTimeseriesTable()) { + throw new IllegalArgumentException(PUBLIC_SDK_NO_SUPPORT_TIMESERIES_TABLE); + } + if (conf.isTimeseriesTable() && conf.getEnableAutoIncrement()) { + throw new IllegalArgumentException(NOT_SUPPORT_TIMESERIES_TABLE_AUTO_INCREMENT); + } + } + + public static void checkPrimaryKeyWithAutoIncrement(TableMeta meta, List pk) { + Map pkNameAndTypeMapping = meta.getPrimaryKeyMap(); + int autoIncrementKeySize = 0; + for(PrimaryKeySchema p : meta.getPrimaryKeyList()){ + if(p.hasOption()){ + autoIncrementKeySize++; + } + } + // 个数是否相等 + if (pkNameAndTypeMapping.size() != pk.size() + autoIncrementKeySize) { + throw new IllegalArgumentException(String.format(OTSErrorMessage.INPUT_PK_COUNT_NOT_EQUAL_META_ERROR, pk.size() + autoIncrementKeySize, pkNameAndTypeMapping.size())); + } + + // 名字类型是否相等 + for (PrimaryKeySchema col : pk) { + if(col.hasOption()){ + continue; + } + PrimaryKeyType type = pkNameAndTypeMapping.get(col.getName()); + if (type == null) { + throw new IllegalArgumentException(String.format(OTSErrorMessage.PK_COLUMN_MISSING_ERROR, col.getName())); + } + if (type != col.getType()) { + throw new IllegalArgumentException(String.format(OTSErrorMessage.INPUT_PK_TYPE_NOT_MATCH_META_ERROR, col.getName(), type, col.getType())); } } } @@ -150,4 +148,23 @@ public class ParamChecker { } } } + + public static TimeUnit checkTimeUnitAndGet(String str) { + if (null == str) { + return null; + } else if ("NANOSECONDS".equalsIgnoreCase(str)) { + return TimeUnit.NANOSECONDS; + } else if ("MICROSECONDS".equalsIgnoreCase(str)) { + return TimeUnit.MICROSECONDS; + } else if ("MILLISECONDS".equalsIgnoreCase(str)) { + return TimeUnit.MILLISECONDS; + } else if ("SECONDS".equalsIgnoreCase(str)) { + return TimeUnit.SECONDS; + } else if ("MINUTES".equalsIgnoreCase(str)) { + return TimeUnit.MINUTES; + } else { + throw new IllegalArgumentException(String.format(OTSErrorMessage.TIMEUNIT_FORMAT_ERROR, str)); + } + } + } diff --git a/otswriter/src/main/java/com/alibaba/datax/plugin/writer/otswriter/utils/ParseRecord.java b/otswriter/src/main/java/com/alibaba/datax/plugin/writer/otswriter/utils/ParseRecord.java new file mode 100644 index 00000000..1f157131 --- /dev/null +++ b/otswriter/src/main/java/com/alibaba/datax/plugin/writer/otswriter/utils/ParseRecord.java @@ -0,0 +1,326 @@ +package com.alibaba.datax.plugin.writer.otswriter.utils; + +import com.alibaba.datax.common.element.Column; +import com.alibaba.datax.common.element.Record; +import com.alibaba.datax.common.exception.DataXException; +import com.alibaba.datax.plugin.writer.otswriter.OTSCriticalException; +import com.alibaba.datax.plugin.writer.otswriter.model.*; +import com.alicloud.openservices.tablestore.core.protocol.timeseries.TimeseriesResponseFactory; +import com.alicloud.openservices.tablestore.core.utils.Pair; +import com.alicloud.openservices.tablestore.model.*; +import com.alicloud.openservices.tablestore.model.timeseries.TimeseriesKey; +import com.alicloud.openservices.tablestore.model.timeseries.TimeseriesRow; +import 
org.slf4j.Logger; +import org.slf4j.LoggerFactory; + +import java.util.HashMap; +import java.util.List; +import java.util.Map; +import java.util.concurrent.TimeUnit; + + +public class ParseRecord { + + private static final Logger LOG = LoggerFactory.getLogger(ParseRecord.class); + + private static com.alicloud.openservices.tablestore.model.Column buildColumn(String name, ColumnValue value, long timestamp) { + if (timestamp > 0) { + return new com.alicloud.openservices.tablestore.model.Column( + name, + value, + timestamp + ); + } else { + return new com.alicloud.openservices.tablestore.model.Column( + name, + value + ); + } + } + /** + * 基于普通方式处理Record + * 当PK或者Attr解析失败时,方法会返回null + * @param tableName + * @param type + * @param pkColumns + * @param attrColumns + * @param record + * @param timestamp + * @return + * @throws OTSCriticalException + */ + public static OTSLine parseNormalRecordToOTSLine( + String tableName, + OTSOpType type, + Map pkColumns, + List attrColumns, + Record record, + long timestamp) throws OTSCriticalException { + + PrimaryKey pk = Common.getPKFromRecord(pkColumns, record); + if (pk == null) { + return null; + } + List> values = Common.getAttrFromRecord(pkColumns.size(), attrColumns, record); + if (values == null) { + return null; + } + + switch (type) { + case PUT_ROW: + RowPutChange rowPutChange = new RowPutChange(tableName, pk); + for (Pair en : values) { + if (en.getSecond() != null) { + rowPutChange.addColumn(buildColumn(en.getFirst(), en.getSecond(), timestamp)); + } + } + if (rowPutChange.getColumnsToPut().isEmpty()) { + return null; + } + return new OTSLine(pk, record, rowPutChange); + case UPDATE_ROW: + RowUpdateChange rowUpdateChange = new RowUpdateChange(tableName, pk); + for (Pair en : values) { + if (en.getSecond() != null) { + rowUpdateChange.put(buildColumn(en.getFirst(), en.getSecond(), timestamp)); + } else { + rowUpdateChange.deleteColumns(en.getFirst()); // 删除整列 + } + } + return new OTSLine(pk, record, rowUpdateChange); + default: + LOG.error("Bug branch, can not support : {}(OTSOpType)", type); + throw new OTSCriticalException(String.format(OTSErrorMessage.UNSUPPORT, type)); + } + } + + + public static OTSLine parseNormalRecordToOTSLineWithAutoIncrement( + String tableName, + OTSOpType type, + Map pkColumns, + List attrColumns, + Record record, + long timestamp, + PrimaryKeySchema autoIncrementPrimaryKey) throws OTSCriticalException { + + PrimaryKey pk = Common.getPKFromRecordWithAutoIncrement(pkColumns, record, autoIncrementPrimaryKey); + if (pk == null) { + return null; + } + List> values = Common.getAttrFromRecord(pkColumns.size(), attrColumns, record); + if (values == null) { + return null; + } + + switch (type) { + case PUT_ROW: + RowPutChange rowPutChange = new RowPutChange(tableName, pk); + for (Pair en : values) { + if (en.getSecond() != null) { + rowPutChange.addColumn(buildColumn(en.getFirst(), en.getSecond(), timestamp)); + } + } + if (rowPutChange.getColumnsToPut().isEmpty()) { + return null; + } + return new OTSLine(pk, record, rowPutChange); + case UPDATE_ROW: + RowUpdateChange rowUpdateChange = new RowUpdateChange(tableName, pk); + for (Pair en : values) { + if (en.getSecond() != null) { + rowUpdateChange.put(buildColumn(en.getFirst(), en.getSecond(), timestamp)); + } else { + rowUpdateChange.deleteColumns(en.getFirst()); // 删除整列 + } + } + return new OTSLine(pk, record, rowUpdateChange); + default: + LOG.error("Bug branch, can not support : {}(OTSOpType)", type); + throw new 
OTSCriticalException(String.format(OTSErrorMessage.UNSUPPORT, type)); + } + } + + public static OTSLine parseNormalRecordToOTSLineOfTimeseriesTable( + List attrColumns, + Record record, + TimeUnit timeUnit + ) throws OTSCriticalException { + + if (attrColumns.size() != record.getColumnNumber()){ + throw new OTSCriticalException(String.format("Bug branch, the count(%d) of record != count(%d) of column from config.", record.getColumnNumber(), (attrColumns.size()))); + } + + Map tags = new HashMap<>(); + String measurementName = null; + String dataSource = null; + Long timeInUs = null; + Map columnsValues = new HashMap<>(); + + try { + for (int i = 0; i < attrColumns.size(); i++) { + // 如果是tags内部字段 + if (attrColumns.get(i).getTag()){ + tags.put(attrColumns.get(i).getName(), record.getColumn(i).asString()); + } + else if (attrColumns.get(i).getName().equals(OTSConst.MEASUREMENT_NAME)){ + measurementName = record.getColumn(i).asString(); + } + else if (attrColumns.get(i).getName().equals(OTSConst.DATA_SOURCE)){ + dataSource = record.getColumn(i).asString(); + } + else if (attrColumns.get(i).getName().equals(OTSConst.TAGS)){ + String tagString = record.getColumn(i).asString(); + tags.putAll(TimeseriesResponseFactory.parseTagsOrAttrs(tagString)); + } + else if (attrColumns.get(i).getName().equals(OTSConst.TIME)){ + timeInUs = record.getColumn(i).asLong(); + } + else{ + switch (attrColumns.get(i).getType()){ + case INTEGER: + columnsValues.put(attrColumns.get(i).getName(), ColumnValue.fromLong(record.getColumn(i).asLong())); + break; + case BOOLEAN: + columnsValues.put(attrColumns.get(i).getName(), ColumnValue.fromBoolean(record.getColumn(i).asBoolean())); + break; + case DOUBLE: + columnsValues.put(attrColumns.get(i).getName(), ColumnValue.fromDouble(record.getColumn(i).asDouble())); + break; + case BINARY: + columnsValues.put(attrColumns.get(i).getName(), ColumnValue.fromBinary(record.getColumn(i).asBytes())); + break; + case STRING: + default: + columnsValues.put(attrColumns.get(i).getName(), ColumnValue.fromString(record.getColumn(i).asString())); + break; + } + } + } + // 度量名称与时间戳字段值不能为空,否则报错 + if (measurementName == null){ + throw new IllegalArgumentException("The value of the '_m_name' (measurement) field cannot be empty. Please check the input of writer"); + } + else if (timeInUs == null){ + throw new IllegalArgumentException("The value of the '_time' field cannot be empty. 
Please check the input of writer"); + } + } catch (IllegalArgumentException e) { + LOG.warn("getAttrFromRecord fail : {}", e.getMessage(), e); + CollectorUtil.collect(record, e.getMessage()); + return null; + } + TimeseriesKey key = new TimeseriesKey(measurementName, dataSource, tags); + TimeseriesRow row = new TimeseriesRow(key); + switch (timeUnit){ + case NANOSECONDS: + timeInUs = timeInUs / 1000; + break; + case MILLISECONDS: + timeInUs = timeInUs * 1000; + break; + case SECONDS: + timeInUs = timeInUs * 1000 * 1000; + break; + case MINUTES: + timeInUs = timeInUs * 1000 * 1000 * 60; + break; + case MICROSECONDS: + default: + break; + } + row.setTimeInUs(timeInUs); + + for (Map.Entry entry : columnsValues.entrySet()){ + row.addField(entry.getKey(), entry.getValue()); + } + + return new OTSLine(record, row); + } + + public static String getDefineCoumnName(String attrColumnNamePrefixFilter, int columnNameIndex, Record r) { + String columnName = r.getColumn(columnNameIndex).asString(); + if (attrColumnNamePrefixFilter != null) { + if (columnName.startsWith(attrColumnNamePrefixFilter) && columnName.length() > attrColumnNamePrefixFilter.length()) { + columnName = columnName.substring(attrColumnNamePrefixFilter.length()); + } else { + throw new IllegalArgumentException(String.format(OTSErrorMessage.COLUMN_NOT_DEFINE, columnName)); + } + } + return columnName; + } + + private static void appendCellToRowUpdateChange( + Map pkColumns, + String attrColumnNamePrefixFilter, + Record r, + RowUpdateChange updateChange + ) throws OTSCriticalException { + try { + String columnName = getDefineCoumnName(attrColumnNamePrefixFilter, pkColumns.size(), r); + Column timestamp = r.getColumn(pkColumns.size() + 1); + Column value = r.getColumn(pkColumns.size() + 2); + + if (timestamp.getRawData() == null) { + throw new IllegalArgumentException(OTSErrorMessage.MULTI_VERSION_TIMESTAMP_IS_EMPTY); + } + + if (value.getRawData() == null) { + updateChange.deleteColumn(columnName, timestamp.asLong()); + return; + } + + ColumnValue otsValue = ColumnConversion.columnToColumnValue(value); + + com.alicloud.openservices.tablestore.model.Column c = new com.alicloud.openservices.tablestore.model.Column( + columnName, + otsValue, + timestamp.asLong() + ); + updateChange.put(c); + return; + } catch (IllegalArgumentException e) { + LOG.warn("parseToColumn fail : {}", e.getMessage(), e); + CollectorUtil.collect(r, e.getMessage()); + return; + } catch (DataXException e) { + LOG.warn("parseToColumn fail : {}", e.getMessage(), e); + CollectorUtil.collect(r, e.getMessage()); + return; + } + } + + /** + * 基于特殊模式处理Record + * 当所有Record转换为Column失败时,方法会返回null + * @param tableName + * @param type + * @param pkColumns + * @param records + * @return + * @throws Exception + */ + public static OTSLine parseMultiVersionRecordToOTSLine( + String tableName, + OTSOpType type, + Map pkColumns, + String attrColumnNamePrefixFilter, + PrimaryKey pk, + List records) throws OTSCriticalException { + + switch(type) { + case UPDATE_ROW: + RowUpdateChange updateChange = new RowUpdateChange(tableName, pk); + for (Record r : records) { + appendCellToRowUpdateChange(pkColumns, attrColumnNamePrefixFilter, r, updateChange); + } + if (updateChange.getColumnsToUpdate().isEmpty()) { + return null; + } else { + return new OTSLine(pk, records, updateChange); + } + default: + LOG.error("Bug branch, can not support : {}(OTSOpType)", type); + throw new OTSCriticalException(String.format(OTSErrorMessage.UNSUPPORT, type)); + } + } +} diff --git 
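
The timeseries branch of ParseRecord above normalizes every incoming timestamp to microseconds before calling `row.setTimeInUs`. A minimal, dependency-free sketch of just that conversion (plain JDK only; the demo value is made up and mirrors the switch shown in the diff):

```java
import java.util.concurrent.TimeUnit;

public class TimeUnitDemo {
    // Converts a raw timestamp in the configured unit to microseconds,
    // following the same branches as the writer's switch above.
    static long toMicroseconds(long value, TimeUnit unit) {
        switch (unit) {
            case NANOSECONDS:  return value / 1000;             // ns  -> us
            case MILLISECONDS: return value * 1000;             // ms  -> us
            case SECONDS:      return value * 1000 * 1000;      // s   -> us
            case MINUTES:      return value * 1000 * 1000 * 60; // min -> us
            case MICROSECONDS:
            default:           return value;                    // already us
        }
    }

    public static void main(String[] args) {
        // 1_694_000_000 seconds -> 1_694_000_000_000_000 microseconds
        System.out.println(toMicroseconds(1_694_000_000L, TimeUnit.SECONDS));
    }
}
```
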
a/otswriter/src/main/java/com/alibaba/datax/plugin/writer/otswriter/utils/RetryHelper.java b/otswriter/src/main/java/com/alibaba/datax/plugin/writer/otswriter/utils/RetryHelper.java index a863b908..5f353777 100644 --- a/otswriter/src/main/java/com/alibaba/datax/plugin/writer/otswriter/utils/RetryHelper.java +++ b/otswriter/src/main/java/com/alibaba/datax/plugin/writer/otswriter/utils/RetryHelper.java @@ -1,34 +1,40 @@ package com.alibaba.datax.plugin.writer.otswriter.utils; +import com.alibaba.datax.plugin.writer.otswriter.OTSErrorCode; +import com.alicloud.openservices.tablestore.ClientException; +import com.alicloud.openservices.tablestore.TableStoreException; +import org.slf4j.Logger; +import org.slf4j.LoggerFactory; + import java.util.HashSet; import java.util.Set; import java.util.concurrent.Callable; -import org.slf4j.Logger; -import org.slf4j.LoggerFactory; - -import com.alibaba.datax.plugin.writer.otswriter.model.LogExceptionManager; -import com.aliyun.openservices.ots.ClientException; -import com.aliyun.openservices.ots.OTSErrorCode; -import com.aliyun.openservices.ots.OTSException; - public class RetryHelper { private static final Logger LOG = LoggerFactory.getLogger(RetryHelper.class); private static final Set noRetryErrorCode = prepareNoRetryErrorCode(); - public static LogExceptionManager logManager = new LogExceptionManager(); - + /** + * 对重试的封装,方法需要用户传入最大重试次数,最大的重试时间。 + * 如果方法执行失败,方法会进入重试,每次重试之前,方法会sleep一段时间(sleep机制请参见 + * Common.getDelaySendMillinSeconds方法),直到重试次数达到上限,系统会抛出异常。 + * @param callable + * @param maxRetryTimes + * @param sleepInMilliSecond + * @return + * @throws Exception + */ public static V executeWithRetry(Callable callable, int maxRetryTimes, int sleepInMilliSecond) throws Exception { int retryTimes = 0; while (true){ - Thread.sleep(Common.getDelaySendMilliseconds(retryTimes, sleepInMilliSecond)); + Thread.sleep(Common.getDelaySendMillinSeconds(retryTimes, sleepInMilliSecond)); try { return callable.call(); } catch (Exception e) { - logManager.addException(e); + LOG.warn("Call callable fail.", e); if (!canRetry(e)){ - LOG.error("Can not retry for Exception.", e); + LOG.error("Can not retry for Exception : {}", e.getMessage()); throw e; } else if (retryTimes >= maxRetryTimes) { LOG.error("Retry times more than limition. 
maxRetryTimes : {}", maxRetryTimes); @@ -41,7 +47,7 @@ public class RetryHelper { } private static Set prepareNoRetryErrorCode() { - Set pool = new HashSet(); + final Set pool = new HashSet(); pool.add(OTSErrorCode.AUTHORIZATION_FAILURE); pool.add(OTSErrorCode.INVALID_PARAMETER); pool.add(OTSErrorCode.REQUEST_TOO_LARGE); @@ -63,11 +69,21 @@ public class RetryHelper { } public static boolean canRetry(Exception exception) { - OTSException e = null; - if (exception instanceof OTSException) { - e = (OTSException) exception; + TableStoreException e = null; + if (exception instanceof TableStoreException) { + e = (TableStoreException) exception; + LOG.warn( + "OTSException:ErrorCode:{}, ErrorMsg:{}, RequestId:{}", + new Object[]{e.getErrorCode(), e.getMessage(), e.getRequestId()} + ); return canRetry(e.getErrorCode()); + } else if (exception instanceof ClientException) { + ClientException ce = (ClientException) exception; + LOG.warn( + "ClientException:ErrorMsg:{}", + ce.getMessage() + ); return true; } else { return false; diff --git a/otswriter/src/main/java/com/alibaba/datax/plugin/writer/otswriter/model/WithRecord.java b/otswriter/src/main/java/com/alibaba/datax/plugin/writer/otswriter/utils/WithRecord.java similarity index 71% rename from otswriter/src/main/java/com/alibaba/datax/plugin/writer/otswriter/model/WithRecord.java rename to otswriter/src/main/java/com/alibaba/datax/plugin/writer/otswriter/utils/WithRecord.java index 2e1672a7..9bb4d4e3 100644 --- a/otswriter/src/main/java/com/alibaba/datax/plugin/writer/otswriter/model/WithRecord.java +++ b/otswriter/src/main/java/com/alibaba/datax/plugin/writer/otswriter/utils/WithRecord.java @@ -1,4 +1,4 @@ -package com.alibaba.datax.plugin.writer.otswriter.model; +package com.alibaba.datax.plugin.writer.otswriter.utils; import com.alibaba.datax.common.element.Record; diff --git a/otswriter/src/main/java/com/alibaba/datax/plugin/writer/otswriter/utils/WriterModelParser.java b/otswriter/src/main/java/com/alibaba/datax/plugin/writer/otswriter/utils/WriterModelParser.java index c81587b6..76d6c843 100644 --- a/otswriter/src/main/java/com/alibaba/datax/plugin/writer/otswriter/utils/WriterModelParser.java +++ b/otswriter/src/main/java/com/alibaba/datax/plugin/writer/otswriter/utils/WriterModelParser.java @@ -1,18 +1,12 @@ package com.alibaba.datax.plugin.writer.otswriter.utils; -import java.util.ArrayList; -import java.util.HashSet; -import java.util.List; -import java.util.Map; -import java.util.Set; +import com.alibaba.datax.plugin.writer.otswriter.model.*; +import com.alicloud.openservices.tablestore.model.ColumnType; +import com.alicloud.openservices.tablestore.model.PrimaryKeySchema; +import com.alicloud.openservices.tablestore.model.PrimaryKeyType; +import com.alicloud.openservices.tablestore.model.TableMeta; -import com.alibaba.datax.plugin.writer.otswriter.model.OTSAttrColumn; -import com.alibaba.datax.plugin.writer.otswriter.model.OTSPKColumn; -import com.alibaba.datax.plugin.writer.otswriter.model.OTSConst; -import com.alibaba.datax.plugin.writer.otswriter.model.OTSErrorMessage; -import com.alibaba.datax.plugin.writer.otswriter.model.OTSOpType; -import com.aliyun.openservices.ots.model.ColumnType; -import com.aliyun.openservices.ots.model.PrimaryKeyType; +import java.util.*; /** * 解析配置中参数 @@ -26,39 +20,92 @@ public class WriterModelParser { return PrimaryKeyType.STRING; } else if (type.equalsIgnoreCase(OTSConst.TYPE_INTEGER)) { return PrimaryKeyType.INTEGER; + } else if (type.equalsIgnoreCase(OTSConst.TYPE_BINARY)) { + return 
PrimaryKeyType.BINARY; } else { throw new IllegalArgumentException(String.format(OTSErrorMessage.PK_TYPE_ERROR, type)); } } - public static OTSPKColumn parseOTSPKColumn(Map column) { - if (column.containsKey(OTSConst.NAME) && column.containsKey(OTSConst.TYPE) && column.size() == 2) { - Object type = column.get(OTSConst.TYPE); - Object name = column.get(OTSConst.NAME); - if (type instanceof String && name instanceof String) { - String typeStr = (String) type; - String nameStr = (String) name; - if (nameStr.isEmpty()) { - throw new IllegalArgumentException(OTSErrorMessage.PK_COLUMN_NAME_IS_EMPTY_ERROR); - } - return new OTSPKColumn(nameStr, parsePrimaryKeyType(typeStr)); - } else { - throw new IllegalArgumentException(OTSErrorMessage.PK_MAP_NAME_TYPE_ERROR); - } + private static Object columnGetObject(Map column, String key, String error) { + Object value = column.get(key); + + if (value == null) { + throw new IllegalArgumentException(error); + } + + return value; + } + + private static String checkString(Object value, String error) { + if (!(value instanceof String)) { + throw new IllegalArgumentException(error); + } + return (String)value; + } + + private static void checkStringEmpty(String value, String error) { + if (value.isEmpty()) { + throw new IllegalArgumentException(error); + } + } + + public static PrimaryKeySchema parseOTSPKColumn(Map column) { + String typeStr = checkString( + columnGetObject(column, OTSConst.TYPE, String.format(OTSErrorMessage.PK_MAP_FILED_MISSING_ERROR, OTSConst.TYPE)), + String.format(OTSErrorMessage.PK_MAP_KEY_TYPE_ERROR, OTSConst.TYPE) + ); + String nameStr = checkString( + columnGetObject(column, OTSConst.NAME, String.format(OTSErrorMessage.PK_MAP_FILED_MISSING_ERROR, OTSConst.NAME)), + String.format(OTSErrorMessage.PK_MAP_KEY_TYPE_ERROR, OTSConst.NAME) + ); + + checkStringEmpty(typeStr, OTSErrorMessage.PK_COLUMN_TYPE_IS_EMPTY_ERROR); + checkStringEmpty(nameStr, OTSErrorMessage.PK_COLUMN_NAME_IS_EMPTY_ERROR); + + if (column.size() == 2) { + return new PrimaryKeySchema(nameStr, parsePrimaryKeyType(typeStr)); } else { throw new IllegalArgumentException(OTSErrorMessage.PK_MAP_INCLUDE_NAME_TYPE_ERROR); } } - public static List parseOTSPKColumnList(List values) { - List pks = new ArrayList(); + public static List parseOTSPKColumnList(TableMeta meta, List values) { + + Map pkMapping = meta.getPrimaryKeyMap(); + + List pks = new ArrayList(); for (Object obj : values) { - if (obj instanceof Map) { + /** + * json 中primary key格式为: + * "primaryKey":[ + * "userid", + * "groupid" + *] + */ + if (obj instanceof String) { + String name = (String) obj; + PrimaryKeyType type = pkMapping.get(name); + if (null == type) { + throw new IllegalArgumentException(String.format(OTSErrorMessage.PK_IS_NOT_EXIST_AT_OTS_ERROR, name)); + } else { + pks.add(new PrimaryKeySchema(name, type)); + } + } + /** + * json 中primary key格式为: + * "primaryKey" : [ + * {"name":"pk1", "type":"string"}, + * {"name":"pk2", "type":"int"} + *], + */ + else if (obj instanceof Map) { @SuppressWarnings("unchecked") Map column = (Map) obj; pks.add(parseOTSPKColumn(column)); - } else { - throw new IllegalArgumentException(OTSErrorMessage.PK_ITEM_IS_NOT_MAP_ERROR); + } + else { + throw new IllegalArgumentException(OTSErrorMessage.PK_ITEM_IS_ILLEAGAL_ERROR); } } return pks; @@ -80,60 +127,154 @@ public class WriterModelParser { } } - public static OTSAttrColumn parseOTSAttrColumn(Map column) { - if (column.containsKey(OTSConst.NAME) && column.containsKey(OTSConst.TYPE) && column.size() == 2) { - Object type = 
column.get(OTSConst.TYPE); - Object name = column.get(OTSConst.NAME); - if (type instanceof String && name instanceof String) { - String typeStr = (String) type; - String nameStr = (String) name; - if (nameStr.isEmpty()) { - throw new IllegalArgumentException(OTSErrorMessage.ATTR_COLUMN_NAME_IS_EMPTY_ERROR); - } - return new OTSAttrColumn(nameStr, parseColumnType(typeStr)); + public static OTSAttrColumn parseOTSAttrColumn(Map column, OTSMode mode) { + String typeStr = checkString( + columnGetObject(column, OTSConst.TYPE, String.format(OTSErrorMessage.ATTR_MAP_FILED_MISSING_ERROR, OTSConst.TYPE)), + String.format(OTSErrorMessage.ATTR_MAP_KEY_TYPE_ERROR, OTSConst.TYPE) + ); + String nameStr = checkString( + columnGetObject(column, OTSConst.NAME, String.format(OTSErrorMessage.ATTR_MAP_FILED_MISSING_ERROR, OTSConst.NAME)), + String.format(OTSErrorMessage.ATTR_MAP_KEY_TYPE_ERROR, OTSConst.NAME) + ); + + checkStringEmpty(typeStr, OTSErrorMessage.ATTR_COLUMN_TYPE_IS_EMPTY_ERROR); + checkStringEmpty(nameStr, OTSErrorMessage.ATTR_COLUMN_NAME_IS_EMPTY_ERROR); + + if (mode == OTSMode.MULTI_VERSION) { + String srcNameStr = checkString( + columnGetObject(column, OTSConst.SRC_NAME, String.format(OTSErrorMessage.ATTR_MAP_FILED_MISSING_ERROR, OTSConst.SRC_NAME)), + String.format(OTSErrorMessage.ATTR_MAP_KEY_TYPE_ERROR, OTSConst.SRC_NAME) + ); + checkStringEmpty(srcNameStr, OTSErrorMessage.ATTR_COLUMN_SRC_NAME_IS_EMPTY_ERROR); + if (column.size() == 3) { + return new OTSAttrColumn(srcNameStr, nameStr, parseColumnType(typeStr)); } else { - throw new IllegalArgumentException(OTSErrorMessage.ATTR_MAP_NAME_TYPE_ERROR); + throw new IllegalArgumentException(OTSErrorMessage.ATTR_MAP_INCLUDE_SRCNAME_NAME_TYPE_ERROR); } } else { - throw new IllegalArgumentException(OTSErrorMessage.ATTR_MAP_INCLUDE_NAME_TYPE_ERROR); + if (column.size() == 2) { + return new OTSAttrColumn(nameStr, parseColumnType(typeStr)); + } else { + throw new IllegalArgumentException(OTSErrorMessage.ATTR_MAP_INCLUDE_NAME_TYPE_ERROR); + } } } - - private static void checkMultiAttrColumn(List attrs) { - Set pool = new HashSet(); - for (OTSAttrColumn col : attrs) { - if (pool.contains(col.getName())) { - throw new IllegalArgumentException(String.format(OTSErrorMessage.MULTI_ATTR_COLUMN_ERROR, col.getName())); + + public static List parseOTSTimeseriesRowAttrList(List values) { + List attrs = new ArrayList(); + // columns内部必须配置_m_name与_time字段,否则报错 + boolean getMeasurementField = false; + boolean getTimeField = false; + for (Object obj : values) { + if (obj instanceof Map) { + @SuppressWarnings("unchecked") + Map column = (Map) obj; + + + String nameStr = checkString( + columnGetObject(column, OTSConst.NAME, String.format(OTSErrorMessage.ATTR_MAP_FILED_MISSING_ERROR, OTSConst.NAME)), + String.format(OTSErrorMessage.ATTR_MAP_KEY_TYPE_ERROR, OTSConst.NAME) + ); + boolean isTag = column.get(OTSConst.IS_TAG) != null && Boolean.parseBoolean((String) column.get(OTSConst.IS_TAG)); + String typeStr = "String"; + if (column.get(OTSConst.TYPE) != null){ + typeStr = (String) column.get(OTSConst.TYPE); + } + + checkStringEmpty(nameStr, OTSErrorMessage.ATTR_COLUMN_NAME_IS_EMPTY_ERROR); + + if (nameStr.equals(OTSConst.MEASUREMENT_NAME)){ + getMeasurementField = true; + } else if (nameStr.equals(OTSConst.TIME)) { + getTimeField = true; + } + + attrs.add(new OTSAttrColumn(nameStr, parseColumnType(typeStr), isTag)); } else { - pool.add(col.getName()); + throw new IllegalArgumentException(OTSErrorMessage.ATTR_ITEM_IS_NOT_MAP_ERROR); + } + } + if (!getMeasurementField){ + 
throw new IllegalArgumentException(OTSErrorMessage.NO_FOUND_M_NAME_FIELD_ERROR); + } else if (!getTimeField) { + throw new IllegalArgumentException(OTSErrorMessage.NO_FOUND_TIME_FIELD_ERROR); + } + return attrs; + } + + private static void checkMultiAttrColumn(List pk, List attrs, OTSMode mode) { + // duplicate column name + { + Set pool = new HashSet(); + for (OTSAttrColumn col : attrs) { + if (pool.contains(col.getName())) { + throw new IllegalArgumentException(String.format(OTSErrorMessage.MULTI_ATTR_COLUMN_ERROR, col.getName())); + } else { + pool.add(col.getName()); + } + } + for (PrimaryKeySchema col : pk) { + if (pool.contains(col.getName())) { + throw new IllegalArgumentException(String.format(OTSErrorMessage.MULTI_PK_ATTR_COLUMN_ERROR, col.getName())); + } else { + pool.add(col.getName()); + } + } + } + // duplicate src column name + if (mode == OTSMode.MULTI_VERSION) { + Set pool = new HashSet(); + for (OTSAttrColumn col : attrs) { + if (pool.contains(col.getSrcName())) { + throw new IllegalArgumentException(String.format(OTSErrorMessage.MULTI_ATTR_SRC_COLUMN_ERROR, col.getSrcName())); + } else { + pool.add(col.getSrcName()); + } } } } - public static List parseOTSAttrColumnList(List values) { + public static List parseOTSAttrColumnList(List pk, List values, OTSMode mode) { List attrs = new ArrayList(); for (Object obj : values) { if (obj instanceof Map) { @SuppressWarnings("unchecked") Map column = (Map) obj; - attrs.add(parseOTSAttrColumn(column)); + attrs.add(parseOTSAttrColumn(column, mode)); } else { throw new IllegalArgumentException(OTSErrorMessage.ATTR_ITEM_IS_NOT_MAP_ERROR); } } - checkMultiAttrColumn(attrs); + checkMultiAttrColumn(pk, attrs, mode); return attrs; } - - public static OTSOpType parseOTSOpType(String value) { + + public static OTSOpType parseOTSOpType(String value, OTSMode mode) { + OTSOpType type = null; if (value.equalsIgnoreCase(OTSConst.OTS_OP_TYPE_PUT)) { - return OTSOpType.PUT_ROW; + type = OTSOpType.PUT_ROW; } else if (value.equalsIgnoreCase(OTSConst.OTS_OP_TYPE_UPDATE)) { - return OTSOpType.UPDATE_ROW; - } else if (value.equalsIgnoreCase(OTSConst.OTS_OP_TYPE_DELETE)) { - return OTSOpType.DELETE_ROW; - } else { + type = OTSOpType.UPDATE_ROW; + }else if (value.equalsIgnoreCase(OTSConst.OTS_OP_TYPE_DELETE)) { + type = OTSOpType.DELETE_ROW; + }else { throw new IllegalArgumentException(String.format(OTSErrorMessage.OPERATION_PARSE_ERROR, value)); } + + if (mode == OTSMode.MULTI_VERSION && type == OTSOpType.PUT_ROW) { + throw new IllegalArgumentException(String.format(OTSErrorMessage.MUTLI_MODE_OPERATION_PARSE_ERROR, value)); + } + return type; } + + public static OTSMode parseOTSMode(String value) { + if (value.equalsIgnoreCase(OTSConst.OTS_MODE_NORMAL)) { + return OTSMode.NORMAL; + } else if (value.equalsIgnoreCase(OTSConst.OTS_MODE_MULTI_VERSION)) { + return OTSMode.MULTI_VERSION; + } else { + throw new IllegalArgumentException(String.format(OTSErrorMessage.MODE_PARSE_ERROR, value)); + } + } + } diff --git a/otswriter/src/main/java/com/alibaba/datax/plugin/writer/otswriter/WriterRetryPolicy.java b/otswriter/src/main/java/com/alibaba/datax/plugin/writer/otswriter/utils/WriterRetryPolicy.java similarity index 92% rename from otswriter/src/main/java/com/alibaba/datax/plugin/writer/otswriter/WriterRetryPolicy.java rename to otswriter/src/main/java/com/alibaba/datax/plugin/writer/otswriter/utils/WriterRetryPolicy.java index 3aa61a68..18d12dde 100644 --- a/otswriter/src/main/java/com/alibaba/datax/plugin/writer/otswriter/WriterRetryPolicy.java +++ 
b/otswriter/src/main/java/com/alibaba/datax/plugin/writer/otswriter/utils/WriterRetryPolicy.java @@ -1,4 +1,4 @@ -package com.alibaba.datax.plugin.writer.otswriter; +package com.alibaba.datax.plugin.writer.otswriter.utils; import com.alibaba.datax.plugin.writer.otswriter.model.OTSConf; import com.aliyun.openservices.ots.internal.OTSRetryStrategy; diff --git a/otswriter/src/main/resources/plugin.json b/otswriter/src/main/resources/plugin.json index 315e96cc..5151b15d 100644 --- a/otswriter/src/main/resources/plugin.json +++ b/otswriter/src/main/resources/plugin.json @@ -3,4 +3,4 @@ "class": "com.alibaba.datax.plugin.writer.otswriter.OtsWriter", "description": "", "developer": "alibaba" -} \ No newline at end of file +} diff --git a/package.xml b/package.xml old mode 100755 new mode 100644 index 94c29a9a..c7e2004a --- a/package.xml +++ b/package.xml @@ -60,13 +60,6 @@ datax - - db2reader/target/datax/ - - **/*.* - - datax - postgresqlreader/target/datax/ @@ -103,13 +96,13 @@ datax - - otsstreamreader/target/datax/ - - **/*.* - - datax - + + otsstreamreader/target/datax/ + + **/*.* + + datax + txtfilereader/target/datax/ @@ -159,6 +152,13 @@ datax + + clickhousereader/target/datax/ + + **/*.* + + datax + hdfsreader/target/datax/ @@ -222,6 +222,41 @@ datax + + datahubreader/target/datax/ + + **/*.* + + datax + + + loghubreader/target/datax/ + + **/*.* + + datax + + + starrocksreader/target/datax/ + + **/*.* + + datax + + + dorisreader/target/datax/ + + **/*.* + + datax + + + sybasereader/target/datax/ + + **/*.* + + datax + @@ -245,6 +280,13 @@ datax + + starrockswriter/target/datax/ + + **/*.* + + datax + drdswriter/target/datax/ @@ -259,6 +301,13 @@ datax + + doriswriter/target/datax/ + + **/*.* + + datax + txtfilewriter/target/datax/ @@ -420,6 +469,13 @@ datax + + databendwriter/target/datax/ + + **/*.* + + datax + oscarwriter/target/datax/ @@ -455,5 +511,40 @@ datax + + datahubwriter/target/datax/ + + **/*.* + + datax + + + loghubwriter/target/datax/ + + **/*.* + + datax + + + selectdbwriter/target/datax/ + + **/*.* + + datax + + + neo4jwriter/target/datax/ + + **/*.* + + datax + + + sybasewriter/target/datax/ + + **/*.* + + datax + diff --git a/plugin-rdbms-util/src/main/java/com/alibaba/datax/plugin/rdbms/reader/util/ObVersion.java b/plugin-rdbms-util/src/main/java/com/alibaba/datax/plugin/rdbms/reader/util/ObVersion.java new file mode 100644 index 00000000..0eb34feb --- /dev/null +++ b/plugin-rdbms-util/src/main/java/com/alibaba/datax/plugin/rdbms/reader/util/ObVersion.java @@ -0,0 +1,88 @@ +package com.alibaba.datax.plugin.rdbms.reader.util; + +import org.slf4j.Logger; +import org.slf4j.LoggerFactory; + +/** + * @author johnrobbet + */ +public class ObVersion implements Comparable { + + private static final Logger LOG = LoggerFactory.getLogger(ObVersion.class); + + private int majorVersion; + private int minorVersion; + private int releaseNumber; + private int patchNumber; + + public static final ObVersion V2276 = valueOf("2.2.76"); + public static final ObVersion V4000 = valueOf("4.0.0.0"); + + private static final ObVersion DEFAULT_VERSION = + valueOf(System.getProperty("defaultObVersion","3.2.3.0")); + + private static final int VERSION_PART_COUNT = 4; + + public ObVersion(String version) { + try { + String[] versionParts = version.split("\\."); + majorVersion = Integer.valueOf(versionParts[0]); + minorVersion = Integer.valueOf(versionParts[1]); + releaseNumber = Integer.valueOf(versionParts[2]); + int tempPatchNum = 0; + if (versionParts.length == VERSION_PART_COUNT) { + try { + 
tempPatchNum = Integer.valueOf(versionParts[3]); + } catch (Exception e) { + LOG.warn("fail to parse ob version: " + e.getMessage()); + } + } + patchNumber = tempPatchNum; + } catch (Exception ex) { + LOG.warn("fail to get ob version, using default {} {}", + DEFAULT_VERSION, ex.getMessage()); + majorVersion = DEFAULT_VERSION.majorVersion; + minorVersion = DEFAULT_VERSION.minorVersion; + releaseNumber = DEFAULT_VERSION.releaseNumber; + patchNumber = DEFAULT_VERSION.patchNumber; + } + } + + public static ObVersion valueOf(String version) { + return new ObVersion(version); + } + + @Override + public int compareTo(ObVersion o) { + if (this.majorVersion > o.majorVersion) { + return 1; + } else if (this.majorVersion < o.majorVersion) { + return -1; + } + + if (this.minorVersion > o.minorVersion) { + return 1; + } else if (this.minorVersion < o.minorVersion) { + return -1; + } + + if (this.releaseNumber > o.releaseNumber) { + return 1; + } else if (this.releaseNumber < o.releaseNumber) { + return -1; + } + + if (this.patchNumber > o.patchNumber) { + return 1; + } else if (this.patchNumber < o.patchNumber) { + return -1; + } + + return 0; + } + + @Override + public String toString() { + return String.format("%d.%d.%d.%d", majorVersion, minorVersion, releaseNumber, patchNumber); + } +} diff --git a/plugin-rdbms-util/src/main/java/com/alibaba/datax/plugin/rdbms/reader/util/OriginalConfPretreatmentUtil.java b/plugin-rdbms-util/src/main/java/com/alibaba/datax/plugin/rdbms/reader/util/OriginalConfPretreatmentUtil.java index 3ac5f2af..ef3a876d 100755 --- a/plugin-rdbms-util/src/main/java/com/alibaba/datax/plugin/rdbms/reader/util/OriginalConfPretreatmentUtil.java +++ b/plugin-rdbms-util/src/main/java/com/alibaba/datax/plugin/rdbms/reader/util/OriginalConfPretreatmentUtil.java @@ -261,7 +261,7 @@ public final class OriginalConfPretreatmentUtil { // 混合配制 table 和 querySql if (!ListUtil.checkIfValueSame(tableModeFlags) - || !ListUtil.checkIfValueSame(tableModeFlags)) { + || !ListUtil.checkIfValueSame(querySqlModeFlags)) { throw DataXException.asDataXException(DBUtilErrorCode.TABLE_QUERYSQL_MIXED, "您配置凌乱了. 不能同时既配置table又配置querySql. 
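
A short usage sketch for the ObVersion comparator introduced above (assuming the class is available on the classpath under `com.alibaba.datax.plugin.rdbms.reader.util`; the version strings are examples only):

```java
import com.alibaba.datax.plugin.rdbms.reader.util.ObVersion;

public class ObVersionDemo {
    public static void main(String[] args) {
        // "2.2.76" is padded to 2.2.76.0; comparison proceeds field by field.
        ObVersion v = ObVersion.valueOf("3.2.3.0");
        System.out.println(v.compareTo(ObVersion.V2276) > 0); // true: 3.x is newer than 2.2.76
        System.out.println(v.compareTo(ObVersion.V4000) < 0); // true: still a pre-4.x OceanBase
    }
}
```
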
请检查您的配置并作出修改."); } diff --git a/plugin-rdbms-util/src/main/java/com/alibaba/datax/plugin/rdbms/reader/util/SingleTableSplitUtil.java b/plugin-rdbms-util/src/main/java/com/alibaba/datax/plugin/rdbms/reader/util/SingleTableSplitUtil.java old mode 100755 new mode 100644 index 7e09cce5..844b6cfd --- a/plugin-rdbms-util/src/main/java/com/alibaba/datax/plugin/rdbms/reader/util/SingleTableSplitUtil.java +++ b/plugin-rdbms-util/src/main/java/com/alibaba/datax/plugin/rdbms/reader/util/SingleTableSplitUtil.java @@ -5,8 +5,9 @@ import com.alibaba.datax.common.util.Configuration; import com.alibaba.datax.plugin.rdbms.reader.Constant; import com.alibaba.datax.plugin.rdbms.reader.Key; import com.alibaba.datax.plugin.rdbms.util.*; -import com.alibaba.fastjson.JSON; +import com.alibaba.fastjson2.JSON; +import java.text.MessageFormat; import org.apache.commons.lang3.StringUtils; import org.apache.commons.lang3.tuple.ImmutablePair; import org.apache.commons.lang3.tuple.Pair; @@ -20,6 +21,7 @@ import java.sql.ResultSetMetaData; import java.sql.Types; import java.util.ArrayList; import java.util.List; +import static org.apache.commons.lang3.StringUtils.EMPTY; public class SingleTableSplitUtil { private static final Logger LOG = LoggerFactory @@ -277,7 +279,24 @@ public class SingleTableSplitUtil { String splitPK = configuration.getString(Key.SPLIT_PK).trim(); String table = configuration.getString(Key.TABLE).trim(); String where = configuration.getString(Key.WHERE, null); - return genPKSql(splitPK,table,where); + String obMode = configuration.getString("obCompatibilityMode"); + // OceanBase对SELECT MIN(%s),MAX(%s) FROM %s这条sql没有做查询改写,会进行全表扫描,在数据量的时候查询耗时很大甚至超时; + // 所以对于OceanBase数据库,查询模板需要改写为分别查询最大值和最小值。这样可以提升查询数量级的性能。 + if (DATABASE_TYPE == DataBaseType.OceanBase && StringUtils.isNotEmpty(obMode)) { + boolean isOracleMode = "ORACLE".equalsIgnoreCase(obMode); + + String minMaxTemplate = isOracleMode ? "select v2.id as min_a, v1.id as max_a from (" + + "select * from (select %s as id from %s {0} order by id desc) where rownum =1 ) v1," + + "(select * from (select %s as id from %s order by id asc) where rownum =1 ) v2;" : + "select v2.id as min_a, v1.id as max_a from (select %s as id from %s {0} order by id desc limit 1) v1," + + "(select %s as id from %s order by id asc limit 1) v2;"; + + String pkRangeSQL = String.format(minMaxTemplate, splitPK, table, splitPK, table); + String whereString = StringUtils.isNotBlank(where) ? 
String.format("WHERE (%s AND %s IS NOT NULL)", where, splitPK) : EMPTY; + pkRangeSQL = MessageFormat.format(pkRangeSQL, whereString); + return pkRangeSQL; + } + return genPKSql(splitPK, table, where); } public static String genPKSql(String splitPK, String table, String where){ diff --git a/plugin-rdbms-util/src/main/java/com/alibaba/datax/plugin/rdbms/util/DBUtil.java b/plugin-rdbms-util/src/main/java/com/alibaba/datax/plugin/rdbms/util/DBUtil.java index 2392d1ca..12a3aa74 100755 --- a/plugin-rdbms-util/src/main/java/com/alibaba/datax/plugin/rdbms/util/DBUtil.java +++ b/plugin-rdbms-util/src/main/java/com/alibaba/datax/plugin/rdbms/util/DBUtil.java @@ -380,6 +380,9 @@ public final class DBUtil { // unit ms prop.put("oracle.jdbc.ReadTimeout", socketTimeout); } + if (dataBaseType == DataBaseType.OceanBase) { + url = url.replace("jdbc:mysql:", "jdbc:oceanbase:"); + } return connect(dataBaseType, url, prop); } @@ -717,6 +720,11 @@ public final class DBUtil { new ArrayList(), String.class); DBUtil.doDealWithSessionConfig(conn, sessionConfig, message); break; + case SQLServer: + sessionConfig = config.getList(Key.SESSION, + new ArrayList(), String.class); + DBUtil.doDealWithSessionConfig(conn, sessionConfig, message); + break; default: break; } diff --git a/plugin-rdbms-util/src/main/java/com/alibaba/datax/plugin/rdbms/util/DataBaseType.java b/plugin-rdbms-util/src/main/java/com/alibaba/datax/plugin/rdbms/util/DataBaseType.java index 205919fe..dec2353d 100755 --- a/plugin-rdbms-util/src/main/java/com/alibaba/datax/plugin/rdbms/util/DataBaseType.java +++ b/plugin-rdbms-util/src/main/java/com/alibaba/datax/plugin/rdbms/util/DataBaseType.java @@ -18,12 +18,17 @@ public enum DataBaseType { PostgreSQL("postgresql", "org.postgresql.Driver"), RDBMS("rdbms", "com.alibaba.datax.plugin.rdbms.util.DataBaseType"), DB2("db2", "com.ibm.db2.jcc.DB2Driver"), + ADB("adb","com.mysql.jdbc.Driver"), ADS("ads","com.mysql.jdbc.Driver"), ClickHouse("clickhouse", "ru.yandex.clickhouse.ClickHouseDriver"), KingbaseES("kingbasees", "com.kingbase8.Driver"), Oscar("oscar", "com.oscar.Driver"), - OceanBase("oceanbase", "com.alipay.oceanbase.jdbc.Driver"); - + OceanBase("oceanbase", "com.alipay.oceanbase.jdbc.Driver"), + StarRocks("starrocks", "com.mysql.jdbc.Driver"), + Sybase("sybase", "com.sybase.jdbc4.jdbc.SybDriver"), + GaussDB("gaussdb", "org.opengauss.Driver"), + Databend("databend", "com.databend.jdbc.DatabendDriver"), + Doris("doris","com.mysql.jdbc.Driver"); private String typeName; private String driverClassName; @@ -67,6 +72,12 @@ public enum DataBaseType { break; case Oscar: break; + case StarRocks: + break; + case GaussDB: + break; + case Doris: + break; default: throw DataXException.asDataXException(DBUtilErrorCode.UNSUPPORTED_TYPE, "unsupported database type."); } @@ -86,6 +97,14 @@ public enum DataBaseType { result = jdbc + "?" + suffix; } break; + case ADB: + suffix = "yearIsDateType=false&zeroDateTimeBehavior=convertToNull&rewriteBatchedStatements=true&tinyInt1isBit=false"; + if (jdbc.contains("?")) { + result = jdbc + "&" + suffix; + } else { + result = jdbc + "?" + suffix; + } + break; case DRDS: suffix = "yearIsDateType=false&zeroDateTimeBehavior=convertToNull"; if (jdbc.contains("?")) { @@ -106,6 +125,8 @@ public enum DataBaseType { break; case RDBMS: break; + case Databend: + break; case KingbaseES: break; case Oscar: @@ -118,6 +139,10 @@ public enum DataBaseType { result = jdbc + "?" 
+ suffix; } break; + case Sybase: + break; + case GaussDB: + break; default: throw DataXException.asDataXException(DBUtilErrorCode.UNSUPPORTED_TYPE, "unsupported database type."); } @@ -145,6 +170,8 @@ public enum DataBaseType { case KingbaseES: case Oscar: break; + case GaussDB: + break; default: throw DataXException.asDataXException(DBUtilErrorCode.UNSUPPORTED_TYPE, "unsupported database type."); } @@ -170,6 +197,8 @@ public enum DataBaseType { case KingbaseES: case Oscar: break; + case GaussDB: + break; default: throw DataXException.asDataXException(DBUtilErrorCode.UNSUPPORTED_TYPE, "unsupported database type"); } @@ -196,6 +225,8 @@ public enum DataBaseType { break; case Oscar: break; + case GaussDB: + break; default: throw DataXException.asDataXException(DBUtilErrorCode.UNSUPPORTED_TYPE, "unsupported database type"); } diff --git a/plugin-rdbms-util/src/main/java/com/alibaba/datax/plugin/rdbms/util/RdbmsException.java b/plugin-rdbms-util/src/main/java/com/alibaba/datax/plugin/rdbms/util/RdbmsException.java index 4b6601ad..7091bd5a 100644 --- a/plugin-rdbms-util/src/main/java/com/alibaba/datax/plugin/rdbms/util/RdbmsException.java +++ b/plugin-rdbms-util/src/main/java/com/alibaba/datax/plugin/rdbms/util/RdbmsException.java @@ -69,32 +69,32 @@ public class RdbmsException extends DataXException{ } public static DataXException asQueryException(DataBaseType dataBaseType, Exception e,String querySql,String table,String userName){ - if (dataBaseType.equals(DataBaseType.MySql)){ + if (dataBaseType.equals(DataBaseType.MySql)) { DBUtilErrorCode dbUtilErrorCode = mySqlQueryErrorAna(e.getMessage()); - if (dbUtilErrorCode == DBUtilErrorCode.MYSQL_QUERY_TABLE_NAME_ERROR && table != null){ - return DataXException.asDataXException(dbUtilErrorCode,"表名为:"+table+" 执行的SQL为:"+querySql+" 具体错误信息为:"+e); + if (dbUtilErrorCode == DBUtilErrorCode.MYSQL_QUERY_TABLE_NAME_ERROR && table != null) { + return DataXException.asDataXException(dbUtilErrorCode, "表名为:" + table + " 执行的SQL为:" + querySql + " 具体错误信息为:" + e, e); } - if (dbUtilErrorCode == DBUtilErrorCode.MYSQL_QUERY_SELECT_PRI_ERROR && userName != null){ - return DataXException.asDataXException(dbUtilErrorCode,"用户名为:"+userName+" 具体错误信息为:"+e); + if (dbUtilErrorCode == DBUtilErrorCode.MYSQL_QUERY_SELECT_PRI_ERROR && userName != null) { + return DataXException.asDataXException(dbUtilErrorCode, "用户名为:" + userName + " 具体错误信息为:" + e, e); } - return DataXException.asDataXException(dbUtilErrorCode,"执行的SQL为: "+querySql+" 具体错误信息为:"+e); + return DataXException.asDataXException(dbUtilErrorCode, "执行的SQL为: " + querySql + " 具体错误信息为:" + e, e); } - if (dataBaseType.equals(DataBaseType.Oracle)){ + if (dataBaseType.equals(DataBaseType.Oracle)) { DBUtilErrorCode dbUtilErrorCode = oracleQueryErrorAna(e.getMessage()); - if (dbUtilErrorCode == DBUtilErrorCode.ORACLE_QUERY_TABLE_NAME_ERROR && table != null){ - return DataXException.asDataXException(dbUtilErrorCode,"表名为:"+table+" 执行的SQL为:"+querySql+" 具体错误信息为:"+e); + if (dbUtilErrorCode == DBUtilErrorCode.ORACLE_QUERY_TABLE_NAME_ERROR && table != null) { + return DataXException.asDataXException(dbUtilErrorCode, "表名为:" + table + " 执行的SQL为:" + querySql + " 具体错误信息为:" + e, e); } - if (dbUtilErrorCode == DBUtilErrorCode.ORACLE_QUERY_SELECT_PRI_ERROR){ - return DataXException.asDataXException(dbUtilErrorCode,"用户名为:"+userName+" 具体错误信息为:"+e); + if (dbUtilErrorCode == DBUtilErrorCode.ORACLE_QUERY_SELECT_PRI_ERROR) { + return DataXException.asDataXException(dbUtilErrorCode, "用户名为:" + userName + " 具体错误信息为:" + e, e); } - return 
DataXException.asDataXException(dbUtilErrorCode,"执行的SQL为: "+querySql+" 具体错误信息为:"+e); + return DataXException.asDataXException(dbUtilErrorCode, "执行的SQL为: " + querySql + " 具体错误信息为:" + e, e); } - return DataXException.asDataXException(DBUtilErrorCode.SQL_EXECUTE_FAIL, "执行的SQL为: "+querySql+" 具体错误信息为:"+e); + return DataXException.asDataXException(DBUtilErrorCode.SQL_EXECUTE_FAIL, "执行的SQL为: " + querySql + " 具体错误信息为:" + e, e); } public static DBUtilErrorCode mySqlQueryErrorAna(String e){ diff --git a/plugin-rdbms-util/src/main/java/com/alibaba/datax/plugin/rdbms/writer/CommonRdbmsWriter.java b/plugin-rdbms-util/src/main/java/com/alibaba/datax/plugin/rdbms/writer/CommonRdbmsWriter.java index 27b88f44..7b84c320 100755 --- a/plugin-rdbms-util/src/main/java/com/alibaba/datax/plugin/rdbms/writer/CommonRdbmsWriter.java +++ b/plugin-rdbms-util/src/main/java/com/alibaba/datax/plugin/rdbms/writer/CommonRdbmsWriter.java @@ -12,6 +12,7 @@ import com.alibaba.datax.plugin.rdbms.util.DataBaseType; import com.alibaba.datax.plugin.rdbms.util.RdbmsException; import com.alibaba.datax.plugin.rdbms.writer.util.OriginalConfPretreatmentUtil; import com.alibaba.datax.plugin.rdbms.writer.util.WriterUtil; +import java.util.concurrent.atomic.AtomicLong; import org.apache.commons.lang3.StringUtils; import org.apache.commons.lang3.tuple.Triple; import org.slf4j.Logger; @@ -199,6 +200,9 @@ public class CommonRdbmsWriter { protected boolean emptyAsNull; protected Triple, List, List> resultSetMetaData; + private int dumpRecordLimit = Constant.DEFAULT_DUMP_RECORD_LIMIT; + private AtomicLong dumpRecordCount = new AtomicLong(0); + public Task(DataBaseType dataBaseType) { this.dataBaseType = dataBaseType; } @@ -209,7 +213,7 @@ public class CommonRdbmsWriter { this.jdbcUrl = writerSliceConfig.getString(Key.JDBC_URL); //ob10的处理 - if (this.jdbcUrl.startsWith(Constant.OB10_SPLIT_STRING) && this.dataBaseType == DataBaseType.MySql) { + if (this.jdbcUrl.startsWith(Constant.OB10_SPLIT_STRING)) { String[] ss = this.jdbcUrl.split(Constant.OB10_SPLIT_STRING_PATTERN); if (ss.length != 3) { throw DataXException @@ -368,7 +372,11 @@ public class CommonRdbmsWriter { } } - protected void doOneInsert(Connection connection, List buffer) { + public boolean needToDumpRecord() { + return dumpRecordCount.incrementAndGet() <= dumpRecordLimit; + } + + public void doOneInsert(Connection connection, List buffer) { PreparedStatement preparedStatement = null; try { connection.setAutoCommit(true); @@ -381,7 +389,10 @@ public class CommonRdbmsWriter { preparedStatement, record); preparedStatement.execute(); } catch (SQLException e) { - LOG.debug(e.toString()); + if (needToDumpRecord()) { + LOG.warn("ERROR : record {}", record); + LOG.warn("Insert fatal error SqlState ={}, errorCode = {}, {}", e.getSQLState(), e.getErrorCode(), e); + } this.taskPluginCollector.collectDirtyRecord(record, e); } finally { @@ -409,11 +420,6 @@ public class CommonRdbmsWriter { return preparedStatement; } - protected PreparedStatement fillPreparedStatementColumnType(PreparedStatement preparedStatement, int columnIndex, - int columnSqltype, Column column) throws SQLException { - return fillPreparedStatementColumnType(preparedStatement, columnIndex, columnSqltype, null, column); - } - protected PreparedStatement fillPreparedStatementColumnType(PreparedStatement preparedStatement, int columnIndex, int columnSqltype, String typeName, Column column) throws SQLException { java.util.Date utilDate; @@ -524,7 +530,7 @@ public class CommonRdbmsWriter { break; case Types.BOOLEAN: - 
preparedStatement.setString(columnIndex + 1, column.asString()); + preparedStatement.setBoolean(columnIndex + 1, column.asBoolean()); break; // warn: bit(1) -> Types.BIT 可使用setBoolean diff --git a/plugin-rdbms-util/src/main/java/com/alibaba/datax/plugin/rdbms/writer/Constant.java b/plugin-rdbms-util/src/main/java/com/alibaba/datax/plugin/rdbms/writer/Constant.java index 0e4692e2..9510fd14 100755 --- a/plugin-rdbms-util/src/main/java/com/alibaba/datax/plugin/rdbms/writer/Constant.java +++ b/plugin-rdbms-util/src/main/java/com/alibaba/datax/plugin/rdbms/writer/Constant.java @@ -19,4 +19,5 @@ public final class Constant { public static final String OB10_SPLIT_STRING = "||_dsc_ob10_dsc_||"; public static final String OB10_SPLIT_STRING_PATTERN = "\\|\\|_dsc_ob10_dsc_\\|\\|"; + public static final int DEFAULT_DUMP_RECORD_LIMIT = 10; } diff --git a/plugin-rdbms-util/src/main/java/com/alibaba/datax/plugin/rdbms/writer/Key.java b/plugin-rdbms-util/src/main/java/com/alibaba/datax/plugin/rdbms/writer/Key.java index 25a2ab52..3c282d5d 100755 --- a/plugin-rdbms-util/src/main/java/com/alibaba/datax/plugin/rdbms/writer/Key.java +++ b/plugin-rdbms-util/src/main/java/com/alibaba/datax/plugin/rdbms/writer/Key.java @@ -11,6 +11,8 @@ public final class Key { public final static String COLUMN = "column"; + public final static String ONCONFLICT_COLUMN = "onConflictColumn"; + //可选值为:insert,replace,默认为 insert (mysql 支持,oracle 没用 replace 机制,只能 insert,oracle 可以不暴露这个参数) public final static String WRITE_MODE = "writeMode"; diff --git a/plugin-rdbms-util/src/main/java/com/alibaba/datax/plugin/rdbms/writer/util/OriginalConfPretreatmentUtil.java b/plugin-rdbms-util/src/main/java/com/alibaba/datax/plugin/rdbms/writer/util/OriginalConfPretreatmentUtil.java index 34d1b3af..556e50ac 100755 --- a/plugin-rdbms-util/src/main/java/com/alibaba/datax/plugin/rdbms/writer/util/OriginalConfPretreatmentUtil.java +++ b/plugin-rdbms-util/src/main/java/com/alibaba/datax/plugin/rdbms/writer/util/OriginalConfPretreatmentUtil.java @@ -10,6 +10,7 @@ import org.apache.commons.lang3.StringUtils; import org.slf4j.Logger; import org.slf4j.LoggerFactory; +import java.sql.Connection; import java.util.ArrayList; import java.util.List; @@ -120,9 +121,15 @@ public final class OriginalConfPretreatmentUtil { } else { // 确保用户配置的 column 不重复 ListUtil.makeSureNoValueDuplicate(userConfiguredColumns, false); + Connection connection = null; + try { + connection = connectionFactory.getConnecttion(); + // 检查列是否都为数据库表中正确的列(通过执行一次 select column from table 进行判断) + DBUtil.getColumnMetaData(connection, oneTable,StringUtils.join(userConfiguredColumns, ",")); + } finally { + DBUtil.closeDBResources(null, connection); + } - // 检查列是否都为数据库表中正确的列(通过执行一次 select column from table 进行判断) - DBUtil.getColumnMetaData(connectionFactory.getConnecttion(), oneTable,StringUtils.join(userConfiguredColumns, ",")); } } } diff --git a/plugin-unstructured-storage-util/src/main/java/com/alibaba/datax/plugin/unstructuredstorage/reader/ColumnEntry.java b/plugin-unstructured-storage-util/src/main/java/com/alibaba/datax/plugin/unstructuredstorage/reader/ColumnEntry.java index ee3af816..6bfc1bb9 100644 --- a/plugin-unstructured-storage-util/src/main/java/com/alibaba/datax/plugin/unstructuredstorage/reader/ColumnEntry.java +++ b/plugin-unstructured-storage-util/src/main/java/com/alibaba/datax/plugin/unstructuredstorage/reader/ColumnEntry.java @@ -1,11 +1,11 @@ package com.alibaba.datax.plugin.unstructuredstorage.reader; +import com.alibaba.fastjson2.JSON; +import 
org.apache.commons.lang3.StringUtils; + import java.text.DateFormat; import java.text.SimpleDateFormat; -import org.apache.commons.lang3.StringUtils; - -import com.alibaba.fastjson.JSON; public class ColumnEntry { private Integer index; @@ -13,6 +13,15 @@ public class ColumnEntry { private String value; private String format; private DateFormat dateParse; + private String name; + + public String getName() { + return name; + } + + public void setName(String name) { + this.name = name; + } public Integer getIndex() { return index; diff --git a/plugin-unstructured-storage-util/src/main/java/com/alibaba/datax/plugin/unstructuredstorage/reader/Key.java b/plugin-unstructured-storage-util/src/main/java/com/alibaba/datax/plugin/unstructuredstorage/reader/Key.java index 71e13ad2..0945779b 100755 --- a/plugin-unstructured-storage-util/src/main/java/com/alibaba/datax/plugin/unstructuredstorage/reader/Key.java +++ b/plugin-unstructured-storage-util/src/main/java/com/alibaba/datax/plugin/unstructuredstorage/reader/Key.java @@ -87,4 +87,7 @@ public class Key { public static final String TAR_FILE_FILTER_PATTERN = "tarFileFilterPattern"; public static final String ENABLE_INNER_SPLIT = "enableInnerSplit"; + public static final String HIVE_PARTION_COLUMN = "hivePartitionColumn"; + + } diff --git a/plugin-unstructured-storage-util/src/main/java/com/alibaba/datax/plugin/unstructuredstorage/reader/UnstructuredStorageReaderUtil.java b/plugin-unstructured-storage-util/src/main/java/com/alibaba/datax/plugin/unstructuredstorage/reader/UnstructuredStorageReaderUtil.java index 645971d0..27f4c48a 100755 --- a/plugin-unstructured-storage-util/src/main/java/com/alibaba/datax/plugin/unstructuredstorage/reader/UnstructuredStorageReaderUtil.java +++ b/plugin-unstructured-storage-util/src/main/java/com/alibaba/datax/plugin/unstructuredstorage/reader/UnstructuredStorageReaderUtil.java @@ -5,9 +5,9 @@ import com.alibaba.datax.common.exception.DataXException; import com.alibaba.datax.common.plugin.RecordSender; import com.alibaba.datax.common.plugin.TaskPluginCollector; import com.alibaba.datax.common.util.Configuration; -import com.alibaba.fastjson.JSON; -import com.alibaba.fastjson.JSONObject; -import com.alibaba.fastjson.TypeReference; +import com.alibaba.fastjson2.JSON; +import com.alibaba.fastjson2.JSONObject; +import com.alibaba.fastjson2.TypeReference; import com.csvreader.CsvReader; import org.apache.commons.beanutils.BeanUtils; import io.airlift.compress.snappy.SnappyCodec; @@ -715,4 +715,70 @@ public class UnstructuredStorageReaderUtil { public static void setSourceFile(Configuration configuration, List sourceFiles){ configuration.set(Constant.SOURCE_FILE, sourceFiles); } + + public static ArrayList getHivePartitionColumns(String filePath, List hivePartitionColumnEntrys) { + ArrayList hivePartitionColumns = new ArrayList<>(); + + if (null == hivePartitionColumnEntrys) { + return hivePartitionColumns; + } + + // 对于分区列pt,则从path中找/pt=xxx/,xxx即分区列的值,另外确认在path中只有一次出现 + + for (ColumnEntry columnEntry : hivePartitionColumnEntrys) { + String parColName = columnEntry.getValue(); + String patten = String.format("/%s=", parColName); + int index = filePath.indexOf(patten); + if (index != filePath.lastIndexOf(patten)) { + throw new DataXException(String.format("Found multiple partition folder in filePath %s, partition: %s", filePath, parColName)); + } + + String subPath = filePath.substring(index + 1); + int firstSeparatorIndex = subPath.indexOf(File.separator); + if (firstSeparatorIndex > 0) { + subPath = subPath.substring(0, 
firstSeparatorIndex); + } + + if (subPath.split("=").length != 2) { + throw new DataXException(String.format("Found partition column value in filePath %s failed, partition: %s", filePath, parColName)); + } + String parColVal = subPath.split("=")[1]; + + String colType = columnEntry.getType().toUpperCase(); + Type type = Type.valueOf(colType); + + Column generateColumn; + switch (type) { + case STRING: + generateColumn = new StringColumn(parColVal); + break; + + case DOUBLE: + generateColumn = new DoubleColumn(parColVal); + break; + + case LONG: + generateColumn = new LongColumn(parColVal); + break; + + case BOOLEAN: + generateColumn = new BoolColumn(parColVal); + break; + + case DATE: + generateColumn = new DateColumn(new StringColumn(parColVal.toString()).asDate()); + break; + + default: + String errorMessage = String.format("The column type you configured is not currently supported: %s", parColVal); + LOG.error(errorMessage); + throw DataXException.asDataXException(UnstructuredStorageReaderErrorCode.NOT_SUPPORT_TYPE, errorMessage); + } + + hivePartitionColumns.add(generateColumn); + } + + return hivePartitionColumns; + } + } diff --git a/plugin-unstructured-storage-util/src/main/java/com/alibaba/datax/plugin/unstructuredstorage/reader/split/UnstructuredSplitUtil.java b/plugin-unstructured-storage-util/src/main/java/com/alibaba/datax/plugin/unstructuredstorage/reader/split/UnstructuredSplitUtil.java index 8087ed63..4e42583d 100644 --- a/plugin-unstructured-storage-util/src/main/java/com/alibaba/datax/plugin/unstructuredstorage/reader/split/UnstructuredSplitUtil.java +++ b/plugin-unstructured-storage-util/src/main/java/com/alibaba/datax/plugin/unstructuredstorage/reader/split/UnstructuredSplitUtil.java @@ -5,7 +5,7 @@ import com.alibaba.datax.common.util.Configuration; import com.alibaba.datax.common.util.RangeSplitUtil; import com.alibaba.datax.plugin.unstructuredstorage.reader.Key; import com.alibaba.datax.plugin.unstructuredstorage.reader.UnstructuredStorageReaderErrorCode; -import com.alibaba.fastjson.JSON; +import com.alibaba.fastjson2.JSON; import org.apache.commons.io.FileUtils; import org.apache.commons.lang3.tuple.ImmutableTriple; import org.apache.commons.lang3.tuple.Triple; diff --git a/plugin-unstructured-storage-util/src/main/java/com/alibaba/datax/plugin/unstructuredstorage/util/ColumnTypeUtil.java b/plugin-unstructured-storage-util/src/main/java/com/alibaba/datax/plugin/unstructuredstorage/util/ColumnTypeUtil.java index 8215bc36..a03bf07e 100644 --- a/plugin-unstructured-storage-util/src/main/java/com/alibaba/datax/plugin/unstructuredstorage/util/ColumnTypeUtil.java +++ b/plugin-unstructured-storage-util/src/main/java/com/alibaba/datax/plugin/unstructuredstorage/util/ColumnTypeUtil.java @@ -2,8 +2,8 @@ package com.alibaba.datax.plugin.unstructuredstorage.util; import com.alibaba.datax.common.util.Configuration; import com.alibaba.datax.plugin.unstructuredstorage.reader.ColumnEntry; -import com.alibaba.fastjson.JSON; -import com.alibaba.fastjson.JSONObject; +import com.alibaba.fastjson2.JSON; +import com.alibaba.fastjson2.JSONObject; import java.util.ArrayList; import java.util.List; diff --git a/plugin-unstructured-storage-util/src/main/java/com/alibaba/datax/plugin/unstructuredstorage/writer/Constant.java b/plugin-unstructured-storage-util/src/main/java/com/alibaba/datax/plugin/unstructuredstorage/writer/Constant.java index 092fbfd7..a485c124 100755 --- a/plugin-unstructured-storage-util/src/main/java/com/alibaba/datax/plugin/unstructuredstorage/writer/Constant.java +++ 
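
The `getHivePartitionColumns` logic earlier in this diff derives a partition column's value from the file path (a `/pt=xxx/` folder that must occur exactly once). A stripped-down sketch of only the path parsing, using a hard-coded `'/'` separator and a hypothetical helper name — the real code uses `File.separator` and wraps the value in DataX `Column` objects:

```java
public class PartitionPathDemo {
    // Extracts the value of one partition column (e.g. "pt") from an HDFS-style path.
    static String partitionValue(String filePath, String column) {
        String pattern = "/" + column + "=";
        int index = filePath.indexOf(pattern);
        if (index < 0 || index != filePath.lastIndexOf(pattern)) {
            throw new IllegalArgumentException("partition folder missing or duplicated: " + column);
        }
        String sub = filePath.substring(index + 1);  // "pt=20231001/000000_0"
        int slash = sub.indexOf('/');
        if (slash > 0) {
            sub = sub.substring(0, slash);           // "pt=20231001"
        }
        return sub.split("=")[1];                    // "20231001"
    }

    public static void main(String[] args) {
        System.out.println(partitionValue("/user/hive/warehouse/t/pt=20231001/000000_0", "pt"));
    }
}
```
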
b/plugin-unstructured-storage-util/src/main/java/com/alibaba/datax/plugin/unstructuredstorage/writer/Constant.java @@ -12,9 +12,13 @@ public class Constant { public static final String FILE_FORMAT_TEXT = "text"; + public static final String FILE_FORMAT_SQL = "sql"; + //每个分块10MB,最大10000个分块, MAX_FILE_SIZE 单位: MB public static final Long MAX_FILE_SIZE = 10 * 10000L; + public static final int DEFAULT_COMMIT_SIZE = 2000; + public static final String DEFAULT_SUFFIX = ""; public static final String TRUNCATE = "truncate"; diff --git a/plugin-unstructured-storage-util/src/main/java/com/alibaba/datax/plugin/unstructuredstorage/writer/Key.java b/plugin-unstructured-storage-util/src/main/java/com/alibaba/datax/plugin/unstructuredstorage/writer/Key.java index 125957f1..ee97abd8 100755 --- a/plugin-unstructured-storage-util/src/main/java/com/alibaba/datax/plugin/unstructuredstorage/writer/Key.java +++ b/plugin-unstructured-storage-util/src/main/java/com/alibaba/datax/plugin/unstructuredstorage/writer/Key.java @@ -5,12 +5,16 @@ public class Key { // must have public static final String FILE_NAME = "fileName"; + public static final String TABLE_NAME = "table"; + // must have public static final String WRITE_MODE = "writeMode"; // not must , not default , public static final String FIELD_DELIMITER = "fieldDelimiter"; + public static final String QUOTE_CHARACTER = "quoteChar"; + // not must , default os's line delimiter public static final String LINE_DELIMITER = "lineDelimiter"; @@ -38,6 +42,8 @@ public class Key { // writer maxFileSize public static final String MAX_FILE_SIZE = "maxFileSize"; + + public static final String COMMIT_SIZE = "commitSize"; // writer file type suffix, like .txt .csv public static final String SUFFIX = "suffix"; diff --git a/plugin-unstructured-storage-util/src/main/java/com/alibaba/datax/plugin/unstructuredstorage/writer/SqlWriter.java b/plugin-unstructured-storage-util/src/main/java/com/alibaba/datax/plugin/unstructuredstorage/writer/SqlWriter.java new file mode 100644 index 00000000..18a9c1be --- /dev/null +++ b/plugin-unstructured-storage-util/src/main/java/com/alibaba/datax/plugin/unstructuredstorage/writer/SqlWriter.java @@ -0,0 +1,76 @@ +package com.alibaba.datax.plugin.unstructuredstorage.writer; + +import org.slf4j.Logger; +import org.slf4j.LoggerFactory; + +import java.io.IOException; +import java.io.Writer; +import java.util.List; +import java.util.stream.Collectors; + +public class SqlWriter implements UnstructuredWriter { + private static final Logger LOG = LoggerFactory.getLogger(SqlWriter.class); + + private Writer sqlWriter; + private String quoteChar; + private String lineSeparator; + private String tableName; + private String nullFormat; + private StringBuilder insertPrefix; + + public SqlWriter(Writer writer, String quoteChar, String tableName, String lineSeparator, List columnNames, String nullFormat) { + this.sqlWriter = writer; + this.quoteChar = quoteChar; + this.lineSeparator = lineSeparator; + this.tableName = quoteChar + tableName + quoteChar; + this.nullFormat = nullFormat; + buildInsertPrefix(columnNames); + } + + @Override + public void writeOneRecord(List splitedRows) throws IOException { + if (splitedRows.isEmpty()) { + LOG.info("Found one record line which is empty."); + return; + } + + StringBuilder sqlPatten = new StringBuilder(4096).append(insertPrefix); + sqlPatten.append(splitedRows.stream().map(e -> { + if (nullFormat.equals(e)) { + return "NULL"; + } + return "'" + DataXCsvWriter.replace(e, "'", "''") + "'"; + 
}).collect(Collectors.joining(","))); + sqlPatten.append(");").append(lineSeparator); + this.sqlWriter.write(sqlPatten.toString()); + } + + private void buildInsertPrefix(List columnNames) { + StringBuilder sb = new StringBuilder(columnNames.size() * 32); + + for (String columnName : columnNames) { + if (sb.length() > 0) { + sb.append(","); + } + sb.append(quoteChar).append(columnName).append(quoteChar); + } + + int capacity = 16 + tableName.length() + sb.length(); + this.insertPrefix = new StringBuilder(capacity); + this.insertPrefix.append("INSERT INTO ").append(tableName).append(" (").append(sb).append(")").append(" VALUES("); + } + + public void appendCommit() throws IOException { + this.sqlWriter.write("commit;" + lineSeparator); + } + + @Override + public void flush() throws IOException { + this.sqlWriter.flush(); + } + + @Override + public void close() throws IOException { + this.sqlWriter.close(); + } +} diff --git a/plugin-unstructured-storage-util/src/main/java/com/alibaba/datax/plugin/unstructuredstorage/writer/TextCsvWriterManager.java b/plugin-unstructured-storage-util/src/main/java/com/alibaba/datax/plugin/unstructuredstorage/writer/TextCsvWriterManager.java index 167a7a87..4a9b9197 100644 --- a/plugin-unstructured-storage-util/src/main/java/com/alibaba/datax/plugin/unstructuredstorage/writer/TextCsvWriterManager.java +++ b/plugin-unstructured-storage-util/src/main/java/com/alibaba/datax/plugin/unstructuredstorage/writer/TextCsvWriterManager.java @@ -6,8 +6,8 @@ import java.util.HashMap; import java.util.List; import com.alibaba.datax.common.util.Configuration; -import com.alibaba.fastjson.JSON; -import com.alibaba.fastjson.TypeReference; +import com.alibaba.fastjson2.JSON; +import com.alibaba.fastjson2.TypeReference; import org.apache.commons.beanutils.BeanUtils; import org.apache.commons.io.IOUtils; import org.apache.commons.lang3.StringUtils; diff --git a/plugin-unstructured-storage-util/src/main/java/com/alibaba/datax/plugin/unstructuredstorage/writer/UnstructuredStorageWriterUtil.java b/plugin-unstructured-storage-util/src/main/java/com/alibaba/datax/plugin/unstructuredstorage/writer/UnstructuredStorageWriterUtil.java index e9040662..e74e5698 100755 --- a/plugin-unstructured-storage-util/src/main/java/com/alibaba/datax/plugin/unstructuredstorage/writer/UnstructuredStorageWriterUtil.java +++ b/plugin-unstructured-storage-util/src/main/java/com/alibaba/datax/plugin/unstructuredstorage/writer/UnstructuredStorageWriterUtil.java @@ -10,7 +10,10 @@ import java.util.Set; import java.util.UUID; import com.alibaba.datax.common.element.BytesColumn; + +import com.google.common.base.Preconditions; import org.apache.commons.codec.binary.Base64; +import org.apache.commons.collections.CollectionUtils; import org.apache.commons.compress.compressors.CompressorOutputStream; import org.apache.commons.compress.compressors.bzip2.BZip2CompressorOutputStream; import org.apache.commons.compress.compressors.gzip.GzipCompressorOutputStream; @@ -90,7 +93,8 @@ public class UnstructuredStorageWriterUtil { writerConfiguration.set(Key.FILE_FORMAT, fileFormat); } if (!Constant.FILE_FORMAT_CSV.equals(fileFormat) - && !Constant.FILE_FORMAT_TEXT.equals(fileFormat)) { + && !Constant.FILE_FORMAT_TEXT.equals(fileFormat) + && !Constant.FILE_FORMAT_SQL.equals(fileFormat)) { throw DataXException.asDataXException( UnstructuredStorageWriterErrorCode.ILLEGAL_VALUE, String.format("unsupported fileFormat %s ", fileFormat)); } @@ -232,22 +236,31 @@ public class UnstructuredStorageWriterUtil { // warn: default false 
String fileFormat = config.getString(Key.FILE_FORMAT, Constant.FILE_FORMAT_TEXT); - + boolean isSqlFormat = Constant.FILE_FORMAT_SQL.equalsIgnoreCase(fileFormat); + int commitSize = config.getInt(Key.COMMIT_SIZE, Constant.DEFAULT_COMMIT_SIZE); UnstructuredWriter unstructuredWriter = produceUnstructuredWriter(fileFormat, config, writer); List headers = config.getList(Key.HEADER, String.class); - if (null != headers && !headers.isEmpty()) { + if (null != headers && !headers.isEmpty() && !isSqlFormat) { unstructuredWriter.writeOneRecord(headers); } Record record = null; + int receivedCount = 0; String byteEncoding = config.getString(Key.BYTE_ENCODING); while ((record = lineReceiver.getFromReader()) != null) { UnstructuredStorageWriterUtil.transportOneRecord(record, nullFormat, dateParse, taskPluginCollector, unstructuredWriter, byteEncoding); + receivedCount++; + if (isSqlFormat && receivedCount % commitSize == 0) { + ((SqlWriter) unstructuredWriter).appendCommit(); + } } + if (isSqlFormat) { + ((SqlWriter)unstructuredWriter).appendCommit(); + } // warn:由调用方控制流的关闭 // IOUtils.closeQuietly(unstructuredWriter); } @@ -262,6 +275,16 @@ public class UnstructuredStorageWriterUtil { String fieldDelimiter = config.getString(Key.FIELD_DELIMITER, String.valueOf(Constant.DEFAULT_FIELD_DELIMITER)); unstructuredWriter = TextCsvWriterManager.produceTextWriter(writer, fieldDelimiter, config); + } else if (StringUtils.equalsIgnoreCase(fileFormat, Constant.FILE_FORMAT_SQL)) { + String tableName = config.getString(Key.TABLE_NAME); + Preconditions.checkArgument(StringUtils.isNotEmpty(tableName), "table name is empty"); + String quoteChar = config.getString(Key.QUOTE_CHARACTER); + Preconditions.checkArgument(StringUtils.isNotEmpty(quoteChar), "quote character is empty"); + String lineSeparator = config.getString(Key.LINE_DELIMITER, IOUtils.LINE_SEPARATOR); + List headers = config.getList(Key.HEADER, String.class); + Preconditions.checkArgument(CollectionUtils.isNotEmpty(headers), "column names are empty"); + String nullFormat = config.getString(Key.NULL_FORMAT, Constant.DEFAULT_NULL_FORMAT); + unstructuredWriter = new SqlWriter(writer, quoteChar, tableName, lineSeparator, headers, nullFormat); } return unstructuredWriter; diff --git a/pom.xml b/pom.xml index f9ec42ca..018af9bf 100644 --- a/pom.xml +++ b/pom.xml @@ -22,7 +22,7 @@ 3.3.2 1.10 1.2 - 1.2.49 + 2.0.23 16.0.1 3.7.2.1-SNAPSHOT @@ -70,6 +70,7 @@ ftpreader txtfilereader streamreader + clickhousereader mongodbreader tdengine20reader @@ -77,19 +78,22 @@ gdbreader tsdbreader opentsdbreader - - + loghubreader + datahubreader + starrocksreader + sybasereader + dorisreader mysqlwriter + starrockswriter drdswriter + databendwriter oraclewriter sqlserverwriter postgresqlwriter kingbaseeswriter adswriter oceanbasev10writer - cassandrawriter - clickhousewriter adbpgwriter hologresjdbcwriter rdbmswriter @@ -116,10 +120,22 @@ tsdbwriter gdbwriter oscarwriter - + loghubwriter + datahubwriter + cassandrawriter + clickhousewriter + doriswriter + selectdbwriter + adbmysqlwriter + sybasewriter + neo4jwriter plugin-rdbms-util plugin-unstructured-storage-util + gaussdbreader + gaussdbwriter + datax-example + @@ -130,8 +146,8 @@ ${commons-lang3-version} - com.alibaba - fastjson + com.alibaba.fastjson2 + fastjson2 ${fastjson-version} - com.dm - dm - system - ${basedir}/src/main/libs/Dm7JdbcDriver16.jar + com.dameng + Dm7JdbcDriver17 + 7.6.0.142 + com.sybase jconn3 @@ -38,13 +39,20 @@ system ${basedir}/src/main/libs/jconn3-1.0.0-SNAPSHOT.jar - + + ppas ppas 16 system 
${basedir}/src/main/libs/edb-jdbc16.jar + + + com.ibm.db2.jcc + db2jcc + db2jcc4 + org.slf4j @@ -97,13 +105,4 @@ - - - - com.dm - dm - 16 - - - diff --git a/rdbmsreader/src/main/libs/Dm7JdbcDriver16.jar b/rdbmsreader/src/main/libs/Dm7JdbcDriver16.jar deleted file mode 100755 index 30740dcd..00000000 Binary files a/rdbmsreader/src/main/libs/Dm7JdbcDriver16.jar and /dev/null differ diff --git a/rdbmsreader/src/main/libs/db2jcc4.jar b/rdbmsreader/src/main/libs/db2jcc4.jar deleted file mode 100755 index fc53cfd9..00000000 Binary files a/rdbmsreader/src/main/libs/db2jcc4.jar and /dev/null differ diff --git a/rdbmsreader/src/main/resources/plugin.json b/rdbmsreader/src/main/resources/plugin.json index d344dd86..f79a6ace 100755 --- a/rdbmsreader/src/main/resources/plugin.json +++ b/rdbmsreader/src/main/resources/plugin.json @@ -3,5 +3,5 @@ "class": "com.alibaba.datax.plugin.reader.rdbmsreader.RdbmsReader", "description": "useScene: prod. mechanism: Jdbc connection using the database, execute select sql, retrieve data from the ResultSet. warn: The more you know about the database, the less problems you encounter.", "developer": "alibaba", - "drivers":["dm.jdbc.driver.DmDriver", "com.sybase.jdbc3.jdbc.SybDriver", "com.edb.Driver"] + "drivers":["dm.jdbc.driver.DmDriver", "com.sybase.jdbc3.jdbc.SybDriver", "com.edb.Driver", "com.ibm.db2.jcc.DB2Driver"] } diff --git a/rdbmswriter/pom.xml b/rdbmswriter/pom.xml index 19461960..a74838b7 100755 --- a/rdbmswriter/pom.xml +++ b/rdbmswriter/pom.xml @@ -25,27 +25,34 @@ + - com.dm - dm + com.dameng + Dm7JdbcDriver17 + 7.6.0.142 + + + + com.sybase + jconn3 + 1.0.0-SNAPSHOT + system + ${basedir}/src/main/libs/jconn3-1.0.0-SNAPSHOT.jar + + + + ppas + ppas 16 system - ${basedir}/src/main/libs/Dm7JdbcDriver16.jar + ${basedir}/src/main/libs/edb-jdbc16.jar + - com.sybase - jconn3 - 1.0.0-SNAPSHOT - system - ${basedir}/src/main/libs/jconn3-1.0.0-SNAPSHOT.jar - - - ppas - ppas - 16 - system - ${basedir}/src/main/libs/edb-jdbc16.jar - + com.ibm.db2.jcc + db2jcc + db2jcc4 + org.slf4j diff --git a/rdbmswriter/src/main/java/com/alibaba/datax/plugin/reader/rdbmswriter/SubCommonRdbmsWriter.java b/rdbmswriter/src/main/java/com/alibaba/datax/plugin/reader/rdbmswriter/SubCommonRdbmsWriter.java index f1fbc552..88e50f11 100755 --- a/rdbmswriter/src/main/java/com/alibaba/datax/plugin/reader/rdbmswriter/SubCommonRdbmsWriter.java +++ b/rdbmswriter/src/main/java/com/alibaba/datax/plugin/reader/rdbmswriter/SubCommonRdbmsWriter.java @@ -29,7 +29,7 @@ public class SubCommonRdbmsWriter extends CommonRdbmsWriter { @Override protected PreparedStatement fillPreparedStatementColumnType( PreparedStatement preparedStatement, int columnIndex, - int columnSqltype, Column column) throws SQLException { + int columnSqltype, String typeName, Column column) throws SQLException { java.util.Date utilDate; try { switch (columnSqltype) { diff --git a/rdbmswriter/src/main/libs/Dm7JdbcDriver16.jar b/rdbmswriter/src/main/libs/Dm7JdbcDriver16.jar deleted file mode 100755 index 30740dcd..00000000 Binary files a/rdbmswriter/src/main/libs/Dm7JdbcDriver16.jar and /dev/null differ diff --git a/rdbmswriter/src/main/libs/db2jcc4.jar b/rdbmswriter/src/main/libs/db2jcc4.jar deleted file mode 100755 index fc53cfd9..00000000 Binary files a/rdbmswriter/src/main/libs/db2jcc4.jar and /dev/null differ diff --git a/rdbmswriter/src/main/resources/plugin.json b/rdbmswriter/src/main/resources/plugin.json index fa771af2..bf32140a 100755 --- a/rdbmswriter/src/main/resources/plugin.json +++ 
b/rdbmswriter/src/main/resources/plugin.json @@ -3,5 +3,5 @@ "class": "com.alibaba.datax.plugin.reader.rdbmswriter.RdbmsWriter", "description": "useScene: prod. mechanism: Jdbc connection using the database, execute select sql, retrieve data from the ResultSet. warn: The more you know about the database, the less problems you encounter.", "developer": "alibaba", - "drivers":["dm.jdbc.driver.DmDriver", "com.sybase.jdbc3.jdbc.SybDriver", "com.edb.Driver"] + "drivers":["dm.jdbc.driver.DmDriver", "com.sybase.jdbc3.jdbc.SybDriver", "com.edb.Driver", "com.ibm.db2.jcc.DB2Driver"] } diff --git a/selectdbwriter/doc/selectdbwriter.md b/selectdbwriter/doc/selectdbwriter.md new file mode 100644 index 00000000..cdf39263 --- /dev/null +++ b/selectdbwriter/doc/selectdbwriter.md @@ -0,0 +1,428 @@ +# SelectdbWriter 插件文档 + +## 1 快速介绍 +SelectdbWriter支持将大批量数据写入SELECTDB中。 + +## 2 实现原理 +SelectdbWriter 通过调用selectdb api (/copy/upload),返回一个重定向的S3地址,使用Http向S3地址发送字节流,设置参数达到要求时执行copy into + +## 3 编译 + +1. 运行 init-env.sh + +2. 编译 selectdbwriter: + +i. 单独编译 selectdbwriter 插件: + + ```text + mvn clean install -pl plugin-rdbms-util,selectdbwriter -DskipTests + ``` + + +ii.编译整个 DataX 项目: + + ```text + mvn package assembly:assembly -Dmaven.test.skip=true + ``` +产出在 target/datax/datax/. +hdfsreader, hdfswriter and oscarwriter 这三个插件需要额外的jar包。如果你并不需要这些插件,可以在 DataX/pom.xml 中删除这些插件的模块。 + + +iii.编译错误 + +如遇到如下编译错误: + ```text + Could not find artifact com.alibaba.datax:datax-all:pom:0.0.1-SNAPSHOT + ``` + +可尝试以下方式解决: + +a.下载 alibaba-datax-maven-m2-20210928.tar.gz + +b.解压后,将得到的 alibaba/datax/ 目录,拷贝到所使用的 maven 对应的 .m2/repository/com/alibaba/ 下。 + +c.再次尝试编译。 + +## 3 功能说明 + +### 3.1 配置样例 + +这里是一份从Stream读取数据后导入至selectdb的配置文件。 + +``` +{ + "job":{ + "content":[ + { + "reader":{ + "name":"streamreader", + "parameter":{ + "column":[ + { + "type":"string", + "random":"0,31" + }, + { + "type":"string", + "random":"0,31" + }, + { + "type":"string", + "random":"0,31" + }, + { + "type":"string", + "random":"0,31" + }, + { + "type":"long", + "random":"0,5" + }, + { + "type":"string", + "random":"0,10" + }, + { + "type":"string", + "random":"0,5" + }, + { + "type":"string", + "random":"0,31" + }, + { + "type":"string", + "random":"0,31" + }, + { + "type":"string", + "random":"0,21" + }, + { + "type":"string", + "random":"0,31" + }, + { + "type":"long", + "random":"0,10" + }, + { + "type":"long", + "random":"0,20" + }, + { + "type":"date", + "random":"2022-01-01 12:00:00,2023-01-01 12:00:00" + }, + { + "type":"long", + "random":"0,10" + }, + { + "type":"date", + "random":"2022-01-01 12:00:00,2023-01-01 12:00:00" + }, + { + "type":"string", + "random":"0,10" + }, + { + "type":"long", + "random":"0,10" + }, + { + "type":"date", + "random":"2022-01-01 12:00:00,2023-01-01 12:00:00" + }, + { + "type":"long", + "random":"0,10" + }, + { + "type":"date", + "random":"2022-01-01 12:00:00,2023-01-01 12:00:00" + }, + { + "type":"long", + "random":"0,10" + }, + { + "type":"date", + "random":"2022-01-01 12:00:00,2023-01-01 12:00:00" + }, + { + "type":"long", + "random":"0,10" + }, + { + "type":"date", + "random":"2022-01-01 12:00:00,2023-01-01 12:00:00" + }, + { + "type":"string", + "random":"0,100" + }, + { + "type":"string", + "random":"0,1" + }, + { + "type":"long", + "random":"0,1" + }, + { + "type":"string", + "random":"0,64" + }, + { + "type":"string", + "random":"0,20" + }, + { + "type":"string", + "random":"0,31" + }, + { + "type":"long", + "random":"0,3" + }, + { + "type":"long", + "random":"0,3" + }, + { + "type":"long", + "random":"0,19" + }, + { + 
"type":"date", + "random":"2022-01-01 12:00:00,2023-01-01 12:00:00" + }, + { + "type":"string", + "random":"0,1" + } + ], + "sliceRecordCount":10 + } + }, + "writer":{ + "name":"selectdbwriter", + "parameter":{ + "loadUrl":[ + "xxx:47150" + ], + "loadProps":{ + "file.type":"json", + "file.strip_outer_array":"true" + }, + "column":[ + "id", + "table_id", + "table_no", + "table_name", + "table_status", + "no_disturb", + "dinner_type", + "member_id", + "reserve_bill_no", + "pre_order_no", + "queue_num", + "person_num", + "open_time", + "open_time_format", + "order_time", + "order_time_format", + "table_bill_id", + "offer_time", + "offer_time_format", + "confirm_bill_time", + "confirm_bill_time_format", + "bill_time", + "bill_time_format", + "clear_time", + "clear_time_format", + "table_message", + "bill_close", + "table_type", + "pad_mac", + "company_id", + "shop_id", + "is_sync", + "table_split_no", + "ts", + "ts_format", + "dr" + ], + "username":"admin", + "password":"SelectDB2022", + "postSql":[ + + ], + "preSql":[ + + ], + "connection":[ + { + "jdbcUrl":"jdbc:mysql://xxx:34142/cl_test", + "table":[ + "ods_pos_pro_table_dynamic_delta_v4" + ], + "selectedDatabase":"cl_test" + } + ], + "maxBatchRows":1000000, + "maxBatchByteSize":536870912000 + } + } + } + ], + "setting":{ + "errorLimit":{ + "percentage":0.02, + "record":0 + }, + "speed":{ + "channel":5 + } + } + } +} + +``` + +### 3.2 参数说明 + +```text + **jdbcUrl** + + - 描述:selectdb 的 JDBC 连接串,用户执行 preSql 或 postSQL。 + - 必选:是 + - 默认值:无 + +* **loadUrl** + + - 描述:作为 selecdb 的连接目标。格式为 "ip:port"。其中 IP 是 selectdb的private-link,port 是selectdb 集群的 http_port + - 必选:是 + - 默认值:无 + +* **username** + + - 描述:访问selectdb数据库的用户名 + - 必选:是 + - 默认值:无 + +* **password** + + - 描述:访问selectdb数据库的密码 + - 必选:否 + - 默认值:空 + +* **connection.selectedDatabase** + - 描述:需要写入的selectdb数据库名称。 + - 必选:是 + - 默认值:无 + +* **connection.table** + - 描述:需要写入的selectdb表名称。 + - 必选:是 + - 默认值:无 + +* **column** + + - 描述:目的表**需要写入数据**的字段,这些字段将作为生成的 Json 数据的字段名。字段之间用英文逗号分隔。例如: "column": ["id","name","age"]。 + - 必选:是 + - 默认值:否 + +* **preSql** + + - 描述:写入数据到目的表前,会先执行这里的标准语句。 + - 必选:否 + - 默认值:无 + +* **postSql** + + - 描述:写入数据到目的表后,会执行这里的标准语句。 + - 必选:否 + - 默认值:无 + + +* **maxBatchRows** + + - 描述:每批次导入数据的最大行数。和 **batchSize** 共同控制每批次的导入数量。每批次数据达到两个阈值之一,即开始导入这一批次的数据。 + - 必选:否 + - 默认值:500000 + +* **batchSize** + + - 描述:每批次导入数据的最大数据量。和 **maxBatchRows** 共同控制每批次的导入数量。每批次数据达到两个阈值之一,即开始导入这一批次的数据。 + - 必选:否 + - 默认值:90M + +* **maxRetries** + + - 描述:每批次导入数据失败后的重试次数。 + - 必选:否 + - 默认值:3 + +* **labelPrefix** + + - 描述:每批次上传文件的 label 前缀。最终的 label 将有 `labelPrefix + UUID` 组成全局唯一的 label,确保数据不会重复导入 + - 必选:否 + - 默认值:`datax_selectdb_writer_` + +* **loadProps** + + - 描述:COPY INOT 的请求参数 + + 这里包括导入的数据格式:file.type等,导入数据格式默认我们使用csv,支持JSON,具体可以参照下面类型转换部分 + + - 必选:否 + + - 默认值:无 + +* **clusterName** + + - 描述:selectdb could 集群名称 + + - 必选:否 + + - 默认值:无 + +* **flushQueueLength** + + - 描述:队列长度 + + - 必选:否 + + - 默认值:1 + +* **flushInterval** + + - 描述:数据写入批次的时间间隔,如果maxBatchRows 和 batchSize 参数设置的有很大,那么很可能达不到你这设置的数据量大小,会执行导入。 + + - 必选:否 + + - 默认值:30000ms +``` + +### 类型转换 + +默认传入的数据均会被转为字符串,并以`\t`作为列分隔符,`\n`作为行分隔符,组成`csv`文件进行Selectdb导入操作。 + +默认是csv格式导入,如需更改列分隔符, 则正确配置 `loadProps` 即可: + +```json +"loadProps": { + "file.column_separator": "\\x01", + "file.line_delimiter": "\\x02" +} +``` + +如需更改导入格式为`json`, 则正确配置 `loadProps` 即可: +```json +"loadProps": { + "file.type": "json", + "file.strip_outer_array": true +} +``` \ No newline at end of file diff --git a/selectdbwriter/doc/stream2selectdb.json b/selectdbwriter/doc/stream2selectdb.json new file 
mode 100644 index 00000000..d5e14c48 --- /dev/null +++ b/selectdbwriter/doc/stream2selectdb.json @@ -0,0 +1,93 @@ +{ + "core":{ + "transport":{ + "channel":{ + "speed":{ + "byte":10485760 + } + } + } + }, + "job":{ + "content":[ + { + "reader":{ + "name":"streamreader", + "parameter":{ + "column":[ + { + "type":"string", + "value":"DataX" + }, + { + "type":"int", + "value":19890604 + }, + { + "type":"date", + "value":"1989-06-04 00:00:00" + }, + { + "type":"bool", + "value":true + }, + { + "type":"string", + "value":"test" + } + ], + "sliceRecordCount":1000000 + } + }, + "writer":{ + "name":"selectdbwriter", + "parameter":{ + "loadUrl":[ + "xxx:35871" + ], + "loadProps":{ + "file.type":"json", + "file.strip_outer_array":"true" + }, + "database":"db1", + "column":[ + "k1", + "k2", + "k3", + "k4", + "k5" + ], + "username":"admin", + "password":"SelectDB2022", + "postSql":[ + + ], + "preSql":[ + + ], + "connection":[ + { + "jdbcUrl":"jdbc:mysql://xxx:32386/cl_test", + "table":[ + "test_selectdb" + ], + "selectedDatabase":"cl_test" + } + ], + "maxBatchRows":200000, + "batchSize":53687091200 + } + } + } + ], + "setting":{ + "errorLimit":{ + "percentage":0.02, + "record":0 + }, + "speed":{ + "byte":10485760 + } + } + } +} \ No newline at end of file diff --git a/selectdbwriter/pom.xml b/selectdbwriter/pom.xml new file mode 100644 index 00000000..fd2a19f7 --- /dev/null +++ b/selectdbwriter/pom.xml @@ -0,0 +1,96 @@ + + + + datax-all + com.alibaba.datax + 0.0.1-SNAPSHOT + + 4.0.0 + selectdbwriter + selectdbwriter + jar + + + com.alibaba.datax + datax-common + ${datax-project-version} + + + slf4j-log4j12 + org.slf4j + + + + + org.slf4j + slf4j-api + + + ch.qos.logback + logback-classic + + + com.alibaba.datax + plugin-rdbms-util + ${datax-project-version} + + + mysql + mysql-connector-java + ${mysql.driver.version} + + + org.apache.httpcomponents + httpclient + 4.5.13 + + + com.fasterxml.jackson.core + jackson-annotations + 2.13.3 + + + com.fasterxml.jackson.core + jackson-core + 2.13.3 + + + com.fasterxml.jackson.core + jackson-databind + 2.13.3 + + + + + + + maven-compiler-plugin + + ${jdk-version} + ${jdk-version} + ${project-sourceEncoding} + + + + + maven-assembly-plugin + + + src/main/assembly/package.xml + + datax + + + + dwzip + package + + single + + + + + + + diff --git a/selectdbwriter/src/main/assembly/package.xml b/selectdbwriter/src/main/assembly/package.xml new file mode 100644 index 00000000..1ea0009e --- /dev/null +++ b/selectdbwriter/src/main/assembly/package.xml @@ -0,0 +1,34 @@ + + + + + dir + + false + + + src/main/resources + + plugin.json + plugin_job_template.json + + plugin/writer/selectdbwriter + + + target/ + + selectdbwriter-0.0.1-SNAPSHOT.jar + + plugin/writer/selectdbwriter + + + + + false + plugin/writer/selectdbwriter/libs + runtime + + + diff --git a/selectdbwriter/src/main/java/com/alibaba/datax/plugin/writer/selectdbwriter/BaseResponse.java b/selectdbwriter/src/main/java/com/alibaba/datax/plugin/writer/selectdbwriter/BaseResponse.java new file mode 100644 index 00000000..c02f725f --- /dev/null +++ b/selectdbwriter/src/main/java/com/alibaba/datax/plugin/writer/selectdbwriter/BaseResponse.java @@ -0,0 +1,23 @@ +package com.alibaba.datax.plugin.writer.selectdbwriter; + +import com.fasterxml.jackson.annotation.JsonIgnoreProperties; + +@JsonIgnoreProperties(ignoreUnknown = true) +public class BaseResponse { + private int code; + private String msg; + private T data; + private int count; + + public int getCode() { + return code; + } + + public String getMsg() { + return 
msg; + } + + public T getData(){ + return data; + } +} diff --git a/selectdbwriter/src/main/java/com/alibaba/datax/plugin/writer/selectdbwriter/CopyIntoResp.java b/selectdbwriter/src/main/java/com/alibaba/datax/plugin/writer/selectdbwriter/CopyIntoResp.java new file mode 100644 index 00000000..4da002ac --- /dev/null +++ b/selectdbwriter/src/main/java/com/alibaba/datax/plugin/writer/selectdbwriter/CopyIntoResp.java @@ -0,0 +1,26 @@ +package com.alibaba.datax.plugin.writer.selectdbwriter; + +import com.fasterxml.jackson.annotation.JsonIgnoreProperties; + +import java.util.Map; + +@JsonIgnoreProperties(ignoreUnknown = true) +public class CopyIntoResp extends BaseResponse{ + private String code; + private String exception; + + private Map result; + + public String getDataCode() { + return code; + } + + public String getException() { + return exception; + } + + public Map getResult() { + return result; + } + +} diff --git a/selectdbwriter/src/main/java/com/alibaba/datax/plugin/writer/selectdbwriter/CopySQLBuilder.java b/selectdbwriter/src/main/java/com/alibaba/datax/plugin/writer/selectdbwriter/CopySQLBuilder.java new file mode 100644 index 00000000..62910d5d --- /dev/null +++ b/selectdbwriter/src/main/java/com/alibaba/datax/plugin/writer/selectdbwriter/CopySQLBuilder.java @@ -0,0 +1,40 @@ +package com.alibaba.datax.plugin.writer.selectdbwriter; + + +import java.util.Map; +import java.util.StringJoiner; + +public class CopySQLBuilder { + private final static String COPY_SYNC = "copy.async"; + private final String fileName; + private final Keys options; + private Map properties; + + + + public CopySQLBuilder(Keys options, String fileName) { + this.options=options; + this.fileName=fileName; + this.properties=options.getLoadProps(); + } + + public String buildCopySQL(){ + StringBuilder sb = new StringBuilder(); + sb.append("COPY INTO ") + .append(options.getDatabase() + "." 
+ options.getTable()) + .append(" FROM @~('").append(fileName).append("') ") + .append("PROPERTIES ("); + + //copy into must be sync + properties.put(COPY_SYNC,false); + StringJoiner props = new StringJoiner(","); + for(Map.Entry entry : properties.entrySet()){ + String key = String.valueOf(entry.getKey()); + String value = String.valueOf(entry.getValue()); + String prop = String.format("'%s'='%s'",key,value); + props.add(prop); + } + sb.append(props).append(" )"); + return sb.toString(); + } +} diff --git a/selectdbwriter/src/main/java/com/alibaba/datax/plugin/writer/selectdbwriter/DelimiterParser.java b/selectdbwriter/src/main/java/com/alibaba/datax/plugin/writer/selectdbwriter/DelimiterParser.java new file mode 100644 index 00000000..fa6b397c --- /dev/null +++ b/selectdbwriter/src/main/java/com/alibaba/datax/plugin/writer/selectdbwriter/DelimiterParser.java @@ -0,0 +1,54 @@ +package com.alibaba.datax.plugin.writer.selectdbwriter; + +import com.google.common.base.Strings; + +import java.io.StringWriter; + +public class DelimiterParser { + + private static final String HEX_STRING = "0123456789ABCDEF"; + + public static String parse(String sp, String dSp) throws RuntimeException { + if ( Strings.isNullOrEmpty(sp)) { + return dSp; + } + if (!sp.toUpperCase().startsWith("\\X")) { + return sp; + } + String hexStr = sp.substring(2); + // check hex str + if (hexStr.isEmpty()) { + throw new RuntimeException("Failed to parse delimiter: Hex str is empty"); + } + if (hexStr.length() % 2 != 0) { + throw new RuntimeException("Failed to parse delimiter: Hex str length error"); + } + for (char hexChar : hexStr.toUpperCase().toCharArray()) { + if (HEX_STRING.indexOf(hexChar) == -1) { + throw new RuntimeException("Failed to parse delimiter: Hex str format error"); + } + } + // transform to separator + StringWriter writer = new StringWriter(); + for (byte b : hexStrToBytes(hexStr)) { + writer.append((char) b); + } + return writer.toString(); + } + + private static byte[] hexStrToBytes(String hexStr) { + String upperHexStr = hexStr.toUpperCase(); + int length = upperHexStr.length() / 2; + char[] hexChars = upperHexStr.toCharArray(); + byte[] bytes = new byte[length]; + for (int i = 0; i < length; i++) { + int pos = i * 2; + bytes[i] = (byte) (charToByte(hexChars[pos]) << 4 | charToByte(hexChars[pos + 1])); + } + return bytes; + } + + private static byte charToByte(char c) { + return (byte) HEX_STRING.indexOf(c); + } +} diff --git a/selectdbwriter/src/main/java/com/alibaba/datax/plugin/writer/selectdbwriter/HttpPostBuilder.java b/selectdbwriter/src/main/java/com/alibaba/datax/plugin/writer/selectdbwriter/HttpPostBuilder.java new file mode 100644 index 00000000..9471debb --- /dev/null +++ b/selectdbwriter/src/main/java/com/alibaba/datax/plugin/writer/selectdbwriter/HttpPostBuilder.java @@ -0,0 +1,51 @@ +package com.alibaba.datax.plugin.writer.selectdbwriter; + +import org.apache.commons.codec.binary.Base64; +import org.apache.http.HttpEntity; +import org.apache.http.HttpHeaders; +import org.apache.http.client.methods.HttpPost; + +import java.nio.charset.StandardCharsets; +import java.util.HashMap; +import java.util.Map; + + +public class HttpPostBuilder { + String url; + Map header; + HttpEntity httpEntity; + public HttpPostBuilder() { + header = new HashMap<>(); + } + + public HttpPostBuilder setUrl(String url) { + this.url = url; + return this; + } + + public HttpPostBuilder addCommonHeader() { + header.put(HttpHeaders.EXPECT, "100-continue"); + return this; + } + + public HttpPostBuilder baseAuth(String 
user, String password) { + final String authInfo = user + ":" + password; + byte[] encoded = Base64.encodeBase64(authInfo.getBytes(StandardCharsets.UTF_8)); + header.put(HttpHeaders.AUTHORIZATION, "Basic " + new String(encoded)); + return this; + } + + public HttpPostBuilder setEntity(HttpEntity httpEntity) { + this.httpEntity = httpEntity; + return this; + } + + public HttpPost build() { + SelectdbUtil.checkNotNull(url); + SelectdbUtil.checkNotNull(httpEntity); + HttpPost put = new HttpPost(url); + header.forEach(put::setHeader); + put.setEntity(httpEntity); + return put; + } +} diff --git a/selectdbwriter/src/main/java/com/alibaba/datax/plugin/writer/selectdbwriter/HttpPutBuilder.java b/selectdbwriter/src/main/java/com/alibaba/datax/plugin/writer/selectdbwriter/HttpPutBuilder.java new file mode 100644 index 00000000..59d7dbca --- /dev/null +++ b/selectdbwriter/src/main/java/com/alibaba/datax/plugin/writer/selectdbwriter/HttpPutBuilder.java @@ -0,0 +1,65 @@ +package com.alibaba.datax.plugin.writer.selectdbwriter; + +import org.apache.commons.codec.binary.Base64; +import org.apache.http.HttpEntity; +import org.apache.http.HttpHeaders; +import org.apache.http.client.methods.HttpPut; +import org.apache.http.entity.StringEntity; + +import java.nio.charset.StandardCharsets; +import java.util.HashMap; +import java.util.Map; + +public class HttpPutBuilder { + String url; + Map header; + HttpEntity httpEntity; + public HttpPutBuilder() { + header = new HashMap<>(); + } + + public HttpPutBuilder setUrl(String url) { + this.url = url; + return this; + } + + public HttpPutBuilder addFileName(String fileName){ + header.put("fileName", fileName); + return this; + } + + public HttpPutBuilder setEmptyEntity() { + try { + this.httpEntity = new StringEntity(""); + } catch (Exception e) { + throw new IllegalArgumentException(e); + } + return this; + } + + public HttpPutBuilder addCommonHeader() { + header.put(HttpHeaders.EXPECT, "100-continue"); + return this; + } + + public HttpPutBuilder baseAuth(String user, String password) { + final String authInfo = user + ":" + password; + byte[] encoded = Base64.encodeBase64(authInfo.getBytes(StandardCharsets.UTF_8)); + header.put(HttpHeaders.AUTHORIZATION, "Basic " + new String(encoded)); + return this; + } + + public HttpPutBuilder setEntity(HttpEntity httpEntity) { + this.httpEntity = httpEntity; + return this; + } + + public HttpPut build() { + SelectdbUtil.checkNotNull(url); + SelectdbUtil.checkNotNull(httpEntity); + HttpPut put = new HttpPut(url); + header.forEach(put::setHeader); + put.setEntity(httpEntity); + return put; + } +} diff --git a/selectdbwriter/src/main/java/com/alibaba/datax/plugin/writer/selectdbwriter/Keys.java b/selectdbwriter/src/main/java/com/alibaba/datax/plugin/writer/selectdbwriter/Keys.java new file mode 100644 index 00000000..6c767d93 --- /dev/null +++ b/selectdbwriter/src/main/java/com/alibaba/datax/plugin/writer/selectdbwriter/Keys.java @@ -0,0 +1,186 @@ +package com.alibaba.datax.plugin.writer.selectdbwriter; + +import com.alibaba.datax.common.exception.DataXException; +import com.alibaba.datax.common.util.Configuration; +import com.alibaba.datax.plugin.rdbms.util.DBUtilErrorCode; + +import java.io.Serializable; +import java.util.List; +import java.util.Map; +import java.util.stream.Collectors; + +public class Keys implements Serializable { + + private static final long serialVersionUID = 1l; + private static final int DEFAULT_MAX_RETRIES = 3; + private static final int BATCH_ROWS = 500000; + private static final long 
DEFAULT_FLUSH_INTERVAL = 30000; + + private static final String LOAD_PROPS_FORMAT = "file.type"; + public enum StreamLoadFormat { + CSV, JSON; + } + + private static final String USERNAME = "username"; + private static final String PASSWORD = "password"; + private static final String DATABASE = "connection[0].selectedDatabase"; + private static final String TABLE = "connection[0].table[0]"; + private static final String COLUMN = "column"; + private static final String PRE_SQL = "preSql"; + private static final String POST_SQL = "postSql"; + private static final String JDBC_URL = "connection[0].jdbcUrl"; + private static final String LABEL_PREFIX = "labelPrefix"; + private static final String MAX_BATCH_ROWS = "maxBatchRows"; + private static final String MAX_BATCH_SIZE = "batchSize"; + private static final String FLUSH_INTERVAL = "flushInterval"; + private static final String LOAD_URL = "loadUrl"; + private static final String FLUSH_QUEUE_LENGTH = "flushQueueLength"; + private static final String LOAD_PROPS = "loadProps"; + + private static final String DEFAULT_LABEL_PREFIX = "datax_selectdb_writer_"; + + private static final long DEFAULT_MAX_BATCH_SIZE = 90 * 1024 * 1024; //default 90M + + private static final String CLUSTER_NAME = "clusterName"; + + private static final String MAX_RETRIES = "maxRetries"; + private final Configuration options; + + private List infoSchemaColumns; + private List userSetColumns; + private boolean isWildcardColumn; + + public Keys ( Configuration options) { + this.options = options; + this.userSetColumns = options.getList(COLUMN, String.class).stream().map(str -> str.replace("`", "")).collect(Collectors.toList()); + if (1 == options.getList(COLUMN, String.class).size() && "*".trim().equals(options.getList(COLUMN, String.class).get(0))) { + this.isWildcardColumn = true; + } + } + + public void doPretreatment() { + validateRequired(); + validateStreamLoadUrl(); + } + + public String getJdbcUrl() { + return options.getString(JDBC_URL); + } + + public String getDatabase() { + return options.getString(DATABASE); + } + + public String getTable() { + return options.getString(TABLE); + } + + public String getUsername() { + return options.getString(USERNAME); + } + + public String getPassword() { + return options.getString(PASSWORD); + } + + public String getClusterName(){ + return options.getString(CLUSTER_NAME); + } + + public String getLabelPrefix() { + String label = options.getString(LABEL_PREFIX); + return null == label ? DEFAULT_LABEL_PREFIX : label; + } + + public List getLoadUrlList() { + return options.getList(LOAD_URL, String.class); + } + + public List getColumns() { + if (isWildcardColumn) { + return this.infoSchemaColumns; + } + return this.userSetColumns; + } + + public boolean isWildcardColumn() { + return this.isWildcardColumn; + } + + public void setInfoCchemaColumns(List cols) { + this.infoSchemaColumns = cols; + } + + public List getPreSqlList() { + return options.getList(PRE_SQL, String.class); + } + + public List getPostSqlList() { + return options.getList(POST_SQL, String.class); + } + + public Map getLoadProps() { + return options.getMap(LOAD_PROPS); + } + + public int getMaxRetries() { + Integer retries = options.getInt(MAX_RETRIES); + return null == retries ? DEFAULT_MAX_RETRIES : retries; + } + + public int getBatchRows() { + Integer rows = options.getInt(MAX_BATCH_ROWS); + return null == rows ? BATCH_ROWS : rows; + } + + public long getBatchSize() { + Long size = options.getLong(MAX_BATCH_SIZE); + return null == size ? 
DEFAULT_MAX_BATCH_SIZE : size; + } + + public long getFlushInterval() { + Long interval = options.getLong(FLUSH_INTERVAL); + return null == interval ? DEFAULT_FLUSH_INTERVAL : interval; + } + + public int getFlushQueueLength() { + Integer len = options.getInt(FLUSH_QUEUE_LENGTH); + return null == len ? 1 : len; + } + + + public StreamLoadFormat getStreamLoadFormat() { + Map loadProps = getLoadProps(); + if (null == loadProps) { + return StreamLoadFormat.CSV; + } + if (loadProps.containsKey(LOAD_PROPS_FORMAT) + && StreamLoadFormat.JSON.name().equalsIgnoreCase(String.valueOf(loadProps.get(LOAD_PROPS_FORMAT)))) { + return StreamLoadFormat.JSON; + } + return StreamLoadFormat.CSV; + } + + private void validateStreamLoadUrl() { + List urlList = getLoadUrlList(); + for (String host : urlList) { + if (host.split(":").length < 2) { + throw DataXException.asDataXException(DBUtilErrorCode.CONF_ERROR, + "The format of loadUrl is not correct, please enter:[`fe_ip:fe_http_ip;fe_ip:fe_http_ip`]."); + } + } + } + + private void validateRequired() { + final String[] requiredOptionKeys = new String[]{ + USERNAME, + DATABASE, + TABLE, + COLUMN, + LOAD_URL + }; + for (String optionKey : requiredOptionKeys) { + options.getNecessaryValue(optionKey, DBUtilErrorCode.REQUIRED_VALUE); + } + } +} diff --git a/selectdbwriter/src/main/java/com/alibaba/datax/plugin/writer/selectdbwriter/SelectdbBaseCodec.java b/selectdbwriter/src/main/java/com/alibaba/datax/plugin/writer/selectdbwriter/SelectdbBaseCodec.java new file mode 100644 index 00000000..d2fc1224 --- /dev/null +++ b/selectdbwriter/src/main/java/com/alibaba/datax/plugin/writer/selectdbwriter/SelectdbBaseCodec.java @@ -0,0 +1,23 @@ +package com.alibaba.datax.plugin.writer.selectdbwriter; + +import com.alibaba.datax.common.element.Column; + +public class SelectdbBaseCodec { + protected String convertionField( Column col) { + if (null == col.getRawData() || Column.Type.NULL == col.getType()) { + return null; + } + if ( Column.Type.BOOL == col.getType()) { + return String.valueOf(col.asLong()); + } + if ( Column.Type.BYTES == col.getType()) { + byte[] bts = (byte[])col.getRawData(); + long value = 0; + for (int i = 0; i < bts.length; i++) { + value += (bts[bts.length - i - 1] & 0xffL) << (8 * i); + } + return String.valueOf(value); + } + return col.asString(); + } +} diff --git a/selectdbwriter/src/main/java/com/alibaba/datax/plugin/writer/selectdbwriter/SelectdbCodec.java b/selectdbwriter/src/main/java/com/alibaba/datax/plugin/writer/selectdbwriter/SelectdbCodec.java new file mode 100644 index 00000000..b7e9d6ae --- /dev/null +++ b/selectdbwriter/src/main/java/com/alibaba/datax/plugin/writer/selectdbwriter/SelectdbCodec.java @@ -0,0 +1,10 @@ +package com.alibaba.datax.plugin.writer.selectdbwriter; + +import com.alibaba.datax.common.element.Record; + +import java.io.Serializable; + +public interface SelectdbCodec extends Serializable { + + String codec( Record row); +} diff --git a/selectdbwriter/src/main/java/com/alibaba/datax/plugin/writer/selectdbwriter/SelectdbCodecFactory.java b/selectdbwriter/src/main/java/com/alibaba/datax/plugin/writer/selectdbwriter/SelectdbCodecFactory.java new file mode 100644 index 00000000..567f4c0b --- /dev/null +++ b/selectdbwriter/src/main/java/com/alibaba/datax/plugin/writer/selectdbwriter/SelectdbCodecFactory.java @@ -0,0 +1,19 @@ +package com.alibaba.datax.plugin.writer.selectdbwriter; + +import java.util.Map; + +public class SelectdbCodecFactory { + public SelectdbCodecFactory (){ + + } + public static SelectdbCodec createCodec( 
Keys writerOptions) { + if ( Keys.StreamLoadFormat.CSV.equals(writerOptions.getStreamLoadFormat())) { + Map props = writerOptions.getLoadProps(); + return new SelectdbCsvCodec (null == props || !props.containsKey("file.column_separator") ? null : String.valueOf(props.get("file.column_separator"))); + } + if ( Keys.StreamLoadFormat.JSON.equals(writerOptions.getStreamLoadFormat())) { + return new SelectdbJsonCodec (writerOptions.getColumns()); + } + throw new RuntimeException("Failed to create row serializer, unsupported `format` from stream load properties."); + } +} diff --git a/selectdbwriter/src/main/java/com/alibaba/datax/plugin/writer/selectdbwriter/SelectdbCopyIntoObserver.java b/selectdbwriter/src/main/java/com/alibaba/datax/plugin/writer/selectdbwriter/SelectdbCopyIntoObserver.java new file mode 100644 index 00000000..c9228b22 --- /dev/null +++ b/selectdbwriter/src/main/java/com/alibaba/datax/plugin/writer/selectdbwriter/SelectdbCopyIntoObserver.java @@ -0,0 +1,233 @@ +package com.alibaba.datax.plugin.writer.selectdbwriter; + +import com.fasterxml.jackson.core.type.TypeReference; +import com.fasterxml.jackson.databind.ObjectMapper; +import org.apache.commons.lang3.StringUtils; +import org.apache.http.Header; +import org.apache.http.HttpEntity; +import org.apache.http.client.methods.CloseableHttpResponse; +import org.apache.http.entity.InputStreamEntity; +import org.apache.http.entity.StringEntity; +import org.apache.http.impl.client.CloseableHttpClient; +import org.apache.http.impl.client.HttpClientBuilder; +import org.apache.http.impl.client.HttpClients; +import org.apache.http.util.EntityUtils; +import org.slf4j.Logger; +import org.slf4j.LoggerFactory; + +import java.io.ByteArrayInputStream; +import java.io.IOException; +import java.net.HttpURLConnection; +import java.net.URL; +import java.nio.ByteBuffer; +import java.nio.charset.StandardCharsets; +import java.util.HashMap; +import java.util.List; +import java.util.Map; +import java.util.regex.Pattern; + +public class SelectdbCopyIntoObserver { + private static final Logger LOG = LoggerFactory.getLogger(SelectdbCopyIntoObserver.class); + + private Keys options; + private long pos; + public static final int SUCCESS = 0; + public static final String FAIL = "1"; + private static final ObjectMapper OBJECT_MAPPER = new ObjectMapper(); + private final HttpClientBuilder httpClientBuilder = HttpClients + .custom() + .disableRedirectHandling(); + private CloseableHttpClient httpClient; + private static final String UPLOAD_URL_PATTERN = "%s/copy/upload"; + private static final String COMMIT_PATTERN = "%s/copy/query"; + private static final Pattern COMMITTED_PATTERN = Pattern.compile("errCode = 2, detailMessage = No files can be copied, matched (\\d+) files, " + "filtered (\\d+) files because files may be loading or loaded"); + + + public SelectdbCopyIntoObserver(Keys options) { + this.options = options; + this.httpClient = httpClientBuilder.build(); + + } + + public void streamLoad(WriterTuple data) throws Exception { + String host = getLoadHost(); + if (host == null) { + throw new RuntimeException("load_url cannot be empty, or the host cannot connect.Please check your configuration."); + } + String loadUrl = String.format(UPLOAD_URL_PATTERN, host); + String uploadAddress = getUploadAddress(loadUrl, data.getLabel()); + put(uploadAddress, data.getLabel(), addRows(data.getRows(), data.getBytes().intValue())); + executeCopy(host,data.getLabel()); + + } + + private String getUploadAddress(String loadUrl, String fileName) throws IOException { + 
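+        // Request an upload location first: PUT to {host}/copy/upload with the batch label passed in the "fileName" header.
+        // SelectDB answers with a 307 redirect whose Location header is the object-storage (S3) upload URL; any other status is treated as a failure.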
HttpPutBuilder putBuilder = new HttpPutBuilder(); + putBuilder.setUrl(loadUrl) + .addFileName(fileName) + .addCommonHeader() + .setEmptyEntity() + .baseAuth(options.getUsername(), options.getPassword()); + CloseableHttpResponse execute = httpClientBuilder.build().execute(putBuilder.build()); + int statusCode = execute.getStatusLine().getStatusCode(); + String reason = execute.getStatusLine().getReasonPhrase(); + if (statusCode == 307) { + Header location = execute.getFirstHeader("location"); + String uploadAddress = location.getValue(); + LOG.info("redirect to s3:{}", uploadAddress); + return uploadAddress; + } else { + HttpEntity entity = execute.getEntity(); + String result = entity == null ? null : EntityUtils.toString(entity); + LOG.error("Failed get the redirected address, status {}, reason {}, response {}", statusCode, reason, result); + throw new RuntimeException("Could not get the redirected address."); + } + + } + + private byte[] addRows(List rows, int totalBytes) { + if (Keys.StreamLoadFormat.CSV.equals(options.getStreamLoadFormat())) { + Map props = (options.getLoadProps() == null ? new HashMap<>() : options.getLoadProps()); + byte[] lineDelimiter = DelimiterParser.parse((String) props.get("file.line_delimiter"), "\n").getBytes(StandardCharsets.UTF_8); + ByteBuffer bos = ByteBuffer.allocate(totalBytes + rows.size() * lineDelimiter.length); + for (byte[] row : rows) { + bos.put(row); + bos.put(lineDelimiter); + } + return bos.array(); + } + + if (Keys.StreamLoadFormat.JSON.equals(options.getStreamLoadFormat())) { + ByteBuffer bos = ByteBuffer.allocate(totalBytes + (rows.isEmpty() ? 2 : rows.size() + 1)); + bos.put("[".getBytes(StandardCharsets.UTF_8)); + byte[] jsonDelimiter = ",".getBytes(StandardCharsets.UTF_8); + boolean isFirstElement = true; + for (byte[] row : rows) { + if (!isFirstElement) { + bos.put(jsonDelimiter); + } + bos.put(row); + isFirstElement = false; + } + bos.put("]".getBytes(StandardCharsets.UTF_8)); + return bos.array(); + } + throw new RuntimeException("Failed to join rows data, unsupported `file.type` from copy into properties:"); + } + + public void put(String loadUrl, String fileName, byte[] data) throws IOException { + LOG.info(String.format("Executing upload file to: '%s', size: '%s'", loadUrl, data.length)); + HttpPutBuilder putBuilder = new HttpPutBuilder(); + putBuilder.setUrl(loadUrl) + .addCommonHeader() + .setEntity(new InputStreamEntity(new ByteArrayInputStream(data))); + CloseableHttpResponse response = httpClient.execute(putBuilder.build()); + final int statusCode = response.getStatusLine().getStatusCode(); + if (statusCode != 200) { + String result = response.getEntity() == null ? 
null : EntityUtils.toString(response.getEntity()); + LOG.error("upload file {} error, response {}", fileName, result); + throw new SelectdbWriterException("upload file error: " + fileName,true); + } + } + + private String getLoadHost() { + List hostList = options.getLoadUrlList(); + long tmp = pos + hostList.size(); + for (; pos < tmp; pos++) { + String host = new StringBuilder("http://").append(hostList.get((int) (pos % hostList.size()))).toString(); + if (checkConnection(host)) { + return host; + } + } + return null; + } + + private boolean checkConnection(String host) { + try { + URL url = new URL(host); + HttpURLConnection co = (HttpURLConnection) url.openConnection(); + co.setConnectTimeout(5000); + co.connect(); + co.disconnect(); + return true; + } catch (Exception e1) { + e1.printStackTrace(); + return false; + } + } + + + /** + * execute copy into + */ + public void executeCopy(String hostPort, String fileName) throws IOException{ + long start = System.currentTimeMillis(); + CopySQLBuilder copySQLBuilder = new CopySQLBuilder(options, fileName); + String copySQL = copySQLBuilder.buildCopySQL(); + LOG.info("build copy SQL is {}", copySQL); + Map params = new HashMap<>(); + params.put("sql", copySQL); + if(StringUtils.isNotBlank(options.getClusterName())){ + params.put("cluster",options.getClusterName()); + } + HttpPostBuilder postBuilder = new HttpPostBuilder(); + postBuilder.setUrl(String.format(COMMIT_PATTERN, hostPort)) + .baseAuth(options.getUsername(), options.getPassword()) + .setEntity(new StringEntity(OBJECT_MAPPER.writeValueAsString(params))); + + CloseableHttpResponse response = httpClient.execute(postBuilder.build()); + final int statusCode = response.getStatusLine().getStatusCode(); + final String reasonPhrase = response.getStatusLine().getReasonPhrase(); + String loadResult = ""; + if (statusCode != 200) { + LOG.warn("commit failed with status {} {}, reason {}", statusCode, hostPort, reasonPhrase); + throw new SelectdbWriterException("commit error with file: " + fileName,true); + } else if (response.getEntity() != null){ + loadResult = EntityUtils.toString(response.getEntity()); + boolean success = handleCommitResponse(loadResult); + if(success){ + LOG.info("commit success cost {}ms, response is {}", System.currentTimeMillis() - start, loadResult); + }else{ + throw new SelectdbWriterException("commit fail",true); + } + } + } + + public boolean handleCommitResponse(String loadResult) throws IOException { + BaseResponse baseResponse = OBJECT_MAPPER.readValue(loadResult, new TypeReference>(){}); + if(baseResponse.getCode() == SUCCESS){ + CopyIntoResp dataResp = baseResponse.getData(); + if(FAIL.equals(dataResp.getDataCode())){ + LOG.error("copy into execute failed, reason:{}", loadResult); + return false; + }else{ + Map result = dataResp.getResult(); + if(!result.get("state").equals("FINISHED") && !isCommitted(result.get("msg"))){ + LOG.error("copy into load failed, reason:{}", loadResult); + return false; + }else{ + return true; + } + } + }else{ + LOG.error("commit failed, reason:{}", loadResult); + return false; + } + } + + public static boolean isCommitted(String msg) { + return COMMITTED_PATTERN.matcher(msg).matches(); + } + + + public void close() throws IOException { + if (null != httpClient) { + try { + httpClient.close(); + } catch (IOException e) { + LOG.error("Closing httpClient failed.", e); + throw new RuntimeException("Closing httpClient failed.", e); + } + } + } +} diff --git 
a/selectdbwriter/src/main/java/com/alibaba/datax/plugin/writer/selectdbwriter/SelectdbCsvCodec.java b/selectdbwriter/src/main/java/com/alibaba/datax/plugin/writer/selectdbwriter/SelectdbCsvCodec.java new file mode 100644 index 00000000..57cad84d --- /dev/null +++ b/selectdbwriter/src/main/java/com/alibaba/datax/plugin/writer/selectdbwriter/SelectdbCsvCodec.java @@ -0,0 +1,27 @@ +package com.alibaba.datax.plugin.writer.selectdbwriter; + +import com.alibaba.datax.common.element.Record; + +public class SelectdbCsvCodec extends SelectdbBaseCodec implements SelectdbCodec { + + private static final long serialVersionUID = 1L; + + private final String columnSeparator; + + public SelectdbCsvCodec ( String sp) { + this.columnSeparator = DelimiterParser.parse(sp, "\t"); + } + + @Override + public String codec( Record row) { + StringBuilder sb = new StringBuilder(); + for (int i = 0; i < row.getColumnNumber(); i++) { + String value = convertionField(row.getColumn(i)); + sb.append(null == value ? "\\N" : value); + if (i < row.getColumnNumber() - 1) { + sb.append(columnSeparator); + } + } + return sb.toString(); + } +} diff --git a/selectdbwriter/src/main/java/com/alibaba/datax/plugin/writer/selectdbwriter/SelectdbJsonCodec.java b/selectdbwriter/src/main/java/com/alibaba/datax/plugin/writer/selectdbwriter/SelectdbJsonCodec.java new file mode 100644 index 00000000..8b1a3760 --- /dev/null +++ b/selectdbwriter/src/main/java/com/alibaba/datax/plugin/writer/selectdbwriter/SelectdbJsonCodec.java @@ -0,0 +1,33 @@ +package com.alibaba.datax.plugin.writer.selectdbwriter; + +import com.alibaba.datax.common.element.Record; +import com.alibaba.fastjson2.JSON; + +import java.util.HashMap; +import java.util.List; +import java.util.Map; + +public class SelectdbJsonCodec extends SelectdbBaseCodec implements SelectdbCodec { + + private static final long serialVersionUID = 1L; + + private final List fieldNames; + + public SelectdbJsonCodec ( List fieldNames) { + this.fieldNames = fieldNames; + } + + @Override + public String codec( Record row) { + if (null == fieldNames) { + return ""; + } + Map rowMap = new HashMap<> (fieldNames.size()); + int idx = 0; + for (String fieldName : fieldNames) { + rowMap.put(fieldName, convertionField(row.getColumn(idx))); + idx++; + } + return JSON.toJSONString(rowMap); + } +} diff --git a/selectdbwriter/src/main/java/com/alibaba/datax/plugin/writer/selectdbwriter/SelectdbUtil.java b/selectdbwriter/src/main/java/com/alibaba/datax/plugin/writer/selectdbwriter/SelectdbUtil.java new file mode 100644 index 00000000..6cfcc8bf --- /dev/null +++ b/selectdbwriter/src/main/java/com/alibaba/datax/plugin/writer/selectdbwriter/SelectdbUtil.java @@ -0,0 +1,113 @@ +package com.alibaba.datax.plugin.writer.selectdbwriter; + +import com.alibaba.datax.plugin.rdbms.util.DBUtil; +import com.alibaba.datax.plugin.rdbms.util.DataBaseType; +import com.alibaba.datax.plugin.rdbms.util.RdbmsException; +import com.alibaba.datax.plugin.rdbms.writer.Constant; +import com.alibaba.druid.sql.parser.ParserException; +import com.google.common.base.Strings; +import org.slf4j.Logger; +import org.slf4j.LoggerFactory; + +import java.sql.Connection; +import java.sql.ResultSet; +import java.sql.Statement; +import java.util.ArrayList; +import java.util.Collections; +import java.util.List; + +/** + * jdbc util + */ +public class SelectdbUtil { + private static final Logger LOG = LoggerFactory.getLogger(SelectdbUtil.class); + + private SelectdbUtil() {} + + public static List getDorisTableColumns( Connection conn, String 
databaseName, String tableName) { + String currentSql = String.format("SELECT COLUMN_NAME FROM `information_schema`.`COLUMNS` WHERE `TABLE_SCHEMA` = '%s' AND `TABLE_NAME` = '%s' ORDER BY `ORDINAL_POSITION` ASC;", databaseName, tableName); + List columns = new ArrayList<> (); + ResultSet rs = null; + try { + rs = DBUtil.query(conn, currentSql); + while (DBUtil.asyncResultSetNext(rs)) { + String colName = rs.getString("COLUMN_NAME"); + columns.add(colName); + } + return columns; + } catch (Exception e) { + throw RdbmsException.asQueryException(DataBaseType.MySql, e, currentSql, null, null); + } finally { + DBUtil.closeDBResources(rs, null, null); + } + } + + public static List renderPreOrPostSqls(List preOrPostSqls, String tableName) { + if (null == preOrPostSqls) { + return Collections.emptyList(); + } + List renderedSqls = new ArrayList<>(); + for (String sql : preOrPostSqls) { + if (! Strings.isNullOrEmpty(sql)) { + renderedSqls.add(sql.replace(Constant.TABLE_NAME_PLACEHOLDER, tableName)); + } + } + return renderedSqls; + } + + public static void executeSqls(Connection conn, List sqls) { + Statement stmt = null; + String currentSql = null; + try { + stmt = conn.createStatement(); + for (String sql : sqls) { + currentSql = sql; + DBUtil.executeSqlWithoutResultSet(stmt, sql); + } + } catch (Exception e) { + throw RdbmsException.asQueryException(DataBaseType.MySql, e, currentSql, null, null); + } finally { + DBUtil.closeDBResources(null, stmt, null); + } + } + + public static void preCheckPrePareSQL( Keys options) { + String table = options.getTable(); + List preSqls = options.getPreSqlList(); + List renderedPreSqls = SelectdbUtil.renderPreOrPostSqls(preSqls, table); + if (null != renderedPreSqls && !renderedPreSqls.isEmpty()) { + LOG.info("Begin to preCheck preSqls:[{}].", String.join(";", renderedPreSqls)); + for (String sql : renderedPreSqls) { + try { + DBUtil.sqlValid(sql, DataBaseType.MySql); + } catch ( ParserException e) { + throw RdbmsException.asPreSQLParserException(DataBaseType.MySql,e,sql); + } + } + } + } + + public static void preCheckPostSQL( Keys options) { + String table = options.getTable(); + List postSqls = options.getPostSqlList(); + List renderedPostSqls = SelectdbUtil.renderPreOrPostSqls(postSqls, table); + if (null != renderedPostSqls && !renderedPostSqls.isEmpty()) { + LOG.info("Begin to preCheck postSqls:[{}].", String.join(";", renderedPostSqls)); + for(String sql : renderedPostSqls) { + try { + DBUtil.sqlValid(sql, DataBaseType.MySql); + } catch (ParserException e){ + throw RdbmsException.asPostSQLParserException(DataBaseType.MySql,e,sql); + } + } + } + } + + public static T checkNotNull(T reference) { + if (reference == null) { + throw new NullPointerException(); + } else { + return reference; + } + } +} diff --git a/selectdbwriter/src/main/java/com/alibaba/datax/plugin/writer/selectdbwriter/SelectdbWriter.java b/selectdbwriter/src/main/java/com/alibaba/datax/plugin/writer/selectdbwriter/SelectdbWriter.java new file mode 100644 index 00000000..2b91f122 --- /dev/null +++ b/selectdbwriter/src/main/java/com/alibaba/datax/plugin/writer/selectdbwriter/SelectdbWriter.java @@ -0,0 +1,149 @@ +package com.alibaba.datax.plugin.writer.selectdbwriter; + +import com.alibaba.datax.common.element.Record; +import com.alibaba.datax.common.exception.DataXException; +import com.alibaba.datax.common.plugin.RecordReceiver; +import com.alibaba.datax.common.spi.Writer; +import com.alibaba.datax.common.util.Configuration; +import com.alibaba.datax.plugin.rdbms.util.DBUtil; +import 
com.alibaba.datax.plugin.rdbms.util.DBUtilErrorCode; +import com.alibaba.datax.plugin.rdbms.util.DataBaseType; +import org.slf4j.Logger; +import org.slf4j.LoggerFactory; + +import java.sql.Connection; +import java.util.ArrayList; +import java.util.List; + +/** + * doris data writer + */ +public class SelectdbWriter extends Writer { + + public static class Job extends Writer.Job { + + private static final Logger LOG = LoggerFactory.getLogger(Job.class); + private Configuration originalConfig = null; + private Keys options; + + @Override + public void init() { + this.originalConfig = super.getPluginJobConf(); + options = new Keys (super.getPluginJobConf()); + options.doPretreatment(); + } + + @Override + public void preCheck(){ + this.init(); + SelectdbUtil.preCheckPrePareSQL(options); + SelectdbUtil.preCheckPostSQL(options); + } + + @Override + public void prepare() { + String username = options.getUsername(); + String password = options.getPassword(); + String jdbcUrl = options.getJdbcUrl(); + List renderedPreSqls = SelectdbUtil.renderPreOrPostSqls(options.getPreSqlList(), options.getTable()); + if (null != renderedPreSqls && !renderedPreSqls.isEmpty()) { + Connection conn = DBUtil.getConnection(DataBaseType.MySql, jdbcUrl, username, password); + LOG.info("Begin to execute preSqls:[{}]. context info:{}.", String.join(";", renderedPreSqls), jdbcUrl); + SelectdbUtil.executeSqls(conn, renderedPreSqls); + DBUtil.closeDBResources(null, null, conn); + } + } + + @Override + public List split(int mandatoryNumber) { + List configurations = new ArrayList<>(mandatoryNumber); + for (int i = 0; i < mandatoryNumber; i++) { + configurations.add(originalConfig); + } + return configurations; + } + + @Override + public void post() { + String username = options.getUsername(); + String password = options.getPassword(); + String jdbcUrl = options.getJdbcUrl(); + List renderedPostSqls = SelectdbUtil.renderPreOrPostSqls(options.getPostSqlList(), options.getTable()); + if (null != renderedPostSqls && !renderedPostSqls.isEmpty()) { + Connection conn = DBUtil.getConnection(DataBaseType.MySql, jdbcUrl, username, password); + LOG.info("Start to execute preSqls:[{}]. context info:{}.", String.join(";", renderedPostSqls), jdbcUrl); + SelectdbUtil.executeSqls(conn, renderedPostSqls); + DBUtil.closeDBResources(null, null, conn); + } + } + + @Override + public void destroy() { + } + + } + + public static class Task extends Writer.Task { + private SelectdbWriterManager writerManager; + private Keys options; + private SelectdbCodec rowCodec; + + @Override + public void init() { + options = new Keys (super.getPluginJobConf()); + if (options.isWildcardColumn()) { + Connection conn = DBUtil.getConnection(DataBaseType.MySql, options.getJdbcUrl(), options.getUsername(), options.getPassword()); + List columns = SelectdbUtil.getDorisTableColumns(conn, options.getDatabase(), options.getTable()); + options.setInfoCchemaColumns(columns); + } + writerManager = new SelectdbWriterManager(options); + rowCodec = SelectdbCodecFactory.createCodec(options); + } + + @Override + public void prepare() { + } + + public void startWrite(RecordReceiver recordReceiver) { + try { + Record record; + while ((record = recordReceiver.getFromReader()) != null) { + if (record.getColumnNumber() != options.getColumns().size()) { + throw DataXException + .asDataXException( + DBUtilErrorCode.CONF_ERROR, + String.format( + "There is an error in the column configuration information. 
" + + "This is because you have configured a task where the number of fields to be read from the source:%s " + + "is not equal to the number of fields to be written to the destination table:%s. " + + "Please check your configuration and make changes.", + record.getColumnNumber(), + options.getColumns().size())); + } + writerManager.writeRecord(rowCodec.codec(record)); + } + } catch (Exception e) { + throw DataXException.asDataXException(DBUtilErrorCode.WRITE_DATA_ERROR, e); + } + } + + @Override + public void post() { + try { + writerManager.close(); + } catch (Exception e) { + throw DataXException.asDataXException(DBUtilErrorCode.WRITE_DATA_ERROR, e); + } + } + + @Override + public void destroy() {} + + @Override + public boolean supportFailOver(){ + return false; + } + } + + +} diff --git a/selectdbwriter/src/main/java/com/alibaba/datax/plugin/writer/selectdbwriter/SelectdbWriterException.java b/selectdbwriter/src/main/java/com/alibaba/datax/plugin/writer/selectdbwriter/SelectdbWriterException.java new file mode 100644 index 00000000..f85a06d1 --- /dev/null +++ b/selectdbwriter/src/main/java/com/alibaba/datax/plugin/writer/selectdbwriter/SelectdbWriterException.java @@ -0,0 +1,39 @@ +package com.alibaba.datax.plugin.writer.selectdbwriter; + + +public class SelectdbWriterException extends RuntimeException { + + private boolean reCreateLabel; + + + public SelectdbWriterException() { + super(); + } + + public SelectdbWriterException(String message) { + super(message); + } + + public SelectdbWriterException(String message, boolean reCreateLabel) { + super(message); + this.reCreateLabel = reCreateLabel; + } + + public SelectdbWriterException(String message, Throwable cause) { + super(message, cause); + } + + public SelectdbWriterException(Throwable cause) { + super(cause); + } + + protected SelectdbWriterException(String message, Throwable cause, + boolean enableSuppression, + boolean writableStackTrace) { + super(message, cause, enableSuppression, writableStackTrace); + } + + public boolean needReCreateLabel() { + return reCreateLabel; + } +} diff --git a/selectdbwriter/src/main/java/com/alibaba/datax/plugin/writer/selectdbwriter/SelectdbWriterManager.java b/selectdbwriter/src/main/java/com/alibaba/datax/plugin/writer/selectdbwriter/SelectdbWriterManager.java new file mode 100644 index 00000000..e8b22b7f --- /dev/null +++ b/selectdbwriter/src/main/java/com/alibaba/datax/plugin/writer/selectdbwriter/SelectdbWriterManager.java @@ -0,0 +1,196 @@ +package com.alibaba.datax.plugin.writer.selectdbwriter; + +import com.google.common.base.Strings; +import org.apache.commons.lang3.concurrent.BasicThreadFactory; +import org.slf4j.Logger; +import org.slf4j.LoggerFactory; + +import java.io.IOException; +import java.nio.charset.StandardCharsets; +import java.util.ArrayList; +import java.util.List; +import java.util.UUID; +import java.util.concurrent.Executors; +import java.util.concurrent.LinkedBlockingDeque; +import java.util.concurrent.ScheduledExecutorService; +import java.util.concurrent.ScheduledFuture; +import java.util.concurrent.TimeUnit; + +public class SelectdbWriterManager { + + private static final Logger LOG = LoggerFactory.getLogger(SelectdbWriterManager.class); + + private final SelectdbCopyIntoObserver visitor; + private final Keys options; + private final List buffer = new ArrayList<>(); + private int batchCount = 0; + private long batchSize = 0; + private volatile boolean closed = false; + private volatile Exception flushException; + private final LinkedBlockingDeque flushQueue; + 
private ScheduledExecutorService scheduler; + private ScheduledFuture scheduledFuture; + + public SelectdbWriterManager(Keys options) { + this.options = options; + this.visitor = new SelectdbCopyIntoObserver(options); + flushQueue = new LinkedBlockingDeque<>(options.getFlushQueueLength()); + this.startScheduler(); + this.startAsyncFlushing(); + } + + public void startScheduler() { + stopScheduler(); + this.scheduler = Executors.newScheduledThreadPool(1, new BasicThreadFactory.Builder().namingPattern("Doris-interval-flush").daemon(true).build()); + this.scheduledFuture = this.scheduler.schedule(() -> { + synchronized (SelectdbWriterManager.this) { + if (!closed) { + try { + String label = createBatchLabel(); + LOG.info(String.format("Selectdb interval Sinking triggered: label[%s].", label)); + if (batchCount == 0) { + startScheduler(); + } + flush(label, false); + } catch (Exception e) { + flushException = e; + } + } + } + }, options.getFlushInterval(), TimeUnit.MILLISECONDS); + } + + public void stopScheduler() { + if (this.scheduledFuture != null) { + scheduledFuture.cancel(false); + this.scheduler.shutdown(); + } + } + + public final synchronized void writeRecord(String record) throws IOException { + checkFlushException(); + try { + byte[] bts = record.getBytes(StandardCharsets.UTF_8); + buffer.add(bts); + batchCount++; + batchSize += bts.length; + if (batchCount >= options.getBatchRows() || batchSize >= options.getBatchSize()) { + String label = createBatchLabel(); + if(LOG.isDebugEnabled()){ + LOG.debug(String.format("buffer Sinking triggered: rows[%d] label [%s].", batchCount, label)); + } + flush(label, false); + } + } catch (Exception e) { + throw new SelectdbWriterException("Writing records to selectdb failed.", e); + } + } + + public synchronized void flush(String label, boolean waitUtilDone) throws Exception { + checkFlushException(); + if (batchCount == 0) { + if (waitUtilDone) { + waitAsyncFlushingDone(); + } + return; + } + flushQueue.put(new WriterTuple(label, batchSize, new ArrayList<>(buffer))); + if (waitUtilDone) { + // wait the last flush + waitAsyncFlushingDone(); + } + buffer.clear(); + batchCount = 0; + batchSize = 0; + } + + public synchronized void close() throws IOException { + if (!closed) { + closed = true; + try { + String label = createBatchLabel(); + if (batchCount > 0) { + if (LOG.isDebugEnabled()) { + LOG.debug(String.format("Selectdb Sink is about to close: label[%s].", label)); + } + } + flush(label, true); + } catch (Exception e) { + throw new RuntimeException("Writing records to Selectdb failed.", e); + } + } + checkFlushException(); + } + + public String createBatchLabel() { + StringBuilder sb = new StringBuilder(); + if (!Strings.isNullOrEmpty(options.getLabelPrefix())) { + sb.append(options.getLabelPrefix()); + } + return sb.append(UUID.randomUUID().toString()) + .toString(); + } + + private void startAsyncFlushing() { + // start flush thread + Thread flushThread = new Thread(new Runnable() { + public void run() { + while (true) { + try { + asyncFlush(); + } catch (Exception e) { + flushException = e; + } + } + } + }); + flushThread.setDaemon(true); + flushThread.start(); + } + + private void waitAsyncFlushingDone() throws InterruptedException { + // wait previous flushings + for (int i = 0; i <= options.getFlushQueueLength(); i++) { + flushQueue.put(new WriterTuple("", 0l, null)); + } + checkFlushException(); + } + + private void asyncFlush() throws Exception { + WriterTuple flushData = flushQueue.take(); + if 
(Strings.isNullOrEmpty(flushData.getLabel())) { + return; + } + stopScheduler(); + for (int i = 0; i <= options.getMaxRetries(); i++) { + try { + // copy into + visitor.streamLoad(flushData); + startScheduler(); + break; + } catch (Exception e) { + LOG.warn("Failed to flush batch data to selectdb, retry times = {}", i, e); + if (i >= options.getMaxRetries()) { + throw new RuntimeException(e); + } + if (e instanceof SelectdbWriterException && ((SelectdbWriterException)e).needReCreateLabel()) { + String newLabel = createBatchLabel(); + LOG.warn(String.format("Batch label changed from [%s] to [%s]", flushData.getLabel(), newLabel)); + flushData.setLabel(newLabel); + } + try { + Thread.sleep(1000l * Math.min(i + 1, 100)); + } catch (InterruptedException ex) { + Thread.currentThread().interrupt(); + throw new RuntimeException("Unable to flush, interrupted while doing another attempt", e); + } + } + } + } + + private void checkFlushException() { + if (flushException != null) { + throw new RuntimeException("Writing records to selectdb failed.", flushException); + } + } +} diff --git a/selectdbwriter/src/main/java/com/alibaba/datax/plugin/writer/selectdbwriter/WriterTuple.java b/selectdbwriter/src/main/java/com/alibaba/datax/plugin/writer/selectdbwriter/WriterTuple.java new file mode 100644 index 00000000..483ade05 --- /dev/null +++ b/selectdbwriter/src/main/java/com/alibaba/datax/plugin/writer/selectdbwriter/WriterTuple.java @@ -0,0 +1,22 @@ +package com.alibaba.datax.plugin.writer.selectdbwriter; + +import java.util.List; + +public class WriterTuple { + private String label; + private Long bytes; + private List rows; + + + public WriterTuple ( String label, Long bytes, List rows){ + this.label = label; + this.rows = rows; + this.bytes = bytes; + } + + public String getLabel() { return label; } + public void setLabel(String label) { this.label = label; } + public Long getBytes() { return bytes; } + public List getRows() { return rows; } + +} diff --git a/selectdbwriter/src/main/resources/plugin.json b/selectdbwriter/src/main/resources/plugin.json new file mode 100644 index 00000000..4b84a945 --- /dev/null +++ b/selectdbwriter/src/main/resources/plugin.json @@ -0,0 +1,6 @@ +{ + "name": "selectdbwriter", + "class": "com.alibaba.datax.plugin.writer.selectdbwriter.SelectdbWriter", + "description": "selectdb writer plugin", + "developer": "selectdb" +} \ No newline at end of file diff --git a/selectdbwriter/src/main/resources/plugin_job_template.json b/selectdbwriter/src/main/resources/plugin_job_template.json new file mode 100644 index 00000000..c603b7e0 --- /dev/null +++ b/selectdbwriter/src/main/resources/plugin_job_template.json @@ -0,0 +1,19 @@ +{ + "name": "selectdbwriter", + "parameter": { + "username": "", + "password": "", + "column": [], + "preSql": [], + "postSql": [], + "loadUrl": [], + "loadProps": {}, + "connection": [ + { + "jdbcUrl": "", + "selectedDatabase": "", + "table": [] + } + ] + } +} \ No newline at end of file diff --git a/sqlserverreader/pom.xml b/sqlserverreader/pom.xml index 5372a057..326f1ce5 100755 --- a/sqlserverreader/pom.xml +++ b/sqlserverreader/pom.xml @@ -31,10 +31,7 @@ com.microsoft.sqlserver sqljdbc4 4.0 - system - ${basedir}/src/main/lib/sqljdbc4-4.0.jar - com.alibaba.datax plugin-rdbms-util diff --git a/sqlserverreader/src/main/assembly/package.xml b/sqlserverreader/src/main/assembly/package.xml index 55fbdc0b..6180fbc0 100755 --- a/sqlserverreader/src/main/assembly/package.xml +++ b/sqlserverreader/src/main/assembly/package.xml @@ -16,13 +16,6 @@ 
plugin/reader/sqlserverreader - - src/main/lib - - sqljdbc4-4.0.jar - - plugin/reader/sqlserverreader/libs - target/ diff --git a/sqlserverreader/src/main/lib/sqljdbc4-4.0.jar b/sqlserverreader/src/main/lib/sqljdbc4-4.0.jar deleted file mode 100644 index d6b7f6da..00000000 Binary files a/sqlserverreader/src/main/lib/sqljdbc4-4.0.jar and /dev/null differ diff --git a/sqlserverwriter/doc/sqlserverwriter.md b/sqlserverwriter/doc/sqlserverwriter.md index cdaf1526..7d786292 100644 --- a/sqlserverwriter/doc/sqlserverwriter.md +++ b/sqlserverwriter/doc/sqlserverwriter.md @@ -69,6 +69,7 @@ SqlServerWriter 通过 DataX 框架获取 Reader 生成的协议数据,根据 "jdbcUrl": "jdbc:sqlserver://[HOST_NAME]:PORT;DatabaseName=[DATABASE_NAME]" } ], + "session": ["SET IDENTITY_INSERT TABLE_NAME ON"], "preSql": [ "delete from @table where db_id = -1;" ], @@ -139,6 +140,14 @@ SqlServerWriter 通过 DataX 框架获取 Reader 生成的协议数据,根据 * 默认值:否
    +* **session** + + * 描述:DataX在获取 SqlServer 连接时,执行session指定的SQL语句,修改当前connection session属性(配置示例见下文)
    + + * 必选:否
    + + * 默认值:无
    + * **preSql** * 描述:写入数据到目的表前,会先执行这里的标准语句。如果 Sql 中有你需要操作到的表名称,请使用 `@table` 表示,这样在实际执行 Sql 语句时,会对变量按照实际表名称进行替换。
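To make the new `session` option easier to picture, here is a minimal sketch of a sqlserverwriter `parameter` fragment that combines `session` with a `preSql` using the `@table` placeholder; the user, column list, table name and connection string are placeholders modelled on the configuration sample above, not required values.

```json
{
  "name": "sqlserverwriter",
  "parameter": {
    "username": "xxxx",
    "password": "xxxx",
    "column": ["db_id", "db_type", "db_ip"],
    "session": ["SET IDENTITY_INSERT TABLE_NAME ON"],
    "preSql": ["delete from @table where db_id = -1;"],
    "connection": [
      {
        "table": ["table_1"],
        "jdbcUrl": "jdbc:sqlserver://[HOST_NAME]:PORT;DatabaseName=[DATABASE_NAME]"
      }
    ]
  }
}
```

At run time every `@table` in `preSql` is replaced with the actual table name, and the `session` statements run right after the connection is obtained, before any data is written.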
    diff --git a/sqlserverwriter/pom.xml b/sqlserverwriter/pom.xml index d2b1eea1..6f52c14c 100644 --- a/sqlserverwriter/pom.xml +++ b/sqlserverwriter/pom.xml @@ -35,8 +35,6 @@ com.microsoft.sqlserver sqljdbc4 4.0 - system - ${basedir}/src/main/lib/sqljdbc4-4.0.jar
    com.alibaba.datax diff --git a/sqlserverwriter/src/main/assembly/package.xml b/sqlserverwriter/src/main/assembly/package.xml index 761dffcd..f8f26298 100755 --- a/sqlserverwriter/src/main/assembly/package.xml +++ b/sqlserverwriter/src/main/assembly/package.xml @@ -16,13 +16,6 @@ plugin/writer/sqlserverwriter - - src/main/lib - - sqljdbc4-4.0.jar - - plugin/writer/sqlserverwriter/libs - target/ diff --git a/sqlserverwriter/src/main/lib/sqljdbc4-4.0.jar b/sqlserverwriter/src/main/lib/sqljdbc4-4.0.jar deleted file mode 100644 index d6b7f6da..00000000 Binary files a/sqlserverwriter/src/main/lib/sqljdbc4-4.0.jar and /dev/null differ diff --git a/starrocksreader/pom.xml b/starrocksreader/pom.xml new file mode 100644 index 00000000..a8b049ea --- /dev/null +++ b/starrocksreader/pom.xml @@ -0,0 +1,95 @@ + + + + datax-all + com.alibaba.datax + 0.0.1-SNAPSHOT + + 4.0.0 + starrocksreader + starrocksreader + jar + + + 8 + 8 + + + + + com.alibaba.datax + datax-common + ${datax-project-version} + + + slf4j-log4j12 + org.slf4j + + + + + org.slf4j + slf4j-api + + + ch.qos.logback + logback-classic + + + + com.alibaba.datax + plugin-rdbms-util + ${datax-project-version} + + + + mysql + mysql-connector-java + 5.1.46 + + + + + + + + src/main/java + + **/*.properties + + + + + + + maven-compiler-plugin + + ${jdk-version} + ${jdk-version} + ${project-sourceEncoding} + + + + + maven-assembly-plugin + + + src/main/assembly/package.xml + + datax + + + + dwzip + package + + single + + + + + + + \ No newline at end of file diff --git a/starrocksreader/src/main/assembly/package.xml b/starrocksreader/src/main/assembly/package.xml new file mode 100644 index 00000000..c126c107 --- /dev/null +++ b/starrocksreader/src/main/assembly/package.xml @@ -0,0 +1,35 @@ + + + + dir + + false + + + src/main/resources + + plugin.json + plugin_job_template.json + + plugin/reader/starrocksreader + + + target/ + + starrocksreader-0.0.1-SNAPSHOT.jar + + plugin/reader/starrocksreader + + + + + + false + plugin/reader/starrocksreader/libs + runtime + + + diff --git a/starrocksreader/src/main/java/com/alibaba/datax/plugin/reader/starrocksreader/StarRocksReader.java b/starrocksreader/src/main/java/com/alibaba/datax/plugin/reader/starrocksreader/StarRocksReader.java new file mode 100644 index 00000000..d4bf3437 --- /dev/null +++ b/starrocksreader/src/main/java/com/alibaba/datax/plugin/reader/starrocksreader/StarRocksReader.java @@ -0,0 +1,116 @@ +package com.alibaba.datax.plugin.reader.starrocksreader; + +import java.util.List; + +import com.alibaba.datax.common.plugin.RecordSender; +import com.alibaba.datax.common.spi.Reader; +import com.alibaba.datax.common.util.Configuration; +import com.alibaba.datax.plugin.rdbms.reader.CommonRdbmsReader; +import com.alibaba.datax.plugin.rdbms.reader.Constant; +import com.alibaba.datax.plugin.rdbms.reader.Key; +import com.alibaba.datax.plugin.rdbms.util.DataBaseType; + +import org.apache.commons.lang3.StringUtils; +import org.slf4j.Logger; +import org.slf4j.LoggerFactory; + + +public class StarRocksReader extends Reader { + + private static final DataBaseType DATABASE_TYPE = DataBaseType.StarRocks; + + public static class Job extends Reader.Job { + private static final Logger LOG = LoggerFactory + .getLogger(Job.class); + + private Configuration originalConfig = null; + private CommonRdbmsReader.Job commonRdbmsReaderJob; + + @Override + public void init() { + this.originalConfig = super.getPluginJobConf(); + int fetchSize = this.originalConfig.getInt(Constant.FETCH_SIZE, + Integer.MIN_VALUE); + 
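+            // Integer.MIN_VALUE as the fetch size is the MySQL Connector/J convention for streaming the result
+            // set row by row instead of buffering it entirely on the client, which matters when reading large
+            // StarRocks tables over the MySQL protocol.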
this.originalConfig.set(Constant.FETCH_SIZE, fetchSize); + + this.commonRdbmsReaderJob = new CommonRdbmsReader.Job(DATABASE_TYPE); + this.commonRdbmsReaderJob.init(this.originalConfig); + } + + @Override + public void preCheck(){ + init(); + this.commonRdbmsReaderJob.preCheck(this.originalConfig,DATABASE_TYPE); + + } + + @Override + public void prepare() { + } + + @Override + public List split(int adviceNumber) { + LOG.info("split() begin..."); + List splitResult = this.commonRdbmsReaderJob.split(this.originalConfig, adviceNumber); + /** + * 在日志中告知用户,为什么实际datax切分跑的channel数会小于用户配置的channel数 + */ + if(splitResult.size() < adviceNumber){ + // 如果用户没有配置切分主键splitPk + if(StringUtils.isBlank(this.originalConfig.getString(Key.SPLIT_PK, null))){ + LOG.info("User has not configured splitPk."); + }else{ + // 用户配置了切分主键,但是切分主键可能重复太多,或者要同步的表的记录太少,无法切分成adviceNumber个task + LOG.info("User has configured splitPk. But the number of task finally split is smaller than that user has configured. " + + "The possible reasons are: 1) too many repeated splitPk values, 2) too few records."); + } + } + LOG.info("split() ok and end..."); + return splitResult; + } + + @Override + public void post() { + this.commonRdbmsReaderJob.post(this.originalConfig); + } + + @Override + public void destroy() { + this.commonRdbmsReaderJob.destroy(this.originalConfig); + } + + } + + public static class Task extends Reader.Task { + + private Configuration readerSliceConfig; + private CommonRdbmsReader.Task commonRdbmsReaderTask; + + @Override + public void init() { + this.readerSliceConfig = super.getPluginJobConf(); + this.commonRdbmsReaderTask = new CommonRdbmsReader.Task(DATABASE_TYPE, super.getTaskGroupId(), super.getTaskId()); + this.commonRdbmsReaderTask.init(this.readerSliceConfig); + + } + + @Override + public void startRead(RecordSender recordSender) { + int fetchSize = this.readerSliceConfig.getInt(Constant.FETCH_SIZE); + + this.commonRdbmsReaderTask.startRead(this.readerSliceConfig, recordSender, + super.getTaskPluginCollector(), fetchSize); + } + + @Override + public void post() { + this.commonRdbmsReaderTask.post(this.readerSliceConfig); + } + + @Override + public void destroy() { + this.commonRdbmsReaderTask.destroy(this.readerSliceConfig); + } + + } +} diff --git a/starrocksreader/src/main/resources/plugin.json b/starrocksreader/src/main/resources/plugin.json new file mode 100644 index 00000000..b0d6e039 --- /dev/null +++ b/starrocksreader/src/main/resources/plugin.json @@ -0,0 +1,6 @@ +{ + "name": "starrocksreader", + "class": "com.alibaba.datax.plugin.reader.starrocksreader.StarRocksReader", + "description": "useScene: prod. mechanism: Jdbc connection using the database, execute select sql, retrieve data from the ResultSet. 
warn: The more you know about the database, the less problems you encounter.", + "developer": "alibaba" +} \ No newline at end of file diff --git a/starrockswriter/doc/starrockswriter.md b/starrockswriter/doc/starrockswriter.md new file mode 100644 index 00000000..6ebe3681 --- /dev/null +++ b/starrockswriter/doc/starrockswriter.md @@ -0,0 +1,222 @@ +# DataX StarRocksWriter + + +--- + + +## 1 快速介绍 + +StarRocksWriter 插件实现了写入数据到 StarRocks 主库的目的表的功能。在底层实现上, StarRocksWriter 通过Streamload以csv格式导入数据至StarRocks。 + + +## 2 实现原理 + + StarRocksWriter 通过Streamload以csv格式导入数据至StarRocks, 内部将`reader`读取的数据进行缓存后批量导入至StarRocks,以提高写入性能。 + + +## 3 功能说明 + +### 3.1 配置样例 + +* 这里使用一份从内存Mysql读取数据后导入至StarRocks。 + +```json +{ + "job": { + "setting": { + "speed": { + "channel": 1 + }, + "errorLimit": { + "record": 0, + "percentage": 0 + } + }, + "content": [ + { + "reader": { + "name": "mysqlreader", + "parameter": { + "username": "xxxx", + "password": "xxxx", + "column": [ "k1", "k2", "v1", "v2" ], + "connection": [ + { + "table": [ "table1", "table2" ], + "jdbcUrl": [ + "jdbc:mysql://127.0.0.1:3306/datax_test1" + ] + }, + { + "table": [ "table3", "table4" ], + "jdbcUrl": [ + "jdbc:mysql://127.0.0.1:3306/datax_test2" + ] + } + ] + } + }, + "writer": { + "name": "starrockswriter", + "parameter": { + "username": "xxxx", + "password": "xxxx", + "column": ["k1", "k2", "v1", "v2"], + "preSql": [], + "postSql": [], + "connection": [ + { + "table": ["xxx"], + "jdbcUrl": "jdbc:mysql://172.28.17.100:9030/", + "selectedDatabase": "xxxx" + } + ], + "loadUrl": ["172.28.17.100:8030", "172.28.17.100:8030"], + "loadProps": {} + } + } + } + ] + } +} + +``` + + +### 3.2 参数说明 + +* **username** + + * 描述:StarRocks数据库的用户名
    + + * 必选:是
    + + * 默认值:无
    + +* **password** + + * 描述:StarRocks数据库的密码
    + + * 必选:是
    + + * 默认值:无
    + +* **selectedDatabase** + + * 描述:StarRocks表的数据库名称。 + + * 必选:是
    + + * 默认值:无
    + +* **table** + + * 描述:StarRocks表的表名称。 + + * 必选:是
    + + * 默认值:无
    + +* **loadUrl** + + * 描述:StarRocks FE的地址用于Streamload,可以为多个fe地址,`fe_ip:fe_http_port`。 + + * 必选:是
    + + * 默认值:无
    + +* **column** + + * 描述:目的表需要写入数据的字段,字段之间用英文逗号分隔。例如: "column": ["id","name","age"]。 + + **column配置项必须指定,不能留空!** + + 注意:我们强烈不推荐你这样配置,因为当你目的表字段个数、类型等有改动时,你的任务可能运行不正确或者失败 + + * 必选:是
    + + * 默认值:无
    + +* **preSql** + + * 描述:写入数据到目的表前,会先执行这里的标准语句。
    + + * 必选:否
    + + * 默认值:无
    + +* **postSql** + + * 描述:写入数据到目的表后,会执行这里的标准语句。
    + + * 必选:否
    + + * 默认值:无
    + +* **jdbcUrl** + + * 描述:目的数据库的 JDBC 连接信息,用于执行`preSql`及`postSql`。
    + + * 必选:否
    + + * 默认值:无
    + +* **maxBatchRows** + + * 描述:单次StreamLoad导入的最大行数
    + + * 必选:否
    + + * 默认值:500000 (50W)
    + +* **maxBatchSize** + + * 描述:单次StreamLoad导入的最大字节数。
    + + * 必选:否
    + + * 默认值:104857600 (100M) + +* **flushInterval** + + * 描述:上一次StreamLoad结束至下一次开始的时间间隔(单位:ms)。
    + + * 必选:否
    + + * 默认值:300000 (ms) + +* **loadProps** + + * 描述:StreamLoad 的请求参数,详情参照StreamLoad介绍页面。
    + + * 必选:否
    + + * 默认值:无
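For quick reference, the sketch below pulls the batching and flush options documented above into one `parameter` block; the addresses, credentials, database and table names are placeholders, and the numeric values simply restate the documented defaults.

```json
{
  "name": "starrockswriter",
  "parameter": {
    "username": "xxxx",
    "password": "xxxx",
    "column": ["k1", "k2", "v1", "v2"],
    "connection": [
      {
        "jdbcUrl": "jdbc:mysql://127.0.0.1:9030/",
        "selectedDatabase": "xxxx",
        "table": ["xxx"]
      }
    ],
    "loadUrl": ["127.0.0.1:8030"],
    "maxBatchRows": 500000,
    "maxBatchSize": 104857600,
    "flushInterval": 300000,
    "loadProps": {}
  }
}
```

Leaving `maxBatchRows`, `maxBatchSize` and `flushInterval` out keeps the same defaults; `loadProps` only needs to be filled in when the separators or the load format have to be changed, as described in section 3.3 below.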
    + + +### 3.3 类型转换 + +默认传入的数据均会被转为字符串,并以`\t`作为列分隔符,`\n`作为行分隔符,组成`csv`文件进行StreamLoad导入操作。 +如需更改列分隔符, 则正确配置 `loadProps` 即可: +```json +"loadProps": { + "column_separator": "\\x01", + "row_delimiter": "\\x02" +} +``` + +如需更改导入格式为`json`, 则正确配置 `loadProps` 即可: +```json +"loadProps": { + "format": "json", + "strip_outer_array": true +} +``` + +## 4 性能报告 + + +## 5 约束限制 + + +## FAQ diff --git a/starrockswriter/pom.xml b/starrockswriter/pom.xml new file mode 100755 index 00000000..73a51422 --- /dev/null +++ b/starrockswriter/pom.xml @@ -0,0 +1,155 @@ + + 4.0.0 + + com.alibaba.datax + datax-all + 0.0.1-SNAPSHOT + + starrockswriter + starrockswriter + 1.1.0 + jar + + + + com.alibaba.datax + datax-common + ${datax-project-version} + + + slf4j-log4j12 + org.slf4j + + + + + org.slf4j + slf4j-api + + + ch.qos.logback + logback-classic + + + com.alibaba.datax + plugin-rdbms-util + ${datax-project-version} + + + commons-codec + commons-codec + 1.9 + + + org.apache.commons + commons-lang3 + 3.12.0 + + + commons-logging + commons-logging + 1.1.1 + + + org.apache.httpcomponents + httpcore + 4.4.6 + + + org.apache.httpcomponents + httpclient + 4.5.3 + + + com.alibaba.fastjson2 + fastjson2 + + + mysql + mysql-connector-java + 5.1.46 + + + + + + + + maven-compiler-plugin + + ${jdk-version} + ${jdk-version} + ${project-sourceEncoding} + + + + org.apache.maven.plugins + maven-shade-plugin + 3.0.0 + + + + package + + shade + + + true + + + org.apache.http + com.starrocks.shade.org.apache.http + + + org.apache.commons + com.starrocks.shade.org.apache.commons + + + + + org.apache.commons:commons-lang3 + commons-codec:commons-codec + commons-logging:* + org.apache.httpcomponents:httpclient + org.apache.httpcomponents:httpcore + + + + + + *:* + + META-INF/*.SF + META-INF/*.DSA + META-INF/*.RSA + + + + + + + + + + maven-assembly-plugin + + + src/main/assembly/package.xml + + datax + + + + dwzip + package + + single + + + + + + + diff --git a/starrockswriter/src/main/assembly/package.xml b/starrockswriter/src/main/assembly/package.xml new file mode 100755 index 00000000..c63845b4 --- /dev/null +++ b/starrockswriter/src/main/assembly/package.xml @@ -0,0 +1,35 @@ + + + + dir + + false + + + src/main/resources + + plugin.json + plugin_job_template.json + + plugin/writer/starrockswriter + + + target/ + + starrockswriter-1.1.0.jar + + plugin/writer/starrockswriter + + + + + + false + plugin/writer/starrockswriter/libs + runtime + + + diff --git a/starrockswriter/src/main/java/com/starrocks/connector/datax/plugin/writer/starrockswriter/StarRocksWriter.java b/starrockswriter/src/main/java/com/starrocks/connector/datax/plugin/writer/starrockswriter/StarRocksWriter.java new file mode 100755 index 00000000..d5f2887a --- /dev/null +++ b/starrockswriter/src/main/java/com/starrocks/connector/datax/plugin/writer/starrockswriter/StarRocksWriter.java @@ -0,0 +1,151 @@ +package com.starrocks.connector.datax.plugin.writer.starrockswriter; + +import com.alibaba.datax.common.element.Record; +import com.alibaba.datax.common.exception.DataXException; +import com.alibaba.datax.common.plugin.RecordReceiver; +import com.alibaba.datax.common.spi.Writer; +import com.alibaba.datax.common.util.Configuration; +import com.alibaba.datax.plugin.rdbms.util.DBUtil; +import com.alibaba.datax.plugin.rdbms.util.DBUtilErrorCode; +import com.alibaba.datax.plugin.rdbms.util.DataBaseType; +import com.starrocks.connector.datax.plugin.writer.starrockswriter.manager.StarRocksWriterManager; +import 
com.starrocks.connector.datax.plugin.writer.starrockswriter.row.StarRocksISerializer; +import com.starrocks.connector.datax.plugin.writer.starrockswriter.row.StarRocksSerializerFactory; +import com.starrocks.connector.datax.plugin.writer.starrockswriter.util.StarRocksWriterUtil; + +import org.apache.commons.lang3.StringUtils; +import org.slf4j.Logger; +import org.slf4j.LoggerFactory; + +import java.sql.Connection; +import java.util.ArrayList; +import java.util.List; + +public class StarRocksWriter extends Writer { + + public static class Job extends Writer.Job { + + private static final Logger LOG = LoggerFactory.getLogger(Job.class); + private Configuration originalConfig = null; + private StarRocksWriterOptions options; + + @Override + public void init() { + this.originalConfig = super.getPluginJobConf(); + String selectedDatabase = super.getPluginJobConf().getString(StarRocksWriterOptions.KEY_SELECTED_DATABASE); + if(StringUtils.isBlank(this.originalConfig.getString(StarRocksWriterOptions.KEY_DATABASE)) && StringUtils.isNotBlank(selectedDatabase)){ + this.originalConfig.set(StarRocksWriterOptions.KEY_DATABASE, selectedDatabase); + } + options = new StarRocksWriterOptions(super.getPluginJobConf()); + options.doPretreatment(); + } + + @Override + public void preCheck(){ + this.init(); + StarRocksWriterUtil.preCheckPrePareSQL(options); + StarRocksWriterUtil.preCheckPostSQL(options); + } + + @Override + public void prepare() { + String username = options.getUsername(); + String password = options.getPassword(); + String jdbcUrl = options.getJdbcUrl(); + List renderedPreSqls = StarRocksWriterUtil.renderPreOrPostSqls(options.getPreSqlList(), options.getTable()); + if (null != renderedPreSqls && !renderedPreSqls.isEmpty()) { + Connection conn = DBUtil.getConnection(DataBaseType.MySql, jdbcUrl, username, password); + LOG.info("Begin to execute preSqls:[{}]. context info:{}.", String.join(";", renderedPreSqls), jdbcUrl); + StarRocksWriterUtil.executeSqls(conn, renderedPreSqls); + DBUtil.closeDBResources(null, null, conn); + } + } + + @Override + public List split(int mandatoryNumber) { + List configurations = new ArrayList<>(mandatoryNumber); + for (int i = 0; i < mandatoryNumber; i++) { + configurations.add(originalConfig); + } + return configurations; + } + + @Override + public void post() { + String username = options.getUsername(); + String password = options.getPassword(); + String jdbcUrl = options.getJdbcUrl(); + List renderedPostSqls = StarRocksWriterUtil.renderPreOrPostSqls(options.getPostSqlList(), options.getTable()); + if (null != renderedPostSqls && !renderedPostSqls.isEmpty()) { + Connection conn = DBUtil.getConnection(DataBaseType.MySql, jdbcUrl, username, password); + LOG.info("Begin to execute postSqls:[{}]. 
context info:{}.", String.join(";", renderedPostSqls), jdbcUrl); + StarRocksWriterUtil.executeSqls(conn, renderedPostSqls); + DBUtil.closeDBResources(null, null, conn); + } + } + + @Override + public void destroy() { + } + + } + + public static class Task extends Writer.Task { + private StarRocksWriterManager writerManager; + private StarRocksWriterOptions options; + private StarRocksISerializer rowSerializer; + + @Override + public void init() { + options = new StarRocksWriterOptions(super.getPluginJobConf()); + if (options.isWildcardColumn()) { + Connection conn = DBUtil.getConnection(DataBaseType.MySql, options.getJdbcUrl(), options.getUsername(), options.getPassword()); + List columns = StarRocksWriterUtil.getStarRocksColumns(conn, options.getDatabase(), options.getTable()); + options.setInfoCchemaColumns(columns); + } + writerManager = new StarRocksWriterManager(options); + rowSerializer = StarRocksSerializerFactory.createSerializer(options); + } + + @Override + public void prepare() { + } + + public void startWrite(RecordReceiver recordReceiver) { + try { + Record record; + while ((record = recordReceiver.getFromReader()) != null) { + if (record.getColumnNumber() != options.getColumns().size()) { + throw DataXException + .asDataXException( + DBUtilErrorCode.CONF_ERROR, + String.format( + "Column configuration error. The number of reader columns %d and the number of writer columns %d are not equal.", + record.getColumnNumber(), + options.getColumns().size())); + } + writerManager.writeRecord(rowSerializer.serialize(record)); + } + } catch (Exception e) { + throw DataXException.asDataXException(DBUtilErrorCode.WRITE_DATA_ERROR, e); + } + } + + @Override + public void post() { + try { + writerManager.close(); + } catch (Exception e) { + throw DataXException.asDataXException(DBUtilErrorCode.WRITE_DATA_ERROR, e); + } + } + + @Override + public void destroy() {} + + @Override + public boolean supportFailOver(){ + return false; + } + } +} diff --git a/starrockswriter/src/main/java/com/starrocks/connector/datax/plugin/writer/starrockswriter/StarRocksWriterOptions.java b/starrockswriter/src/main/java/com/starrocks/connector/datax/plugin/writer/starrockswriter/StarRocksWriterOptions.java new file mode 100644 index 00000000..5c6ddacd --- /dev/null +++ b/starrockswriter/src/main/java/com/starrocks/connector/datax/plugin/writer/starrockswriter/StarRocksWriterOptions.java @@ -0,0 +1,199 @@ +package com.starrocks.connector.datax.plugin.writer.starrockswriter; + +import java.io.Serializable; + +import com.alibaba.datax.common.exception.DataXException; +import com.alibaba.datax.common.util.Configuration; +import com.alibaba.datax.plugin.rdbms.util.DBUtilErrorCode; +import org.apache.commons.lang3.StringUtils; + +import java.util.List; +import java.util.Map; +import java.util.stream.Collectors; + +public class StarRocksWriterOptions implements Serializable { + + private static final long serialVersionUID = 1l; + private static final long KILO_BYTES_SCALE = 1024l; + private static final long MEGA_BYTES_SCALE = KILO_BYTES_SCALE * KILO_BYTES_SCALE; + private static final int MAX_RETRIES = 1; + private static final int BATCH_ROWS = 500000; + private static final long BATCH_BYTES = 5 * MEGA_BYTES_SCALE; + private static final long FLUSH_INTERVAL = 300000; + + private static final String KEY_LOAD_PROPS_FORMAT = "format"; + public enum StreamLoadFormat { + CSV, JSON; + } + + public static final String KEY_USERNAME = "username"; + public static final String KEY_PASSWORD = "password"; + public static final 
String KEY_DATABASE = "database"; + public static final String KEY_SELECTED_DATABASE = "selectedDatabase"; + public static final String KEY_TABLE = "table"; + public static final String KEY_COLUMN = "column"; + public static final String KEY_PRE_SQL = "preSql"; + public static final String KEY_POST_SQL = "postSql"; + public static final String KEY_JDBC_URL = "jdbcUrl"; + public static final String KEY_LABEL_PREFIX = "labelPrefix"; + public static final String KEY_MAX_BATCH_ROWS = "maxBatchRows"; + public static final String KEY_MAX_BATCH_SIZE = "maxBatchSize"; + public static final String KEY_FLUSH_INTERVAL = "flushInterval"; + public static final String KEY_LOAD_URL = "loadUrl"; + public static final String KEY_FLUSH_QUEUE_LENGTH = "flushQueueLength"; + public static final String KEY_LOAD_PROPS = "loadProps"; + public static final String CONNECTION_JDBC_URL = "connection[0].jdbcUrl"; + public static final String CONNECTION_TABLE_NAME = "connection[0].table[0]"; + public static final String CONNECTION_SELECTED_DATABASE = "connection[0].selectedDatabase"; + + private final Configuration options; + private List infoCchemaColumns; + private List userSetColumns; + private boolean isWildcardColumn; + + public StarRocksWriterOptions(Configuration options) { + this.options = options; + // database + String database = this.options.getString(CONNECTION_SELECTED_DATABASE); + if (StringUtils.isBlank(database)) { + database = this.options.getString(KEY_SELECTED_DATABASE); + } + if (StringUtils.isNotBlank(database)) { + this.options.set(KEY_DATABASE, database); + } + // jdbcUrl + String jdbcUrl = this.options.getString(CONNECTION_JDBC_URL); + if (StringUtils.isNotBlank(jdbcUrl)) { + this.options.set(KEY_JDBC_URL, jdbcUrl); + } + // table + String table = this.options.getString(CONNECTION_TABLE_NAME); + if (StringUtils.isNotBlank(table)) { + this.options.set(KEY_TABLE, table); + } + // column + this.userSetColumns = options.getList(KEY_COLUMN, String.class).stream().map(str -> str.replace("`", "")).collect(Collectors.toList()); + if (1 == options.getList(KEY_COLUMN, String.class).size() && "*".trim().equals(options.getList(KEY_COLUMN, String.class).get(0))) { + this.isWildcardColumn = true; + } + } + + public void doPretreatment() { + validateRequired(); + validateStreamLoadUrl(); + } + + public String getJdbcUrl() { + return options.getString(KEY_JDBC_URL); + } + + public String getDatabase() { + return options.getString(KEY_DATABASE); + } + + public String getTable() { + return options.getString(KEY_TABLE); + } + + public String getUsername() { + return options.getString(KEY_USERNAME); + } + + public String getPassword() { + return options.getString(KEY_PASSWORD); + } + + public String getLabelPrefix() { + return options.getString(KEY_LABEL_PREFIX); + } + + public List getLoadUrlList() { + return options.getList(KEY_LOAD_URL, String.class); + } + + public List getColumns() { + if (isWildcardColumn) { + return this.infoCchemaColumns; + } + return this.userSetColumns; + } + + public boolean isWildcardColumn() { + return this.isWildcardColumn; + } + + public void setInfoCchemaColumns(List cols) { + this.infoCchemaColumns = cols; + } + + public List getPreSqlList() { + return options.getList(KEY_PRE_SQL, String.class); + } + + public List getPostSqlList() { + return options.getList(KEY_POST_SQL, String.class); + } + + public Map getLoadProps() { + return options.getMap(KEY_LOAD_PROPS); + } + + public int getMaxRetries() { + return MAX_RETRIES; + } + + public int getBatchRows() { + Integer rows = 
options.getInt(KEY_MAX_BATCH_ROWS); + return null == rows ? BATCH_ROWS : rows; + } + + public long getBatchSize() { + Long size = options.getLong(KEY_MAX_BATCH_SIZE); + return null == size ? BATCH_BYTES : size; + } + + public long getFlushInterval() { + Long interval = options.getLong(KEY_FLUSH_INTERVAL); + return null == interval ? FLUSH_INTERVAL : interval; + } + + public int getFlushQueueLength() { + Integer len = options.getInt(KEY_FLUSH_QUEUE_LENGTH); + return null == len ? 1 : len; + } + + public StreamLoadFormat getStreamLoadFormat() { + Map loadProps = getLoadProps(); + if (null == loadProps) { + return StreamLoadFormat.CSV; + } + if (loadProps.containsKey(KEY_LOAD_PROPS_FORMAT) + && StreamLoadFormat.JSON.name().equalsIgnoreCase(String.valueOf(loadProps.get(KEY_LOAD_PROPS_FORMAT)))) { + return StreamLoadFormat.JSON; + } + return StreamLoadFormat.CSV; + } + + private void validateStreamLoadUrl() { + List urlList = getLoadUrlList(); + for (String host : urlList) { + if (host.split(":").length < 2) { + throw DataXException.asDataXException(DBUtilErrorCode.CONF_ERROR, + "The format of loadUrl is illegal, please input `fe_ip:fe_http_ip;fe_ip:fe_http_ip`."); + } + } + } + + private void validateRequired() { + final String[] requiredOptionKeys = new String[]{ + KEY_USERNAME, + KEY_DATABASE, + KEY_TABLE, + KEY_COLUMN, + KEY_LOAD_URL + }; + for (String optionKey : requiredOptionKeys) { + options.getNecessaryValue(optionKey, DBUtilErrorCode.REQUIRED_VALUE); + } + } +} diff --git a/starrockswriter/src/main/java/com/starrocks/connector/datax/plugin/writer/starrockswriter/manager/StarRocksFlushTuple.java b/starrockswriter/src/main/java/com/starrocks/connector/datax/plugin/writer/starrockswriter/manager/StarRocksFlushTuple.java new file mode 100644 index 00000000..5c939f9b --- /dev/null +++ b/starrockswriter/src/main/java/com/starrocks/connector/datax/plugin/writer/starrockswriter/manager/StarRocksFlushTuple.java @@ -0,0 +1,21 @@ +package com.starrocks.connector.datax.plugin.writer.starrockswriter.manager; + +import java.util.List; + +public class StarRocksFlushTuple { + + private String label; + private Long bytes; + private List rows; + + public StarRocksFlushTuple(String label, Long bytes, List rows) { + this.label = label; + this.bytes = bytes; + this.rows = rows; + } + + public String getLabel() { return label; } + public void setLabel(String label) { this.label = label; } + public Long getBytes() { return bytes; } + public List getRows() { return rows; } +} \ No newline at end of file diff --git a/starrockswriter/src/main/java/com/starrocks/connector/datax/plugin/writer/starrockswriter/manager/StarRocksStreamLoadFailedException.java b/starrockswriter/src/main/java/com/starrocks/connector/datax/plugin/writer/starrockswriter/manager/StarRocksStreamLoadFailedException.java new file mode 100644 index 00000000..4eb47048 --- /dev/null +++ b/starrockswriter/src/main/java/com/starrocks/connector/datax/plugin/writer/starrockswriter/manager/StarRocksStreamLoadFailedException.java @@ -0,0 +1,33 @@ +package com.starrocks.connector.datax.plugin.writer.starrockswriter.manager; + +import java.io.IOException; +import java.util.Map; + + +public class StarRocksStreamLoadFailedException extends IOException { + + static final long serialVersionUID = 1L; + + private final Map response; + private boolean reCreateLabel; + + public StarRocksStreamLoadFailedException(String message, Map response) { + super(message); + this.response = response; + } + + public StarRocksStreamLoadFailedException(String message, Map 
response, boolean reCreateLabel) { + super(message); + this.response = response; + this.reCreateLabel = reCreateLabel; + } + + public Map getFailedResponse() { + return response; + } + + public boolean needReCreateLabel() { + return reCreateLabel; + } + +} diff --git a/starrockswriter/src/main/java/com/starrocks/connector/datax/plugin/writer/starrockswriter/manager/StarRocksStreamLoadVisitor.java b/starrockswriter/src/main/java/com/starrocks/connector/datax/plugin/writer/starrockswriter/manager/StarRocksStreamLoadVisitor.java new file mode 100644 index 00000000..b3671556 --- /dev/null +++ b/starrockswriter/src/main/java/com/starrocks/connector/datax/plugin/writer/starrockswriter/manager/StarRocksStreamLoadVisitor.java @@ -0,0 +1,304 @@ +package com.starrocks.connector.datax.plugin.writer.starrockswriter.manager; + +import java.io.IOException; +import java.net.HttpURLConnection; +import java.net.URL; +import java.nio.ByteBuffer; +import java.nio.charset.StandardCharsets; + +import com.alibaba.fastjson2.JSON; +import com.starrocks.connector.datax.plugin.writer.starrockswriter.StarRocksWriterOptions; +import com.starrocks.connector.datax.plugin.writer.starrockswriter.row.StarRocksDelimiterParser; + +import org.apache.commons.codec.binary.Base64; +import org.apache.http.HttpEntity; +import org.apache.http.client.config.RequestConfig; +import org.apache.http.client.methods.CloseableHttpResponse; +import org.apache.http.client.methods.HttpGet; +import org.apache.http.client.methods.HttpPut; +import org.apache.http.entity.ByteArrayEntity; +import org.apache.http.impl.client.CloseableHttpClient; +import org.apache.http.impl.client.DefaultRedirectStrategy; +import org.apache.http.impl.client.HttpClientBuilder; +import org.apache.http.impl.client.HttpClients; +import org.apache.http.util.EntityUtils; +import org.slf4j.Logger; +import org.slf4j.LoggerFactory; + +import java.util.HashMap; +import java.util.List; +import java.util.Map; +import java.util.concurrent.TimeUnit; +import java.util.stream.Collectors; + + +public class StarRocksStreamLoadVisitor { + + private static final Logger LOG = LoggerFactory.getLogger(StarRocksStreamLoadVisitor.class); + + private final StarRocksWriterOptions writerOptions; + private long pos; + private static final String RESULT_FAILED = "Fail"; + private static final String RESULT_LABEL_EXISTED = "Label Already Exists"; + private static final String LAEBL_STATE_VISIBLE = "VISIBLE"; + private static final String LAEBL_STATE_COMMITTED = "COMMITTED"; + private static final String RESULT_LABEL_PREPARE = "PREPARE"; + private static final String RESULT_LABEL_ABORTED = "ABORTED"; + private static final String RESULT_LABEL_UNKNOWN = "UNKNOWN"; + + public StarRocksStreamLoadVisitor(StarRocksWriterOptions writerOptions) { + this.writerOptions = writerOptions; + } + + public void doStreamLoad(StarRocksFlushTuple flushData) throws IOException { + String host = getAvailableHost(); + if (null == host) { + throw new IOException("None of the host in `load_url` could be connected."); + } + String loadUrl = new StringBuilder(host) + .append("/api/") + .append(writerOptions.getDatabase()) + .append("/") + .append(writerOptions.getTable()) + .append("/_stream_load") + .toString(); + if (LOG.isDebugEnabled()) { + LOG.debug(String.format("Start to join batch data: rows[%d] bytes[%d] label[%s].", flushData.getRows().size(), flushData.getBytes(), flushData.getLabel())); + } + Map loadResult = doHttpPut(loadUrl, flushData.getLabel(), joinRows(flushData.getRows(), 
flushData.getBytes().intValue())); + final String keyStatus = "Status"; + if (null == loadResult || !loadResult.containsKey(keyStatus)) { + LOG.error("unknown result status. {}", loadResult); + throw new IOException("Unable to flush data to StarRocks: unknown result status. " + loadResult); + } + if (LOG.isDebugEnabled()) { + LOG.debug(new StringBuilder("StreamLoad response:\n").append(JSON.toJSONString(loadResult)).toString()); + } + if (RESULT_FAILED.equals(loadResult.get(keyStatus))) { + StringBuilder errorBuilder = new StringBuilder("Failed to flush data to StarRocks.\n"); + if (loadResult.containsKey("Message")) { + errorBuilder.append(loadResult.get("Message")); + errorBuilder.append('\n'); + } + if (loadResult.containsKey("ErrorURL")) { + LOG.error("StreamLoad response: {}", loadResult); + try { + errorBuilder.append(doHttpGet(loadResult.get("ErrorURL").toString())); + errorBuilder.append('\n'); + } catch (IOException e) { + LOG.warn("Get Error URL failed. {} ", loadResult.get("ErrorURL"), e); + } + } else { + errorBuilder.append(JSON.toJSONString(loadResult)); + errorBuilder.append('\n'); + } + throw new IOException(errorBuilder.toString()); + } else if (RESULT_LABEL_EXISTED.equals(loadResult.get(keyStatus))) { + LOG.debug(new StringBuilder("StreamLoad response:\n").append(JSON.toJSONString(loadResult)).toString()); + // has to block-checking the state to get the final result + checkLabelState(host, flushData.getLabel()); + } + } + + private String getAvailableHost() { + List hostList = writerOptions.getLoadUrlList(); + long tmp = pos + hostList.size(); + for (; pos < tmp; pos++) { + String host = new StringBuilder("http://").append(hostList.get((int) (pos % hostList.size()))).toString(); + if (tryHttpConnection(host)) { + return host; + } + } + return null; + } + + private boolean tryHttpConnection(String host) { + try { + URL url = new URL(host); + HttpURLConnection co = (HttpURLConnection) url.openConnection(); + co.setConnectTimeout(1000); + co.connect(); + co.disconnect(); + return true; + } catch (Exception e1) { + LOG.warn("Failed to connect to address:{}", host, e1); + return false; + } + } + + private byte[] joinRows(List rows, int totalBytes) { + if (StarRocksWriterOptions.StreamLoadFormat.CSV.equals(writerOptions.getStreamLoadFormat())) { + Map props = (writerOptions.getLoadProps() == null ? new HashMap<>() : writerOptions.getLoadProps()); + byte[] lineDelimiter = StarRocksDelimiterParser.parse((String)props.get("row_delimiter"), "\n").getBytes(StandardCharsets.UTF_8); + ByteBuffer bos = ByteBuffer.allocate(totalBytes + rows.size() * lineDelimiter.length); + for (byte[] row : rows) { + bos.put(row); + bos.put(lineDelimiter); + } + return bos.array(); + } + + if (StarRocksWriterOptions.StreamLoadFormat.JSON.equals(writerOptions.getStreamLoadFormat())) { + ByteBuffer bos = ByteBuffer.allocate(totalBytes + (rows.isEmpty() ? 
2 : rows.size() + 1)); + bos.put("[".getBytes(StandardCharsets.UTF_8)); + byte[] jsonDelimiter = ",".getBytes(StandardCharsets.UTF_8); + boolean isFirstElement = true; + for (byte[] row : rows) { + if (!isFirstElement) { + bos.put(jsonDelimiter); + } + bos.put(row); + isFirstElement = false; + } + bos.put("]".getBytes(StandardCharsets.UTF_8)); + return bos.array(); + } + throw new RuntimeException("Failed to join rows data, unsupported `format` from stream load properties:"); + } + + @SuppressWarnings("unchecked") + private void checkLabelState(String host, String label) throws IOException { + int idx = 0; + while(true) { + try { + TimeUnit.SECONDS.sleep(Math.min(++idx, 5)); + } catch (InterruptedException ex) { + break; + } + try (CloseableHttpClient httpclient = HttpClients.createDefault()) { + HttpGet httpGet = new HttpGet(new StringBuilder(host).append("/api/").append(writerOptions.getDatabase()).append("/get_load_state?label=").append(label).toString()); + httpGet.setHeader("Authorization", getBasicAuthHeader(writerOptions.getUsername(), writerOptions.getPassword())); + httpGet.setHeader("Connection", "close"); + + try (CloseableHttpResponse resp = httpclient.execute(httpGet)) { + HttpEntity respEntity = getHttpEntity(resp); + if (respEntity == null) { + throw new IOException(String.format("Failed to flush data to StarRocks, Error " + + "could not get the final state of label[%s].\n", label), null); + } + Map result = (Map)JSON.parse(EntityUtils.toString(respEntity)); + String labelState = (String)result.get("state"); + if (null == labelState) { + throw new IOException(String.format("Failed to flush data to StarRocks, Error " + + "could not get the final state of label[%s]. response[%s]\n", label, EntityUtils.toString(respEntity)), null); + } + LOG.info(String.format("Checking label[%s] state[%s]\n", label, labelState)); + switch(labelState) { + case LAEBL_STATE_VISIBLE: + case LAEBL_STATE_COMMITTED: + return; + case RESULT_LABEL_PREPARE: + continue; + case RESULT_LABEL_ABORTED: + throw new StarRocksStreamLoadFailedException(String.format("Failed to flush data to StarRocks, Error " + + "label[%s] state[%s]\n", label, labelState), null, true); + case RESULT_LABEL_UNKNOWN: + default: + throw new IOException(String.format("Failed to flush data to StarRocks, Error " + + "label[%s] state[%s]\n", label, labelState), null); + } + } + } + } + } + + @SuppressWarnings("unchecked") + private Map doHttpPut(String loadUrl, String label, byte[] data) throws IOException { + LOG.info(String.format("Executing stream load to: '%s', size: '%s'", loadUrl, data.length)); + final HttpClientBuilder httpClientBuilder = HttpClients.custom() + .setRedirectStrategy(new DefaultRedirectStrategy() { + @Override + protected boolean isRedirectable(String method) { + return true; + } + }); + try (CloseableHttpClient httpclient = httpClientBuilder.build()) { + HttpPut httpPut = new HttpPut(loadUrl); + List cols = writerOptions.getColumns(); + if (null != cols && !cols.isEmpty() && StarRocksWriterOptions.StreamLoadFormat.CSV.equals(writerOptions.getStreamLoadFormat())) { + httpPut.setHeader("columns", String.join(",", cols.stream().map(f -> String.format("`%s`", f)).collect(Collectors.toList()))); + } + if (null != writerOptions.getLoadProps()) { + for (Map.Entry entry : writerOptions.getLoadProps().entrySet()) { + httpPut.setHeader(entry.getKey(), String.valueOf(entry.getValue())); + } + } + httpPut.setHeader("Expect", "100-continue"); + httpPut.setHeader("label", label); + httpPut.setHeader("Content-Type", 
"application/x-www-form-urlencoded"); + httpPut.setHeader("Authorization", getBasicAuthHeader(writerOptions.getUsername(), writerOptions.getPassword())); + httpPut.setEntity(new ByteArrayEntity(data)); + httpPut.setConfig(RequestConfig.custom().setRedirectsEnabled(true).build()); + try (CloseableHttpResponse resp = httpclient.execute(httpPut)) { + int code = resp.getStatusLine().getStatusCode(); + if (200 != code) { + String errorText; + try { + HttpEntity respEntity = resp.getEntity(); + errorText = EntityUtils.toString(respEntity); + } catch (Exception err) { + errorText = "find errorText failed: " + err.getMessage(); + } + LOG.warn("Request failed with code:{}, err:{}", code, errorText); + Map errorMap = new HashMap<>(); + errorMap.put("Status", "Fail"); + errorMap.put("Message", errorText); + return errorMap; + } + HttpEntity respEntity = resp.getEntity(); + if (null == respEntity) { + LOG.warn("Request failed with empty response."); + return null; + } + return (Map)JSON.parse(EntityUtils.toString(respEntity)); + } + } + } + + private String getBasicAuthHeader(String username, String password) { + String auth = username + ":" + password; + byte[] encodedAuth = Base64.encodeBase64(auth.getBytes(StandardCharsets.UTF_8)); + return new StringBuilder("Basic ").append(new String(encodedAuth)).toString(); + } + + private HttpEntity getHttpEntity(CloseableHttpResponse resp) { + int code = resp.getStatusLine().getStatusCode(); + if (200 != code) { + LOG.warn("Request failed with code:{}", code); + return null; + } + HttpEntity respEntity = resp.getEntity(); + if (null == respEntity) { + LOG.warn("Request failed with empty response."); + return null; + } + return respEntity; + } + + private String doHttpGet(String getUrl) throws IOException { + LOG.info("Executing GET from {}.", getUrl); + try (CloseableHttpClient httpclient = buildHttpClient()) { + HttpGet httpGet = new HttpGet(getUrl); + try (CloseableHttpResponse resp = httpclient.execute(httpGet)) { + HttpEntity respEntity = resp.getEntity(); + if (null == respEntity) { + LOG.warn("Request failed with empty response."); + return null; + } + return EntityUtils.toString(respEntity); + } + } + } + + private CloseableHttpClient buildHttpClient(){ + final HttpClientBuilder httpClientBuilder = HttpClients.custom() + .setRedirectStrategy(new DefaultRedirectStrategy() { + @Override + protected boolean isRedirectable(String method) { + return true; + } + }); + return httpClientBuilder.build(); + } + +} diff --git a/starrockswriter/src/main/java/com/starrocks/connector/datax/plugin/writer/starrockswriter/manager/StarRocksWriterManager.java b/starrockswriter/src/main/java/com/starrocks/connector/datax/plugin/writer/starrockswriter/manager/StarRocksWriterManager.java new file mode 100644 index 00000000..a0cb1f8b --- /dev/null +++ b/starrockswriter/src/main/java/com/starrocks/connector/datax/plugin/writer/starrockswriter/manager/StarRocksWriterManager.java @@ -0,0 +1,203 @@ +package com.starrocks.connector.datax.plugin.writer.starrockswriter.manager; + +import org.apache.commons.lang3.concurrent.BasicThreadFactory; +import org.slf4j.Logger; +import org.slf4j.LoggerFactory; + +import java.io.IOException; +import java.nio.charset.StandardCharsets; +import java.util.ArrayList; +import java.util.List; +import java.util.UUID; +import java.util.concurrent.Executors; +import java.util.concurrent.LinkedBlockingDeque; +import java.util.concurrent.ScheduledExecutorService; +import java.util.concurrent.ScheduledFuture; +import java.util.concurrent.TimeUnit; + 
+import com.google.common.base.Strings; +import com.starrocks.connector.datax.plugin.writer.starrockswriter.StarRocksWriterOptions; + +public class StarRocksWriterManager { + + private static final Logger LOG = LoggerFactory.getLogger(StarRocksWriterManager.class); + + private final StarRocksStreamLoadVisitor starrocksStreamLoadVisitor; + private final StarRocksWriterOptions writerOptions; + + private final List buffer = new ArrayList<>(); + private int batchCount = 0; + private long batchSize = 0; + private volatile boolean closed = false; + private volatile Exception flushException; + private final LinkedBlockingDeque flushQueue; + private ScheduledExecutorService scheduler; + private ScheduledFuture scheduledFuture; + + public StarRocksWriterManager(StarRocksWriterOptions writerOptions) { + this.writerOptions = writerOptions; + this.starrocksStreamLoadVisitor = new StarRocksStreamLoadVisitor(writerOptions); + flushQueue = new LinkedBlockingDeque<>(writerOptions.getFlushQueueLength()); + this.startScheduler(); + this.startAsyncFlushing(); + } + + public void startScheduler() { + stopScheduler(); + this.scheduler = Executors.newScheduledThreadPool(1, new BasicThreadFactory.Builder().namingPattern("starrocks-interval-flush").daemon(true).build()); + this.scheduledFuture = this.scheduler.schedule(() -> { + synchronized (StarRocksWriterManager.this) { + if (!closed) { + try { + String label = createBatchLabel(); + LOG.info(String.format("StarRocks interval Sinking triggered: label[%s].", label)); + if (batchCount == 0) { + startScheduler(); + } + flush(label, false); + } catch (Exception e) { + flushException = e; + } + } + } + }, writerOptions.getFlushInterval(), TimeUnit.MILLISECONDS); + } + + public void stopScheduler() { + if (this.scheduledFuture != null) { + scheduledFuture.cancel(false); + this.scheduler.shutdown(); + } + } + + public final synchronized void writeRecord(String record) throws IOException { + checkFlushException(); + try { + byte[] bts = record.getBytes(StandardCharsets.UTF_8); + buffer.add(bts); + batchCount++; + batchSize += bts.length; + if (batchCount >= writerOptions.getBatchRows() || batchSize >= writerOptions.getBatchSize()) { + String label = createBatchLabel(); + if (LOG.isDebugEnabled()) { + LOG.debug(String.format("StarRocks buffer Sinking triggered: rows[%d] label[%s].", batchCount, label)); + } + flush(label, false); + } + } catch (Exception e) { + throw new IOException("Writing records to StarRocks failed.", e); + } + } + + public synchronized void flush(String label, boolean waitUtilDone) throws Exception { + checkFlushException(); + if (batchCount == 0) { + if (waitUtilDone) { + waitAsyncFlushingDone(); + } + return; + } + flushQueue.put(new StarRocksFlushTuple(label, batchSize, new ArrayList<>(buffer))); + if (waitUtilDone) { + // wait the last flush + waitAsyncFlushingDone(); + } + buffer.clear(); + batchCount = 0; + batchSize = 0; + } + + public synchronized void close() { + if (!closed) { + closed = true; + try { + String label = createBatchLabel(); + if (batchCount > 0) { + if (LOG.isDebugEnabled()) { + LOG.debug(String.format("StarRocks Sink is about to close: label[%s].", label)); + } + } + flush(label, true); + } catch (Exception e) { + throw new RuntimeException("Writing records to StarRocks failed.", e); + } + } + checkFlushException(); + } + + public String createBatchLabel() { + StringBuilder sb = new StringBuilder(); + if (!Strings.isNullOrEmpty(writerOptions.getLabelPrefix())) { + sb.append(writerOptions.getLabelPrefix()); + } + return 
sb.append(UUID.randomUUID().toString()) + .toString(); + } + + private void startAsyncFlushing() { + // start flush thread + Thread flushThread = new Thread(new Runnable(){ + public void run() { + while(true) { + try { + asyncFlush(); + } catch (Exception e) { + flushException = e; + } + } + } + }); + flushThread.setDaemon(true); + flushThread.start(); + } + + private void waitAsyncFlushingDone() throws InterruptedException { + // wait previous flushings + for (int i = 0; i <= writerOptions.getFlushQueueLength(); i++) { + flushQueue.put(new StarRocksFlushTuple("", 0l, null)); + } + checkFlushException(); + } + + private void asyncFlush() throws Exception { + StarRocksFlushTuple flushData = flushQueue.take(); + if (Strings.isNullOrEmpty(flushData.getLabel())) { + return; + } + stopScheduler(); + if (LOG.isDebugEnabled()) { + LOG.debug(String.format("Async stream load: rows[%d] bytes[%d] label[%s].", flushData.getRows().size(), flushData.getBytes(), flushData.getLabel())); + } + for (int i = 0; i <= writerOptions.getMaxRetries(); i++) { + try { + // flush to StarRocks with stream load + starrocksStreamLoadVisitor.doStreamLoad(flushData); + LOG.info(String.format("Async stream load finished: label[%s].", flushData.getLabel())); + startScheduler(); + break; + } catch (Exception e) { + LOG.warn("Failed to flush batch data to StarRocks, retry times = {}", i, e); + if (i >= writerOptions.getMaxRetries()) { + throw new IOException(e); + } + if (e instanceof StarRocksStreamLoadFailedException && ((StarRocksStreamLoadFailedException)e).needReCreateLabel()) { + String newLabel = createBatchLabel(); + LOG.warn(String.format("Batch label changed from [%s] to [%s]", flushData.getLabel(), newLabel)); + flushData.setLabel(newLabel); + } + try { + Thread.sleep(1000l * Math.min(i + 1, 10)); + } catch (InterruptedException ex) { + Thread.currentThread().interrupt(); + throw new IOException("Unable to flush, interrupted while doing another attempt", e); + } + } + } + } + + private void checkFlushException() { + if (flushException != null) { + throw new RuntimeException("Writing records to StarRocks failed.", flushException); + } + } +} diff --git a/starrockswriter/src/main/java/com/starrocks/connector/datax/plugin/writer/starrockswriter/row/StarRocksBaseSerializer.java b/starrockswriter/src/main/java/com/starrocks/connector/datax/plugin/writer/starrockswriter/row/StarRocksBaseSerializer.java new file mode 100644 index 00000000..a7ad499d --- /dev/null +++ b/starrockswriter/src/main/java/com/starrocks/connector/datax/plugin/writer/starrockswriter/row/StarRocksBaseSerializer.java @@ -0,0 +1,26 @@ +package com.starrocks.connector.datax.plugin.writer.starrockswriter.row; + +import com.alibaba.datax.common.element.Column; +import com.alibaba.datax.common.element.Column.Type; + +public class StarRocksBaseSerializer { + + protected String fieldConvertion(Column col) { + if (null == col.getRawData() || Type.NULL == col.getType()) { + return null; + } + if (Type.BOOL == col.getType()) { + return String.valueOf(col.asLong()); + } + if (Type.BYTES == col.getType()) { + byte[] bts = (byte[])col.getRawData(); + long value = 0; + for (int i = 0; i < bts.length; i++) { + value += (bts[bts.length - i - 1] & 0xffL) << (8 * i); + } + return String.valueOf(value); + } + return col.asString(); + } + +} \ No newline at end of file diff --git a/starrockswriter/src/main/java/com/starrocks/connector/datax/plugin/writer/starrockswriter/row/StarRocksCsvSerializer.java 
b/starrockswriter/src/main/java/com/starrocks/connector/datax/plugin/writer/starrockswriter/row/StarRocksCsvSerializer.java new file mode 100644 index 00000000..1366d570 --- /dev/null +++ b/starrockswriter/src/main/java/com/starrocks/connector/datax/plugin/writer/starrockswriter/row/StarRocksCsvSerializer.java @@ -0,0 +1,32 @@ +package com.starrocks.connector.datax.plugin.writer.starrockswriter.row; + +import java.io.StringWriter; + +import com.alibaba.datax.common.element.Record; + +import com.google.common.base.Strings; + +public class StarRocksCsvSerializer extends StarRocksBaseSerializer implements StarRocksISerializer { + + private static final long serialVersionUID = 1L; + + private final String columnSeparator; + + public StarRocksCsvSerializer(String sp) { + this.columnSeparator = StarRocksDelimiterParser.parse(sp, "\t"); + } + + @Override + public String serialize(Record row) { + StringBuilder sb = new StringBuilder(); + for (int i = 0; i < row.getColumnNumber(); i++) { + String value = fieldConvertion(row.getColumn(i)); + sb.append(null == value ? "\\N" : value); + if (i < row.getColumnNumber() - 1) { + sb.append(columnSeparator); + } + } + return sb.toString(); + } + +} diff --git a/starrockswriter/src/main/java/com/starrocks/connector/datax/plugin/writer/starrockswriter/row/StarRocksDelimiterParser.java b/starrockswriter/src/main/java/com/starrocks/connector/datax/plugin/writer/starrockswriter/row/StarRocksDelimiterParser.java new file mode 100644 index 00000000..04301e0f --- /dev/null +++ b/starrockswriter/src/main/java/com/starrocks/connector/datax/plugin/writer/starrockswriter/row/StarRocksDelimiterParser.java @@ -0,0 +1,55 @@ +package com.starrocks.connector.datax.plugin.writer.starrockswriter.row; + +import java.io.StringWriter; + +import com.google.common.base.Strings; + +public class StarRocksDelimiterParser { + + private static final String HEX_STRING = "0123456789ABCDEF"; + + public static String parse(String sp, String dSp) throws RuntimeException { + if (Strings.isNullOrEmpty(sp)) { + return dSp; + } + if (!sp.toUpperCase().startsWith("\\X")) { + return sp; + } + String hexStr = sp.substring(2); + // check hex str + if (hexStr.isEmpty()) { + throw new RuntimeException("Failed to parse delimiter: `Hex str is empty`"); + } + if (hexStr.length() % 2 != 0) { + throw new RuntimeException("Failed to parse delimiter: `Hex str length error`"); + } + for (char hexChar : hexStr.toUpperCase().toCharArray()) { + if (HEX_STRING.indexOf(hexChar) == -1) { + throw new RuntimeException("Failed to parse delimiter: `Hex str format error`"); + } + } + // transform to separator + StringWriter writer = new StringWriter(); + for (byte b : hexStrToBytes(hexStr)) { + writer.append((char) b); + } + return writer.toString(); + } + + private static byte[] hexStrToBytes(String hexStr) { + String upperHexStr = hexStr.toUpperCase(); + int length = upperHexStr.length() / 2; + char[] hexChars = upperHexStr.toCharArray(); + byte[] bytes = new byte[length]; + for (int i = 0; i < length; i++) { + int pos = i * 2; + bytes[i] = (byte) (charToByte(hexChars[pos]) << 4 | charToByte(hexChars[pos + 1])); + } + return bytes; + } + + private static byte charToByte(char c) { + return (byte) HEX_STRING.indexOf(c); + } + +} diff --git a/starrockswriter/src/main/java/com/starrocks/connector/datax/plugin/writer/starrockswriter/row/StarRocksISerializer.java b/starrockswriter/src/main/java/com/starrocks/connector/datax/plugin/writer/starrockswriter/row/StarRocksISerializer.java new file mode 100644 index 
00000000..7bcb8973 --- /dev/null +++ b/starrockswriter/src/main/java/com/starrocks/connector/datax/plugin/writer/starrockswriter/row/StarRocksISerializer.java @@ -0,0 +1,11 @@ +package com.starrocks.connector.datax.plugin.writer.starrockswriter.row; + +import java.io.Serializable; + +import com.alibaba.datax.common.element.Record; + +public interface StarRocksISerializer extends Serializable { + + String serialize(Record row); + +} diff --git a/starrockswriter/src/main/java/com/starrocks/connector/datax/plugin/writer/starrockswriter/row/StarRocksJsonSerializer.java b/starrockswriter/src/main/java/com/starrocks/connector/datax/plugin/writer/starrockswriter/row/StarRocksJsonSerializer.java new file mode 100644 index 00000000..f235a08d --- /dev/null +++ b/starrockswriter/src/main/java/com/starrocks/connector/datax/plugin/writer/starrockswriter/row/StarRocksJsonSerializer.java @@ -0,0 +1,34 @@ +package com.starrocks.connector.datax.plugin.writer.starrockswriter.row; + +import java.util.HashMap; +import java.util.List; +import java.util.Map; + +import com.alibaba.datax.common.element.Record; +import com.alibaba.fastjson2.JSON; + +public class StarRocksJsonSerializer extends StarRocksBaseSerializer implements StarRocksISerializer { + + private static final long serialVersionUID = 1L; + + private final List fieldNames; + + public StarRocksJsonSerializer(List fieldNames) { + this.fieldNames = fieldNames; + } + + @Override + public String serialize(Record row) { + if (null == fieldNames) { + return ""; + } + Map rowMap = new HashMap<>(fieldNames.size()); + int idx = 0; + for (String fieldName : fieldNames) { + rowMap.put(fieldName, fieldConvertion(row.getColumn(idx))); + idx++; + } + return JSON.toJSONString(rowMap); + } + +} diff --git a/starrockswriter/src/main/java/com/starrocks/connector/datax/plugin/writer/starrockswriter/row/StarRocksSerializerFactory.java b/starrockswriter/src/main/java/com/starrocks/connector/datax/plugin/writer/starrockswriter/row/StarRocksSerializerFactory.java new file mode 100644 index 00000000..85f446cd --- /dev/null +++ b/starrockswriter/src/main/java/com/starrocks/connector/datax/plugin/writer/starrockswriter/row/StarRocksSerializerFactory.java @@ -0,0 +1,22 @@ +package com.starrocks.connector.datax.plugin.writer.starrockswriter.row; + +import java.util.Map; + +import com.starrocks.connector.datax.plugin.writer.starrockswriter.StarRocksWriterOptions; + +public class StarRocksSerializerFactory { + + private StarRocksSerializerFactory() {} + + public static StarRocksISerializer createSerializer(StarRocksWriterOptions writerOptions) { + if (StarRocksWriterOptions.StreamLoadFormat.CSV.equals(writerOptions.getStreamLoadFormat())) { + Map props = writerOptions.getLoadProps(); + return new StarRocksCsvSerializer(null == props || !props.containsKey("column_separator") ? 
null : String.valueOf(props.get("column_separator"))); + } + if (StarRocksWriterOptions.StreamLoadFormat.JSON.equals(writerOptions.getStreamLoadFormat())) { + return new StarRocksJsonSerializer(writerOptions.getColumns()); + } + throw new RuntimeException("Failed to create row serializer, unsupported `format` from stream load properties."); + } + +} diff --git a/starrockswriter/src/main/java/com/starrocks/connector/datax/plugin/writer/starrockswriter/util/StarRocksWriterUtil.java b/starrockswriter/src/main/java/com/starrocks/connector/datax/plugin/writer/starrockswriter/util/StarRocksWriterUtil.java new file mode 100755 index 00000000..8de4ad60 --- /dev/null +++ b/starrockswriter/src/main/java/com/starrocks/connector/datax/plugin/writer/starrockswriter/util/StarRocksWriterUtil.java @@ -0,0 +1,102 @@ +package com.starrocks.connector.datax.plugin.writer.starrockswriter.util; + +import com.alibaba.datax.plugin.rdbms.util.DBUtil; +import com.alibaba.datax.plugin.rdbms.util.DataBaseType; +import com.alibaba.datax.plugin.rdbms.util.RdbmsException; +import com.alibaba.datax.plugin.rdbms.writer.Constant; +import com.alibaba.druid.sql.parser.ParserException; +import com.starrocks.connector.datax.plugin.writer.starrockswriter.StarRocksWriterOptions; +import com.google.common.base.Strings; + +import org.slf4j.Logger; +import org.slf4j.LoggerFactory; + +import java.sql.Connection; +import java.sql.ResultSet; +import java.sql.Statement; +import java.util.*; + +public final class StarRocksWriterUtil { + private static final Logger LOG = LoggerFactory.getLogger(StarRocksWriterUtil.class); + + private StarRocksWriterUtil() {} + + public static List getStarRocksColumns(Connection conn, String databaseName, String tableName) { + String currentSql = String.format("SELECT COLUMN_NAME FROM `information_schema`.`COLUMNS` WHERE `TABLE_SCHEMA` = '%s' AND `TABLE_NAME` = '%s' ORDER BY `ORDINAL_POSITION` ASC;", databaseName, tableName); + List columns = new ArrayList<>(); + ResultSet rs = null; + try { + rs = DBUtil.query(conn, currentSql); + while (DBUtil.asyncResultSetNext(rs)) { + String colName = rs.getString("COLUMN_NAME"); + columns.add(colName); + } + return columns; + } catch (Exception e) { + throw RdbmsException.asQueryException(DataBaseType.MySql, e, currentSql, null, null); + } finally { + DBUtil.closeDBResources(rs, null, null); + } + } + + public static List renderPreOrPostSqls(List preOrPostSqls, String tableName) { + if (null == preOrPostSqls) { + return Collections.emptyList(); + } + List renderedSqls = new ArrayList<>(); + for (String sql : preOrPostSqls) { + if (!Strings.isNullOrEmpty(sql)) { + renderedSqls.add(sql.replace(Constant.TABLE_NAME_PLACEHOLDER, tableName)); + } + } + return renderedSqls; + } + + public static void executeSqls(Connection conn, List sqls) { + Statement stmt = null; + String currentSql = null; + try { + stmt = conn.createStatement(); + for (String sql : sqls) { + currentSql = sql; + DBUtil.executeSqlWithoutResultSet(stmt, sql); + } + } catch (Exception e) { + throw RdbmsException.asQueryException(DataBaseType.MySql, e, currentSql, null, null); + } finally { + DBUtil.closeDBResources(null, stmt, null); + } + } + + public static void preCheckPrePareSQL(StarRocksWriterOptions options) { + String table = options.getTable(); + List preSqls = options.getPreSqlList(); + List renderedPreSqls = StarRocksWriterUtil.renderPreOrPostSqls(preSqls, table); + if (null != renderedPreSqls && !renderedPreSqls.isEmpty()) { + LOG.info("Begin to preCheck preSqls:[{}].", String.join(";", 
renderedPreSqls)); + for (String sql : renderedPreSqls) { + try { + DBUtil.sqlValid(sql, DataBaseType.MySql); + } catch (ParserException e) { + throw RdbmsException.asPreSQLParserException(DataBaseType.MySql,e,sql); + } + } + } + } + + public static void preCheckPostSQL(StarRocksWriterOptions options) { + String table = options.getTable(); + List postSqls = options.getPostSqlList(); + List renderedPostSqls = StarRocksWriterUtil.renderPreOrPostSqls(postSqls, table); + if (null != renderedPostSqls && !renderedPostSqls.isEmpty()) { + LOG.info("Begin to preCheck postSqls:[{}].", String.join(";", renderedPostSqls)); + for(String sql : renderedPostSqls) { + try { + DBUtil.sqlValid(sql, DataBaseType.MySql); + } catch (ParserException e){ + throw RdbmsException.asPostSQLParserException(DataBaseType.MySql,e,sql); + } + } + } + } +} diff --git a/starrockswriter/src/main/resources/plugin.json b/starrockswriter/src/main/resources/plugin.json new file mode 100755 index 00000000..8edec1e0 --- /dev/null +++ b/starrockswriter/src/main/resources/plugin.json @@ -0,0 +1,6 @@ +{ + "name": "starrockswriter", + "class": "com.starrocks.connector.datax.plugin.writer.starrockswriter.StarRocksWriter", + "description": "useScene: prod. mechanism: StarRocksStreamLoad. warn: The more you know about the database, the less problems you encounter.", + "developer": "starrocks" +} \ No newline at end of file diff --git a/starrockswriter/src/main/resources/plugin_job_template.json b/starrockswriter/src/main/resources/plugin_job_template.json new file mode 100644 index 00000000..06c075bc --- /dev/null +++ b/starrockswriter/src/main/resources/plugin_job_template.json @@ -0,0 +1,18 @@ +{ + "name": "starrockswriter", + "parameter": { + "username": "", + "password": "", + "column": [], + "preSql": [], + "postSql": [], + "loadUrl": [], + "connection": [ + { + "jdbcUrl": "", + "selectedDatabase": "", + "table": [] + } + ] + } +} \ No newline at end of file diff --git a/streamreader/pom.xml b/streamreader/pom.xml index dc754d9a..7d186076 100755 --- a/streamreader/pom.xml +++ b/streamreader/pom.xml @@ -39,6 +39,16 @@ + + + + src/main/resources + + **/*.* + + true + + diff --git a/streamreader/src/main/java/com/alibaba/datax/plugin/reader/streamreader/StreamReader.java b/streamreader/src/main/java/com/alibaba/datax/plugin/reader/streamreader/StreamReader.java index e3b86659..6b8c55bc 100755 --- a/streamreader/src/main/java/com/alibaba/datax/plugin/reader/streamreader/StreamReader.java +++ b/streamreader/src/main/java/com/alibaba/datax/plugin/reader/streamreader/StreamReader.java @@ -5,7 +5,7 @@ import com.alibaba.datax.common.exception.DataXException; import com.alibaba.datax.common.plugin.RecordSender; import com.alibaba.datax.common.spi.Reader; import com.alibaba.datax.common.util.Configuration; -import com.alibaba.fastjson.JSONObject; +import com.alibaba.fastjson2.JSONObject; import org.apache.commons.lang3.RandomStringUtils; import org.apache.commons.lang3.RandomUtils; diff --git a/streamwriter/pom.xml b/streamwriter/pom.xml index 4a987fac..2fa95d7b 100755 --- a/streamwriter/pom.xml +++ b/streamwriter/pom.xml @@ -34,6 +34,16 @@ + + + + src/main/resources + + **/*.* + + true + + diff --git a/sybasereader/doc/sybasereader.md b/sybasereader/doc/sybasereader.md new file mode 100644 index 00000000..abde7cb1 --- /dev/null +++ b/sybasereader/doc/sybasereader.md @@ -0,0 +1,327 @@ + +# SybaseReader 插件文档 + + +___ + + +## 1 快速介绍 + +SybaseReader插件实现了从Sybase读取数据。在底层实现上,SybaseReader通过JDBC连接远程Sybase数据库,并执行相应的sql语句将数据从Sybase库中SELECT出来。 + +## 2 
实现原理 + +简而言之,SybaseReader通过JDBC连接器连接到远程的Sybase数据库,并根据用户配置的信息生成查询SELECT SQL语句并发送到远程Sybase数据库,并将该SQL执行返回结果使用DataX自定义的数据类型拼装为抽象的数据集,并传递给下游Writer处理。 + +对于用户配置Table、Column、Where的信息,SybaseReader将其拼接为SQL语句发送到Sybase数据库;对于用户配置querySql信息,Sybase直接将其发送到Sybase数据库。 + + +## 3 功能说明 + +### 3.1 配置样例 + +* 配置一个从Sybase数据库同步抽取数据到本地的作业: + +``` +{ + "job": { + "setting": { + "speed": { + //设置传输速度 byte/s 尽量逼近这个速度但是不高于它. + // channel 表示通道数量,byte表示通道速度,如果单通道速度1MB,配置byte为1048576表示一个channel + "byte": 1048576 + }, + //出错限制 + "errorLimit": { + //先选择record + "record": 0, + //百分比 1表示100% + "percentage": 0.02 + } + }, + "content": [ + { + "reader": { + "name": "SybaseReader", + "parameter": { + // 数据库连接用户名 + "username": "root", + // 数据库连接密码 + "password": "root", + "column": [ + "id","name" + ], + //切分主键 + "splitPk": "db_id", + "connection": [ + { + "table": [ + "table" + ], + "jdbcUrl": [ + "jdbc:sybase:Tds:192.168.1.92:5000/tempdb?charset=cp936" + ] + } + ] + } + }, + "writer": { + //writer类型 + "name": "streamwriter", + // 是否打印内容 + "parameter": { + "print": true + } + } + } + ] + } +} + +``` + +* 配置一个自定义SQL的数据库同步任务到本地内容的作业: + +``` +{ + "job": { + "setting": { + "speed": { + "channel": 5 + } + }, + "content": [ + { + "reader": { + "name": "SybaseReader", + "parameter": { + "username": "root", + "password": "root", + "where": "", + "connection": [ + { + "querySql": [ + "select db_id,on_line_flag from db_info where db_id < 10" + ], + "jdbcUrl": [ + "jdbc:sybase:Tds:192.168.1.92:5000/tempdb?charset=cp936" + ] + } + ] + } + }, + "writer": { + "name": "streamwriter", + "parameter": { + "visible": false, + "encoding": "UTF-8" + } + } + } + ] + } +} +``` + + +### 3.2 参数说明 + +* **jdbcUrl** + + * 描述:描述的是到对端数据库的JDBC连接信息,使用JSON的数组描述,并支持一个库填写多个连接地址。之所以使用JSON数组描述连接信息,是因为阿里集团内部支持多个IP探测,如果配置了多个,SybaseReader可以依次探测ip的可连接性,直到选择一个合法的IP。如果全部连接失败,SybaseReader报错。 注意,jdbcUrl必须包含在connection配置单元中。对于阿里集团外部使用情况,JSON数组填写一个JDBC连接即可。 + + jdbcUrl按照Sybase官方规范,并可以填写连接附件控制信息。具体请参看[Sybase官方文档](http://www.Sybase.com/technetwork/database/enterprise-edition/documentation/index.html)。 + + * 必选:是
    + + * 默认值:无
    + +* **username** + + * 描述:数据源的用户名
    + + * 必选:是
    + + * 默认值:无
    + +* **password** + + * 描述:数据源指定用户名的密码
    + + * 必选:是
    + + * 默认值:无
    + +* **table** + + * 描述:所选取的需要同步的表。使用JSON的数组描述,因此支持多张表同时抽取。当配置为多张表时,用户自己需保证多张表是同一schema结构,SybaseReader不予检查表是否同一逻辑表。注意,table必须包含在connection配置单元中。
    + + * 必选:是
    + + * 默认值:无
+ +* **column** + + * 描述:所配置的表中需要同步的列名集合,使用JSON的数组描述字段信息。用户使用\*代表默认使用所有列配置,例如['\*']。 + + 支持列裁剪,即列可以挑选部分列进行导出。 + + 支持列换序,即列可以不按照表schema信息进行导出。 + + 支持常量配置,用户需要按照JSON格式: + ["id", "`table`", "1", "'bazhen.csy'", "null", "to_char(a + 1)", "2.3" , "true"] + id为普通列名,\`table\`为包含保留字的列名,1为整型数字常量,'bazhen.csy'为字符串常量,null为空指针,to_char(a + 1)为表达式,2.3为浮点数,true为布尔值。 + + Column必须显式填写,不允许为空! + + * 必选:是
    + + * 默认值:无
+ +* **splitPk** + + * 描述:SybaseReader进行数据抽取时,如果指定splitPk,表示用户希望使用splitPk代表的字段进行数据分片,DataX因此会启动并发任务进行数据同步,这样可以大大提升数据同步的效能。 + + 推荐splitPk用户使用表主键,因为表主键通常情况下比较均匀,因此切分出来的分片也不容易出现数据热点。 + + 目前splitPk仅支持整型、字符串型数据切分,`不支持浮点、日期等其他类型`。如果用户指定其他非支持类型,SybaseReader将报错! + + splitPk如果不填写,将视作用户不对单表进行切分,SybaseReader使用单通道同步全量数据。 + + * 必选:否
    + + * 默认值:无
+ +* **where** + + * 描述:筛选条件,SybaseReader根据指定的column、table、where条件拼接SQL,并根据这个SQL进行数据抽取。在实际业务场景中,往往会选择当天的数据进行同步,可以将where条件指定为gmt_create > $bizdate 。注意:不可以将where条件指定为limit 10,limit不是SQL的合法where子句。
    + + where条件可以有效地进行业务增量同步。 + + * 必选:否
    + + * 默认值:无
+ +* **querySql** + + * 描述:在有些业务场景下,where这一配置项不足以描述所筛选的条件,用户可以通过该配置项来自定义筛选SQL。当用户配置了这一项之后,DataX系统就会忽略table,column这些配置项,直接使用这个配置项的内容对数据进行筛选,例如需要进行多表join后同步数据,使用select a,b from table_a join table_b on table_a.id = table_b.id
    + + `当用户配置querySql时,SybaseReader直接忽略table、column、where条件的配置`。 + + * 必选:否
    + + * 默认值:无
+ +* **fetchSize** + + * 描述:该配置项定义了插件和数据库服务器端每次批量数据获取条数,该值决定了DataX和服务器端的网络交互次数,能够较大地提升数据抽取性能。
+ + `注意,该值过大(>2048)可能造成DataX进程OOM。` + + * 必选:否
    + + * 默认值:1024
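下面再给出一个把 splitPk、where、fetchSize 组合使用的 reader 片段(仅为示意:库表、列名沿用本文 3.1 样例,where 条件等取值为假设值,请按实际环境调整):

```json
{
    "reader": {
        "name": "sybasereader",
        "parameter": {
            "username": "root",
            "password": "root",
            "column": ["db_id", "on_line_flag"],
            "splitPk": "db_id",
            "where": "db_id < 10",
            "fetchSize": 1024,
            "connection": [
                {
                    "table": ["db_info"],
                    "jdbcUrl": ["jdbc:sybase:Tds:192.168.1.92:5000/tempdb?charset=cp936"]
                }
            ]
        }
    }
}
```

该片段仅包含 reader 部分,完整作业仍需参照 3.1 样例补充 setting 与 writer 配置。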
    + + + +### 3.3 类型转换 + +目前SybaseReader支持大部分Sybase类型,但也存在部分个别类型没有支持的情况,请注意检查你的类型。 + +下面列出SybaseReader针对Sybase类型转换列表: + + +| DataX 内部类型| Sybase 数据类型 | +| -------- | ----- | +| Long |Tinyint,Smallint,Int,Money,Smallmoney| +| Double |Float,Real,Numeric,Decimal| +| String |Char,Varchar,Nchar,Nvarchar,Text| +| Date |Timestamp,Datetime,Smalldatetime| +| Boolean |bit, bool| +| Bytes |Binary,Varbinary,Image| + + + +请注意: + +* `除上述罗列字段类型外,其他类型均不支持`。 + + +## 4 性能报告 + +### 4.1 环境准备 + +#### 4.1.1 数据特征 + +为了模拟线上真实数据,我们设计两个Sybase数据表,分别为: + +#### 4.1.2 机器参数 + +* 执行DataX的机器参数为: + +* Sybase数据库机器参数为: + +### 4.2 测试报告 + +#### 4.2.1 表1测试报告 + + +| 并发任务数| DataX速度(Rec/s)|DataX流量|网卡流量|DataX运行负载|DB运行负载| +|--------| --------|--------|--------|--------|--------| +|1| DataX 统计速度(Rec/s)|DataX统计流量|网卡流量|DataX运行负载|DB运行负载| + +## 5 约束限制 + + +### 5.1 一致性约束 + +Sybase在数据存储划分中属于RDBMS系统,对外可以提供强一致性数据查询接口。例如当一次同步任务启动运行过程中,当该库存在其他数据写入方写入数据时,SybaseReader完全不会获取到写入更新数据,这是由于数据库本身的快照特性决定的。关于数据库快照特性,请参看[MVCC Wikipedia](https://en.wikipedia.org/wiki/Multiversion_concurrency_control) + +上述是在SybaseReader单线程模型下数据同步一致性的特性,由于SybaseReader可以根据用户配置信息使用了并发数据抽取,因此不能严格保证数据一致性:当SybaseReader根据splitPk进行数据切分后,会先后启动多个并发任务完成数据同步。由于多个并发任务相互之间不属于同一个读事务,同时多个并发任务存在时间间隔。因此这份数据并不是`完整的`、`一致的`数据快照信息。 + +针对多线程的一致性快照需求,在技术上目前无法实现,只能从工程角度解决,工程化的方式存在取舍,我们提供几个解决思路给用户,用户可以自行选择: + +1. 使用单线程同步,即不再进行数据切片。缺点是速度比较慢,但是能够很好保证一致性。 + +2. 关闭其他数据写入方,保证当前数据为静态数据,例如,锁表、关闭备库同步等等。缺点是可能影响在线业务。 + +### 5.2 数据库编码问题 + + +SybaseReader底层使用JDBC进行数据抽取,JDBC天然适配各类编码,并在底层进行了编码转换。因此SybaseReader不需用户指定编码,可以自动获取编码并转码。 + +对于Sybase底层写入编码和其设定的编码不一致的混乱情况,SybaseReader对此无法识别,对此也无法提供解决方案,对于这类情况,`导出有可能为乱码`。 + +### 5.3 增量数据同步 + +SybaseReader使用JDBC SELECT语句完成数据抽取工作,因此可以使用SELECT...WHERE...进行增量数据抽取,方式有多种: + +* 数据库在线应用写入数据库时,填充modify字段为更改时间戳,包括新增、更新、删除(逻辑删)。对于这类应用,SybaseReader只需要WHERE条件跟上一同步阶段时间戳即可。 +* 对于新增流水型数据,SybaseReader可以WHERE条件后跟上一阶段最大自增ID即可。 + +对于业务上无字段区分新增、修改数据情况,SybaseReader也无法进行增量数据同步,只能同步全量数据。 + +### 5.4 Sql安全性 + +SybaseReader提供querySql语句交给用户自己实现SELECT抽取语句,SybaseReader本身对querySql不做任何安全性校验。这块交由DataX用户方自己保证。 + +## 6 FAQ + +*** + +**Q: 目前已验证支持sybase的版本?** + + A: Sybase ASE 16/15.7 + +**Q: SybaseReader同步报错,报错信息为XXX** + + A: 网络或者权限问题,请使用Sybase命令行或者可视化工具进行测试: + 如果上述命令也报错,那可以证实是环境问题,请联系你的DBA。 + + +**Q: SybaseReader抽取速度很慢怎么办?** + + A: 影响抽取时间的原因大概有如下几个: + 1. 由于SQL的plan异常,导致的抽取时间长; 在抽取时,尽可能使用全表扫描代替索引扫描; + 2. 合理sql的并发度,减少抽取时间;根据表的大小, + 3. 
设置合理fetchsize,减少网络IO; diff --git a/sybasereader/pom.xml b/sybasereader/pom.xml new file mode 100644 index 00000000..cc3a3840 --- /dev/null +++ b/sybasereader/pom.xml @@ -0,0 +1,111 @@ + + + + datax-all + com.alibaba.datax + 0.0.1-SNAPSHOT + + 4.0.0 + + sybasereader + sybasereader + jar + + + 8 + 8 + + + + + com.alibaba.datax + datax-common + ${datax-project-version} + + + slf4j-log4j12 + org.slf4j + + + + + org.slf4j + slf4j-api + + + ch.qos.logback + logback-classic + + + + com.alibaba.datax + plugin-rdbms-util + ${datax-project-version} + + + + com.oracle + ojdbc6 + 11.2.0.3 + + + com.alibaba.datax + datax-common + 0.0.1-SNAPSHOT + compile + + + + com.sybase.jconnect + jconn4 + 16.0 + system + ${project.basedir}/libs/jconn4-16.0.jar + + + + junit + junit + 4.13.2 + test + + + + + + + + + maven-compiler-plugin + + ${jdk-version} + ${jdk-version} + ${project-sourceEncoding} + + + + + maven-assembly-plugin + + + src/main/assembly/package.xml + + datax + + + + dwzip + package + + single + + + + + + + + + \ No newline at end of file diff --git a/sybasereader/src/main/assembly/package.xml b/sybasereader/src/main/assembly/package.xml new file mode 100755 index 00000000..40060050 --- /dev/null +++ b/sybasereader/src/main/assembly/package.xml @@ -0,0 +1,35 @@ + + + + dir + + false + + + src/main/resources + + plugin.json + plugin_job_template.json + + plugin/reader/sybasereader + + + target/ + + sybasereader-0.0.1-SNAPSHOT.jar + + plugin/reader/sybasereader + + + + + + false + plugin/reader/sybasereader/libs + runtime + + + diff --git a/sybasereader/src/main/java/com/alibaba/datax/plugin/reader/sybasereader/Constants.java b/sybasereader/src/main/java/com/alibaba/datax/plugin/reader/sybasereader/Constants.java new file mode 100755 index 00000000..2de97644 --- /dev/null +++ b/sybasereader/src/main/java/com/alibaba/datax/plugin/reader/sybasereader/Constants.java @@ -0,0 +1,7 @@ +package com.alibaba.datax.plugin.reader.sybasereader; + +public class Constants { + + public static final int DEFAULT_FETCH_SIZE = 1024; + +} diff --git a/sybasereader/src/main/java/com/alibaba/datax/plugin/reader/sybasereader/SybaseReader.java b/sybasereader/src/main/java/com/alibaba/datax/plugin/reader/sybasereader/SybaseReader.java new file mode 100755 index 00000000..f0a0ac1a --- /dev/null +++ b/sybasereader/src/main/java/com/alibaba/datax/plugin/reader/sybasereader/SybaseReader.java @@ -0,0 +1,108 @@ +package com.alibaba.datax.plugin.reader.sybasereader; + +import com.alibaba.datax.common.plugin.RecordSender; +import com.alibaba.datax.common.spi.Reader; +import com.alibaba.datax.common.util.Configuration; +import com.alibaba.datax.plugin.rdbms.util.DataBaseType; +import org.apache.commons.lang3.StringUtils; +import org.slf4j.Logger; +import org.slf4j.LoggerFactory; + +import java.util.List; +import com.alibaba.datax.plugin.rdbms.reader.CommonRdbmsReader; +import com.alibaba.datax.plugin.rdbms.reader.Constant; + + +public class SybaseReader extends Reader { + + private static final DataBaseType DATABASE_TYPE = DataBaseType.Sybase; + + public static class Job extends Reader.Job { + private static final Logger LOG = LoggerFactory + .getLogger(SybaseReader.Job.class); + + private Configuration originalConfig = null; + private CommonRdbmsReader.Job commonRdbmsReaderJob; + + @Override + public void init() { + this.originalConfig = super.getPluginJobConf(); + + dealFetchSize(this.originalConfig); + + this.commonRdbmsReaderJob = new CommonRdbmsReader.Job( + DATABASE_TYPE); + this.commonRdbmsReaderJob.init(this.originalConfig); 
+ + } + + @Override + public void preCheck(){ + init(); + this.commonRdbmsReaderJob.preCheck(this.originalConfig,DATABASE_TYPE); + } + + @Override + public List split(int adviceNumber) { + return this.commonRdbmsReaderJob.split(this.originalConfig, + adviceNumber); + } + + @Override + public void post() { + this.commonRdbmsReaderJob.post(this.originalConfig); + } + + @Override + public void destroy() { + this.commonRdbmsReaderJob.destroy(this.originalConfig); + } + + private void dealFetchSize(Configuration originalConfig) { + int fetchSize = originalConfig.getInt( + com.alibaba.datax.plugin.rdbms.reader.Constant.FETCH_SIZE, + Constants.DEFAULT_FETCH_SIZE); + if (fetchSize < 1) { + LOG.warn("对 sybasereader 需要配置 fetchSize, 对性能提升有较大影响 请配置fetchSize."); + } + originalConfig.set( + com.alibaba.datax.plugin.rdbms.reader.Constant.FETCH_SIZE, + fetchSize); + } + } + + public static class Task extends Reader.Task { + + private Configuration readerSliceConfig; + private CommonRdbmsReader.Task commonRdbmsReaderTask; + + @Override + public void init() { + this.readerSliceConfig = super.getPluginJobConf(); + this.commonRdbmsReaderTask = new CommonRdbmsReader.Task( + DATABASE_TYPE ,super.getTaskGroupId(), super.getTaskId()); + this.commonRdbmsReaderTask.init(this.readerSliceConfig); + } + + @Override + public void startRead(RecordSender recordSender) { + int fetchSize = this.readerSliceConfig + .getInt(com.alibaba.datax.plugin.rdbms.reader.Constant.FETCH_SIZE); + + this.commonRdbmsReaderTask.startRead(this.readerSliceConfig, + recordSender, super.getTaskPluginCollector(), fetchSize); + } + + @Override + public void post() { + this.commonRdbmsReaderTask.post(this.readerSliceConfig); + } + + @Override + public void destroy() { + this.commonRdbmsReaderTask.destroy(this.readerSliceConfig); + } + + } + +} diff --git a/sybasereader/src/main/resources/plugin.json b/sybasereader/src/main/resources/plugin.json new file mode 100755 index 00000000..39dd61d7 --- /dev/null +++ b/sybasereader/src/main/resources/plugin.json @@ -0,0 +1,6 @@ +{ + "name": "sybasereader", + "class": "com.alibaba.datax.plugin.reader.sybasereader.SybaseReader", + "description": "useScene: prod. mechanism: Jdbc connection using the database, execute select sql, retrieve data from the ResultSet. 
warn: The more you know about the database, the less problems you encounter.", + "developer": "alibaba" +} \ No newline at end of file diff --git a/sybasereader/src/main/resources/plugin_job_template.json b/sybasereader/src/main/resources/plugin_job_template.json new file mode 100644 index 00000000..5d5a1f45 --- /dev/null +++ b/sybasereader/src/main/resources/plugin_job_template.json @@ -0,0 +1,14 @@ +{ + "name": "sybasereader", + "parameter": { + "username": "", + "password": "", + "column": [], + "connection": [ + { + "table": [], + "jdbcUrl": [] + } + ] + } +} \ No newline at end of file diff --git a/sybasereader/src/test/java/com/alibaba/datax/plugin/reader/sybasereader/SybaseDatabaseUnitTest.java b/sybasereader/src/test/java/com/alibaba/datax/plugin/reader/sybasereader/SybaseDatabaseUnitTest.java new file mode 100644 index 00000000..f77caccd --- /dev/null +++ b/sybasereader/src/test/java/com/alibaba/datax/plugin/reader/sybasereader/SybaseDatabaseUnitTest.java @@ -0,0 +1,55 @@ +package com.alibaba.datax.plugin.reader.sybasereader; + +import org.junit.After; +import org.junit.Before; +import org.junit.Test; + +import java.sql.Connection; +import java.sql.DriverManager; +import java.sql.ResultSet; +import java.sql.SQLException; +import java.sql.Statement; + +import static org.junit.Assert.assertEquals; + +public class SybaseDatabaseUnitTest { + private Connection connection; + + @Before + public void setUp() { + // 连接到 Sybase 数据库 + String jdbcUrl = "jdbc:sybase:Tds:192.172.172.80:1680/database"; + String username = "admin"; + String password = "admin123"; + + try { + connection = DriverManager.getConnection(jdbcUrl, username, password); + } catch (SQLException e) { + e.printStackTrace(); + } + } + + @After + public void tearDown() { + if (connection != null) { + try { + connection.close(); + } catch (SQLException e) { + e.printStackTrace(); + } + } + } + + @Test + public void testDatabaseQuery() throws SQLException { + String query = "SELECT COUNT(*) FROM your_table"; + int expectedRowCount = 10; // 假设期望返回的行数是 10 + + Statement statement = connection.createStatement(); + ResultSet resultSet = statement.executeQuery(query); + resultSet.next(); + int rowCount = resultSet.getInt(1); + + assertEquals(expectedRowCount, rowCount); + } +} diff --git a/sybasewriter/doc/sybasewriter.md b/sybasewriter/doc/sybasewriter.md new file mode 100644 index 00000000..cccc62d6 --- /dev/null +++ b/sybasewriter/doc/sybasewriter.md @@ -0,0 +1,228 @@ +# DataX SybaseWriter + + +--- + + +## 1 快速介绍 + +SybaseWriter 插件实现了写入数据到 Sybase 主库的目的表的功能。在底层实现上, SybaseWriter 通过 JDBC 连接远程 Sybase 数据库,并执行相应的 insert into ... 或者 ( replace into ...) 的 sql 语句将数据写入 Sybase,内部会分批次提交入库,需要数据库本身采用 innodb 引擎。 + +SybaseWriter 面向ETL开发工程师,他们使用 SybaseWriter 从数仓导入数据到 Sybase。同时 SybaseWriter 亦可以作为数据迁移工具为DBA等用户提供服务。 + + +## 2 实现原理 + +SybaseWriter 通过 DataX 框架获取 Reader 生成的协议数据,根据你配置的 `writeMode` 生成 + + +* `insert into...`(当主键/唯一性索引冲突时会写不进去冲突的行) + +##### 或者 + +* `replace into...`(没有遇到主键/唯一性索引冲突时,与 insert into 行为一致,冲突时会用新行替换原有行所有字段) 的语句写入数据到 Sybase。出于性能考虑,采用了 `PreparedStatement + Batch`,并且设置了:`rewriteBatchedStatements=true`,将数据缓冲到线程上下文 Buffer 中,当 Buffer 累计到预定阈值时,才发起写入请求。 + +
    + + 注意:目的表所在数据库必须是主库才能写入数据;整个任务至少需要具备 insert/replace into...的权限,是否需要其他权限,取决于你任务配置中在 preSql 和 postSql 中指定的语句。 + + +## 3 功能说明 + +### 3.1 配置样例 + +* 这里使用一份从内存产生到 Sybase 导入的数据。 + +```json +{ + "job": { + "setting": { + "speed": { + "channel": 1 + } + }, + "content": [ + { + "reader": { + "name": "streamreader", + "parameter": { + "column" : [ + { + "value": "DataX", + "type": "string" + }, + { + "value": 19880808, + "type": "long" + }, + { + "value": "1988-08-08 08:08:08", + "type": "date" + }, + { + "value": true, + "type": "bool" + }, + { + "value": "test", + "type": "bytes" + } + ], + "sliceRecordCount": 1000 + } + }, + "writer": { + "name": "Sybasewriter", + "parameter": { + "writeMode": "insert", + "username": "root", + "password": "root", + "column": [ + "id", + "name" + ], + "preSql": [ + "delete from test" + ], + "connection": [ + { + "jdbcUrl":"jdbc:sybase:Tds:192.168.1.92:5000/tempdb?charset=cp936", + "table": [ + "test" + ] + } + ] + } + } + } + ] + } +} + +``` + + +### 3.2 参数说明 + +* **jdbcUrl** + + * 描述:目的数据库的 JDBC 连接信息。作业运行时,DataX 会在你提供的 jdbcUrl 后面追加如下属性:yearIsDateType=false&zeroDateTimeBehavior=convertToNull&rewriteBatchedStatements=true + + 注意:1、在一个数据库上只能配置一个 jdbcUrl 值。这与 SybaseReader 支持多个备库探测不同,因为此处不支持同一个数据库存在多个主库的情况(双主导入数据情况) + 2、jdbcUrl按照Sybase官方规范,并可以填写连接附加控制信息,比如想指定连接编码为 gbk ,则在 jdbcUrl 后面追加属性 useUnicode=true&characterEncoding=gbk。具体请参看 Sybase官方文档或者咨询对应 DBA。 + + + * 必选:是
    + + * 默认值:无
    + +* **username** + + * 描述:目的数据库的用户名
    + + * 必选:是
    + + * 默认值:无
    + +* **password** + + * 描述:目的数据库的密码
    + + * 必选:是
    + + * 默认值:无
    + +* **table** + + * 描述:目的表的表名称。支持写入一个或者多个表。当配置为多张表时,必须确保所有表结构保持一致。 + + 注意:table 和 jdbcUrl 必须包含在 connection 配置单元中 + + * 必选:是
    + + * 默认值:无
    + +* **column** + + * 描述:目的表需要写入数据的字段,字段之间用英文逗号分隔。例如: "column": ["id","name","age"]。如果要依次写入全部列,使用`*`表示, 例如: `"column": ["*"]`。 + + **column配置项必须指定,不能留空!** + + 注意:1、我们强烈不推荐你这样配置,因为当你目的表字段个数、类型等有改动时,你的任务可能运行不正确或者失败 + 2、 column 不能配置任何常量值 + + * 必选:是
+ + * 默认值:无
+ +* **preSql** + + * 描述:写入数据到目的表前,会先执行这里的标准语句。如果 Sql 中有你需要操作到的表名称,请使用 `@table` 表示,这样在实际执行 Sql 语句时,会对变量按照实际表名称进行替换。比如你的任务是要写入到目的端的100个同构分表(表名称为:datax_00,datax_01, ... datax_98,datax_99),并且你希望导入数据前,先对表中数据进行删除操作,那么你可以这样配置:`"preSql":["delete from @table"]`,效果是:在执行到每个表写入数据前,会先执行对应的 delete from 对应表名称
    + + * 必选:否
    + + * 默认值:无
    + +* **postSql** + + * 描述:写入数据到目的表后,会执行这里的标准语句。(原理同 preSql )
    + + * 必选:否
    + + * 默认值:无
    + +* **writeMode** + + * 描述:控制写入数据到目标表采用 `insert into` 或者 `replace into` 或者 `ON DUPLICATE KEY UPDATE` 语句
    + + * 必选:是
    + + * 所有选项:insert/replace/update
    + + * 默认值:insert
    + +* **batchSize** + + * 描述:一次性批量提交的记录数大小,该值可以极大减少DataX与Sybase的网络交互次数,并提升整体吞吐量。但是该值设置过大可能会造成DataX运行进程OOM情况。
    + + * 必选:否
    + + * 默认值:1024
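下面给出一个体现 writeMode、batchSize 以及 preSql 中 `@table` 占位符用法的 writer 片段(仅为示意:表名、账号、连接串沿用本文 3.1 样例,其余取值为假设值,请按实际环境调整):

```json
{
    "writer": {
        "name": "sybasewriter",
        "parameter": {
            "writeMode": "insert",
            "batchSize": 1024,
            "username": "root",
            "password": "root",
            "column": ["id", "name"],
            "preSql": ["delete from @table"],
            "connection": [
                {
                    "jdbcUrl": "jdbc:sybase:Tds:192.168.1.92:5000/tempdb?charset=cp936",
                    "table": ["test"]
                }
            ]
        }
    }
}
```

其中 `@table` 会在执行时被替换为 connection.table 中配置的实际表名;该片段仅包含 writer 部分,完整作业仍需补充 reader 与 setting 配置。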
    + + +### 3.3 类型转换 + +类似 SybaseReader ,目前 SybaseWriter 支持大部分 Sybase 类型,但也存在部分个别类型没有支持的情况,请注意检查你的类型。 + +下面列出 SybaseWriter 针对 Sybase 类型转换列表: + + +| DataX 内部类型| Sybase 数据类型 | +| -------- | ----- | +| Long |Tinyint,Smallint,Int,Money,Smallmoney| +| Double |Float,Real,Numeric,Decimal| +| String |Char,Varchar,Nchar,Nvarchar,Text| +| Date |Timestamp,Datetime,Smalldatetime| +| Boolean |bit, bool| +| Bytes |Binary,Varbinary,Image| + +## 4 性能报告 + + +## 5 约束限制 + + + + +## FAQ + +*** + +**Q: 目前已验证支持sybase的版本?** + +A: Sybase ASE 16/15.7 + +**Q: SybaseReader同步报错,报错信息为XXX** + +A: 网络或者权限问题,请使用Sybase命令行或者可视化工具进行测试: +如果上述命令也报错,那可以证实是环境问题,请联系你的DBA。 diff --git a/sybasewriter/pom.xml b/sybasewriter/pom.xml new file mode 100644 index 00000000..c5b57549 --- /dev/null +++ b/sybasewriter/pom.xml @@ -0,0 +1,100 @@ + + + + datax-all + com.alibaba.datax + 0.0.1-SNAPSHOT + + 4.0.0 + + sybasewriter + + + 8 + 8 + + + + + com.alibaba.datax + datax-common + ${datax-project-version} + + + slf4j-log4j12 + org.slf4j + + + + + org.slf4j + slf4j-api + + + ch.qos.logback + logback-classic + + + + com.alibaba.datax + plugin-rdbms-util + ${datax-project-version} + + + + com.oracle + ojdbc6 + 11.2.0.3 + + + com.alibaba.datax + datax-common + 0.0.1-SNAPSHOT + compile + + + + com.sybase.jconnect + jconn4 + 16.0 + system + ${project.basedir}/libs/jconn4-16.0.jar + + + + + + + + maven-compiler-plugin + + ${jdk-version} + ${jdk-version} + ${project-sourceEncoding} + + + + + maven-assembly-plugin + + + src/main/assembly/package.xml + + datax + + + + dwzip + package + + single + + + + + + + + \ No newline at end of file diff --git a/sybasewriter/src/main/assembly/package.xml b/sybasewriter/src/main/assembly/package.xml new file mode 100755 index 00000000..15500d3d --- /dev/null +++ b/sybasewriter/src/main/assembly/package.xml @@ -0,0 +1,36 @@ + + + + dir + + false + + + src/main/resources + + plugin.json + plugin_job_template.json + + plugin/reader/sybasewriter + + + target/ + + sybasewriter-0.0.1-SNAPSHOT.jar + + plugin/reader/sybasewriter + + + + + + false + plugin/reader/sybasewriter/libs + runtime + + + + diff --git a/sybasewriter/src/main/java/com/alibaba/datax/plugin/writer/sybasewriter/SybaseWriter.java b/sybasewriter/src/main/java/com/alibaba/datax/plugin/writer/sybasewriter/SybaseWriter.java new file mode 100755 index 00000000..51b90d66 --- /dev/null +++ b/sybasewriter/src/main/java/com/alibaba/datax/plugin/writer/sybasewriter/SybaseWriter.java @@ -0,0 +1,100 @@ +package com.alibaba.datax.plugin.writer.sybasewriter; + +import com.alibaba.datax.common.plugin.RecordReceiver; +import com.alibaba.datax.common.spi.Writer; +import com.alibaba.datax.common.util.Configuration; +import com.alibaba.datax.plugin.rdbms.util.DataBaseType; +import com.alibaba.datax.plugin.rdbms.writer.CommonRdbmsWriter; +import com.alibaba.datax.plugin.rdbms.writer.Key; +import com.alibaba.datax.plugin.rdbms.util.DataBaseType; + + +import java.util.List; + + +public class SybaseWriter extends Writer { + private static final DataBaseType DATABASE_TYPE = DataBaseType.Sybase; + public static class Job extends Writer.Job { + private Configuration originalConfig = null; + private CommonRdbmsWriter.Job commonRdbmsWriterJob; + + @Override + public void preCheck(){ + this.init(); + this.commonRdbmsWriterJob.writerPreCheck(this.originalConfig, DATABASE_TYPE); + } + + @Override + public void init() { + this.originalConfig = super.getPluginJobConf(); + this.commonRdbmsWriterJob = new CommonRdbmsWriter.Job(DATABASE_TYPE); + 
this.commonRdbmsWriterJob.init(this.originalConfig); + } + + // 一般来说,是需要推迟到 task 中进行pre 的执行(单表情况例外) + @Override + public void prepare() { + //实跑先不支持 权限 检验 + //this.commonRdbmsWriterJob.privilegeValid(this.originalConfig, DATABASE_TYPE); + this.commonRdbmsWriterJob.prepare(this.originalConfig); + } + + @Override + public List split(int mandatoryNumber) { + return this.commonRdbmsWriterJob.split(this.originalConfig, mandatoryNumber); + } + + // 一般来说,是需要推迟到 task 中进行post 的执行(单表情况例外) + @Override + public void post() { + this.commonRdbmsWriterJob.post(this.originalConfig); + } + + @Override + public void destroy() { + this.commonRdbmsWriterJob.destroy(this.originalConfig); + } + + } + + public static class Task extends Writer.Task { + private Configuration writerSliceConfig; + private CommonRdbmsWriter.Task commonRdbmsWriterTask; + + @Override + public void init() { + this.writerSliceConfig = super.getPluginJobConf(); + this.commonRdbmsWriterTask = new CommonRdbmsWriter.Task(DATABASE_TYPE); + this.commonRdbmsWriterTask.init(this.writerSliceConfig); + } + + @Override + public void prepare() { + this.commonRdbmsWriterTask.prepare(this.writerSliceConfig); + } + + public void startWrite(RecordReceiver recordReceiver) { + this.commonRdbmsWriterTask.startWrite(recordReceiver, this.writerSliceConfig, + super.getTaskPluginCollector()); + } + + @Override + public void post() { + this.commonRdbmsWriterTask.post(this.writerSliceConfig); + } + + @Override + public void destroy() { + this.commonRdbmsWriterTask.destroy(this.writerSliceConfig); + } + + @Override + public boolean supportFailOver(){ + String writeMode = writerSliceConfig.getString(Key.WRITE_MODE); + return "replace".equalsIgnoreCase(writeMode); + } + + } + + +} diff --git a/sybasewriter/src/main/java/resources/plugin.json b/sybasewriter/src/main/java/resources/plugin.json new file mode 100755 index 00000000..6bfa66f3 --- /dev/null +++ b/sybasewriter/src/main/java/resources/plugin.json @@ -0,0 +1,6 @@ +{ + "name": "sybasewriter", + "class": "com.alibaba.datax.plugin.reader.sybasewriter.SybaseWriter", + "description": "useScene: prod. mechanism: Jdbc connection using the database, execute select sql, retrieve data from the ResultSet. warn: The more you know about the database, the less problems you encounter.", + "developer": "alibaba" +} \ No newline at end of file diff --git a/sybasewriter/src/main/java/resources/plugin_job_template.json b/sybasewriter/src/main/java/resources/plugin_job_template.json new file mode 100644 index 00000000..212f76b9 --- /dev/null +++ b/sybasewriter/src/main/java/resources/plugin_job_template.json @@ -0,0 +1,14 @@ +{ + "name": "sybasewriter", + "parameter": { + "username": "", + "password": "", + "column": [], + "connection": [ + { + "table": [], + "jdbcUrl": [] + } + ] + } +} \ No newline at end of file diff --git a/tdengine20writer/doc/tdengine20writer.md b/tdengine20writer/doc/tdengine20writer.md index fbb0be92..ddb82db8 100644 --- a/tdengine20writer/doc/tdengine20writer.md +++ b/tdengine20writer/doc/tdengine20writer.md @@ -14,7 +14,7 @@ TDengineWriter can be used as a data migration tool for DBAs to import data from TDengineWriter obtains the protocol data generated by Reader through DataX framework, connects to TDengine through JDBC Driver, executes insert statement /schemaless statement, and writes the data to TDengine. -In TDengine, table can be divided into super table, sub-table and ordinary table. Super table and sub-table include Colum and Tag. The value of tag column of sub-table is fixed value. 
(details please refer to: [data model](https://www.taosdata.com/docs/cn/v2.0/architecture#model)) +In TDengine, table can be divided into super table, sub-table and ordinary table. Super table and sub-table include Column and Tag. The value of tag column of sub-table is fixed value. (details please refer to: [data model](https://www.taosdata.com/docs/cn/v2.0/architecture#model)) The TDengineWriter can write data to super tables, sub-tables, and ordinary tables using the following methods based on the type of the table and whether the column parameter contains TBName: diff --git a/transformer/doc/transformer.md b/transformer/doc/transformer.md index 247ab39b..0a00dbaa 100644 --- a/transformer/doc/transformer.md +++ b/transformer/doc/transformer.md @@ -42,7 +42,7 @@ dx_substr(1,"5","10") column 1的value为“dataxTest”=>"Test" * 举例: ``` dx_replace(1,"2","4","****") column 1的value为“dataxTest”=>"da****est" -dx_replace(1,"5","10","****") column 1的value为“dataxTest”=>"data****" +dx_replace(1,"5","10","****") column 1的value为“dataxTest”=>"datax****" ``` 4. dx_filter (关联filter暂不支持,即多个字段的联合判断,函参太过复杂,用户难以使用。) * 参数: @@ -59,7 +59,17 @@ dx_replace(1,"5","10","****") column 1的value为“dataxTest”=>"data****" dx_filter(1,"like","dataTest") dx_filter(1,">=","10") ``` -5. dx_groovy +5. dx_digest +* 参数:3个 + * 第一个参数:字段编号,对应record中第几个字段。 + * 第二个参数:hash类型,md5、sha1 + * 第三个参数:hash值大小写 toUpperCase(大写)、toLowerCase(小写) +* 返回: 返回指定类型的hashHex,如果字段为空,则转为空字符串,再返回对应hashHex +* 举例: +``` +dx_digest(1,"md5","toUpperCase"), column 1的值为 xyzzzzz => 9CDFFC4FA4E45A99DB8BBCD762ACFFA2 +``` +6. dx_groovy * 参数。 * 第一个参数: groovy code * 第二个参数(列表或者为空):extraPackage @@ -67,7 +77,9 @@ dx_filter(1,">=","10") * dx_groovy只能调用一次。不能多次调用。 * groovy code中支持java.lang, java.util的包,可直接引用的对象有record,以及element下的各种column(BoolColumn.class,BytesColumn.class,DateColumn.class,DoubleColumn.class,LongColumn.class,StringColumn.class)。不支持其他包,如果用户有需要用到其他包,可设置extraPackage,注意extraPackage不支持第三方jar包。 * groovy code中,返回更新过的Record(比如record.setColumn(columnIndex, new StringColumn(newValue));),或者null。返回null表示过滤此行。 - * 用户可以直接调用静态的Util方式(GroovyTransformerStaticUtil),目前GroovyTransformerStaticUtil的方法列表 (按需补充): + * 用户可以直接调用静态的Util方式(GroovyTransformerStaticUtil),目前GroovyTransformerStaticUtil的方法列表: + * md5(String):String + * sha1(String):String * 举例: ``` groovy 实现的subStr: @@ -109,7 +121,7 @@ String code3 = "Column column = record.getColumn(1);\n" + ``` ## Job定义 -* 本例中,配置3个UDF。 +* 本例中,配置4个UDF。 ``` { @@ -176,6 +188,14 @@ String code3 = "Column column = record.getColumn(1);\n" + "paras":["3","4","****"] } }, + { + "name": "dx_digest", + "parameter": + { + "columnIndex":3, + "paras":["md5", "toLowerCase"] + } + }, { "name": "dx_groovy", "parameter": diff --git a/tsdbreader/pom.xml b/tsdbreader/pom.xml index 0f990234..4b3f58c6 100644 --- a/tsdbreader/pom.xml +++ b/tsdbreader/pom.xml @@ -24,9 +24,6 @@ 4.5 2.4 - - 1.2.28 - 4.13.1 @@ -44,10 +41,6 @@ slf4j-log4j12 org.slf4j - - fastjson - com.alibaba - commons-math3 org.apache.commons @@ -89,9 +82,8 @@ - com.alibaba - fastjson - ${fastjson.version} + com.alibaba.fastjson2 + fastjson2 diff --git a/tsdbreader/src/main/java/com/alibaba/datax/plugin/reader/tsdbreader/TSDBReader.java b/tsdbreader/src/main/java/com/alibaba/datax/plugin/reader/tsdbreader/TSDBReader.java index 550a010a..1f8c3d18 100755 --- a/tsdbreader/src/main/java/com/alibaba/datax/plugin/reader/tsdbreader/TSDBReader.java +++ b/tsdbreader/src/main/java/com/alibaba/datax/plugin/reader/tsdbreader/TSDBReader.java @@ -6,7 +6,7 @@ import com.alibaba.datax.common.spi.Reader; import 
com.alibaba.datax.common.util.Configuration; import com.alibaba.datax.plugin.reader.tsdbreader.conn.TSDBConnection; import com.alibaba.datax.plugin.reader.tsdbreader.util.TimeUtils; -import com.alibaba.fastjson.JSON; +import com.alibaba.fastjson2.JSON; import org.apache.commons.lang3.StringUtils; import org.joda.time.DateTime; import org.slf4j.Logger; diff --git a/tsdbreader/src/main/java/com/alibaba/datax/plugin/reader/tsdbreader/conn/DataPoint4MultiFieldsTSDB.java b/tsdbreader/src/main/java/com/alibaba/datax/plugin/reader/tsdbreader/conn/DataPoint4MultiFieldsTSDB.java index 5b380c73..3e8d43d4 100644 --- a/tsdbreader/src/main/java/com/alibaba/datax/plugin/reader/tsdbreader/conn/DataPoint4MultiFieldsTSDB.java +++ b/tsdbreader/src/main/java/com/alibaba/datax/plugin/reader/tsdbreader/conn/DataPoint4MultiFieldsTSDB.java @@ -1,6 +1,6 @@ package com.alibaba.datax.plugin.reader.tsdbreader.conn; -import com.alibaba.fastjson.JSON; +import com.alibaba.fastjson2.JSON; import java.util.Map; diff --git a/tsdbreader/src/main/java/com/alibaba/datax/plugin/reader/tsdbreader/conn/DataPoint4TSDB.java b/tsdbreader/src/main/java/com/alibaba/datax/plugin/reader/tsdbreader/conn/DataPoint4TSDB.java index 5c5c1349..8724bfbb 100644 --- a/tsdbreader/src/main/java/com/alibaba/datax/plugin/reader/tsdbreader/conn/DataPoint4TSDB.java +++ b/tsdbreader/src/main/java/com/alibaba/datax/plugin/reader/tsdbreader/conn/DataPoint4TSDB.java @@ -1,6 +1,6 @@ package com.alibaba.datax.plugin.reader.tsdbreader.conn; -import com.alibaba.fastjson.JSON; +import com.alibaba.fastjson2.JSON; import java.util.Map; diff --git a/tsdbreader/src/main/java/com/alibaba/datax/plugin/reader/tsdbreader/conn/TSDBConnection.java b/tsdbreader/src/main/java/com/alibaba/datax/plugin/reader/tsdbreader/conn/TSDBConnection.java index d466da39..479c16c1 100644 --- a/tsdbreader/src/main/java/com/alibaba/datax/plugin/reader/tsdbreader/conn/TSDBConnection.java +++ b/tsdbreader/src/main/java/com/alibaba/datax/plugin/reader/tsdbreader/conn/TSDBConnection.java @@ -2,7 +2,7 @@ package com.alibaba.datax.plugin.reader.tsdbreader.conn; import com.alibaba.datax.common.plugin.RecordSender; import com.alibaba.datax.plugin.reader.tsdbreader.util.TSDBUtils; -import com.alibaba.fastjson.JSON; +import com.alibaba.fastjson2.JSON; import org.apache.commons.lang3.StringUtils; import java.util.List; diff --git a/tsdbreader/src/main/java/com/alibaba/datax/plugin/reader/tsdbreader/conn/TSDBDump.java b/tsdbreader/src/main/java/com/alibaba/datax/plugin/reader/tsdbreader/conn/TSDBDump.java index c911a062..05b9c5c2 100644 --- a/tsdbreader/src/main/java/com/alibaba/datax/plugin/reader/tsdbreader/conn/TSDBDump.java +++ b/tsdbreader/src/main/java/com/alibaba/datax/plugin/reader/tsdbreader/conn/TSDBDump.java @@ -4,8 +4,10 @@ import com.alibaba.datax.common.element.*; import com.alibaba.datax.common.plugin.RecordSender; import com.alibaba.datax.plugin.reader.tsdbreader.Constant; import com.alibaba.datax.plugin.reader.tsdbreader.util.HttpUtils; -import com.alibaba.fastjson.JSON; -import com.alibaba.fastjson.parser.Feature; +import com.alibaba.fastjson2.JSON; +import com.alibaba.fastjson2.JSONReader; +import com.alibaba.fastjson2.JSONReader.Feature; +import com.alibaba.fastjson2.JSONWriter; import org.slf4j.Logger; import org.slf4j.LoggerFactory; @@ -29,7 +31,7 @@ final class TSDBDump { private static final String QUERY_MULTI_FIELD = "/api/mquery"; static { - JSON.DEFAULT_PARSER_FEATURE &= ~Feature.UseBigDecimal.getMask(); + JSON.config(Feature.UseBigDecimalForDoubles); } private 
TSDBDump() { diff --git a/tsdbreader/src/main/java/com/alibaba/datax/plugin/reader/tsdbreader/util/HttpUtils.java b/tsdbreader/src/main/java/com/alibaba/datax/plugin/reader/tsdbreader/util/HttpUtils.java index 5cba4e54..af81988c 100644 --- a/tsdbreader/src/main/java/com/alibaba/datax/plugin/reader/tsdbreader/util/HttpUtils.java +++ b/tsdbreader/src/main/java/com/alibaba/datax/plugin/reader/tsdbreader/util/HttpUtils.java @@ -1,6 +1,6 @@ package com.alibaba.datax.plugin.reader.tsdbreader.util; -import com.alibaba.fastjson.JSON; +import com.alibaba.fastjson2.JSON; import org.apache.commons.lang3.StringUtils; import org.apache.http.client.fluent.Content; import org.apache.http.client.fluent.Request; diff --git a/tsdbreader/src/main/resources/plugin.json b/tsdbreader/src/main/resources/plugin.json index f2dbb1f0..3b10d228 100755 --- a/tsdbreader/src/main/resources/plugin.json +++ b/tsdbreader/src/main/resources/plugin.json @@ -6,5 +6,5 @@ "mechanism": "通过 /api/query 接口查询出符合条件的数据点", "warn": "指定起止时间会自动忽略分钟和秒,转为整点时刻,例如 2019-4-18 的 [3:35, 4:55) 会被转为 [3:00, 4:00)" }, - "developer": "Benedict Jin" + "developer": "alibaba" } diff --git a/tsdbwriter/pom.xml b/tsdbwriter/pom.xml index 6f2bac52..9f997123 100644 --- a/tsdbwriter/pom.xml +++ b/tsdbwriter/pom.xml @@ -24,9 +24,6 @@ 4.5 2.4 - - 1.2.28 - 4.13.1 @@ -41,10 +38,6 @@ slf4j-log4j12 org.slf4j - - fastjson - com.alibaba - commons-math3 org.apache.commons @@ -86,9 +79,8 @@ - com.alibaba - fastjson - ${fastjson.version} + com.alibaba.fastjson2 + fastjson2 diff --git a/tsdbwriter/src/main/java/com/alibaba/datax/plugin/writer/conn/DataPoint4TSDB.java b/tsdbwriter/src/main/java/com/alibaba/datax/plugin/writer/conn/DataPoint4TSDB.java index fee012df..b6e2d309 100644 --- a/tsdbwriter/src/main/java/com/alibaba/datax/plugin/writer/conn/DataPoint4TSDB.java +++ b/tsdbwriter/src/main/java/com/alibaba/datax/plugin/writer/conn/DataPoint4TSDB.java @@ -1,6 +1,6 @@ package com.alibaba.datax.plugin.writer.conn; -import com.alibaba.fastjson.JSON; +import com.alibaba.fastjson2.JSON; import java.util.Map; diff --git a/tsdbwriter/src/main/java/com/alibaba/datax/plugin/writer/conn/TSDBConnection.java b/tsdbwriter/src/main/java/com/alibaba/datax/plugin/writer/conn/TSDBConnection.java index 074f0295..5266f5d9 100644 --- a/tsdbwriter/src/main/java/com/alibaba/datax/plugin/writer/conn/TSDBConnection.java +++ b/tsdbwriter/src/main/java/com/alibaba/datax/plugin/writer/conn/TSDBConnection.java @@ -2,7 +2,7 @@ package com.alibaba.datax.plugin.writer.conn; import com.alibaba.datax.common.plugin.RecordSender; import com.alibaba.datax.plugin.writer.util.TSDBUtils; -import com.alibaba.fastjson.JSON; +import com.alibaba.fastjson2.JSON; import org.apache.commons.lang3.StringUtils; import java.util.List; diff --git a/tsdbwriter/src/main/java/com/alibaba/datax/plugin/writer/tsdbwriter/TSDBConverter.java b/tsdbwriter/src/main/java/com/alibaba/datax/plugin/writer/tsdbwriter/TSDBConverter.java index 86e35c56..9bde0c9e 100644 --- a/tsdbwriter/src/main/java/com/alibaba/datax/plugin/writer/tsdbwriter/TSDBConverter.java +++ b/tsdbwriter/src/main/java/com/alibaba/datax/plugin/writer/tsdbwriter/TSDBConverter.java @@ -2,7 +2,7 @@ package com.alibaba.datax.plugin.writer.tsdbwriter; import com.alibaba.datax.common.element.Column; import com.alibaba.datax.common.element.Record; -import com.alibaba.fastjson.JSON; +import com.alibaba.fastjson2.JSON; import com.aliyun.hitsdb.client.value.request.MultiFieldPoint; import com.aliyun.hitsdb.client.value.request.Point; import org.slf4j.Logger; diff --git 
a/tsdbwriter/src/main/java/com/alibaba/datax/plugin/writer/util/HttpUtils.java b/tsdbwriter/src/main/java/com/alibaba/datax/plugin/writer/util/HttpUtils.java index 29b14dab..97055adc 100644 --- a/tsdbwriter/src/main/java/com/alibaba/datax/plugin/writer/util/HttpUtils.java +++ b/tsdbwriter/src/main/java/com/alibaba/datax/plugin/writer/util/HttpUtils.java @@ -1,6 +1,6 @@ package com.alibaba.datax.plugin.writer.util; -import com.alibaba.fastjson.JSON; +import com.alibaba.fastjson2.JSON; import org.apache.commons.lang3.StringUtils; import org.apache.http.client.fluent.Content; import org.apache.http.client.fluent.Request; diff --git a/tsdbwriter/src/main/java/com/alibaba/datax/plugin/writer/util/TSDBUtils.java b/tsdbwriter/src/main/java/com/alibaba/datax/plugin/writer/util/TSDBUtils.java index d57c5935..83250b32 100644 --- a/tsdbwriter/src/main/java/com/alibaba/datax/plugin/writer/util/TSDBUtils.java +++ b/tsdbwriter/src/main/java/com/alibaba/datax/plugin/writer/util/TSDBUtils.java @@ -1,7 +1,7 @@ package com.alibaba.datax.plugin.writer.util; import com.alibaba.datax.plugin.writer.conn.DataPoint4TSDB; -import com.alibaba.fastjson.JSON; +import com.alibaba.fastjson2.JSON; import org.apache.commons.lang3.StringUtils; import org.slf4j.Logger; import org.slf4j.LoggerFactory; diff --git a/tsdbwriter/src/main/resources/plugin.json b/tsdbwriter/src/main/resources/plugin.json index 78c8273f..26f927c2 100755 --- a/tsdbwriter/src/main/resources/plugin.json +++ b/tsdbwriter/src/main/resources/plugin.json @@ -6,5 +6,5 @@ "mechanism": "调用 TSDB 的 /api/put 接口,实现数据点的写入", "warn": "" }, - "developer": "Benedict Jin" + "developer": "alibaba" } diff --git a/txtfilereader/src/main/java/com/alibaba/datax/plugin/reader/txtfilereader/TxtFileReader.java b/txtfilereader/src/main/java/com/alibaba/datax/plugin/reader/txtfilereader/TxtFileReader.java index 914305c6..a74ef8fc 100755 --- a/txtfilereader/src/main/java/com/alibaba/datax/plugin/reader/txtfilereader/TxtFileReader.java +++ b/txtfilereader/src/main/java/com/alibaba/datax/plugin/reader/txtfilereader/TxtFileReader.java @@ -182,6 +182,7 @@ public class TxtFileReader extends Reader { delimiterInStr)); } + UnstructuredStorageReaderUtil.validateCsvReaderConfig(this.originConfig); } @Override diff --git a/userGuid.md b/userGuid.md index 16771a5e..badb1b4e 100644 --- a/userGuid.md +++ b/userGuid.md @@ -17,7 +17,7 @@ DataX本身作为数据同步框架,将不同数据源的同步抽象为从源 * 工具部署 - * 方法一、直接下载DataX工具包:[DataX下载地址](http://datax-opensource.oss-cn-hangzhou.aliyuncs.com/datax.tar.gz) + * 方法一、直接下载DataX工具包:[DataX下载地址](https://datax-opensource.oss-cn-hangzhou.aliyuncs.com/202309/datax.tar.gz) 下载后解压至本地某个目录,进入bin目录,即可运行同步作业: