diff --git a/README.md b/README.md index 80cf3c0f..37a21022 100644 --- a/README.md +++ b/README.md @@ -1,11 +1,14 @@ ![Datax-logo](https://github.com/alibaba/DataX/blob/master/images/DataX-logo.jpg) - # DataX -DataX 是阿里巴巴集团内被广泛使用的离线数据同步工具/平台,实现包括 MySQL、Oracle、SqlServer、Postgre、HDFS、Hive、ADS、HBase、TableStore(OTS)、MaxCompute(ODPS)、DRDS 等各种异构数据源之间高效的数据同步功能。 +DataX 是阿里云 [DataWorks数据集成](https://www.aliyun.com/product/bigdata/ide) 的开源版本,在阿里巴巴集团内被广泛使用的离线数据同步工具/平台。DataX 实现了包括 MySQL、Oracle、OceanBase、SqlServer、Postgre、HDFS、Hive、ADS、HBase、TableStore(OTS)、MaxCompute(ODPS)、Hologres、DRDS 等各种异构数据源之间高效的数据同步功能。 +# DataX 商业版本 +阿里云DataWorks数据集成是DataX团队在阿里云上的商业化产品,致力于提供复杂网络环境下、丰富的异构数据源之间高速稳定的数据移动能力,以及繁杂业务背景下的数据同步解决方案。目前已经支持云上近3000家客户,单日同步数据超过3万亿条。DataWorks数据集成目前支持离线50+种数据源,可以进行整库迁移、批量上云、增量同步、分库分表等各类同步解决方案。2020年更新实时同步能力,2020年更新实时同步能力,支持10+种数据源的读写任意组合。提供MySQL,Oracle等多种数据源到阿里云MaxCompute,Hologres等大数据引擎的一键全增量同步解决方案。 + +商业版本参见: https://www.aliyun.com/product/bigdata/ide # Features @@ -36,6 +39,7 @@ DataX目前已经有了比较全面的插件体系,主流的RDBMS数据库、N | ------------ | ---------- | :-------: | :-------: |:-------: | | RDBMS 关系型数据库 | MySQL | √ | √ |[读](https://github.com/alibaba/DataX/blob/master/mysqlreader/doc/mysqlreader.md) 、[写](https://github.com/alibaba/DataX/blob/master/mysqlwriter/doc/mysqlwriter.md)| |             | Oracle     |     √     |     √     |[读](https://github.com/alibaba/DataX/blob/master/oraclereader/doc/oraclereader.md) 、[写](https://github.com/alibaba/DataX/blob/master/oraclewriter/doc/oraclewriter.md)| +|             | OceanBase  |     √     |     √     |[读](https://open.oceanbase.com/docs/community/oceanbase-database/V3.1.0/use-datax-to-full-migration-data-to-oceanbase) 、[写](https://open.oceanbase.com/docs/community/oceanbase-database/V3.1.0/use-datax-to-full-migration-data-to-oceanbase)| | | SQLServer | √ | √ |[读](https://github.com/alibaba/DataX/blob/master/sqlserverreader/doc/sqlserverreader.md) 、[写](https://github.com/alibaba/DataX/blob/master/sqlserverwriter/doc/sqlserverwriter.md)| | | PostgreSQL | √ | √ |[读](https://github.com/alibaba/DataX/blob/master/postgresqlreader/doc/postgresqlreader.md) 、[写](https://github.com/alibaba/DataX/blob/master/postgresqlwriter/doc/postgresqlwriter.md)| | | DRDS | √ | √ |[读](https://github.com/alibaba/DataX/blob/master/drdsreader/doc/drdsreader.md) 、[写](https://github.com/alibaba/DataX/blob/master/drdswriter/doc/drdswriter.md)| @@ -49,7 +53,7 @@ DataX目前已经有了比较全面的插件体系,主流的RDBMS数据库、N | | Hbase1.1 | √ | √ |[读](https://github.com/alibaba/DataX/blob/master/hbase11xreader/doc/hbase11xreader.md) 、[写](https://github.com/alibaba/DataX/blob/master/hbase11xwriter/doc/hbase11xwriter.md)| | | Phoenix4.x | √ | √ |[读](https://github.com/alibaba/DataX/blob/master/hbase11xsqlreader/doc/hbase11xsqlreader.md) 、[写](https://github.com/alibaba/DataX/blob/master/hbase11xsqlwriter/doc/hbase11xsqlwriter.md)| | | Phoenix5.x | √ | √ |[读](https://github.com/alibaba/DataX/blob/master/hbase20xsqlreader/doc/hbase20xsqlreader.md) 、[写](https://github.com/alibaba/DataX/blob/master/hbase20xsqlwriter/doc/hbase20xsqlwriter.md)| -| | MongoDB | √ | √ |[读](https://github.com/alibaba/DataX/blob/master/mongoreader/doc/mongoreader.md) 、[写](https://github.com/alibaba/DataX/blob/master/mongowriter/doc/mongowriter.md)| +| | MongoDB | √ | √ |[读](https://github.com/alibaba/DataX/blob/master/mongodbreader/doc/mongodbreader.md) 、[写](https://github.com/alibaba/DataX/blob/master/mongodbwriter/doc/mongodbwriter.md)| | | Hive | √ | √ |[读](https://github.com/alibaba/DataX/blob/master/hdfsreader/doc/hdfsreader.md) 
、[写](https://github.com/alibaba/DataX/blob/master/hdfswriter/doc/hdfswriter.md)| | | Cassandra | √ | √ |[读](https://github.com/alibaba/DataX/blob/master/cassandrareader/doc/cassandrareader.md) 、[写](https://github.com/alibaba/DataX/blob/master/cassandrawriter/doc/cassandrawriter.md)| | 无结构化数据存储 | TxtFile | √ | √ |[读](https://github.com/alibaba/DataX/blob/master/txtfilereader/doc/txtfilereader.md) 、[写](https://github.com/alibaba/DataX/blob/master/txtfilewriter/doc/txtfilewriter.md)| @@ -59,9 +63,33 @@ DataX目前已经有了比较全面的插件体系,主流的RDBMS数据库、N | 时间序列数据库 | OpenTSDB | √ | |[读](https://github.com/alibaba/DataX/blob/master/opentsdbreader/doc/opentsdbreader.md)| | | TSDB | √ | √ |[读](https://github.com/alibaba/DataX/blob/master/tsdbreader/doc/tsdbreader.md) 、[写](https://github.com/alibaba/DataX/blob/master/tsdbwriter/doc/tsdbhttpwriter.md)| +# 阿里云DataWorks数据集成 + +目前DataX的已有能力已经全部融和进阿里云的数据集成,并且比DataX更加高效、安全,同时数据集成具备DataX不具备的其它高级特性和功能。可以理解为数据集成是DataX的全面升级的商业化用版本,为企业可以提供稳定、可靠、安全的数据传输服务。与DataX相比,数据集成主要有以下几大突出特点: + +支持实时同步: + +- 功能简介:https://help.aliyun.com/document_detail/181912.html +- 支持的数据源:https://help.aliyun.com/document_detail/146778.html +- 支持数据处理:https://help.aliyun.com/document_detail/146777.html + +离线同步数据源种类大幅度扩充: + +- 新增比如:DB2、Kafka、Hologres、MetaQ、SAPHANA、达梦等等,持续扩充中 +- 离线同步支持的数据源:https://help.aliyun.com/document_detail/137670.html +- 具备同步解决方案: + - 解决方案系统:https://help.aliyun.com/document_detail/171765.html + - 一键全增量:https://help.aliyun.com/document_detail/175676.html + - 整库迁移:https://help.aliyun.com/document_detail/137809.html + - 批量上云:https://help.aliyun.com/document_detail/146671.html + - 更新更多能力请访问:https://help.aliyun.com/document_detail/137663.html + + # 我要开发新的插件 + 请点击:[DataX插件开发宝典](https://github.com/alibaba/DataX/blob/master/dataxPluginDev.md) + # 项目成员 核心Contributions: 言柏 、枕水、秋奇、青砾、一斅、云时 @@ -108,7 +136,23 @@ This software is free to use under the Apache License [Apache license](https://g 8. 对高并发、高稳定可用性、高性能、大数据处理有过实际项目及产品经验者优先考虑; 9. 
有大数据产品、云产品、中间件技术解决方案者优先考虑。 ```` -钉钉用户群:23169395 +钉钉用户群: +- DataX开源用户交流群 + - +- DataX开源用户交流群2 + - + +- DataX开源用户交流群3 + - + +- DataX开源用户交流群4 + - + +- DataX开源用户交流群5 + - + +- DataX开源用户交流群6 + - diff --git a/clickhousewriter/pom.xml b/clickhousewriter/pom.xml new file mode 100644 index 00000000..76c5fb1f --- /dev/null +++ b/clickhousewriter/pom.xml @@ -0,0 +1,88 @@ + + + + datax-all + com.alibaba.datax + 0.0.1-SNAPSHOT + + + 4.0.0 + clickhousewriter + clickhousewriter + jar + + + + ru.yandex.clickhouse + clickhouse-jdbc + 0.2.4 + + + com.alibaba.datax + datax-core + ${datax-project-version} + + + com.alibaba.datax + datax-common + ${datax-project-version} + + + org.slf4j + slf4j-api + + + + ch.qos.logback + logback-classic + + + + com.alibaba.datax + plugin-rdbms-util + ${datax-project-version} + + + + + + src/main/java + + **/*.properties + + + + + + + maven-compiler-plugin + + ${jdk-version} + ${jdk-version} + ${project-sourceEncoding} + + + + + maven-assembly-plugin + + + src/main/assembly/package.xml + + datax + + + + dwzip + package + + single + + + + + + + \ No newline at end of file diff --git a/clickhousewriter/src/main/assembly/package.xml b/clickhousewriter/src/main/assembly/package.xml new file mode 100755 index 00000000..d1128bd1 --- /dev/null +++ b/clickhousewriter/src/main/assembly/package.xml @@ -0,0 +1,35 @@ + + + + dir + + false + + + src/main/resources + + plugin.json + plugin_job_template.json + + plugin/writer/clickhousewriter + + + target/ + + clickhousewriter-0.0.1-SNAPSHOT.jar + + plugin/writer/clickhousewriter + + + + + + false + plugin/writer/clickhousewriter/libs + runtime + + + diff --git a/clickhousewriter/src/main/java/com/alibaba/datax/plugin/writer/clickhousewriter/ClickhouseWriter.java b/clickhousewriter/src/main/java/com/alibaba/datax/plugin/writer/clickhousewriter/ClickhouseWriter.java new file mode 100644 index 00000000..b928d421 --- /dev/null +++ b/clickhousewriter/src/main/java/com/alibaba/datax/plugin/writer/clickhousewriter/ClickhouseWriter.java @@ -0,0 +1,329 @@ +package com.alibaba.datax.plugin.writer.clickhousewriter; + +import com.alibaba.datax.common.element.Column; +import com.alibaba.datax.common.element.StringColumn; +import com.alibaba.datax.common.exception.CommonErrorCode; +import com.alibaba.datax.common.exception.DataXException; +import com.alibaba.datax.common.plugin.RecordReceiver; +import com.alibaba.datax.common.spi.Writer; +import com.alibaba.datax.common.util.Configuration; +import com.alibaba.datax.plugin.rdbms.util.DBUtilErrorCode; +import com.alibaba.datax.plugin.rdbms.util.DataBaseType; +import com.alibaba.datax.plugin.rdbms.writer.CommonRdbmsWriter; +import com.alibaba.fastjson.JSON; +import com.alibaba.fastjson.JSONArray; + +import java.sql.Array; +import java.sql.Connection; +import java.sql.PreparedStatement; +import java.sql.SQLException; +import java.sql.Timestamp; +import java.sql.Types; +import java.util.List; +import java.util.regex.Pattern; + +public class ClickhouseWriter extends Writer { + private static final DataBaseType DATABASE_TYPE = DataBaseType.ClickHouse; + + public static class Job extends Writer.Job { + private Configuration originalConfig = null; + private CommonRdbmsWriter.Job commonRdbmsWriterMaster; + + @Override + public void init() { + this.originalConfig = super.getPluginJobConf(); + this.commonRdbmsWriterMaster = new CommonRdbmsWriter.Job(DATABASE_TYPE); + this.commonRdbmsWriterMaster.init(this.originalConfig); + } + + @Override + public void prepare() { + 
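                // Like the other JDBC-based writers in this repo, this Job simply delegates to the generic CommonRdbmsWriter.Job, which handles preparation steps such as executing any configured preSql.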
this.commonRdbmsWriterMaster.prepare(this.originalConfig); + } + + @Override + public List split(int mandatoryNumber) { + return this.commonRdbmsWriterMaster.split(this.originalConfig, mandatoryNumber); + } + + @Override + public void post() { + this.commonRdbmsWriterMaster.post(this.originalConfig); + } + + @Override + public void destroy() { + this.commonRdbmsWriterMaster.destroy(this.originalConfig); + } + } + + public static class Task extends Writer.Task { + private Configuration writerSliceConfig; + + private CommonRdbmsWriter.Task commonRdbmsWriterSlave; + + @Override + public void init() { + this.writerSliceConfig = super.getPluginJobConf(); + + this.commonRdbmsWriterSlave = new CommonRdbmsWriter.Task(DATABASE_TYPE) { + @Override + protected PreparedStatement fillPreparedStatementColumnType(PreparedStatement preparedStatement, int columnIndex, int columnSqltype, Column column) throws SQLException { + try { + if (column.getRawData() == null) { + preparedStatement.setNull(columnIndex + 1, columnSqltype); + return preparedStatement; + } + + java.util.Date utilDate; + switch (columnSqltype) { + case Types.CHAR: + case Types.NCHAR: + case Types.CLOB: + case Types.NCLOB: + case Types.VARCHAR: + case Types.LONGVARCHAR: + case Types.NVARCHAR: + case Types.LONGNVARCHAR: + preparedStatement.setString(columnIndex + 1, column + .asString()); + break; + + case Types.TINYINT: + case Types.SMALLINT: + case Types.INTEGER: + case Types.BIGINT: + case Types.DECIMAL: + case Types.FLOAT: + case Types.REAL: + case Types.DOUBLE: + String strValue = column.asString(); + if (emptyAsNull && "".equals(strValue)) { + preparedStatement.setNull(columnIndex + 1, columnSqltype); + } else { + switch (columnSqltype) { + case Types.TINYINT: + case Types.SMALLINT: + case Types.INTEGER: + preparedStatement.setInt(columnIndex + 1, column.asBigInteger().intValue()); + break; + case Types.BIGINT: + preparedStatement.setLong(columnIndex + 1, column.asLong()); + break; + case Types.DECIMAL: + preparedStatement.setBigDecimal(columnIndex + 1, column.asBigDecimal()); + break; + case Types.REAL: + case Types.FLOAT: + preparedStatement.setFloat(columnIndex + 1, column.asDouble().floatValue()); + break; + case Types.DOUBLE: + preparedStatement.setDouble(columnIndex + 1, column.asDouble()); + break; + } + } + break; + + case Types.DATE: + if (this.resultSetMetaData.getRight().get(columnIndex) + .equalsIgnoreCase("year")) { + if (column.asBigInteger() == null) { + preparedStatement.setString(columnIndex + 1, null); + } else { + preparedStatement.setInt(columnIndex + 1, column.asBigInteger().intValue()); + } + } else { + java.sql.Date sqlDate = null; + try { + utilDate = column.asDate(); + } catch (DataXException e) { + throw new SQLException(String.format( + "Date 类型转换错误:[%s]", column)); + } + + if (null != utilDate) { + sqlDate = new java.sql.Date(utilDate.getTime()); + } + preparedStatement.setDate(columnIndex + 1, sqlDate); + } + break; + + case Types.TIME: + java.sql.Time sqlTime = null; + try { + utilDate = column.asDate(); + } catch (DataXException e) { + throw new SQLException(String.format( + "Date 类型转换错误:[%s]", column)); + } + + if (null != utilDate) { + sqlTime = new java.sql.Time(utilDate.getTime()); + } + preparedStatement.setTime(columnIndex + 1, sqlTime); + break; + + case Types.TIMESTAMP: + Timestamp sqlTimestamp = null; + if (column instanceof StringColumn && column.asString() != null) { + String timeStampStr = column.asString(); + // JAVA TIMESTAMP 类型入参必须是 "2017-07-12 14:39:00.123566" 格式 + String pattern = 
"^\\d+-\\d+-\\d+ \\d+:\\d+:\\d+.\\d+"; + boolean isMatch = Pattern.matches(pattern, timeStampStr); + if (isMatch) { + sqlTimestamp = Timestamp.valueOf(timeStampStr); + preparedStatement.setTimestamp(columnIndex + 1, sqlTimestamp); + break; + } + } + try { + utilDate = column.asDate(); + } catch (DataXException e) { + throw new SQLException(String.format( + "Date 类型转换错误:[%s]", column)); + } + + if (null != utilDate) { + sqlTimestamp = new Timestamp( + utilDate.getTime()); + } + preparedStatement.setTimestamp(columnIndex + 1, sqlTimestamp); + break; + + case Types.BINARY: + case Types.VARBINARY: + case Types.BLOB: + case Types.LONGVARBINARY: + preparedStatement.setBytes(columnIndex + 1, column + .asBytes()); + break; + + case Types.BOOLEAN: + preparedStatement.setInt(columnIndex + 1, column.asBigInteger().intValue()); + break; + + // warn: bit(1) -> Types.BIT 可使用setBoolean + // warn: bit(>1) -> Types.VARBINARY 可使用setBytes + case Types.BIT: + if (this.dataBaseType == DataBaseType.MySql) { + Boolean asBoolean = column.asBoolean(); + if (asBoolean != null) { + preparedStatement.setBoolean(columnIndex + 1, asBoolean); + } else { + preparedStatement.setNull(columnIndex + 1, Types.BIT); + } + } else { + preparedStatement.setString(columnIndex + 1, column.asString()); + } + break; + + default: + boolean isHandled = fillPreparedStatementColumnType4CustomType(preparedStatement, + columnIndex, columnSqltype, column); + if (isHandled) { + break; + } + throw DataXException + .asDataXException( + DBUtilErrorCode.UNSUPPORTED_TYPE, + String.format( + "您的配置文件中的列配置信息有误. 因为DataX 不支持数据库写入这种字段类型. 字段名:[%s], 字段类型:[%d], 字段Java类型:[%s]. 请修改表中该字段的类型或者不同步该字段.", + this.resultSetMetaData.getLeft() + .get(columnIndex), + this.resultSetMetaData.getMiddle() + .get(columnIndex), + this.resultSetMetaData.getRight() + .get(columnIndex))); + } + return preparedStatement; + } catch (DataXException e) { + // fix类型转换或者溢出失败时,将具体哪一列打印出来 + if (e.getErrorCode() == CommonErrorCode.CONVERT_NOT_SUPPORT || + e.getErrorCode() == CommonErrorCode.CONVERT_OVER_FLOW) { + throw DataXException + .asDataXException( + e.getErrorCode(), + String.format( + "类型转化错误. 字段名:[%s], 字段类型:[%d], 字段Java类型:[%s]. 
请修改表中该字段的类型或者不同步该字段.", + this.resultSetMetaData.getLeft() + .get(columnIndex), + this.resultSetMetaData.getMiddle() + .get(columnIndex), + this.resultSetMetaData.getRight() + .get(columnIndex))); + } else { + throw e; + } + } + } + + private Object toJavaArray(Object val) { + if (null == val) { + return null; + } else if (val instanceof JSONArray) { + Object[] valArray = ((JSONArray) val).toArray(); + for (int i = 0; i < valArray.length; i++) { + valArray[i] = this.toJavaArray(valArray[i]); + } + return valArray; + } else { + return val; + } + } + + boolean fillPreparedStatementColumnType4CustomType(PreparedStatement ps, + int columnIndex, int columnSqltype, + Column column) throws SQLException { + switch (columnSqltype) { + case Types.OTHER: + if (this.resultSetMetaData.getRight().get(columnIndex).startsWith("Tuple")) { + throw DataXException + .asDataXException(ClickhouseWriterErrorCode.TUPLE_NOT_SUPPORTED_ERROR, ClickhouseWriterErrorCode.TUPLE_NOT_SUPPORTED_ERROR.getDescription()); + } else { + ps.setString(columnIndex + 1, column.asString()); + } + return true; + + case Types.ARRAY: + Connection conn = ps.getConnection(); + List values = JSON.parseArray(column.asString(), Object.class); + for (int i = 0; i < values.size(); i++) { + values.set(i, this.toJavaArray(values.get(i))); + } + Array array = conn.createArrayOf("String", values.toArray()); + ps.setArray(columnIndex + 1, array); + return true; + + default: + break; + } + + return false; + } + }; + + this.commonRdbmsWriterSlave.init(this.writerSliceConfig); + } + + @Override + public void prepare() { + this.commonRdbmsWriterSlave.prepare(this.writerSliceConfig); + } + + @Override + public void startWrite(RecordReceiver recordReceiver) { + this.commonRdbmsWriterSlave.startWrite(recordReceiver, this.writerSliceConfig, super.getTaskPluginCollector()); + } + + @Override + public void post() { + this.commonRdbmsWriterSlave.post(this.writerSliceConfig); + } + + @Override + public void destroy() { + this.commonRdbmsWriterSlave.destroy(this.writerSliceConfig); + } + } + +} \ No newline at end of file diff --git a/clickhousewriter/src/main/java/com/alibaba/datax/plugin/writer/clickhousewriter/ClickhouseWriterErrorCode.java b/clickhousewriter/src/main/java/com/alibaba/datax/plugin/writer/clickhousewriter/ClickhouseWriterErrorCode.java new file mode 100644 index 00000000..4fc63ae1 --- /dev/null +++ b/clickhousewriter/src/main/java/com/alibaba/datax/plugin/writer/clickhousewriter/ClickhouseWriterErrorCode.java @@ -0,0 +1,31 @@ +package com.alibaba.datax.plugin.writer.clickhousewriter; + +import com.alibaba.datax.common.spi.ErrorCode; + +public enum ClickhouseWriterErrorCode implements ErrorCode { + TUPLE_NOT_SUPPORTED_ERROR("ClickhouseWriter-00", "不支持TUPLE类型导入."), + ; + + private final String code; + private final String description; + + private ClickhouseWriterErrorCode(String code, String description) { + this.code = code; + this.description = description; + } + + @Override + public String getCode() { + return this.code; + } + + @Override + public String getDescription() { + return this.description; + } + + @Override + public String toString() { + return String.format("Code:[%s], Description:[%s].", this.code, this.description); + } +} diff --git a/clickhousewriter/src/main/resources/plugin.json b/clickhousewriter/src/main/resources/plugin.json new file mode 100755 index 00000000..ff1acf01 --- /dev/null +++ b/clickhousewriter/src/main/resources/plugin.json @@ -0,0 +1,6 @@ +{ + "name": "clickhousewriter", + "class": 
"com.alibaba.datax.plugin.writer.clickhousewriter.ClickhouseWriter", + "description": "useScene: prod. mechanism: Jdbc connection using the database, execute insert sql.", + "developer": "jiye.tjy" +} \ No newline at end of file diff --git a/clickhousewriter/src/main/resources/plugin_job_template.json b/clickhousewriter/src/main/resources/plugin_job_template.json new file mode 100644 index 00000000..2e1ceed0 --- /dev/null +++ b/clickhousewriter/src/main/resources/plugin_job_template.json @@ -0,0 +1,21 @@ +{ + "name": "clickhousewriter", + "parameter": { + "username": "username", + "password": "password", + "column": ["col1", "col2", "col3"], + "connection": [ + { + "jdbcUrl": "jdbc:clickhouse://:[/]", + "table": ["table1", "table2"] + } + ], + "preSql": [], + "postSql": [], + + "batchSize": 65536, + "batchByteSize": 134217728, + "dryRun": false, + "writeMode": "insert" + } +} \ No newline at end of file diff --git a/core/pom.xml b/core/pom.xml index ea3d53e7..174a18d3 100755 --- a/core/pom.xml +++ b/core/pom.xml @@ -41,12 +41,12 @@ org.apache.httpcomponents httpclient - 4.4 + 4.5 org.apache.httpcomponents fluent-hc - 4.4 + 4.5 org.slf4j diff --git a/core/src/main/bin/perftrace.py b/core/src/main/bin/perftrace.py index 41a1ecb3..b9c79a43 100755 --- a/core/src/main/bin/perftrace.py +++ b/core/src/main/bin/perftrace.py @@ -174,6 +174,9 @@ def parsePluginName(jdbcUrl, pluginType): db2Regex = re.compile('jdbc:(db2)://.*') if (db2Regex.match(jdbcUrl)): name = 'db2' + kingbaseesRegex = re.compile('jdbc:(kingbase8)://.*') + if (kingbaseesRegex.match(jdbcUrl)): + name = 'kingbasees' return "%s%s" % (name, pluginType) def renderDataXJson(paramsDict, readerOrWriter = 'reader', channel = 1): diff --git a/core/src/main/java/com/alibaba/datax/core/job/JobContainer.java b/core/src/main/java/com/alibaba/datax/core/job/JobContainer.java index 50f1cf7b..26b2989f 100755 --- a/core/src/main/java/com/alibaba/datax/core/job/JobContainer.java +++ b/core/src/main/java/com/alibaba/datax/core/job/JobContainer.java @@ -427,7 +427,7 @@ public class JobContainer extends AbstractContainer { Long channelLimitedByteSpeed = this.configuration .getLong(CoreConstant.DATAX_CORE_TRANSPORT_CHANNEL_SPEED_BYTE); if (channelLimitedByteSpeed == null || channelLimitedByteSpeed <= 0) { - DataXException.asDataXException( + throw DataXException.asDataXException( FrameworkErrorCode.CONFIG_ERROR, "在有总bps限速条件下,单个channel的bps值不能为空,也不能为非正数"); } @@ -448,7 +448,7 @@ public class JobContainer extends AbstractContainer { Long channelLimitedRecordSpeed = this.configuration.getLong( CoreConstant.DATAX_CORE_TRANSPORT_CHANNEL_SPEED_RECORD); if (channelLimitedRecordSpeed == null || channelLimitedRecordSpeed <= 0) { - DataXException.asDataXException(FrameworkErrorCode.CONFIG_ERROR, + throw DataXException.asDataXException(FrameworkErrorCode.CONFIG_ERROR, "在有总tps限速条件下,单个channel的tps值不能为空,也不能为非正数"); } diff --git a/dataxPluginDev.md b/dataxPluginDev.md index e4828d5a..4483f270 100644 --- a/dataxPluginDev.md +++ b/dataxPluginDev.md @@ -111,7 +111,7 @@ public class SomeReader extends Reader { ``` `Job`接口功能如下: -- `init`: Job对象初始化工作,测试可以通过`super.getPluginJobConf()`获取与本插件相关的配置。读插件获得配置中`reader`部分,写插件获得`writer`部分。 +- `init`: Job对象初始化工作,此时可以通过`super.getPluginJobConf()`获取与本插件相关的配置。读插件获得配置中`reader`部分,写插件获得`writer`部分。 - `prepare`: 全局准备工作,比如odpswriter清空目标表。 - `split`: 拆分`Task`。参数`adviceNumber`框架建议的拆分数,一般是运行时所配置的并发度。值返回的是`Task`的配置列表。 - `post`: 全局的后置工作,比如mysqlwriter同步完影子表后的rename操作。 @@ -155,7 +155,7 @@ public class SomeReader extends Reader { ``` - `name`: 
插件名称,大小写敏感。框架根据用户在配置文件中指定的名称来搜寻插件。 **十分重要** 。 -- `class`: 入口类的全限定名称,框架通过反射穿件入口类的实例。**十分重要** 。 +- `class`: 入口类的全限定名称,框架通过反射插件入口类的实例。**十分重要** 。 - `description`: 描述信息。 - `developer`: 开发人员。 @@ -435,7 +435,7 @@ DataX的内部类型在实现上会选用不同的java类型: #### 如何处理脏数据 -在`Reader.Task`和`Writer.Task`中,功过`AbstractTaskPlugin.getPluginCollector()`可以拿到一个`TaskPluginCollector`,它提供了一系列`collectDirtyRecord`的方法。当脏数据出现时,只需要调用合适的`collectDirtyRecord`方法,把被认为是脏数据的`Record`传入即可。 +在`Reader.Task`和`Writer.Task`中,通过`AbstractTaskPlugin.getTaskPluginCollector()`可以拿到一个`TaskPluginCollector`,它提供了一系列`collectDirtyRecord`的方法。当脏数据出现时,只需要调用合适的`collectDirtyRecord`方法,把被认为是脏数据的`Record`传入即可。 用户可以在任务的配置中指定脏数据限制条数或者百分比限制,当脏数据超出限制时,框架会结束同步任务,退出。插件需要保证脏数据都被收集到,其他工作交给框架就好。 @@ -468,4 +468,4 @@ DataX的内部类型在实现上会选用不同的java类型: - 测试参数集(多组),系统参数(比如并发数),插件参数(比如batchSize) - 不同参数下同步速度(Rec/s, MB/s),机器负载(load, cpu)等,对数据源压力(load, cpu, mem等)。 6. **约束限制**:是否存在其他的使用限制条件。 -7. **FQA**:用户经常会遇到的问题。 +7. **FAQ**:用户经常会遇到的问题。 diff --git a/drdsreader/doc/drdsreader.md b/drdsreader/doc/drdsreader.md index 25df9200..c54e6bd1 100644 --- a/drdsreader/doc/drdsreader.md +++ b/drdsreader/doc/drdsreader.md @@ -50,7 +50,7 @@ DRDS的插件目前DataX只适配了Mysql引擎的场景,DRDS对于DataX而言 // 数据库连接密码 "password": "root", "column": [ - "id","name" + "id","name" ], "connection": [ { diff --git a/drdsreader/pom.xml b/drdsreader/pom.xml index 71c7108d..e38884ab 100755 --- a/drdsreader/pom.xml +++ b/drdsreader/pom.xml @@ -42,7 +42,7 @@ mysql mysql-connector-java - 5.1.34 + ${mysql.driver.version} diff --git a/drdswriter/pom.xml b/drdswriter/pom.xml index 6a361760..35a7f28d 100755 --- a/drdswriter/pom.xml +++ b/drdswriter/pom.xml @@ -44,7 +44,7 @@ mysql mysql-connector-java - 5.1.34 + ${mysql.driver.version} diff --git a/elasticsearchwriter/pom.xml b/elasticsearchwriter/pom.xml index 2a246805..a60dbd88 100644 --- a/elasticsearchwriter/pom.xml +++ b/elasticsearchwriter/pom.xml @@ -50,7 +50,7 @@ junit junit - 4.11 + 4.13.1 test diff --git a/ftpwriter/doc/ftpwriter.md b/ftpwriter/doc/ftpwriter.md index bf2e726f..6b1b2687 100644 --- a/ftpwriter/doc/ftpwriter.md +++ b/ftpwriter/doc/ftpwriter.md @@ -63,6 +63,7 @@ FtpWriter实现了从DataX协议转为FTP文件功能,FTP文件本身是无结 "nullFormat": "null", "dateFormat": "yyyy-MM-dd", "fileFormat": "csv", + "suffix": ".csv", "header": [] } } @@ -200,6 +201,14 @@ FtpWriter实现了从DataX协议转为FTP文件功能,FTP文件本身是无结 * 必选:否
* 默认值:text
+ +* **suffix** + + * 描述:最后输出文件的后缀,当前支持 ".text"以及".csv" + + * 必选:否
+ + * 默认值:""
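+ + * 示例:若同时配置 `"fileName": "data"`(示例取值)与 `"suffix": ".csv"`,则最终写出的文件名以 `.csv` 结尾;保持默认空串则不追加任何后缀。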
* **header** diff --git a/gdbreader/doc/gdbreader.md b/gdbreader/doc/gdbreader.md new file mode 100644 index 00000000..e883f20d --- /dev/null +++ b/gdbreader/doc/gdbreader.md @@ -0,0 +1,260 @@ + +# DataX GDBReader + +## 1. 快速介绍 + +GDBReader插件实现读取GDB实例数据的功能,通过`Gremlin Client`连接远程GDB实例,按配置提供的`label`生成查询DSL,遍历点或边数据,包括属性数据,并将数据写入到Record中给到Writer使用。 + +## 2. 实现原理 + +GDBReader使用`Gremlin Client`连接GDB实例,按`label`分不同Task取点或边数据。 +单个Task中按`label`遍历点或边的id,再切分范围分多次请求查询点或边和属性数据,最后将点或边数据根据配置转换成指定格式记录发送给下游写插件。 + +GDBReader按`label`切分多个Task并发,同一个`label`的数据批量异步获取来加快读取速度。如果配置读取的`label`列表为空,任务启动前会从GDB查询所有`label`再切分Task。 + +## 3. 功能说明 + +GDB中点和边不同,读取需要区分点和边点配置。 + +### 3.1 点配置样例 + +``` +{ + "job": { + "setting": { + "speed": { + "channel": 1 + } + "errorLimit": { + "record": 1 + } + }, + + "content": [ + { + "reader": { + "name": "gdbreader", + "parameter": { + "host": "10.218.145.24", + "port": 8182, + "username": "***", + "password": "***", + "fetchBatchSize": 100, + "rangeSplitSize": 1000, + "labelType": "VERTEX", + "labels": ["label1", "label2"], + "column": [ + { + "name": "id", + "type": "string", + "columnType": "primaryKey" + }, + { + "name": "label", + "type": "string", + "columnType": "primaryLabel" + }, + { + "name": "age", + "type": "int", + "columnType": "vertexProperty" + } + ] + } + }, + "writer": { + "name": "streamwriter", + "parameter": { + "print": true + } + } + } + ] + } +} +``` + +### 3.2 边配置样例 + +``` +{ + "job": { + "setting": { + "speed": { + "channel": 1 + }, + "errorLimit": { + "record": 1 + } + }, + + "content": [ + { + "reader": { + "name": "gdbreader", + "parameter": { + "host": "10.218.145.24", + "port": 8182, + "username": "***", + "password": "***", + "fetchBatchSize": 100, + "rangeSplitSize": 1000, + "labelType": "EDGE", + "labels": ["label1", "label2"], + "column": [ + { + "name": "id", + "type": "string", + "columnType": "primaryKey" + }, + { + "name": "label", + "type": "string", + "columnType": "primaryLabel" + }, + { + "name": "srcId", + "type": "string", + "columnType": "srcPrimaryKey" + }, + { + "name": "srcLabel", + "type": "string", + "columnType": "srcPrimaryLabel" + }, + { + "name": "dstId", + "type": "string", + "columnType": "srcPrimaryKey" + }, + { + "name": "dstLabel", + "type": "string", + "columnType": "srcPrimaryLabel" + }, + { + "name": "name", + "type": "string", + "columnType": "edgeProperty" + }, + { + "name": "weight", + "type": "double", + "columnType": "edgeProperty" + } + ] + } + }, + + "writer": { + "name": "streamwriter", + "parameter": { + "print": true + } + } + } + ] + } +} +``` + +### 3.3 参数说明 + +* **host** + * 描述:GDB实例连接地址,对应'实例管理'->'基本信息'页面的网络地址 + * 必选:是 + * 默认值:无 + +* **port** + * 描述:GDB实例连接地址对应的端口 + * 必选:是 + * 默认值:8182 + +* **username** + * 描述:GDB实例账号名 + * 必选:是 + * 默认值:无 + +* **password** + * 描述:GDB实例账号名对应的密码 + * 必选:是 + * 默认值:无 + +* **fetchBatchSize** + * 描述:一次GDB请求读取点或边的数量,响应包含点或边以及属性 + * 必选:是 + * 默认值:100 + +* **rangeSplitSize** + * 描述:id遍历,一次遍历请求扫描的id个数 + * 必选:是 + * 默认值:10 \* fetchBatchSize + +* **labels** + * 描述:标签数组,即需要导出的点或边标签,支持读取多个标签,用数组表示。如果留空([]),表示GDB中所有点或边标签 + * 必选:是 + * 默认值:无 + +* **labelType** + * 描述:数据标签类型,支持点、边两种枚举值 + * VERTEX:表示点 + * EDGE:表示边 + * 必选:是 + * 默认值:无 + +* **column** + * 描述:点或边字段映射关系配置 + * 必选:是 + * 默认值:无 + +* **column -> name** + * 描述:点或边映射关系的字段名,指定属性时表示读取的属性名,读取其他字段时会被忽略 + * 必选:是 + * 默认值:无 + +* **column -> type** + * 描述:点或边映射关系的字段类型 + * id, label在GDB中都是string类型,配置非string类型时可能会转换失败 + * 普通属性支持基础类型,包括int, long, float, double, boolean, string + * GDBReader尽量将读取到的数据转换成配置要求的类型,但转换失败会导致该条记录错误 + * 必选:是 + * 默认值:无 + +* **column 
-> columnType** + * 描述:GDB点或边数据到列数据的映射关系,支持以下枚举值: + * primaryKey: 表示该字段是点或边的id + * primaryLabel: 表示该字段是点或边的label + * srcPrimaryKey: 表示该字段是边关联的起点id,只在读取边时使用 + * srcPrimaryLabel: 表示该字段是边关联的起点label,只在读取边时使用 + * dstPrimaryKey: 表示该字段是边关联的终点id,只在读取边时使用 + * dstPrimaryLabel: 表示该字段是边关联的终点label,只在读取边时使用 + * vertexProperty: 表示该字段是点的属性,只在读取点时使用,应用到SET属性时只读取其中的一个属性值 + * vertexJsonProperty: 表示该字段是点的属性集合,只在读取点时使用。属性集合使用JSON格式输出,包含所有的属性,不能与其他vertexProperty配置一起使用 + * edgeProperty: 表示该字段是边的属性,只在读取边时使用 + * edgeJsonProperty: 表示该字段是边的属性集合,只在读取边时使用。属性集合使用JSON格式输出,包含所有的属性,不能与其他edgeProperty配置一起使用 + * 必选:是 + * 默认值:无 + * vertexJsonProperty格式示例,新增`c`字段区分SET属性,但是SET属性只包含单个属性值时会标记成普通属性 + ``` + {"properties":[ + {"k":"name","t","string","v":"Jack","c":"set"}, + {"k":"name","t","string","v":"Luck","c":"set"}, + {"k":"age","t","int","v":"20","c":"single"} + ]} + ``` + * edgeJsonProperty格式示例,边不支持多值属性 + ``` + {"properties":[ + {"k":"created_at","t","long","v":"153498653"}, + {"k":"weight","t","double","v":"3.14"} + ]} + +## 4 性能报告 +(TODO) + +## 5 使用约束 +无 + +## 6 FAQ +无 + diff --git a/gdbreader/pom.xml b/gdbreader/pom.xml new file mode 100644 index 00000000..a226a21f --- /dev/null +++ b/gdbreader/pom.xml @@ -0,0 +1,125 @@ + + + + datax-all + com.alibaba.datax + 0.0.1-SNAPSHOT + + 4.0.0 + + gdbreader + + com.alibaba.datax + 0.0.1-SNAPSHOT + + + + com.alibaba.datax + datax-common + ${datax-project-version} + + + slf4j-log4j12 + org.slf4j + + + + + com.alibaba.datax + datax-core + ${datax-project-version} + test + + + slf4j-log4j12 + org.slf4j + + + + + + org.slf4j + slf4j-api + + + ch.qos.logback + logback-classic + + + + org.apache.tinkerpop + gremlin-driver + 3.4.1 + + + org.projectlombok + lombok + 1.18.8 + + + org.junit.jupiter + junit-jupiter-api + 5.4.0 + test + + + org.junit.jupiter + junit-jupiter-engine + 5.4.0 + test + + + + + + + + + maven-compiler-plugin + + 1.6 + 1.6 + ${project-sourceEncoding} + + + + + org.apache.maven.plugins + maven-surefire-plugin + 2.22.0 + + + **/*Test*.class + + + + + + maven-assembly-plugin + + + src/main/assembly/package.xml + + datax + + + + dwzip + package + + single + + + + + + org.apache.maven.plugins + maven-compiler-plugin + + 8 + 8 + + + + + diff --git a/gdbreader/src/main/assembly/package.xml b/gdbreader/src/main/assembly/package.xml new file mode 100644 index 00000000..c834c2f2 --- /dev/null +++ b/gdbreader/src/main/assembly/package.xml @@ -0,0 +1,35 @@ + + + + dir + + false + + + src/main/resources + + plugin.json + plugin_job_template.json + + plugin/reader/gdbreader + + + target/ + + gdbreader-0.0.1-SNAPSHOT.jar + + plugin/reader/gdbreader + + + + + + false + plugin/reader/gdbreader/libs + runtime + + + diff --git a/gdbreader/src/main/java/com/alibaba/datax/plugin/reader/gdbreader/GdbReader.java b/gdbreader/src/main/java/com/alibaba/datax/plugin/reader/gdbreader/GdbReader.java new file mode 100644 index 00000000..025e7b51 --- /dev/null +++ b/gdbreader/src/main/java/com/alibaba/datax/plugin/reader/gdbreader/GdbReader.java @@ -0,0 +1,231 @@ +package com.alibaba.datax.plugin.reader.gdbreader; + +import com.alibaba.datax.common.element.Record; +import com.alibaba.datax.common.exception.DataXException; +import com.alibaba.datax.common.plugin.RecordSender; +import com.alibaba.datax.common.spi.Reader; +import com.alibaba.datax.common.util.Configuration; +import com.alibaba.datax.plugin.reader.gdbreader.mapping.DefaultGdbMapper; +import com.alibaba.datax.plugin.reader.gdbreader.mapping.MappingRule; +import com.alibaba.datax.plugin.reader.gdbreader.mapping.MappingRuleFactory; +import 
com.alibaba.datax.plugin.reader.gdbreader.model.GdbElement; +import com.alibaba.datax.plugin.reader.gdbreader.model.GdbGraph; +import com.alibaba.datax.plugin.reader.gdbreader.model.ScriptGdbGraph; +import com.alibaba.datax.plugin.reader.gdbreader.util.ConfigHelper; +import org.apache.tinkerpop.gremlin.driver.ResultSet; +import org.slf4j.Logger; +import org.slf4j.LoggerFactory; + +import java.util.LinkedList; +import java.util.List; + +public class GdbReader extends Reader { + private final static int DEFAULT_FETCH_BATCH_SIZE = 200; + private static GdbGraph graph; + private static Key.ExportType exportType; + + /** + * Job 中的方法仅执行一次,Task 中方法会由框架启动多个 Task 线程并行执行。 + *

+ * 整个 Reader 执行流程是: + * <pre>
+     * Job类init-->prepare-->split
+     *
+     *                            Task类init-->prepare-->startRead-->post-->destroy
+     *                            Task类init-->prepare-->startRead-->post-->destroy
+     *
+     *                                                                             Job类post-->destroy
+     * </pre>
+ */ + public static class Job extends Reader.Job { + private static final Logger LOG = LoggerFactory.getLogger(Job.class); + + private Configuration jobConfig = null; + + @Override + public void init() { + this.jobConfig = super.getPluginJobConf(); + + /** + * 注意:此方法仅执行一次。 + * 最佳实践:通常在这里对用户的配置进行校验:是否缺失必填项?有无错误值?有没有无关配置项?... + * 并给出清晰的报错/警告提示。校验通常建议采用静态工具类进行,以保证本类结构清晰。 + */ + + ConfigHelper.assertGdbClient(jobConfig); + ConfigHelper.assertLabels(jobConfig); + try { + exportType = Key.ExportType.valueOf(jobConfig.getString(Key.EXPORT_TYPE)); + } catch (NullPointerException | IllegalArgumentException e) { + throw DataXException.asDataXException(GdbReaderErrorCode.BAD_CONFIG_VALUE, Key.EXPORT_TYPE); + } + } + + @Override + public void prepare() { + /** + * 注意:此方法仅执行一次。 + * 最佳实践:如果 Job 中有需要进行数据同步之前的处理,可以在此处完成,如果没有必要则可以直接去掉。 + */ + + try { + graph = new ScriptGdbGraph(jobConfig, exportType); + } catch (RuntimeException e) { + throw DataXException.asDataXException(GdbReaderErrorCode.FAIL_CLIENT_CONNECT, e.getMessage()); + } + } + + @Override + public List split(int adviceNumber) { + /** + * 注意:此方法仅执行一次。 + * 最佳实践:通常采用工具静态类完成把 Job 配置切分成多个 Task 配置的工作。 + * 这里的 adviceNumber 是框架根据用户的同步速度的要求建议的切分份数,仅供参考,不是强制必须切分的份数。 + */ + List labels = ConfigHelper.assertLabels(jobConfig); + + /** + * 配置label列表为空时,尝试查询GDB中所有label,添加到读取列表 + */ + if (labels.isEmpty()) { + try { + labels.addAll(graph.getLabels().keySet()); + } catch (RuntimeException ex) { + throw DataXException.asDataXException(GdbReaderErrorCode.FAIL_FETCH_LABELS, ex.getMessage()); + } + } + + if (labels.isEmpty()) { + throw DataXException.asDataXException(GdbReaderErrorCode.FAIL_FETCH_LABELS, "none labels to read"); + } + + return ConfigHelper.splitConfig(jobConfig, labels); + } + + @Override + public void post() { + /** + * 注意:此方法仅执行一次。 + * 最佳实践:如果 Job 中有需要进行数据同步之后的后续处理,可以在此处完成。 + */ + } + + @Override + public void destroy() { + /** + * 注意:此方法仅执行一次。 + * 最佳实践:通常配合 Job 中的 post() 方法一起完成 Job 的资源释放。 + */ + try { + graph.close(); + } catch (Exception ex) { + LOG.error("Failed to close client : {}", ex); + } + } + + } + + public static class Task extends Reader.Task { + private static final Logger LOG = LoggerFactory.getLogger(Task.class); + private static MappingRule rule; + private Configuration taskConfig; + private String fetchLabel = null; + + private int rangeSplitSize; + private int fetchBatchSize; + + @Override + public void init() { + this.taskConfig = super.getPluginJobConf(); + + /** + * 注意:此方法每个 Task 都会执行一次。 + * 最佳实践:此处通过对 taskConfig 配置的读取,进而初始化一些资源为 startRead()做准备。 + */ + fetchLabel = taskConfig.getString(Key.LABEL); + fetchBatchSize = taskConfig.getInt(Key.FETCH_BATCH_SIZE, DEFAULT_FETCH_BATCH_SIZE); + rangeSplitSize = taskConfig.getInt(Key.RANGE_SPLIT_SIZE, fetchBatchSize * 10); + rule = MappingRuleFactory.getInstance().create(taskConfig, exportType); + } + + @Override + public void prepare() { + /** + * 注意:此方法仅执行一次。 + * 最佳实践:如果 Job 中有需要进行数据同步之后的处理,可以在此处完成,如果没有必要则可以直接去掉。 + */ + } + + @Override + public void startRead(RecordSender recordSender) { + /** + * 注意:此方法每个 Task 都会执行一次。 + * 最佳实践:此处适当封装确保简洁清晰完成数据读取工作。 + */ + + String start = ""; + while (true) { + List ids; + try { + ids = graph.fetchIds(fetchLabel, start, rangeSplitSize); + if (ids.isEmpty()) { + break; + } + start = ids.get(ids.size() - 1); + } catch (Exception ex) { + throw DataXException.asDataXException(GdbReaderErrorCode.FAIL_FETCH_IDS, ex.getMessage()); + } + + // send range fetch async + int count = ids.size(); + List resultSets = new LinkedList<>(); + for (int pos = 0; pos < 
count; pos += fetchBatchSize) { + int rangeSize = Math.min(fetchBatchSize, count - pos); + String endId = ids.get(pos + rangeSize - 1); + String beginId = ids.get(pos); + + List propNames = rule.isHasProperty() ? rule.getPropertyNames() : null; + try { + resultSets.add(graph.fetchElementsAsync(fetchLabel, beginId, endId, propNames)); + } catch (Exception ex) { + // just print error logs and continues + LOG.error("failed to request label: {}, start: {}, end: {}, e: {}", fetchLabel, beginId, endId, ex); + } + } + + // get range fetch dsl results + resultSets.forEach(results -> { + try { + List elements = graph.getElement(results); + elements.forEach(element -> { + Record record = recordSender.createRecord(); + DefaultGdbMapper.getMapper(rule).accept(element, record); + recordSender.sendToWriter(record); + }); + recordSender.flush(); + } catch (Exception ex) { + LOG.error("failed to send records e {}", ex); + } + }); + } + } + + @Override + public void post() { + /** + * 注意:此方法每个 Task 都会执行一次。 + * 最佳实践:如果 Task 中有需要进行数据同步之后的后续处理,可以在此处完成。 + */ + } + + @Override + public void destroy() { + /** + * 注意:此方法每个 Task 都会执行一次。 + * 最佳实践:通常配合Task 中的 post() 方法一起完成 Task 的资源释放。 + */ + } + + } + +} \ No newline at end of file diff --git a/gdbreader/src/main/java/com/alibaba/datax/plugin/reader/gdbreader/GdbReaderErrorCode.java b/gdbreader/src/main/java/com/alibaba/datax/plugin/reader/gdbreader/GdbReaderErrorCode.java new file mode 100644 index 00000000..1d320bbd --- /dev/null +++ b/gdbreader/src/main/java/com/alibaba/datax/plugin/reader/gdbreader/GdbReaderErrorCode.java @@ -0,0 +1,39 @@ +package com.alibaba.datax.plugin.reader.gdbreader; + +import com.alibaba.datax.common.spi.ErrorCode; + +public enum GdbReaderErrorCode implements ErrorCode { + /** + * + */ + BAD_CONFIG_VALUE("GdbReader-00", "The value you configured is invalid."), + FAIL_CLIENT_CONNECT("GdbReader-02", "GDB connection is abnormal."), + UNSUPPORTED_TYPE("GdbReader-03", "Unsupported data type conversion."), + FAIL_FETCH_LABELS("GdbReader-04", "Error pulling all labels, it is recommended to configure the specified label pull."), + FAIL_FETCH_IDS("GdbReader-05", "Pull range id error."), + ; + + private final String code; + private final String description; + + private GdbReaderErrorCode(String code, String description) { + this.code = code; + this.description = description; + } + + @Override + public String getCode() { + return this.code; + } + + @Override + public String getDescription() { + return this.description; + } + + @Override + public String toString() { + return String.format("Code:[%s], Description:[%s]. 
", this.code, + this.description); + } +} \ No newline at end of file diff --git a/gdbreader/src/main/java/com/alibaba/datax/plugin/reader/gdbreader/Key.java b/gdbreader/src/main/java/com/alibaba/datax/plugin/reader/gdbreader/Key.java new file mode 100644 index 00000000..31d5e631 --- /dev/null +++ b/gdbreader/src/main/java/com/alibaba/datax/plugin/reader/gdbreader/Key.java @@ -0,0 +1,86 @@ +package com.alibaba.datax.plugin.reader.gdbreader; + +public final class Key { + + /** + * 此处声明插件用到的需要插件使用者提供的配置项 + */ + public final static String HOST = "host"; + public final static String PORT = "port"; + public final static String USERNAME = "username"; + public static final String PASSWORD = "password"; + + public static final String LABEL = "labels"; + public static final String EXPORT_TYPE = "labelType"; + + public static final String RANGE_SPLIT_SIZE = "RangeSplitSize"; + public static final String FETCH_BATCH_SIZE = "fetchBatchSize"; + + public static final String COLUMN = "column"; + public static final String COLUMN_NAME = "name"; + public static final String COLUMN_TYPE = "type"; + public static final String COLUMN_NODE_TYPE = "columnType"; + + public enum ExportType { + /** + * Import vertices + */ + VERTEX, + /** + * Import edges + */ + EDGE + } + + public enum ColumnType { + /** + * vertex or edge id + */ + primaryKey, + + /** + * vertex or edge label + */ + primaryLabel, + + /** + * vertex property + */ + vertexProperty, + + /** + * collects all vertex property to Json list + */ + vertexJsonProperty, + + /** + * start vertex id of edge + */ + srcPrimaryKey, + + /** + * start vertex label of edge + */ + srcPrimaryLabel, + + /** + * end vertex id of edge + */ + dstPrimaryKey, + + /** + * end vertex label of edge + */ + dstPrimaryLabel, + + /** + * edge property + */ + edgeProperty, + + /** + * collects all edge property to Json list + */ + edgeJsonProperty, + } +} diff --git a/gdbreader/src/main/java/com/alibaba/datax/plugin/reader/gdbreader/mapping/DefaultGdbMapper.java b/gdbreader/src/main/java/com/alibaba/datax/plugin/reader/gdbreader/mapping/DefaultGdbMapper.java new file mode 100644 index 00000000..d874cf36 --- /dev/null +++ b/gdbreader/src/main/java/com/alibaba/datax/plugin/reader/gdbreader/mapping/DefaultGdbMapper.java @@ -0,0 +1,150 @@ +/* + * (C) 2019-present Alibaba Group Holding Limited. + * + * This program is free software; you can redistribute it and/or modify + * it under the terms of the GNU General Public License version 2 as + * published by the Free Software Foundation. 
+ */ +package com.alibaba.datax.plugin.reader.gdbreader.mapping; + +import com.alibaba.datax.common.element.Record; +import com.alibaba.datax.plugin.reader.gdbreader.model.GdbElement; +import org.apache.tinkerpop.gremlin.structure.util.reference.ReferenceProperty; +import org.apache.tinkerpop.gremlin.structure.util.reference.ReferenceVertexProperty; + +import java.util.List; +import java.util.Map; +import java.util.function.BiConsumer; +import java.util.function.Function; +import java.util.stream.Collectors; + +/** + * @author : Liu Jianping + * @date : 2019/9/6 + */ + +public class DefaultGdbMapper { + + public static BiConsumer getMapper(MappingRule rule) { + return (gdbElement, record) -> rule.getColumns().forEach(columnMappingRule -> { + Object value = null; + ValueType type = columnMappingRule.getValueType(); + String name = columnMappingRule.getName(); + Map props = gdbElement.getProperties(); + + switch (columnMappingRule.getColumnType()) { + case dstPrimaryKey: + value = gdbElement.getTo(); + break; + case srcPrimaryKey: + value = gdbElement.getFrom(); + break; + case primaryKey: + value = gdbElement.getId(); + break; + case primaryLabel: + value = gdbElement.getLabel(); + break; + case dstPrimaryLabel: + value = gdbElement.getToLabel(); + break; + case srcPrimaryLabel: + value = gdbElement.getFromLabel(); + break; + case vertexProperty: + value = forVertexOnePropertyValue().apply(props.get(name)); + break; + case edgeProperty: + value = forEdgePropertyValue().apply(props.get(name)); + break; + case edgeJsonProperty: + value = forEdgeJsonProperties().apply(props); + break; + case vertexJsonProperty: + value = forVertexJsonProperties().apply(props); + break; + default: + break; + } + record.addColumn(type.applyObject(value)); + }); + } + + + /** + * parser ReferenceProperty value for edge + * + * @return property value + */ + private static Function forEdgePropertyValue() { + return prop -> { + if (prop instanceof ReferenceProperty) { + return ((ReferenceProperty) prop).value(); + } + return null; + }; + } + + /** + * parser ReferenceVertexProperty value for vertex + * + * @return the first property value in list + */ + private static Function forVertexOnePropertyValue() { + return props -> { + if (props instanceof List) { + // get the first one property if more than one + Object o = ((List) props).get(0); + if (o instanceof ReferenceVertexProperty) { + return ((ReferenceVertexProperty) o).value(); + } + } + return null; + }; + } + + /** + * parser all edge properties to json string + * + * @return json string + */ + private static Function, String> forEdgeJsonProperties() { + return props -> "{\"properties\":[" + + props.entrySet().stream().filter(p -> p.getValue() instanceof ReferenceProperty) + .map(p -> "{\"k\":\"" + ((ReferenceProperty) p.getValue()).key() + "\"," + + "\"t\":\"" + ((ReferenceProperty) p.getValue()).value().getClass().getSimpleName().toLowerCase() + "\"," + + "\"v\":\"" + String.valueOf(((ReferenceProperty) p.getValue()).value()) + "\"}") + .collect(Collectors.joining(",")) + + "]}"; + } + + /** + * parser all vertex properties to json string, include set-property + * + * @return json string + */ + private static Function, String> forVertexJsonProperties() { + return props -> "{\"properties\":[" + + props.entrySet().stream().filter(p -> p.getValue() instanceof List) + .map(p -> forVertexPropertyStr().apply((List) p.getValue())) + .collect(Collectors.joining(",")) + + "]}"; + } + + /** + * parser one vertex property to json string item, set 'cardinality' + * + * 
@return json string item + */ + private static Function, String> forVertexPropertyStr() { + return vp -> { + final String setFlag = vp.size() > 1 ? "set" : "single"; + return vp.stream().filter(p -> p instanceof ReferenceVertexProperty) + .map(p -> "{\"k\":\"" + ((ReferenceVertexProperty) p).key() + "\"," + + "\"t\":\"" + ((ReferenceVertexProperty) p).value().getClass().getSimpleName().toLowerCase() + "\"," + + "\"v\":\"" + String.valueOf(((ReferenceVertexProperty) p).value()) + "\"," + + "\"c\":\"" + setFlag + "\"}") + .collect(Collectors.joining(",")); + }; + } +} diff --git a/gdbreader/src/main/java/com/alibaba/datax/plugin/reader/gdbreader/mapping/MappingRule.java b/gdbreader/src/main/java/com/alibaba/datax/plugin/reader/gdbreader/mapping/MappingRule.java new file mode 100644 index 00000000..7baed498 --- /dev/null +++ b/gdbreader/src/main/java/com/alibaba/datax/plugin/reader/gdbreader/mapping/MappingRule.java @@ -0,0 +1,79 @@ +/* + * (C) 2019-present Alibaba Group Holding Limited. + * + * This program is free software; you can redistribute it and/or modify + * it under the terms of the GNU General Public License version 2 as + * published by the Free Software Foundation. + */ +package com.alibaba.datax.plugin.reader.gdbreader.mapping; + +import com.alibaba.datax.common.exception.DataXException; +import com.alibaba.datax.plugin.reader.gdbreader.GdbReaderErrorCode; +import com.alibaba.datax.plugin.reader.gdbreader.Key.ColumnType; +import com.alibaba.datax.plugin.reader.gdbreader.Key.ExportType; +import lombok.Data; + +import java.util.ArrayList; +import java.util.List; + +/** + * @author : Liu Jianping + * @date : 2019/9/6 + */ + +@Data +public class MappingRule { + private boolean hasRelation = false; + private boolean hasProperty = false; + private ExportType type = ExportType.VERTEX; + + /** + * property names for property key-value + */ + private List propertyNames = new ArrayList<>(); + + private List columns = new ArrayList<>(); + + void addColumn(ColumnType columnType, ValueType type, String name) { + ColumnMappingRule rule = new ColumnMappingRule(); + rule.setColumnType(columnType); + rule.setName(name); + rule.setValueType(type); + + if (columnType == ColumnType.vertexProperty || columnType == ColumnType.edgeProperty) { + propertyNames.add(name); + hasProperty = true; + } + + boolean hasTo = columnType == ColumnType.dstPrimaryKey || columnType == ColumnType.dstPrimaryLabel; + boolean hasFrom = columnType == ColumnType.srcPrimaryKey || columnType == ColumnType.srcPrimaryLabel; + if (hasTo || hasFrom) { + hasRelation = true; + } + + columns.add(rule); + } + + void addJsonColumn(ColumnType columnType) { + ColumnMappingRule rule = new ColumnMappingRule(); + rule.setColumnType(columnType); + rule.setName("json"); + rule.setValueType(ValueType.STRING); + + if (!propertyNames.isEmpty()) { + throw DataXException.asDataXException(GdbReaderErrorCode.BAD_CONFIG_VALUE, "JsonProperties should be only property"); + } + + columns.add(rule); + hasProperty = true; + } + + @Data + protected static class ColumnMappingRule { + private String name = null; + + private ValueType valueType = null; + + private ColumnType columnType = null; + } +} diff --git a/gdbreader/src/main/java/com/alibaba/datax/plugin/reader/gdbreader/mapping/MappingRuleFactory.java b/gdbreader/src/main/java/com/alibaba/datax/plugin/reader/gdbreader/mapping/MappingRuleFactory.java new file mode 100644 index 00000000..c71a19ac --- /dev/null +++ 
b/gdbreader/src/main/java/com/alibaba/datax/plugin/reader/gdbreader/mapping/MappingRuleFactory.java @@ -0,0 +1,76 @@ +/* + * (C) 2019-present Alibaba Group Holding Limited. + * + * This program is free software; you can redistribute it and/or modify + * it under the terms of the GNU General Public License version 2 as + * published by the Free Software Foundation. + */ +package com.alibaba.datax.plugin.reader.gdbreader.mapping; + +import com.alibaba.datax.common.exception.DataXException; +import com.alibaba.datax.common.util.Configuration; +import com.alibaba.datax.plugin.reader.gdbreader.GdbReaderErrorCode; +import com.alibaba.datax.plugin.reader.gdbreader.Key; +import com.alibaba.datax.plugin.reader.gdbreader.Key.ColumnType; +import com.alibaba.datax.plugin.reader.gdbreader.Key.ExportType; +import com.alibaba.datax.plugin.reader.gdbreader.util.ConfigHelper; + +import java.util.List; + +/** + * @author : Liu Jianping + * @date : 2019/9/20 + */ + +public class MappingRuleFactory { + private static final MappingRuleFactory instance = new MappingRuleFactory(); + + public static MappingRuleFactory getInstance() { + return instance; + } + + public MappingRule create(Configuration config, ExportType exportType) { + MappingRule rule = new MappingRule(); + + rule.setType(exportType); + List configurationList = config.getListConfiguration(Key.COLUMN); + for (Configuration column : configurationList) { + ColumnType columnType; + try { + columnType = ColumnType.valueOf(column.getString(Key.COLUMN_NODE_TYPE)); + } catch (NullPointerException | IllegalArgumentException e) { + throw DataXException.asDataXException(GdbReaderErrorCode.BAD_CONFIG_VALUE, Key.COLUMN_NODE_TYPE); + } + + if (exportType == ExportType.VERTEX) { + // only id/label/property column allow when vertex + ConfigHelper.assertConfig(Key.COLUMN_NODE_TYPE, () -> + columnType == ColumnType.primaryKey || columnType == ColumnType.primaryLabel + || columnType == ColumnType.vertexProperty || columnType == ColumnType.vertexJsonProperty); + } else if (exportType == ExportType.EDGE) { + // edge + ConfigHelper.assertConfig(Key.COLUMN_NODE_TYPE, () -> + columnType == ColumnType.primaryKey || columnType == ColumnType.primaryLabel + || columnType == ColumnType.srcPrimaryKey || columnType == ColumnType.srcPrimaryLabel + || columnType == ColumnType.dstPrimaryKey || columnType == ColumnType.dstPrimaryLabel + || columnType == ColumnType.edgeProperty || columnType == ColumnType.edgeJsonProperty); + } + + if (columnType == ColumnType.edgeProperty || columnType == ColumnType.vertexProperty) { + String name = column.getString(Key.COLUMN_NAME); + ValueType propType = ValueType.fromShortName(column.getString(Key.COLUMN_TYPE)); + + ConfigHelper.assertConfig(Key.COLUMN_NAME, () -> name != null); + if (propType == null) { + throw DataXException.asDataXException(GdbReaderErrorCode.UNSUPPORTED_TYPE, Key.COLUMN_TYPE); + } + rule.addColumn(columnType, propType, name); + } else if (columnType == ColumnType.vertexJsonProperty || columnType == ColumnType.edgeJsonProperty) { + rule.addJsonColumn(columnType); + } else { + rule.addColumn(columnType, ValueType.STRING, null); + } + } + return rule; + } +} diff --git a/gdbreader/src/main/java/com/alibaba/datax/plugin/reader/gdbreader/mapping/ValueType.java b/gdbreader/src/main/java/com/alibaba/datax/plugin/reader/gdbreader/mapping/ValueType.java new file mode 100644 index 00000000..826e0493 --- /dev/null +++ b/gdbreader/src/main/java/com/alibaba/datax/plugin/reader/gdbreader/mapping/ValueType.java @@ -0,0 +1,128 @@ +/* + * 
(C) 2019-present Alibaba Group Holding Limited. + * + * This program is free software; you can redistribute it and/or modify + * it under the terms of the GNU General Public License version 2 as + * published by the Free Software Foundation. + */ +package com.alibaba.datax.plugin.reader.gdbreader.mapping; + +import com.alibaba.datax.common.element.BoolColumn; +import com.alibaba.datax.common.element.Column; +import com.alibaba.datax.common.element.DoubleColumn; +import com.alibaba.datax.common.element.LongColumn; +import com.alibaba.datax.common.element.StringColumn; + +import java.util.HashMap; +import java.util.Map; +import java.util.function.Function; + +/** + * @author : Liu Jianping + * @date : 2019/9/6 + */ + +public enum ValueType { + /** + * transfer gdb element object value to DataX Column data + *

+ * int, long -> LongColumn + * float, double -> DoubleColumn + * bool -> BooleanColumn + * string -> StringColumn + */ + INT(Integer.class, "int", ValueTypeHolder::longColumnMapper), + INTEGER(Integer.class, "integer", ValueTypeHolder::longColumnMapper), + LONG(Long.class, "long", ValueTypeHolder::longColumnMapper), + DOUBLE(Double.class, "double", ValueTypeHolder::doubleColumnMapper), + FLOAT(Float.class, "float", ValueTypeHolder::doubleColumnMapper), + BOOLEAN(Boolean.class, "boolean", ValueTypeHolder::boolColumnMapper), + STRING(String.class, "string", ValueTypeHolder::stringColumnMapper), + ; + + private Class type = null; + private String shortName = null; + private Function columnFunc = null; + + ValueType(Class type, String name, Function columnFunc) { + this.type = type; + this.shortName = name; + this.columnFunc = columnFunc; + + ValueTypeHolder.shortName2type.put(shortName, this); + } + + public static ValueType fromShortName(String name) { + return ValueTypeHolder.shortName2type.get(name); + } + + public Column applyObject(Object value) { + if (value == null) { + return null; + } + return columnFunc.apply(value); + } + + private static class ValueTypeHolder { + private static Map shortName2type = new HashMap<>(); + + private static LongColumn longColumnMapper(Object o) { + long v; + if (o instanceof Integer) { + v = (int) o; + } else if (o instanceof Long) { + v = (long) o; + } else if (o instanceof String) { + v = Long.valueOf((String) o); + } else { + throw new RuntimeException("Failed to cast " + o.getClass() + " to Long"); + } + + return new LongColumn(v); + } + + private static DoubleColumn doubleColumnMapper(Object o) { + double v; + if (o instanceof Integer) { + v = (double) (int) o; + } else if (o instanceof Long) { + v = (double) (long) o; + } else if (o instanceof Float) { + v = (double) (float) o; + } else if (o instanceof Double) { + v = (double) o; + } else if (o instanceof String) { + v = Double.valueOf((String) o); + } else { + throw new RuntimeException("Failed to cast " + o.getClass() + " to Double"); + } + + return new DoubleColumn(v); + } + + private static BoolColumn boolColumnMapper(Object o) { + boolean v; + if (o instanceof Integer) { + v = ((int) o != 0); + } else if (o instanceof Long) { + v = ((long) o != 0); + } else if (o instanceof Boolean) { + v = (boolean) o; + } else if (o instanceof String) { + v = Boolean.valueOf((String) o); + } else { + throw new RuntimeException("Failed to cast " + o.getClass() + " to Boolean"); + } + + return new BoolColumn(v); + } + + private static StringColumn stringColumnMapper(Object o) { + if (o instanceof String) { + return new StringColumn((String) o); + } else { + return new StringColumn(String.valueOf(o)); + } + } + } +} diff --git a/gdbreader/src/main/java/com/alibaba/datax/plugin/reader/gdbreader/model/AbstractGdbGraph.java b/gdbreader/src/main/java/com/alibaba/datax/plugin/reader/gdbreader/model/AbstractGdbGraph.java new file mode 100644 index 00000000..4eda2eed --- /dev/null +++ b/gdbreader/src/main/java/com/alibaba/datax/plugin/reader/gdbreader/model/AbstractGdbGraph.java @@ -0,0 +1,89 @@ +/* + * (C) 2019-present Alibaba Group Holding Limited. + * + * This program is free software; you can redistribute it and/or modify + * it under the terms of the GNU General Public License version 2 as + * published by the Free Software Foundation. 
+ */ +package com.alibaba.datax.plugin.reader.gdbreader.model; + +import com.alibaba.datax.common.util.Configuration; +import com.alibaba.datax.plugin.reader.gdbreader.Key; +import org.apache.tinkerpop.gremlin.driver.Client; +import org.apache.tinkerpop.gremlin.driver.Cluster; +import org.apache.tinkerpop.gremlin.driver.RequestOptions; +import org.apache.tinkerpop.gremlin.driver.Result; +import org.apache.tinkerpop.gremlin.driver.ResultSet; +import org.apache.tinkerpop.gremlin.driver.ser.Serializers; +import org.slf4j.Logger; +import org.slf4j.LoggerFactory; + +import java.util.List; +import java.util.Map; +import java.util.concurrent.TimeUnit; + +/** + * @author : Liu Jianping + * @date : 2019/9/6 + */ + +public abstract class AbstractGdbGraph implements GdbGraph { + final static int DEFAULT_TIMEOUT = 30000; + private static final Logger log = LoggerFactory.getLogger(AbstractGdbGraph.class); + private Client client; + + AbstractGdbGraph() { + } + + AbstractGdbGraph(Configuration config) { + log.info("init graphdb client"); + String host = config.getString(Key.HOST); + int port = config.getInt(Key.PORT); + String username = config.getString(Key.USERNAME); + String password = config.getString(Key.PASSWORD); + + try { + Cluster cluster = Cluster.build(host).port(port).credentials(username, password) + .serializer(Serializers.GRAPHBINARY_V1D0) + .maxContentLength(1024 * 1024) + .resultIterationBatchSize(64) + .create(); + client = cluster.connect().init(); + + warmClient(); + } catch (RuntimeException e) { + log.error("Failed to connect to GDB {}:{}, due to {}", host, port, e); + throw e; + } + } + + protected List runInternal(String dsl, Map params) throws Exception { + return runInternalAsync(dsl, params).all().get(DEFAULT_TIMEOUT + 1000, TimeUnit.MILLISECONDS); + } + + protected ResultSet runInternalAsync(String dsl, Map params) throws Exception { + RequestOptions.Builder options = RequestOptions.build().timeout(DEFAULT_TIMEOUT); + if (params != null && !params.isEmpty()) { + params.forEach(options::addParameter); + } + return client.submitAsync(dsl, options.create()).get(DEFAULT_TIMEOUT, TimeUnit.MILLISECONDS); + } + + private void warmClient() { + try { + runInternal("g.V('test')", null); + log.info("warm graphdb client over"); + } catch (Exception e) { + log.error("warmClient error"); + throw new RuntimeException(e); + } + } + + @Override + public void close() throws Exception { + if (client != null) { + log.info("close graphdb client"); + client.close(); + } + } +} diff --git a/gdbreader/src/main/java/com/alibaba/datax/plugin/reader/gdbreader/model/GdbElement.java b/gdbreader/src/main/java/com/alibaba/datax/plugin/reader/gdbreader/model/GdbElement.java new file mode 100644 index 00000000..79619ad0 --- /dev/null +++ b/gdbreader/src/main/java/com/alibaba/datax/plugin/reader/gdbreader/model/GdbElement.java @@ -0,0 +1,39 @@ +/* + * (C) 2019-present Alibaba Group Holding Limited. + * + * This program is free software; you can redistribute it and/or modify + * it under the terms of the GNU General Public License version 2 as + * published by the Free Software Foundation. 
+ */ +package com.alibaba.datax.plugin.reader.gdbreader.model; + +import lombok.Data; + +import java.util.HashMap; +import java.util.Map; + +/** + * @author : Liu Jianping + * @date : 2019/9/6 + */ + +@Data +public class GdbElement { + String id = null; + String label = null; + String to = null; + String from = null; + String toLabel = null; + String fromLabel = null; + + Map properties = new HashMap<>(); + + public GdbElement() { + } + + public GdbElement(String id, String label) { + this.id = id; + this.label = label; + } + +} diff --git a/gdbreader/src/main/java/com/alibaba/datax/plugin/reader/gdbreader/model/GdbGraph.java b/gdbreader/src/main/java/com/alibaba/datax/plugin/reader/gdbreader/model/GdbGraph.java new file mode 100644 index 00000000..e6651293 --- /dev/null +++ b/gdbreader/src/main/java/com/alibaba/datax/plugin/reader/gdbreader/model/GdbGraph.java @@ -0,0 +1,65 @@ +/* + * (C) 2019-present Alibaba Group Holding Limited. + * + * This program is free software; you can redistribute it and/or modify + * it under the terms of the GNU General Public License version 2 as + * published by the Free Software Foundation. + */ +package com.alibaba.datax.plugin.reader.gdbreader.model; + +import org.apache.tinkerpop.gremlin.driver.ResultSet; + +import java.util.List; +import java.util.Map; + +/** + * @author : Liu Jianping + * @date : 2019/9/6 + */ + +public interface GdbGraph extends AutoCloseable { + + /** + * Get All labels of GraphDB + * + * @return labels map included numbers + */ + Map getLabels(); + + /** + * Get the Ids list of special 'label', size up to 'limit' + * + * @param label is Label of Vertex or Edge + * @param start of Ids range to get + * @param limit size of Ids list + * @return Ids list + */ + List fetchIds(String label, String start, long limit); + + /** + * Fetch element in async mode, just send query dsl to server + * + * @param label node label to filter + * @param start range begin(included) + * @param end range end(included) + * @param propNames propKey list to fetch + * @return future to get result later + */ + ResultSet fetchElementsAsync(String label, String start, String end, List propNames); + + /** + * Get get element from Response @{ResultSet} + * + * @param results Response of Server + * @return element sets + */ + List getElement(ResultSet results); + + /** + * close graph client + * + * @throws Exception if fails + */ + @Override + void close() throws Exception; +} diff --git a/gdbreader/src/main/java/com/alibaba/datax/plugin/reader/gdbreader/model/ScriptGdbGraph.java b/gdbreader/src/main/java/com/alibaba/datax/plugin/reader/gdbreader/model/ScriptGdbGraph.java new file mode 100644 index 00000000..8c08b819 --- /dev/null +++ b/gdbreader/src/main/java/com/alibaba/datax/plugin/reader/gdbreader/model/ScriptGdbGraph.java @@ -0,0 +1,192 @@ +/* + * (C) 2019-present Alibaba Group Holding Limited. + * + * This program is free software; you can redistribute it and/or modify + * it under the terms of the GNU General Public License version 2 as + * published by the Free Software Foundation. 
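Aside (not part of the patch): GdbGraph above is the reader's paging contract — list labels, fetch a window of ids per label, then asynchronously fetch the elements inside that id range. The sketch below wires those calls together through ScriptGdbGraph; it assumes a reachable GDB endpoint and the gdbreader module on the classpath, uses an empty string as the initial id lower bound, and "name" is a hypothetical property key.

```java
import com.alibaba.datax.common.util.Configuration;
import com.alibaba.datax.plugin.reader.gdbreader.Key;
import com.alibaba.datax.plugin.reader.gdbreader.model.GdbElement;
import com.alibaba.datax.plugin.reader.gdbreader.model.GdbGraph;
import com.alibaba.datax.plugin.reader.gdbreader.model.ScriptGdbGraph;
import org.apache.tinkerpop.gremlin.driver.ResultSet;

import java.util.Arrays;
import java.util.List;
import java.util.Map;

public class GdbReadFlowSketch {
    public static void main(String[] args) throws Exception {
        Configuration conf = Configuration.newDefault();
        conf.set(Key.HOST, "localhost");   // placeholder endpoint
        conf.set(Key.PORT, 8182);
        conf.set(Key.USERNAME, "user");
        conf.set(Key.PASSWORD, "password");

        try (GdbGraph graph = new ScriptGdbGraph(conf, Key.ExportType.VERTEX)) {
            // 1. discover vertex labels and their element counts
            Map<String, Long> labels = graph.getLabels();
            System.out.println("labels: " + labels);

            // 2. per label: page a window of ids, then fetch those elements with a property projection
            for (String label : labels.keySet()) {
                List<String> ids = graph.fetchIds(label, "", 100);
                if (ids.isEmpty()) {
                    continue;
                }
                ResultSet rs = graph.fetchElementsAsync(label, ids.get(0), ids.get(ids.size() - 1),
                        Arrays.asList("name"));
                List<GdbElement> elements = graph.getElement(rs);
                elements.forEach(e -> System.out.println(e.getId() + " -> " + e.getProperties()));
            }
        }
    }
}
```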
+ */ +package com.alibaba.datax.plugin.reader.gdbreader.model; + +import com.alibaba.datax.common.util.Configuration; +import com.alibaba.datax.plugin.reader.gdbreader.Key.ExportType; +import org.apache.tinkerpop.gremlin.driver.Result; +import org.apache.tinkerpop.gremlin.driver.ResultSet; +import org.apache.tinkerpop.gremlin.structure.util.reference.ReferenceEdge; +import org.apache.tinkerpop.gremlin.structure.util.reference.ReferenceVertex; +import org.slf4j.Logger; +import org.slf4j.LoggerFactory; + +import java.util.ArrayList; +import java.util.HashMap; +import java.util.LinkedList; +import java.util.List; +import java.util.Map; +import java.util.concurrent.TimeUnit; + +/** + * @author : Liu Jianping + * @date : 2019/9/6 + */ + +public class ScriptGdbGraph extends AbstractGdbGraph { + private static final Logger log = LoggerFactory.getLogger(ScriptGdbGraph.class); + + private final static String LABEL = "GDB___LABEL"; + private final static String START_ID = "GDB___ID"; + private final static String END_ID = "GDB___ID_END"; + private final static String LIMIT = "GDB___LIMIT"; + + private final static String FETCH_VERTEX_IDS_DSL = "g.V().hasLabel(" + LABEL + ").has(id, gt(" + START_ID + ")).limit(" + LIMIT + ").id()"; + private final static String FETCH_EDGE_IDS_DSL = "g.E().hasLabel(" + LABEL + ").has(id, gt(" + START_ID + ")).limit(" + LIMIT + ").id()"; + + private final static String FETCH_VERTEX_LABELS_DSL = "g.V().groupCount().by(label)"; + private final static String FETCH_EDGE_LABELS_DSL = "g.E().groupCount().by(label)"; + + /** + * fetch node range [START_ID, END_ID] + */ + private final static String FETCH_RANGE_VERTEX_DSL = "g.V().hasLabel(" + LABEL + ").has(id, gte(" + START_ID + ")).has(id, lte(" + END_ID + "))"; + private final static String FETCH_RANGE_EDGE_DSL = "g.E().hasLabel(" + LABEL + ").has(id, gte(" + START_ID + ")).has(id, lte(" + END_ID + "))"; + private final static String PART_WITH_PROP_DSL = ".as('a').project('node', 'props').by(select('a')).by(select('a').propertyMap("; + + private final ExportType exportType; + + public ScriptGdbGraph(ExportType exportType) { + super(); + this.exportType = exportType; + } + + public ScriptGdbGraph(Configuration config, ExportType exportType) { + super(config); + this.exportType = exportType; + } + + @Override + public List fetchIds(final String label, final String start, long limit) { + Map params = new HashMap(3) {{ + put(LABEL, label); + put(START_ID, start); + put(LIMIT, limit); + }}; + String fetchDsl = exportType == ExportType.VERTEX ? FETCH_VERTEX_IDS_DSL : FETCH_EDGE_IDS_DSL; + + List ids = new ArrayList<>(); + try { + List results = runInternal(fetchDsl, params); + + // transfer result to id string + results.forEach(id -> ids.add(id.getString())); + } catch (Exception e) { + log.error("fetch range node failed, label {}, start {}", label, start); + throw new RuntimeException(e); + } + return ids; + } + + @Override + public ResultSet fetchElementsAsync(final String label, final String start, final String end, final List propNames) { + Map params = new HashMap<>(3); + params.put(LABEL, label); + params.put(START_ID, start); + params.put(END_ID, end); + + String prefixDsl = exportType == ExportType.VERTEX ? 
FETCH_RANGE_VERTEX_DSL : FETCH_RANGE_EDGE_DSL; + StringBuilder fetchDsl = new StringBuilder(prefixDsl); + if (propNames != null) { + fetchDsl.append(PART_WITH_PROP_DSL); + for (int i = 0; i < propNames.size(); i++) { + String propName = "GDB___PK" + String.valueOf(i); + params.put(propName, propNames.get(i)); + + fetchDsl.append(propName); + if (i != propNames.size() - 1) { + fetchDsl.append(", "); + } + } + fetchDsl.append("))"); + } + + try { + return runInternalAsync(fetchDsl.toString(), params); + } catch (Exception e) { + log.error("Failed to fetch range node startId {}, end {} , e {}", start, end, e); + throw new RuntimeException(e); + } + } + + @Override + @SuppressWarnings("unchecked") + public List getElement(ResultSet results) { + List elements = new LinkedList<>(); + try { + List resultList = results.all().get(DEFAULT_TIMEOUT + 1000, TimeUnit.MILLISECONDS); + + resultList.forEach(n -> { + Object o = n.getObject(); + GdbElement element = new GdbElement(); + if (o instanceof Map) { + // project response + Object node = ((Map) o).get("node"); + Object props = ((Map) o).get("props"); + + mapNodeToElement(node, element); + mapPropToElement((Map) props, element); + } else { + // range node response + mapNodeToElement(n.getObject(), element); + } + if (element.getId() != null) { + elements.add(element); + } + }); + } catch (Exception e) { + log.error("Failed to get node: {}", e); + throw new RuntimeException(e); + } + return elements; + } + + private void mapNodeToElement(Object node, GdbElement element) { + if (node instanceof ReferenceVertex) { + ReferenceVertex v = (ReferenceVertex) node; + + element.setId((String) v.id()); + element.setLabel(v.label()); + } else if (node instanceof ReferenceEdge) { + ReferenceEdge e = (ReferenceEdge) node; + + element.setId((String) e.id()); + element.setLabel(e.label()); + element.setTo((String) e.inVertex().id()); + element.setToLabel(e.inVertex().label()); + element.setFrom((String) e.outVertex().id()); + element.setFromLabel(e.outVertex().label()); + } + } + + private void mapPropToElement(Map props, GdbElement element) { + element.setProperties(props); + } + + @Override + public Map getLabels() { + String dsl = exportType == ExportType.VERTEX ? FETCH_VERTEX_LABELS_DSL : FETCH_EDGE_LABELS_DSL; + + try { + List results = runInternal(dsl, null); + Map labelMap = new HashMap<>(2); + + Map labels = results.get(0).get(Map.class); + labels.forEach((k, v) -> { + String label = (String) k; + Long count = (Long) v; + labelMap.put(label, count); + }); + + return labelMap; + } catch (Exception e) { + log.error("Failed to fetch label list, please give special labels and run again, e {}", e); + throw new RuntimeException(e); + } + } +} diff --git a/gdbreader/src/main/java/com/alibaba/datax/plugin/reader/gdbreader/util/ConfigHelper.java b/gdbreader/src/main/java/com/alibaba/datax/plugin/reader/gdbreader/util/ConfigHelper.java new file mode 100644 index 00000000..2ec9d153 --- /dev/null +++ b/gdbreader/src/main/java/com/alibaba/datax/plugin/reader/gdbreader/util/ConfigHelper.java @@ -0,0 +1,77 @@ +/* + * (C) 2019-present Alibaba Group Holding Limited. + * + * This program is free software; you can redistribute it and/or modify + * it under the terms of the GNU General Public License version 2 as + * published by the Free Software Foundation. 
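Aside (not part of the patch): ConfigHelper above validates the reader's connection settings and fans the job configuration out into one task configuration per label. A small sketch of that flow follows; the JSON fragment is made up, and the key names ("host", "labels", ...) follow the plugin_job_template.json included later in this diff — the authoritative constants live in the reader's Key class.

```java
import com.alibaba.datax.common.util.Configuration;
import com.alibaba.datax.plugin.reader.gdbreader.Key;
import com.alibaba.datax.plugin.reader.gdbreader.util.ConfigHelper;

import java.util.List;

public class ConfigSplitSketch {
    public static void main(String[] args) {
        Configuration job = Configuration.from(
                "{\"host\":\"localhost\",\"port\":8182,\"username\":\"u\",\"password\":\"p\","
                        + "\"labels\":[\"person\",\"software\"]}");

        ConfigHelper.assertGdbClient(job);   // throws DataXException if host/port/credentials are missing
        List<String> labels = ConfigHelper.assertLabels(job);
        List<Configuration> tasks = ConfigHelper.splitConfig(job, labels);

        // one cloned configuration per label, each with its own Key.LABEL value
        tasks.forEach(t -> System.out.println(t.getString(Key.LABEL)));
    }
}
```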
+ */ +package com.alibaba.datax.plugin.reader.gdbreader.util; + +import com.alibaba.datax.common.exception.DataXException; +import com.alibaba.datax.common.util.Configuration; +import com.alibaba.datax.plugin.reader.gdbreader.GdbReaderErrorCode; +import com.alibaba.datax.plugin.reader.gdbreader.Key; +import org.apache.commons.lang3.StringUtils; + +import java.io.IOException; +import java.io.InputStream; +import java.util.ArrayList; +import java.util.List; +import java.util.function.Supplier; + +/** + * @author : Liu Jianping + * @date : 2019/9/6 + */ + +public interface ConfigHelper { + static void assertConfig(String key, Supplier f) { + if (!f.get()) { + throw DataXException.asDataXException(GdbReaderErrorCode.BAD_CONFIG_VALUE, key); + } + } + + static void assertHasContent(Configuration config, String key) { + assertConfig(key, () -> StringUtils.isNotBlank(config.getString(key))); + } + + static void assertGdbClient(Configuration config) { + assertHasContent(config, Key.HOST); + assertConfig(Key.PORT, () -> config.getInt(Key.PORT) > 0); + + assertHasContent(config, Key.USERNAME); + assertHasContent(config, Key.PASSWORD); + } + + static List assertLabels(Configuration config) { + Object labels = config.get(Key.LABEL); + if (!(labels instanceof List)) { + throw DataXException.asDataXException(GdbReaderErrorCode.BAD_CONFIG_VALUE, "labels should be List"); + } + + List list = (List) labels; + List configLabels = new ArrayList<>(0); + list.forEach(n -> configLabels.add(String.valueOf(n))); + + return configLabels; + } + + static List splitConfig(Configuration config, List labels) { + List configs = new ArrayList<>(); + for (String label : labels) { + Configuration conf = config.clone(); + conf.set(Key.LABEL, label); + + configs.add(conf); + } + return configs; + } + + static Configuration fromClasspath(String name) { + try (InputStream is = Thread.currentThread().getContextClassLoader().getResourceAsStream(name)) { + return Configuration.from(is); + } catch (IOException e) { + throw new IllegalArgumentException("File not found: " + name); + } + } +} diff --git a/gdbreader/src/main/resources/plugin.json b/gdbreader/src/main/resources/plugin.json new file mode 100644 index 00000000..4fedfa24 --- /dev/null +++ b/gdbreader/src/main/resources/plugin.json @@ -0,0 +1,6 @@ +{ + "name": "gdbreader", + "class": "com.alibaba.datax.plugin.reader.gdbreader.GdbReader", + "description": "useScene: prod. 
mechanism: connect GDB with gremlin-client, execute 'g.V().propertyMap() or g.E().propertyMap()' to get record", + "developer": "alibaba" +} \ No newline at end of file diff --git a/gdbreader/src/main/resources/plugin_job_template.json b/gdbreader/src/main/resources/plugin_job_template.json new file mode 100644 index 00000000..bfca7780 --- /dev/null +++ b/gdbreader/src/main/resources/plugin_job_template.json @@ -0,0 +1,77 @@ +{ + "job": { + "setting": { + "speed": { + "channel": 1 + }, + "errorLimit": { + "record": 1 + } + }, + + "content": [ + { + "reader": { + "name": "gdbreader", + "parameter": { + "host": "10.218.145.24", + "port": 8182, + "username": "***", + "password": "***", + "labelType": "EDGE", + "labels": ["label1", "label2"], + "column": [ + { + "name": "id", + "type": "string", + "columnType": "primaryKey" + }, + { + "name": "label", + "type": "string", + "columnType": "primaryLabel" + }, + { + "name": "srcId", + "type": "string", + "columnType": "srcPrimaryKey" + }, + { + "name": "srcLabel", + "type": "string", + "columnType": "srcPrimaryLabel" + }, + { + "name": "dstId", + "type": "string", + "columnType": "srcPrimaryKey" + }, + { + "name": "dstLabel", + "type": "string", + "columnType": "srcPrimaryLabel" + }, + { + "name": "name", + "type": "string", + "columnType": "edgeProperty" + }, + { + "name": "weight", + "type": "double", + "columnType": "edgeProperty" + } + ] + } + }, + + "writer": { + "name": "streamwriter", + "parameter": { + "print": true + } + } + } + ] + } +} \ No newline at end of file diff --git a/gdbwriter/doc/gdbwriter.md b/gdbwriter/doc/gdbwriter.md index 82cdd899..8c1e11e6 100644 --- a/gdbwriter/doc/gdbwriter.md +++ b/gdbwriter/doc/gdbwriter.md @@ -41,6 +41,14 @@ GDBWriter通过DataX框架获取Reader生成的协议数据,使用`g.addV/E(GD { "random": "60,64", "type": "string" + }, + { + "random": "100,1000", + "type": "long" + }, + { + "random": "32,48", + "type": "string" } ], "sliceRecordCount": 1000 @@ -55,20 +63,32 @@ GDBWriter通过DataX框架获取Reader生成的协议数据,使用`g.addV/E(GD "password": "***", "writeMode": "INSERT", "labelType": "VERTEX", - "label": "${1}", + "label": "#{1}", "idTransRule": "none", "session": true, "maxRecordsInBatch": 64, "column": [ { "name": "id", - "value": "${0}", + "value": "#{0}", "type": "string", "columnType": "primaryKey" }, { "name": "vertex_propKey", - "value": "${2}", + "value": "#{2}", + "type": "string", + "columnType": "vertexSetProperty" + }, + { + "name": "vertex_propKey", + "value": "#{3}", + "type": "long", + "columnType": "vertexSetProperty" + }, + { + "name": "vertex_propKey2", + "value": "#{4}", "type": "string", "columnType": "vertexProperty" } @@ -134,7 +154,7 @@ GDBWriter通过DataX框架获取Reader生成的协议数据,使用`g.addV/E(GD "password": "***", "writeMode": "INSERT", "labelType": "EDGE", - "label": "${3}", + "label": "#{3}", "idTransRule": "none", "srcIdTransRule": "labelPrefix", "dstIdTransRule": "labelPrefix", @@ -144,25 +164,25 @@ GDBWriter通过DataX框架获取Reader生成的协议数据,使用`g.addV/E(GD "column": [ { "name": "id", - "value": "${0}", + "value": "#{0}", "type": "string", "columnType": "primaryKey" }, { "name": "id", - "value": "${1}", + "value": "#{1}", "type": "string", "columnType": "srcPrimaryKey" }, { "name": "id", - "value": "${2}", + "value": "#{2}", "type": "string", "columnType": "dstPrimaryKey" }, { "name": "edge_propKey", - "value": "${4}", + "value": "#{4}", "type": "string", "columnType": "edgeProperty" } @@ -199,7 +219,7 @@ GDBWriter通过DataX框架获取Reader生成的协议数据,使用`g.addV/E(GD * 默认值:无 * **label** - * 描述:类型名,即点/边名称; 
label支持从源列中读取,如${0},表示取第一列字段作为label名。源列索引从0开始; + * 描述:类型名,即点/边名称; label支持从源列中读取,如#{0},表示取第一列字段作为label名。源列索引从0开始; * 必选:是 * 默认值:无 @@ -211,12 +231,12 @@ GDBWriter通过DataX框架获取Reader生成的协议数据,使用`g.addV/E(GD * 默认值:无 * **srcLabel** - * 描述:当label为边时,表示起点的点名称;srcLabel支持从源列中读取,如${0},表示取第一列字段作为label名。源列索引从0开始; + * 描述:当label为边时,表示起点的点名称;srcLabel支持从源列中读取,如#{0},表示取第一列字段作为label名。源列索引从0开始; * 必选:labelType为边,srcIdTransRule为none时可不填写,否则必填; * 默认值:无 * **dstLabel** - * 描述:当label为边时,表示终点的点名称;dstLabel支持从源列中读取,如${0},表示取第一列字段作为label名。源列索引从0开始; + * 描述:当label为边时,表示终点的点名称;dstLabel支持从源列中读取,如#{0},表示取第一列字段作为label名。源列索引从0开始; * 必选:labelType为边,dstIdTransRule为none时可不填写,否则必填; * 默认值:无 @@ -271,9 +291,9 @@ GDBWriter通过DataX框架获取Reader生成的协议数据,使用`g.addV/E(GD * **column -> value** * 描述:点/边映射关系的字段值; - * ${N}表示直接映射源端值,N为源端column索引,从0开始;${0}表示映射源端column第1个字段; - * test-${0} 表示源端值做拼接转换,${0}值前/后可添加固定字符串; - * ${0}-${1}表示做多字段拼接,也可在任意位置添加固定字符串,如test-${0}-test1-${1}-test2 + * #{N}表示直接映射源端值,N为源端column索引,从0开始;#{0}表示映射源端column第1个字段; + * test-#{0} 表示源端值做拼接转换,#{0}值前/后可添加固定字符串; + * #{0}-#{1}表示做多字段拼接,也可在任意位置添加固定字符串,如test-#{0}-test1-#{1}-test2 * 必选:是 * 默认值:无 @@ -290,6 +310,7 @@ GDBWriter通过DataX框架获取Reader生成的协议数据,使用`g.addV/E(GD * primaryKey:表示该字段是主键id * 点枚举值: * vertexProperty:labelType为点时,表示该字段是点的普通属性 + * vertexSetProperty:labelType为点时,表示该字段是点的SET属性,value是SET属性中的一个属性值 * vertexJsonProperty:labelType为点时,表示是点json属性,value结构请见备注**json properties示例**,点配置最多只允许出现一个json属性; * 边枚举值: * srcPrimaryKey:labelType为边时,表示该字段是起点主键id @@ -305,6 +326,14 @@ GDBWriter通过DataX框架获取Reader生成的协议数据,使用`g.addV/E(GD > {"k":"age","t":"int","v":"20"}, > {"k":"sex","t":"string","v":"male"} > ]} + > + > # json格式同样支持给点添加SET属性,格式如下 + > {"properties":[ + > {"k":"name","t":"string","v":"tom","c":"set"}, + > {"k":"name","t":"string","v":"jack","c":"set"}, + > {"k":"age","t":"int","v":"20"}, + > {"k":"sex","t":"string","v":"male"} + > ]} > ``` ## 4 性能报告 @@ -367,4 +396,5 @@ DataX压测机器 - GDBWriter插件与用户查询DSL使用相同的GDB实例端口,导入时可能会影响查询性能 ## FAQ -无 +1. 使用SET属性需要升级GDB实例到`1.0.20`版本及以上。 +2. 
边只支持普通单值属性,不能给边写SET属性数据。 diff --git a/gdbwriter/src/main/java/com/alibaba/datax/plugin/writer/gdbwriter/GdbWriter.java b/gdbwriter/src/main/java/com/alibaba/datax/plugin/writer/gdbwriter/GdbWriter.java index 753d89fc..6470e9e6 100644 --- a/gdbwriter/src/main/java/com/alibaba/datax/plugin/writer/gdbwriter/GdbWriter.java +++ b/gdbwriter/src/main/java/com/alibaba/datax/plugin/writer/gdbwriter/GdbWriter.java @@ -1,10 +1,5 @@ package com.alibaba.datax.plugin.writer.gdbwriter; -import java.util.ArrayList; -import java.util.List; -import java.util.concurrent.*; -import java.util.function.Function; - import com.alibaba.datax.common.element.Record; import com.alibaba.datax.common.exception.DataXException; import com.alibaba.datax.common.plugin.RecordReceiver; @@ -18,24 +13,33 @@ import com.alibaba.datax.plugin.writer.gdbwriter.mapping.MappingRule; import com.alibaba.datax.plugin.writer.gdbwriter.mapping.MappingRuleFactory; import com.alibaba.datax.plugin.writer.gdbwriter.model.GdbElement; import com.alibaba.datax.plugin.writer.gdbwriter.model.GdbGraph; - import groovy.lang.Tuple2; import io.netty.util.concurrent.DefaultThreadFactory; import lombok.extern.slf4j.Slf4j; import org.slf4j.Logger; import org.slf4j.LoggerFactory; -public class GdbWriter extends Writer { - private static final Logger log = LoggerFactory.getLogger(GdbWriter.class); +import java.util.ArrayList; +import java.util.List; +import java.util.concurrent.ExecutorService; +import java.util.concurrent.Future; +import java.util.concurrent.LinkedBlockingDeque; +import java.util.concurrent.ThreadPoolExecutor; +import java.util.concurrent.TimeUnit; +import java.util.function.Function; - private static Function mapper = null; - private static GdbGraph globalGraph = null; - private static boolean session = false; +public class GdbWriter extends Writer { + private static final Logger log = LoggerFactory.getLogger(GdbWriter.class); + + private static Function mapper = null; + private static GdbGraph globalGraph = null; + private static boolean session = false; /** * Job 中的方法仅执行一次,Task 中方法会由框架启动多个 Task 线程并行执行。 *
* 整个 Writer 执行流程是: + * *
      * Job类init-->prepare-->split
      *
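The Javadoc above describes the DataX writer lifecycle: the Job's init/prepare/split run once, after which the framework drives each Task's init/prepare/startWrite/post/destroy on parallel task threads. The toy sketch below only illustrates that call order; the interfaces are stand-ins, not the DataX framework API.

```java
import java.util.List;

public class WriterLifecycleSketch {
    interface SimpleJob {
        void init(); void prepare(); void split(int taskCount); void post(); void destroy();
    }

    interface SimpleTask {
        void init(); void prepare(); void startWrite(); void post(); void destroy();
    }

    static void run(SimpleJob job, List<SimpleTask> tasks) {
        job.init();
        job.prepare();
        job.split(tasks.size());              // Job methods execute exactly once
        tasks.parallelStream().forEach(t -> { // Task methods run on parallel task threads
            t.init();
            t.prepare();
            t.startWrite();
            t.post();
            t.destroy();
        });
        job.post();
        job.destroy();
    }
}
```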
@@ -46,17 +50,16 @@ public class GdbWriter extends Writer {
      *
*/ public static class Job extends Writer.Job { - private static final Logger LOG = LoggerFactory - .getLogger(Job.class); + private static final Logger LOG = LoggerFactory.getLogger(Job.class); private Configuration jobConfig = null; - + @Override public void init() { - LOG.info("GDB datax plugin writer job init begin ..."); - this.jobConfig = getPluginJobConf(); - GdbWriterConfig.of(this.jobConfig); - LOG.info("GDB datax plugin writer job init end."); + LOG.info("GDB datax plugin writer job init begin ..."); + this.jobConfig = getPluginJobConf(); + GdbWriterConfig.of(this.jobConfig); + LOG.info("GDB datax plugin writer job init end."); /** * 注意:此方法仅执行一次。 @@ -71,37 +74,37 @@ public class GdbWriter extends Writer { * 注意:此方法仅执行一次。 * 最佳实践:如果 Job 中有需要进行数据同步之前的处理,可以在此处完成,如果没有必要则可以直接去掉。 */ - super.prepare(); + super.prepare(); - MappingRule rule = MappingRuleFactory.getInstance().createV2(jobConfig); + final MappingRule rule = MappingRuleFactory.getInstance().createV2(this.jobConfig); - mapper = new DefaultGdbMapper().getMapper(rule); - session = jobConfig.getBool(Key.SESSION_STATE, false); + mapper = new DefaultGdbMapper(this.jobConfig).getMapper(rule); + session = this.jobConfig.getBool(Key.SESSION_STATE, false); /** * client connect check before task */ try { - globalGraph = GdbGraphManager.instance().getGraph(jobConfig, false); - } catch (RuntimeException e) { + globalGraph = GdbGraphManager.instance().getGraph(this.jobConfig, false); + } catch (final RuntimeException e) { throw DataXException.asDataXException(GdbWriterErrorCode.FAIL_CLIENT_CONNECT, e.getMessage()); } } @Override - public List split(int mandatoryNumber) { + public List split(final int mandatoryNumber) { /** * 注意:此方法仅执行一次。 * 最佳实践:通常采用工具静态类完成把 Job 配置切分成多个 Task 配置的工作。 * 这里的 mandatoryNumber 是强制必须切分的份数。 */ - LOG.info("split begin..."); - List configurationList = new ArrayList(); - for (int i = 0; i < mandatoryNumber; i++) { - configurationList.add(this.jobConfig.clone()); - } - LOG.info("split end..."); - return configurationList; + LOG.info("split begin..."); + final List configurationList = new ArrayList(); + for (int i = 0; i < mandatoryNumber; i++) { + configurationList.add(this.jobConfig.clone()); + } + LOG.info("split end..."); + return configurationList; } @Override @@ -127,7 +130,7 @@ public class GdbWriter extends Writer { public static class Task extends Writer.Task { private Configuration taskConfig; - + private int failed = 0; private int batchRecords; private ExecutorService submitService = null; @@ -139,24 +142,24 @@ public class GdbWriter extends Writer { * 注意:此方法每个 Task 都会执行一次。 * 最佳实践:此处通过对 taskConfig 配置的读取,进而初始化一些资源为 startWrite()做准备。 */ - this.taskConfig = super.getPluginJobConf(); - batchRecords = taskConfig.getInt(Key.MAX_RECORDS_IN_BATCH, GdbWriterConfig.DEFAULT_RECORD_NUM_IN_BATCH); - submitService = new ThreadPoolExecutor(1, 1, 0L, - TimeUnit.MILLISECONDS, new LinkedBlockingDeque<>(), new DefaultThreadFactory("submit-dsl")); + this.taskConfig = super.getPluginJobConf(); + this.batchRecords = this.taskConfig.getInt(Key.MAX_RECORDS_IN_BATCH, GdbWriterConfig.DEFAULT_RECORD_NUM_IN_BATCH); + this.submitService = new ThreadPoolExecutor(1, 1, 0L, TimeUnit.MILLISECONDS, new LinkedBlockingDeque<>(), + new DefaultThreadFactory("submit-dsl")); - if (!session) { - graph = globalGraph; - } else { - /** - * 分批创建session client,由于服务端groovy编译性能的限制 - */ - try { - Thread.sleep((getTaskId()/10)*10000); - } catch (Exception e) { - // ... 
- } - graph = GdbGraphManager.instance().getGraph(taskConfig, session); - } + if (!session) { + this.graph = globalGraph; + } else { + /** + * 分批创建session client,由于服务端groovy编译性能的限制 + */ + try { + Thread.sleep((getTaskId() / 10) * 10000); + } catch (final Exception e) { + // ... + } + this.graph = GdbGraphManager.instance().getGraph(this.taskConfig, session); + } } @Override @@ -165,64 +168,69 @@ public class GdbWriter extends Writer { * 注意:此方法每个 Task 都会执行一次。 * 最佳实践:如果 Task 中有需要进行数据同步之前的处理,可以在此处完成,如果没有必要则可以直接去掉。 */ - super.prepare(); + super.prepare(); } @Override - public void startWrite(RecordReceiver recordReceiver) { + public void startWrite(final RecordReceiver recordReceiver) { /** * 注意:此方法每个 Task 都会执行一次。 * 最佳实践:此处适当封装确保简洁清晰完成数据写入工作。 */ - Record r; - Future future = null; - List> records = new ArrayList<>(batchRecords); + Record r; + Future future = null; + List> records = new ArrayList<>(this.batchRecords); - while ((r = recordReceiver.getFromReader()) != null) { - records.add(new Tuple2<>(r, mapper.apply(r))); + while ((r = recordReceiver.getFromReader()) != null) { + try { + records.add(new Tuple2<>(r, mapper.apply(r))); + } catch (final Exception ex) { + getTaskPluginCollector().collectDirtyRecord(r, ex); + continue; + } - if (records.size() >= batchRecords) { - wait4Submit(future); + if (records.size() >= this.batchRecords) { + wait4Submit(future); - final List> batch = records; - future = submitService.submit(() -> batchCommitRecords(batch)); - records = new ArrayList<>(batchRecords); - } - } + final List> batch = records; + future = this.submitService.submit(() -> batchCommitRecords(batch)); + records = new ArrayList<>(this.batchRecords); + } + } - wait4Submit(future); - if (!records.isEmpty()) { - final List> batch = records; - future = submitService.submit(() -> batchCommitRecords(batch)); - wait4Submit(future); - } + wait4Submit(future); + if (!records.isEmpty()) { + final List> batch = records; + future = this.submitService.submit(() -> batchCommitRecords(batch)); + wait4Submit(future); + } } - private void wait4Submit(Future future) { - if (future == null) { - return; - } + private void wait4Submit(final Future future) { + if (future == null) { + return; + } - try { - future.get(); - } catch (Exception e) { - e.printStackTrace(); - } + try { + future.get(); + } catch (final Exception e) { + e.printStackTrace(); + } } private boolean batchCommitRecords(final List> records) { - TaskPluginCollector collector = getTaskPluginCollector(); - try { - List> errors = graph.add(records); - errors.forEach(t -> collector.collectDirtyRecord(t.getFirst(), t.getSecond())); - failed += errors.size(); - } catch (Exception e) { - records.forEach(t -> collector.collectDirtyRecord(t.getFirst(), e)); - failed += records.size(); - } + final TaskPluginCollector collector = getTaskPluginCollector(); + try { + final List> errors = this.graph.add(records); + errors.forEach(t -> collector.collectDirtyRecord(t.getFirst(), t.getSecond())); + this.failed += errors.size(); + } catch (final Exception e) { + records.forEach(t -> collector.collectDirtyRecord(t.getFirst(), e)); + this.failed += records.size(); + } - records.clear(); - return true; + records.clear(); + return true; } @Override @@ -231,7 +239,7 @@ public class GdbWriter extends Writer { * 注意:此方法每个 Task 都会执行一次。 * 最佳实践:如果 Task 中有需要进行数据同步之后的后续处理,可以在此处完成。 */ - log.info("Task done, dirty record count - {}", failed); + log.info("Task done, dirty record count - {}", this.failed); } @Override @@ -241,9 +249,9 @@ public class GdbWriter extends Writer { 
* 最佳实践:通常配合Task 中的 post() 方法一起完成 Task 的资源释放。 */ if (session) { - graph.close(); + this.graph.close(); } - submitService.shutdown(); + this.submitService.shutdown(); } } diff --git a/gdbwriter/src/main/java/com/alibaba/datax/plugin/writer/gdbwriter/GdbWriterErrorCode.java b/gdbwriter/src/main/java/com/alibaba/datax/plugin/writer/gdbwriter/GdbWriterErrorCode.java index a6f506ef..e1c9080b 100644 --- a/gdbwriter/src/main/java/com/alibaba/datax/plugin/writer/gdbwriter/GdbWriterErrorCode.java +++ b/gdbwriter/src/main/java/com/alibaba/datax/plugin/writer/gdbwriter/GdbWriterErrorCode.java @@ -27,7 +27,6 @@ public enum GdbWriterErrorCode implements ErrorCode { @Override public String toString() { - return String.format("Code:[%s], Description:[%s]. ", this.code, - this.description); + return String.format("Code:[%s], Description:[%s]. ", this.code, this.description); } } \ No newline at end of file diff --git a/gdbwriter/src/main/java/com/alibaba/datax/plugin/writer/gdbwriter/Key.java b/gdbwriter/src/main/java/com/alibaba/datax/plugin/writer/gdbwriter/Key.java index f2e37005..afa58239 100644 --- a/gdbwriter/src/main/java/com/alibaba/datax/plugin/writer/gdbwriter/Key.java +++ b/gdbwriter/src/main/java/com/alibaba/datax/plugin/writer/gdbwriter/Key.java @@ -6,136 +6,164 @@ public final class Key { * 此处声明插件用到的需要插件使用者提供的配置项 */ - public final static String HOST = "host"; - public final static String PORT = "port"; + public final static String HOST = "host"; + public final static String PORT = "port"; public final static String USERNAME = "username"; - public static final String PASSWORD = "password"; + public static final String PASSWORD = "password"; - /** - * import type and mode - */ - public static final String IMPORT_TYPE = "labelType"; - public static final String UPDATE_MODE = "writeMode"; + /** + * import type and mode + */ + public static final String IMPORT_TYPE = "labelType"; + public static final String UPDATE_MODE = "writeMode"; - /** - * label prefix issue - */ - public static final String ID_TRANS_RULE = "idTransRule"; - public static final String SRC_ID_TRANS_RULE = "srcIdTransRule"; - public static final String DST_ID_TRANS_RULE = "dstIdTransRule"; + /** + * label prefix issue + */ + public static final String ID_TRANS_RULE = "idTransRule"; + public static final String SRC_ID_TRANS_RULE = "srcIdTransRule"; + public static final String DST_ID_TRANS_RULE = "dstIdTransRule"; - public static final String LABEL = "label"; - public static final String SRC_LABEL = "srcLabel"; - public static final String DST_LABEL = "dstLabel"; + public static final String LABEL = "label"; + public static final String SRC_LABEL = "srcLabel"; + public static final String DST_LABEL = "dstLabel"; - public static final String MAPPING = "mapping"; + public static final String MAPPING = "mapping"; - /** - * column define in Gdb - */ - public static final String COLUMN = "column"; - public static final String COLUMN_NAME = "name"; - public static final String COLUMN_VALUE = "value"; - public static final String COLUMN_TYPE = "type"; - public static final String COLUMN_NODE_TYPE = "columnType"; + /** + * column define in Gdb + */ + public static final String COLUMN = "column"; + public static final String COLUMN_NAME = "name"; + public static final String COLUMN_VALUE = "value"; + public static final String COLUMN_TYPE = "type"; + public static final String COLUMN_NODE_TYPE = "columnType"; - /** - * Gdb Vertex/Edge elements - */ - public static final String ID = "id"; - public static final String FROM = "from"; - 
public static final String TO = "to"; - public static final String PROPERTIES = "properties"; - public static final String PROP_KEY = "name"; - public static final String PROP_VALUE = "value"; - public static final String PROP_TYPE = "type"; + /** + * Gdb Vertex/Edge elements + */ + public static final String ID = "id"; + public static final String FROM = "from"; + public static final String TO = "to"; + public static final String PROPERTIES = "properties"; + public static final String PROP_KEY = "name"; + public static final String PROP_VALUE = "value"; + public static final String PROP_TYPE = "type"; - public static final String PROPERTIES_JSON_STR = "propertiesJsonStr"; - public static final String MAX_PROPERTIES_BATCH_NUM = "maxPropertiesBatchNumber"; + public static final String PROPERTIES_JSON_STR = "propertiesJsonStr"; + public static final String MAX_PROPERTIES_BATCH_NUM = "maxPropertiesBatchNumber"; - /** - * session less client configure for connect pool - */ - public static final String MAX_IN_PROCESS_PER_CONNECTION = "maxInProcessPerConnection"; - public static final String MAX_CONNECTION_POOL_SIZE = "maxConnectionPoolSize"; - public static final String MAX_SIMULTANEOUS_USAGE_PER_CONNECTION = "maxSimultaneousUsagePerConnection"; + /** + * session less client configure for connect pool + */ + public static final String MAX_IN_PROCESS_PER_CONNECTION = "maxInProcessPerConnection"; + public static final String MAX_CONNECTION_POOL_SIZE = "maxConnectionPoolSize"; + public static final String MAX_SIMULTANEOUS_USAGE_PER_CONNECTION = "maxSimultaneousUsagePerConnection"; - public static final String MAX_RECORDS_IN_BATCH = "maxRecordsInBatch"; - public static final String SESSION_STATE = "session"; + public static final String MAX_RECORDS_IN_BATCH = "maxRecordsInBatch"; + public static final String SESSION_STATE = "session"; - public static enum ImportType { - /** - * Import vertices - */ - VERTEX, - /** - * Import edges - */ - EDGE; - } - - public static enum UpdateMode { - /** - * Insert new records, fail if exists - */ - INSERT, - /** - * Skip this record if exists - */ - SKIP, - /** - * Update property of this record if exists - */ - MERGE; - } + /** + * request length limit, include gdb element string length GDB字段长度限制配置,可分别配置各字段的限制,超过限制的记录会当脏数据处理 + */ + public static final String MAX_GDB_STRING_LENGTH = "maxStringLengthLimit"; + public static final String MAX_GDB_ID_LENGTH = "maxIdStringLengthLimit"; + public static final String MAX_GDB_LABEL_LENGTH = "maxLabelStringLengthLimit"; + public static final String MAX_GDB_PROP_KEY_LENGTH = "maxPropKeyStringLengthLimit"; + public static final String MAX_GDB_PROP_VALUE_LENGTH = "maxPropValueStringLengthLimit"; - public static enum ColumnType { - /** - * vertex or edge id - */ - primaryKey, + public static final String MAX_GDB_REQUEST_LENGTH = "maxRequestLengthLimit"; - /** - * vertex property - */ - vertexProperty, + public static enum ImportType { + /** + * Import vertices + */ + VERTEX, + /** + * Import edges + */ + EDGE; + } - /** - * start vertex id of edge - */ - srcPrimaryKey, + public static enum UpdateMode { + /** + * Insert new records, fail if exists + */ + INSERT, + /** + * Skip this record if exists + */ + SKIP, + /** + * Update property of this record if exists + */ + MERGE; + } - /** - * end vertex id of edge - */ - dstPrimaryKey, + public static enum ColumnType { + /** + * vertex or edge id + */ + primaryKey, - /** - * edge property - */ - edgeProperty, + /** + * vertex property + */ + vertexProperty, - /** - * vertex json 
style property - */ - vertexJsonProperty, + /** + * vertex setProperty + */ + vertexSetProperty, - /** - * edge json style property - */ - edgeJsonProperty - } + /** + * start vertex id of edge + */ + srcPrimaryKey, - public static enum IdTransRule { - /** - * vertex or edge id with 'label' prefix - */ - labelPrefix, + /** + * end vertex id of edge + */ + dstPrimaryKey, - /** - * vertex or edge id raw - */ - none - } + /** + * edge property + */ + edgeProperty, + + /** + * vertex json style property + */ + vertexJsonProperty, + + /** + * edge json style property + */ + edgeJsonProperty + } + + public static enum IdTransRule { + /** + * vertex or edge id with 'label' prefix + */ + labelPrefix, + + /** + * vertex or edge id raw + */ + none + } + + public static enum PropertyType { + /** + * single Vertex Property + */ + single, + + /** + * set Vertex Property + */ + set + } } diff --git a/gdbwriter/src/main/java/com/alibaba/datax/plugin/writer/gdbwriter/client/GdbGraphManager.java b/gdbwriter/src/main/java/com/alibaba/datax/plugin/writer/gdbwriter/client/GdbGraphManager.java index ac06013c..53668127 100644 --- a/gdbwriter/src/main/java/com/alibaba/datax/plugin/writer/gdbwriter/client/GdbGraphManager.java +++ b/gdbwriter/src/main/java/com/alibaba/datax/plugin/writer/gdbwriter/client/GdbGraphManager.java @@ -3,37 +3,37 @@ */ package com.alibaba.datax.plugin.writer.gdbwriter.client; +import java.util.ArrayList; +import java.util.List; + import com.alibaba.datax.common.util.Configuration; import com.alibaba.datax.plugin.writer.gdbwriter.model.GdbGraph; import com.alibaba.datax.plugin.writer.gdbwriter.model.ScriptGdbGraph; -import java.util.ArrayList; -import java.util.List; - /** * @author jerrywang * */ public class GdbGraphManager implements AutoCloseable { - private static final GdbGraphManager instance = new GdbGraphManager(); - - private List graphs = new ArrayList<>(); - - public static GdbGraphManager instance() { - return instance; - } + private static final GdbGraphManager INSTANCE = new GdbGraphManager(); - public GdbGraph getGraph(Configuration config, boolean session) { - GdbGraph graph = new ScriptGdbGraph(config, session); - graphs.add(graph); - return graph; - } + private List graphs = new ArrayList<>(); - @Override - public void close() { - for(GdbGraph graph : graphs) { - graph.close(); - } - graphs.clear(); - } + public static GdbGraphManager instance() { + return INSTANCE; + } + + public GdbGraph getGraph(final Configuration config, final boolean session) { + final GdbGraph graph = new ScriptGdbGraph(config, session); + this.graphs.add(graph); + return graph; + } + + @Override + public void close() { + for (final GdbGraph graph : this.graphs) { + graph.close(); + } + this.graphs.clear(); + } } diff --git a/gdbwriter/src/main/java/com/alibaba/datax/plugin/writer/gdbwriter/client/GdbWriterConfig.java b/gdbwriter/src/main/java/com/alibaba/datax/plugin/writer/gdbwriter/client/GdbWriterConfig.java index 0266a010..dbc68b90 100644 --- a/gdbwriter/src/main/java/com/alibaba/datax/plugin/writer/gdbwriter/client/GdbWriterConfig.java +++ b/gdbwriter/src/main/java/com/alibaba/datax/plugin/writer/gdbwriter/client/GdbWriterConfig.java @@ -3,39 +3,43 @@ */ package com.alibaba.datax.plugin.writer.gdbwriter.client; +import static com.alibaba.datax.plugin.writer.gdbwriter.util.ConfigHelper.assertConfig; +import static com.alibaba.datax.plugin.writer.gdbwriter.util.ConfigHelper.assertHasContent; + import com.alibaba.datax.common.util.Configuration; import 
com.alibaba.datax.plugin.writer.gdbwriter.Key; -import static com.alibaba.datax.plugin.writer.gdbwriter.util.ConfigHelper.*; - /** * @author jerrywang * */ public class GdbWriterConfig { - public static final int DEFAULT_MAX_IN_PROCESS_PER_CONNECTION = 4; - public static final int DEFAULT_MAX_CONNECTION_POOL_SIZE = 8; - public static final int DEFAULT_MAX_SIMULTANEOUS_USAGE_PER_CONNECTION = 8; - public static final int DEFAULT_BATCH_PROPERTY_NUM = 30; - public static final int DEFAULT_RECORD_NUM_IN_BATCH = 16; + public static final int DEFAULT_MAX_IN_PROCESS_PER_CONNECTION = 4; + public static final int DEFAULT_MAX_CONNECTION_POOL_SIZE = 8; + public static final int DEFAULT_MAX_SIMULTANEOUS_USAGE_PER_CONNECTION = 8; + public static final int DEFAULT_BATCH_PROPERTY_NUM = 30; + public static final int DEFAULT_RECORD_NUM_IN_BATCH = 16; - private Configuration config; + public static final int MAX_STRING_LENGTH = 10240; + public static final int MAX_REQUEST_LENGTH = 65535 - 1000; - private GdbWriterConfig(Configuration config) { - this.config = config; + private Configuration config; - validate(); - } + private GdbWriterConfig(final Configuration config) { + this.config = config; - private void validate() { - assertHasContent(config, Key.HOST); - assertConfig(Key.PORT, () -> config.getInt(Key.PORT) > 0); + validate(); + } - assertHasContent(config, Key.USERNAME); - assertHasContent(config, Key.PASSWORD); - } - - public static GdbWriterConfig of(Configuration config) { - return new GdbWriterConfig(config); - } + public static GdbWriterConfig of(final Configuration config) { + return new GdbWriterConfig(config); + } + + private void validate() { + assertHasContent(this.config, Key.HOST); + assertConfig(Key.PORT, () -> this.config.getInt(Key.PORT) > 0); + + assertHasContent(this.config, Key.USERNAME); + assertHasContent(this.config, Key.PASSWORD); + } } diff --git a/gdbwriter/src/main/java/com/alibaba/datax/plugin/writer/gdbwriter/mapping/DefaultGdbMapper.java b/gdbwriter/src/main/java/com/alibaba/datax/plugin/writer/gdbwriter/mapping/DefaultGdbMapper.java index f5957295..73a94cf5 100644 --- a/gdbwriter/src/main/java/com/alibaba/datax/plugin/writer/gdbwriter/mapping/DefaultGdbMapper.java +++ b/gdbwriter/src/main/java/com/alibaba/datax/plugin/writer/gdbwriter/mapping/DefaultGdbMapper.java @@ -3,6 +3,8 @@ */ package com.alibaba.datax.plugin.writer.gdbwriter.mapping; +import static com.alibaba.datax.plugin.writer.gdbwriter.Key.ImportType.VERTEX; + import java.util.ArrayList; import java.util.List; import java.util.UUID; @@ -12,179 +14,191 @@ import java.util.regex.Matcher; import java.util.regex.Pattern; import com.alibaba.datax.common.element.Record; -import com.alibaba.fastjson.JSONArray; -import com.alibaba.fastjson.JSONObject; +import com.alibaba.datax.common.util.Configuration; import com.alibaba.datax.plugin.writer.gdbwriter.Key; import com.alibaba.datax.plugin.writer.gdbwriter.model.GdbEdge; import com.alibaba.datax.plugin.writer.gdbwriter.model.GdbElement; import com.alibaba.datax.plugin.writer.gdbwriter.model.GdbVertex; +import com.alibaba.fastjson.JSONArray; +import com.alibaba.fastjson.JSONObject; import lombok.extern.slf4j.Slf4j; -import static com.alibaba.datax.plugin.writer.gdbwriter.Key.ImportType.VERTEX; - /** * @author jerrywang * */ @Slf4j public class DefaultGdbMapper implements GdbMapper { - private static final Pattern STR_PATTERN = Pattern.compile("\\$\\{(\\d+)}"); - private static final Pattern NORMAL_PATTERN = Pattern.compile("^\\$\\{(\\d+)}$"); + private static final Pattern 
STR_DOLLAR_PATTERN = Pattern.compile("\\$\\{(\\d+)}"); + private static final Pattern NORMAL_DOLLAR_PATTERN = Pattern.compile("^\\$\\{(\\d+)}$"); - @Override - public Function getMapper(MappingRule rule) { - return r -> { - GdbElement e = (rule.getImportType() == VERTEX) ? new GdbVertex() : new GdbEdge(); - forElement(rule).accept(r, e); - return e; + private static final Pattern STR_NUM_PATTERN = Pattern.compile("#\\{(\\d+)}"); + private static final Pattern NORMAL_NUM_PATTERN = Pattern.compile("^#\\{(\\d+)}$"); + + public DefaultGdbMapper() {} + + public DefaultGdbMapper(final Configuration config) { + MapperConfig.getInstance().updateConfig(config); + } + + private static BiConsumer forElement(final MappingRule rule) { + final boolean numPattern = rule.isNumPattern(); + final List> properties = new ArrayList<>(); + for (final MappingRule.PropertyMappingRule propRule : rule.getProperties()) { + final Function keyFunc = forStrColumn(numPattern, propRule.getKey()); + + if (propRule.getValueType() == ValueType.STRING) { + final Function valueFunc = forStrColumn(numPattern, propRule.getValue()); + properties.add((r, e) -> { + e.addProperty(keyFunc.apply(r), valueFunc.apply(r), propRule.getPType()); + }); + } else { + final Function valueFunc = + forObjColumn(numPattern, propRule.getValue(), propRule.getValueType()); + properties.add((r, e) -> { + e.addProperty(keyFunc.apply(r), valueFunc.apply(r), propRule.getPType()); + }); + } + } + + if (rule.getPropertiesJsonStr() != null) { + final Function jsonFunc = forStrColumn(numPattern, rule.getPropertiesJsonStr()); + properties.add((r, e) -> { + final String propertiesStr = jsonFunc.apply(r); + final JSONObject root = (JSONObject)JSONObject.parse(propertiesStr); + final JSONArray propertiesList = root.getJSONArray("properties"); + + for (final Object object : propertiesList) { + final JSONObject jsonObject = (JSONObject)object; + final String key = jsonObject.getString("k"); + final String name = jsonObject.getString("v"); + final String type = jsonObject.getString("t"); + final String card = jsonObject.getString("c"); + + if (key == null || name == null) { + continue; + } + addToProperties(e, key, name, type, card); + } + }); + } + + final BiConsumer ret = (r, e) -> { + final String label = forStrColumn(numPattern, rule.getLabel()).apply(r); + String id = forStrColumn(numPattern, rule.getId()).apply(r); + + if (rule.getImportType() == Key.ImportType.EDGE) { + final String to = forStrColumn(numPattern, rule.getTo()).apply(r); + final String from = forStrColumn(numPattern, rule.getFrom()).apply(r); + if (to == null || from == null) { + log.error("invalid record to: {} , from: {}", to, from); + throw new IllegalArgumentException("to or from missed in edge"); + } + ((GdbEdge)e).setTo(to); + ((GdbEdge)e).setFrom(from); + + // generate UUID for edge + if (id == null) { + id = UUID.randomUUID().toString(); + } + } + + if (id == null || label == null) { + log.error("invalid record id: {} , label: {}", id, label); + throw new IllegalArgumentException("id or label missed"); + } + + e.setId(id); + e.setLabel(label); + + properties.forEach(p -> p.accept(r, e)); }; - } + return ret; + } - private static BiConsumer forElement(MappingRule rule) { - List> properties = new ArrayList<>(); - for (MappingRule.PropertyMappingRule propRule : rule.getProperties()) { - Function keyFunc = forStrColumn(propRule.getKey()); + private static Function forObjColumn(final boolean numPattern, final String rule, final ValueType type) { + final Pattern pattern = numPattern ? 
NORMAL_NUM_PATTERN : NORMAL_DOLLAR_PATTERN; + final Matcher m = pattern.matcher(rule); + if (m.matches()) { + final int index = Integer.valueOf(m.group(1)); + return r -> type.applyColumn(r.getColumn(index)); + } else { + return r -> type.fromStrFunc(rule); + } + } - if (propRule.getValueType() == ValueType.STRING) { - final Function valueFunc = forStrColumn(propRule.getValue()); - properties.add((r, e) -> { - String k = keyFunc.apply(r); - String v = valueFunc.apply(r); - if (k != null && v != null) { - e.getProperties().put(k, v); - } - }); - } else { - final Function valueFunc = forObjColumn(propRule.getValue(), propRule.getValueType()); - properties.add((r, e) -> { - String k = keyFunc.apply(r); - Object v = valueFunc.apply(r); - if (k != null && v != null) { - e.getProperties().put(k, v); - } - }); - } - } + private static Function forStrColumn(final boolean numPattern, final String rule) { + final List> list = new ArrayList<>(); + final Pattern pattern = numPattern ? STR_NUM_PATTERN : STR_DOLLAR_PATTERN; + final Matcher m = pattern.matcher(rule); + int last = 0; + while (m.find()) { + final String index = m.group(1); + // as simple integer index. + final int i = Integer.parseInt(index); - if (rule.getPropertiesJsonStr() != null) { - Function jsonFunc = forStrColumn(rule.getPropertiesJsonStr()); - properties.add((r, e) -> { - String propertiesStr = jsonFunc.apply(r); - JSONObject root = (JSONObject)JSONObject.parse(propertiesStr); - JSONArray propertiesList = root.getJSONArray("properties"); + final int tmp = last; + final int start = m.start(); + list.add((sb, record) -> { + sb.append(rule.subSequence(tmp, start)); + if (record.getColumn(i) != null && record.getColumn(i).getByteSize() > 0) { + sb.append(record.getColumn(i).asString()); + } + }); - for (Object object : propertiesList) { - JSONObject jsonObject = (JSONObject)object; - String key = jsonObject.getString("k"); - String name = jsonObject.getString("v"); - String type = jsonObject.getString("t"); + last = m.end(); + } - if (key == null || name == null) { - continue; - } - addToProperties(e, key, name, type); - } - }); - } + final int tmp = last; + list.add((sb, record) -> { + sb.append(rule.subSequence(tmp, rule.length())); + }); - BiConsumer ret = (r, e) -> { - String label = forStrColumn(rule.getLabel()).apply(r); - String id = forStrColumn(rule.getId()).apply(r); + return r -> { + final StringBuilder sb = new StringBuilder(); + list.forEach(c -> c.accept(sb, r)); + final String res = sb.toString(); + return res.isEmpty() ? 
null : res; + }; + } - if (rule.getImportType() == Key.ImportType.EDGE) { - String to = forStrColumn(rule.getTo()).apply(r); - String from = forStrColumn(rule.getFrom()).apply(r); - if (to == null || from == null) { - log.error("invalid record to: {} , from: {}", to, from); - throw new IllegalArgumentException("to or from missed in edge"); - } - ((GdbEdge)e).setTo(to); - ((GdbEdge)e).setFrom(from); + private static boolean addToProperties(final GdbElement e, final String key, final String value, final String type, final String card) { + final Object pValue; + final ValueType valueType = ValueType.fromShortName(type); - // generate UUID for edge - if (id == null) { - id = UUID.randomUUID().toString(); - } - } + if (valueType == ValueType.STRING) { + pValue = value; + } else if (valueType == ValueType.INT || valueType == ValueType.INTEGER) { + pValue = Integer.valueOf(value); + } else if (valueType == ValueType.LONG) { + pValue = Long.valueOf(value); + } else if (valueType == ValueType.DOUBLE) { + pValue = Double.valueOf(value); + } else if (valueType == ValueType.FLOAT) { + pValue = Float.valueOf(value); + } else if (valueType == ValueType.BOOLEAN) { + pValue = Boolean.valueOf(value); + } else { + log.error("invalid property key {}, value {}, type {}", key, value, type); + return false; + } - if (id == null || label == null) { - log.error("invalid record id: {} , label: {}", id, label); - throw new IllegalArgumentException("id or label missed"); - } + // apply vertexSetProperty + if (Key.PropertyType.set.name().equals(card) && (e instanceof GdbVertex)) { + e.addProperty(key, pValue, Key.PropertyType.set); + } else { + e.addProperty(key, pValue); + } + return true; + } - e.setId(id); - e.setLabel(label); - - properties.forEach(p -> p.accept(r, e)); - }; - return ret; - } - - static Function forObjColumn(String rule, ValueType type) { - Matcher m = NORMAL_PATTERN.matcher(rule); - if (m.matches()) { - int index = Integer.valueOf(m.group(1)); - return r -> type.applyColumn(r.getColumn(index)); - } else { - return r -> type.fromStrFunc(rule); - } - } - - static Function forStrColumn(String rule) { - List> list = new ArrayList<>(); - Matcher m = STR_PATTERN.matcher(rule); - int last = 0; - while (m.find()) { - String index = m.group(1); - // as simple integer index. - int i = Integer.parseInt(index); - - final int tmp = last; - final int start = m.start(); - list.add((sb, record) -> { - sb.append(rule.subSequence(tmp, start)); - if(record.getColumn(i) != null && record.getColumn(i).getByteSize() > 0) { - sb.append(record.getColumn(i).asString()); - } - }); - - last = m.end(); - } - - final int tmp = last; - list.add((sb, record) -> { - sb.append(rule.subSequence(tmp, rule.length())); - }); - - return r -> { - StringBuilder sb = new StringBuilder(); - list.forEach(c -> c.accept(sb, r)); - String res = sb.toString(); - return res.isEmpty() ? 
null : res; - }; - } - - static boolean addToProperties(GdbElement e, String key, String value, String type) { - ValueType valueType = ValueType.fromShortName(type); - - if(valueType == ValueType.STRING) { - e.getProperties().put(key, value); - } else if (valueType == ValueType.INT) { - e.getProperties().put(key, Integer.valueOf(value)); - } else if (valueType == ValueType.LONG) { - e.getProperties().put(key, Long.valueOf(value)); - } else if (valueType == ValueType.DOUBLE) { - e.getProperties().put(key, Double.valueOf(value)); - } else if (valueType == ValueType.FLOAT) { - e.getProperties().put(key, Float.valueOf(value)); - } else if (valueType == ValueType.BOOLEAN) { - e.getProperties().put(key, Boolean.valueOf(value)); - } else { - log.error("invalid property key {}, value {}, type {}", key, value, type); - return false; - } - - return true; - } + @Override + public Function getMapper(final MappingRule rule) { + return r -> { + final GdbElement e = (rule.getImportType() == VERTEX) ? new GdbVertex() : new GdbEdge(); + forElement(rule).accept(r, e); + return e; + }; + } } diff --git a/gdbwriter/src/main/java/com/alibaba/datax/plugin/writer/gdbwriter/mapping/GdbMapper.java b/gdbwriter/src/main/java/com/alibaba/datax/plugin/writer/gdbwriter/mapping/GdbMapper.java index 3282f203..6a717a95 100644 --- a/gdbwriter/src/main/java/com/alibaba/datax/plugin/writer/gdbwriter/mapping/GdbMapper.java +++ b/gdbwriter/src/main/java/com/alibaba/datax/plugin/writer/gdbwriter/mapping/GdbMapper.java @@ -13,5 +13,5 @@ import com.alibaba.datax.plugin.writer.gdbwriter.model.GdbElement; * */ public interface GdbMapper { - Function getMapper(MappingRule rule); + Function getMapper(MappingRule rule); } diff --git a/gdbwriter/src/main/java/com/alibaba/datax/plugin/writer/gdbwriter/mapping/MapperConfig.java b/gdbwriter/src/main/java/com/alibaba/datax/plugin/writer/gdbwriter/mapping/MapperConfig.java new file mode 100644 index 00000000..241cd31a --- /dev/null +++ b/gdbwriter/src/main/java/com/alibaba/datax/plugin/writer/gdbwriter/mapping/MapperConfig.java @@ -0,0 +1,68 @@ +/* + * (C) 2019-present Alibaba Group Holding Limited. + * + * This program is free software; you can redistribute it and/or modify it under the terms of the GNU General Public + * License version 2 as published by the Free Software Foundation. + */ +package com.alibaba.datax.plugin.writer.gdbwriter.mapping; + +import com.alibaba.datax.common.util.Configuration; +import com.alibaba.datax.plugin.writer.gdbwriter.Key; +import com.alibaba.datax.plugin.writer.gdbwriter.client.GdbWriterConfig; + +/** + * @author : Liu Jianping + * @date : 2019/10/15 + */ + +public class MapperConfig { + private static MapperConfig instance = new MapperConfig(); + private int maxIdLength; + private int maxLabelLength; + private int maxPropKeyLength; + private int maxPropValueLength; + + private MapperConfig() { + this.maxIdLength = GdbWriterConfig.MAX_STRING_LENGTH; + this.maxLabelLength = GdbWriterConfig.MAX_STRING_LENGTH; + this.maxPropKeyLength = GdbWriterConfig.MAX_STRING_LENGTH; + this.maxPropValueLength = GdbWriterConfig.MAX_STRING_LENGTH; + } + + public static MapperConfig getInstance() { + return instance; + } + + public void updateConfig(final Configuration config) { + final int length = config.getInt(Key.MAX_GDB_STRING_LENGTH, GdbWriterConfig.MAX_STRING_LENGTH); + + Integer sLength = config.getInt(Key.MAX_GDB_ID_LENGTH); + this.maxIdLength = sLength == null ? 
length : sLength; + + sLength = config.getInt(Key.MAX_GDB_LABEL_LENGTH); + this.maxLabelLength = sLength == null ? length : sLength; + + sLength = config.getInt(Key.MAX_GDB_PROP_KEY_LENGTH); + this.maxPropKeyLength = sLength == null ? length : sLength; + + sLength = config.getInt(Key.MAX_GDB_PROP_VALUE_LENGTH); + this.maxPropValueLength = sLength == null ? length : sLength; + } + + public int getMaxIdLength() { + return this.maxIdLength; + } + + public int getMaxLabelLength() { + return this.maxLabelLength; + } + + public int getMaxPropKeyLength() { + return this.maxPropKeyLength; + } + + public int getMaxPropValueLength() { + return this.maxPropValueLength; + } + +} diff --git a/gdbwriter/src/main/java/com/alibaba/datax/plugin/writer/gdbwriter/mapping/MappingRule.java b/gdbwriter/src/main/java/com/alibaba/datax/plugin/writer/gdbwriter/mapping/MappingRule.java index c0c58d88..971fd6da 100644 --- a/gdbwriter/src/main/java/com/alibaba/datax/plugin/writer/gdbwriter/mapping/MappingRule.java +++ b/gdbwriter/src/main/java/com/alibaba/datax/plugin/writer/gdbwriter/mapping/MappingRule.java @@ -7,6 +7,7 @@ import java.util.ArrayList; import java.util.List; import com.alibaba.datax.plugin.writer.gdbwriter.Key.ImportType; +import com.alibaba.datax.plugin.writer.gdbwriter.Key.PropertyType; import lombok.Data; @@ -16,26 +17,30 @@ import lombok.Data; */ @Data public class MappingRule { - private String id = null; + private String id = null; - private String label = null; - - private ImportType importType = null; - - private String from = null; + private String label = null; - private String to = null; + private ImportType importType = null; - private List properties = new ArrayList<>(); + private String from = null; - private String propertiesJsonStr = null; + private String to = null; - @Data - public static class PropertyMappingRule { - private String key = null; - - private String value = null; - - private ValueType valueType = null; - } + private List properties = new ArrayList<>(); + + private String propertiesJsonStr = null; + + private boolean numPattern = false; + + @Data + public static class PropertyMappingRule { + private String key = null; + + private String value = null; + + private ValueType valueType = null; + + private PropertyType pType = PropertyType.single; + } } diff --git a/gdbwriter/src/main/java/com/alibaba/datax/plugin/writer/gdbwriter/mapping/MappingRuleFactory.java b/gdbwriter/src/main/java/com/alibaba/datax/plugin/writer/gdbwriter/mapping/MappingRuleFactory.java index 0738ac17..3e3a2afe 100644 --- a/gdbwriter/src/main/java/com/alibaba/datax/plugin/writer/gdbwriter/mapping/MappingRuleFactory.java +++ b/gdbwriter/src/main/java/com/alibaba/datax/plugin/writer/gdbwriter/mapping/MappingRuleFactory.java @@ -3,18 +3,21 @@ */ package com.alibaba.datax.plugin.writer.gdbwriter.mapping; +import java.util.List; +import java.util.regex.Matcher; +import java.util.regex.Pattern; + import com.alibaba.datax.common.exception.DataXException; import com.alibaba.datax.common.util.Configuration; import com.alibaba.datax.plugin.writer.gdbwriter.GdbWriterErrorCode; import com.alibaba.datax.plugin.writer.gdbwriter.Key; -import com.alibaba.datax.plugin.writer.gdbwriter.Key.ImportType; -import com.alibaba.datax.plugin.writer.gdbwriter.Key.IdTransRule; import com.alibaba.datax.plugin.writer.gdbwriter.Key.ColumnType; +import com.alibaba.datax.plugin.writer.gdbwriter.Key.IdTransRule; +import com.alibaba.datax.plugin.writer.gdbwriter.Key.ImportType; import 
com.alibaba.datax.plugin.writer.gdbwriter.mapping.MappingRule.PropertyMappingRule; import com.alibaba.datax.plugin.writer.gdbwriter.util.ConfigHelper; -import lombok.extern.slf4j.Slf4j; -import java.util.List; +import lombok.extern.slf4j.Slf4j; /** * @author jerrywang @@ -22,66 +25,94 @@ import java.util.List; */ @Slf4j public class MappingRuleFactory { - private static final MappingRuleFactory instance = new MappingRuleFactory(); - - public static final MappingRuleFactory getInstance() { - return instance; - } + private static final MappingRuleFactory instance = new MappingRuleFactory(); + private static final Pattern STR_PATTERN = Pattern.compile("\\$\\{(\\d+)}"); + private static final Pattern STR_NUM_PATTERN = Pattern.compile("#\\{(\\d+)}"); - @Deprecated - public MappingRule create(Configuration config, ImportType type) { - MappingRule rule = new MappingRule(); - rule.setId(config.getString(Key.ID)); - rule.setLabel(config.getString(Key.LABEL)); - if (type == ImportType.EDGE) { - rule.setFrom(config.getString(Key.FROM)); - rule.setTo(config.getString(Key.TO)); - } - - rule.setImportType(type); - - List configurations = config.getListConfiguration(Key.PROPERTIES); - if (configurations != null) { - for (Configuration prop : config.getListConfiguration(Key.PROPERTIES)) { - PropertyMappingRule propRule = new PropertyMappingRule(); - propRule.setKey(prop.getString(Key.PROP_KEY)); - propRule.setValue(prop.getString(Key.PROP_VALUE)); - propRule.setValueType(ValueType.fromShortName(prop.getString(Key.PROP_TYPE).toLowerCase())); - rule.getProperties().add(propRule); - } - } - - String propertiesJsonStr = config.getString(Key.PROPERTIES_JSON_STR, null); - if (propertiesJsonStr != null) { - rule.setPropertiesJsonStr(propertiesJsonStr); - } - - return rule; - } - - public MappingRule createV2(Configuration config) { - try { - ImportType type = ImportType.valueOf(config.getString(Key.IMPORT_TYPE)); - return createV2(config, type); - } catch (NullPointerException e) { - throw DataXException.asDataXException(GdbWriterErrorCode.CONFIG_ITEM_MISS, Key.IMPORT_TYPE); - } catch (IllegalArgumentException e) { - throw DataXException.asDataXException(GdbWriterErrorCode.BAD_CONFIG_VALUE, Key.IMPORT_TYPE); - } + public static MappingRuleFactory getInstance() { + return instance; } - public MappingRule createV2(Configuration config, ImportType type) { - MappingRule rule = new MappingRule(); + private static boolean isPattern(final String value, final MappingRule rule, final boolean checked) { + if (checked) { + return true; + } - ConfigHelper.assertHasContent(config, Key.LABEL); - rule.setLabel(config.getString(Key.LABEL)); - rule.setImportType(type); + if (value == null || value.isEmpty()) { + return false; + } - IdTransRule srcTransRule = IdTransRule.none; + Matcher m = STR_PATTERN.matcher(value); + if (m.find()) { + rule.setNumPattern(false); + return true; + } + + m = STR_NUM_PATTERN.matcher(value); + if (m.find()) { + rule.setNumPattern(true); + return true; + } + + return false; + } + + @Deprecated + public MappingRule create(final Configuration config, final ImportType type) { + final MappingRule rule = new MappingRule(); + rule.setId(config.getString(Key.ID)); + rule.setLabel(config.getString(Key.LABEL)); + if (type == ImportType.EDGE) { + rule.setFrom(config.getString(Key.FROM)); + rule.setTo(config.getString(Key.TO)); + } + + rule.setImportType(type); + + final List configurations = config.getListConfiguration(Key.PROPERTIES); + if (configurations != null) { + for (final Configuration prop : 
config.getListConfiguration(Key.PROPERTIES)) { + final PropertyMappingRule propRule = new PropertyMappingRule(); + propRule.setKey(prop.getString(Key.PROP_KEY)); + propRule.setValue(prop.getString(Key.PROP_VALUE)); + propRule.setValueType(ValueType.fromShortName(prop.getString(Key.PROP_TYPE).toLowerCase())); + rule.getProperties().add(propRule); + } + } + + final String propertiesJsonStr = config.getString(Key.PROPERTIES_JSON_STR, null); + if (propertiesJsonStr != null) { + rule.setPropertiesJsonStr(propertiesJsonStr); + } + + return rule; + } + + public MappingRule createV2(final Configuration config) { + try { + final ImportType type = ImportType.valueOf(config.getString(Key.IMPORT_TYPE)); + return createV2(config, type); + } catch (final NullPointerException e) { + throw DataXException.asDataXException(GdbWriterErrorCode.CONFIG_ITEM_MISS, Key.IMPORT_TYPE); + } catch (final IllegalArgumentException e) { + throw DataXException.asDataXException(GdbWriterErrorCode.BAD_CONFIG_VALUE, Key.IMPORT_TYPE); + } + } + + public MappingRule createV2(final Configuration config, final ImportType type) { + final MappingRule rule = new MappingRule(); + boolean patternChecked = false; + + ConfigHelper.assertHasContent(config, Key.LABEL); + rule.setLabel(config.getString(Key.LABEL)); + rule.setImportType(type); + patternChecked = isPattern(rule.getLabel(), rule, patternChecked); + + IdTransRule srcTransRule = IdTransRule.none; IdTransRule dstTransRule = IdTransRule.none; if (type == ImportType.EDGE) { - ConfigHelper.assertHasContent(config, Key.SRC_ID_TRANS_RULE); - ConfigHelper.assertHasContent(config, Key.DST_ID_TRANS_RULE); + ConfigHelper.assertHasContent(config, Key.SRC_ID_TRANS_RULE); + ConfigHelper.assertHasContent(config, Key.DST_ID_TRANS_RULE); srcTransRule = IdTransRule.valueOf(config.getString(Key.SRC_ID_TRANS_RULE)); dstTransRule = IdTransRule.valueOf(config.getString(Key.DST_ID_TRANS_RULE)); @@ -94,88 +125,96 @@ public class MappingRuleFactory { ConfigHelper.assertHasContent(config, Key.DST_LABEL); } } - ConfigHelper.assertHasContent(config, Key.ID_TRANS_RULE); - IdTransRule transRule = IdTransRule.valueOf(config.getString(Key.ID_TRANS_RULE)); + ConfigHelper.assertHasContent(config, Key.ID_TRANS_RULE); + final IdTransRule transRule = IdTransRule.valueOf(config.getString(Key.ID_TRANS_RULE)); - List configurationList = config.getListConfiguration(Key.COLUMN); - ConfigHelper.assertConfig(Key.COLUMN, () -> (configurationList != null && !configurationList.isEmpty())); - for (Configuration column : configurationList) { - ConfigHelper.assertHasContent(column, Key.COLUMN_NAME); - ConfigHelper.assertHasContent(column, Key.COLUMN_VALUE); - ConfigHelper.assertHasContent(column, Key.COLUMN_TYPE); - ConfigHelper.assertHasContent(column, Key.COLUMN_NODE_TYPE); + final List configurationList = config.getListConfiguration(Key.COLUMN); + ConfigHelper.assertConfig(Key.COLUMN, () -> (configurationList != null && !configurationList.isEmpty())); + for (final Configuration column : configurationList) { + ConfigHelper.assertHasContent(column, Key.COLUMN_NAME); + ConfigHelper.assertHasContent(column, Key.COLUMN_VALUE); + ConfigHelper.assertHasContent(column, Key.COLUMN_TYPE); + ConfigHelper.assertHasContent(column, Key.COLUMN_NODE_TYPE); - String columnValue = column.getString(Key.COLUMN_VALUE); - ColumnType columnType = ColumnType.valueOf(column.getString(Key.COLUMN_NODE_TYPE)); - if (columnValue == null || columnValue.isEmpty()) { - // only allow edge empty id - ConfigHelper.assertConfig("empty column value", - () -> 
(type == ImportType.EDGE && columnType == ColumnType.primaryKey)); - } + final String columnValue = column.getString(Key.COLUMN_VALUE); + final ColumnType columnType = ColumnType.valueOf(column.getString(Key.COLUMN_NODE_TYPE)); + if (columnValue == null || columnValue.isEmpty()) { + // only allow edge empty id + ConfigHelper.assertConfig("empty column value", + () -> (type == ImportType.EDGE && columnType == ColumnType.primaryKey)); + } + patternChecked = isPattern(columnValue, rule, patternChecked); - if (columnType == ColumnType.primaryKey) { - ValueType propType = ValueType.fromShortName(column.getString(Key.COLUMN_TYPE)); - ConfigHelper.assertConfig("only string is allowed in primary key", () -> (propType == ValueType.STRING)); + if (columnType == ColumnType.primaryKey) { + final ValueType propType = ValueType.fromShortName(column.getString(Key.COLUMN_TYPE)); + ConfigHelper.assertConfig("only string is allowed in primary key", + () -> (propType == ValueType.STRING)); - if (transRule == IdTransRule.labelPrefix) { - rule.setId(config.getString(Key.LABEL) + columnValue); - } else { - rule.setId(columnValue); - } - } else if (columnType == ColumnType.edgeJsonProperty || columnType == ColumnType.vertexJsonProperty) { - // only support one json property in column - ConfigHelper.assertConfig("multi JsonProperty", () -> (rule.getPropertiesJsonStr() == null)); - - rule.setPropertiesJsonStr(columnValue); - } else if (columnType == ColumnType.vertexProperty || columnType == ColumnType.edgeProperty) { - PropertyMappingRule propertyMappingRule = new PropertyMappingRule(); - - propertyMappingRule.setKey(column.getString(Key.COLUMN_NAME)); - propertyMappingRule.setValue(columnValue); - ValueType propType = ValueType.fromShortName(column.getString(Key.COLUMN_TYPE)); - ConfigHelper.assertConfig("unsupported property type", () -> propType != null); - - propertyMappingRule.setValueType(propType); - rule.getProperties().add(propertyMappingRule); - } else if (columnType == ColumnType.srcPrimaryKey) { - if (type != ImportType.EDGE) { - continue; - } - - ValueType propType = ValueType.fromShortName(column.getString(Key.COLUMN_TYPE)); - ConfigHelper.assertConfig("only string is allowed in primary key", () -> (propType == ValueType.STRING)); - - if (srcTransRule == IdTransRule.labelPrefix) { - rule.setFrom(config.getString(Key.SRC_LABEL) + columnValue); + if (transRule == IdTransRule.labelPrefix) { + rule.setId(config.getString(Key.LABEL) + columnValue); } else { - rule.setFrom(columnValue); + rule.setId(columnValue); } - } else if (columnType == ColumnType.dstPrimaryKey) { + } else if (columnType == ColumnType.edgeJsonProperty || columnType == ColumnType.vertexJsonProperty) { + // only support one json property in column + ConfigHelper.assertConfig("multi JsonProperty", () -> (rule.getPropertiesJsonStr() == null)); + + rule.setPropertiesJsonStr(columnValue); + } else if (columnType == ColumnType.vertexProperty || columnType == ColumnType.edgeProperty + || columnType == ColumnType.vertexSetProperty) { + final PropertyMappingRule propertyMappingRule = new PropertyMappingRule(); + + propertyMappingRule.setKey(column.getString(Key.COLUMN_NAME)); + propertyMappingRule.setValue(columnValue); + final ValueType propType = ValueType.fromShortName(column.getString(Key.COLUMN_TYPE)); + ConfigHelper.assertConfig("unsupported property type", () -> propType != null); + + if (columnType == ColumnType.vertexSetProperty) { + propertyMappingRule.setPType(Key.PropertyType.set); + } + propertyMappingRule.setValueType(propType); 
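+                // vertexSetProperty columns share the value mapping of ordinary properties, but are
+                // tagged with set cardinality so that buildDsl emits .property(set, key, value) for them.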
+ rule.getProperties().add(propertyMappingRule); + } else if (columnType == ColumnType.srcPrimaryKey) { if (type != ImportType.EDGE) { continue; } - ValueType propType = ValueType.fromShortName(column.getString(Key.COLUMN_TYPE)); - ConfigHelper.assertConfig("only string is allowed in primary key", () -> (propType == ValueType.STRING)); + final ValueType propType = ValueType.fromShortName(column.getString(Key.COLUMN_TYPE)); + ConfigHelper.assertConfig("only string is allowed in primary key", + () -> (propType == ValueType.STRING)); + + if (srcTransRule == IdTransRule.labelPrefix) { + rule.setFrom(config.getString(Key.SRC_LABEL) + columnValue); + } else { + rule.setFrom(columnValue); + } + } else if (columnType == ColumnType.dstPrimaryKey) { + if (type != ImportType.EDGE) { + continue; + } + + final ValueType propType = ValueType.fromShortName(column.getString(Key.COLUMN_TYPE)); + ConfigHelper.assertConfig("only string is allowed in primary key", + () -> (propType == ValueType.STRING)); if (dstTransRule == IdTransRule.labelPrefix) { rule.setTo(config.getString(Key.DST_LABEL) + columnValue); } else { rule.setTo(columnValue); } - } - } + } + } - if (rule.getImportType() == ImportType.EDGE) { - if (rule.getId() == null) { - rule.setId(""); - log.info("edge id is missed, uuid be default"); - } - ConfigHelper.assertConfig("to needed in edge", () -> (rule.getTo() != null)); - ConfigHelper.assertConfig("from needed in edge", () -> (rule.getFrom() != null)); - } - ConfigHelper.assertConfig("id needed", () -> (rule.getId() != null)); + if (rule.getImportType() == ImportType.EDGE) { + if (rule.getId() == null) { + rule.setId(""); + log.info("edge id is missed, uuid be default"); + } + ConfigHelper.assertConfig("to needed in edge", () -> (rule.getTo() != null)); + ConfigHelper.assertConfig("from needed in edge", () -> (rule.getFrom() != null)); + } + ConfigHelper.assertConfig("id needed", () -> (rule.getId() != null)); - return rule; - } + return rule; + } } diff --git a/gdbwriter/src/main/java/com/alibaba/datax/plugin/writer/gdbwriter/mapping/ValueType.java b/gdbwriter/src/main/java/com/alibaba/datax/plugin/writer/gdbwriter/mapping/ValueType.java index 9ad8bd8d..969fda3b 100644 --- a/gdbwriter/src/main/java/com/alibaba/datax/plugin/writer/gdbwriter/mapping/ValueType.java +++ b/gdbwriter/src/main/java/com/alibaba/datax/plugin/writer/gdbwriter/mapping/ValueType.java @@ -8,6 +8,7 @@ import java.util.Map; import java.util.function.Function; import com.alibaba.datax.common.element.Column; + import lombok.extern.slf4j.Slf4j; /** @@ -16,56 +17,61 @@ import lombok.extern.slf4j.Slf4j; */ @Slf4j public enum ValueType { - INT(Integer.class, "int", Column::asLong, Integer::valueOf), - LONG(Long.class, "long", Column::asLong, Long::valueOf), - DOUBLE(Double.class, "double", Column::asDouble, Double::valueOf), - FLOAT(Float.class, "float", Column::asDouble, Float::valueOf), - BOOLEAN(Boolean.class, "boolean", Column::asBoolean, Boolean::valueOf), - STRING(String.class, "string", Column::asString, String::valueOf); + /** + * property value type + */ + INT(Integer.class, "int", Column::asLong, Integer::valueOf), + INTEGER(Integer.class, "integer", Column::asLong, Integer::valueOf), + LONG(Long.class, "long", Column::asLong, Long::valueOf), + DOUBLE(Double.class, "double", Column::asDouble, Double::valueOf), + FLOAT(Float.class, "float", Column::asDouble, Float::valueOf), + BOOLEAN(Boolean.class, "boolean", Column::asBoolean, Boolean::valueOf), + STRING(String.class, "string", Column::asString, String::valueOf); - 
private Class type = null; - private String shortName = null; - private Function columnFunc = null; - private Function fromStrFunc = null; + private Class type = null; + private String shortName = null; + private Function columnFunc = null; + private Function fromStrFunc = null; - private ValueType(Class type, String name, Function columnFunc, Function fromStrFunc) { - this.type = type; - this.shortName = name; - this.columnFunc = columnFunc; - this.fromStrFunc = fromStrFunc; - - ValueTypeHolder.shortName2type.put(name, this); - } - - public static ValueType fromShortName(String name) { - return ValueTypeHolder.shortName2type.get(name); - } + private ValueType(final Class type, final String name, final Function columnFunc, + final Function fromStrFunc) { + this.type = type; + this.shortName = name; + this.columnFunc = columnFunc; + this.fromStrFunc = fromStrFunc; - public Class type() { - return this.type; - } - - public String shortName() { - return this.shortName; - } - - public Object applyColumn(Column column) { - try { - if (column == null) { - return null; - } - return columnFunc.apply(column); - } catch (Exception e) { - log.error("applyColumn error {}, column {}", e.toString(), column); - throw e; - } - } - - public Object fromStrFunc(String str) { - return fromStrFunc.apply(str); - } + ValueTypeHolder.shortName2type.put(name, this); + } - private static class ValueTypeHolder { - private static Map shortName2type = new HashMap<>(); - } + public static ValueType fromShortName(final String name) { + return ValueTypeHolder.shortName2type.get(name); + } + + public Class type() { + return this.type; + } + + public String shortName() { + return this.shortName; + } + + public Object applyColumn(final Column column) { + try { + if (column == null) { + return null; + } + return this.columnFunc.apply(column); + } catch (final Exception e) { + log.error("applyColumn error {}, column {}", e.toString(), column); + throw e; + } + } + + public Object fromStrFunc(final String str) { + return this.fromStrFunc.apply(str); + } + + private static class ValueTypeHolder { + private static Map shortName2type = new HashMap<>(); + } } diff --git a/gdbwriter/src/main/java/com/alibaba/datax/plugin/writer/gdbwriter/model/AbstractGdbGraph.java b/gdbwriter/src/main/java/com/alibaba/datax/plugin/writer/gdbwriter/model/AbstractGdbGraph.java index 0c31c644..038663ac 100644 --- a/gdbwriter/src/main/java/com/alibaba/datax/plugin/writer/gdbwriter/model/AbstractGdbGraph.java +++ b/gdbwriter/src/main/java/com/alibaba/datax/plugin/writer/gdbwriter/model/AbstractGdbGraph.java @@ -3,20 +3,24 @@ */ package com.alibaba.datax.plugin.writer.gdbwriter.model; -import com.alibaba.datax.common.util.Configuration; -import com.alibaba.datax.plugin.writer.gdbwriter.Key; -import com.alibaba.datax.plugin.writer.gdbwriter.client.GdbWriterConfig; +import static com.alibaba.datax.plugin.writer.gdbwriter.client.GdbWriterConfig.DEFAULT_BATCH_PROPERTY_NUM; +import static com.alibaba.datax.plugin.writer.gdbwriter.client.GdbWriterConfig.MAX_REQUEST_LENGTH; + +import java.util.Map; +import java.util.UUID; +import java.util.concurrent.TimeUnit; -import lombok.extern.slf4j.Slf4j; import org.apache.tinkerpop.gremlin.driver.Client; import org.apache.tinkerpop.gremlin.driver.Cluster; import org.apache.tinkerpop.gremlin.driver.RequestOptions; import org.apache.tinkerpop.gremlin.driver.ResultSet; import org.apache.tinkerpop.gremlin.driver.ser.Serializers; -import java.util.Map; -import java.util.UUID; -import java.util.concurrent.TimeUnit; +import 
com.alibaba.datax.common.util.Configuration; +import com.alibaba.datax.plugin.writer.gdbwriter.Key; +import com.alibaba.datax.plugin.writer.gdbwriter.client.GdbWriterConfig; + +import lombok.extern.slf4j.Slf4j; /** * @author jerrywang @@ -24,128 +28,124 @@ import java.util.concurrent.TimeUnit; */ @Slf4j public abstract class AbstractGdbGraph implements GdbGraph { - private final static int DEFAULT_TIMEOUT = 30000; + private final static int DEFAULT_TIMEOUT = 30000; - protected Client client = null; - protected Key.UpdateMode updateMode = Key.UpdateMode.INSERT; - protected int propertiesBatchNum = GdbWriterConfig.DEFAULT_BATCH_PROPERTY_NUM; - protected boolean session = false; + protected Client client = null; + protected Key.UpdateMode updateMode = Key.UpdateMode.INSERT; + protected int propertiesBatchNum = DEFAULT_BATCH_PROPERTY_NUM; + protected boolean session = false; + protected int maxRequestLength = GdbWriterConfig.MAX_REQUEST_LENGTH; + protected AbstractGdbGraph() {} - protected AbstractGdbGraph() {} + protected AbstractGdbGraph(final Configuration config, final boolean session) { + initClient(config, session); + } - protected AbstractGdbGraph(Configuration config, boolean session) { - initClient(config, session); - } + protected void initClient(final Configuration config, final boolean session) { + this.updateMode = Key.UpdateMode.valueOf(config.getString(Key.UPDATE_MODE, "INSERT")); + log.info("init graphdb client"); + final String host = config.getString(Key.HOST); + final int port = config.getInt(Key.PORT); + final String username = config.getString(Key.USERNAME); + final String password = config.getString(Key.PASSWORD); + int maxDepthPerConnection = + config.getInt(Key.MAX_IN_PROCESS_PER_CONNECTION, GdbWriterConfig.DEFAULT_MAX_IN_PROCESS_PER_CONNECTION); - protected void initClient(Configuration config, boolean session) { - updateMode = Key.UpdateMode.valueOf(config.getString(Key.UPDATE_MODE, "INSERT")); - log.info("init graphdb client"); - String host = config.getString(Key.HOST); - int port = config.getInt(Key.PORT); - String username = config.getString(Key.USERNAME); - String password = config.getString(Key.PASSWORD); - int maxDepthPerConnection = config.getInt(Key.MAX_IN_PROCESS_PER_CONNECTION, - GdbWriterConfig.DEFAULT_MAX_IN_PROCESS_PER_CONNECTION); + int maxConnectionPoolSize = + config.getInt(Key.MAX_CONNECTION_POOL_SIZE, GdbWriterConfig.DEFAULT_MAX_CONNECTION_POOL_SIZE); - int maxConnectionPoolSize = config.getInt(Key.MAX_CONNECTION_POOL_SIZE, - GdbWriterConfig.DEFAULT_MAX_CONNECTION_POOL_SIZE); + int maxSimultaneousUsagePerConnection = config.getInt(Key.MAX_SIMULTANEOUS_USAGE_PER_CONNECTION, + GdbWriterConfig.DEFAULT_MAX_SIMULTANEOUS_USAGE_PER_CONNECTION); - int maxSimultaneousUsagePerConnection = config.getInt(Key.MAX_SIMULTANEOUS_USAGE_PER_CONNECTION, - GdbWriterConfig.DEFAULT_MAX_SIMULTANEOUS_USAGE_PER_CONNECTION); + this.session = session; + if (this.session) { + maxConnectionPoolSize = GdbWriterConfig.DEFAULT_MAX_CONNECTION_POOL_SIZE; + maxDepthPerConnection = GdbWriterConfig.DEFAULT_MAX_IN_PROCESS_PER_CONNECTION; + maxSimultaneousUsagePerConnection = GdbWriterConfig.DEFAULT_MAX_SIMULTANEOUS_USAGE_PER_CONNECTION; + } - this.session = session; - if (this.session) { - maxConnectionPoolSize = GdbWriterConfig.DEFAULT_MAX_CONNECTION_POOL_SIZE; - maxDepthPerConnection = GdbWriterConfig.DEFAULT_MAX_IN_PROCESS_PER_CONNECTION; - maxSimultaneousUsagePerConnection = GdbWriterConfig.DEFAULT_MAX_SIMULTANEOUS_USAGE_PER_CONNECTION; - } + try { + final Cluster cluster = 
Cluster.build(host).port(port).credentials(username, password) + .serializer(Serializers.GRAPHBINARY_V1D0).maxContentLength(1048576) + .maxInProcessPerConnection(maxDepthPerConnection).minInProcessPerConnection(0) + .maxConnectionPoolSize(maxConnectionPoolSize).minConnectionPoolSize(maxConnectionPoolSize) + .maxSimultaneousUsagePerConnection(maxSimultaneousUsagePerConnection).resultIterationBatchSize(64) + .create(); + this.client = session ? cluster.connect(UUID.randomUUID().toString()).init() : cluster.connect().init(); + warmClient(maxConnectionPoolSize * maxDepthPerConnection); + } catch (final RuntimeException e) { + log.error("Failed to connect to GDB {}:{}, due to {}", host, port, e); + throw e; + } - try { - Cluster cluster = Cluster.build(host).port(port).credentials(username, password) - .serializer(Serializers.GRAPHBINARY_V1D0) - .maxContentLength(1048576) - .maxInProcessPerConnection(maxDepthPerConnection) - .minInProcessPerConnection(0) - .maxConnectionPoolSize(maxConnectionPoolSize) - .minConnectionPoolSize(maxConnectionPoolSize) - .maxSimultaneousUsagePerConnection(maxSimultaneousUsagePerConnection) - .resultIterationBatchSize(64) - .create(); - client = session ? cluster.connect(UUID.randomUUID().toString()).init() : cluster.connect().init(); - warmClient(maxConnectionPoolSize*maxDepthPerConnection); - } catch (RuntimeException e) { - log.error("Failed to connect to GDB {}:{}, due to {}", host, port, e); - throw e; - } + this.propertiesBatchNum = config.getInt(Key.MAX_PROPERTIES_BATCH_NUM, DEFAULT_BATCH_PROPERTY_NUM); + this.maxRequestLength = config.getInt(Key.MAX_GDB_REQUEST_LENGTH, MAX_REQUEST_LENGTH); + } - propertiesBatchNum = config.getInt(Key.MAX_PROPERTIES_BATCH_NUM, GdbWriterConfig.DEFAULT_BATCH_PROPERTY_NUM); - } + /** + * @param dsl + * @param parameters + */ + protected void runInternal(final String dsl, final Map parameters) throws Exception { + final RequestOptions.Builder options = RequestOptions.build().timeout(DEFAULT_TIMEOUT); + if (parameters != null && !parameters.isEmpty()) { + parameters.forEach(options::addParameter); + } + final ResultSet results = this.client.submitAsync(dsl, options.create()).get(DEFAULT_TIMEOUT, TimeUnit.MILLISECONDS); + results.all().get(DEFAULT_TIMEOUT + 1000, TimeUnit.MILLISECONDS); + } - /** - * @param dsl - * @param parameters - */ - protected void runInternal(String dsl, final Map parameters) throws Exception { - RequestOptions.Builder options = RequestOptions.build().timeout(DEFAULT_TIMEOUT); - if (parameters != null && !parameters.isEmpty()) { - parameters.forEach(options::addParameter); - } + void beginTx() { + if (!this.session) { + return; + } - ResultSet results = client.submitAsync(dsl, options.create()).get(DEFAULT_TIMEOUT, TimeUnit.MILLISECONDS); - results.all().get(DEFAULT_TIMEOUT + 1000, TimeUnit.MILLISECONDS); - } + final String dsl = "g.tx().open()"; + this.client.submit(dsl).all().join(); + } - void beginTx() { - if (!session) { - return; - } + void doCommit() { + if (!this.session) { + return; + } - String dsl = "g.tx().open()"; - client.submit(dsl).all().join(); - } + try { + final String dsl = "g.tx().commit()"; + this.client.submit(dsl).all().join(); + } catch (final Exception e) { + throw new RuntimeException(e); + } + } - void doCommit() { - if (!session) { - return; - } + void doRollback() { + if (!this.session) { + return; + } - try { - String dsl = "g.tx().commit()"; - client.submit(dsl).all().join(); - } catch (Exception e) { - throw new RuntimeException(e); - } - } + final String dsl = 
"g.tx().rollback()"; + this.client.submit(dsl).all().join(); + } - void doRollback() { - if (!session) { - return; - } + private void warmClient(final int num) { + try { + beginTx(); + runInternal("g.V('test')", null); + doCommit(); + log.info("warm graphdb client over"); + } catch (final Exception e) { + log.error("warmClient error"); + doRollback(); + throw new RuntimeException(e); + } + } - String dsl = "g.tx().rollback()"; - client.submit(dsl).all().join(); - } - - private void warmClient(int num) { - try { - beginTx(); - runInternal("g.V('test')", null); - doCommit(); - log.info("warm graphdb client over"); - } catch (Exception e) { - log.error("warmClient error"); - doRollback(); - throw new RuntimeException(e); - } - } - - @Override - public void close() { - if (client != null) { - log.info("close graphdb client"); - client.close(); - } - } + @Override + public void close() { + if (this.client != null) { + log.info("close graphdb client"); + this.client.close(); + } + } } diff --git a/gdbwriter/src/main/java/com/alibaba/datax/plugin/writer/gdbwriter/model/GdbEdge.java b/gdbwriter/src/main/java/com/alibaba/datax/plugin/writer/gdbwriter/model/GdbEdge.java index d42c9182..0bd42057 100644 --- a/gdbwriter/src/main/java/com/alibaba/datax/plugin/writer/gdbwriter/model/GdbEdge.java +++ b/gdbwriter/src/main/java/com/alibaba/datax/plugin/writer/gdbwriter/model/GdbEdge.java @@ -3,7 +3,8 @@ */ package com.alibaba.datax.plugin.writer.gdbwriter.model; -import lombok.Data; +import com.alibaba.datax.plugin.writer.gdbwriter.mapping.MapperConfig; + import lombok.EqualsAndHashCode; import lombok.ToString; @@ -11,10 +12,33 @@ import lombok.ToString; * @author jerrywang * */ -@Data @EqualsAndHashCode(callSuper = true) @ToString(callSuper = true) public class GdbEdge extends GdbElement { - private String from = null; - private String to = null; + private String from = null; + private String to = null; + + public String getFrom() { + return this.from; + } + + public void setFrom(final String from) { + final int maxIdLength = MapperConfig.getInstance().getMaxIdLength(); + if (from.length() > maxIdLength) { + throw new IllegalArgumentException("from length over limit(" + maxIdLength + ")"); + } + this.from = from; + } + + public String getTo() { + return this.to; + } + + public void setTo(final String to) { + final int maxIdLength = MapperConfig.getInstance().getMaxIdLength(); + if (to.length() > maxIdLength) { + throw new IllegalArgumentException("to length over limit(" + maxIdLength + ")"); + } + this.to = to; + } } diff --git a/gdbwriter/src/main/java/com/alibaba/datax/plugin/writer/gdbwriter/model/GdbElement.java b/gdbwriter/src/main/java/com/alibaba/datax/plugin/writer/gdbwriter/model/GdbElement.java index af3c7090..3d513a6a 100644 --- a/gdbwriter/src/main/java/com/alibaba/datax/plugin/writer/gdbwriter/model/GdbElement.java +++ b/gdbwriter/src/main/java/com/alibaba/datax/plugin/writer/gdbwriter/model/GdbElement.java @@ -3,18 +3,107 @@ */ package com.alibaba.datax.plugin.writer.gdbwriter.model; -import java.util.HashMap; -import java.util.Map; +import java.util.LinkedList; +import java.util.List; -import lombok.Data; +import com.alibaba.datax.plugin.writer.gdbwriter.Key.PropertyType; +import com.alibaba.datax.plugin.writer.gdbwriter.mapping.MapperConfig; /** * @author jerrywang * */ -@Data public class GdbElement { - String id = null; - String label = null; - Map properties = new HashMap<>(); + private String id = null; + private String label = null; + private List properties = new LinkedList<>(); + + 
public String getId() {
+        return this.id;
+    }
+
+    public void setId(final String id) {
+        final int maxIdLength = MapperConfig.getInstance().getMaxIdLength();
+        if (id.length() > maxIdLength) {
+            throw new IllegalArgumentException("id length over limit(" + maxIdLength + ")");
+        }
+        this.id = id;
+    }
+
+    public String getLabel() {
+        return this.label;
+    }
+
+    public void setLabel(final String label) {
+        final int maxLabelLength = MapperConfig.getInstance().getMaxLabelLength();
+        if (label.length() > maxLabelLength) {
+            throw new IllegalArgumentException("label length over limit(" + maxLabelLength + ")");
+        }
+        this.label = label;
+    }
+
+    public List getProperties() {
+        return this.properties;
+    }
+
+    public void addProperty(final String propKey, final Object propValue, final PropertyType card) {
+        if (propKey == null || propValue == null) {
+            return;
+        }
+
+        final int maxPropKeyLength = MapperConfig.getInstance().getMaxPropKeyLength();
+        if (propKey.length() > maxPropKeyLength) {
+            throw new IllegalArgumentException("property key length over limit(" + maxPropKeyLength + ")");
+        }
+        if (propValue instanceof String) {
+            final int maxPropValueLength = MapperConfig.getInstance().getMaxPropValueLength();
+            if (((String)propValue).length() > maxPropValueLength) {
+                throw new IllegalArgumentException("property value length over limit(" + maxPropValueLength + ")");
+            }
+        }
+
+        this.properties.add(new GdbProperty(propKey, propValue, card));
+    }
+
+    public void addProperty(final String propKey, final Object propValue) {
+        addProperty(propKey, propValue, PropertyType.single);
+    }
+
+    @Override
+    public String toString() {
+        final StringBuffer sb = new StringBuffer(this.id + "[" + this.label + "]{");
+        this.properties.forEach(n -> {
+            sb.append(n.cardinality.name());
+            sb.append("[");
+            sb.append(n.key);
+            sb.append(" - ");
+            sb.append(String.valueOf(n.value));
+            sb.append("]");
+        });
+        return sb.toString();
+    }
+
+    public static class GdbProperty {
+        private String key;
+        private Object value;
+        private PropertyType cardinality;
+
+        private GdbProperty(final String key, final Object value, final PropertyType card) {
+            this.key = key;
+            this.value = value;
+            this.cardinality = card;
+        }
+
+        public PropertyType getCardinality() {
+            return this.cardinality;
+        }
+
+        public String getKey() {
+            return this.key;
+        }
+
+        public Object getValue() {
+            return this.value;
+        }
+    }
 }
diff --git a/gdbwriter/src/main/java/com/alibaba/datax/plugin/writer/gdbwriter/model/GdbGraph.java b/gdbwriter/src/main/java/com/alibaba/datax/plugin/writer/gdbwriter/model/GdbGraph.java
index 5b98c502..5d9b4508 100644
--- a/gdbwriter/src/main/java/com/alibaba/datax/plugin/writer/gdbwriter/model/GdbGraph.java
+++ b/gdbwriter/src/main/java/com/alibaba/datax/plugin/writer/gdbwriter/model/GdbGraph.java
@@ -3,18 +3,19 @@
  */
 package com.alibaba.datax.plugin.writer.gdbwriter.model;
 
-import com.alibaba.datax.common.element.Record;
-import groovy.lang.Tuple2;
-
 import java.util.List;
 
+import com.alibaba.datax.common.element.Record;
+
+import groovy.lang.Tuple2;
+
 /**
  * @author jerrywang
  *
  */
 public interface GdbGraph extends AutoCloseable {
-    List> add(List> records);
+    List> add(List> records);
 
-    @Override
-    void close();
+    @Override
+    void close();
 }
diff --git a/gdbwriter/src/main/java/com/alibaba/datax/plugin/writer/gdbwriter/model/ScriptGdbGraph.java b/gdbwriter/src/main/java/com/alibaba/datax/plugin/writer/gdbwriter/model/ScriptGdbGraph.java
index 7f898431..9ecee8ab 100644
---
a/gdbwriter/src/main/java/com/alibaba/datax/plugin/writer/gdbwriter/model/ScriptGdbGraph.java +++ b/gdbwriter/src/main/java/com/alibaba/datax/plugin/writer/gdbwriter/model/ScriptGdbGraph.java @@ -3,15 +3,17 @@ */ package com.alibaba.datax.plugin.writer.gdbwriter.model; -import java.util.*; +import java.util.ArrayList; +import java.util.HashMap; +import java.util.List; +import java.util.Map; +import java.util.Random; import com.alibaba.datax.common.element.Record; import com.alibaba.datax.common.util.Configuration; - import com.alibaba.datax.plugin.writer.gdbwriter.Key; import com.alibaba.datax.plugin.writer.gdbwriter.util.GdbDuplicateIdException; -import com.github.benmanes.caffeine.cache.Cache; -import com.github.benmanes.caffeine.cache.Caffeine; + import groovy.lang.Tuple2; import lombok.extern.slf4j.Slf4j; @@ -21,176 +23,198 @@ import lombok.extern.slf4j.Slf4j; */ @Slf4j public class ScriptGdbGraph extends AbstractGdbGraph { - private static final String VAR_PREFIX = "GDB___"; - private static final String VAR_ID = VAR_PREFIX + "id"; - private static final String VAR_LABEL = VAR_PREFIX + "label"; - private static final String VAR_FROM = VAR_PREFIX + "from"; - private static final String VAR_TO = VAR_PREFIX + "to"; - private static final String VAR_PROP_KEY = VAR_PREFIX + "PK"; - private static final String VAR_PROP_VALUE = VAR_PREFIX + "PV"; - private static final String ADD_V_START = "g.addV(" + VAR_LABEL + ").property(id, " + VAR_ID + ")"; - private static final String ADD_E_START = "g.addE(" + VAR_LABEL + ").property(id, " + VAR_ID + ").from(V(" - + VAR_FROM + ")).to(V(" + VAR_TO + "))"; + private static final String VAR_PREFIX = "GDB___"; + private static final String VAR_ID = VAR_PREFIX + "id"; + private static final String VAR_LABEL = VAR_PREFIX + "label"; + private static final String VAR_FROM = VAR_PREFIX + "from"; + private static final String VAR_TO = VAR_PREFIX + "to"; + private static final String VAR_PROP_KEY = VAR_PREFIX + "PK"; + private static final String VAR_PROP_VALUE = VAR_PREFIX + "PV"; + private static final String ADD_V_START = "g.addV(" + VAR_LABEL + ").property(id, " + VAR_ID + ")"; + private static final String ADD_E_START = + "g.addE(" + VAR_LABEL + ").property(id, " + VAR_ID + ").from(V(" + VAR_FROM + ")).to(V(" + VAR_TO + "))"; - private static final String UPDATE_V_START = "g.V("+VAR_ID+")"; - private static final String UPDATE_E_START = "g.E("+VAR_ID+")"; + private static final String UPDATE_V_START = "g.V(" + VAR_ID + ")"; + private static final String UPDATE_E_START = "g.E(" + VAR_ID + ")"; - private Cache propertyCache; - private Random random; + private Random random; - public ScriptGdbGraph() { - propertyCache = Caffeine.newBuilder().maximumSize(1024).build(); - random = new Random(); - } + public ScriptGdbGraph() { + this.random = new Random(); + } - public ScriptGdbGraph(Configuration config, boolean session) { - super(config, session); + public ScriptGdbGraph(final Configuration config, final boolean session) { + super(config, session); - propertyCache = Caffeine.newBuilder().maximumSize(1024).build(); - random = new Random(); + this.random = new Random(); + log.info("Init as ScriptGdbGraph."); + } - log.info("Init as ScriptGdbGraph."); - } + /** + * Apply list of {@link GdbElement} to GDB, return the failed records + * + * @param records + * list of element to apply + * @return + */ + @Override + public List> add(final List> records) { + final List> errors = new ArrayList<>(); + try { + beginTx(); + for (final Tuple2 elementTuple2 : records) { + 
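+                // apply each element on its own: a failure is recorded against its source Record and
+                // returned to the caller, while the remaining elements of the batch continue to be written.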
try { + addInternal(elementTuple2.getSecond()); + } catch (final Exception e) { + errors.add(new Tuple2<>(elementTuple2.getFirst(), e)); + } + } + doCommit(); + } catch (final Exception ex) { + doRollback(); + throw new RuntimeException(ex); + } + return errors; + } - /** - * Apply list of {@link GdbElement} to GDB, return the failed records - * @param records list of element to apply - * @return - */ - @Override - public List> add(List> records) { - List> errors = new ArrayList<>(); - try { - beginTx(); - for (Tuple2 elementTuple2 : records) { - try { - addInternal(elementTuple2.getSecond()); - } catch (Exception e) { - errors.add(new Tuple2<>(elementTuple2.getFirst(), e)); - } - } - doCommit(); - } catch (Exception ex) { - doRollback(); - throw new RuntimeException(ex); - } - return errors; - } + private void addInternal(final GdbElement element) { + try { + addInternal(element, false); + } catch (final GdbDuplicateIdException e) { + if (this.updateMode == Key.UpdateMode.SKIP) { + log.debug("Skip duplicate id {}", element.getId()); + } else if (this.updateMode == Key.UpdateMode.INSERT) { + throw new RuntimeException(e); + } else if (this.updateMode == Key.UpdateMode.MERGE) { + if (element.getProperties().isEmpty()) { + return; + } - private void addInternal(GdbElement element) { - try { - addInternal(element, false); - } catch (GdbDuplicateIdException e) { - if (updateMode == Key.UpdateMode.SKIP) { - log.debug("Skip duplicate id {}", element.getId()); - } else if (updateMode == Key.UpdateMode.INSERT) { - throw new RuntimeException(e); - } else if (updateMode == Key.UpdateMode.MERGE) { - if (element.getProperties().isEmpty()) { - return; - } + try { + addInternal(element, true); + } catch (final GdbDuplicateIdException e1) { + log.error("duplicate id {} while update...", element.getId()); + throw new RuntimeException(e1); + } + } + } + } - try { - addInternal(element, true); - } catch (GdbDuplicateIdException e1) { - log.error("duplicate id {} while update...", element.getId()); - throw new RuntimeException(e1); - } - } - } - } + private void addInternal(final GdbElement element, final boolean update) throws GdbDuplicateIdException { + boolean firstAdd = !update; + final boolean isVertex = (element instanceof GdbVertex); + final List params = element.getProperties(); + final List subParams = new ArrayList<>(this.propertiesBatchNum); - private void addInternal(GdbElement element, boolean update) throws GdbDuplicateIdException { - Map params = element.getProperties(); - Map subParams = new HashMap<>(propertiesBatchNum); - boolean firstAdd = !update; - boolean isVertex = (element instanceof GdbVertex); + final int idLength = element.getId().length(); + int attachLength = element.getLabel().length(); + if (element instanceof GdbEdge) { + attachLength += ((GdbEdge)element).getFrom().length(); + attachLength += ((GdbEdge)element).getTo().length(); + } - for (Map.Entry entry : params.entrySet()) { - subParams.put(entry.getKey(), entry.getValue()); - if (subParams.size() >= propertiesBatchNum) { - setGraphDbElement(element, subParams, isVertex, firstAdd); - firstAdd = false; - subParams.clear(); - } - } - if (!subParams.isEmpty() || firstAdd) { - setGraphDbElement(element, subParams, isVertex, firstAdd); - } - } + int requestLength = idLength; + for (final GdbElement.GdbProperty entry : params) { + final String propKey = entry.getKey(); + final Object propValue = entry.getValue(); - private Tuple2> buildDsl(GdbElement element, - Map properties, - boolean isVertex, boolean firstAdd) { - Map params 
= new HashMap<>(); + int appendLength = propKey.length(); + if (propValue instanceof String) { + appendLength += ((String)propValue).length(); + } - String dslPropertyPart = propertyCache.get(properties.size(), keys -> { - final StringBuilder sb = new StringBuilder(); - for (int i = 0; i < keys; i++) { - sb.append(".property(").append(VAR_PROP_KEY).append(i) - .append(", ").append(VAR_PROP_VALUE).append(i).append(")"); - } - return sb.toString(); - }); + if (checkSplitDsl(firstAdd, requestLength, attachLength, appendLength, subParams.size())) { + setGraphDbElement(element, subParams, isVertex, firstAdd); + firstAdd = false; + subParams.clear(); + requestLength = idLength; + } - String dsl; - if (isVertex) { - dsl = (firstAdd ? ADD_V_START : UPDATE_V_START) + dslPropertyPart; - } else { - dsl = (firstAdd ? ADD_E_START : UPDATE_E_START) + dslPropertyPart; - if (firstAdd) { - params.put(VAR_FROM, ((GdbEdge)element).getFrom()); - params.put(VAR_TO, ((GdbEdge)element).getTo()); - } - } + requestLength += appendLength; + subParams.add(entry); + } + if (!subParams.isEmpty() || firstAdd) { + checkSplitDsl(firstAdd, requestLength, attachLength, 0, 0); + setGraphDbElement(element, subParams, isVertex, firstAdd); + } + } - int index = 0; - for (Map.Entry entry : properties.entrySet()) { - params.put(VAR_PROP_KEY+index, entry.getKey()); - params.put(VAR_PROP_VALUE+index, entry.getValue()); - index++; - } + private boolean checkSplitDsl(final boolean firstAdd, final int requestLength, final int attachLength, final int appendLength, + final int propNum) { + final int length = firstAdd ? requestLength + attachLength : requestLength; + if (length > this.maxRequestLength) { + throw new IllegalArgumentException("request length over limit(" + this.maxRequestLength + ")"); + } + return length + appendLength > this.maxRequestLength || propNum >= this.propertiesBatchNum; + } - if (firstAdd) { - params.put(VAR_LABEL, element.getLabel()); - } - params.put(VAR_ID, element.getId()); + private Tuple2> buildDsl(final GdbElement element, final List properties, + final boolean isVertex, final boolean firstAdd) { + final Map params = new HashMap<>(); + final StringBuilder sb = new StringBuilder(); + if (isVertex) { + sb.append(firstAdd ? ADD_V_START : UPDATE_V_START); + } else { + sb.append(firstAdd ? ADD_E_START : UPDATE_E_START); + } - return new Tuple2<>(dsl, params); - } + for (int i = 0; i < properties.size(); i++) { + final GdbElement.GdbProperty prop = properties.get(i); - private void setGraphDbElement(GdbElement element, Map properties, - boolean isVertex, boolean firstAdd) throws GdbDuplicateIdException { - int retry = 10; - int idleTime = random.nextInt(10) + 10; - Tuple2> elementDsl = buildDsl(element, properties, isVertex, firstAdd); + sb.append(".property("); + if (prop.getCardinality() == Key.PropertyType.set) { + sb.append("set, "); + } + sb.append(VAR_PROP_KEY).append(i).append(", ").append(VAR_PROP_VALUE).append(i).append(")"); - while (retry > 0) { - try { - runInternal(elementDsl.getFirst(), elementDsl.getSecond()); - log.debug("AddElement {}", element.getId()); - return; - } catch (Exception e) { - String cause = e.getCause() == null ? "" : e.getCause().toString(); - if (cause.contains("rejected from")) { - retry--; - try { - Thread.sleep(idleTime); - } catch (InterruptedException e1) { - // ... 
- } - idleTime = Math.min(idleTime * 2, 2000); - continue; - } else if (firstAdd && cause.contains("GraphDB id exists")) { - throw new GdbDuplicateIdException(e); - } - log.error("Add Failed id {}, dsl {}, params {}, e {}", element.getId(), - elementDsl.getFirst(), elementDsl.getSecond(), e); - throw new RuntimeException(e); - } - } - log.error("Add Failed id {}, dsl {}, params {}", element.getId(), - elementDsl.getFirst(), elementDsl.getSecond()); - throw new RuntimeException("failed to queue new element to server"); - } + params.put(VAR_PROP_KEY + i, prop.getKey()); + params.put(VAR_PROP_VALUE + i, prop.getValue()); + } + + if (firstAdd) { + params.put(VAR_LABEL, element.getLabel()); + if (!isVertex) { + params.put(VAR_FROM, ((GdbEdge)element).getFrom()); + params.put(VAR_TO, ((GdbEdge)element).getTo()); + } + } + params.put(VAR_ID, element.getId()); + + return new Tuple2<>(sb.toString(), params); + } + + private void setGraphDbElement(final GdbElement element, final List properties, final boolean isVertex, + final boolean firstAdd) throws GdbDuplicateIdException { + int retry = 10; + int idleTime = this.random.nextInt(10) + 10; + final Tuple2> elementDsl = buildDsl(element, properties, isVertex, firstAdd); + + while (retry > 0) { + try { + runInternal(elementDsl.getFirst(), elementDsl.getSecond()); + log.debug("AddElement {}", element.getId()); + return; + } catch (final Exception e) { + final String cause = e.getCause() == null ? "" : e.getCause().toString(); + if (cause.contains("rejected from") || cause.contains("Timeout waiting to lock key")) { + retry--; + try { + Thread.sleep(idleTime); + } catch (final InterruptedException e1) { + // ... + } + idleTime = Math.min(idleTime * 2, 2000); + continue; + } else if (firstAdd && cause.contains("GraphDB id exists")) { + throw new GdbDuplicateIdException(e); + } + log.error("Add Failed id {}, dsl {}, params {}, e {}", element.getId(), elementDsl.getFirst(), + elementDsl.getSecond(), e); + throw new RuntimeException(e); + } + } + log.error("Add Failed id {}, dsl {}, params {}", element.getId(), elementDsl.getFirst(), + elementDsl.getSecond()); + throw new RuntimeException("failed to queue new element to server"); + } } diff --git a/gdbwriter/src/main/java/com/alibaba/datax/plugin/writer/gdbwriter/util/ConfigHelper.java b/gdbwriter/src/main/java/com/alibaba/datax/plugin/writer/gdbwriter/util/ConfigHelper.java index 77175197..178b5e7c 100644 --- a/gdbwriter/src/main/java/com/alibaba/datax/plugin/writer/gdbwriter/util/ConfigHelper.java +++ b/gdbwriter/src/main/java/com/alibaba/datax/plugin/writer/gdbwriter/util/ConfigHelper.java @@ -7,53 +7,57 @@ import java.io.IOException; import java.io.InputStream; import java.util.function.Supplier; +import org.apache.commons.lang3.StringUtils; + import com.alibaba.datax.common.exception.DataXException; import com.alibaba.datax.common.util.Configuration; import com.alibaba.datax.plugin.writer.gdbwriter.GdbWriterErrorCode; import com.alibaba.fastjson.JSON; import com.alibaba.fastjson.JSONObject; -import org.apache.commons.lang3.StringUtils; - /** * @author jerrywang * */ public interface ConfigHelper { - static void assertConfig(String key, Supplier f) { - if (!f.get()) { - throw DataXException.asDataXException(GdbWriterErrorCode.BAD_CONFIG_VALUE, key); - } - } + static void assertConfig(final String key, final Supplier f) { + if (!f.get()) { + throw DataXException.asDataXException(GdbWriterErrorCode.BAD_CONFIG_VALUE, key); + } + } - static void assertHasContent(Configuration config, String key) { - 
assertConfig(key, () -> StringUtils.isNotBlank(config.getString(key))); - } + static void assertHasContent(final Configuration config, final String key) { + assertConfig(key, () -> StringUtils.isNotBlank(config.getString(key))); + } - /** - * NOTE: {@code Configuration::get(String, Class)} doesn't work. - * - * @param conf Configuration - * @param key key path to configuration - * @param cls Class of result type - * @return the target configuration object of type T - */ - static T getConfig(Configuration conf, String key, Class cls) { - JSONObject j = (JSONObject) conf.get(key); - return JSON.toJavaObject(j, cls); - } - - /** - * Create a configuration from the specified file on the classpath. - * - * @param name file name - * @return Configuration instance. - */ - static Configuration fromClasspath(String name) { - try (InputStream is = Thread.currentThread().getContextClassLoader().getResourceAsStream(name)) { - return Configuration.from(is); - } catch (IOException e) { - throw new IllegalArgumentException("File not found: " + name); - } - } + /** + * NOTE: {@code Configuration::get(String, Class)} doesn't work. + * + * @param conf + * Configuration + * @param key + * key path to configuration + * @param cls + * Class of result type + * @return the target configuration object of type T + */ + static T getConfig(final Configuration conf, final String key, final Class cls) { + final JSONObject j = (JSONObject)conf.get(key); + return JSON.toJavaObject(j, cls); + } + + /** + * Create a configuration from the specified file on the classpath. + * + * @param name + * file name + * @return Configuration instance. + */ + static Configuration fromClasspath(final String name) { + try (final InputStream is = Thread.currentThread().getContextClassLoader().getResourceAsStream(name)) { + return Configuration.from(is); + } catch (final IOException e) { + throw new IllegalArgumentException("File not found: " + name); + } + } } diff --git a/gdbwriter/src/main/java/com/alibaba/datax/plugin/writer/gdbwriter/util/GdbDuplicateIdException.java b/gdbwriter/src/main/java/com/alibaba/datax/plugin/writer/gdbwriter/util/GdbDuplicateIdException.java index e531d51b..dba641b0 100644 --- a/gdbwriter/src/main/java/com/alibaba/datax/plugin/writer/gdbwriter/util/GdbDuplicateIdException.java +++ b/gdbwriter/src/main/java/com/alibaba/datax/plugin/writer/gdbwriter/util/GdbDuplicateIdException.java @@ -1,9 +1,8 @@ /* - * (C) 2019-present Alibaba Group Holding Limited. + * (C) 2019-present Alibaba Group Holding Limited. * - * This program is free software; you can redistribute it and/or modify - * it under the terms of the GNU General Public License version 2 as - * published by the Free Software Foundation. + * This program is free software; you can redistribute it and/or modify it under the terms of the GNU General Public + * License version 2 as published by the Free Software Foundation. 
 */
 package com.alibaba.datax.plugin.writer.gdbwriter.util;
 
@@ -13,11 +12,11 @@ package com.alibaba.datax.plugin.writer.gdbwriter.util;
  */
 public class GdbDuplicateIdException extends Exception {
-    public GdbDuplicateIdException(Exception e) {
-        super(e);
-    }
+    public GdbDuplicateIdException(Exception e) {
+        super(e);
+    }
 
-    public GdbDuplicateIdException() {
-        super();
-    }
+    public GdbDuplicateIdException() {
+        super();
+    }
 }
diff --git a/hbase11xsqlreader/src/main/java/com/alibaba/datax/plugin/reader/hbase11xsqlreader/HbaseSQLHelper.java b/hbase11xsqlreader/src/main/java/com/alibaba/datax/plugin/reader/hbase11xsqlreader/HbaseSQLHelper.java
index 2aacdddf..5309d1d9 100644
--- a/hbase11xsqlreader/src/main/java/com/alibaba/datax/plugin/reader/hbase11xsqlreader/HbaseSQLHelper.java
+++ b/hbase11xsqlreader/src/main/java/com/alibaba/datax/plugin/reader/hbase11xsqlreader/HbaseSQLHelper.java
@@ -34,6 +34,14 @@ import java.util.Map;
 public class HbaseSQLHelper {
     private static final Logger LOG = LoggerFactory.getLogger(HbaseSQLHelper.class);
 
+    static {
+        try {
+            Class.forName("org.apache.phoenix.jdbc.PhoenixDriver");
+        } catch (Throwable t) {
+            throw new RuntimeException("failed to load org.apache.phoenix.jdbc.PhoenixDriver", t);
+        }
+    }
+
     public static org.apache.hadoop.conf.Configuration generatePhoenixConf(HbaseSQLReaderConfig readerConfig) {
         org.apache.hadoop.conf.Configuration conf = new org.apache.hadoop.conf.Configuration();
 
diff --git a/hbase11xwriter/doc/hbase11xwriter.md b/hbase11xwriter/doc/hbase11xwriter.md
index ff20abe9..969f2e47 100644
--- a/hbase11xwriter/doc/hbase11xwriter.md
+++ b/hbase11xwriter/doc/hbase11xwriter.md
@@ -203,19 +203,20 @@ HbaseWriter 插件实现了从向Hbase中写取数据。在底层实现上,Hba
 * 描述:要写入的hbase字段。index:指定该列对应reader端column的索引,从0开始;name:指定hbase表中的列,必须为 列族:列名 的格式;type:指定写入数据类型,用于转换HBase byte[]。配置格式如下:
 
 ```
-"column": [
-  {
-    "index":1,
-    "name": "cf1:q1",
-    "type": "string"
-  },
-  {
-    "index":2,
-    "name": "cf1:q2",
-    "type": "string"
-  }
-  ]
-
+
+  "column": [
+    {
+      "index":1,
+      "name": "cf1:q1",
+      "type": "string"
+    },
+    {
+      "index":2,
+      "name": "cf1:q2",
+      "type": "string"
+    }
+  ]
+
 ```
 
 * 必选:是
@@ -227,17 +228,17 @@ HbaseWriter 插件实现了从向Hbase中写取数据。在底层实现上,Hba * 描述:要写入的hbase的rowkey列。index:指定该列对应reader端column的索引,从0开始,若为常量index为-1;type:指定写入数据类型,用于转换HBase byte[];value:配置常量,常作为多个字段的拼接符。hbasewriter会将rowkeyColumn中所有列按照配置顺序进行拼接作为写入hbase的rowkey,不能全为常量。配置格式如下: ``` -"rowkeyColumn": [ - { - "index":0, - "type":"string" - }, - { - "index":-1, - "type":"string", - "value":"_" - } - ] + "rowkeyColumn": [ + { + "index":0, + "type":"string" + }, + { + "index":-1, + "type":"string", + "value":"_" + } + ] ``` @@ -250,19 +251,19 @@ HbaseWriter 插件实现了从向Hbase中写取数据。在底层实现上,Hba * 描述:指定写入hbase的时间戳。支持:当前时间、指定时间列,指定时间,三者选一。若不配置表示用当前时间。index:指定对应reader端column的索引,从0开始,需保证能转换为long,若是Date类型,会尝试用yyyy-MM-dd HH:mm:ss和yyyy-MM-dd HH:mm:ss SSS去解析;若为指定时间index为-1;value:指定时间的值,long值。配置格式如下: ``` -"versionColumn":{ - "index":1 -} + "versionColumn":{ + "index":1 + } ``` 或者 ``` -"versionColumn":{ - "index":-1, - "value":123456789 -} + "versionColumn":{ + "index":-1, + "value":123456789 + } ``` diff --git a/hbase20xsqlreader/doc/hbase20xsqlreader.md b/hbase20xsqlreader/doc/hbase20xsqlreader.md index 9df020cc..43c42bf7 100644 --- a/hbase20xsqlreader/doc/hbase20xsqlreader.md +++ b/hbase20xsqlreader/doc/hbase20xsqlreader.md @@ -58,7 +58,9 @@ hbase20xsqlreader插件实现了从Phoenix(HBase SQL)读取数据,对应版本 * **queryServerAddress** * 描述:hbase20xsqlreader需要通过Phoenix轻客户端去连接Phoenix QueryServer,因此这里需要填写对应QueryServer地址。 - + 增强版/Lindorm 用户若需透传user, password参数,可以在queryServerAddress后增加对应可选属性. + 格式参考:http://127.0.0.1:8765;user=root;password=root + * 必选:是
* 默认值:无
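
下面给出一段极简的 Java 示意代码(仅用于说明上述 queryServerAddress 可选属性的书写格式,并非 hbase20xsqlreader 插件的实际实现,类名与解析方式均为示例假设),演示如何把形如 `http://127.0.0.1:8765;user=root;password=root` 的地址拆分为 QueryServer 地址和透传属性:

```
import java.util.LinkedHashMap;
import java.util.Map;

public class QueryServerAddressExample {
    public static void main(String[] args) {
        String address = "http://127.0.0.1:8765;user=root;password=root";

        // 第一段为 QueryServer 地址,其余按 key=value 解析为可选属性
        String[] parts = address.split(";");
        String url = parts[0];
        Map<String, String> props = new LinkedHashMap<>();
        for (int i = 1; i < parts.length; i++) {
            int idx = parts[i].indexOf('=');
            if (idx > 0) {
                props.put(parts[i].substring(0, idx), parts[i].substring(idx + 1));
            }
        }

        System.out.println("queryServer url: " + url);   // http://127.0.0.1:8765
        System.out.println("properties: " + props);      // {user=root, password=root}
    }
}
```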
diff --git a/hbase20xsqlreader/pom.xml b/hbase20xsqlreader/pom.xml index ec1c3419..818123f3 100644 --- a/hbase20xsqlreader/pom.xml +++ b/hbase20xsqlreader/pom.xml @@ -14,7 +14,7 @@ jar - 5.1.0-HBase-2.0.0.2 + 5.2.5-HBase-2.x diff --git a/hbase20xsqlwriter/doc/hbase20xsqlwriter.md b/hbase20xsqlwriter/doc/hbase20xsqlwriter.md index 2cc8cb41..e1f4e2f1 100644 --- a/hbase20xsqlwriter/doc/hbase20xsqlwriter.md +++ b/hbase20xsqlwriter/doc/hbase20xsqlwriter.md @@ -120,7 +120,9 @@ HBase20xsqlwriter实现了向hbase中的SQL表(phoenix)批量导入数据的功 * **queryServerAddress** - * 描述:Phoenix QueryServer地址,为必填项,格式:http://${hostName}:${ip},如http://172.16.34.58:8765 + * 描述:Phoenix QueryServer地址,为必填项,格式:http://${hostName}:${ip},如http://172.16.34.58:8765。 + 增强版/Lindorm 用户若需透传user, password参数,可以在queryServerAddress后增加对应可选属性. + 格式参考:http://127.0.0.1:8765;user=root;password=root * 必选:是 * 默认值:无 diff --git a/hbase20xsqlwriter/pom.xml b/hbase20xsqlwriter/pom.xml index 690bc95e..5a2843e1 100644 --- a/hbase20xsqlwriter/pom.xml +++ b/hbase20xsqlwriter/pom.xml @@ -14,7 +14,7 @@ jar - 5.1.0-HBase-2.0.0.2 + 5.2.5-HBase-2.x 1.8 diff --git a/hbase20xsqlwriter/src/main/java/com/alibaba/datax/plugin/writer/hbase20xsqlwriter/HBase20xSQLWriterTask.java b/hbase20xsqlwriter/src/main/java/com/alibaba/datax/plugin/writer/hbase20xsqlwriter/HBase20xSQLWriterTask.java index 43f710b7..481e07df 100644 --- a/hbase20xsqlwriter/src/main/java/com/alibaba/datax/plugin/writer/hbase20xsqlwriter/HBase20xSQLWriterTask.java +++ b/hbase20xsqlwriter/src/main/java/com/alibaba/datax/plugin/writer/hbase20xsqlwriter/HBase20xSQLWriterTask.java @@ -6,12 +6,12 @@ import com.alibaba.datax.common.exception.DataXException; import com.alibaba.datax.common.plugin.RecordReceiver; import com.alibaba.datax.common.plugin.TaskPluginCollector; import com.alibaba.datax.common.util.Configuration; -import com.google.common.collect.Lists; import org.slf4j.Logger; import org.slf4j.LoggerFactory; import java.math.BigDecimal; import java.sql.*; +import java.util.ArrayList; import java.util.Arrays; import java.util.List; @@ -154,7 +154,7 @@ public class HBase20xSQLWriterTask { * 从接收器中获取每条记录,写入Phoenix */ private void writeData(RecordReceiver lineReceiver) throws SQLException { - List buffer = Lists.newArrayListWithExpectedSize(batchSize); + List buffer = new ArrayList(batchSize); Record record = null; while ((record = lineReceiver.getFromReader()) != null) { // 校验列数量是否符合预期 diff --git a/hdfswriter/src/main/java/com/alibaba/datax/plugin/writer/hdfswriter/HdfsWriter.java b/hdfswriter/src/main/java/com/alibaba/datax/plugin/writer/hdfswriter/HdfsWriter.java index 0119be2b..853613a2 100644 --- a/hdfswriter/src/main/java/com/alibaba/datax/plugin/writer/hdfswriter/HdfsWriter.java +++ b/hdfswriter/src/main/java/com/alibaba/datax/plugin/writer/hdfswriter/HdfsWriter.java @@ -81,10 +81,10 @@ public class HdfsWriter extends Writer { //writeMode check this.writeMode = this.writerSliceConfig.getNecessaryValue(Key.WRITE_MODE, HdfsWriterErrorCode.REQUIRED_VALUE); writeMode = writeMode.toLowerCase().trim(); - Set supportedWriteModes = Sets.newHashSet("append", "nonconflict"); + Set supportedWriteModes = Sets.newHashSet("append", "nonconflict", "truncate"); if (!supportedWriteModes.contains(writeMode)) { throw DataXException.asDataXException(HdfsWriterErrorCode.ILLEGAL_VALUE, - String.format("仅支持append, nonConflict两种模式, 不支持您配置的 writeMode 模式 : [%s]", + String.format("仅支持append, nonConflict, truncate三种模式, 不支持您配置的 writeMode 模式 : [%s]", writeMode)); } this.writerSliceConfig.set(Key.WRITE_MODE, writeMode); @@ -179,6 
+179,9 @@ public class HdfsWriter extends Writer { LOG.error(String.format("冲突文件列表为: [%s]", StringUtils.join(allFiles, ","))); throw DataXException.asDataXException(HdfsWriterErrorCode.ILLEGAL_VALUE, String.format("由于您配置了writeMode nonConflict,但您配置的path: [%s] 目录不为空, 下面存在其他文件或文件夹.", path)); + }else if ("truncate".equalsIgnoreCase(writeMode) && isExistFile) { + LOG.info(String.format("由于您配置了writeMode truncate, [%s] 下面的内容将被覆盖重写", path)); + hdfsHelper.deleteFiles(existFilePaths); } }else{ throw DataXException.asDataXException(HdfsWriterErrorCode.ILLEGAL_VALUE, diff --git a/images/DataX开源用户交流群.jpg b/images/DataX开源用户交流群.jpg new file mode 100644 index 00000000..28155f38 Binary files /dev/null and b/images/DataX开源用户交流群.jpg differ diff --git a/images/DataX开源用户交流群2.jpg b/images/DataX开源用户交流群2.jpg new file mode 100644 index 00000000..129add96 Binary files /dev/null and b/images/DataX开源用户交流群2.jpg differ diff --git a/images/DataX开源用户交流群3.jpg b/images/DataX开源用户交流群3.jpg new file mode 100644 index 00000000..923061e8 Binary files /dev/null and b/images/DataX开源用户交流群3.jpg differ diff --git a/images/DataX开源用户交流群4.jpg b/images/DataX开源用户交流群4.jpg new file mode 100644 index 00000000..f0ce8db6 Binary files /dev/null and b/images/DataX开源用户交流群4.jpg differ diff --git a/images/DataX开源用户交流群5.jpg b/images/DataX开源用户交流群5.jpg new file mode 100644 index 00000000..1f3f7f95 Binary files /dev/null and b/images/DataX开源用户交流群5.jpg differ diff --git a/introduction.md b/introduction.md index b27607c7..d08ad98d 100644 --- a/introduction.md +++ b/introduction.md @@ -36,6 +36,7 @@ DataX本身作为离线数据同步框架,采用Framework + plugin架构构建 | ------------ | ---------- | :-------: | :-------: |:-------: | | RDBMS 关系型数据库 | MySQL | √ | √ |[读](https://github.com/alibaba/DataX/blob/master/mysqlreader/doc/mysqlreader.md) 、[写](https://github.com/alibaba/DataX/blob/master/mysqlwriter/doc/mysqlwriter.md)| |             | Oracle     |     √     |     √     |[读](https://github.com/alibaba/DataX/blob/master/oraclereader/doc/oraclereader.md) 、[写](https://github.com/alibaba/DataX/blob/master/oraclewriter/doc/oraclewriter.md)| +|             | OceanBase  |     √     |     √     |[读](https://open.oceanbase.com/docs/community/oceanbase-database/V3.1.0/use-datax-to-full-migration-data-to-oceanbase) 、[写](https://open.oceanbase.com/docs/community/oceanbase-database/V3.1.0/use-datax-to-full-migration-data-to-oceanbase)| | | SQLServer | √ | √ |[读](https://github.com/alibaba/DataX/blob/master/sqlserverreader/doc/sqlserverreader.md) 、[写](https://github.com/alibaba/DataX/blob/master/sqlserverwriter/doc/sqlserverwriter.md)| | | PostgreSQL | √ | √ |[读](https://github.com/alibaba/DataX/blob/master/postgresqlreader/doc/postgresqlreader.md) 、[写](https://github.com/alibaba/DataX/blob/master/postgresqlwriter/doc/postgresqlwriter.md)| | | DRDS | √ | √ |[读](https://github.com/alibaba/DataX/blob/master/drdsreader/doc/drdsreader.md) 、[写](https://github.com/alibaba/DataX/blob/master/drdswriter/doc/drdswriter.md)| diff --git a/kingbaseesreader/doc/kingbaseesreader.md b/kingbaseesreader/doc/kingbaseesreader.md new file mode 100644 index 00000000..ec5495a6 --- /dev/null +++ b/kingbaseesreader/doc/kingbaseesreader.md @@ -0,0 +1,241 @@ + +# KingbaseesReader 插件文档 + + +___ + + +## 1 快速介绍 + +KingbaseesReader插件实现了从KingbaseES读取数据。在底层实现上,KingbaseesReader通过JDBC连接远程KingbaseES数据库,并执行相应的sql语句将数据从KingbaseES库中SELECT出来。 + +## 2 实现原理 + +简而言之,KingbaseesReader通过JDBC连接器连接到远程的KingbaseES数据库,并根据用户配置的信息生成查询SELECT SQL语句并发送到远程KingbaseES数据库,并将该SQL执行返回结果使用DataX自定义的数据类型拼装为抽象的数据集,并传递给下游Writer处理。 + 
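+下面用一段极简的 Java 示意代码说明"根据 table、column、where 配置拼接 SELECT SQL"的含义(仅为示例,并非 KingbaseesReader 插件的实际实现,类名与变量均为假设):
+
+```
+public class BuildSelectSqlExample {
+    public static void main(String[] args) {
+        String table = "db_info";
+        String[] column = {"db_id", "on_line_flag"};
+        String where = "db_id < 10";
+
+        String sql = "SELECT " + String.join(",", column) + " FROM " + table
+                + (where == null || where.isEmpty() ? "" : " WHERE " + where);
+
+        // 输出:SELECT db_id,on_line_flag FROM db_info WHERE db_id < 10
+        System.out.println(sql);
+    }
+}
+```
+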
+对于用户配置Table、Column、Where的信息,KingbaseesReader将其拼接为SQL语句发送到KingbaseES数据库;对于用户配置querySql信息,KingbaseesReader直接将其发送到KingbaseES数据库。 + + +## 3 功能说明 + +### 3.1 配置样例 + +* 配置一个从KingbaseES数据库同步抽取数据到本地的作业: + +``` +{ + "job": { + "setting": { + "speed": { + //设置传输速度,单位为byte/s,DataX运行会尽可能达到该速度但是不超过它. + "byte": 1048576 + }, + //出错限制 + "errorLimit": { + //出错的record条数上限,当大于该值即报错。 + "record": 0, + //出错的record百分比上限 1.0表示100%,0.02表示2% + "percentage": 0.02 + } + }, + "content": [ + { + "reader": { + "name": "kingbaseesreader", + "parameter": { + // 数据库连接用户名 + "username": "xx", + // 数据库连接密码 + "password": "xx", + "column": [ + "id","name" + ], + //切分主键 + "splitPk": "id", + "connection": [ + { + "table": [ + "table" + ], + "jdbcUrl": [ + "jdbc:kingbase8://host:port/database" + ] + } + ] + } + }, + "writer": { + //writer类型 + "name": "streamwriter", + //是否打印内容 + "parameter": { + "print":true, + } + } + } + ] + } +} + +``` + +* 配置一个自定义SQL的数据库同步任务到本地内容的作业: + +``` +{ + "job": { + "setting": { + "speed": 1048576 + }, + "content": [ + { + "reader": { + "name": "kingbaseesreader", + "parameter": { + "username": "xx", + "password": "xx", + "where": "", + "connection": [ + { + "querySql": [ + "select db_id,on_line_flag from db_info where db_id < 10;" + ], + "jdbcUrl": [ + "jdbc:kingbase8://host:port/database", "jdbc:kingbase8://host:port/database" + ] + } + ] + } + }, + "writer": { + "name": "streamwriter", + "parameter": { + "print": false, + "encoding": "UTF-8" + } + } + } + ] + } +} +``` + + +### 3.2 参数说明 + +* **jdbcUrl** + + * 描述:描述的是到对端数据库的JDBC连接信息,使用JSON的数组描述,并支持一个库填写多个连接地址。之所以使用JSON数组描述连接信息,是因为阿里集团内部支持多个IP探测,如果配置了多个,KingbaseesReader可以依次探测ip的可连接性,直到选择一个合法的IP。如果全部连接失败,KingbaseesReader报错。 注意,jdbcUrl必须包含在connection配置单元中。对于阿里集团外部使用情况,JSON数组填写一个JDBC连接即可。 + + jdbcUrl按照KingbaseES官方规范,并可以填写连接附件控制信息。具体请参看[KingbaseES官方文档](https://help.kingbase.com.cn/doc-view-5683.html)。 + + * 必选:是
+ + * 默认值:无
+ +* **username** + + * 描述:数据源的用户名
+ + * 必选:是
+ + * 默认值:无
+ +* **password** + + * 描述:数据源指定用户名的密码
+ + * 必选:是
+ + * 默认值:无
+ +* **table** + + * 描述:所选取的需要同步的表。使用JSON的数组描述,因此支持多张表同时抽取。当配置为多张表时,用户自己需保证多张表是同一schema结构,KingbaseesReader不予检查表是否同一逻辑表。注意,table必须包含在connection配置单元中。
+ + * 必选:是
+ + * 默认值:无
+ +* **column** + + * 描述:所配置的表中需要同步的列名集合,使用JSON的数组描述字段信息。用户使用\*代表默认使用所有列配置,例如['\*']。 + + 支持列裁剪,即列可以挑选部分列进行导出。 + + 支持列换序,即列可以不按照表schema信息进行导出。 + + 支持常量配置,用户需要按照KingbaseES语法格式: + ["id", "'hello'::varchar", "true", "2.5::real", "power(2,3)"] + id为普通列名,'hello'::varchar为字符串常量,true为布尔值,2.5为浮点数, power(2,3)为函数。 + + **column必须用户显式指定同步的列集合,不允许为空!**
+ + * 默认值:无
+ +* **splitPk** + + * 描述:KingbaseesReader进行数据抽取时,如果指定splitPk,表示用户希望使用splitPk代表的字段进行数据分片,DataX因此会启动并发任务进行数据同步,这样可以大大提供数据同步的效能。 + + 推荐splitPk用户使用表主键,因为表主键通常情况下比较均匀,因此切分出来的分片也不容易出现数据热点。 + + 目前splitPk仅支持整形数据切分,`不支持浮点、字符串型、日期等其他类型`。如果用户指定其他非支持类型,KingbaseesReader将报错! + + splitPk设置为空,底层将视作用户不允许对单表进行切分,因此使用单通道进行抽取。 + + * 必选:否
+ + * 默认值:空
+ +* **where** + + * 描述:筛选条件,KingbaseesReader根据指定的column、table、where条件拼接SQL,并根据这个SQL进行数据抽取。在实际业务场景中,往往会选择当天的数据进行同步,可以将where条件指定为gmt_create > $bizdate 。注意:不可以将where条件指定为limit 10,limit不是SQL的合法where子句。
+ + where条件可以有效地进行业务增量同步。 where条件不配置或者为空,视作全表同步数据。 + + * 必选:否
+ + * 默认值:无
+ +* **querySql** + + * 描述:在有些业务场景下,where这一配置项不足以描述所筛选的条件,用户可以通过该配置型来自定义筛选SQL。当用户配置了这一项之后,DataX系统就会忽略table,column这些配置型,直接使用这个配置项的内容对数据进行筛选,例如需要进行多表join后同步数据,使用select a,b from table_a join table_b on table_a.id = table_b.id
+ + `当用户配置querySql时,KingbaseesReader直接忽略table、column、where条件的配置`。 + + * 必选:否
+ + * 默认值:无
+ +* **fetchSize** + + * 描述:该配置项定义了插件和数据库服务器端每次批量数据获取条数,该值决定了DataX和服务器端的网络交互次数,能够较大的提升数据抽取性能。
+ + `注意,该值过大(>2048)可能造成DataX进程OOM。`。 + + * 必选:否
+ + * 默认值:1024
+ + +### 3.3 类型转换 + +目前KingbaseesReader支持大部分KingbaseES类型,但也存在部分个别类型没有支持的情况,请注意检查你的类型。 + +下面列出KingbaseesReader针对KingbaseES类型转换列表: + + +| DataX 内部类型| KingbaseES 数据类型 | +| -------- | ----- | +| Long |bigint, bigserial, integer, smallint, serial | +| Double |double precision, money, numeric, real | +| String |varchar, char, text, bit, inet| +| Date |date, time, timestamp | +| Boolean |bool| +| Bytes |bytea| + +请注意: + +* `除上述罗列字段类型外,其他类型均不支持; money,inet,bit需用户使用a_inet::varchar类似的语法转换`。 \ No newline at end of file diff --git a/kingbaseesreader/pom.xml b/kingbaseesreader/pom.xml new file mode 100644 index 00000000..6e844c10 --- /dev/null +++ b/kingbaseesreader/pom.xml @@ -0,0 +1,88 @@ + + + + datax-all + com.alibaba.datax + 0.0.1-SNAPSHOT + + 4.0.0 + + kingbaseesreader + kingbaseesreader + jar + + + + com.alibaba.datax + datax-common + ${datax-project-version} + + + slf4j-log4j12 + org.slf4j + + + + + + org.slf4j + slf4j-api + + + + ch.qos.logback + logback-classic + + + + com.alibaba.datax + plugin-rdbms-util + ${datax-project-version} + + + + com.kingbase8 + kingbase8 + 8.2.0 + system + ${basedir}/src/main/libs/kingbase8-8.2.0.jar + + + + + + + + + maven-compiler-plugin + + ${jdk-version} + ${jdk-version} + ${project-sourceEncoding} + + + + + maven-assembly-plugin + + + src/main/assembly/package.xml + + datax + + + + dwzip + package + + single + + + + + + + + diff --git a/kingbaseesreader/src/main/assembly/package.xml b/kingbaseesreader/src/main/assembly/package.xml new file mode 100644 index 00000000..e369c5f0 --- /dev/null +++ b/kingbaseesreader/src/main/assembly/package.xml @@ -0,0 +1,42 @@ + + + + dir + + false + + + src/main/resources + + plugin.json + plugin_job_template.json + + plugin/reader/kingbaseesreader + + + target/ + + kingbaseesreader-0.0.1-SNAPSHOT.jar + + plugin/reader/kingbaseesreader + + + src/main/libs + + *.* + + plugin/reader/kingbaseesreader/libs + + + + + + false + plugin/reader/kingbaseesreader/libs + runtime + + + diff --git a/kingbaseesreader/src/main/java/com/alibaba/datax/plugin/reader/kingbaseesreader/Constant.java b/kingbaseesreader/src/main/java/com/alibaba/datax/plugin/reader/kingbaseesreader/Constant.java new file mode 100644 index 00000000..bed4c6e6 --- /dev/null +++ b/kingbaseesreader/src/main/java/com/alibaba/datax/plugin/reader/kingbaseesreader/Constant.java @@ -0,0 +1,7 @@ +package com.alibaba.datax.plugin.reader.kingbaseesreader; + +public class Constant { + + public static final int DEFAULT_FETCH_SIZE = 1000; + +} diff --git a/kingbaseesreader/src/main/java/com/alibaba/datax/plugin/reader/kingbaseesreader/KingbaseesReader.java b/kingbaseesreader/src/main/java/com/alibaba/datax/plugin/reader/kingbaseesreader/KingbaseesReader.java new file mode 100644 index 00000000..9246655f --- /dev/null +++ b/kingbaseesreader/src/main/java/com/alibaba/datax/plugin/reader/kingbaseesreader/KingbaseesReader.java @@ -0,0 +1,86 @@ +package com.alibaba.datax.plugin.reader.kingbaseesreader; + +import com.alibaba.datax.common.exception.DataXException; +import com.alibaba.datax.common.plugin.RecordSender; +import com.alibaba.datax.common.spi.Reader; +import com.alibaba.datax.common.util.Configuration; +import com.alibaba.datax.plugin.rdbms.reader.CommonRdbmsReader; +import com.alibaba.datax.plugin.rdbms.util.DBUtilErrorCode; +import com.alibaba.datax.plugin.rdbms.util.DataBaseType; + +import java.util.List; + +public class KingbaseesReader extends Reader { + + private static final DataBaseType DATABASE_TYPE = DataBaseType.KingbaseES; + + public static class Job extends 
Reader.Job { + + private Configuration originalConfig; + private CommonRdbmsReader.Job commonRdbmsReaderMaster; + + @Override + public void init() { + this.originalConfig = super.getPluginJobConf(); + int fetchSize = this.originalConfig.getInt(com.alibaba.datax.plugin.rdbms.reader.Constant.FETCH_SIZE, + Constant.DEFAULT_FETCH_SIZE); + if (fetchSize < 1) { + throw DataXException.asDataXException(DBUtilErrorCode.REQUIRED_VALUE, + String.format("您配置的fetchSize有误,根据DataX的设计,fetchSize : [%d] 设置值不能小于 1.", fetchSize)); + } + this.originalConfig.set(com.alibaba.datax.plugin.rdbms.reader.Constant.FETCH_SIZE, fetchSize); + + this.commonRdbmsReaderMaster = new CommonRdbmsReader.Job(DATABASE_TYPE); + this.commonRdbmsReaderMaster.init(this.originalConfig); + } + + @Override + public List split(int adviceNumber) { + return this.commonRdbmsReaderMaster.split(this.originalConfig, adviceNumber); + } + + @Override + public void post() { + this.commonRdbmsReaderMaster.post(this.originalConfig); + } + + @Override + public void destroy() { + this.commonRdbmsReaderMaster.destroy(this.originalConfig); + } + + } + + public static class Task extends Reader.Task { + + private Configuration readerSliceConfig; + private CommonRdbmsReader.Task commonRdbmsReaderSlave; + + @Override + public void init() { + this.readerSliceConfig = super.getPluginJobConf(); + this.commonRdbmsReaderSlave = new CommonRdbmsReader.Task(DATABASE_TYPE, super.getTaskGroupId(), super.getTaskId()); + this.commonRdbmsReaderSlave.init(this.readerSliceConfig); + } + + @Override + public void startRead(RecordSender recordSender) { + int fetchSize = this.readerSliceConfig.getInt(com.alibaba.datax.plugin.rdbms.reader.Constant.FETCH_SIZE); + + this.commonRdbmsReaderSlave.startRead(this.readerSliceConfig, recordSender, + super.getTaskPluginCollector(), fetchSize); + } + + @Override + public void post() { + this.commonRdbmsReaderSlave.post(this.readerSliceConfig); + } + + @Override + public void destroy() { + this.commonRdbmsReaderSlave.destroy(this.readerSliceConfig); + } + + } + +} diff --git a/kingbaseesreader/src/main/libs/kingbase8-8.2.0.jar b/kingbaseesreader/src/main/libs/kingbase8-8.2.0.jar new file mode 100644 index 00000000..0b5ac964 Binary files /dev/null and b/kingbaseesreader/src/main/libs/kingbase8-8.2.0.jar differ diff --git a/kingbaseesreader/src/main/resources/plugin.json b/kingbaseesreader/src/main/resources/plugin.json new file mode 100644 index 00000000..9bc0684b --- /dev/null +++ b/kingbaseesreader/src/main/resources/plugin.json @@ -0,0 +1,6 @@ +{ + "name": "kingbaseesreader", + "class": "com.alibaba.datax.plugin.reader.kingbaseesreader.KingbaseesReader", + "description": "useScene: prod. mechanism: Jdbc connection using the database, execute select sql, retrieve data from the ResultSet. 
warn: The more you know about the database, the less problems you encounter.", + "developer": "alibaba" +} \ No newline at end of file diff --git a/kingbaseesreader/src/main/resources/plugin_job_template.json b/kingbaseesreader/src/main/resources/plugin_job_template.json new file mode 100644 index 00000000..49c07098 --- /dev/null +++ b/kingbaseesreader/src/main/resources/plugin_job_template.json @@ -0,0 +1,13 @@ +{ + "name": "kingbaseesreader", + "parameter": { + "username": "", + "password": "", + "connection": [ + { + "table": [], + "jdbcUrl": [] + } + ] + } +} \ No newline at end of file diff --git a/kingbaseeswriter/doc/kingbaseeswriter.md b/kingbaseeswriter/doc/kingbaseeswriter.md new file mode 100644 index 00000000..96e1a3ac --- /dev/null +++ b/kingbaseeswriter/doc/kingbaseeswriter.md @@ -0,0 +1,208 @@ +# DataX KingbaseesWriter + + +--- + + +## 1 快速介绍 + +KingbaseesWriter插件实现了写入数据到 KingbaseES主库目的表的功能。在底层实现上,KingbaseesWriter通过JDBC连接远程 KingbaseES 数据库,并执行相应的 insert into ... sql 语句将数据写入 KingbaseES,内部会分批次提交入库。 + +KingbaseesWriter面向ETL开发工程师,他们使用KingbaseesWriter从数仓导入数据到KingbaseES。同时 KingbaseesWriter亦可以作为数据迁移工具为DBA等用户提供服务。 + + +## 2 实现原理 + +KingbaseesWriter通过 DataX 框架获取 Reader 生成的协议数据,根据你配置生成相应的SQL插入语句 + + +* `insert into...`(当主键/唯一性索引冲突时会写不进去冲突的行) + +
+ + 注意: + 1. 目的表所在数据库必须是主库才能写入数据;整个任务至少需具备 insert into...的权限,是否需要其他权限,取决于你任务配置中在 preSql 和 postSql 中指定的语句。 + 2. KingbaseesWriter和MysqlWriter不同,不支持配置writeMode参数。 + + +## 3 功能说明 + +### 3.1 配置样例 + +* 这里使用一份从内存产生到 KingbaseesWriter导入的数据。 + +```json +{ + "job": { + "setting": { + "speed": { + "channel": 1 + } + }, + "content": [ + { + "reader": { + "name": "streamreader", + "parameter": { + "column" : [ + { + "value": "DataX", + "type": "string" + }, + { + "value": 19880808, + "type": "long" + }, + { + "value": "1988-08-08 08:08:08", + "type": "date" + }, + { + "value": true, + "type": "bool" + }, + { + "value": "test", + "type": "bytes" + } + ], + "sliceRecordCount": 1000 + } + }, + "writer": { + "name": "kingbaseeswriter", + "parameter": { + "username": "xx", + "password": "xx", + "column": [ + "id", + "name" + ], + "preSql": [ + "delete from test" + ], + "connection": [ + { + "jdbcUrl": "jdbc:kingbase8://127.0.0.1:3002/datax", + "table": [ + "test" + ] + } + ] + } + } + } + ] + } +} + +``` + + +### 3.2 参数说明 + +* **jdbcUrl** + + * 描述:目的数据库的 JDBC 连接信息 ,jdbcUrl必须包含在connection配置单元中。 + + 注意:1、在一个数据库上只能配置一个值。 + 2、jdbcUrl按照KingbaseES官方规范,并可以填写连接附加参数信息。具体请参看KingbaseES官方文档或者咨询对应 DBA。 + + + * 必选:是
+ + * 默认值:无
+ +* **username** + + * 描述:目的数据库的用户名
+ + * 必选:是
+ + * 默认值:无
+ +* **password** + + * 描述:目的数据库的密码
+ + * 必选:是
+ + * 默认值:无
+ +* **table** + + * 描述:目的表的表名称。支持写入一个或者多个表。当配置为多张表时,必须确保所有表结构保持一致。 + + 注意:table 和 jdbcUrl 必须包含在 connection 配置单元中 + + * 必选:是
+ + * 默认值:无
+ +* **column** + + * 描述:目的表需要写入数据的字段,字段之间用英文逗号分隔。例如: "column": ["id","name","age"]。如果要依次写入全部列,使用\*表示, 例如: "column": ["\*"] + + 注意:1、我们强烈不推荐你这样配置,因为当你目的表字段个数、类型等有改动时,你的任务可能运行不正确或者失败 + 2、此处 column 不能配置任何常量值 + + * 必选:是
+ + * 默认值:无
+ +* **preSql** + + * 描述:写入数据到目的表前,会先执行这里的标准语句。如果 Sql 中有你需要操作到的表名称,请使用 `@table` 表示,这样在实际执行 Sql 语句时,会对变量按照实际表名称进行替换。比如你的任务是要写入到目的端的100个同构分表(表名称为:datax_00,datax_01, ... datax_98,datax_99),并且你希望导入数据前,先对表中数据进行删除操作,那么你可以这样配置:`"preSql":["delete from @table"]`,效果是:在执行到每个表写入数据前,会先执行对应的 delete from 对应表名称
+ + * 必选:否
+ + * 默认值:无
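
下面用一段示意性的 Java 代码说明 `@table` 占位符的展开效果(仅为示意:表名列表与替换方式均为假设写法,实际替换由 DataX 公共写入模块完成):

```java
import java.util.Arrays;
import java.util.List;

// 示意代码:preSql 中的 @table 会按 connection 中配置的每张表分别展开执行
public class PreSqlDemo {
    public static void main(String[] args) {
        List<String> tables = Arrays.asList("datax_00", "datax_01", "datax_99"); // 示例表名
        String preSql = "delete from @table";
        for (String table : tables) {
            // 每张表写入数据前,各执行一次替换后的语句
            System.out.println(preSql.replace("@table", table));
        }
    }
}
```
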
+ +* **postSql** + + * 描述:写入数据到目的表后,会执行这里的标准语句。(原理同 preSql )
+ + * 必选:否
+ + * 默认值:无
+ +* **batchSize** + + * 描述:一次性批量提交的记录数大小,该值可以极大减少DataX与KingbaseES的网络交互次数,并提升整体吞吐量。但是该值设置过大可能会造成DataX运行进程OOM情况。
+ + * 必选:否
+ + * 默认值:1024
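
为帮助理解 batchSize 的作用,下面补充一段示意性的 JDBC 批量写入代码:jdbcUrl、表名、字段与账号沿用 3.1 配置样例中的示例值,攒批逻辑为简化后的假设写法,KingbaseesWriter 实际通过 plugin-rdbms-util 的 CommonRdbmsWriter 完成写入,请以源码为准。

```java
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.PreparedStatement;

// 示意代码:按 batchSize 攒批提交,减少与数据库的网络交互次数
public class BatchInsertDemo {
    public static void main(String[] args) throws Exception {
        int batchSize = 1024; // 对应上文的 batchSize 配置
        try (Connection conn = DriverManager.getConnection(
                "jdbc:kingbase8://127.0.0.1:3002/datax", "xx", "xx");
             PreparedStatement ps = conn.prepareStatement(
                     "INSERT INTO test (id, name) VALUES (?, ?)")) {
            conn.setAutoCommit(false);
            for (int i = 0; i < 10000; i++) {
                ps.setInt(1, i);
                ps.setString(2, "name_" + i);
                ps.addBatch();
                if ((i + 1) % batchSize == 0) {
                    ps.executeBatch(); // 每攒满 batchSize 条才真正发送一次
                    conn.commit();
                }
            }
            ps.executeBatch();         // 提交不足一批的剩余数据
            conn.commit();
        }
    }
}
```

批次越大,网络往返越少,但单批占用的内存也越多,这正是上文提示 batchSize 设置过大可能导致 DataX 进程 OOM 的原因。
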
+ +### 3.3 类型转换 + +目前 KingbaseesWriter支持大部分 KingbaseES类型,但也存在部分没有支持的情况,请注意检查你的类型。 + +下面列出 KingbaseesWriter针对 KingbaseES类型转换列表: + +| DataX 内部类型| KingbaseES 数据类型 | +| -------- | ----- | +| Long |bigint, bigserial, integer, smallint, serial | +| Double |double precision, money, numeric, real | +| String |varchar, char, text, bit| +| Date |date, time, timestamp | +| Boolean |bool| +| Bytes |bytea| + + +## FAQ + +*** + +**Q: KingbaseesWriter 执行 postSql 语句报错,那么数据导入到目标数据库了吗?** + +A: DataX 导入过程存在三块逻辑,pre 操作、导入操作、post 操作,其中任意一环报错,DataX 作业报错。由于 DataX 不能保证在同一个事务完成上述几个操作,因此有可能数据已经落入到目标端。 + +*** + +**Q: 按照上述说法,那么有部分脏数据导入数据库,如果影响到线上数据库怎么办?** + +A: 目前有两种解法,第一种配置 pre 语句,该 sql 可以清理当天导入数据, DataX 每次导入时候可以把上次清理干净并导入完整数据。 +第二种,向临时表导入数据,完成后再 rename 到线上表。 + +*** diff --git a/kingbaseeswriter/pom.xml b/kingbaseeswriter/pom.xml new file mode 100644 index 00000000..284c8c5e --- /dev/null +++ b/kingbaseeswriter/pom.xml @@ -0,0 +1,84 @@ + + 4.0.0 + + com.alibaba.datax + datax-all + 0.0.1-SNAPSHOT + + kingbaseeswriter + kingbaseeswriter + jar + writer data into kingbasees database + + + + com.alibaba.datax + datax-common + ${datax-project-version} + + + slf4j-log4j12 + org.slf4j + + + + + + org.slf4j + slf4j-api + + + + ch.qos.logback + logback-classic + + + + com.alibaba.datax + plugin-rdbms-util + ${datax-project-version} + + + + com.kingbase8 + kingbase8 + 8.2.0 + system + ${basedir}/src/main/libs/kingbase8-8.2.0.jar + + + + + + + + maven-compiler-plugin + + ${jdk-version} + ${jdk-version} + ${project-sourceEncoding} + + + + + maven-assembly-plugin + + + src/main/assembly/package.xml + + datax + + + + dwzip + package + + single + + + + + + + diff --git a/kingbaseeswriter/src/main/assembly/package.xml b/kingbaseeswriter/src/main/assembly/package.xml new file mode 100644 index 00000000..aa78a6ec --- /dev/null +++ b/kingbaseeswriter/src/main/assembly/package.xml @@ -0,0 +1,42 @@ + + + + dir + + false + + + src/main/resources + + plugin.json + plugin_job_template.json + + plugin/writer/kingbaseeswriter + + + target/ + + kingbaseeswriter-0.0.1-SNAPSHOT.jar + + plugin/writer/kingbaseeswriter + + + src/main/libs + + *.* + + plugin/writer/kingbaseeswriter/libs + + + + + + false + plugin/writer/kingbaseeswriter/libs + runtime + + + diff --git a/kingbaseeswriter/src/main/java/com/alibaba/datax/plugin/writer/kingbaseeswriter/KingbaseesWriter.java b/kingbaseeswriter/src/main/java/com/alibaba/datax/plugin/writer/kingbaseeswriter/KingbaseesWriter.java new file mode 100644 index 00000000..dec5ff95 --- /dev/null +++ b/kingbaseeswriter/src/main/java/com/alibaba/datax/plugin/writer/kingbaseeswriter/KingbaseesWriter.java @@ -0,0 +1,100 @@ +package com.alibaba.datax.plugin.writer.kingbaseeswriter; + +import com.alibaba.datax.common.exception.DataXException; +import com.alibaba.datax.common.plugin.RecordReceiver; +import com.alibaba.datax.common.spi.Writer; +import com.alibaba.datax.common.util.Configuration; +import com.alibaba.datax.plugin.rdbms.util.DBUtilErrorCode; +import com.alibaba.datax.plugin.rdbms.util.DataBaseType; +import com.alibaba.datax.plugin.rdbms.writer.CommonRdbmsWriter; +import com.alibaba.datax.plugin.rdbms.writer.Key; + +import java.util.List; + +public class KingbaseesWriter extends Writer { + private static final DataBaseType DATABASE_TYPE = DataBaseType.KingbaseES; + + public static class Job extends Writer.Job { + private Configuration originalConfig = null; + private CommonRdbmsWriter.Job commonRdbmsWriterMaster; + + @Override + public void init() { + this.originalConfig = super.getPluginJobConf(); + + // 
warn:not like mysql, KingbaseES only support insert mode, don't use + String writeMode = this.originalConfig.getString(Key.WRITE_MODE); + if (null != writeMode) { + throw DataXException.asDataXException(DBUtilErrorCode.CONF_ERROR, + String.format("写入模式(writeMode)配置有误. 因为KingbaseES不支持配置参数项 writeMode: %s, KingbaseES仅使用insert sql 插入数据. 请检查您的配置并作出修改.", writeMode)); + } + + this.commonRdbmsWriterMaster = new CommonRdbmsWriter.Job(DATABASE_TYPE); + this.commonRdbmsWriterMaster.init(this.originalConfig); + } + + @Override + public void prepare() { + this.commonRdbmsWriterMaster.prepare(this.originalConfig); + } + + @Override + public List split(int mandatoryNumber) { + return this.commonRdbmsWriterMaster.split(this.originalConfig, mandatoryNumber); + } + + @Override + public void post() { + this.commonRdbmsWriterMaster.post(this.originalConfig); + } + + @Override + public void destroy() { + this.commonRdbmsWriterMaster.destroy(this.originalConfig); + } + + } + + public static class Task extends Writer.Task { + private Configuration writerSliceConfig; + private CommonRdbmsWriter.Task commonRdbmsWriterSlave; + + @Override + public void init() { + this.writerSliceConfig = super.getPluginJobConf(); + this.commonRdbmsWriterSlave = new CommonRdbmsWriter.Task(DATABASE_TYPE){ + @Override + public String calcValueHolder(String columnType){ + if("serial".equalsIgnoreCase(columnType)){ + return "?::int"; + }else if("bit".equalsIgnoreCase(columnType)){ + return "?::bit varying"; + } + return "?::" + columnType; + } + }; + this.commonRdbmsWriterSlave.init(this.writerSliceConfig); + } + + @Override + public void prepare() { + this.commonRdbmsWriterSlave.prepare(this.writerSliceConfig); + } + + public void startWrite(RecordReceiver recordReceiver) { + this.commonRdbmsWriterSlave.startWrite(recordReceiver, this.writerSliceConfig, super.getTaskPluginCollector()); + } + + @Override + public void post() { + this.commonRdbmsWriterSlave.post(this.writerSliceConfig); + } + + @Override + public void destroy() { + this.commonRdbmsWriterSlave.destroy(this.writerSliceConfig); + } + + } + +} diff --git a/kingbaseeswriter/src/main/libs/kingbase8-8.2.0.jar b/kingbaseeswriter/src/main/libs/kingbase8-8.2.0.jar new file mode 100644 index 00000000..0b5ac964 Binary files /dev/null and b/kingbaseeswriter/src/main/libs/kingbase8-8.2.0.jar differ diff --git a/kingbaseeswriter/src/main/resources/plugin.json b/kingbaseeswriter/src/main/resources/plugin.json new file mode 100644 index 00000000..83517760 --- /dev/null +++ b/kingbaseeswriter/src/main/resources/plugin.json @@ -0,0 +1,6 @@ +{ + "name": "kingbaseeswriter", + "class": "com.alibaba.datax.plugin.writer.kingbaseeswriter.KingbaseesWriter", + "description": "useScene: prod. mechanism: Jdbc connection using the database, execute insert sql. 
warn: The more you know about the database, the less problems you encounter.", + "developer": "alibaba" +} \ No newline at end of file diff --git a/kingbaseeswriter/src/main/resources/plugin_job_template.json b/kingbaseeswriter/src/main/resources/plugin_job_template.json new file mode 100644 index 00000000..94b66168 --- /dev/null +++ b/kingbaseeswriter/src/main/resources/plugin_job_template.json @@ -0,0 +1,17 @@ +{ + "name": "kingbaseeswriter", + "parameter": { + "username": "", + "password": "", + "column": [], + "preSql": [], + "connection": [ + { + "jdbcUrl": "", + "table": [] + } + ], + "preSql": [], + "postSql": [] + } +} \ No newline at end of file diff --git a/kuduwriter/README.md b/kuduwriter/README.md new file mode 100644 index 00000000..f53de1b5 --- /dev/null +++ b/kuduwriter/README.md @@ -0,0 +1,6 @@ +# datax-kudu-plugin +datax kudu的writer插件 + + + +仅在kudu11进行过测试 diff --git a/kuduwriter/doc/image-20200901193148188.png b/kuduwriter/doc/image-20200901193148188.png new file mode 100644 index 00000000..7e7b8a1f Binary files /dev/null and b/kuduwriter/doc/image-20200901193148188.png differ diff --git a/kuduwriter/doc/kuduwirter.md b/kuduwriter/doc/kuduwirter.md new file mode 100644 index 00000000..1a952449 --- /dev/null +++ b/kuduwriter/doc/kuduwirter.md @@ -0,0 +1,143 @@ +# datax-kudu-plugins +datax kudu的writer插件 + + + +eg: + +```json +{ + "name": "kuduwriter", + "parameter": { + "kuduConfig": { + "kudu.master_addresses": "***", + "timeout": 60000, + "sessionTimeout": 60000 + + }, + "table": "", + "replicaCount": 3, + "truncate": false, + "writeMode": "upsert", + "partition": { + "range": { + "column1": [ + { + "lower": "2020-08-25", + "upper": "2020-08-26" + }, + { + "lower": "2020-08-26", + "upper": "2020-08-27" + }, + { + "lower": "2020-08-27", + "upper": "2020-08-28" + } + ] + }, + "hash": { + "column": [ + "column1" + ], + "number": 3 + } + }, + "column": [ + { + "index": 0, + "name": "c1", + "type": "string", + "primaryKey": true + }, + { + "index": 1, + "name": "c2", + "type": "string", + "compress": "DEFAULT_COMPRESSION", + "encoding": "AUTO_ENCODING", + "comment": "注解xxxx" + } + ], + "batchSize": 1024, + "bufferSize": 2048, + "skipFail": false, + "encoding": "UTF-8" + } +} +``` + +必须参数: + +```json + "writer": { + "name": "kuduwriter", + "parameter": { + "kuduConfig": { + "kudu.master_addresses": "***" + }, + "table": "***", + "column": [ + { + "name": "c1", + "type": "string", + "primaryKey": true + }, + { + "name": "c2", + "type": "string", + }, + { + "name": "c3", + "type": "string" + }, + { + "name": "c4", + "type": "string" + } + ] + } + } +``` + +主键列请写到最前面 + + + +![image-20200901193148188](./image-20200901193148188.png) + +##### 配置列表 + +| name | default | description | 是否必须 | +| -------------- | ------------------- | ------------------------------------------------------------ | -------- | +| kuduConfig | | kudu配置 (kudu.master_addresses等) | 是 | +| table | | 导入目标表名 | 是 | +| partition | | 分区 | 否 | +| column | | 列 | 是 | +| name | | 列名 | 是 | +| type | string | 列的类型,现支持INT, FLOAT, STRING, BIGINT, DOUBLE, BOOLEAN, LONG。 | 否 | +| index | 升序排列 | 列索引位置(要么全部列都写,要么都不写),如reader中取到的某一字段在第二位置(eg: name, id, age)但kudu目标表结构不同(eg:id,name, age),此时就需要将index赋值为(1,0,2),默认顺序(0,1,2) | 否 | +| primaryKey | false | 是否为主键(请将所有的主键列写在前面),不表明主键将不会检查过滤脏数据 | 否 | +| compress | DEFAULT_COMPRESSION | 压缩格式 | 否 | +| encoding | AUTO_ENCODING | 编码 | 否 | +| replicaCount | 3 | 保留副本个数 | 否 | +| hash | | hash分区 | 否 | +| number | 3 | hash分区个数 | 否 | +| range | | range分区 | 否 | +| lower | | range分区下限 (eg: 
sql建表:partition value='haha' 对应:“lower”:“haha”,“upper”:“haha\000”) | 否 | +| upper | | range分区上限(eg: sql建表:partition "10" <= VALUES < "20" 对应:“lower”:“10”,“upper”:“20”) | 否 | +| truncate | false | 是否清空表,本质上是删表重建 | 否 | +| writeMode | upsert | upsert,insert,update | 否 | +| batchSize | 512 | 每xx行数据flush一次结果(最好不要超过1024) | 否 | +| bufferSize | 3072 | 缓冲区大小 | 否 | +| skipFail | false | 是否跳过插入不成功的数据 | 否 | +| timeout | 60000 | client超时时间,如创建表,删除表操作的超时时间。单位:ms | 否 | +| sessionTimeout | 60000 | session超时时间 单位:ms | 否 | + + + + + + + + diff --git a/kuduwriter/pom.xml b/kuduwriter/pom.xml new file mode 100644 index 00000000..5d78be4c --- /dev/null +++ b/kuduwriter/pom.xml @@ -0,0 +1,82 @@ + + + + datax-all + com.alibaba.datax + 0.0.1-SNAPSHOT + + 4.0.0 + + kuduwriter + + + com.alibaba.datax + datax-common + ${datax-project-version} + + + slf4j-log4j12 + org.slf4j + + + + + org.apache.kudu + kudu-client + 1.11.1 + + + junit + junit + 4.13.1 + test + + + com.alibaba.datax + datax-core + ${datax-project-version} + + + com.alibaba.datax + datax-service-face + + + test + + + + + + + + maven-compiler-plugin + + ${jdk-version} + ${jdk-version} + ${project-sourceEncoding} + + + + + maven-assembly-plugin + + + src/main/assembly/package.xml + + datax + + + + dwzip + package + + single + + + + + + + \ No newline at end of file diff --git a/kuduwriter/src/main/assembly/package.xml b/kuduwriter/src/main/assembly/package.xml new file mode 100644 index 00000000..c9497b92 --- /dev/null +++ b/kuduwriter/src/main/assembly/package.xml @@ -0,0 +1,35 @@ + + + + dir + + false + + + src/main/resources + + plugin.json + plugin_job_template.json + + plugin/writer/kuduwriter + + + target/ + + kuduwriter-0.0.1-SNAPSHOT.jar + + plugin/writer/kuduwriter + + + + + + false + plugin/writer/kuduwriter/libs + runtime + + + \ No newline at end of file diff --git a/kuduwriter/src/main/java/com/q1/datax/plugin/writer/kudu11xwriter/ColumnType.java b/kuduwriter/src/main/java/com/q1/datax/plugin/writer/kudu11xwriter/ColumnType.java new file mode 100644 index 00000000..ebd6ea79 --- /dev/null +++ b/kuduwriter/src/main/java/com/q1/datax/plugin/writer/kudu11xwriter/ColumnType.java @@ -0,0 +1,37 @@ +package com.q1.datax.plugin.writer.kudu11xwriter; + +import com.alibaba.datax.common.exception.DataXException; + +import java.util.Arrays; + +/** + * @author daizihao + * @create 2020-08-31 19:12 + **/ +public enum ColumnType { + INT("int"), + FLOAT("float"), + STRING("string"), + BIGINT("bigint"), + DOUBLE("double"), + BOOLEAN("boolean"), + LONG("long"); + private String mode; + ColumnType(String mode) { + this.mode = mode.toLowerCase(); + } + + public String getMode() { + return mode; + } + + public static ColumnType getByTypeName(String modeName) { + for (ColumnType modeType : values()) { + if (modeType.mode.equalsIgnoreCase(modeName)) { + return modeType; + } + } + throw DataXException.asDataXException(Kudu11xWriterErrorcode.ILLEGAL_VALUE, + String.format("Kuduwriter does not support the type:%s, currently supported types are:%s", modeName, Arrays.asList(values()))); + } +} diff --git a/kuduwriter/src/main/java/com/q1/datax/plugin/writer/kudu11xwriter/Constant.java b/kuduwriter/src/main/java/com/q1/datax/plugin/writer/kudu11xwriter/Constant.java new file mode 100644 index 00000000..2710e350 --- /dev/null +++ b/kuduwriter/src/main/java/com/q1/datax/plugin/writer/kudu11xwriter/Constant.java @@ -0,0 +1,21 @@ +package com.q1.datax.plugin.writer.kudu11xwriter; + +/** + * @author daizihao + * @create 2020-08-31 14:42 + **/ +public class Constant { + public 
static final String DEFAULT_ENCODING = "UTF-8"; +// public static final String DEFAULT_DATA_FORMAT = "yyyy-MM-dd HH:mm:ss"; + + public static final String COMPRESSION = "DEFAULT_COMPRESSION"; + public static final String ENCODING = "AUTO_ENCODING"; + public static final Long ADMIN_TIMEOUTMS = 60000L; + public static final Long SESSION_TIMEOUTMS = 60000L; + + + public static final String INSERT_MODE = "upsert"; + public static final long DEFAULT_WRITE_BATCH_SIZE = 512L; + public static final long DEFAULT_MUTATION_BUFFER_SPACE = 3072L; + +} diff --git a/kuduwriter/src/main/java/com/q1/datax/plugin/writer/kudu11xwriter/InsertModeType.java b/kuduwriter/src/main/java/com/q1/datax/plugin/writer/kudu11xwriter/InsertModeType.java new file mode 100644 index 00000000..754ca4fc --- /dev/null +++ b/kuduwriter/src/main/java/com/q1/datax/plugin/writer/kudu11xwriter/InsertModeType.java @@ -0,0 +1,34 @@ +package com.q1.datax.plugin.writer.kudu11xwriter; + +import com.alibaba.datax.common.exception.DataXException; + +import java.util.Arrays; + +/** + * @author daizihao + * @create 2020-08-31 14:47 + **/ +public enum InsertModeType { + Insert("insert"), + Upsert("upsert"), + Update("update"); + private String mode; + + InsertModeType(String mode) { + this.mode = mode.toLowerCase(); + } + + public String getMode() { + return mode; + } + + public static InsertModeType getByTypeName(String modeName) { + for (InsertModeType modeType : values()) { + if (modeType.mode.equalsIgnoreCase(modeName)) { + return modeType; + } + } + throw DataXException.asDataXException(Kudu11xWriterErrorcode.ILLEGAL_VALUE, + String.format("Kuduwriter does not support the mode :[%s], currently supported mode types are :%s", modeName, Arrays.asList(values()))); + } +} diff --git a/kuduwriter/src/main/java/com/q1/datax/plugin/writer/kudu11xwriter/Key.java b/kuduwriter/src/main/java/com/q1/datax/plugin/writer/kudu11xwriter/Key.java new file mode 100644 index 00000000..7e5755aa --- /dev/null +++ b/kuduwriter/src/main/java/com/q1/datax/plugin/writer/kudu11xwriter/Key.java @@ -0,0 +1,45 @@ +package com.q1.datax.plugin.writer.kudu11xwriter; + +/** + * @author daizihao + * @create 2020-08-31 14:17 + **/ +public class Key { + public final static String KUDU_CONFIG = "kuduConfig"; + public final static String KUDU_MASTER = "kudu.master_addresses"; + public final static String KUDU_ADMIN_TIMEOUT = "timeout"; + public final static String KUDU_SESSION_TIMEOUT = "sessionTimeout"; + + public final static String TABLE = "table"; + public final static String PARTITION = "partition"; + public final static String COLUMN = "column"; + + public static final String NAME = "name"; + public static final String TYPE = "type"; + public static final String INDEX = "index"; + public static final String PRIMARYKEY = "primaryKey"; + public static final String COMPRESSION = "compress"; + public static final String COMMENT = "comment"; + public final static String ENCODING = "encoding"; + + + + public static final String NUM_REPLICAS = "replicaCount"; + public static final String HASH = "hash"; + public static final String HASH_NUM = "number"; + + public static final String RANGE = "range"; + public static final String LOWER = "lower"; + public static final String UPPER = "upper"; + + + + public static final String TRUNCATE = "truncate"; + + public static final String INSERT_MODE = "writeMode"; + + public static final String WRITE_BATCH_SIZE = "batchSize"; + + public static final String MUTATION_BUFFER_SPACE = "bufferSize"; + public static final String SKIP_FAIL = 
"skipFail"; +} diff --git a/kuduwriter/src/main/java/com/q1/datax/plugin/writer/kudu11xwriter/Kudu11xHelper.java b/kuduwriter/src/main/java/com/q1/datax/plugin/writer/kudu11xwriter/Kudu11xHelper.java new file mode 100644 index 00000000..cf1b0f8f --- /dev/null +++ b/kuduwriter/src/main/java/com/q1/datax/plugin/writer/kudu11xwriter/Kudu11xHelper.java @@ -0,0 +1,369 @@ +package com.q1.datax.plugin.writer.kudu11xwriter; + +import com.alibaba.datax.common.element.Column; +import com.alibaba.datax.common.exception.DataXException; +import com.alibaba.datax.common.util.Configuration; +import com.alibaba.fastjson.JSON; +import org.apache.commons.lang3.StringUtils; +import org.apache.commons.lang3.Validate; +import org.apache.kudu.ColumnSchema; +import org.apache.kudu.Schema; +import org.apache.kudu.Type; +import org.apache.kudu.client.*; +import org.apache.kudu.shaded.org.checkerframework.checker.units.qual.K; +import org.slf4j.Logger; +import org.slf4j.LoggerFactory; +import sun.rmi.runtime.Log; + +import java.nio.charset.Charset; +import java.util.*; +import java.util.concurrent.SynchronousQueue; +import java.util.concurrent.ThreadFactory; +import java.util.concurrent.ThreadPoolExecutor; +import java.util.concurrent.TimeUnit; +import java.util.concurrent.atomic.AtomicInteger; + +/** + * @author daizihao + * @create 2020-08-27 18:30 + **/ +public class Kudu11xHelper { + + private static final Logger LOG = LoggerFactory.getLogger(Kudu11xHelper.class); + + public static Map getKuduConfiguration(String kuduConfig) { + if (StringUtils.isBlank(kuduConfig)) { + throw DataXException.asDataXException(Kudu11xWriterErrorcode.REQUIRED_VALUE, + "Connection configuration information required."); + } + Map kConfiguration; + try { + kConfiguration = JSON.parseObject(kuduConfig, HashMap.class); + Validate.isTrue(kConfiguration != null, "kuduConfig is null!"); + kConfiguration.put(Key.KUDU_ADMIN_TIMEOUT, kConfiguration.getOrDefault(Key.KUDU_ADMIN_TIMEOUT, Constant.ADMIN_TIMEOUTMS)); + kConfiguration.put(Key.KUDU_SESSION_TIMEOUT, kConfiguration.getOrDefault(Key.KUDU_SESSION_TIMEOUT, Constant.SESSION_TIMEOUTMS)); + } catch (Exception e) { + throw DataXException.asDataXException(Kudu11xWriterErrorcode.GET_KUDU_CONNECTION_ERROR, e); + } + + return kConfiguration; + } + + public static KuduClient getKuduClient(String kuduConfig) { + Map conf = Kudu11xHelper.getKuduConfiguration(kuduConfig); + KuduClient kuduClient = null; + try { + String masterAddress = (String) conf.get(Key.KUDU_MASTER); + kuduClient = new KuduClient.KuduClientBuilder(masterAddress) + .defaultAdminOperationTimeoutMs((Long) conf.get(Key.KUDU_ADMIN_TIMEOUT)) + .defaultOperationTimeoutMs((Long) conf.get(Key.KUDU_SESSION_TIMEOUT)) + .build(); + } catch (Exception e) { + throw DataXException.asDataXException(Kudu11xWriterErrorcode.GET_KUDU_CONNECTION_ERROR, e); + } + return kuduClient; + } + + public static KuduTable getKuduTable(Configuration configuration, KuduClient kuduClient) { + String tableName = configuration.getString(Key.TABLE); + + KuduTable table = null; + try { + if (kuduClient.tableExists(tableName)) { + table = kuduClient.openTable(tableName); + } else { + synchronized (Kudu11xHelper.class) { + if (!kuduClient.tableExists(tableName)) { + Schema schema = Kudu11xHelper.getSchema(configuration); + CreateTableOptions tableOptions = new CreateTableOptions(); + + Kudu11xHelper.setTablePartition(configuration, tableOptions, schema); + //副本数 + Integer numReplicas = configuration.getInt(Key.NUM_REPLICAS, 3); + 
tableOptions.setNumReplicas(numReplicas); + table = kuduClient.createTable(tableName, schema, tableOptions); + } else { + table = kuduClient.openTable(tableName); + } + } + } + + + } catch (Exception e) { + throw DataXException.asDataXException(Kudu11xWriterErrorcode.GET_KUDU_TABLE_ERROR, e); + } + return table; + } + + public static void createTable(Configuration configuration) { + String tableName = configuration.getString(Key.TABLE); + String kuduConfig = configuration.getString(Key.KUDU_CONFIG); + KuduClient kuduClient = Kudu11xHelper.getKuduClient(kuduConfig); + try { + Schema schema = Kudu11xHelper.getSchema(configuration); + CreateTableOptions tableOptions = new CreateTableOptions(); + + Kudu11xHelper.setTablePartition(configuration, tableOptions, schema); + //副本数 + Integer numReplicas = configuration.getInt(Key.NUM_REPLICAS, 3); + tableOptions.setNumReplicas(numReplicas); + kuduClient.createTable(tableName, schema, tableOptions); + } catch (Exception e) { + throw DataXException.asDataXException(Kudu11xWriterErrorcode.GREATE_KUDU_TABLE_ERROR, e); + } finally { + AtomicInteger i = new AtomicInteger(10); + while (i.get() > 0) { + try { + if (kuduClient.isCreateTableDone(tableName)) { + Kudu11xHelper.closeClient(kuduClient); + LOG.info("Table " + tableName + " is created!"); + break; + } + i.decrementAndGet(); + LOG.error("timeout!"); + } catch (KuduException e) { + LOG.info("Wait for the table to be created..... " + i); + try { + Thread.sleep(100L); + } catch (InterruptedException ex) { + ex.printStackTrace(); + } + i.decrementAndGet(); + } + } + try { + if (kuduClient != null) { + kuduClient.close(); + } + } catch (KuduException e) { + LOG.info("Kudu client has been shut down!"); + } + } + } + + public static ThreadPoolExecutor createRowAddThreadPool(int coreSize) { + return new ThreadPoolExecutor(coreSize, + coreSize, + 60L, + TimeUnit.SECONDS, + new SynchronousQueue(), + new ThreadFactory() { + private final ThreadGroup group = System.getSecurityManager() == null ? 
Thread.currentThread().getThreadGroup() : System.getSecurityManager().getThreadGroup(); + private final AtomicInteger threadNumber = new AtomicInteger(1); + + @Override + public Thread newThread(Runnable r) { + Thread t = new Thread(group, r, + "pool-kudu_rows_add-thread-" + threadNumber.getAndIncrement(), + 0); + if (t.isDaemon()) + t.setDaemon(false); + if (t.getPriority() != Thread.NORM_PRIORITY) + t.setPriority(Thread.NORM_PRIORITY); + return t; + } + }, new ThreadPoolExecutor.CallerRunsPolicy()); + } + + public static List> getColumnLists(List columns) { + int quota = 8; + int num = (columns.size() - 1) / quota + 1; + int gap = columns.size() / num; + List> columnLists = new ArrayList<>(num); + for (int j = 0; j < num - 1; j++) { + List destList = new ArrayList<>(columns.subList(j * gap, (j + 1) * gap)); + columnLists.add(destList); + } + List destList = new ArrayList<>(columns.subList(gap * (num - 1), columns.size())); + columnLists.add(destList); + return columnLists; + } + + public static boolean isTableExists(Configuration configuration) { + String tableName = configuration.getString(Key.TABLE); + String kuduConfig = configuration.getString(Key.KUDU_CONFIG); + KuduClient kuduClient = Kudu11xHelper.getKuduClient(kuduConfig); + try { + return kuduClient.tableExists(tableName); + } catch (Exception e) { + throw DataXException.asDataXException(Kudu11xWriterErrorcode.GET_KUDU_CONNECTION_ERROR, e); + } finally { + Kudu11xHelper.closeClient(kuduClient); + } + } + + public static void closeClient(KuduClient kuduClient) { + try { + if (kuduClient != null) { + kuduClient.close(); + } + } catch (KuduException e) { + LOG.warn("The \"kudu client\" was not stopped gracefully. !"); + + } + + } + + public static Schema getSchema(Configuration configuration) { + List columns = configuration.getListConfiguration(Key.COLUMN); + List columnSchemas = new ArrayList<>(); + Schema schema = null; + if (columns == null || columns.isEmpty()) { + throw DataXException.asDataXException(Kudu11xWriterErrorcode.REQUIRED_VALUE, "column is not defined,eg:column:[{\"name\": \"cf0:column0\",\"type\": \"string\"},{\"name\": \"cf1:column1\",\"type\": \"long\"}]"); + } + try { + for (Configuration column : columns) { + + String type = "BIGINT".equals(column.getNecessaryValue(Key.TYPE, Kudu11xWriterErrorcode.REQUIRED_VALUE).toUpperCase()) || + "LONG".equals(column.getNecessaryValue(Key.TYPE, Kudu11xWriterErrorcode.REQUIRED_VALUE).toUpperCase()) ? + "INT64" : "INT".equals(column.getNecessaryValue(Key.TYPE, Kudu11xWriterErrorcode.REQUIRED_VALUE).toUpperCase()) ? 
+ "INT32" : column.getNecessaryValue(Key.TYPE, Kudu11xWriterErrorcode.REQUIRED_VALUE).toUpperCase(); + String name = column.getNecessaryValue(Key.NAME, Kudu11xWriterErrorcode.REQUIRED_VALUE); + Boolean key = column.getBool(Key.PRIMARYKEY, false); + String encoding = column.getString(Key.ENCODING, Constant.ENCODING).toUpperCase(); + String compression = column.getString(Key.COMPRESSION, Constant.COMPRESSION).toUpperCase(); + String comment = column.getString(Key.COMMENT, ""); + + columnSchemas.add(new ColumnSchema.ColumnSchemaBuilder(name, Type.getTypeForName(type)) + .key(key) + .encoding(ColumnSchema.Encoding.valueOf(encoding)) + .compressionAlgorithm(ColumnSchema.CompressionAlgorithm.valueOf(compression)) + .comment(comment) + .build()); + } + schema = new Schema(columnSchemas); + } catch (Exception e) { + throw DataXException.asDataXException(Kudu11xWriterErrorcode.REQUIRED_VALUE, e); + } + return schema; + } + + public static Integer getPrimaryKeyIndexUntil(List columns) { + int i = 0; + while (i < columns.size()) { + Configuration col = columns.get(i); + if (!col.getBool(Key.PRIMARYKEY, false)) { + break; + } + i++; + } + return i; + } + + public static void setTablePartition(Configuration configuration, + CreateTableOptions tableOptions, + Schema schema) { + Configuration partition = configuration.getConfiguration(Key.PARTITION); + if (partition == null) { + ColumnSchema columnSchema = schema.getColumns().get(0); + tableOptions.addHashPartitions(Collections.singletonList(columnSchema.getName()), 3); + return; + } + //range分区 + Configuration range = partition.getConfiguration(Key.RANGE); + if (range != null) { + List rangeColums = new ArrayList<>(range.getKeys()); + tableOptions.setRangePartitionColumns(rangeColums); + for (String rangeColum : rangeColums) { + List lowerAndUppers = range.getListConfiguration(rangeColum); + for (Configuration lowerAndUpper : lowerAndUppers) { + PartialRow lower = schema.newPartialRow(); + lower.addString(rangeColum, lowerAndUpper.getNecessaryValue(Key.LOWER, Kudu11xWriterErrorcode.REQUIRED_VALUE)); + PartialRow upper = schema.newPartialRow(); + upper.addString(rangeColum, lowerAndUpper.getNecessaryValue(Key.UPPER, Kudu11xWriterErrorcode.REQUIRED_VALUE)); + tableOptions.addRangePartition(lower, upper); + } + } + LOG.info("Set range partition complete!"); + } + + // 设置Hash分区 + Configuration hash = partition.getConfiguration(Key.HASH); + if (hash != null) { + List hashColums = hash.getList(Key.COLUMN, String.class); + Integer hashPartitionNum = configuration.getInt(Key.HASH_NUM, 3); + tableOptions.addHashPartitions(hashColums, hashPartitionNum); + LOG.info("Set hash partition complete!"); + } + } + + public static void validateParameter(Configuration configuration) { + LOG.info("Start validating parameters!"); + configuration.getNecessaryValue(Key.KUDU_CONFIG, Kudu11xWriterErrorcode.REQUIRED_VALUE); + configuration.getNecessaryValue(Key.TABLE, Kudu11xWriterErrorcode.REQUIRED_VALUE); + String encoding = configuration.getString(Key.ENCODING, Constant.DEFAULT_ENCODING); + if (!Charset.isSupported(encoding)) { + throw DataXException.asDataXException(Kudu11xWriterErrorcode.ILLEGAL_VALUE, + String.format("Encoding is not supported:[%s] .", encoding)); + } + configuration.set(Key.ENCODING, encoding); + String insertMode = configuration.getString(Key.INSERT_MODE, Constant.INSERT_MODE); + try { + InsertModeType.getByTypeName(insertMode); + } catch (Exception e) { + insertMode = Constant.INSERT_MODE; + } + configuration.set(Key.INSERT_MODE, insertMode); + + Long 
writeBufferSize = configuration.getLong(Key.WRITE_BATCH_SIZE, Constant.DEFAULT_WRITE_BATCH_SIZE); + configuration.set(Key.WRITE_BATCH_SIZE, writeBufferSize); + + Long mutationBufferSpace = configuration.getLong(Key.MUTATION_BUFFER_SPACE, Constant.DEFAULT_MUTATION_BUFFER_SPACE); + configuration.set(Key.MUTATION_BUFFER_SPACE, mutationBufferSpace); + + Boolean isSkipFail = configuration.getBool(Key.SKIP_FAIL, false); + configuration.set(Key.SKIP_FAIL, isSkipFail); + List columns = configuration.getListConfiguration(Key.COLUMN); + List goalColumns = new ArrayList<>(); + //column参数验证 + int indexFlag = 0; + boolean primaryKey = true; + int primaryKeyFlag = 0; + for (int i = 0; i < columns.size(); i++) { + Configuration col = columns.get(i); + String index = col.getString(Key.INDEX); + if (index == null) { + index = String.valueOf(i); + col.set(Key.INDEX, index); + indexFlag++; + } + if(primaryKey != col.getBool(Key.PRIMARYKEY, false)){ + primaryKey = col.getBool(Key.PRIMARYKEY, false); + primaryKeyFlag++; + } + goalColumns.add(col); + } + if (indexFlag != 0 && indexFlag != columns.size()) { + throw DataXException.asDataXException(Kudu11xWriterErrorcode.ILLEGAL_VALUE, + "\"index\" either has values for all of them, or all of them are null!"); + } + if (primaryKeyFlag > 1){ + throw DataXException.asDataXException(Kudu11xWriterErrorcode.ILLEGAL_VALUE, + "\"primaryKey\" must be written in the front!"); + } + configuration.set(Key.COLUMN, goalColumns); +// LOG.info("------------------------------------"); +// LOG.info(configuration.toString()); +// LOG.info("------------------------------------"); + LOG.info("validate parameter complete!"); + } + + public static void truncateTable(Configuration configuration) { + String kuduConfig = configuration.getString(Key.KUDU_CONFIG); + String userTable = configuration.getString(Key.TABLE); + LOG.info(String.format("Because you have configured truncate is true,KuduWriter begins to truncate table %s .", userTable)); + KuduClient kuduClient = Kudu11xHelper.getKuduClient(kuduConfig); + + try { + if (kuduClient.tableExists(userTable)) { + kuduClient.deleteTable(userTable); + LOG.info(String.format("table %s has been deleted.", userTable)); + } + } catch (KuduException e) { + throw DataXException.asDataXException(Kudu11xWriterErrorcode.DELETE_KUDU_ERROR, e); + } finally { + Kudu11xHelper.closeClient(kuduClient); + } + + } +} diff --git a/kuduwriter/src/main/java/com/q1/datax/plugin/writer/kudu11xwriter/Kudu11xWriter.java b/kuduwriter/src/main/java/com/q1/datax/plugin/writer/kudu11xwriter/Kudu11xWriter.java new file mode 100644 index 00000000..83620f43 --- /dev/null +++ b/kuduwriter/src/main/java/com/q1/datax/plugin/writer/kudu11xwriter/Kudu11xWriter.java @@ -0,0 +1,85 @@ +package com.q1.datax.plugin.writer.kudu11xwriter; + +import com.alibaba.datax.common.exception.DataXException; +import com.alibaba.datax.common.plugin.RecordReceiver; +import com.alibaba.datax.common.spi.Writer; +import com.alibaba.datax.common.util.Configuration; +import org.slf4j.Logger; +import org.slf4j.LoggerFactory; + +import java.util.ArrayList; +import java.util.List; + +/** + * @author daizihao + * @create 2020-08-27 16:58 + **/ +public class Kudu11xWriter extends Writer { + public static class Job extends Writer.Job{ + private static final Logger LOG = LoggerFactory.getLogger(Job.class); + private Configuration config = null; + @Override + public void init() { + this.config = this.getPluginJobConf(); + Kudu11xHelper.validateParameter(this.config); + } + + @Override + public void prepare() 
{ + Boolean truncate = config.getBool(Key.TRUNCATE,false); + if(truncate){ + Kudu11xHelper.truncateTable(this.config); + } + + if (!Kudu11xHelper.isTableExists(config)){ + Kudu11xHelper.createTable(config); + } + } + + @Override + public List split(int i) { + List splitResultConfigs = new ArrayList<>(); + for (int j = 0; j < i; j++) { + splitResultConfigs.add(config.clone()); + } + + return splitResultConfigs; + } + + + + @Override + public void destroy() { + + } + } + + public static class Task extends Writer.Task{ + private Configuration taskConfig; + private KuduWriterTask kuduTaskProxy; + private static final Logger LOG = LoggerFactory.getLogger(Job.class); + @Override + public void init() { + this.taskConfig = super.getPluginJobConf(); + this.kuduTaskProxy = new KuduWriterTask(this.taskConfig); + } + @Override + public void startWrite(RecordReceiver lineReceiver) { + this.kuduTaskProxy.startWriter(lineReceiver,super.getTaskPluginCollector()); + } + + + @Override + public void destroy() { + try { + if (kuduTaskProxy.session != null) { + kuduTaskProxy.session.close(); + } + }catch (Exception e){ + LOG.warn("The \"kudu session\" was not stopped gracefully !"); + } + Kudu11xHelper.closeClient(kuduTaskProxy.kuduClient); + + } + } +} diff --git a/kuduwriter/src/main/java/com/q1/datax/plugin/writer/kudu11xwriter/Kudu11xWriterErrorcode.java b/kuduwriter/src/main/java/com/q1/datax/plugin/writer/kudu11xwriter/Kudu11xWriterErrorcode.java new file mode 100644 index 00000000..d46bcea3 --- /dev/null +++ b/kuduwriter/src/main/java/com/q1/datax/plugin/writer/kudu11xwriter/Kudu11xWriterErrorcode.java @@ -0,0 +1,39 @@ +package com.q1.datax.plugin.writer.kudu11xwriter; + +import com.alibaba.datax.common.spi.ErrorCode; + +/** + * @author daizihao + * @create 2020-08-27 19:25 + **/ +public enum Kudu11xWriterErrorcode implements ErrorCode { + REQUIRED_VALUE("Kuduwriter-00", "You are missing a required parameter value."), + ILLEGAL_VALUE("Kuduwriter-01", "You fill in the parameter values are not legitimate."), + GET_KUDU_CONNECTION_ERROR("Kuduwriter-02", "Error getting Kudu connection."), + GET_KUDU_TABLE_ERROR("Kuduwriter-03", "Error getting Kudu table."), + CLOSE_KUDU_CONNECTION_ERROR("Kuduwriter-04", "Error closing Kudu connection."), + CLOSE_KUDU_SESSION_ERROR("Kuduwriter-06", "Error closing Kudu table connection."), + PUT_KUDU_ERROR("Kuduwriter-07", "IO exception occurred when writing to Kudu."), + DELETE_KUDU_ERROR("Kuduwriter-08", "An exception occurred while delete Kudu table."), + GREATE_KUDU_TABLE_ERROR("Kuduwriter-09", "Error creating Kudu table."), + PARAMETER_NUM_ERROR("Kuduwriter-10","The number of parameters does not match.") + ; + + private final String code; + private final String description; + + + Kudu11xWriterErrorcode(String code, String description) { + this.code = code; + this.description = description; + } + @Override + public String getCode() { + return code; + } + + @Override + public String getDescription() { + return description; + } +} diff --git a/kuduwriter/src/main/java/com/q1/datax/plugin/writer/kudu11xwriter/KuduWriterTask.java b/kuduwriter/src/main/java/com/q1/datax/plugin/writer/kudu11xwriter/KuduWriterTask.java new file mode 100644 index 00000000..bff3509f --- /dev/null +++ b/kuduwriter/src/main/java/com/q1/datax/plugin/writer/kudu11xwriter/KuduWriterTask.java @@ -0,0 +1,216 @@ +package com.q1.datax.plugin.writer.kudu11xwriter; + +import com.alibaba.datax.common.element.Column; +import com.alibaba.datax.common.element.Record; +import 
com.alibaba.datax.common.exception.DataXException; +import com.alibaba.datax.common.plugin.RecordReceiver; +import com.alibaba.datax.common.plugin.TaskPluginCollector; +import com.alibaba.datax.common.util.Configuration; +import com.alibaba.datax.common.util.RetryUtil; +import org.apache.commons.lang3.StringUtils; +import org.apache.kudu.client.*; +import org.slf4j.Logger; +import org.slf4j.LoggerFactory; + +import java.util.ArrayList; +import java.util.Collections; +import java.util.List; +import java.util.concurrent.*; +import java.util.concurrent.atomic.AtomicInteger; +import java.util.concurrent.atomic.AtomicLong; +import java.util.concurrent.atomic.LongAdder; + +/** + * @author daizihao + * @create 2020-08-31 16:55 + **/ +public class KuduWriterTask { + private final static Logger LOG = LoggerFactory.getLogger(KuduWriterTask.class); + + private List columns; + private List> columnLists; + private ThreadPoolExecutor pool; + private String encoding; + private Double batchSize; + private Boolean isUpsert; + private Boolean isSkipFail; + public KuduClient kuduClient; + public KuduSession session; + private KuduTable table; + private Integer primaryKeyIndexUntil; + + private final Object lock = new Object(); + + public KuduWriterTask(Configuration configuration) { + columns = configuration.getListConfiguration(Key.COLUMN); + columnLists = Kudu11xHelper.getColumnLists(columns); + pool = Kudu11xHelper.createRowAddThreadPool(columnLists.size()); + + this.encoding = configuration.getString(Key.ENCODING); + this.batchSize = configuration.getDouble(Key.WRITE_BATCH_SIZE); + this.isUpsert = !configuration.getString(Key.INSERT_MODE).equalsIgnoreCase("insert"); + this.isSkipFail = configuration.getBool(Key.SKIP_FAIL); + long mutationBufferSpace = configuration.getLong(Key.MUTATION_BUFFER_SPACE); + + this.kuduClient = Kudu11xHelper.getKuduClient(configuration.getString(Key.KUDU_CONFIG)); + this.table = Kudu11xHelper.getKuduTable(configuration, kuduClient); + this.session = kuduClient.newSession(); + session.setFlushMode(SessionConfiguration.FlushMode.MANUAL_FLUSH); + session.setMutationBufferSpace((int) mutationBufferSpace); + this.primaryKeyIndexUntil = Kudu11xHelper.getPrimaryKeyIndexUntil(columns); +// tableName = configuration.getString(Key.TABLE); + } + + public void startWriter(RecordReceiver lineReceiver, TaskPluginCollector taskPluginCollector) { + LOG.info("kuduwriter began to write!"); + Record record; + LongAdder counter = new LongAdder(); + try { + while ((record = lineReceiver.getFromReader()) != null) { + if (record.getColumnNumber() != columns.size()) { + throw DataXException.asDataXException(Kudu11xWriterErrorcode.PARAMETER_NUM_ERROR, " number of record fields:" + record.getColumnNumber() + " number of configuration fields:" + columns.size()); + } + boolean isDirtyRecord = false; + + + for (int i = 0; i < primaryKeyIndexUntil && !isDirtyRecord; i++) { + Column column = record.getColumn(i); + isDirtyRecord = StringUtils.isBlank(column.asString()); + } + + if (isDirtyRecord) { + taskPluginCollector.collectDirtyRecord(record, "primarykey field is null"); + continue; + } + + CountDownLatch countDownLatch = new CountDownLatch(columnLists.size()); + Upsert upsert = table.newUpsert(); + Insert insert = table.newInsert(); + PartialRow row; + if (isUpsert) { + //覆盖更新 + row = upsert.getRow(); + } else { + //增量更新 + row = insert.getRow(); + } + List> futures = new ArrayList<>(); + for (List columnList : columnLists) { + Record finalRecord = record; + Future future = pool.submit(() -> { + try { + 
for (Configuration col : columnList) { + String name = col.getString(Key.NAME); + ColumnType type = ColumnType.getByTypeName(col.getString(Key.TYPE, "string")); + Column column = finalRecord.getColumn(col.getInt(Key.INDEX)); + String rawData = column.asString(); + if (rawData == null) { + synchronized (lock) { + row.setNull(name); + } + continue; + } + switch (type) { + case INT: + synchronized (lock) { + row.addInt(name, Integer.parseInt(rawData)); + } + break; + case LONG: + case BIGINT: + synchronized (lock) { + row.addLong(name, Long.parseLong(rawData)); + } + break; + case FLOAT: + synchronized (lock) { + row.addFloat(name, Float.parseFloat(rawData)); + } + break; + case DOUBLE: + synchronized (lock) { + row.addDouble(name, Double.parseDouble(rawData)); + } + break; + case BOOLEAN: + synchronized (lock) { + row.addBoolean(name, Boolean.getBoolean(rawData)); + } + break; + case STRING: + default: + synchronized (lock) { + row.addString(name, rawData); + } + } + } + } finally { + countDownLatch.countDown(); + } + }); + futures.add(future); + } + countDownLatch.await(); + for (Future future : futures) { + future.get(); + } + try { + RetryUtil.executeWithRetry(() -> { + if (isUpsert) { + //覆盖更新 + session.apply(upsert); + } else { + //增量更新 + session.apply(insert); + } + //flush + if (counter.longValue() > (batchSize * 0.8)) { + session.flush(); + counter.reset(); + } + counter.increment(); + return true; + }, 5, 500L, true); + + } catch (Exception e) { + LOG.error("Record Write Failure!", e); + if (isSkipFail) { + LOG.warn("Since you have configured \"skipFail\" to be true, this record will be skipped !"); + taskPluginCollector.collectDirtyRecord(record, e.getMessage()); + } else { + throw DataXException.asDataXException(Kudu11xWriterErrorcode.PUT_KUDU_ERROR, e.getMessage()); + } + } + } + } catch (Exception e) { + LOG.error("write failure! the task will exit!"); + throw DataXException.asDataXException(Kudu11xWriterErrorcode.PUT_KUDU_ERROR, e.getMessage()); + } + AtomicInteger i = new AtomicInteger(10); + try { + while (i.get() > 0) { + if (session.hasPendingOperations()) { + session.flush(); + break; + } + Thread.sleep(20L); + i.decrementAndGet(); + } + } catch (Exception e) { + LOG.info("Waiting for data to be written to kudu...... " + i + "s"); + + } finally { + try { + pool.shutdown(); + //强制刷写 + session.flush(); + } catch (KuduException e) { + LOG.error("kuduwriter flush error! The results may be incomplete!"); + throw DataXException.asDataXException(Kudu11xWriterErrorcode.PUT_KUDU_ERROR, e.getMessage()); + } + } + + } + + +} diff --git a/kuduwriter/src/main/java/com/q1/kudu/conf/KuduConfig.java b/kuduwriter/src/main/java/com/q1/kudu/conf/KuduConfig.java new file mode 100644 index 00000000..f1499a0f --- /dev/null +++ b/kuduwriter/src/main/java/com/q1/kudu/conf/KuduConfig.java @@ -0,0 +1,9 @@ +package com.q1.kudu.conf; + +/** + * @author daizihao + * @create 2020-09-16 11:39 + **/ +public class KuduConfig { + +} diff --git a/kuduwriter/src/main/resources/plugin.json b/kuduwriter/src/main/resources/plugin.json new file mode 100644 index 00000000..f60dc825 --- /dev/null +++ b/kuduwriter/src/main/resources/plugin.json @@ -0,0 +1,7 @@ +{ + "name": "kuduwriter", + "class": "com.q1.datax.plugin.writer.kudu11xwriter.Kudu11xWriter", + "description": "use put: prod. 
mechanism: use kudu java api put data.", + "developer": "com.q1.daizihao" +} + diff --git a/kuduwriter/src/main/resources/plugin_job_template.json b/kuduwriter/src/main/resources/plugin_job_template.json new file mode 100644 index 00000000..3edc6c39 --- /dev/null +++ b/kuduwriter/src/main/resources/plugin_job_template.json @@ -0,0 +1,59 @@ +{ + "name": "kuduwriter", + "parameter": { + "kuduConfig": { + "kudu.master_addresses": "***", + "timeout": 60000, + "sessionTimeout": 60000 + + }, + "table": "", + "replicaCount": 3, + "truncate": false, + "writeMode": "upsert", + "partition": { + "range": { + "column1": [ + { + "lower": "2020-08-25", + "upper": "2020-08-26" + }, + { + "lower": "2020-08-26", + "upper": "2020-08-27" + }, + { + "lower": "2020-08-27", + "upper": "2020-08-28" + } + ] + }, + "hash": { + "column": [ + "column1" + ], + "number": 3 + } + }, + "column": [ + { + "index": 0, + "name": "c1", + "type": "string", + "primaryKey": true + }, + { + "index": 1, + "name": "c2", + "type": "string", + "compress": "DEFAULT_COMPRESSION", + "encoding": "AUTO_ENCODING", + "comment": "注解xxxx" + } + ], + "batchSize": 1024, + "bufferSize": 2048, + "skipFail": false, + "encoding": "UTF-8" + } +} \ No newline at end of file diff --git a/kuduwriter/src/test/java/com/dai/test.java b/kuduwriter/src/test/java/com/dai/test.java new file mode 100644 index 00000000..5fd17beb --- /dev/null +++ b/kuduwriter/src/test/java/com/dai/test.java @@ -0,0 +1,40 @@ +package com.dai; + +import com.alibaba.datax.common.exception.DataXException; +import com.alibaba.datax.common.util.RetryUtil; +import com.q1.datax.plugin.writer.kudu11xwriter.*; +import static org.apache.kudu.client.AsyncKuduClient.LOG; + +/** + * @author daizihao + * @create 2020-08-28 11:03 + **/ +public class test { + static boolean isSkipFail; + + + public static void main(String[] args) { + try { + while (true) { + try { + RetryUtil.executeWithRetry(()->{ + throw new RuntimeException(); + },5,1000L,true); + + } catch (Exception e) { + LOG.error("Data write failed!", e); + System.out.println(isSkipFail); + if (isSkipFail) { + LOG.warn("Because you have configured skipFail is true,this data will be skipped!"); + }else { + System.out.println("异常抛出"); + throw e; + } + } + } + } catch (Exception e) { + LOG.error("write failed! 
the task will exit!"); + throw DataXException.asDataXException(Kudu11xWriterErrorcode.PUT_KUDU_ERROR, e); + } + } +} diff --git a/mongodbreader/doc/mongodbreader.md b/mongodbreader/doc/mongodbreader.md index 3535d5b7..b61493e6 100644 --- a/mongodbreader/doc/mongodbreader.md +++ b/mongodbreader/doc/mongodbreader.md @@ -8,7 +8,7 @@ MongoDBReader 插件利用 MongoDB 的java客户端MongoClient进行MongoDB的 MongoDBReader通过Datax框架从MongoDB并行的读取数据,通过主控的JOB程序按照指定的规则对MongoDB中的数据进行分片,并行读取,然后将MongoDB支持的类型通过逐一判断转换成Datax支持的类型。 #### 3 功能说明 -* 该示例从ODPS读一份数据到MongoDB。 +* 该示例从MongoDB读一份数据到ODPS。 { "job": { @@ -132,6 +132,7 @@ MongoDBReader通过Datax框架从MongoDB并行的读取数据,通过主控的J * name:Column的名字。【必填】 * type:Column的类型。【选填】 * splitter:因为MongoDB支持数组类型,但是Datax框架本身不支持数组类型,所以mongoDB读出来的数组类型要通过这个分隔符合并成字符串。【选填】 +* query: MongoDB的额外查询条件。【选填】 #### 5 类型转换 @@ -146,4 +147,4 @@ MongoDBReader通过Datax框架从MongoDB并行的读取数据,通过主控的J #### 6 性能报告 -#### 7 测试报告 \ No newline at end of file +#### 7 测试报告 diff --git a/mongodbwriter/doc/mongodbwriter.md b/mongodbwriter/doc/mongodbwriter.md index e30008db..74de8a0a 100644 --- a/mongodbwriter/doc/mongodbwriter.md +++ b/mongodbwriter/doc/mongodbwriter.md @@ -139,7 +139,7 @@ MongoDBWriter通过Datax框架获取Reader生成的数据,然后将Datax支持 * splitter:特殊分隔符,当且仅当要处理的字符串要用分隔符分隔为字符数组时,才使用这个参数,通过这个参数指定的分隔符,将字符串分隔存储到MongoDB的数组中。【选填】 * upsertInfo:指定了传输数据时更新的信息。【选填】 * isUpsert:当设置为true时,表示针对相同的upsertKey做更新操作。【选填】 -* upsertKey:upsertKey指定了没行记录的业务主键。用来做更新时使用。【选填】 +* upsertKey:upsertKey指定了每行记录的业务主键。用来做更新时使用。【选填】 #### 5 类型转换 @@ -154,4 +154,4 @@ MongoDBWriter通过Datax框架获取Reader生成的数据,然后将Datax支持 #### 6 性能报告 -#### 7 测试报告 \ No newline at end of file +#### 7 测试报告 diff --git a/mysqlreader/pom.xml b/mysqlreader/pom.xml index 08183272..621326ae 100755 --- a/mysqlreader/pom.xml +++ b/mysqlreader/pom.xml @@ -40,7 +40,7 @@ mysql mysql-connector-java - 5.1.34 + ${mysql.driver.version} diff --git a/mysqlwriter/doc/mysqlwriter.md b/mysqlwriter/doc/mysqlwriter.md index f6abd242..5368775c 100644 --- a/mysqlwriter/doc/mysqlwriter.md +++ b/mysqlwriter/doc/mysqlwriter.md @@ -147,7 +147,7 @@ MysqlWriter 通过 DataX 框架获取 Reader 生成的协议数据,根据你 * **column** - * 描述:目的表需要写入数据的字段,字段之间用英文逗号分隔。例如: "column": ["id","name","age"]。如果要依次写入全部列,使用*表示, 例如: "column": ["*"]。 + * 描述:目的表需要写入数据的字段,字段之间用英文逗号分隔。例如: "column": ["id","name","age"]。如果要依次写入全部列,使用`*`表示, 例如: `"column": ["*"]`。 **column配置项必须指定,不能留空!** diff --git a/mysqlwriter/pom.xml b/mysqlwriter/pom.xml index 11618022..1c3891f5 100755 --- a/mysqlwriter/pom.xml +++ b/mysqlwriter/pom.xml @@ -40,7 +40,7 @@ mysql mysql-connector-java - 5.1.34 + ${mysql.driver.version}
diff --git a/oceanbasev10reader/pom.xml b/oceanbasev10reader/pom.xml new file mode 100644 index 00000000..49477241 --- /dev/null +++ b/oceanbasev10reader/pom.xml @@ -0,0 +1,97 @@ + + + + datax-all + com.alibaba.datax + 0.0.1-SNAPSHOT + + 4.0.0 + oceanbasev10reader + com.alibaba.datax + 0.0.1-SNAPSHOT + jar + + + + com.alibaba.datax + datax-common + ${datax-project-version} + + + slf4j-log4j12 + org.slf4j + + + + + org.slf4j + slf4j-api + + + ch.qos.logback + logback-classic + + + com.alibaba.datax + plugin-rdbms-util + ${datax-project-version} + + + mysql + mysql-connector-java + 5.1.40 + + + log4j + log4j + 1.2.16 + + + junit + junit + 4.11 + test + + + + + + + src/main/java + + **/*.properties + + + + + + + maven-compiler-plugin + + ${jdk-version} + ${jdk-version} + ${project-sourceEncoding} + + + + + maven-assembly-plugin + + + src/main/assembly/package.xml + + datax + + + + dwzip + package + + single + + + + + + + diff --git a/oceanbasev10reader/src/main/assembly/package.xml b/oceanbasev10reader/src/main/assembly/package.xml new file mode 100644 index 00000000..c1db32a9 --- /dev/null +++ b/oceanbasev10reader/src/main/assembly/package.xml @@ -0,0 +1,42 @@ + + + + dir + + false + + + src/main/resources + + plugin.json + plugin_job_template.json + + plugin/reader/oceanbasev10reader + + + target/ + + oceanbasev10reader-0.0.1-SNAPSHOT.jar + + plugin/reader/oceanbasev10reader + + + src/main/libs/ + + *.jar + + plugin/reader/oceanbasev10reader/libs + + + + + + false + plugin/reader/oceanbasev10reader/libs + runtime + + + diff --git a/oceanbasev10reader/src/main/java/com/alibaba/datax/plugin/reader/oceanbasev10reader/Config.java b/oceanbasev10reader/src/main/java/com/alibaba/datax/plugin/reader/oceanbasev10reader/Config.java new file mode 100644 index 00000000..ca803c49 --- /dev/null +++ b/oceanbasev10reader/src/main/java/com/alibaba/datax/plugin/reader/oceanbasev10reader/Config.java @@ -0,0 +1,16 @@ +package com.alibaba.datax.plugin.reader.oceanbasev10reader; + +public interface Config { + // queryTimeoutSecond + String QUERY_TIMEOUT_SECOND = "memstoreCheckIntervalSecond"; + + int DEFAULT_QUERY_TIMEOUT_SECOND = 60 * 60 * 48;// 2天 + + // readBatchSize + String READ_BATCH_SIZE = "readBatchSize"; + + int DEFAULT_READ_BATCH_SIZE = 100000;// 10万 + + String RETRY_LIMIT = "retryLimit"; + int DEFAULT_RETRY_LIMIT = 10; +} diff --git a/oceanbasev10reader/src/main/java/com/alibaba/datax/plugin/reader/oceanbasev10reader/OceanBaseReader.java b/oceanbasev10reader/src/main/java/com/alibaba/datax/plugin/reader/oceanbasev10reader/OceanBaseReader.java new file mode 100644 index 00000000..0a4934a1 --- /dev/null +++ b/oceanbasev10reader/src/main/java/com/alibaba/datax/plugin/reader/oceanbasev10reader/OceanBaseReader.java @@ -0,0 +1,127 @@ +package com.alibaba.datax.plugin.reader.oceanbasev10reader; + +import java.sql.Connection; +import java.util.List; + +import org.slf4j.Logger; +import org.slf4j.LoggerFactory; + +import com.alibaba.datax.common.plugin.RecordSender; +import com.alibaba.datax.common.spi.Reader; +import com.alibaba.datax.common.util.Configuration; +import com.alibaba.datax.plugin.rdbms.reader.Constant; +import com.alibaba.datax.plugin.rdbms.reader.Key; +import com.alibaba.datax.plugin.rdbms.util.DBUtil; +import com.alibaba.datax.plugin.rdbms.util.DataBaseType; +import com.alibaba.datax.plugin.reader.oceanbasev10reader.ext.ReaderJob; +import com.alibaba.datax.plugin.reader.oceanbasev10reader.ext.ReaderTask; +import com.alibaba.datax.plugin.reader.oceanbasev10reader.util.ObReaderUtils; + +public 
class OceanBaseReader extends Reader { + + public static class Job extends Reader.Job { + private Configuration originalConfig = null; + private ReaderJob readerJob; + private static final Logger LOG = LoggerFactory.getLogger(Task.class); + + @Override + public void init() { + this.originalConfig = super.getPluginJobConf(); + + Integer userConfigedFetchSize = this.originalConfig.getInt(Constant.FETCH_SIZE); + if (userConfigedFetchSize != null) { + LOG.warn("The [fetchSize] is not recognized, please use readBatchSize instead."); + } + + this.originalConfig.set(Constant.FETCH_SIZE, Integer.MIN_VALUE); + + setDatabaseType(originalConfig); + + this.readerJob = new ReaderJob(); + this.readerJob.init(this.originalConfig); + } + + @Override + public void preCheck() { + init(); + this.readerJob.preCheck(this.originalConfig, ObReaderUtils.DATABASE_TYPE); + + } + + @Override + public List split(int adviceNumber) { + return this.readerJob.split(this.originalConfig, adviceNumber); + } + + @Override + public void post() { + this.readerJob.post(this.originalConfig); + } + + @Override + public void destroy() { + this.readerJob.destroy(this.originalConfig); + } + + private void setDatabaseType(Configuration config) { + String username = config.getString(Key.USERNAME); + String password = config.getString(Key.PASSWORD); + List conns = originalConfig.getList(Constant.CONN_MARK, Object.class); + Configuration connConf = Configuration.from(conns.get(0).toString()); + List jdbcUrls = connConf.getList(Key.JDBC_URL, String.class); + String jdbcUrl = jdbcUrls.get(0); + if(jdbcUrl.startsWith(com.alibaba.datax.plugin.rdbms.writer.Constant.OB10_SPLIT_STRING)) { + String[] ss = jdbcUrl.split(com.alibaba.datax.plugin.rdbms.writer.Constant.OB10_SPLIT_STRING_PATTERN); + if (ss.length != 3) { + LOG.warn("unrecognized jdbc url: " + jdbcUrl); + return; + } + username = ss[1].trim() + ":" + username; + jdbcUrl = ss[2]; + } + // Use ob-client to get compatible mode. 
+ try { + String obJdbcUrl = jdbcUrl.replace("jdbc:mysql:", "jdbc:oceanbase:"); + Connection conn = DBUtil.getConnection(DataBaseType.OceanBase, obJdbcUrl, username, password); + String compatibleMode = ObReaderUtils.getCompatibleMode(conn); + if (ObReaderUtils.isOracleMode(compatibleMode)) { + ObReaderUtils.DATABASE_TYPE = DataBaseType.OceanBase; + } + } catch (Exception e){ + LOG.warn("error in get compatible mode, using mysql as default: " + e.getMessage()); + } + } + } + + public static class Task extends Reader.Task { + private Configuration readerSliceConfig; + private ReaderTask commonRdbmsReaderTask; + private static final Logger LOG = LoggerFactory.getLogger(Task.class); + + @Override + public void init() { + this.readerSliceConfig = super.getPluginJobConf(); + this.commonRdbmsReaderTask = new ReaderTask(super.getTaskGroupId(), super.getTaskId()); + this.commonRdbmsReaderTask.init(this.readerSliceConfig); + + } + + @Override + public void startRead(RecordSender recordSender) { + int fetchSize = this.readerSliceConfig.getInt(Constant.FETCH_SIZE); + this.commonRdbmsReaderTask.startRead(this.readerSliceConfig, recordSender, super.getTaskPluginCollector(), + fetchSize); + } + + @Override + public void post() { + this.commonRdbmsReaderTask.post(this.readerSliceConfig); + } + + @Override + public void destroy() { + this.commonRdbmsReaderTask.destroy(this.readerSliceConfig); + } + } + +} diff --git a/oceanbasev10reader/src/main/java/com/alibaba/datax/plugin/reader/oceanbasev10reader/ext/ReaderJob.java b/oceanbasev10reader/src/main/java/com/alibaba/datax/plugin/reader/oceanbasev10reader/ext/ReaderJob.java new file mode 100644 index 00000000..c56155f6 --- /dev/null +++ b/oceanbasev10reader/src/main/java/com/alibaba/datax/plugin/reader/oceanbasev10reader/ext/ReaderJob.java @@ -0,0 +1,40 @@ +package com.alibaba.datax.plugin.reader.oceanbasev10reader.ext; + +import java.util.List; + +import com.alibaba.datax.common.constant.CommonConstant; +import com.alibaba.datax.common.util.Configuration; +import com.alibaba.datax.plugin.rdbms.reader.CommonRdbmsReader; +import com.alibaba.datax.plugin.rdbms.reader.Key; +import com.alibaba.datax.plugin.rdbms.writer.Constant; +import com.alibaba.datax.plugin.reader.oceanbasev10reader.util.ObReaderUtils; + +public class ReaderJob extends CommonRdbmsReader.Job { + + public ReaderJob() { + super(ObReaderUtils.DATABASE_TYPE); + } + + @Override + public List split(Configuration originalConfig, int adviceNumber) { + List list = super.split(originalConfig, adviceNumber); + for (Configuration config : list) { + String jdbcUrl = config.getString(Key.JDBC_URL); + String obRegionName = getObRegionName(jdbcUrl); + config.set(CommonConstant.LOAD_BALANCE_RESOURCE_MARK, obRegionName); + } + return list; + } + + private String getObRegionName(String jdbcUrl) { + if (jdbcUrl.startsWith(Constant.OB10_SPLIT_STRING)) { + String[] ss = jdbcUrl.split(Constant.OB10_SPLIT_STRING_PATTERN); + if (ss.length >= 2) { + String tenant = ss[1].trim(); + String[] sss = tenant.split(":"); + return sss[0]; + } + } + return null; + } +} diff --git a/oceanbasev10reader/src/main/java/com/alibaba/datax/plugin/reader/oceanbasev10reader/ext/ReaderTask.java b/oceanbasev10reader/src/main/java/com/alibaba/datax/plugin/reader/oceanbasev10reader/ext/ReaderTask.java new file mode 100644 index 00000000..073bb3cb --- /dev/null +++ b/oceanbasev10reader/src/main/java/com/alibaba/datax/plugin/reader/oceanbasev10reader/ext/ReaderTask.java @@ -0,0 +1,301 @@ +package 
com.alibaba.datax.plugin.reader.oceanbasev10reader.ext; + +import java.sql.*; +import java.util.ArrayList; +import java.util.List; + +import org.apache.commons.lang3.StringUtils; +import org.slf4j.Logger; +import org.slf4j.LoggerFactory; + +import com.alibaba.datax.common.element.Column; +import com.alibaba.datax.common.element.Record; +import com.alibaba.datax.common.plugin.RecordSender; +import com.alibaba.datax.common.plugin.TaskPluginCollector; +import com.alibaba.datax.common.statistics.PerfRecord; +import com.alibaba.datax.common.statistics.PerfTrace; +import com.alibaba.datax.common.util.Configuration; +import com.alibaba.datax.plugin.rdbms.reader.CommonRdbmsReader; +import com.alibaba.datax.plugin.rdbms.reader.Constant; +import com.alibaba.datax.plugin.rdbms.reader.Key; +import com.alibaba.datax.plugin.rdbms.util.DBUtil; +import com.alibaba.datax.plugin.rdbms.util.DataBaseType; +import com.alibaba.datax.plugin.rdbms.util.RdbmsException; +import com.alibaba.datax.plugin.reader.oceanbasev10reader.Config; +import com.alibaba.datax.plugin.reader.oceanbasev10reader.util.ObReaderUtils; +import com.alibaba.datax.plugin.reader.oceanbasev10reader.util.TaskContext; + +public class ReaderTask extends CommonRdbmsReader.Task { + private static final Logger LOG = LoggerFactory.getLogger(ReaderTask.class); + private int taskGroupId = -1; + private int taskId = -1; + + private String username; + private String password; + private String jdbcUrl; + private String mandatoryEncoding; + private int queryTimeoutSeconds;// 查询超时 默认48小时 + private int readBatchSize; + private int retryLimit = 0; + private String compatibleMode = ObReaderUtils.OB_COMPATIBLE_MODE_MYSQL; + private boolean reuseConn = false; + + public ReaderTask(int taskGroupId, int taskId) { + super(ObReaderUtils.DATABASE_TYPE, taskGroupId, taskId); + this.taskGroupId = taskGroupId; + this.taskId = taskId; + } + + public void init(Configuration readerSliceConfig) { + /* for database connection */ + username = readerSliceConfig.getString(Key.USERNAME); + password = readerSliceConfig.getString(Key.PASSWORD); + jdbcUrl = readerSliceConfig.getString(Key.JDBC_URL); + queryTimeoutSeconds = readerSliceConfig.getInt(Config.QUERY_TIMEOUT_SECOND, + Config.DEFAULT_QUERY_TIMEOUT_SECOND); + // ob10的处理 + if(jdbcUrl.startsWith(com.alibaba.datax.plugin.rdbms.writer.Constant.OB10_SPLIT_STRING)) { + String[] ss = jdbcUrl.split(com.alibaba.datax.plugin.rdbms.writer.Constant.OB10_SPLIT_STRING_PATTERN); + if (ss.length == 3) { + LOG.info("this is ob1_0 jdbc url."); + username = ss[1].trim() + ":" + username; + jdbcUrl = ss[2]; + } + } + + if (ObReaderUtils.DATABASE_TYPE == DataBaseType.OceanBase) { + jdbcUrl = jdbcUrl.replace("jdbc:mysql:", "jdbc:oceanbase:") + "&socketTimeout=1800000&connectTimeout=60000"; //socketTimeout 半个小时 + compatibleMode = ObReaderUtils.OB_COMPATIBLE_MODE_ORACLE; + } else { + jdbcUrl = jdbcUrl + "&socketTimeout=1800000&connectTimeout=60000"; //socketTimeout 半个小时 + } + LOG.info("this is ob1_0 jdbc url. user=" + username + " :url=" + jdbcUrl); + mandatoryEncoding = readerSliceConfig.getString(Key.MANDATORY_ENCODING, ""); + retryLimit = readerSliceConfig.getInt(Config.RETRY_LIMIT, Config.DEFAULT_RETRY_LIMIT); + LOG.info("retryLimit: "+ retryLimit); + } + + private void buildSavePoint(TaskContext context) { + if (!ObReaderUtils.isUserSavePointValid(context)) { + LOG.info("user save point is not valid, set to null."); + context.setUserSavePoint(null); + } + } + + /** + * + * 如果isTableMode && table有PK + *
+ * 则支持断点续读 (若pk不在原始的columns中,则追加到尾部,但不传给下游) + *
+ * 否则,则使用旧模式 + */ + @Override + public void startRead(Configuration readerSliceConfig, RecordSender recordSender, + TaskPluginCollector taskPluginCollector, int fetchSize) { + String querySql = readerSliceConfig.getString(Key.QUERY_SQL); + String table = readerSliceConfig.getString(Key.TABLE); + PerfTrace.getInstance().addTaskDetails(taskId, table + "," + jdbcUrl); + List columns = readerSliceConfig.getList(Key.COLUMN_LIST, String.class); + String where = readerSliceConfig.getString(Key.WHERE); + boolean weakRead = readerSliceConfig.getBool(Key.WEAK_READ, true); // default true, using weak read + String userSavePoint = readerSliceConfig.getString(Key.SAVE_POINT, null); + reuseConn = readerSliceConfig.getBool(Key.REUSE_CONN, false); + String partitionName = readerSliceConfig.getString(Key.PARTITION_NAME, null); + // 从配置文件中取readBatchSize,若无则用默认值 + readBatchSize = readerSliceConfig.getInt(Config.READ_BATCH_SIZE, Config.DEFAULT_READ_BATCH_SIZE); + // 不能少于1万 + if (readBatchSize < 10000) { + readBatchSize = 10000; + } + TaskContext context = new TaskContext(table, columns, where, fetchSize); + context.setQuerySql(querySql); + context.setWeakRead(weakRead); + context.setCompatibleMode(compatibleMode); + if (partitionName != null) { + context.setPartitionName(partitionName); + } + // Add the user save point into the context + context.setUserSavePoint(userSavePoint); + PerfRecord allPerf = new PerfRecord(taskGroupId, taskId, PerfRecord.PHASE.RESULT_NEXT_ALL); + allPerf.start(); + boolean isTableMode = readerSliceConfig.getBool(Constant.IS_TABLE_MODE); + try { + startRead0(isTableMode, context, recordSender, taskPluginCollector); + } finally { + ObReaderUtils.close(null, null, context.getConn()); + } + allPerf.end(context.getCost()); + // 目前大盘是依赖这个打印,而之前这个Finish read record是包含了sql查询和result next的全部时间 + LOG.info("finished read record by Sql: [{}\n] {}.", context.getQuerySql(), jdbcUrl); + } + + private void startRead0(boolean isTableMode, TaskContext context, RecordSender recordSender, + TaskPluginCollector taskPluginCollector) { + // 不是table模式 直接使用原来的做法 + if (!isTableMode) { + doRead(recordSender, taskPluginCollector, context); + return; + } + // check primary key index + Connection conn = DBUtil.getConnection(ObReaderUtils.DATABASE_TYPE, jdbcUrl, username, password); + ObReaderUtils.initConn4Reader(conn, queryTimeoutSeconds); + context.setConn(conn); + try { + ObReaderUtils.initIndex(conn, context); + ObReaderUtils.matchPkIndexs(conn, context); + } catch (Throwable e) { + LOG.warn("fetch PkIndexs fail,table=" + context.getTable(), e); + } + // 如果不是table 且 pk不存在 则仍然使用原来的做法 + if (context.getPkIndexs() == null) { + doRead(recordSender, taskPluginCollector, context); + return; + } + + // setup the user defined save point + buildSavePoint(context); + + // 从这里开始就是 断点续读功能 + // while(true) { + // 正常读 (需 order by pk asc) + // 如果遇到失败,分两种情况: + // a)已读出记录,则开始走增量读逻辑 + // b)未读出记录,则走正常读逻辑(仍然需要order by pk asc) + // 正常结束 则 break + // } + context.setReadBatchSize(readBatchSize); + String getFirstQuerySql = ObReaderUtils.buildFirstQuerySql(context); + String appendQuerySql = ObReaderUtils.buildAppendQuerySql(conn, context); + LOG.warn("start table scan key : {}", context.getIndexName() == null ? 
"primary" : context.getIndexName()); + context.setQuerySql(getFirstQuerySql); + boolean firstQuery = true; + // 原来打算firstQuery时 limit 1 减少 + // 后来经过对比发现其实是多余的,因为: + // 1.假如走gmt_modified辅助索引,则直接索引扫描 不需要topN的order by + // 2.假如不走辅助索引,而是pk table scan,则减少排序规模并没有好处,因为下一次仍然要排序 + // 减少这个多余的优化tip 可以让代码更易读 + int retryCount = 0; + while (true) { + try { + boolean finish = doRead(recordSender, taskPluginCollector, context); + if (finish) { + break; + } + } catch (Throwable e) { + if (retryLimit == ++retryCount) { + throw RdbmsException.asQueryException(ObReaderUtils.DATABASE_TYPE, new Exception(e), + context.getQuerySql(), context.getTable(), username); + } + LOG.error("read fail, retry count " + retryCount + ", sleep 60 second, save point:" + + context.getSavePoint() + ", error: "+ e.getMessage()); + ObReaderUtils.sleep(60000); // sleep 10s + } + // 假如原来的查询有查出数据,则改成增量查询 + if (firstQuery && context.getPkIndexs() != null && context.getSavePoint() != null) { + context.setQuerySql(appendQuerySql); + firstQuery = false; + } + } + DBUtil.closeDBResources(null, context.getConn()); + } + + private boolean isConnectionAlive(Connection conn) { + if (conn == null) { + return false; + } + Statement stmt = null; + ResultSet rs = null; + String sql = "select 1" + (compatibleMode == ObReaderUtils.OB_COMPATIBLE_MODE_ORACLE ? " from dual" : ""); + try { + stmt = conn.createStatement(); + rs = stmt.executeQuery(sql); + rs.next(); + } catch (Exception ex) { + LOG.info("connection is not alive: " + ex.getMessage()); + return false; + } finally { + DBUtil.closeDBResources(rs, stmt, null); + } + return true; + } + + private boolean doRead(RecordSender recordSender, TaskPluginCollector taskPluginCollector, TaskContext context) { + LOG.info("exe sql: {}", context.getQuerySql()); + Connection conn = context.getConn(); + if (reuseConn && isConnectionAlive(conn)) { + LOG.info("connection is alive, will reuse this connection."); + } else { + LOG.info("Create new connection for reader."); + conn = DBUtil.getConnection(ObReaderUtils.DATABASE_TYPE, jdbcUrl, username, password); + ObReaderUtils.initConn4Reader(conn, queryTimeoutSeconds); + context.setConn(conn); + } + PreparedStatement ps = null; + ResultSet rs = null; + PerfRecord perfRecord = new PerfRecord(taskGroupId, taskId, PerfRecord.PHASE.SQL_QUERY); + perfRecord.start(); + try { + ps = conn.prepareStatement(context.getQuerySql(), + ResultSet.TYPE_FORWARD_ONLY, ResultSet.CONCUR_READ_ONLY); + if (context.getPkIndexs() != null && context.getSavePoint() != null) { + Record savePoint = context.getSavePoint(); + List point = ObReaderUtils.buildPoint(savePoint, context.getPkIndexs()); + ObReaderUtils.binding(ps, point); + if (LOG.isWarnEnabled()) { + List pointForLog = new ArrayList(); + for (Column c : point) { + pointForLog.add(c.asString()); + } + LOG.warn("{} save point : {}", context.getTable(), StringUtils.join(pointForLog, ',')); + } + } + // 打开流式接口 + ps.setFetchSize(context.getFetchSize()); + rs = ps.executeQuery(); + ResultSetMetaData metaData = rs.getMetaData(); + int columnNumber = metaData.getColumnCount(); + long lastTime = System.nanoTime(); + int count = 0; + for (; rs.next(); count++) { + context.addCost(System.nanoTime() - lastTime); + Record row = buildRecord(recordSender, rs, metaData, columnNumber, mandatoryEncoding, + taskPluginCollector); + // // 如果第一个record重复了,则不需要发送 + // if (count == 0 && + // ObReaderUtils.isPkEquals(context.getSavePoint(), row, + // context.getPkIndexs())) { + // continue; + // } + // 如果是querySql + if (context.getTransferColumnNumber() 
== -1 + || row.getColumnNumber() == context.getTransferColumnNumber()) { + recordSender.sendToWriter(row); + } else { + Record newRow = recordSender.createRecord(); + for (int i = 0; i < context.getTransferColumnNumber(); i++) { + newRow.addColumn(row.getColumn(i)); + } + recordSender.sendToWriter(newRow); + } + context.setSavePoint(row); + lastTime = System.nanoTime(); + } + LOG.info("end of sql: {}, " + count + "rows are read.", context.getQuerySql()); + return context.getReadBatchSize() <= 0 || count < readBatchSize; + } catch (Exception e) { + ObReaderUtils.close(null, null, context.getConn()); + context.setConn(null); + LOG.error("reader data fail", e); + throw RdbmsException.asQueryException(ObReaderUtils.DATABASE_TYPE, e, context.getQuerySql(), + context.getTable(), username); + } finally { + perfRecord.end(); + if (reuseConn) { + ObReaderUtils.close(rs, ps, null); + } else { + ObReaderUtils.close(rs, ps, conn); + } + } + } +} diff --git a/oceanbasev10reader/src/main/java/com/alibaba/datax/plugin/reader/oceanbasev10reader/util/ObReaderUtils.java b/oceanbasev10reader/src/main/java/com/alibaba/datax/plugin/reader/oceanbasev10reader/util/ObReaderUtils.java new file mode 100644 index 00000000..2290fb43 --- /dev/null +++ b/oceanbasev10reader/src/main/java/com/alibaba/datax/plugin/reader/oceanbasev10reader/util/ObReaderUtils.java @@ -0,0 +1,697 @@ +package com.alibaba.datax.plugin.reader.oceanbasev10reader.util; + +import java.sql.Connection; +import java.sql.PreparedStatement; +import java.sql.ResultSet; +import java.sql.SQLException; +import java.sql.Statement; +import java.sql.Timestamp; +import java.util.ArrayList; +import java.util.Arrays; +import java.util.HashMap; +import java.util.List; +import java.util.Map; +import java.util.Map.Entry; +import java.util.TreeMap; +import java.util.regex.Matcher; +import java.util.regex.Pattern; +import java.util.Set; +import java.util.TreeSet; + +import org.apache.commons.lang3.ArrayUtils; +import org.apache.commons.lang3.StringUtils; +import org.slf4j.Logger; +import org.slf4j.LoggerFactory; + +import com.alibaba.datax.common.element.BoolColumn; +import com.alibaba.datax.common.element.BytesColumn; +import com.alibaba.datax.common.element.Column; +import com.alibaba.datax.common.element.DateColumn; +import com.alibaba.datax.common.element.DoubleColumn; +import com.alibaba.datax.common.element.LongColumn; +import com.alibaba.datax.common.element.Record; +import com.alibaba.datax.common.element.StringColumn; +import com.alibaba.datax.plugin.rdbms.util.DBUtil; +import com.alibaba.datax.plugin.rdbms.util.DataBaseType; +import com.alibaba.druid.sql.SQLUtils; +import com.alibaba.druid.sql.ast.SQLExpr; +import com.alibaba.druid.sql.ast.expr.SQLBinaryOpExpr; +import com.alibaba.druid.sql.ast.expr.SQLBinaryOperator; + +public class ObReaderUtils { + + private static final Logger LOG = LoggerFactory.getLogger(ObReaderUtils.class); + + final static public String OB_COMPATIBLE_MODE = "obCompatibilityMode"; + final static public String OB_COMPATIBLE_MODE_ORACLE = "ORACLE"; + final static public String OB_COMPATIBLE_MODE_MYSQL = "MYSQL"; + + public static DataBaseType DATABASE_TYPE = DataBaseType.MySql; + + public static void initConn4Reader(Connection conn, long queryTimeoutSeconds) { + String setQueryTimeout = "set ob_query_timeout=" + (queryTimeoutSeconds * 1000 * 1000L); + String setTrxTimeout = "set ob_trx_timeout=" + ((queryTimeoutSeconds + 5) * 1000 * 1000L); + Statement stmt = null; + try { + conn.setAutoCommit(true); + stmt = conn.createStatement(); 
+ stmt.execute(setQueryTimeout); + stmt.execute(setTrxTimeout); + LOG.warn("setAutoCommit=true;"+setQueryTimeout+";"+setTrxTimeout+";"); + } catch (Throwable e) { + LOG.warn("initConn4Reader fail", e); + } finally { + DBUtil.closeDBResources(stmt, null); + } + } + + public static void sleep(int ms) { + try { + Thread.sleep(ms); + } catch (InterruptedException e) { + } + } + + /** + * + * @param conn + * @param context + */ + public static void matchPkIndexs(Connection conn, TaskContext context) { + String[] pkColumns = getPkColumns(conn, context); + if (ArrayUtils.isEmpty(pkColumns)) { + LOG.warn("table=" + context.getTable() + " has no primary key"); + return; + } + List columns = context.getColumns(); + // 最后参与排序的索引列 + context.setPkColumns(pkColumns); + int[] pkIndexs = new int[pkColumns.length]; + for (int i = 0, n = pkColumns.length; i < n; i++) { + String pkc = pkColumns[i]; + int j = 0; + for (int k = columns.size(); j < k; j++) { + // 如果用户定义的 columns中 带有 ``,也不影响, + // 最多只是在select里多加了几列PK column + if (StringUtils.equalsIgnoreCase(pkc, columns.get(j))) { + pkIndexs[i] = j; + break; + } + } + // 到这里 说明主键列不在columns中,则主动追加到尾部 + if (j == columns.size()) { + columns.add(pkc); + pkIndexs[i] = columns.size() - 1; + } + } + context.setPkIndexs(pkIndexs); + } + + private static String[] getPkColumns(Connection conn, TaskContext context) { + String tableName = context.getTable(); + String sql = "show index from " + tableName + " where Key_name='PRIMARY'"; + if (isOracleMode(context.getCompatibleMode())) { + tableName = tableName.toUpperCase(); + sql = "SELECT cols.column_name Column_name "+ + "FROM all_constraints cons, all_cons_columns cols " + + "WHERE cols.table_name = '" + tableName+ "' AND cons.constraint_type = 'P' " + + "AND cons.constraint_name = cols.constraint_name AND cons.owner = cols.owner"; + } + LOG.info("get primary key by sql: " + sql); + Statement ps = null; + ResultSet rs = null; + List realIndex = new ArrayList(); + realIndex.addAll(context.getSecondaryIndexColumns()); + try { + ps = conn.createStatement(); + rs = ps.executeQuery(sql); + while (rs.next()) { + String columnName = StringUtils.lowerCase(rs.getString("Column_name")); + if (!realIndex.contains(columnName)) { + realIndex.add(columnName); + } + } + String[] pks = new String[realIndex.size()]; + realIndex.toArray(pks); + return pks; + } catch (Throwable e) { + LOG.error("show index from table fail :" + sql, e); + } finally { + close(rs, ps, null); + } + return null; + } + + /** + * 首次查的SQL + * + * @param context + * @return + */ + public static String buildFirstQuerySql(TaskContext context) { + String userSavePoint = context.getUserSavePoint(); + String indexName = context.getIndexName(); + String sql = "select "; + boolean weakRead = context.getWeakRead(); + if (StringUtils.isNotEmpty(indexName)) { + String weakReadHint = weakRead ? "+READ_CONSISTENCY(WEAK)," : "+"; + sql += " /*" + weakReadHint + "index(" + context.getTable() + " " + indexName + ")*/ "; + } else if (weakRead){ + sql += " /*+READ_CONSISTENCY(WEAK)*/ "; + } + sql += StringUtils.join(context.getColumns(), ','); + sql += " from " + context.getTable(); + if (context.getPartitionName() != null) { + sql += String.format(" partition(%s) ", context.getPartitionName()); + } + if (StringUtils.isNotEmpty(context.getWhere())) { + sql += " where " + context.getWhere(); + } + + if (userSavePoint != null && userSavePoint.length() != 0) { + userSavePoint = userSavePoint.replace("=", ">"); + sql += (StringUtils.isNotEmpty(context.getWhere()) ? 
" and " : " where ") + userSavePoint; + } + + sql += " order by " + StringUtils.join(context.getPkColumns(), ',') + " asc"; + + // Using sub-query to apply rownum < readBatchSize since where has higher priority than order by + if (ObReaderUtils.isOracleMode(context.getCompatibleMode()) && context.getReadBatchSize() != -1) { + sql = String.format("select * from (%s) where rownum <= %d", sql, context.getReadBatchSize()); + } + + return sql; + } + + /** + * 增量查的SQL + * + * @param conn + * + * @param context + * @return sql + */ + public static String buildAppendQuerySql(Connection conn, TaskContext context) { + String indexName = context.getIndexName(); + boolean weakRead = context.getWeakRead(); + String sql = "select "; + if (StringUtils.isNotEmpty(indexName)) { + String weakReadHint = weakRead ? "+READ_CONSISTENCY(WEAK)," : "+"; + sql += " /*"+ weakReadHint + "index(" + context.getTable() + " " + indexName + ")*/ "; + } else if (weakRead){ + sql += " /*+READ_CONSISTENCY(WEAK)*/ "; + } + sql += StringUtils.join(context.getColumns(), ',') + " from " + context.getTable(); + + if (context.getPartitionName() != null) { + sql += String.format(" partition(%s) ", context.getPartitionName()); + } + + sql += " where "; + String append = "(" + StringUtils.join(context.getPkColumns(), ',') + ") > (" + + buildPlaceHolder(context.getPkColumns().length) + ")"; + + if (StringUtils.isNotEmpty(context.getWhere())) { + sql += "(" + context.getWhere() + ") and "; + } + + sql = String.format("%s %s order by %s asc", sql, append, StringUtils.join(context.getPkColumns(), ',')); + + // Using sub-query to apply rownum < readBatchSize since where has higher priority than order by + if (ObReaderUtils.isOracleMode(context.getCompatibleMode()) && context.getReadBatchSize() != -1) { + sql = String.format("select * from (%s) where rownum <= %d", sql, context.getReadBatchSize()); + } + + return sql; + } + + /** + * check if the userSavePoint is valid + * + * @param context + * @return true - valid, false - invalid + */ + public static boolean isUserSavePointValid(TaskContext context) { + String userSavePoint = context.getUserSavePoint(); + if (userSavePoint == null || userSavePoint.length() == 0) { + LOG.info("user save point is empty!"); + return false; + } + + LOG.info("validating user save point: " + userSavePoint); + + final String patternString = "(.+)=(.+)"; + Pattern parttern = Pattern.compile(patternString); + Matcher matcher = parttern.matcher(userSavePoint); + if (!matcher.find()) { + LOG.error("user save point format is not correct: " + userSavePoint); + return false; + } + + List columnsInUserSavePoint = getColumnsFromUserSavePoint(userSavePoint); + List valuesInUserSavePoint = getValuesFromUserSavePoint(userSavePoint); + if (columnsInUserSavePoint.size() == 0 || valuesInUserSavePoint.size() == 0 || + columnsInUserSavePoint.size() != valuesInUserSavePoint.size()) { + LOG.error("number of columns and values in user save point are different:" + userSavePoint); + return false; + } + + String where = context.getWhere(); + if (StringUtils.isNotEmpty(where)) { + for (String column : columnsInUserSavePoint) { + if (where.contains(column)) { + LOG.error("column " + column + " is conflict with where: " + where); + return false; + } + } + } + + // Columns in userSavePoint must be the selected index. 
+ String[] pkColumns = context.getPkColumns(); + if (pkColumns.length != columnsInUserSavePoint.size()) { + LOG.error("user save point is not on the selected index."); + return false; + } + + for (String column : columnsInUserSavePoint) { + boolean found = false; + for (String pkCol : pkColumns) { + if (pkCol.equals(column)) { + found = true; + break; + } + } + if (!found) { + LOG.error("column " + column + " is not on the selected index."); + return false; + } + } + + return true; + } + + private static String removeBracket(String str) { + final char leftBracket = '('; + final char rightBracket = ')'; + if (str != null && str.contains(String.valueOf(leftBracket)) && str.contains(String.valueOf(rightBracket)) && + str.indexOf(leftBracket) < str.indexOf(rightBracket)) { + return str.substring(str.indexOf(leftBracket)+1, str.indexOf(rightBracket)); + } + return str; + } + + private static List getColumnsFromUserSavePoint(String userSavePoint) { + return Arrays.asList(removeBracket(userSavePoint.split("=")[0]).split(",")); + } + + private static List getValuesFromUserSavePoint(String userSavePoint) { + return Arrays.asList(removeBracket(userSavePoint.split("=")[1]).split(",")); + } + + /** + * 先解析成where + *
+ * 再判断是否存在索引 + * + * @param conn + * @param context + * @return + */ + public static void initIndex(Connection conn, TaskContext context) { + if (StringUtils.isEmpty(context.getWhere())) { + return; + } + SQLExpr expr = SQLUtils.toSQLExpr(context.getWhere(), "mysql"); + LOG.info("expr: " + expr); + List allColumnsInTab = getAllColumnFromTab(conn, context.getTable()); + List allColNames = getColNames(allColumnsInTab, expr); + + if (allColNames == null) { + return; + } + + // Remove the duplicated column names + Set colNames = new TreeSet(); + for (String colName : allColNames) { + if (!colNames.contains(colName)) { + colNames.add(colName); + } + } + List indexNames = getIndexName(conn, context.getTable(), colNames, context.getCompatibleMode()); + findBestIndex(conn, indexNames, context.getTable(), context); + } + + private static List getAllColumnFromTab(Connection conn, String tableName) { + String sql = "show columns from " + tableName; + Statement stmt = null; + ResultSet rs = null; + List allColumns = new ArrayList(); + try { + stmt = conn.createStatement(); + rs = stmt.executeQuery(sql); + while (rs.next()) { + allColumns.add(rs.getString("Field").toUpperCase()); + } + } catch (Exception e) { + LOG.warn("fail to get all columns from table " + tableName, e); + } finally { + close(rs, stmt, null); + } + + LOG.info("all columns in tab: " + String.join(",", allColumns)); + return allColumns; + } + + /** + * 找出where条件中的列名,目前仅支持全部为and条件,并且操作符为大于、大约等于、等于、小于、小于等于和不等于的表达式。 + * + * test coverage: - c6 = 20180710 OR c4 = 320: no index selected - 20180710 + * = c6: correct index selected - 20180710 = c6 and c4 = 320 or c2 < 100: no + * index selected + * + * @param expr + * @return + */ + private static List getColNames(List allColInTab, SQLExpr expr) { + List colNames = new ArrayList(); + if (expr instanceof SQLBinaryOpExpr) { + SQLBinaryOpExpr exp = (SQLBinaryOpExpr) expr; + if (exp.getOperator() == SQLBinaryOperator.BooleanAnd) { + List leftColumns = getColNames(allColInTab, exp.getLeft()); + List rightColumns = getColNames(allColInTab, exp.getRight()); + if (leftColumns == null || rightColumns == null) { + return null; + } + colNames.addAll(leftColumns); + colNames.addAll(rightColumns); + } else if (exp.getOperator() == SQLBinaryOperator.GreaterThan + || exp.getOperator() == SQLBinaryOperator.GreaterThanOrEqual + || exp.getOperator() == SQLBinaryOperator.Equality + || exp.getOperator() == SQLBinaryOperator.LessThan + || exp.getOperator() == SQLBinaryOperator.LessThanOrEqual + || exp.getOperator() == SQLBinaryOperator.NotEqual) { + // only support simple comparison operators + String left = SQLUtils.toMySqlString(exp.getLeft()).toUpperCase(); + String right = SQLUtils.toMySqlString(exp.getRight()).toUpperCase(); + LOG.debug("left: " + left + ", right: " + right); + if (allColInTab.contains(left)) { + colNames.add(left); + } + + if (allColInTab.contains(right)) { + colNames.add(right); + } + } else { + // unsupported operators + return null; + } + } + + return colNames; + } + + private static Map> getAllIndex(Connection conn, String tableName, String compatibleMode) { + Map> allIndex = new HashMap>(); + String sql = "show index from " + tableName; + if (isOracleMode(compatibleMode)) { + tableName = tableName.toUpperCase(); + sql = "SELECT INDEX_NAME Key_name, COLUMN_NAME Column_name " + + "from dba_ind_columns where TABLE_NAME = '" + tableName +"' " + + " union all " + + "SELECT DISTINCT " + + "CASE " + + "WHEN cons.CONSTRAINT_TYPE = 'P' THEN 'PRIMARY' " + + "WHEN cons.CONSTRAINT_TYPE = 'U' 
THEN cons.CONSTRAINT_NAME " + + "ELSE '' " + + "END AS Key_name, " + + "cols.column_name Column_name " + + "FROM all_constraints cons, all_cons_columns cols " + + "WHERE cols.table_name = '" + tableName + "' AND cons.constraint_type in('P', 'U') " + + "AND cons.constraint_name = cols.constraint_name AND cons.owner = cols.owner"; + } + Statement stmt = null; + ResultSet rs = null; + + try { + LOG.info("running sql to get index: " + sql); + stmt = conn.createStatement(); + rs = stmt.executeQuery(sql); + while (rs.next()) { + String keyName = rs.getString("Key_name"); + String colName = rs.getString("Column_name").toUpperCase(); + if (allIndex.containsKey(keyName)) { + allIndex.get(keyName).add(colName); + } else { + List allColumns = new ArrayList(); + allColumns.add(colName); + allIndex.put(keyName, allColumns); + } + } + + // add primary key to all index + if (allIndex.containsKey("PRIMARY")) { + List colsInPrimary = allIndex.get("PRIMARY"); + for (String keyName : allIndex.keySet()) { + if (keyName.equals("PRIMARY")) { + continue; + } + allIndex.get(keyName).addAll(colsInPrimary); + } + } + } catch (Exception e) { + LOG.error("fail to get all keys from table" + sql, e); + } finally { + close(rs, stmt, null); + } + + LOG.info("all index: " + allIndex.toString()); + return allIndex; + } + + /** + * + * @param conn + * @param table + * @param colNamesInCondition + * @return + */ + private static List getIndexName(Connection conn, String table, + Set colNamesInCondition, String compatibleMode) { + List indexNames = new ArrayList(); + if (colNamesInCondition == null || colNamesInCondition.size() == 0) { + LOG.info("there is no qulified conditions in the where clause, skip index selection."); + return indexNames; + } + + LOG.info("columNamesInConditions: " + String.join(",", colNamesInCondition)); + + Map> allIndex = getAllIndex(conn, table, compatibleMode); + for (String keyName : allIndex.keySet()) { + boolean indexNotMatch = false; + // If the index does not have all the column in where conditions, it + // can not be chosen + // the selected index must start with the columns in where condition + if (allIndex.get(keyName).size() < colNamesInCondition.size()) { + indexNotMatch = true; + } else { + // the the first number columns of this index + int num = colNamesInCondition.size(); + for (String colName : allIndex.get(keyName)) { + if (!colNamesInCondition.contains(colName)) { + indexNotMatch = true; + break; + } + if (--num == 0) { + break; + } + } + } + + if (indexNotMatch) { + continue; + } else { + indexNames.add(keyName); + } + } + + return indexNames; + } + + /** + * 以 column开头的索引,可能有多个,也可能存在多列的情形 + *
+ * 所以,需要选择列数最少的 + * + * @param indexNames + * @param context + */ + private static void findBestIndex(Connection conn, List indexNames, String table, TaskContext context) { + if (indexNames.size() == 0) { + LOG.warn("table has no index."); + return; + } + + Map> allIndexs = new HashMap>(); + String sql = "show index from " + table + " where key_name in (" + buildPlaceHolder(indexNames.size()) + ")"; + if (isOracleMode(context.getCompatibleMode())) { + Map> allIndexInTab = getAllIndex(conn, table, context.getCompatibleMode()); + for (String indexName : indexNames) { + if (allIndexInTab.containsKey(indexName)) { + Map index = new TreeMap(); + List columnList = allIndexInTab.get(indexName); + for (int i = 1; i <= columnList.size(); i++) { + index.put(i, columnList.get(i-1)); + } + allIndexs.put(indexName, index); + } else { + LOG.error("index does not exist: " + indexName); + } + } + } else { + PreparedStatement ps = null; + ResultSet rs = null; + try { + ps = conn.prepareStatement(sql); + for (int i = 0, n = indexNames.size(); i < n; i++) { + ps.setString(i + 1, indexNames.get(i)); + } + rs = ps.executeQuery(); + while (rs.next()) { + String keyName = rs.getString("Key_name"); + Map index = allIndexs.get(keyName); + if (index == null) { + index = new TreeMap(); + allIndexs.put(keyName, index); + } + int keyInIndex = rs.getInt("Seq_in_index"); + String column = rs.getString("Column_name"); + index.put(keyInIndex, column); + } + } catch (Throwable e) { + LOG.error("show index from table fail :" + sql, e); + } finally { + close(rs, ps, null); + } + } + + LOG.info("possible index:" + allIndexs + ",where:" + context.getWhere()); + + Entry> chooseIndex = null; + int columnCount = Integer.MAX_VALUE; + for (Entry> entry : allIndexs.entrySet()) { + if (entry.getValue().size() < columnCount) { + columnCount = entry.getValue().size(); + chooseIndex = entry; + } + } + + if (chooseIndex != null) { + LOG.info("choose index name:" + chooseIndex.getKey() + ",columns:" + chooseIndex.getValue()); + context.setIndexName(chooseIndex.getKey()); + context.setSecondaryIndexColumns(new ArrayList(chooseIndex.getValue().values())); + } + } + + /** + * 由于ObProxy存在bug,事务超时或事务被杀时,conn的close是没有响应的 + * + * @param rs + * @param stmt + * @param conn + */ + public static void close(final ResultSet rs, final Statement stmt, final Connection conn) { + DBUtil.closeDBResources(rs, stmt, conn); + } + + /** + * 判断是否重复record + * + * @param savePoint + * @param row + * @param pkIndexs + * @return + */ + public static boolean isPkEquals(Record savePoint, Record row, int[] pkIndexs) { + if (savePoint == null || row == null) { + return false; + } + try { + for (int index : pkIndexs) { + Object left = savePoint.getColumn(index).getRawData(); + Object right = row.getColumn(index).getRawData(); + if (!left.equals(right)) { + return false; + } + } + } catch (Throwable e) { + return false; + } + return true; + } + + public static String buildPlaceHolder(int n) { + if (n <= 0) { + return ""; + } + StringBuilder str = new StringBuilder(2 * n); + str.append('?'); + for (int i = 1; i < n; i++) { + str.append(",?"); + } + return str.toString(); + } + + public static void binding(PreparedStatement ps, List list) throws SQLException { + for (int i = 0, n = list.size(); i < n; i++) { + Column c = list.get(i); + if(c instanceof BoolColumn){ + ps.setLong(i + 1, ((BoolColumn)c).asLong()); + }else if(c instanceof BytesColumn){ + ps.setBytes(i + 1, ((BytesColumn)c).asBytes()); + }else if(c instanceof DateColumn){ + ps.setTimestamp(i + 1, new 
Timestamp(((DateColumn)c).asDate().getTime())); + }else if(c instanceof DoubleColumn){ + ps.setDouble(i + 1, ((DoubleColumn)c).asDouble()); + }else if(c instanceof LongColumn){ + ps.setLong(i + 1, ((LongColumn)c).asLong()); + }else if(c instanceof StringColumn){ + ps.setString(i + 1, ((StringColumn)c).asString()); + }else{ + ps.setObject(i + 1, c.getRawData()); + } + } + } + + public static List buildPoint(Record savePoint, int[] pkIndexs) { + List result = new ArrayList(pkIndexs.length); + for (int i = 0, n = pkIndexs.length; i < n; i++) { + result.add(savePoint.getColumn(pkIndexs[i])); + } + return result; + } + + public static String getCompatibleMode(Connection conn) { + String compatibleMode = OB_COMPATIBLE_MODE_MYSQL; + String getCompatibleModeSql = "SHOW VARIABLES LIKE 'ob_compatibility_mode'"; + Statement stmt = null; + ResultSet rs = null; + try { + stmt = conn.createStatement(); + rs = stmt.executeQuery(getCompatibleModeSql); + if (rs.next()) { + compatibleMode = rs.getString("VALUE"); + } + } catch (Exception e) { + LOG.error("fail to get ob compatible mode, using mysql as default: " + e.getMessage()); + } finally { + DBUtil.closeDBResources(rs, stmt, conn); + } + + LOG.info("ob compatible mode is " + compatibleMode); + return compatibleMode; + } + + public static boolean isOracleMode(String mode) { + return (mode != null && OB_COMPATIBLE_MODE_ORACLE.equals(mode)); + } +} diff --git a/oceanbasev10reader/src/main/java/com/alibaba/datax/plugin/reader/oceanbasev10reader/util/TaskContext.java b/oceanbasev10reader/src/main/java/com/alibaba/datax/plugin/reader/oceanbasev10reader/util/TaskContext.java new file mode 100644 index 00000000..ba754a37 --- /dev/null +++ b/oceanbasev10reader/src/main/java/com/alibaba/datax/plugin/reader/oceanbasev10reader/util/TaskContext.java @@ -0,0 +1,176 @@ +package com.alibaba.datax.plugin.reader.oceanbasev10reader.util; + +import java.sql.Connection; +import java.util.Collections; +import java.util.List; + +import com.alibaba.datax.common.element.Record; + +public class TaskContext { + private Connection conn; + private final String table; + private String indexName; + // 辅助索引的字段列表 + private List secondaryIndexColumns = Collections.emptyList(); + private String querySql; + private final String where; + private final int fetchSize; + private long readBatchSize = -1; + private boolean weakRead = true; + private String userSavePoint; + private String compatibleMode = ObReaderUtils.OB_COMPATIBLE_MODE_MYSQL; + + public String getPartitionName() { + return partitionName; + } + + public void setPartitionName(String partitionName) { + this.partitionName = partitionName; + } + + private String partitionName; + + // 断点续读的保存点 + private volatile Record savePoint; + + // pk在column中的index,用于绑定变量时从savePoint中读取值 + // 如果这个值为null,则表示 不是断点续读的场景 + private int[] pkIndexs; + + private final List columns; + + private String[] pkColumns; + + private long cost; + + private final int transferColumnNumber; + + public TaskContext(String table, List columns, String where, int fetchSize) { + super(); + this.table = table; + this.columns = columns; + // 针对只有querySql的场景 + this.transferColumnNumber = columns == null ? 
-1 : columns.size(); + this.where = where; + this.fetchSize = fetchSize; + } + + public Connection getConn() { + return conn; + } + + public void setConn(Connection conn) { + this.conn = conn; + } + + public String getIndexName() { + return indexName; + } + + public void setIndexName(String indexName) { + this.indexName = indexName; + } + + public List getSecondaryIndexColumns() { + return secondaryIndexColumns; + } + + public void setSecondaryIndexColumns(List secondaryIndexColumns) { + this.secondaryIndexColumns = secondaryIndexColumns; + } + + public String getQuerySql() { + if (readBatchSize == -1 || ObReaderUtils.isOracleMode(compatibleMode)) { + return querySql; + } else { + return querySql + " limit " + readBatchSize; + } + } + + public void setQuerySql(String querySql) { + this.querySql = querySql; + } + + public String getWhere() { + return where; + } + + public Record getSavePoint() { + return savePoint; + } + + public void setSavePoint(Record savePoint) { + this.savePoint = savePoint; + } + + public int[] getPkIndexs() { + return pkIndexs; + } + + public void setPkIndexs(int[] pkIndexs) { + this.pkIndexs = pkIndexs; + } + + public List getColumns() { + return columns; + } + + public String[] getPkColumns() { + return pkColumns; + } + + public void setPkColumns(String[] pkColumns) { + this.pkColumns = pkColumns; + } + + public String getTable() { + return table; + } + + public int getFetchSize() { + return fetchSize; + } + + public long getCost() { + return cost; + } + + public void addCost(long cost) { + this.cost += cost; + } + + public int getTransferColumnNumber() { + return transferColumnNumber; + } + + public long getReadBatchSize() { + return readBatchSize; + } + + public void setReadBatchSize(long readBatchSize) { + this.readBatchSize = readBatchSize; + } + + public boolean getWeakRead() { + return weakRead; + } + + public void setWeakRead(boolean weakRead) { + this.weakRead = weakRead; + } + + public String getUserSavePoint() { + return userSavePoint; + } + public void setUserSavePoint(String userSavePoint) { + this.userSavePoint = userSavePoint; + } + + public String getCompatibleMode() { + return compatibleMode; + } + + public void setCompatibleMode(String compatibleMode) { + this.compatibleMode = compatibleMode; + } +} diff --git a/oceanbasev10reader/src/main/libs/oceanbase-client-1.1.10.jar b/oceanbasev10reader/src/main/libs/oceanbase-client-1.1.10.jar new file mode 100644 index 00000000..38162912 Binary files /dev/null and b/oceanbasev10reader/src/main/libs/oceanbase-client-1.1.10.jar differ diff --git a/oceanbasev10reader/src/main/resources/plugin.json b/oceanbasev10reader/src/main/resources/plugin.json new file mode 100644 index 00000000..66acbd62 --- /dev/null +++ b/oceanbasev10reader/src/main/resources/plugin.json @@ -0,0 +1,6 @@ +{ + "name": "oceanbasev10reader", + "class": "com.alibaba.datax.plugin.reader.oceanbasev10reader.OceanBaseReader", + "description": "read data from oceanbase with SQL interface", + "developer": "oceanbase" +} \ No newline at end of file diff --git a/oceanbasev10writer/pom.xml b/oceanbasev10writer/pom.xml new file mode 100644 index 00000000..cbe19732 --- /dev/null +++ b/oceanbasev10writer/pom.xml @@ -0,0 +1,126 @@ + + + + datax-all + com.alibaba.datax + 0.0.1-SNAPSHOT + + 4.0.0 + + oceanbasev10writer + + com.alibaba.datax + 0.0.1-SNAPSHOT + + + + com.alibaba.datax + datax-common + ${datax-project-version} + + + slf4j-log4j12 + org.slf4j + + + + + com.alibaba.datax + plugin-rdbms-util + ${datax-project-version} + + + org.slf4j + 
slf4j-api + + + ch.qos.logback + logback-classic + + + org.springframework + spring-test + 4.0.4.RELEASE + test + + + + + com.alipay.oceanbase + oceanbase-connector-java + 3.2.0 + system + ${basedir}/src/main/libs/oceanbase-connector-java-3.2.0.jar + + + com.alipay.oceanbase + oceanbase-client + + + + + + log4j + log4j + 1.2.16 + + + org.json + json + 20160810 + + + junit + junit + 4.11 + test + + + + + + + src/main/java + + **/*.properties + + + + + + + maven-compiler-plugin + + ${jdk-version} + ${jdk-version} + ${project-sourceEncoding} + + + + + maven-assembly-plugin + + + src/main/assembly/package.xml + + datax + + + + dwzip + package + + single + + + + + + + diff --git a/oceanbasev10writer/src/main/assembly/package.xml b/oceanbasev10writer/src/main/assembly/package.xml new file mode 100644 index 00000000..559ab5d6 --- /dev/null +++ b/oceanbasev10writer/src/main/assembly/package.xml @@ -0,0 +1,42 @@ + + + + dir + + false + + + src/main/resources + + plugin.json + plugin_job_template.json + + plugin/writer/oceanbasev10writer + + + target/ + + oceanbasev10writer-0.0.1-SNAPSHOT.jar + + plugin/writer/oceanbasev10writer + + + src/main/libs + + *.jar + + plugin/writer/oceanbasev10writer/libs + + + + + + false + plugin/writer/oceanbasev10writer/libs + runtime + + + diff --git a/oceanbasev10writer/src/main/java/com/alibaba/datax/plugin/writer/oceanbasev10writer/Config.java b/oceanbasev10writer/src/main/java/com/alibaba/datax/plugin/writer/oceanbasev10writer/Config.java new file mode 100644 index 00000000..9fa3cd9a --- /dev/null +++ b/oceanbasev10writer/src/main/java/com/alibaba/datax/plugin/writer/oceanbasev10writer/Config.java @@ -0,0 +1,62 @@ +package com.alibaba.datax.plugin.writer.oceanbasev10writer; + +public interface Config { + + String MEMSTORE_THRESHOLD = "memstoreThreshold"; + + double DEFAULT_MEMSTORE_THRESHOLD = 0.9d; + + String MEMSTORE_CHECK_INTERVAL_SECOND = "memstoreCheckIntervalSecond"; + + long DEFAULT_MEMSTORE_CHECK_INTERVAL_SECOND = 30; + + int DEFAULT_BATCH_SIZE = 100; + int MAX_BATCH_SIZE = 4096; + + String FAIL_TRY_COUNT = "failTryCount"; + + int DEFAULT_FAIL_TRY_COUNT = 10000; + + String WRITER_THREAD_COUNT = "writerThreadCount"; + + int DEFAULT_WRITER_THREAD_COUNT = 1; + + String CONCURRENT_WRITE = "concurrentWrite"; + + boolean DEFAULT_CONCURRENT_WRITE = true; + + String OB_VERSION = "obVersion"; + String TIMEOUT = "timeout"; + + String PRINT_COST = "printCost"; + boolean DEFAULT_PRINT_COST = false; + + String COST_BOUND = "costBound"; + long DEFAULT_COST_BOUND = 20; + + String MAX_ACTIVE_CONNECTION = "maxActiveConnection"; + int DEFAULT_MAX_ACTIVE_CONNECTION = 2000; + + String WRITER_SUB_TASK_COUNT = "writerSubTaskCount"; + int DEFAULT_WRITER_SUB_TASK_COUNT = 1; + int MAX_WRITER_SUB_TASK_COUNT = 4096; + + String OB_WRITE_MODE = "obWriteMode"; + String OB_COMPATIBLE_MODE = "obCompatibilityMode"; + String OB_COMPATIBLE_MODE_ORACLE = "ORACLE"; + String OB_COMPATIBLE_MODE_MYSQL = "MYSQL"; + + String OCJ_GET_CONNECT_TIMEOUT = "ocjGetConnectTimeout"; + int DEFAULT_OCJ_GET_CONNECT_TIMEOUT = 5000; // 5s + + String OCJ_PROXY_CONNECT_TIMEOUT = "ocjProxyConnectTimeout"; + int DEFAULT_OCJ_PROXY_CONNECT_TIMEOUT = 5000; // 5s + + String OCJ_CREATE_RESOURCE_TIMEOUT = "ocjCreateResourceTimeout"; + int DEFAULT_OCJ_CREATE_RESOURCE_TIMEOUT = 60000; // 60s + + String OB_UPDATE_COLUMNS = "obUpdateColumns"; + + String USE_PART_CALCULATOR = "usePartCalculator"; + boolean DEFAULT_USE_PART_CALCULATOR = false; +} diff --git 
a/oceanbasev10writer/src/main/java/com/alibaba/datax/plugin/writer/oceanbasev10writer/OceanBaseV10Writer.java b/oceanbasev10writer/src/main/java/com/alibaba/datax/plugin/writer/oceanbasev10writer/OceanBaseV10Writer.java new file mode 100644 index 00000000..4ffaffed --- /dev/null +++ b/oceanbasev10writer/src/main/java/com/alibaba/datax/plugin/writer/oceanbasev10writer/OceanBaseV10Writer.java @@ -0,0 +1,245 @@ +package com.alibaba.datax.plugin.writer.oceanbasev10writer; + +import java.sql.*; +import java.util.ArrayList; +import java.util.List; +import java.util.concurrent.TimeUnit; + +import com.alibaba.datax.plugin.writer.oceanbasev10writer.util.DbUtils; +import org.apache.commons.lang3.StringUtils; +import org.slf4j.Logger; +import org.slf4j.LoggerFactory; + +import com.alibaba.datax.common.plugin.RecordReceiver; +import com.alibaba.datax.common.spi.Writer; +import com.alibaba.datax.common.util.Configuration; +import com.alibaba.datax.plugin.rdbms.util.DBUtil; +import com.alibaba.datax.plugin.rdbms.util.DataBaseType; +import com.alibaba.datax.plugin.rdbms.writer.CommonRdbmsWriter; +import com.alibaba.datax.plugin.rdbms.writer.Constant; +import com.alibaba.datax.plugin.rdbms.writer.Key; +import com.alibaba.datax.plugin.rdbms.writer.util.WriterUtil; +import com.alibaba.datax.plugin.writer.oceanbasev10writer.task.ConcurrentTableWriterTask; +import com.alibaba.datax.plugin.writer.oceanbasev10writer.util.ObWriterUtils; + +/** + * 2016-04-07 + *
+ * 专门针对OceanBase1.0的Writer + * + * @author biliang.wbl + * + */ +public class OceanBaseV10Writer extends Writer { + private static DataBaseType DATABASE_TYPE = DataBaseType.OceanBase; + + /** + * Job 中的方法仅执行一次,Task 中方法会由框架启动多个 Task 线程并行执行。 + *
+ * 整个 Writer 执行流程是: + * + *
+	 * Job类init-->prepare-->split
+	 * 
+	 *                          Task类init-->prepare-->startWrite-->post-->destroy
+	 *                          Task类init-->prepare-->startWrite-->post-->destroy
+	 * 
+	 *                                                                            Job类post-->destroy
+	 * 
+ */ + public static class Job extends Writer.Job { + private Configuration originalConfig = null; + private CommonRdbmsWriter.Job commonJob; + private static final Logger LOG = LoggerFactory.getLogger(Job.class); + + /** + * 注意:此方法仅执行一次。 最佳实践:通常在这里对用户的配置进行校验:是否缺失必填项?有无错误值?有没有无关配置项?... + * 并给出清晰的报错/警告提示。校验通常建议采用静态工具类进行,以保证本类结构清晰。 + */ + @Override + public void init() { + this.originalConfig = super.getPluginJobConf(); + checkCompatibleMode(originalConfig); + this.commonJob = new CommonRdbmsWriter.Job(DATABASE_TYPE); + this.commonJob.init(this.originalConfig); + } + + /** + * 注意:此方法仅执行一次。 最佳实践:如果 Job 中有需要进行数据同步之前的处理,可以在此处完成,如果没有必要则可以直接去掉。 + */ + // 一般来说,是需要推迟到 task 中进行pre 的执行(单表情况例外) + @Override + public void prepare() { + int tableNumber = originalConfig.getInt(Constant.TABLE_NUMBER_MARK); + if (tableNumber == 1) { + this.commonJob.prepare(this.originalConfig); + final String version = fetchServerVersion(originalConfig); + originalConfig.set(Config.OB_VERSION, version); + } + + String username = originalConfig.getString(Key.USERNAME); + String password = originalConfig.getString(Key.PASSWORD); + + // 获取presql配置,并执行 + List preSqls = originalConfig.getList(Key.PRE_SQL, String.class); + if (preSqls == null || preSqls.size() == 0) { + return; + } + + List conns = originalConfig.getList(Constant.CONN_MARK, Object.class); + for (Object connConfObject : conns) { + Configuration connConf = Configuration.from(connConfObject.toString()); + // 这里的 jdbcUrl 已经 append 了合适后缀参数 + String jdbcUrl = connConf.getString(Key.JDBC_URL); + + List tableList = connConf.getList(Key.TABLE, String.class); + for (String table : tableList) { + List renderedPreSqls = WriterUtil.renderPreOrPostSqls(preSqls, table); + if (null != renderedPreSqls && !renderedPreSqls.isEmpty()) { + Connection conn = DBUtil.getConnection(DATABASE_TYPE, jdbcUrl, username, password); + LOG.info("Begin to execute preSqls:[{}]. 
context info:{}.", + StringUtils.join(renderedPreSqls, ";"), jdbcUrl); + WriterUtil.executeSqls(conn, renderedPreSqls, jdbcUrl, DATABASE_TYPE); + ObWriterUtils.asyncClose(null, null, conn); + } + } + } + if (LOG.isDebugEnabled()) { + LOG.debug("After job prepare(), originalConfig now is:[\n{}\n]", originalConfig.toJSON()); + } + } + + /** + * 注意:此方法仅执行一次。 最佳实践:通常采用工具静态类完成把 Job 配置切分成多个 Task 配置的工作。 这里的 + * mandatoryNumber 是强制必须切分的份数。 + */ + @Override + public List split(int mandatoryNumber) { + int tableNumber = originalConfig.getInt(Constant.TABLE_NUMBER_MARK); + if (tableNumber == 1) { + return this.commonJob.split(this.originalConfig, mandatoryNumber); + } + Configuration simplifiedConf = this.originalConfig; + + List splitResultConfigs = new ArrayList(); + for (int j = 0; j < mandatoryNumber; j++) { + splitResultConfigs.add(simplifiedConf.clone()); + } + return splitResultConfigs; + } + + /** + * 注意:此方法仅执行一次。 最佳实践:如果 Job 中有需要进行数据同步之后的后续处理,可以在此处完成。 + */ + @Override + public void post() { + int tableNumber = originalConfig.getInt(Constant.TABLE_NUMBER_MARK); + if (tableNumber == 1) { + commonJob.post(this.originalConfig); + return; + } + String username = originalConfig.getString(Key.USERNAME); + String password = originalConfig.getString(Key.PASSWORD); + List conns = originalConfig.getList(Constant.CONN_MARK, Object.class); + List postSqls = originalConfig.getList(Key.POST_SQL, String.class); + if (postSqls == null || postSqls.size() == 0) { + return; + } + + for (Object connConfObject : conns) { + Configuration connConf = Configuration.from(connConfObject.toString()); + String jdbcUrl = connConf.getString(Key.JDBC_URL); + List tableList = connConf.getList(Key.TABLE, String.class); + + for (String table : tableList) { + List renderedPostSqls = WriterUtil.renderPreOrPostSqls(postSqls, table); + if (null != renderedPostSqls && !renderedPostSqls.isEmpty()) { + // 说明有 postSql 配置,则此处删除掉 + Connection conn = DBUtil.getConnection(DATABASE_TYPE, jdbcUrl, username, password); + LOG.info("Begin to execute postSqls:[{}]. 
context info:{}.", + StringUtils.join(renderedPostSqls, ";"), jdbcUrl); + WriterUtil.executeSqls(conn, renderedPostSqls, jdbcUrl, DATABASE_TYPE); + ObWriterUtils.asyncClose(null, null, conn); + } + } + } + originalConfig.remove(Key.POST_SQL); + } + + /** + * 注意:此方法仅执行一次。 最佳实践:通常配合 Job 中的 post() 方法一起完成 Job 的资源释放。 + */ + @Override + public void destroy() { + this.commonJob.destroy(this.originalConfig); + } + + private String fetchServerVersion(Configuration config) { + final String fetchVersionSql = "show variables like 'version'"; + return DbUtils.fetchSingleValueWithRetry(config, fetchVersionSql); + } + + private void checkCompatibleMode(Configuration configure) { + final String fetchCompatibleModeSql = "SHOW VARIABLES LIKE 'ob_compatibility_mode'"; + String compatibleMode = DbUtils.fetchSingleValueWithRetry(configure, fetchCompatibleModeSql); + ObWriterUtils.setCompatibleMode(compatibleMode); + configure.set(Config.OB_COMPATIBLE_MODE, compatibleMode); + } + } + + public static class Task extends Writer.Task { + private static final Logger LOG = LoggerFactory.getLogger(Task.class); + private Configuration writerSliceConfig; + private CommonRdbmsWriter.Task writerTask; + + /** + * 注意:此方法每个 Task 都会执行一次。 最佳实践:此处通过对 taskConfig 配置的读取,进而初始化一些资源为 + * startWrite()做准备。 + */ + @Override + public void init() { + this.writerSliceConfig = super.getPluginJobConf(); + int tableNumber = writerSliceConfig.getInt(Constant.TABLE_NUMBER_MARK); + if (tableNumber == 1) { + // always use concurrentTableWriter + this.writerTask = new ConcurrentTableWriterTask(DATABASE_TYPE); + } else { + throw new RuntimeException("writing to multi-tables is not supported."); + } + LOG.info("tableNumber:" + tableNumber + ",writerTask Class:" + writerTask.getClass().getName()); + this.writerTask.init(this.writerSliceConfig); + } + + /** + * 注意:此方法每个 Task 都会执行一次。 最佳实践:如果 Task + * 中有需要进行数据同步之前的处理,可以在此处完成,如果没有必要则可以直接去掉。 + */ + @Override + public void prepare() { + this.writerTask.prepare(this.writerSliceConfig); + } + + /** + * 注意:此方法每个 Task 都会执行一次。 最佳实践:此处适当封装确保简洁清晰完成数据写入工作。 + */ + public void startWrite(RecordReceiver recordReceiver) { + this.writerTask.startWrite(recordReceiver, this.writerSliceConfig, super.getTaskPluginCollector()); + } + + /** + * 注意:此方法每个 Task 都会执行一次。 最佳实践:如果 Task 中有需要进行数据同步之后的后续处理,可以在此处完成。 + */ + @Override + public void post() { + this.writerTask.post(this.writerSliceConfig); + } + + /** + * 注意:此方法每个 Task 都会执行一次。 最佳实践:通常配合Task 中的 post() 方法一起完成 Task 的资源释放。 + */ + @Override + public void destroy() { + this.writerTask.destroy(this.writerSliceConfig); + } + } +} diff --git a/oceanbasev10writer/src/main/java/com/alibaba/datax/plugin/writer/oceanbasev10writer/ext/ConnHolder.java b/oceanbasev10writer/src/main/java/com/alibaba/datax/plugin/writer/oceanbasev10writer/ext/ConnHolder.java new file mode 100644 index 00000000..785f47bf --- /dev/null +++ b/oceanbasev10writer/src/main/java/com/alibaba/datax/plugin/writer/oceanbasev10writer/ext/ConnHolder.java @@ -0,0 +1,37 @@ +package com.alibaba.datax.plugin.writer.oceanbasev10writer.ext; + +import com.alibaba.datax.common.util.Configuration; +import com.alibaba.datax.plugin.rdbms.util.DBUtil; + +import java.sql.Connection; + +public abstract class ConnHolder { + + protected final Configuration config; + protected Connection conn; + + public ConnHolder(Configuration config) { + this.config = config; + } + + public abstract Connection initConnection(); + + public Configuration getConfig() { + return config; + } + + public Connection getConn() { + return conn; + } + + public 
Connection reconnect() { + DBUtil.closeDBResources(null, conn); + return initConnection(); + } + + public abstract String getJdbcUrl(); + + public abstract String getUserName(); + + public abstract void destroy(); +} diff --git a/oceanbasev10writer/src/main/java/com/alibaba/datax/plugin/writer/oceanbasev10writer/ext/DataBaseWriterBuffer.java b/oceanbasev10writer/src/main/java/com/alibaba/datax/plugin/writer/oceanbasev10writer/ext/DataBaseWriterBuffer.java new file mode 100644 index 00000000..53172495 --- /dev/null +++ b/oceanbasev10writer/src/main/java/com/alibaba/datax/plugin/writer/oceanbasev10writer/ext/DataBaseWriterBuffer.java @@ -0,0 +1,101 @@ +package com.alibaba.datax.plugin.writer.oceanbasev10writer.ext; + +import java.sql.Connection; +import java.util.ArrayList; +import java.util.HashMap; +import java.util.LinkedList; +import java.util.List; +import java.util.Map; + +import com.alibaba.datax.common.element.Record; +import com.alibaba.datax.common.exception.DataXException; +import com.alibaba.datax.common.util.Configuration; +import com.alibaba.datax.plugin.rdbms.util.DBUtilErrorCode; +import com.alibaba.datax.plugin.writer.oceanbasev10writer.util.ObWriterUtils; +import org.slf4j.Logger; +import org.slf4j.LoggerFactory; + +/** + * + * @author oceanbase + * + */ +public class DataBaseWriterBuffer { + private static final Logger LOG = LoggerFactory.getLogger(DataBaseWriterBuffer.class); + + private final ConnHolder connHolder; + private final String dbName; + private Map> tableBuffer = new HashMap>(); + private long lastCheckMemstoreTime; + + public DataBaseWriterBuffer(Configuration config,String jdbcUrl, String userName, String password,String dbName){ + this.connHolder = new ObClientConnHolder(config, jdbcUrl, userName, password); + this.dbName=dbName; + } + + public ConnHolder getConnHolder(){ + return connHolder; + } + + public void initTableBuffer(List tableList) { + for (String table : tableList) { + tableBuffer.put(table, new LinkedList()); + } + } + + public List getTableList(){ + return new ArrayList(tableBuffer.keySet()); + } + + public void addRecord(Record record, String tableName) { + LinkedList recordList = tableBuffer.get(tableName); + if (recordList == null) { + throw DataXException.asDataXException(DBUtilErrorCode.WRITE_DATA_ERROR, + String.format("The [table] calculated based on the rules does not exist. The calculated [tableName]=%s, [db]=%s. Please check the rules you configured.", + tableName, connHolder.getJdbcUrl())); + } + recordList.add(record); + } + + public Map> getTableBuffer() { + return tableBuffer; + } + + public String getDbName() { + return dbName; + } + + public long getLastCheckMemstoreTime() { + return lastCheckMemstoreTime; + } + + public void setLastCheckMemstoreTime(long lastCheckMemstoreTime) { + this.lastCheckMemstoreTime = lastCheckMemstoreTime; + } + + /** + * 检查当前DB的memstore使用状态 + *

+ * 若超过阈值,则休眠 + * + * @param memstoreCheckIntervalSecond + * @param memstoreThreshold + */ + public synchronized void checkMemstore(long memstoreCheckIntervalSecond, double memstoreThreshold) { + long now = System.currentTimeMillis(); + if (now - getLastCheckMemstoreTime() < 1000 * memstoreCheckIntervalSecond) { + return; + } + + LOG.debug(String.format("checking memstore usage: lastCheckTime=%d, now=%d, check interval=%d, threshold=%f", + getLastCheckMemstoreTime(), now, memstoreCheckIntervalSecond, memstoreThreshold)); + + Connection conn = getConnHolder().getConn(); + while (ObWriterUtils.isMemstoreFull(conn, memstoreThreshold)) { + LOG.warn("OB memstore is full,sleep 60 seconds, jdbc=" + getConnHolder().getJdbcUrl() + + ",threshold=" + memstoreThreshold); + ObWriterUtils.sleep(60000); + } + setLastCheckMemstoreTime(now); + } +} diff --git a/oceanbasev10writer/src/main/java/com/alibaba/datax/plugin/writer/oceanbasev10writer/ext/OBDataSourceV10.java b/oceanbasev10writer/src/main/java/com/alibaba/datax/plugin/writer/oceanbasev10writer/ext/OBDataSourceV10.java new file mode 100644 index 00000000..2c1f516f --- /dev/null +++ b/oceanbasev10writer/src/main/java/com/alibaba/datax/plugin/writer/oceanbasev10writer/ext/OBDataSourceV10.java @@ -0,0 +1,190 @@ +package com.alibaba.datax.plugin.writer.oceanbasev10writer.ext; + +import java.sql.Connection; +import java.sql.SQLException; +import java.util.ArrayList; +import java.util.HashMap; +import java.util.List; +import java.util.Map; + +import com.alibaba.datax.plugin.rdbms.reader.Key; +import com.alibaba.datax.plugin.writer.oceanbasev10writer.util.ObWriterUtils; +import org.slf4j.Logger; +import org.slf4j.LoggerFactory; + +import com.alibaba.datax.common.exception.DataXException; +import com.alibaba.datax.common.util.Configuration; +import com.alibaba.datax.plugin.writer.oceanbasev10writer.Config; +import com.alipay.oceanbase.obproxy.datasource.ObGroupDataSource; +import com.alipay.oceanbase.obproxy.exception.ConnectionPropertiesNotSupportedException; +import com.alipay.oceanbase.obproxy.util.StringParser.IllegalFormatException; +import com.google.common.collect.Maps; + +public class OBDataSourceV10 { + private static final Logger LOG = LoggerFactory.getLogger(OBDataSourceV10.class); + + private static final Map dataSources = Maps.newHashMap(); + + private static int ocjGetConnectionTimeout = 0; + private static int ocjGlobalProxyroGetConnectionTimeout = 0; + private static int ocjMaxWaitOfCreateClusterResourceMs = 0; + + private static Configuration taskConfig; + + public static String genKey(String fullUserName, String dbName) { + //username@tenantName#clusterName/dbName + return fullUserName + "/" + dbName; + } + + public static synchronized void init(Configuration configuration, + final String fullUsername, + final String password, + final String dbName) { + taskConfig = configuration; + final String rsUrl = ""; + final String dataSourceKey = genKey(fullUsername, dbName); + final int maxActiveConnection = configuration.getInt(Config.MAX_ACTIVE_CONNECTION, Config.DEFAULT_MAX_ACTIVE_CONNECTION); + if (dataSources.containsKey(dataSourceKey)) { + dataSources.get(dataSourceKey).increseRefercnce(); + } else { + long timeout = configuration.getInt(Config.TIMEOUT, 30); + if (timeout < 30) { + timeout = 30; + } + if (ocjGetConnectionTimeout == 0) { + ocjGetConnectionTimeout = configuration.getInt(Config.OCJ_GET_CONNECT_TIMEOUT, + Config.DEFAULT_OCJ_GET_CONNECT_TIMEOUT); + ocjGlobalProxyroGetConnectionTimeout = 
configuration.getInt(Config.OCJ_PROXY_CONNECT_TIMEOUT, + Config.DEFAULT_OCJ_PROXY_CONNECT_TIMEOUT); + ocjMaxWaitOfCreateClusterResourceMs = configuration.getInt(Config.OCJ_CREATE_RESOURCE_TIMEOUT, + Config.DEFAULT_OCJ_CREATE_RESOURCE_TIMEOUT); + + LOG.info(String.format("initializing OCJ with ocjGetConnectionTimeout=%d, " + + "ocjGlobalProxyroGetConnectionTimeout=%d, ocjMaxWaitOfCreateClusterResourceMs=%d", + ocjGetConnectionTimeout, ocjGlobalProxyroGetConnectionTimeout, ocjMaxWaitOfCreateClusterResourceMs)); + } + DataSourceHolder holder = null; + try { + holder = new DataSourceHolder(rsUrl, fullUsername, password, dbName, maxActiveConnection, timeout); + dataSources.put(dataSourceKey, holder); + } catch (ConnectionPropertiesNotSupportedException e) { + e.printStackTrace(); + throw new DataXException(ObDataSourceErrorCode.DESC, "connect error"); + } catch (IllegalArgumentException e) { + e.printStackTrace(); + throw new DataXException(ObDataSourceErrorCode.DESC, "connect error"); + } catch (IllegalFormatException e) { + e.printStackTrace(); + throw new DataXException(ObDataSourceErrorCode.DESC, "connect error"); + } catch (SQLException e) { + e.printStackTrace(); + throw new DataXException(ObDataSourceErrorCode.DESC, "connect error"); + } + } + } + + public static synchronized void destory(final String dataSourceKey){ + DataSourceHolder holder = dataSources.get(dataSourceKey); + holder.decreaseReference(); + if (holder.canClose()) { + dataSources.remove(dataSourceKey); + holder.close(); + LOG.info(String.format("close datasource success [%s]", dataSourceKey)); + } + } + + public static Connection getConnection(final String url) { + Connection conn = null; + try { + conn = dataSources.get(url).getconnection(); + } catch (SQLException e) { + e.printStackTrace(); + } + return conn; + } + + private static Map buildJdbcProperty() { + Map property = new HashMap(); + property.put("useServerPrepStmts", "false"); + property.put("characterEncoding", "UTF-8"); + property.put("useLocalSessionState", "false"); + property.put("rewriteBatchedStatements", "true"); + property.put("socketTimeout", "25000"); + + return property; + } + + private static class DataSourceHolder { + private volatile int reference; + private final ObGroupDataSource groupDataSource; + public static final Map jdbcProperty = buildJdbcProperty();; + + public DataSourceHolder(final String rsUrl, + final String fullUsername, + final String password, + final String dbName, + final int maxActive, + final long timeout) throws ConnectionPropertiesNotSupportedException, IllegalFormatException, IllegalArgumentException, SQLException { + this.reference = 1; + this.groupDataSource = new ObGroupDataSource(); + this.groupDataSource.setUrl(rsUrl); + this.groupDataSource.setFullUsername(fullUsername); + this.groupDataSource.setPassword(password); + this.groupDataSource.setDatabase(dbName); + this.groupDataSource.setConnectionProperties(jdbcProperty); + this.groupDataSource.setGetConnectionTimeout(ocjGetConnectionTimeout); + this.groupDataSource.setGlobalProxyroGetConnectionTimeout(ocjGlobalProxyroGetConnectionTimeout); + this.groupDataSource.setMaxWaitOfCreateClusterResourceMs(ocjMaxWaitOfCreateClusterResourceMs); + this.groupDataSource.setMaxActive(maxActive); + this.groupDataSource.setGlobalSlowQueryThresholdUs(3000000); // 3s, sql with response time more than 3s will be logged + this.groupDataSource.setGlobalCleanLogFileEnabled(true); // enable log cleanup + this.groupDataSource.setGlobalLogFileSizeThreshold(17179869184L); // 16G, log file 
total size + this.groupDataSource.setGlobalCleanLogFileInterval(10000); // 10s, check interval + this.groupDataSource.setInitialSize(1); + + List initSqls = new ArrayList(); + if (taskConfig != null) { + List sessionConfig = taskConfig.getList(Key.SESSION, new ArrayList(), String.class); + if (sessionConfig != null || sessionConfig.size() > 0) { + initSqls.addAll(sessionConfig); + } + } + // set up for writing timestamp columns + if (ObWriterUtils.isOracleMode()) { + initSqls.add("ALTER SESSION SET NLS_DATE_FORMAT='YYYY-MM-DD HH24:MI:SS';"); + initSqls.add("ALTER SESSION SET NLS_TIMESTAMP_FORMAT='YYYY-MM-DD HH24:MI:SS.FF';"); + initSqls.add("ALTER SESSION SET NLS_TIMESTAMP_TZ_FORMAT='YYYY-MM-DD HH24:MI:SS.FF TZR TZD';"); + } + + this.groupDataSource.setConnectionInitSqls(initSqls); + + this.groupDataSource.init(); + // this.groupDataSource; + LOG.info("Create GroupDataSource rsUrl=[{}], fullUserName=[{}], dbName=[{}], getConnectionTimeout= {}ms, maxActive={}", + rsUrl, fullUsername, dbName, 5000, maxActive); + } + + public Connection getconnection() throws SQLException { + return groupDataSource.getConnection(); + } + + public synchronized void increseRefercnce() { + this.reference++; + } + + public synchronized void decreaseReference() { + this.reference--; + } + + public synchronized boolean canClose() { + return reference == 0; + } + + public synchronized void close() { + if (this.canClose()) { + groupDataSource.destroy(); + } + } + } + +} diff --git a/oceanbasev10writer/src/main/java/com/alibaba/datax/plugin/writer/oceanbasev10writer/ext/OCJConnHolder.java b/oceanbasev10writer/src/main/java/com/alibaba/datax/plugin/writer/oceanbasev10writer/ext/OCJConnHolder.java new file mode 100644 index 00000000..10de5615 --- /dev/null +++ b/oceanbasev10writer/src/main/java/com/alibaba/datax/plugin/writer/oceanbasev10writer/ext/OCJConnHolder.java @@ -0,0 +1,55 @@ +package com.alibaba.datax.plugin.writer.oceanbasev10writer.ext; + +import java.sql.Connection; + +import com.alibaba.datax.common.util.Configuration; +import com.alibaba.datax.plugin.rdbms.util.DBUtil; +import com.alibaba.datax.plugin.rdbms.util.DataBaseType; + +/** + * wrap oceanbase java client + * @author oceanbase + */ + +public class OCJConnHolder extends ConnHolder { + private ServerConnectInfo connectInfo; + private String dataSourceKey; + + public OCJConnHolder (Configuration config, ServerConnectInfo connInfo) { + super(config); + this.connectInfo = connInfo; + this.dataSourceKey = OBDataSourceV10.genKey(connectInfo.getFullUserName(), connectInfo.databaseName); + OBDataSourceV10.init(config, connectInfo.getFullUserName(), connectInfo.password, connectInfo.databaseName); + } + + @Override + public Connection initConnection() { + conn = OBDataSourceV10.getConnection(dataSourceKey); + return conn; + } + + @Override + public Connection reconnect() { + DBUtil.closeDBResources(null, conn); + return initConnection(); + } + + @Override + public Connection getConn() { + return conn; + } + + @Override + public String getJdbcUrl() { + return connectInfo.jdbcUrl; + } + + @Override + public String getUserName() { + return connectInfo.userName; + } + + public void destroy() { + OBDataSourceV10.destory(this.dataSourceKey); + } +} diff --git a/oceanbasev10writer/src/main/java/com/alibaba/datax/plugin/writer/oceanbasev10writer/ext/ObClientConnHolder.java b/oceanbasev10writer/src/main/java/com/alibaba/datax/plugin/writer/oceanbasev10writer/ext/ObClientConnHolder.java new file mode 100644 index 00000000..8ff53039 --- /dev/null +++ 
b/oceanbasev10writer/src/main/java/com/alibaba/datax/plugin/writer/oceanbasev10writer/ext/ObClientConnHolder.java @@ -0,0 +1,63 @@ +package com.alibaba.datax.plugin.writer.oceanbasev10writer.ext; + +import java.sql.Connection; +import java.util.ArrayList; +import java.util.List; + +import com.alibaba.datax.common.util.Configuration; +import com.alibaba.datax.plugin.rdbms.reader.Key; +import com.alibaba.datax.plugin.rdbms.util.DBUtil; +import com.alibaba.datax.plugin.rdbms.util.DataBaseType; +import com.alibaba.datax.plugin.writer.oceanbasev10writer.util.ObWriterUtils; + +/** + * 数据库连接代理对象,负责创建连接,重新连接 + * + * @author oceanbase + * + */ +public class ObClientConnHolder extends ConnHolder { + private final String jdbcUrl; + private final String userName; + private final String password; + + public ObClientConnHolder(Configuration config, String jdbcUrl, String userName, String password) { + super(config); + this.jdbcUrl = jdbcUrl; + this.userName = userName; + this.password = password; + } + + // Connect to ob with obclient and obproxy + @Override + public Connection initConnection() { + String BASIC_MESSAGE = String.format("jdbcUrl:[%s]", this.jdbcUrl); + DataBaseType dbType = DataBaseType.OceanBase; + if (ObWriterUtils.isOracleMode()) { + // set up for writing timestamp columns + List sessionConfig = config.getList(Key.SESSION, new ArrayList(), String.class); + sessionConfig.add("ALTER SESSION SET NLS_DATE_FORMAT='YYYY-MM-DD HH24:MI:SS'"); + sessionConfig.add("ALTER SESSION SET NLS_TIMESTAMP_FORMAT='YYYY-MM-DD HH24:MI:SS.FF'"); + sessionConfig.add("ALTER SESSION SET NLS_TIMESTAMP_TZ_FORMAT='YYYY-MM-DD HH24:MI:SS.FF TZR TZD'"); + config.set(Key.SESSION, sessionConfig); + } + conn = DBUtil.getConnection(dbType, jdbcUrl, userName, password); + DBUtil.dealWithSessionConfig(conn, config, dbType, BASIC_MESSAGE); + return conn; + } + + @Override + public String getJdbcUrl() { + return jdbcUrl; + } + + @Override + public String getUserName() { + return userName; + } + + @Override + public void destroy() { + DBUtil.closeDBResources(null, conn); + } +} diff --git a/oceanbasev10writer/src/main/java/com/alibaba/datax/plugin/writer/oceanbasev10writer/ext/ObDataSourceErrorCode.java b/oceanbasev10writer/src/main/java/com/alibaba/datax/plugin/writer/oceanbasev10writer/ext/ObDataSourceErrorCode.java new file mode 100644 index 00000000..6509c766 --- /dev/null +++ b/oceanbasev10writer/src/main/java/com/alibaba/datax/plugin/writer/oceanbasev10writer/ext/ObDataSourceErrorCode.java @@ -0,0 +1,31 @@ +package com.alibaba.datax.plugin.writer.oceanbasev10writer.ext; + +import com.alibaba.datax.common.spi.ErrorCode; + +public enum ObDataSourceErrorCode implements ErrorCode { + DESC("ObDataSourceError code","connect error"); + + private final String code; + private final String describe; + + private ObDataSourceErrorCode(String code, String describe) { + this.code = code; + this.describe = describe; + } + + @Override + public String getCode() { + return this.code; + } + + @Override + public String getDescription() { + return this.describe; + } + + @Override + public String toString() { + return String.format("Code:[%s], Describe:[%s]. 
", this.code, + this.describe); + } +} diff --git a/oceanbasev10writer/src/main/java/com/alibaba/datax/plugin/writer/oceanbasev10writer/ext/ServerConnectInfo.java b/oceanbasev10writer/src/main/java/com/alibaba/datax/plugin/writer/oceanbasev10writer/ext/ServerConnectInfo.java new file mode 100644 index 00000000..b0611642 --- /dev/null +++ b/oceanbasev10writer/src/main/java/com/alibaba/datax/plugin/writer/oceanbasev10writer/ext/ServerConnectInfo.java @@ -0,0 +1,79 @@ +package com.alibaba.datax.plugin.writer.oceanbasev10writer.ext; + +import java.util.regex.Matcher; +import java.util.regex.Pattern; + +public class ServerConnectInfo { + + public String clusterName; + public String tenantName; + public String userName; + public String password; + public String databaseName; + public String ipPort; + public String jdbcUrl; + + public ServerConnectInfo(final String jdbcUrl, final String username, final String password) { + if (jdbcUrl.startsWith(com.alibaba.datax.plugin.rdbms.writer.Constant.OB10_SPLIT_STRING)) { + String[] ss = jdbcUrl.split(com.alibaba.datax.plugin.rdbms.writer.Constant.OB10_SPLIT_STRING_PATTERN); + if (ss.length != 3) { + throw new RuntimeException("jdbc url format is not correct: " + jdbcUrl); + } + this.userName = username; + this.clusterName = ss[1].trim().split(":")[0]; + this.tenantName = ss[1].trim().split(":")[1]; + this.jdbcUrl = ss[2].replace("jdbc:mysql:", "jdbc:oceanbase:"); + } else { + this.jdbcUrl = jdbcUrl.replace("jdbc:mysql:", "jdbc:oceanbase:"); + if (username.contains("@") && username.contains("#")) { + this.userName = username.substring(0, username.indexOf("@")); + this.tenantName = username.substring(username.indexOf("@") + 1, username.indexOf("#")); + this.clusterName = username.substring(username.indexOf("#") + 1); + } else if (username.contains(":")) { + String[] config = username.split(":"); + if (config.length != 3) { + throw new RuntimeException ("username format is not correct: " + username); + } + this.clusterName = config[0]; + this.tenantName = config[1]; + this.userName = config[2]; + } else { + this.clusterName = null; + this.tenantName = null; + this.userName = username; + } + } + + this.password = password; + parseJdbcUrl(jdbcUrl); + } + + private void parseJdbcUrl(final String jdbcUrl) { + Pattern pattern = Pattern.compile("//([\\w\\.\\-]+:\\d+)/([\\w-]+)\\?"); + Matcher matcher = pattern.matcher(jdbcUrl); + if (matcher.find()) { + String ipPort = matcher.group(1); + String dbName = matcher.group(2); + this.ipPort = ipPort; + this.databaseName = dbName; + } else { + throw new RuntimeException("Invalid argument:" + jdbcUrl); + } + } + + public String toString() { + StringBuffer strBuffer = new StringBuffer(); + return strBuffer.append("clusterName:").append(clusterName).append(", tenantName:").append(tenantName) + .append(", userName:").append(userName).append(", databaseName:").append(databaseName) + .append(", ipPort:").append(ipPort).append(", jdbcUrl:").append(jdbcUrl).toString(); + } + + public String getFullUserName() { + StringBuilder builder = new StringBuilder(userName); + if (tenantName != null && clusterName != null) { + builder.append("@").append(tenantName).append("#").append(clusterName); + } + + return builder.toString(); + } +} diff --git a/oceanbasev10writer/src/main/java/com/alibaba/datax/plugin/writer/oceanbasev10writer/task/ColumnMetaCache.java b/oceanbasev10writer/src/main/java/com/alibaba/datax/plugin/writer/oceanbasev10writer/task/ColumnMetaCache.java new file mode 100644 index 00000000..13339e0b --- /dev/null +++ 
b/oceanbasev10writer/src/main/java/com/alibaba/datax/plugin/writer/oceanbasev10writer/task/ColumnMetaCache.java @@ -0,0 +1,41 @@ +package com.alibaba.datax.plugin.writer.oceanbasev10writer.task; + +import java.sql.Connection; +import java.sql.SQLException; +import java.util.List; + +import org.apache.commons.lang3.StringUtils; +import org.apache.commons.lang3.tuple.Triple; +import org.slf4j.Logger; +import org.slf4j.LoggerFactory; + +import com.alibaba.datax.plugin.rdbms.util.DBUtil; + +public class ColumnMetaCache { + private static final Logger LOG = LoggerFactory.getLogger(ColumnMetaCache.class); + + private static String tableName; + private static Triple, List, List> columnMeta = null; + + public ColumnMetaCache() { + + } + + public static void init(Connection connection, final String tableName, final List columns) throws SQLException { + if (columnMeta == null) { + synchronized(ColumnMetaCache.class) { + ColumnMetaCache.tableName = tableName; + if (columnMeta == null) { + columnMeta = DBUtil.getColumnMetaData(connection, + tableName, StringUtils.join(columns, ",")); + LOG.info("fetch columnMeta of table {} success", tableName); + } + } + } + } + + public static Triple, List, List> getColumnMeta() { + return columnMeta; + } + +} diff --git a/oceanbasev10writer/src/main/java/com/alibaba/datax/plugin/writer/oceanbasev10writer/task/ConcurrentTableWriterTask.java b/oceanbasev10writer/src/main/java/com/alibaba/datax/plugin/writer/oceanbasev10writer/task/ConcurrentTableWriterTask.java new file mode 100644 index 00000000..cbc9a936 --- /dev/null +++ b/oceanbasev10writer/src/main/java/com/alibaba/datax/plugin/writer/oceanbasev10writer/task/ConcurrentTableWriterTask.java @@ -0,0 +1,523 @@ +package com.alibaba.datax.plugin.writer.oceanbasev10writer.task; + +import java.sql.Connection; +//import java.sql.PreparedStatement; +import java.sql.PreparedStatement; +import java.sql.SQLException; +import java.util.ArrayList; +import java.util.HashMap; +import java.util.List; +import java.util.concurrent.BlockingQueue; +import java.util.concurrent.LinkedBlockingQueue; +import java.util.concurrent.TimeUnit; +import java.util.concurrent.atomic.AtomicBoolean; +import java.util.concurrent.atomic.AtomicLong; +import java.util.concurrent.locks.Condition; +import java.util.concurrent.locks.Lock; +import java.util.concurrent.locks.ReentrantLock; + +import com.alibaba.datax.common.element.Column; +import com.alibaba.datax.plugin.writer.oceanbasev10writer.ext.ObClientConnHolder; +import org.apache.commons.lang3.tuple.Pair; +import org.slf4j.Logger; +import org.slf4j.LoggerFactory; + +import com.alibaba.datax.common.element.Record; +import com.alibaba.datax.common.exception.DataXException; +import com.alibaba.datax.common.plugin.RecordReceiver; +import com.alibaba.datax.common.plugin.TaskPluginCollector; +import com.alibaba.datax.common.util.Configuration; +import com.alibaba.datax.plugin.rdbms.util.DBUtil; +import com.alibaba.datax.plugin.rdbms.util.DBUtilErrorCode; +import com.alibaba.datax.plugin.rdbms.util.DataBaseType; +import com.alibaba.datax.plugin.rdbms.writer.CommonRdbmsWriter; +import com.alibaba.datax.plugin.writer.oceanbasev10writer.Config; +import com.alibaba.datax.plugin.writer.oceanbasev10writer.ext.ConnHolder; +import com.alibaba.datax.plugin.writer.oceanbasev10writer.ext.ServerConnectInfo; +import com.alibaba.datax.plugin.writer.oceanbasev10writer.util.ObWriterUtils; +import com.alipay.oceanbase.obproxy.data.TableEntryKey; +import com.alipay.oceanbase.obproxy.util.ObPartitionIdCalculator; + +public 
class ConcurrentTableWriterTask extends CommonRdbmsWriter.Task { + private static final Logger LOG = LoggerFactory.getLogger(ConcurrentTableWriterTask.class); + + // memstore_total 与 memstore_limit 比例的阈值,一旦超过这个值,则暂停写入 + private double memstoreThreshold = Config.DEFAULT_MEMSTORE_THRESHOLD; + // memstore检查的间隔 + private long memstoreCheckIntervalSecond = Config.DEFAULT_MEMSTORE_CHECK_INTERVAL_SECOND; + // 最后一次检查 + private long lastCheckMemstoreTime; + + private static AtomicLong totalTask = new AtomicLong(0); + private long taskId = -1; + + private AtomicBoolean isMemStoreFull = new AtomicBoolean(false); + private ConnHolder checkConnHolder; + + public ConcurrentTableWriterTask(DataBaseType dataBaseType) { + super(dataBaseType); + taskId = totalTask.getAndIncrement(); + } + + private ObPartitionIdCalculator partCalculator = null; + + private HashMap> groupInsertValues; +// private List unknownPartRecords; + private List partitionKeyIndexes; + + private ConcurrentTableWriter concurrentWriter = null; + + private ConnHolder connHolder; + + private boolean allTaskInQueue = false; + + private Lock lock = new ReentrantLock(); + private Condition condition = lock.newCondition(); + + private long startTime; + private boolean isOb2 = false; + private String obWriteMode = "update"; + private boolean isOracleCompatibleMode = false; + private String obUpdateColumns = null; + private List> deleteColPos; + private String dbName; + + @Override + public void init(Configuration config) { + super.init(config); + // OceanBase 所有操作都是 insert into on duplicate key update 模式 + // writeMode应该使用enum来定义 + this.writeMode = "update"; + obWriteMode = config.getString(Config.OB_WRITE_MODE, "update"); + ServerConnectInfo connectInfo = new ServerConnectInfo(jdbcUrl, username, password); + dbName = connectInfo.databaseName; + //init check memstore + this.memstoreThreshold = config.getDouble(Config.MEMSTORE_THRESHOLD, Config.DEFAULT_MEMSTORE_THRESHOLD); + this.memstoreCheckIntervalSecond = config.getLong(Config.MEMSTORE_CHECK_INTERVAL_SECOND, + Config.DEFAULT_MEMSTORE_CHECK_INTERVAL_SECOND); + this.isOracleCompatibleMode = ObWriterUtils.isOracleMode(); + + LOG.info("configure url is unavailable, use obclient for connections."); + this.checkConnHolder = new ObClientConnHolder(config, connectInfo.jdbcUrl, + connectInfo.getFullUserName(), connectInfo.password); + this.connHolder = new ObClientConnHolder(config, connectInfo.jdbcUrl, + connectInfo.getFullUserName(), connectInfo.password); + checkConnHolder.initConnection(); + if (isOracleCompatibleMode) { + connectInfo.databaseName = connectInfo.databaseName.toUpperCase(); + table = table.toUpperCase(); + LOG.info(String.format("this is oracle compatible mode, change database to %s, table to %s", + connectInfo.databaseName, table)); + } + + if (config.getBool(Config.USE_PART_CALCULATOR, Config.DEFAULT_USE_PART_CALCULATOR)) { + initPartCalculator(connectInfo); + } else { + LOG.info("Disable partition calculation feature."); + } + + obUpdateColumns = config.getString(Config.OB_UPDATE_COLUMNS, null); + groupInsertValues = new HashMap>(); + partitionKeyIndexes = new ArrayList(); + rewriteSql(); + + if (null == concurrentWriter) { + concurrentWriter = new ConcurrentTableWriter(config, connectInfo, writeRecordSql); + allTaskInQueue = false; + } + + String version = config.getString(Config.OB_VERSION); + int pIdx = version.lastIndexOf('.'); + if ((Float.valueOf(version.substring(0, pIdx)) >= 2.1f)) { + isOb2 = true; + } + } + + private void initPartCalculator(ServerConnectInfo 
connectInfo) { + int retry = 0; + LOG.info(String.format("create tableEntryKey with clusterName %s, tenantName %s, databaseName %s, tableName %s", + connectInfo.clusterName, connectInfo.tenantName, connectInfo.databaseName, table)); + TableEntryKey tableEntryKey = new TableEntryKey(connectInfo.clusterName, connectInfo.tenantName, + connectInfo.databaseName, table); + do { + try { + if (retry > 0) { + int sleep = retry > 8 ? 500 : (1 << retry); + TimeUnit.SECONDS.sleep(sleep); + LOG.info("retry create new part calculator, the {} times", retry); + } + LOG.info("create partCalculator with address: " + connectInfo.ipPort); + partCalculator = new ObPartitionIdCalculator(connectInfo.ipPort, tableEntryKey); + } catch (Exception ex) { + ++retry; + LOG.warn("create new part calculator failed, retry {}: {}", retry, ex.getMessage()); + } + } while (partCalculator == null && retry < 3); // try 3 times + } + + public boolean isFinished() { + return allTaskInQueue && concurrentWriter.checkFinish(); + } + + public boolean allTaskInQueue() { + return allTaskInQueue; + } + + public void setPutAllTaskInQueue() { + this.allTaskInQueue = true; + LOG.info("ConcurrentTableWriter has put all task in queue, queueSize = {}, total = {}, finished = {}", + concurrentWriter.getTaskQueueSize(), + concurrentWriter.getTotalTaskCount(), + concurrentWriter.getFinishTaskCount()); + } + + private void rewriteSql() { + Connection conn = connHolder.initConnection(); + if (isOracleCompatibleMode && obWriteMode.equalsIgnoreCase("update")) { + // change obWriteMode to insert so the insert statement will be generated. + obWriteMode = "insert"; + deleteColPos = ObWriterUtils.buildDeleteSql(conn, dbName, table, columns); + } + this.writeRecordSql = ObWriterUtils.buildWriteSql(table, columns, conn, obWriteMode, obUpdateColumns); + LOG.info("writeRecordSql :{}", this.writeRecordSql); + } + + public void prepare(Configuration writerSliceConfig) { + super.prepare(writerSliceConfig); + calPartitionKeyIndex(partitionKeyIndexes); + concurrentWriter.start(); + } + + private void calPartitionKeyIndex(List partKeyIndexes) { + partKeyIndexes.clear(); + if (null == partCalculator) { + LOG.error("partCalculator is null"); + return; + } + for (int i = 0; i < columns.size(); ++i) { + if (partCalculator.isPartitionKeyColumn(columns.get(i))) { + LOG.info(columns.get(i) + " is partition key."); + partKeyIndexes.add(i); + } + } + } + + private Long calPartitionId(List partKeyIndexes, Record record) { + if (partCalculator == null) { + return null; + } + for (Integer i : partKeyIndexes) { + partCalculator.addColumn(columns.get(i), record.getColumn(i).asString()); + } + return partCalculator.calculate(); + } + + @Override + public void startWriteWithConnection(RecordReceiver recordReceiver, TaskPluginCollector taskPluginCollector, Connection connection) { + this.taskPluginCollector = taskPluginCollector; + + // 用于写入数据的时候的类型根据目的表字段类型转换 + int retryTimes = 0; + boolean needRetry = false; + do { + try { + if (retryTimes > 0) { + TimeUnit.SECONDS.sleep((1 << retryTimes)); + DBUtil.closeDBResources(null, connection); + connection = DBUtil.getConnection(dataBaseType, jdbcUrl, username, password); + LOG.warn("getColumnMetaData of table {} failed, retry the {} times ...", this.table, retryTimes); + } + ColumnMetaCache.init(connection, this.table, this.columns); + this.resultSetMetaData = ColumnMetaCache.getColumnMeta(); + needRetry = false; + } catch (SQLException e) { + needRetry = true; + ++retryTimes; + e.printStackTrace(); + LOG.warn("fetch column meta of 
[{}] failed..., retry {} times", this.table, retryTimes); + } catch (InterruptedException e) { + LOG.warn("startWriteWithConnection interrupt, ignored"); + } finally { + } + } while (needRetry && retryTimes < 100); + + try { + Record record; + startTime = System.currentTimeMillis(); + while ((record = recordReceiver.getFromReader()) != null) { + if (record.getColumnNumber() != this.columnNumber) { + // 源头读取字段列数与目的表字段写入列数不相等,直接报错 + LOG.error("column not equal {} != {}, record = {}", + this.columnNumber, record.getColumnNumber(), record.toString()); + throw DataXException + .asDataXException( + DBUtilErrorCode.CONF_ERROR, + String.format("Recoverable exception in OB. Roll back this write and hibernate for one minute. SQLState: %d. ErrorCode: %d", + record.getColumnNumber(), + this.columnNumber)); + } + addRecordToCache(record); + } + addLeftRecords(); + waitTaskFinish(); + } catch (Exception e) { + throw DataXException.asDataXException( + DBUtilErrorCode.WRITE_DATA_ERROR, e); + } finally { + DBUtil.closeDBResources(null, null, connection); + } + } + + public PreparedStatement fillStatement(PreparedStatement preparedStatement, Record record) + throws SQLException { + return fillPreparedStatement(preparedStatement, record); + } + + public PreparedStatement fillStatementIndex(PreparedStatement preparedStatement, + int prepIdx, int columnIndex, Column column) throws SQLException { + int columnSqltype = this.resultSetMetaData.getMiddle().get(columnIndex); + String typeName = this.resultSetMetaData.getRight().get(columnIndex); + return fillPreparedStatementColumnType(preparedStatement, prepIdx, columnSqltype, typeName, column); + } + + public void collectDirtyRecord(Record record, SQLException e) { + taskPluginCollector.collectDirtyRecord(record, e); + } + + public void insertOneRecord(Connection connection, List buffer) { + doOneInsert(connection, buffer); + } + + private void addLeftRecords() { + for (List groupValues : groupInsertValues.values()) { + if (groupValues.size() > 0 ) { + int retry = 0; + while (true) { + try { + concurrentWriter.addBatchRecords(groupValues); + break; + } catch (InterruptedException e) { + retry++; + LOG.info("Concurrent table writer is interrupted, retry {}", retry); + } + } + } + } + } + + private void addRecordToCache(final Record record) { + Long partId =null; + try { + partId = calPartitionId(partitionKeyIndexes, record); + } catch (Exception e1) { + LOG.warn("fail to get partition id: " + e1.getMessage() + ", record: " + record); + } + + if (partId == null && isOb2) { + LOG.debug("fail to calculate parition id, just put into the default buffer."); + partId = Long.MAX_VALUE; + } + + if (partId != null) { + List groupValues = groupInsertValues.get(partId); + if (groupValues == null) { + groupValues = new ArrayList(batchSize); + groupInsertValues.put(partId, groupValues); + } + groupValues.add(record); + if (groupValues.size() >= batchSize) { + int i = 0; + while (true) { + if (i > 0) { + LOG.info("retry add batch record the {} times", i); + } + try { + concurrentWriter.addBatchRecords(groupValues); + printEveryTime(); + break; + } catch (InterruptedException e) { + LOG.info("Concurrent table writer is interrupted"); + } + } + groupValues = new ArrayList(batchSize); + groupInsertValues.put(partId, groupValues); + } + } else { + LOG.warn("add unknown part record {}", record); + List unknownPartRecords = new ArrayList(); + unknownPartRecords.add(record); + int i = 0; + while (true) { + if (i > 0) { + LOG.info("retry add batch record the {} times", i); + } + try { + 
concurrentWriter.addBatchRecords(unknownPartRecords); + break; + } catch (InterruptedException e) { + LOG.info("Concurrent table writer is interrupted"); + } + } + } + } + + private void checkMemStore() { + Connection checkConn = checkConnHolder.reconnect(); + long now = System.currentTimeMillis(); + if (now - lastCheckMemstoreTime < 1000 * memstoreCheckIntervalSecond) { + return; + } + boolean isFull = ObWriterUtils.isMemstoreFull(checkConn, memstoreThreshold); + this.isMemStoreFull.set(isFull); + if (isFull) { + LOG.warn("OB memstore is full,sleep 30 seconds, threshold=" + memstoreThreshold); + } + lastCheckMemstoreTime = now; + } + + public boolean isMemStoreFull() { + return isMemStoreFull.get(); + } + + public void printEveryTime() { + long cost = System.currentTimeMillis() - startTime; + if (cost > 10000) { //10s + print(); + startTime = System.currentTimeMillis(); + } + } + + public void print() { + LOG.debug("Statistic total task {}, finished {}, queue Size {}", + concurrentWriter.getTotalTaskCount(), + concurrentWriter.getFinishTaskCount(), + concurrentWriter.getTaskQueueSize()); + concurrentWriter.printStatistics(); + } + + public void waitTaskFinish() { + setPutAllTaskInQueue(); + lock.lock(); + try { + while (!concurrentWriter.checkFinish()) { + condition.await(15, TimeUnit.SECONDS); + print(); + checkMemStore(); + } + } catch (InterruptedException e) { + LOG.warn("Concurrent table writer wait task finish interrupt"); + } finally { + lock.unlock(); + } + LOG.debug("wait all InsertTask finished ..."); + } + + public void singalTaskFinish() { + lock.lock(); + condition.signal(); + lock.unlock(); + } + + @Override + public void destroy(Configuration writerSliceConfig) { + if(concurrentWriter!=null) { + concurrentWriter.destory(); + } + // 把本级持有的conn关闭掉 + DBUtil.closeDBResources(null, connHolder.getConn()); + DBUtil.closeDBResources(null, checkConnHolder.getConn()); + checkConnHolder.destroy(); + super.destroy(writerSliceConfig); + } + + public class ConcurrentTableWriter { + private BlockingQueue> queue; + private List insertTasks; + private Configuration config; + private ServerConnectInfo connectInfo; + private String rewriteRecordSql; + private AtomicLong totalTaskCount; + private AtomicLong finishTaskCount; + private final int threadCount; + + public ConcurrentTableWriter(Configuration config, ServerConnectInfo connInfo, String rewriteRecordSql) { + threadCount = config.getInt(Config.WRITER_THREAD_COUNT, Config.DEFAULT_WRITER_THREAD_COUNT); + queue = new LinkedBlockingQueue>(threadCount << 1); + insertTasks = new ArrayList(threadCount); + this.config = config; + this.connectInfo = connInfo; + this.rewriteRecordSql = rewriteRecordSql; + this.totalTaskCount = new AtomicLong(0); + this.finishTaskCount = new AtomicLong(0); + } + + public long getTotalTaskCount() { + return totalTaskCount.get(); + } + + public long getFinishTaskCount() { + return finishTaskCount.get(); + } + + public int getTaskQueueSize() { + return queue.size(); + } + + public void increFinishCount() { + finishTaskCount.incrementAndGet(); + } + + //should check after put all the task in the queue + public boolean checkFinish() { + long finishCount = finishTaskCount.get(); + long totalCount = totalTaskCount.get(); + return finishCount == totalCount; + } + + public synchronized void start() { + for (int i = 0; i < threadCount; ++i) { + LOG.info("start {} insert task.", (i+1)); + InsertTask insertTask = new InsertTask(taskId, queue, config, connectInfo, rewriteRecordSql, deleteColPos); + 
insertTask.setWriterTask(ConcurrentTableWriterTask.this); + insertTask.setWriter(this); + insertTasks.add(insertTask); + } + WriterThreadPool.executeBatch(insertTasks); + } + + public void printStatistics() { + long insertTotalCost = 0; + long insertTotalCount = 0; + for (InsertTask task: insertTasks) { + insertTotalCost += task.getTotalCost(); + insertTotalCount += task.getInsertCount(); + } + long avgCost = 0; + if (insertTotalCount != 0) { + avgCost = insertTotalCost / insertTotalCount; + } + ConcurrentTableWriterTask.LOG.debug("Insert {} times, totalCost {} ms, average {} ms", + insertTotalCount, insertTotalCost, avgCost); + } + + public void addBatchRecords(final List records) throws InterruptedException { + boolean isSucc = false; + while (!isSucc) { + isSucc = queue.offer(records, 5, TimeUnit.SECONDS); + checkMemStore(); + } + totalTaskCount.incrementAndGet(); + } + + public synchronized void destory() { + if (insertTasks != null) { + for(InsertTask task : insertTasks) { + task.setStop(); + } + for(InsertTask task: insertTasks) { + task.destroy(); + } + } + } + } +} diff --git a/oceanbasev10writer/src/main/java/com/alibaba/datax/plugin/writer/oceanbasev10writer/task/InsertTask.java b/oceanbasev10writer/src/main/java/com/alibaba/datax/plugin/writer/oceanbasev10writer/task/InsertTask.java new file mode 100644 index 00000000..968908ca --- /dev/null +++ b/oceanbasev10writer/src/main/java/com/alibaba/datax/plugin/writer/oceanbasev10writer/task/InsertTask.java @@ -0,0 +1,286 @@ +package com.alibaba.datax.plugin.writer.oceanbasev10writer.task; +import java.sql.Connection; +import java.sql.PreparedStatement; +import java.sql.SQLException; +import java.util.ArrayList; +import java.util.List; +import java.util.Queue; +import java.util.concurrent.TimeUnit; + +import com.alibaba.datax.common.exception.DataXException; +import com.alibaba.datax.plugin.rdbms.util.DBUtil; +import com.alibaba.datax.plugin.rdbms.util.DBUtilErrorCode; +import com.alibaba.datax.plugin.writer.oceanbasev10writer.ext.ObClientConnHolder; +import org.apache.commons.lang3.tuple.Pair; +import org.slf4j.Logger; +import org.slf4j.LoggerFactory; + +import com.alibaba.datax.common.element.Record; +import com.alibaba.datax.common.util.Configuration; +import com.alibaba.datax.plugin.writer.oceanbasev10writer.Config; +import com.alibaba.datax.plugin.writer.oceanbasev10writer.ext.ConnHolder; +import com.alibaba.datax.plugin.writer.oceanbasev10writer.ext.ServerConnectInfo; +import com.alibaba.datax.plugin.writer.oceanbasev10writer.task.ConcurrentTableWriterTask.ConcurrentTableWriter; +import com.alibaba.datax.plugin.writer.oceanbasev10writer.util.ObWriterUtils; + +public class InsertTask implements Runnable { + + private static final Logger LOG = LoggerFactory.getLogger(InsertTask.class); + + private ConcurrentTableWriterTask writerTask; + private ConcurrentTableWriter writer; + + private String writeRecordSql; + private long totalCost = 0; + private long insertCount = 0; + + private Queue> queue; + private boolean isStop; + private ConnHolder connHolder; + + private final long taskId; + private ServerConnectInfo connInfo; + + // 失败重试次数 + private int failTryCount = Config.DEFAULT_FAIL_TRY_COUNT; + private boolean printCost = Config.DEFAULT_PRINT_COST; + private long costBound = Config.DEFAULT_COST_BOUND; + private List> deleteMeta; + + public InsertTask( + final long taskId, + Queue> recordsQueue, + Configuration config, + ServerConnectInfo connectInfo, + String writeRecordSql, + List> deleteMeta) { + this.taskId = taskId; + 
this.queue = recordsQueue; + this.connInfo = connectInfo; + failTryCount = config.getInt(Config.FAIL_TRY_COUNT, Config.DEFAULT_FAIL_TRY_COUNT); + printCost = config.getBool(Config.PRINT_COST, Config.DEFAULT_PRINT_COST); + costBound = config.getLong(Config.COST_BOUND, Config.DEFAULT_COST_BOUND); + this.connHolder = new ObClientConnHolder(config, connInfo.jdbcUrl, + connInfo.getFullUserName(), connInfo.password); + this.writeRecordSql = writeRecordSql; + this.isStop = false; + this.deleteMeta = deleteMeta; + connHolder.initConnection(); + } + + void setWriterTask(ConcurrentTableWriterTask writerTask) { + this.writerTask = writerTask; + } + + void setWriter(ConcurrentTableWriter writer) { + this.writer = writer; + } + + private boolean isStop() { return isStop; } + public void setStop() { isStop = true; } + public long getTotalCost() { return totalCost; } + public long getInsertCount() { return insertCount; } + + @Override + public void run() { + Thread.currentThread().setName(String.format("%d-insertTask-%d", taskId, Thread.currentThread().getId())); + LOG.debug("Task {} start to execute...", taskId); + while (!isStop()) { + try { + List records = queue.poll(); + if (null != records) { + doMultiInsert(records, this.printCost, this.costBound); + + } else if (writerTask.isFinished()) { + writerTask.singalTaskFinish(); + LOG.debug("not more task, thread exist ..."); + break; + } else { + TimeUnit.MILLISECONDS.sleep(5); + } + } catch (InterruptedException e) { + LOG.debug("TableWriter is interrupt"); + } catch (Exception e) { + LOG.warn("ERROR UNEXPECTED {}", e); + } + } + LOG.debug("Thread exist..."); + } + + public void destroy() { + connHolder.destroy(); + }; + + public void calStatistic(final long cost) { + writer.increFinishCount(); + ++insertCount; + totalCost += cost; + if (this.printCost && cost > this.costBound) { + LOG.info("slow multi insert cost {}ms", cost); + } + } + + private void doDelete(Connection conn, final List buffer) throws SQLException { + if(deleteMeta == null || deleteMeta.size() == 0) { + return; + } + for (int i = 0; i < deleteMeta.size(); i++) { + String deleteSql = deleteMeta.get(i).getKey(); + int[] valueIdx = deleteMeta.get(i).getValue(); + PreparedStatement ps = null; + try { + ps = conn.prepareStatement(deleteSql); + StringBuilder builder = new StringBuilder(); + for (Record record : buffer) { + int bindIndex = 0; + for (int idx : valueIdx) { + writerTask.fillStatementIndex(ps, bindIndex++, idx, record.getColumn(idx)); + builder.append(record.getColumn(idx).asString()).append(","); + } + ps.addBatch(); + } + LOG.debug("delete values: " + builder.toString()); + ps.executeBatch(); + } catch (SQLException ex) { + LOG.error("SQL Exception when delete records with {}", deleteSql, ex); + throw ex; + } finally { + DBUtil.closeDBResources(ps, null); + } + } + } + + public void doMultiInsert(final List buffer, final boolean printCost, final long restrict) { + checkMemstore(); + Connection conn = connHolder.getConn(); + boolean success = false; + long cost = 0; + long startTime = 0; + try { + for (int i = 0; i < failTryCount; ++i) { + if (i > 0) { + try { + int sleep = i >= 9 ? 
500 : 1 << i;//不明白为什么要sleep 500s + TimeUnit.SECONDS.sleep(sleep); + } catch (InterruptedException e) { + LOG.info("thread interrupted ..., ignore"); + } + conn = connHolder.getConn(); + LOG.info("retry {}, start do batch insert, size={}", i, buffer.size()); + checkMemstore(); + } + startTime = System.currentTimeMillis(); + PreparedStatement ps = null; + try { + conn.setAutoCommit(false); + + // do delete if necessary + doDelete(conn, buffer); + + ps = conn.prepareStatement(writeRecordSql); + for (Record record : buffer) { + ps = writerTask.fillStatement(ps, record); + ps.addBatch(); + } + ps.executeBatch(); + conn.commit(); + success = true; + cost = System.currentTimeMillis() - startTime; + calStatistic(cost); + break; + } catch (SQLException e) { + LOG.warn("Insert fatal error SqlState ={}, errorCode = {}, {}", e.getSQLState(), e.getErrorCode(), e); + if (i == 0 || i > 10 ) { + for (Record record : buffer) { + LOG.warn("ERROR : record {}", record); + } + } + // 按照错误码分类,分情况处理 + // 如果是OB系统级异常,则需要重建连接 + boolean fatalFail = ObWriterUtils.isFatalError(e); + if (fatalFail) { + ObWriterUtils.sleep(300000); + connHolder.reconnect(); + // 如果是可恢复的异常,则重试 + } else if (ObWriterUtils.isRecoverableError(e)) { + conn.rollback(); + ObWriterUtils.sleep(60000); + } else {// 其它异常直接退出,采用逐条写入方式 + conn.rollback(); + ObWriterUtils.sleep(1000); + break; + } + } catch (Exception e) { + e.printStackTrace(); + LOG.warn("Insert error unexpected {}", e); + } finally { + DBUtil.closeDBResources(ps, null); + } + } + } catch (SQLException e) { + LOG.warn("ERROR:retry failSql State ={}, errorCode = {}, {}", e.getSQLState(), e.getErrorCode(), e); + } + + if (!success) { + try { + LOG.info("do one insert"); + conn = connHolder.reconnect(); + doOneInsert(conn, buffer); + cost = System.currentTimeMillis() - startTime; + calStatistic(cost); + } finally { + } + } + } + + // process one row, delete before insert + private void doOneInsert(Connection connection, List buffer) { + List deletePstmtList = new ArrayList(); + PreparedStatement preparedStatement = null; + try { + connection.setAutoCommit(false); + if (deleteMeta != null && deleteMeta.size() > 0) { + for (int i = 0; i < deleteMeta.size(); i++) { + String deleteSql = deleteMeta.get(i).getKey(); + deletePstmtList.add(connection.prepareStatement(deleteSql)); + } + } + + preparedStatement = connection.prepareStatement(this.writeRecordSql); + for (Record record : buffer) { + try { + for (int i = 0; i < deletePstmtList.size(); i++) { + PreparedStatement deleteStmt = deletePstmtList.get(i); + int[] valueIdx = deleteMeta.get(i).getValue(); + int bindIndex = 0; + for (int idx : valueIdx) { + writerTask.fillStatementIndex(deleteStmt, bindIndex++, idx, record.getColumn(idx)); + } + deleteStmt.execute(); + } + preparedStatement = writerTask.fillStatement(preparedStatement, record); + preparedStatement.execute(); + connection.commit(); + } catch (SQLException e) { + writerTask.collectDirtyRecord(record, e); + } finally { + // 此处不应该关闭statement,后续的数据还需要用到 + } + } + } catch (Exception e) { + throw DataXException.asDataXException( + DBUtilErrorCode.WRITE_DATA_ERROR, e); + } finally { + DBUtil.closeDBResources(preparedStatement, null); + for (PreparedStatement pstmt : deletePstmtList) { + DBUtil.closeDBResources(pstmt, null); + } + } + } + + private void checkMemstore() { + while (writerTask.isMemStoreFull()) { + ObWriterUtils.sleep(30000); + } + } +} diff --git a/oceanbasev10writer/src/main/java/com/alibaba/datax/plugin/writer/oceanbasev10writer/task/SingleTableWriterTask.java 
b/oceanbasev10writer/src/main/java/com/alibaba/datax/plugin/writer/oceanbasev10writer/task/SingleTableWriterTask.java new file mode 100644 index 00000000..637a3be4 --- /dev/null +++ b/oceanbasev10writer/src/main/java/com/alibaba/datax/plugin/writer/oceanbasev10writer/task/SingleTableWriterTask.java @@ -0,0 +1,152 @@ +package com.alibaba.datax.plugin.writer.oceanbasev10writer.task; + +import java.sql.Connection; +import java.sql.PreparedStatement; +import java.sql.SQLException; +import java.util.List; + +import com.alibaba.datax.common.element.Record; +import com.alibaba.datax.common.util.Configuration; +import com.alibaba.datax.plugin.rdbms.util.DBUtil; +import com.alibaba.datax.plugin.rdbms.util.DataBaseType; +import com.alibaba.datax.plugin.rdbms.writer.CommonRdbmsWriter; +import com.alibaba.datax.plugin.rdbms.writer.Key; +import com.alibaba.datax.plugin.writer.oceanbasev10writer.Config; +import com.alibaba.datax.plugin.writer.oceanbasev10writer.ext.ConnHolder; +import com.alibaba.datax.plugin.writer.oceanbasev10writer.ext.ObClientConnHolder; +import com.alibaba.datax.plugin.writer.oceanbasev10writer.util.ObWriterUtils; + +public class SingleTableWriterTask extends CommonRdbmsWriter.Task { + + // memstore_total 与 memstore_limit 比例的阈值,一旦超过这个值,则暂停写入 + private double memstoreThreshold = Config.DEFAULT_MEMSTORE_THRESHOLD; + + // memstore检查的间隔 + private long memstoreCheckIntervalSecond = Config.DEFAULT_MEMSTORE_CHECK_INTERVAL_SECOND; + + // 最后一次检查 + private long lastCheckMemstoreTime; + + // 失败重试次数 + private int failTryCount = Config.DEFAULT_FAIL_TRY_COUNT; + + private ConnHolder connHolder; + private String obWriteMode = "update"; + private boolean isOracleCompatibleMode = false; + private String obUpdateColumns = null; + + public SingleTableWriterTask(DataBaseType dataBaseType) { + super(dataBaseType); + } + + @Override + public void init(Configuration config) { + super.init(config); + this.memstoreThreshold = config.getDouble(Config.MEMSTORE_THRESHOLD, Config.DEFAULT_MEMSTORE_THRESHOLD); + this.memstoreCheckIntervalSecond = config.getLong(Config.MEMSTORE_CHECK_INTERVAL_SECOND, + Config.DEFAULT_MEMSTORE_CHECK_INTERVAL_SECOND); + failTryCount = config.getInt(Config.FAIL_TRY_COUNT, Config.DEFAULT_FAIL_TRY_COUNT); + // OceanBase 所有操作都是 insert into on duplicate key update 模式 + // writeMode应该使用enum来定义 + this.writeMode = "update"; + this.connHolder = new ObClientConnHolder(config, jdbcUrl, username, password); + //ob1.0里面, + this.batchSize = Math.min(128, config.getInt(Key.BATCH_SIZE, 128)); + LOG.info("In Write OceanBase 1.0, Real Batch Size : " + this.batchSize); + + isOracleCompatibleMode = ObWriterUtils.isOracleMode(); + LOG.info("isOracleCompatibleMode=" + isOracleCompatibleMode); + + obUpdateColumns = config.getString(Config.OB_UPDATE_COLUMNS, null); + + obWriteMode = config.getString(Config.OB_WRITE_MODE, "update"); + if (isOracleCompatibleMode) { + obWriteMode = "insert"; + } + rewriteSql(); + } + + private void rewriteSql() { + Connection conn = connHolder.initConnection(); + this.writeRecordSql = ObWriterUtils.buildWriteSql(table, columns, conn, obWriteMode, obUpdateColumns); + } + + protected void doBatchInsert(Connection conn, List buffer) throws SQLException { + doBatchInsert(buffer); + } + + private void doBatchInsert(List buffer) { + Connection conn = connHolder.getConn(); + // 检查内存 + checkMemstore(conn); + boolean success = false; + try { + for (int i = 0; i < failTryCount; i++) { + PreparedStatement ps = null; + try { + conn.setAutoCommit(false); + ps = 
conn.prepareStatement(this.writeRecordSql); + for (Record record : buffer) { + ps = fillPreparedStatement(ps, record); + ps.addBatch(); + } + ps.executeBatch(); + conn.commit(); + // 标记执行正常,且退出for循环 + success = true; + break; + } catch (SQLException e) { + // 如果是OB系统级异常,则需要重建连接 + boolean fatalFail = ObWriterUtils.isFatalError(e); + if (fatalFail) { + LOG.warn("Fatal exception in OB. Roll back this write and hibernate for five minutes. SQLState: {}. ErrorCode: {}", + e.getSQLState(), e.getErrorCode(), e); + ObWriterUtils.sleep(300000); + DBUtil.closeDBResources(null, conn); + conn = connHolder.reconnect(); + // 如果是可恢复的异常,则重试 + } else if (ObWriterUtils.isRecoverableError(e)) { + LOG.warn("Recoverable exception in OB. Roll back this write and hibernate for one minute. SQLState: {}. ErrorCode: {}", + e.getSQLState(), e.getErrorCode(), e); + conn.rollback(); + ObWriterUtils.sleep(60000); + // 其它异常直接退出,采用逐条写入方式 + } else { + LOG.warn("Exception in OB. Roll back this write and hibernate for one second. Write and submit the records one by one. SQLState: {}. ErrorCode: {}", + e.getSQLState(), e.getErrorCode(), e); + conn.rollback(); + ObWriterUtils.sleep(1000); + break; + } + } finally { + DBUtil.closeDBResources(ps, null); + } + } + } catch (SQLException e) { + LOG.warn("Exception in OB. Roll back this write. Write and submit the records one by one. SQLState: {}. ErrorCode: {}", + e.getSQLState(), e.getErrorCode(), e); + } + if (!success) { + doOneInsert(conn, buffer); + } + } + + private void checkMemstore(Connection conn) { + long now = System.currentTimeMillis(); + if (now - lastCheckMemstoreTime < 1000 * memstoreCheckIntervalSecond) { + return; + } + while (ObWriterUtils.isMemstoreFull(conn, memstoreThreshold)) { + LOG.warn("OB memstore is full,sleep 60 seconds, threshold=" + memstoreThreshold); + ObWriterUtils.sleep(60000); + } + lastCheckMemstoreTime = now; + } + + @Override + public void destroy(Configuration writerSliceConfig) { + // 把本级持有的conn关闭掉 + DBUtil.closeDBResources(null, connHolder.getConn()); + super.destroy(writerSliceConfig); + } +} diff --git a/oceanbasev10writer/src/main/java/com/alibaba/datax/plugin/writer/oceanbasev10writer/task/WriterThreadPool.java b/oceanbasev10writer/src/main/java/com/alibaba/datax/plugin/writer/oceanbasev10writer/task/WriterThreadPool.java new file mode 100644 index 00000000..8add5382 --- /dev/null +++ b/oceanbasev10writer/src/main/java/com/alibaba/datax/plugin/writer/oceanbasev10writer/task/WriterThreadPool.java @@ -0,0 +1,37 @@ +package com.alibaba.datax.plugin.writer.oceanbasev10writer.task; + +import java.util.List; +import java.util.concurrent.ExecutorService; +import java.util.concurrent.Executors; + +import org.slf4j.Logger; +import org.slf4j.LoggerFactory; + +public class WriterThreadPool { + private static final Logger LOG = LoggerFactory.getLogger(InsertTask.class); + + private static ExecutorService executorService = Executors.newCachedThreadPool(); + + public WriterThreadPool() { + } + + public static ExecutorService getInstance() { + return executorService; + } + + public static synchronized void shutdown() { + LOG.info("start shutdown executor service..."); + executorService.shutdown(); + LOG.info("shutdown executor service success..."); + } + + public static synchronized void execute(InsertTask task) { + executorService.execute(task); + } + + public static synchronized void executeBatch(List tasks) { + for (InsertTask task : tasks) { + executorService.execute(task); + } + } +} diff --git 
a/oceanbasev10writer/src/main/java/com/alibaba/datax/plugin/writer/oceanbasev10writer/util/DbUtils.java b/oceanbasev10writer/src/main/java/com/alibaba/datax/plugin/writer/oceanbasev10writer/util/DbUtils.java new file mode 100644 index 00000000..5138c9cb --- /dev/null +++ b/oceanbasev10writer/src/main/java/com/alibaba/datax/plugin/writer/oceanbasev10writer/util/DbUtils.java @@ -0,0 +1,71 @@ +package com.alibaba.datax.plugin.writer.oceanbasev10writer.util; + +import com.alibaba.datax.common.util.Configuration; +import com.alibaba.datax.plugin.rdbms.util.DBUtil; +import com.alibaba.datax.plugin.rdbms.util.DataBaseType; +import com.alibaba.datax.plugin.rdbms.writer.CommonRdbmsWriter; +import com.alibaba.datax.plugin.rdbms.writer.Constant; +import com.alibaba.datax.plugin.rdbms.writer.Key; +import org.slf4j.Logger; +import org.slf4j.LoggerFactory; + +import java.sql.Connection; +import java.sql.PreparedStatement; +import java.sql.ResultSet; +import java.sql.SQLException; +import java.util.List; +import java.util.concurrent.TimeUnit; + +public class DbUtils { + + protected static final Logger LOG = LoggerFactory.getLogger(DbUtils.class); + + public static String fetchSingleValueWithRetry(Configuration config, String query) { + final String username = config.getString(Key.USERNAME); + final String password = config.getString(Key.PASSWORD); + String jdbcUrl = config.getString(Key.JDBC_URL); + + if(jdbcUrl == null) { + List conns = config.getList(Constant.CONN_MARK, Object.class); + Configuration connConf = Configuration.from(conns.get(0).toString()); + jdbcUrl = connConf.getString(Key.JDBC_URL); + } + + Connection conn = null; + PreparedStatement stmt = null; + ResultSet result = null; + boolean need_retry = false; + String value = null; + int retry = 0; + do { + try { + if (retry > 0) { + int sleep = retry > 9 ? 
500 : 1 << retry; + try { + TimeUnit.SECONDS.sleep(sleep); + } catch (InterruptedException e) { + } + LOG.warn("retry fetch value for {} the {} times", query, retry); + } + conn = DBUtil.getConnection(DataBaseType.OceanBase, jdbcUrl, username, password); + stmt = conn.prepareStatement(query); + result = stmt.executeQuery(); + if (result.next()) { + value = result.getString("Value"); + } else { + throw new RuntimeException("no values returned for " + query); + } + LOG.info("value for query [{}] is [{}]", query, value); + break; + } catch (SQLException e) { + need_retry = true; + ++retry; + LOG.warn("fetch value with {} error {}", query, e); + } finally { + DBUtil.closeDBResources(result, stmt, null); + } + } while (need_retry); + + return value; + } +} diff --git a/oceanbasev10writer/src/main/java/com/alibaba/datax/plugin/writer/oceanbasev10writer/util/ObWriterUtils.java b/oceanbasev10writer/src/main/java/com/alibaba/datax/plugin/writer/oceanbasev10writer/util/ObWriterUtils.java new file mode 100644 index 00000000..368c3d17 --- /dev/null +++ b/oceanbasev10writer/src/main/java/com/alibaba/datax/plugin/writer/oceanbasev10writer/util/ObWriterUtils.java @@ -0,0 +1,390 @@ +package com.alibaba.datax.plugin.writer.oceanbasev10writer.util; + +import com.alibaba.datax.plugin.rdbms.util.DBUtil; +import com.alibaba.datax.plugin.rdbms.writer.CommonRdbmsWriter.Task; +import com.alibaba.datax.plugin.writer.oceanbasev10writer.Config; +import org.apache.commons.lang3.StringUtils; +import org.apache.commons.lang3.tuple.ImmutablePair; +import org.apache.commons.lang3.tuple.Pair; +import org.slf4j.Logger; +import org.slf4j.LoggerFactory; + +import java.sql.*; +import java.util.*; +import java.util.stream.Collectors; + +public class ObWriterUtils { + protected static final Logger LOG = LoggerFactory.getLogger(Task.class); + + private static String CHECK_MEMSTORE = "select 1 from %s.gv$memstore t where t.total>t.mem_limit * ?"; + + private static String compatibleMode = null; + + public static boolean isMemstoreFull(Connection conn, double memstoreThreshold) { + PreparedStatement ps = null; + ResultSet rs = null; + boolean result = false; + try { + String sysDbName = "oceanbase"; + if (isOracleMode()) { + sysDbName = "sys"; + } + ps = conn.prepareStatement(String.format(CHECK_MEMSTORE, sysDbName)); + ps.setDouble(1, memstoreThreshold); + rs = ps.executeQuery(); + // 只要有满足条件的,则表示当前租户 有个机器的memstore即将满 + result = rs.next(); + } catch (Throwable e) { + LOG.error("check memstore fail" + e.getMessage()); + result = false; + } finally { + //do not need to close the statment in ob1.0 + } + + LOG.info("isMemstoreFull=" + result); + return result; + } + + public static boolean isOracleMode(){ + return (compatibleMode.equals(Config.OB_COMPATIBLE_MODE_ORACLE)); + } + + public static String getCompatibleMode() { + return compatibleMode; + } + + public static void setCompatibleMode(String mode) { + compatibleMode = mode; + } + + private static String buildDeleteSql (String tableName, List columns) { + StringBuilder builder = new StringBuilder("DELETE FROM "); + builder.append(tableName).append(" WHERE "); + for (int i = 0; i < columns.size(); i++) { + builder.append(columns.get(i)).append(" = ?"); + if (i != columns.size() - 1) { + builder.append(" and "); + } + } + return builder.toString(); + } + + private static int[] getColumnIndex(List columnsInIndex, List allColumns) { + allColumns = allColumns.stream().map(String::toUpperCase).collect(Collectors.toList()); + int[] colIdx = new int[columnsInIndex.size()]; + for (int i 
= 0; i < columnsInIndex.size(); i++) { + int index = allColumns.indexOf(columnsInIndex.get(i)); + if (index < 0) { + throw new RuntimeException( + String.format("column {} is in unique or primary key but not in the column list.", + columnsInIndex.get(i))); + } + colIdx[i] = index; + } + return colIdx; + } + + public static List> buildDeleteSql(Connection conn, String dbName, String tableName, + List columns) { + List> deleteMeta = new ArrayList(); + Map> uniqueKeys = getAllUniqueIndex(conn, dbName, tableName); + for (Map.Entry> entry : uniqueKeys.entrySet()) { + List colNames = entry.getValue(); + String deleteSql = buildDeleteSql(tableName, colNames); + int[] colIdx = getColumnIndex(colNames, columns); + LOG.info("delete sql [{}], column index: {}", deleteSql, Arrays.toString(colIdx)); + deleteMeta.add(new ImmutablePair(deleteSql, colIdx)); + } + return deleteMeta; + } + + // this function is just for oracle mode + private static Map> getAllUniqueIndex(Connection conn, String dbName, String tableName) { + Map> uniqueKeys = new HashMap(); + if (tableName.contains("\\.")) { + dbName = tableName.split("\\.")[0]; + tableName = tableName.split("\\.")[1]; + } + dbName = dbName.toUpperCase(); + String sql = String.format("select cons.CONSTRAINT_NAME AS KEY_NAME, cols.COLUMN_NAME COLUMN_NAME " + + "from all_constraints cons, all_cons_columns cols " + + "WHERE cols.table_name = '%s' AND cons.constraint_type in('P', 'U') " + + " AND cons.constraint_name = cols.constraint_name AND cons.owner = cols.owner " + + " AND cols.owner = '%s' " + + "Order by KEY_NAME, cols.POSITION", tableName, dbName); + + LOG.info("get all unique keys by sql {}", sql); + + Statement stmt = null; + ResultSet rs = null; + try { + stmt = conn.createStatement(); + rs = stmt.executeQuery(sql); + while (rs.next()) { + String keyName = rs.getString("Key_name"); + String columnName = StringUtils.upperCase(rs.getString("Column_name")); + List s = uniqueKeys.get(keyName); + if (s == null) { + s = new ArrayList(); + uniqueKeys.put(keyName, s); + } + s.add(columnName); + } + } catch (Throwable e) { + LOG.error("show index from table fail :" + sql, e); + } finally { + asyncClose(rs, stmt, null); + } + return uniqueKeys; + } + + /** + * + * @param tableName + * @param columnHolders + * @param conn + * @param writeMode + * @return + */ + public static String buildWriteSql(String tableName, List columnHolders, + Connection conn, String writeMode, String obUpdateColumns) { + List valueHolders = new ArrayList(columnHolders.size()); + for (int i = 0; i < columnHolders.size(); i++) { + valueHolders.add("?"); + } + String writeDataSqlTemplate = new StringBuilder().append("INSERT INTO " + tableName + " (") + .append(StringUtils.join(columnHolders, ",")).append(") VALUES(") + .append(StringUtils.join(valueHolders, ",")).append(")").toString(); + + LOG.info("write mode: " + writeMode); + + // update mode + if (!writeMode.equals("insert")) { + if (obUpdateColumns == null) { + Set skipColumns = getSkipColumns(conn, tableName); + + StringBuilder columnList = new StringBuilder(); + for (String column : skipColumns) { + columnList.append(column).append(","); + } + LOG.info("Skip columns: " + columnList.toString()); + writeDataSqlTemplate = writeDataSqlTemplate + onDuplicateKeyUpdateString(columnHolders, skipColumns); + } else { + LOG.info("Update columns: " + obUpdateColumns); + writeDataSqlTemplate = writeDataSqlTemplate + onDuplicateKeyUpdateString(obUpdateColumns); + + } + } + + return writeDataSqlTemplate; + } + + private static Set 
getSkipColumns(Connection conn, String tableName) { + String sql = "show index from " + tableName; + Statement stmt = null; + ResultSet rs = null; + try { + stmt = conn.createStatement(); + rs = stmt.executeQuery(sql); + Map> uniqueKeys = new HashMap>(); + while (rs.next()) { + String nonUnique = rs.getString("Non_unique"); + if (!"0".equals(nonUnique)) { + continue; + } + String keyName = rs.getString("Key_name"); + String columnName = StringUtils.upperCase(rs.getString("Column_name")); + Set s = uniqueKeys.get(keyName); + if (s == null) { + s = new HashSet(); + uniqueKeys.put(keyName, s); + } + s.add(columnName); + } + // If the table has only one primary/unique key, just skip the column in the update list, + // it is safe since this primary/unique key does not change when the data in this inserting + // row conflicts with existing values. + if (uniqueKeys.size() == 1) { + return uniqueKeys.values().iterator().next(); + } else if (uniqueKeys.size() > 1) { + // If this table has more than one primary/unique keys, then just skip the common columns in + // all primary/unique keys. These columns can be found in every the primary/unique keys so they + // must be intact when there are at least one primary/unique key conflicts between the new + // data and existing data. So keeping them unchanged is safe. + // + // We can not skip all the columns in primary/unique keys because there might be some fields + // which do not conflict with existing value. If we skip them in the update list of the INSERT + // statement, these fields will not get updated, then we will have some fields with new values + // while some with old values in the same row, which breaks data consistency. + Iterator keyNameIterator = uniqueKeys.keySet().iterator(); + Set skipColumns = uniqueKeys.get(keyNameIterator.next()); + while(keyNameIterator.hasNext()) { + skipColumns.retainAll(uniqueKeys.get(keyNameIterator.next())); + } + return skipColumns; + } + } catch (Throwable e) { + LOG.error("show index from table fail :" + sql, e); + } finally { + asyncClose(rs, stmt, null); + } + return Collections.emptySet(); + } + + /* + * build ON DUPLICATE KEY UPDATE sub clause from updateColumns user specified + */ + private static String onDuplicateKeyUpdateString(String updateColumns) { + if (updateColumns == null || updateColumns.length() < 1) { + return ""; + } + StringBuilder builder = new StringBuilder(); + builder.append(" ON DUPLICATE KEY UPDATE "); + List list = new ArrayList(); + for (String column : updateColumns.split(",")) { + list.add(column + "=VALUES(" + column + ")"); + } + builder.append(StringUtils.join(list, ',')); + return builder.toString(); + } + + private static String onDuplicateKeyUpdateString(List columnHolders, Set skipColumns) { + if (columnHolders == null || columnHolders.size() < 1) { + return ""; + } + StringBuilder builder = new StringBuilder(); + builder.append(" ON DUPLICATE KEY UPDATE "); + List list = new ArrayList(); + for (String column : columnHolders) { + // skip update columns + if (skipColumns.contains(column.toUpperCase())) { + continue; + } + list.add(column + "=VALUES(" + column + ")"); + } + if (!list.isEmpty()) { + builder.append(StringUtils.join(list, ',')); + } else { + // 如果除了UK 没有别的字段,则更新第一个字段 + String column = columnHolders.get(0); + builder.append(column + "=VALUES(" + column + ")"); + } + return builder.toString(); + } + + /** + * 休眠n毫秒 + * + * @param ms + * 毫秒 + */ + public static void sleep(long ms) { + try { + Thread.sleep(ms); + } catch (InterruptedException e) { + } + } + + /** + 
* 致命错误 + * + * @param e + * @return + */ + public static boolean isFatalError(SQLException e) { + String sqlState = e.getSQLState(); + if (StringUtils.startsWith(sqlState, "08")) { + return true; + } + final int errorCode = Math.abs(e.getErrorCode()); + switch (errorCode) { + // Communications Errors + case 1040: // ER_CON_COUNT_ERROR + case 1042: // ER_BAD_HOST_ERROR + case 1043: // ER_HANDSHAKE_ERROR + case 1047: // ER_UNKNOWN_COM_ERROR + case 1081: // ER_IPSOCK_ERROR + case 1129: // ER_HOST_IS_BLOCKED + case 1130: // ER_HOST_NOT_PRIVILEGED + // Authentication Errors + case 1045: // ER_ACCESS_DENIED_ERROR + // Resource errors + case 1004: // ER_CANT_CREATE_FILE + case 1005: // ER_CANT_CREATE_TABLE + case 1015: // ER_CANT_LOCK + case 1021: // ER_DISK_FULL + case 1041: // ER_OUT_OF_RESOURCES + case 1094: // Unknown thread id: %lu + // Out-of-memory errors + case 1037: // ER_OUTOFMEMORY + case 1038: // ER_OUT_OF_SORTMEMORY + return true; + } + + if (StringUtils.isNotBlank(e.getMessage())) { + final String errorText = e.getMessage().toUpperCase(); + + if (errorCode == 0 + && (errorText.indexOf("COMMUNICATIONS LINK FAILURE") > -1 + || errorText.indexOf("COULD NOT CREATE CONNECTION") > -1) + || errorText.indexOf("NO DATASOURCE") > -1 || errorText.indexOf("NO ALIVE DATASOURCE") > -1 + || errorText.indexOf("NO OPERATIONS ALLOWED AFTER CONNECTION CLOSED") > -1) { + return true; + } + } + return false; + } + + /** + * 可恢复的错误 + * + * @param e + * @return + */ + public static boolean isRecoverableError(SQLException e) { + int error = Math.abs(e.getErrorCode()); + // 明确可恢复 + if (white.contains(error)) { + return true; + } + // 明确不可恢复 + if (black.contains(error)) { + return false; + } + // 超过4000的,都是OB特有的ErrorCode + return error > 4020; + } + + private static Set white = new HashSet(); + static { + int[] errList = { 1213, 1047, 1041, 1094, 4000, 4012 }; + for (int err : errList) { + white.add(err); + } + } + // 不考虑4000以下的 + private static Set black = new HashSet(); + static { + int[] errList = { 4022, 4025, 4026, 4028, 4029, 4031, 4033, 4034, 4037, 4041, 4044 }; + for (int err : errList) { + black.add(err); + } + } + + /** + * 由于ObProxy存在bug,事务超时或事务被杀时,conn的close是没有响应的 + * + * @param rs + * @param stmt + * @param conn + */ + public static void asyncClose(final ResultSet rs, final Statement stmt, final Connection conn) { + Thread t = new Thread() { + public void run() { + DBUtil.closeDBResources(rs, stmt, conn); + } + }; + t.setDaemon(true); + t.start(); + } +} diff --git a/oceanbasev10writer/src/main/libs/oceanbase-client-1.1.10.jar b/oceanbasev10writer/src/main/libs/oceanbase-client-1.1.10.jar new file mode 100644 index 00000000..38162912 Binary files /dev/null and b/oceanbasev10writer/src/main/libs/oceanbase-client-1.1.10.jar differ diff --git a/oceanbasev10writer/src/main/libs/oceanbase-connector-java-3.2.0.jar b/oceanbasev10writer/src/main/libs/oceanbase-connector-java-3.2.0.jar new file mode 100644 index 00000000..239f3dc4 Binary files /dev/null and b/oceanbasev10writer/src/main/libs/oceanbase-connector-java-3.2.0.jar differ diff --git a/oceanbasev10writer/src/main/resources/plugin.json b/oceanbasev10writer/src/main/resources/plugin.json new file mode 100644 index 00000000..23154c31 --- /dev/null +++ b/oceanbasev10writer/src/main/resources/plugin.json @@ -0,0 +1,6 @@ +{ + "name": "oceanbasev10writer", + "class": "com.alibaba.datax.plugin.writer.oceanbasev10writer.OceanBaseV10Writer", + "description": "write data into oceanbase with sql interface", + "developer": "oceanbase" +} \ No newline at end 
of file diff --git a/odpsreader/doc/odpsreader.md b/odpsreader/doc/odpsreader.md index 0ae52894..a79377de 100644 --- a/odpsreader/doc/odpsreader.md +++ b/odpsreader/doc/odpsreader.md @@ -58,7 +58,8 @@ ODPSReader 支持读取分区表、非分区表,不支持读取虚拟视图。 ], "packageAuthorizedProject": "yourCurrentProjectName", "splitMode": "record", - "odpsServer": "http://xxx/api" + "odpsServer": "http://xxx/api", + "tunnelServer": "http://dt.odps.aliyun.com" } }, "writer": { diff --git a/odpsreader/src/main/java/com/alibaba/datax/plugin/reader/odpsreader/util/DESCipher.java b/odpsreader/src/main/java/com/alibaba/datax/plugin/reader/odpsreader/util/DESCipher.java index dad82d50..82e97191 100644 --- a/odpsreader/src/main/java/com/alibaba/datax/plugin/reader/odpsreader/util/DESCipher.java +++ b/odpsreader/src/main/java/com/alibaba/datax/plugin/reader/odpsreader/util/DESCipher.java @@ -40,7 +40,7 @@ public class DESCipher { *    */ - public static final String KEY = "u4Gqu4Z8"; + public static final String KEY = "DESDES"; private final static String DES = "DES"; diff --git a/odpswriter/doc/odpswriter.md b/odpswriter/doc/odpswriter.md index 053a77c2..d81672b0 100644 --- a/odpswriter/doc/odpswriter.md +++ b/odpswriter/doc/odpswriter.md @@ -33,47 +33,51 @@ ODPSWriter插件用于实现往ODPS插入或者更新数据,主要提供给etl ```json { - "job": { - "setting": { - "speed": {"byte": 1048576} + "job": { + "setting": { + "speed": { + "byte": 1048576 + } + }, + "content": [ + { + "reader": { + "name": "streamreader", + "parameter": { + "column": [ + { + "value": "DataX", + "type": "string" + }, + { + "value": "test", + "type": "bytes" + } + ], + "sliceRecordCount": 100000 + } }, - "content": [ - { - "reader": { - "name": "streamreader", - "parameter": { - "column" : [ - { - "value": "DataX", - "type": "string" - }, - { - "value": "test", - "type": "bytes" - } - ], - "sliceRecordCount": 100000 - } - }, - "writer": { - "name": "odpswriter", - "parameter": { - "project": "chinan_test", - "table": "odps_write_test00_partitioned", - "partition":"school=SiChuan-School,class=1", - "column": ["id","name"], - "accessId": "xxx", - "accessKey": "xxxx", - "truncate": true, - "odpsServer": "http://sxxx/api", - "tunnelServer": "http://xxx", - "accountType": "aliyun" - } - } - } - } - ] - } + "writer": { + "name": "odpswriter", + "parameter": { + "project": "chinan_test", + "table": "odps_write_test00_partitioned", + "partition": "school=SiChuan-School,class=1", + "column": [ + "id", + "name" + ], + "accessId": "xxx", + "accessKey": "xxxx", + "truncate": true, + "odpsServer": "http://sxxx/api", + "tunnelServer": "http://xxx", + "accountType": "aliyun" + } + } + } + ] + } } ``` diff --git a/odpswriter/src/main/java/com/alibaba/datax/plugin/writer/odpswriter/util/DESCipher.java b/odpswriter/src/main/java/com/alibaba/datax/plugin/writer/odpswriter/util/DESCipher.java index bf7f5a88..4afead52 100755 --- a/odpswriter/src/main/java/com/alibaba/datax/plugin/writer/odpswriter/util/DESCipher.java +++ b/odpswriter/src/main/java/com/alibaba/datax/plugin/writer/odpswriter/util/DESCipher.java @@ -40,7 +40,7 @@ public class DESCipher { *    */ - public static final String KEY = "u4Gqu4Z8"; + public static final String KEY = "DESDES"; private final static String DES = "DES"; diff --git a/opentsdbreader/pom.xml b/opentsdbreader/pom.xml index aa3461d8..f2263726 100644 --- a/opentsdbreader/pom.xml +++ b/opentsdbreader/pom.xml @@ -21,7 +21,7 @@ 3.3.2 - 4.4 + 4.5 2.4 @@ -31,7 +31,7 @@ 2.3.2 - 4.12 + 4.13.1 2.9.9 diff --git a/oraclereader/doc/oraclereader.md b/oraclereader/doc/oraclereader.md index 
250527ae..bf35ff72 100644 --- a/oraclereader/doc/oraclereader.md +++ b/oraclereader/doc/oraclereader.md @@ -101,7 +101,7 @@ OracleReader插件实现了从Oracle读取数据。在底层实现上,OracleRe "connection": [ { "querySql": [ - "select db_id,on_line_flag from db_info where db_id < 10;" + "select db_id,on_line_flag from db_info where db_id < 10" ], "jdbcUrl": [ "jdbc:oracle:thin:@[HOST_NAME]:PORT:[DATABASE_NAME]" diff --git a/oscarwriter/pom.xml b/oscarwriter/pom.xml new file mode 100644 index 00000000..51643c76 --- /dev/null +++ b/oscarwriter/pom.xml @@ -0,0 +1,84 @@ + + + + datax-all + com.alibaba.datax + 0.0.1-SNAPSHOT + + 4.0.0 + oscarwriter + oscarwriter + jar + writer data into oscar database + + + + com.alibaba.datax + datax-common + ${datax-project-version} + + + slf4j-log4j12 + org.slf4j + + + + + org.slf4j + slf4j-api + + + ch.qos.logback + logback-classic + + + + com.alibaba.datax + plugin-rdbms-util + ${datax-project-version} + + + com.oscar + oscar + 7.0.8 + system + ${basedir}/src/main/lib/oscarJDBC.jar + + + + + + + + maven-compiler-plugin + + ${jdk-version} + ${jdk-version} + ${project-sourceEncoding} + + + + + maven-assembly-plugin + + + src/main/assembly/package.xml + + datax + + + + dwzip + package + + single + + + + + + + + \ No newline at end of file diff --git a/oscarwriter/src/main/assembly/package.xml b/oscarwriter/src/main/assembly/package.xml new file mode 100644 index 00000000..2401372e --- /dev/null +++ b/oscarwriter/src/main/assembly/package.xml @@ -0,0 +1,42 @@ + + + + dir + + false + + + src/main/resources + + plugin.json + plugin_job_template.json + + plugin/writer/oscarwriter + + + src/main/lib + + oscarJDBC.jar + + plugin/writer/oscarwriter/libs + + + target/ + + oscarwriter-0.0.1-SNAPSHOT.jar + + plugin/writer/oscarwriter + + + + + + false + plugin/writer/oscarwriter/libs + runtime + + + diff --git a/oscarwriter/src/main/java/com/alibaba/datax/plugin/writer/oscarwriter/OscarWriter.java b/oscarwriter/src/main/java/com/alibaba/datax/plugin/writer/oscarwriter/OscarWriter.java new file mode 100644 index 00000000..0602bb44 --- /dev/null +++ b/oscarwriter/src/main/java/com/alibaba/datax/plugin/writer/oscarwriter/OscarWriter.java @@ -0,0 +1,90 @@ +package com.alibaba.datax.plugin.writer.oscarwriter; + +import com.alibaba.datax.common.plugin.RecordReceiver; +import com.alibaba.datax.common.spi.Writer; +import com.alibaba.datax.common.util.Configuration; +import com.alibaba.datax.plugin.rdbms.util.DataBaseType; +import com.alibaba.datax.plugin.rdbms.writer.CommonRdbmsWriter; + +import java.util.List; + +public class OscarWriter extends Writer { + private static final DataBaseType DATABASE_TYPE = DataBaseType.Oscar; + + public static class Job extends Writer.Job { + private Configuration originalConfig = null; + private CommonRdbmsWriter.Job commonRdbmsWriterJob; + + @Override + public void preCheck() { + this.init(); + this.commonRdbmsWriterJob.writerPreCheck(this.originalConfig, DATABASE_TYPE); + } + + @Override + public void init() { + this.originalConfig = super.getPluginJobConf(); + + this.commonRdbmsWriterJob = new CommonRdbmsWriter.Job( + DATABASE_TYPE); + this.commonRdbmsWriterJob.init(this.originalConfig); + } + + @Override + public void prepare() { + this.commonRdbmsWriterJob.prepare(this.originalConfig); + } + + @Override + public List split(int mandatoryNumber) { + return this.commonRdbmsWriterJob.split(this.originalConfig, + mandatoryNumber); + } + + @Override + public void post() { + this.commonRdbmsWriterJob.post(this.originalConfig); + } + + @Override + public void destroy() 
{ + this.commonRdbmsWriterJob.destroy(this.originalConfig); + } + + } + + public static class Task extends Writer.Task { + private Configuration writerSliceConfig; + private CommonRdbmsWriter.Task commonRdbmsWriterTask; + + @Override + public void init() { + this.writerSliceConfig = super.getPluginJobConf(); + this.commonRdbmsWriterTask = new CommonRdbmsWriter.Task(DATABASE_TYPE); + this.commonRdbmsWriterTask.init(this.writerSliceConfig); + } + + @Override + public void prepare() { + this.commonRdbmsWriterTask.prepare(this.writerSliceConfig); + } + + @Override + public void startWrite(RecordReceiver recordReceiver) { + this.commonRdbmsWriterTask.startWrite(recordReceiver, + this.writerSliceConfig, super.getTaskPluginCollector()); + } + + @Override + public void post() { + this.commonRdbmsWriterTask.post(this.writerSliceConfig); + } + + @Override + public void destroy() { + this.commonRdbmsWriterTask.destroy(this.writerSliceConfig); + } + + } + +} diff --git a/oscarwriter/src/main/java/com/alibaba/datax/plugin/writer/oscarwriter/OscarWriterErrorCode.java b/oscarwriter/src/main/java/com/alibaba/datax/plugin/writer/oscarwriter/OscarWriterErrorCode.java new file mode 100644 index 00000000..7ae21576 --- /dev/null +++ b/oscarwriter/src/main/java/com/alibaba/datax/plugin/writer/oscarwriter/OscarWriterErrorCode.java @@ -0,0 +1,31 @@ +package com.alibaba.datax.plugin.writer.oscarwriter; + +import com.alibaba.datax.common.spi.ErrorCode; + +public enum OscarWriterErrorCode implements ErrorCode { + ; + + private final String code; + private final String describe; + + private OscarWriterErrorCode(String code, String describe) { + this.code = code; + this.describe = describe; + } + + @Override + public String getCode() { + return this.code; + } + + @Override + public String getDescription() { + return this.describe; + } + + @Override + public String toString() { + return String.format("Code:[%s], Describe:[%s]. ", this.code, + this.describe); + } +} diff --git a/oscarwriter/src/main/resources/plugin.json b/oscarwriter/src/main/resources/plugin.json new file mode 100644 index 00000000..f1a99fec --- /dev/null +++ b/oscarwriter/src/main/resources/plugin.json @@ -0,0 +1,6 @@ +{ + "name": "oscarwriter", + "class": "com.alibaba.datax.plugin.writer.oscarwriter.OscarWriter", + "description": "useScene: prod. mechanism: Jdbc connection using the database, execute insert sql. warn: The more you know about the database, the less problems you encounter.", + "developer": "linjiayu" +} \ No newline at end of file diff --git a/oscarwriter/src/main/resources/plugin_job_template.json b/oscarwriter/src/main/resources/plugin_job_template.json new file mode 100644 index 00000000..3c5e0707 --- /dev/null +++ b/oscarwriter/src/main/resources/plugin_job_template.json @@ -0,0 +1,15 @@ +{ + "name": "oscarwriter", + "parameter": { + "username": "", + "password": "", + "column": [], + "preSql": [], + "connection": [ + { + "jdbcUrl": "", + "table": [] + } + ] + } +} \ No newline at end of file diff --git a/osswriter/doc/osswriter.md b/osswriter/doc/osswriter.md index 3e6e863d..1a3d3e47 100644 --- a/osswriter/doc/osswriter.md +++ b/osswriter/doc/osswriter.md @@ -105,7 +105,7 @@ OSSWriter实现了从DataX协议转为OSS中的TXT文件功能,OSS本身是无 * 描述:OSSWriter写入的文件名,OSS使用文件名模拟目录的实现。
使用"object": "datax",写入object以datax开头,后缀添加随机字符串。 - 使用"object": "/cdo/datax",写入的object以/cdo/datax开头,后缀随机添加字符串,/作为OSS模拟目录的分隔符。 + 使用"object": "cdo/datax",写入的object以cdo/datax开头,后缀随机添加字符串,/作为OSS模拟目录的分隔符。 * 必选:是
diff --git a/otsstreamreader/pom.xml b/otsstreamreader/pom.xml index 8e42f8c9..2a12872f 100644 --- a/otsstreamreader/pom.xml +++ b/otsstreamreader/pom.xml @@ -10,7 +10,7 @@ com.alibaba.datax otsstreamreader - 0.0.1-SNAPSHOT + 0.0.1 diff --git a/package.xml b/package.xml index 42185f54..882dd23b 100755 --- a/package.xml +++ b/package.xml @@ -33,7 +33,7 @@ datax - oceanbasereader/target/datax/ + oceanbasev10reader/target/datax/ **/*.* @@ -74,6 +74,13 @@ datax + + kingbaseesreader/target/datax/ + + **/*.* + + datax + rdbmsreader/target/datax/ @@ -266,6 +273,13 @@ datax + + kingbaseeswriter/target/datax/ + + **/*.* + + datax + rdbmswriter/target/datax/ @@ -357,5 +371,26 @@ datax + + clickhousewriter/target/datax/ + + **/*.* + + datax + + + oscarwriter/target/datax/ + + **/*.* + + datax + + + oceanbasev10writer/target/datax/ + + **/*.* + + datax + diff --git a/plugin-rdbms-util/pom.xml b/plugin-rdbms-util/pom.xml index 1001a37c..c49f64af 100755 --- a/plugin-rdbms-util/pom.xml +++ b/plugin-rdbms-util/pom.xml @@ -30,7 +30,7 @@ mysql mysql-connector-java - 5.1.34 + ${mysql.driver.version} test @@ -63,5 +63,5 @@ guava r05 - + diff --git a/plugin-rdbms-util/src/main/java/com/alibaba/datax/plugin/rdbms/reader/Constant.java b/plugin-rdbms-util/src/main/java/com/alibaba/datax/plugin/rdbms/reader/Constant.java index 729d71ac..f998357e 100755 --- a/plugin-rdbms-util/src/main/java/com/alibaba/datax/plugin/rdbms/reader/Constant.java +++ b/plugin-rdbms-util/src/main/java/com/alibaba/datax/plugin/rdbms/reader/Constant.java @@ -25,4 +25,6 @@ public final class Constant { public static String TABLE_NAME_PLACEHOLDER = "@table"; + public static Integer SPLIT_FACTOR = 5; + } diff --git a/plugin-rdbms-util/src/main/java/com/alibaba/datax/plugin/rdbms/reader/Key.java b/plugin-rdbms-util/src/main/java/com/alibaba/datax/plugin/rdbms/reader/Key.java index 63f8dde0..9f2939c4 100755 --- a/plugin-rdbms-util/src/main/java/com/alibaba/datax/plugin/rdbms/reader/Key.java +++ b/plugin-rdbms-util/src/main/java/com/alibaba/datax/plugin/rdbms/reader/Key.java @@ -46,5 +46,13 @@ public final class Key { public final static String DRYRUN = "dryRun"; + public static String SPLIT_FACTOR = "splitFactor"; -} \ No newline at end of file + public final static String WEAK_READ = "weakRead"; + + public final static String SAVE_POINT = "savePoint"; + + public final static String REUSE_CONN = "reuseConn"; + + public final static String PARTITION_NAME = "partitionName"; +} diff --git a/plugin-rdbms-util/src/main/java/com/alibaba/datax/plugin/rdbms/reader/util/ReaderSplitUtil.java b/plugin-rdbms-util/src/main/java/com/alibaba/datax/plugin/rdbms/reader/util/ReaderSplitUtil.java index 64109e90..ed48ff8c 100755 --- a/plugin-rdbms-util/src/main/java/com/alibaba/datax/plugin/rdbms/reader/util/ReaderSplitUtil.java +++ b/plugin-rdbms-util/src/main/java/com/alibaba/datax/plugin/rdbms/reader/util/ReaderSplitUtil.java @@ -68,7 +68,12 @@ public final class ReaderSplitUtil { //eachTableShouldSplittedNumber = eachTableShouldSplittedNumber * 2 + 1;// 不应该加1导致长尾 //考虑其他比率数字?(splitPk is null, 忽略此长尾) - eachTableShouldSplittedNumber = eachTableShouldSplittedNumber * 5; + //eachTableShouldSplittedNumber = eachTableShouldSplittedNumber * 5; + + //为避免导入hive小文件 默认基数为5,可以通过 splitFactor 配置基数 + // 最终task数为(channel/tableNum)向上取整*splitFactor + Integer splitFactor = originalSliceConfig.getInt(Key.SPLIT_FACTOR, Constant.SPLIT_FACTOR); + eachTableShouldSplittedNumber = eachTableShouldSplittedNumber * splitFactor; } // 尝试对每个表,切分为eachTableShouldSplittedNumber 份 for (String 
table : tables) { diff --git a/plugin-rdbms-util/src/main/java/com/alibaba/datax/plugin/rdbms/util/DBUtil.java b/plugin-rdbms-util/src/main/java/com/alibaba/datax/plugin/rdbms/util/DBUtil.java index 63d1621b..2392d1ca 100755 --- a/plugin-rdbms-util/src/main/java/com/alibaba/datax/plugin/rdbms/util/DBUtil.java +++ b/plugin-rdbms-util/src/main/java/com/alibaba/datax/plugin/rdbms/util/DBUtil.java @@ -358,7 +358,7 @@ public final class DBUtil { String url, String user, String pass, String socketTimeout) { //ob10的处理 - if (url.startsWith(com.alibaba.datax.plugin.rdbms.writer.Constant.OB10_SPLIT_STRING) && dataBaseType == DataBaseType.MySql) { + if (url.startsWith(com.alibaba.datax.plugin.rdbms.writer.Constant.OB10_SPLIT_STRING)) { String[] ss = url.split(com.alibaba.datax.plugin.rdbms.writer.Constant.OB10_SPLIT_STRING_PATTERN); if (ss.length != 3) { throw DataXException @@ -367,7 +367,7 @@ public final class DBUtil { } LOG.info("this is ob1_0 jdbc url."); user = ss[1].trim() +":"+user; - url = ss[2]; + url = ss[2].replace("jdbc:mysql:", "jdbc:oceanbase:"); LOG.info("this is ob1_0 jdbc url. user="+user+" :url="+url); } diff --git a/plugin-rdbms-util/src/main/java/com/alibaba/datax/plugin/rdbms/util/DataBaseType.java b/plugin-rdbms-util/src/main/java/com/alibaba/datax/plugin/rdbms/util/DataBaseType.java index 55d9e47b..205919fe 100755 --- a/plugin-rdbms-util/src/main/java/com/alibaba/datax/plugin/rdbms/util/DataBaseType.java +++ b/plugin-rdbms-util/src/main/java/com/alibaba/datax/plugin/rdbms/util/DataBaseType.java @@ -18,7 +18,11 @@ public enum DataBaseType { PostgreSQL("postgresql", "org.postgresql.Driver"), RDBMS("rdbms", "com.alibaba.datax.plugin.rdbms.util.DataBaseType"), DB2("db2", "com.ibm.db2.jcc.DB2Driver"), - ADS("ads","com.mysql.jdbc.Driver"); + ADS("ads","com.mysql.jdbc.Driver"), + ClickHouse("clickhouse", "ru.yandex.clickhouse.ClickHouseDriver"), + KingbaseES("kingbasees", "com.kingbase8.Driver"), + Oscar("oscar", "com.oscar.Driver"), + OceanBase("oceanbase", "com.alipay.oceanbase.jdbc.Driver"); private String typeName; @@ -39,6 +43,7 @@ public enum DataBaseType { switch (this) { case MySql: case DRDS: + case OceanBase: suffix = "yearIsDateType=false&zeroDateTimeBehavior=convertToNull&tinyInt1isBit=false&rewriteBatchedStatements=true"; if (jdbc.contains("?")) { result = jdbc + "&" + suffix; @@ -53,9 +58,15 @@ public enum DataBaseType { case DB2: break; case PostgreSQL: - break; + break; + case ClickHouse: + break; case RDBMS: break; + case KingbaseES: + break; + case Oscar: + break; default: throw DataXException.asDataXException(DBUtilErrorCode.UNSUPPORTED_TYPE, "unsupported database type."); } @@ -90,9 +101,23 @@ public enum DataBaseType { case DB2: break; case PostgreSQL: - break; + break; + case ClickHouse: + break; case RDBMS: break; + case KingbaseES: + break; + case Oscar: + break; + case OceanBase: + suffix = "yearIsDateType=false&zeroDateTimeBehavior=convertToNull&tinyInt1isBit=false&rewriteBatchedStatements=true"; + if (jdbc.contains("?")) { + result = jdbc + "&" + suffix; + } else { + result = jdbc + "?" 
+ suffix; + } + break; default: throw DataXException.asDataXException(DBUtilErrorCode.UNSUPPORTED_TYPE, "unsupported database type."); } @@ -117,7 +142,9 @@ public enum DataBaseType { break; case DB2: case PostgreSQL: - break; + case KingbaseES: + case Oscar: + break; default: throw DataXException.asDataXException(DBUtilErrorCode.UNSUPPORTED_TYPE, "unsupported database type."); } @@ -140,6 +167,8 @@ public enum DataBaseType { break; case DB2: case PostgreSQL: + case KingbaseES: + case Oscar: break; default: throw DataXException.asDataXException(DBUtilErrorCode.UNSUPPORTED_TYPE, "unsupported database type"); @@ -163,6 +192,10 @@ public enum DataBaseType { break; case PostgreSQL: break; + case KingbaseES: + break; + case Oscar: + break; default: throw DataXException.asDataXException(DBUtilErrorCode.UNSUPPORTED_TYPE, "unsupported database type"); } diff --git a/plugin-rdbms-util/src/main/java/com/alibaba/datax/plugin/rdbms/writer/CommonRdbmsWriter.java b/plugin-rdbms-util/src/main/java/com/alibaba/datax/plugin/rdbms/writer/CommonRdbmsWriter.java index 440aac2a..27b88f44 100755 --- a/plugin-rdbms-util/src/main/java/com/alibaba/datax/plugin/rdbms/writer/CommonRdbmsWriter.java +++ b/plugin-rdbms-util/src/main/java/com/alibaba/datax/plugin/rdbms/writer/CommonRdbmsWriter.java @@ -402,13 +402,20 @@ public class CommonRdbmsWriter { throws SQLException { for (int i = 0; i < this.columnNumber; i++) { int columnSqltype = this.resultSetMetaData.getMiddle().get(i); - preparedStatement = fillPreparedStatementColumnType(preparedStatement, i, columnSqltype, record.getColumn(i)); + String typeName = this.resultSetMetaData.getRight().get(i); + preparedStatement = fillPreparedStatementColumnType(preparedStatement, i, columnSqltype, typeName, record.getColumn(i)); } return preparedStatement; } - protected PreparedStatement fillPreparedStatementColumnType(PreparedStatement preparedStatement, int columnIndex, int columnSqltype, Column column) throws SQLException { + protected PreparedStatement fillPreparedStatementColumnType(PreparedStatement preparedStatement, int columnIndex, + int columnSqltype, Column column) throws SQLException { + return fillPreparedStatementColumnType(preparedStatement, columnIndex, columnSqltype, null, column); + } + + protected PreparedStatement fillPreparedStatementColumnType(PreparedStatement preparedStatement, int columnIndex, + int columnSqltype, String typeName, Column column) throws SQLException { java.util.Date utilDate; switch (columnSqltype) { case Types.CHAR: @@ -451,8 +458,11 @@ public class CommonRdbmsWriter { // for mysql bug, see http://bugs.mysql.com/bug.php?id=35115 case Types.DATE: - if (this.resultSetMetaData.getRight().get(columnIndex) - .equalsIgnoreCase("year")) { + if (typeName == null) { + typeName = this.resultSetMetaData.getRight().get(columnIndex); + } + + if (typeName.equalsIgnoreCase("year")) { if (column.asBigInteger() == null) { preparedStatement.setString(columnIndex + 1, null); } else { diff --git a/plugin-rdbms-util/src/main/java/com/alibaba/datax/plugin/rdbms/writer/util/OriginalConfPretreatmentUtil.java b/plugin-rdbms-util/src/main/java/com/alibaba/datax/plugin/rdbms/writer/util/OriginalConfPretreatmentUtil.java index c42dd3ea..34d1b3af 100755 --- a/plugin-rdbms-util/src/main/java/com/alibaba/datax/plugin/rdbms/writer/util/OriginalConfPretreatmentUtil.java +++ b/plugin-rdbms-util/src/main/java/com/alibaba/datax/plugin/rdbms/writer/util/OriginalConfPretreatmentUtil.java @@ -62,7 +62,7 @@ public final class OriginalConfPretreatmentUtil { throw 
DataXException.asDataXException(DBUtilErrorCode.REQUIRED_VALUE, "您未配置的写入数据库表的 jdbcUrl."); } - jdbcUrl = DATABASE_TYPE.appendJDBCSuffixForReader(jdbcUrl); + jdbcUrl = DATABASE_TYPE.appendJDBCSuffixForWriter(jdbcUrl); originalConfig.set(String.format("%s[%d].%s", Constant.CONN_MARK, i, Key.JDBC_URL), jdbcUrl); diff --git a/pom.xml b/pom.xml old mode 100755 new mode 100644 index dc006b5a..3bd75a31 --- a/pom.xml +++ b/pom.xml @@ -22,7 +22,7 @@ 3.3.2 1.10 1.2 - 1.1.46.sec01 + 1.1.46.sec10 16.0.1 3.7.2.1-SNAPSHOT @@ -30,7 +30,7 @@ 1.7.10 1.0.13 2.4 - 4.11 + 4.13.1 5.1.22-1 1.0.0 @@ -38,6 +38,7 @@ UTF-8 UTF-8 UTF-8 + 5.1.47 @@ -50,6 +51,7 @@ drdsreader sqlserverreader postgresqlreader + kingbaseesreader oraclereader odpsreader otsreader @@ -66,6 +68,8 @@ tsdbreader opentsdbreader cassandrareader + gdbreader + oceanbasev10reader mysqlwriter @@ -79,6 +83,7 @@ oraclewriter sqlserverwriter postgresqlwriter + kingbaseeswriter osswriter mongodbwriter adswriter @@ -93,11 +98,15 @@ adbpgwriter gdbwriter cassandrawriter + clickhousewriter + oscarwriter + oceanbasev10writer plugin-rdbms-util plugin-unstructured-storage-util hbase20xsqlreader hbase20xsqlwriter + kuduwriter diff --git a/postgresqlreader/doc/postgresqlreader.md b/postgresqlreader/doc/postgresqlreader.md index fed2c7e9..02c354ab 100644 --- a/postgresqlreader/doc/postgresqlreader.md +++ b/postgresqlreader/doc/postgresqlreader.md @@ -48,7 +48,7 @@ PostgresqlReader插件实现了从PostgreSQL读取数据。在底层实现上, // 数据库连接密码 "password": "xx", "column": [ - "id","name" + "id","name" ], //切分主键 "splitPk": "id", diff --git a/postgresqlwriter/doc/postgresqlwriter.md b/postgresqlwriter/doc/postgresqlwriter.md index 662da2e4..e9e25908 100644 --- a/postgresqlwriter/doc/postgresqlwriter.md +++ b/postgresqlwriter/doc/postgresqlwriter.md @@ -141,7 +141,7 @@ PostgresqlWriter通过 DataX 框架获取 Reader 生成的协议数据,根据 * **column** - * 描述:目的表需要写入数据的字段,字段之间用英文逗号分隔。例如: "column": ["id","name","age"]。如果要依次写入全部列,使用*表示, 例如: "column": ["*"] + * 描述:目的表需要写入数据的字段,字段之间用英文逗号分隔。例如: "column": ["id","name","age"]。如果要依次写入全部列,使用\*表示, 例如: "column": ["\*"] 注意:1、我们强烈不推荐你这样配置,因为当你目的表字段个数、类型等有改动时,你的任务可能运行不正确或者失败 2、此处 column 不能配置任何常量值 diff --git a/rdbmsreader/src/main/java/com/alibaba/datax/plugin/reader/rdbmsreader/RdbmsReader.java b/rdbmsreader/src/main/java/com/alibaba/datax/plugin/reader/rdbmsreader/RdbmsReader.java index 3153e114..f070287a 100755 --- a/rdbmsreader/src/main/java/com/alibaba/datax/plugin/reader/rdbmsreader/RdbmsReader.java +++ b/rdbmsreader/src/main/java/com/alibaba/datax/plugin/reader/rdbmsreader/RdbmsReader.java @@ -5,6 +5,7 @@ import com.alibaba.datax.common.plugin.RecordSender; import com.alibaba.datax.common.spi.Reader; import com.alibaba.datax.common.util.Configuration; import com.alibaba.datax.plugin.rdbms.reader.CommonRdbmsReader; +import com.alibaba.datax.plugin.rdbms.util.DBUtil; import com.alibaba.datax.plugin.rdbms.util.DBUtilErrorCode; import com.alibaba.datax.plugin.rdbms.util.DataBaseType; @@ -12,7 +13,10 @@ import java.util.List; public class RdbmsReader extends Reader { private static final DataBaseType DATABASE_TYPE = DataBaseType.RDBMS; - + static { + //加载插件下面配置的驱动类 + DBUtil.loadDriverClass("reader", "rdbms"); + } public static class Job extends Reader.Job { private Configuration originalConfig; diff --git a/rdbmswriter/src/main/java/com/alibaba/datax/plugin/reader/rdbmswriter/RdbmsWriter.java b/rdbmswriter/src/main/java/com/alibaba/datax/plugin/reader/rdbmswriter/RdbmsWriter.java index 49ef3877..71fe7956 100755 --- 
a/rdbmswriter/src/main/java/com/alibaba/datax/plugin/reader/rdbmswriter/RdbmsWriter.java +++ b/rdbmswriter/src/main/java/com/alibaba/datax/plugin/reader/rdbmswriter/RdbmsWriter.java @@ -4,6 +4,7 @@ import com.alibaba.datax.common.exception.DataXException; import com.alibaba.datax.common.plugin.RecordReceiver; import com.alibaba.datax.common.spi.Writer; import com.alibaba.datax.common.util.Configuration; +import com.alibaba.datax.plugin.rdbms.util.DBUtil; import com.alibaba.datax.plugin.rdbms.util.DBUtilErrorCode; import com.alibaba.datax.plugin.rdbms.util.DataBaseType; import com.alibaba.datax.plugin.rdbms.writer.CommonRdbmsWriter; @@ -13,7 +14,10 @@ import java.util.List; public class RdbmsWriter extends Writer { private static final DataBaseType DATABASE_TYPE = DataBaseType.RDBMS; - + static { + //加载插件下面配置的驱动类 + DBUtil.loadDriverClass("writer", "rdbms"); + } public static class Job extends Writer.Job { private Configuration originalConfig = null; private CommonRdbmsWriter.Job commonRdbmsWriterMaster; diff --git a/transformer/doc/transformer.md b/transformer/doc/transformer.md index 84fab96a..247ab39b 100644 --- a/transformer/doc/transformer.md +++ b/transformer/doc/transformer.md @@ -2,7 +2,7 @@ ## Transformer定义 -在数据同步、传输过程中,存在用户对于数据传输进行特殊定制化的需求场景,包括裁剪列、转换列等工作,可以借助ETL的T过程实现(Transformer)。DataX包含了完成的E(Extract)、T(Transformer)、L(Load)支持。 +在数据同步、传输过程中,存在用户对于数据传输进行特殊定制化的需求场景,包括裁剪列、转换列等工作,可以借助ETL的T过程实现(Transformer)。DataX包含了完整的E(Extract)、T(Transformer)、L(Load)支持。 ## 运行模型 diff --git a/tsdbreader/pom.xml b/tsdbreader/pom.xml index b9a45985..0f990234 100644 --- a/tsdbreader/pom.xml +++ b/tsdbreader/pom.xml @@ -21,14 +21,14 @@ 3.3.2 - 4.4 + 4.5 2.4 1.2.28 - 4.12 + 4.13.1 2.9.9 diff --git a/tsdbwriter/pom.xml b/tsdbwriter/pom.xml index 497264c0..1fb7c1e0 100644 --- a/tsdbwriter/pom.xml +++ b/tsdbwriter/pom.xml @@ -21,14 +21,14 @@ 3.3.2 - 4.4 + 4.5 2.4 1.2.28 - 4.12 + 4.13.1 diff --git a/txtfilewriter/doc/txtfilewriter.md b/txtfilewriter/doc/txtfilewriter.md index e8daab73..dd61142c 100644 --- a/txtfilewriter/doc/txtfilewriter.md +++ b/txtfilewriter/doc/txtfilewriter.md @@ -187,7 +187,6 @@ TxtFileWriter实现了从DataX协议转为本地TXT文件功能,本地文件 | DataX 内部类型| 本地文件 数据类型 | | -------- | ----- | -| | Long |Long | | Double |Double| | String |String| diff --git a/userGuid.md b/userGuid.md index 7abd3e2a..153c8111 100644 --- a/userGuid.md +++ b/userGuid.md @@ -64,7 +64,7 @@ DataX本身作为数据同步框架,将不同数据源的同步抽象为从源 * 配置示例:从stream读取数据并打印到控制台 - * 第一步、创建创业的配置文件(json格式) + * 第一步、创建作业的配置文件(json格式) 可以通过命令查看配置模板: python datax.py -r {YOUR_READER} -w {YOUR_WRITER}
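As a closing note on the oceanbasev10writer code added earlier in this patch: the batch-insert path classifies each SQLException with ObWriterUtils.isFatalError / isRecoverableError and reacts in one of three ways — rebuild the connection after a long pause, roll back and retry on the same connection, or give up on batching and fall back to row-by-row writes. The self-contained demo below condenses that decision into a small method; the classification logic is deliberately simplified relative to ObWriterUtils (which also consults explicit white/black lists of error codes and message text), so treat it as a sketch of the policy rather than the plugin's actual rules.

```java
import java.sql.SQLException;

// Condensed illustration of the three-way error handling used by the
// oceanbasev10writer batch insert: fatal -> reconnect, recoverable -> retry,
// anything else -> fall back to row-by-row writes. Sleep durations in the
// comments mirror the patch; the classification below is simplified.
public class ObRetryPolicyDemo {

    enum Action { RECONNECT_AND_RETRY, ROLLBACK_AND_RETRY, FALL_BACK_TO_SINGLE_ROW }

    static Action decide(SQLException e) {
        if (isFatal(e)) {
            return Action.RECONNECT_AND_RETRY;      // sleep 300s, rebuild the connection
        } else if (isRecoverable(e)) {
            return Action.ROLLBACK_AND_RETRY;       // sleep 60s, retry on the same connection
        } else {
            return Action.FALL_BACK_TO_SINGLE_ROW;  // sleep 1s, write the records one by one
        }
    }

    // Simplified stand-in for ObWriterUtils.isFatalError (connection-class SQLState "08xxx").
    static boolean isFatal(SQLException e) {
        return e.getSQLState() != null && e.getSQLState().startsWith("08");
    }

    // Simplified stand-in for ObWriterUtils.isRecoverableError: deadlock, OB timeout,
    // or other OB-specific error codes above 4020 (the real code also checks black lists).
    static boolean isRecoverable(SQLException e) {
        int code = Math.abs(e.getErrorCode());
        return code == 1213 || code == 4012 || code > 4020;
    }

    public static void main(String[] args) {
        System.out.println(decide(new SQLException("link failure", "08S01", 0)));   // RECONNECT_AND_RETRY
        System.out.println(decide(new SQLException("ob timeout", "HY000", -4012))); // ROLLBACK_AND_RETRY
        System.out.println(decide(new SQLException("duplicate key", "23000", 1062))); // FALL_BACK_TO_SINGLE_ROW
    }
}
```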