Mirror of https://github.com/alibaba/DataX.git (synced 2025-05-02 04:40:54 +08:00)

Commit 4690fbdd53 — Merge branch 'alibaba:master' into master

README.md (52 lines changed)
@@ -1,11 +1,14 @@
 # DataX
 
-DataX 是阿里巴巴集团内被广泛使用的离线数据同步工具/平台,实现包括 MySQL、Oracle、SqlServer、Postgre、HDFS、Hive、ADS、HBase、TableStore(OTS)、MaxCompute(ODPS)、DRDS 等各种异构数据源之间高效的数据同步功能。
+DataX 是阿里云 [DataWorks数据集成](https://www.aliyun.com/product/bigdata/ide) 的开源版本,在阿里巴巴集团内被广泛使用的离线数据同步工具/平台。DataX 实现了包括 MySQL、Oracle、OceanBase、SqlServer、Postgre、HDFS、Hive、ADS、HBase、TableStore(OTS)、MaxCompute(ODPS)、Hologres、DRDS 等各种异构数据源之间高效的数据同步功能。
+
+# DataX 商业版本
+
+阿里云DataWorks数据集成是DataX团队在阿里云上的商业化产品,致力于提供复杂网络环境下、丰富的异构数据源之间高速稳定的数据移动能力,以及繁杂业务背景下的数据同步解决方案。目前已经支持云上近3000家客户,单日同步数据超过3万亿条。DataWorks数据集成目前支持离线50+种数据源,可以进行整库迁移、批量上云、增量同步、分库分表等各类同步解决方案。2020年更新实时同步能力,支持10+种数据源的读写任意组合。提供MySQL,Oracle等多种数据源到阿里云MaxCompute,Hologres等大数据引擎的一键全增量同步解决方案。
+
+商业版本参见: https://www.aliyun.com/product/bigdata/ide
 
 # Features

@@ -36,6 +39,7 @@ DataX目前已经有了比较全面的插件体系,主流的RDBMS数据库、N
 | ------------ | ---------- | :-------: | :-------: |:-------: |
 | RDBMS 关系型数据库 | MySQL | √ | √ |[读](https://github.com/alibaba/DataX/blob/master/mysqlreader/doc/mysqlreader.md) 、[写](https://github.com/alibaba/DataX/blob/master/mysqlwriter/doc/mysqlwriter.md)|
 | | Oracle | √ | √ |[读](https://github.com/alibaba/DataX/blob/master/oraclereader/doc/oraclereader.md) 、[写](https://github.com/alibaba/DataX/blob/master/oraclewriter/doc/oraclewriter.md)|
+| | OceanBase | √ | √ |[读](https://open.oceanbase.com/docs/community/oceanbase-database/V3.1.0/use-datax-to-full-migration-data-to-oceanbase) 、[写](https://open.oceanbase.com/docs/community/oceanbase-database/V3.1.0/use-datax-to-full-migration-data-to-oceanbase)|
 | | SQLServer | √ | √ |[读](https://github.com/alibaba/DataX/blob/master/sqlserverreader/doc/sqlserverreader.md) 、[写](https://github.com/alibaba/DataX/blob/master/sqlserverwriter/doc/sqlserverwriter.md)|
 | | PostgreSQL | √ | √ |[读](https://github.com/alibaba/DataX/blob/master/postgresqlreader/doc/postgresqlreader.md) 、[写](https://github.com/alibaba/DataX/blob/master/postgresqlwriter/doc/postgresqlwriter.md)|
 | | DRDS | √ | √ |[读](https://github.com/alibaba/DataX/blob/master/drdsreader/doc/drdsreader.md) 、[写](https://github.com/alibaba/DataX/blob/master/drdswriter/doc/drdswriter.md)|

@@ -49,7 +53,7 @@ DataX目前已经有了比较全面的插件体系,主流的RDBMS数据库、N
 | | Hbase1.1 | √ | √ |[读](https://github.com/alibaba/DataX/blob/master/hbase11xreader/doc/hbase11xreader.md) 、[写](https://github.com/alibaba/DataX/blob/master/hbase11xwriter/doc/hbase11xwriter.md)|
 | | Phoenix4.x | √ | √ |[读](https://github.com/alibaba/DataX/blob/master/hbase11xsqlreader/doc/hbase11xsqlreader.md) 、[写](https://github.com/alibaba/DataX/blob/master/hbase11xsqlwriter/doc/hbase11xsqlwriter.md)|
 | | Phoenix5.x | √ | √ |[读](https://github.com/alibaba/DataX/blob/master/hbase20xsqlreader/doc/hbase20xsqlreader.md) 、[写](https://github.com/alibaba/DataX/blob/master/hbase20xsqlwriter/doc/hbase20xsqlwriter.md)|
-| | MongoDB | √ | √ |[读](https://github.com/alibaba/DataX/blob/master/mongoreader/doc/mongoreader.md) 、[写](https://github.com/alibaba/DataX/blob/master/mongowriter/doc/mongowriter.md)|
+| | MongoDB | √ | √ |[读](https://github.com/alibaba/DataX/blob/master/mongodbreader/doc/mongodbreader.md) 、[写](https://github.com/alibaba/DataX/blob/master/mongodbwriter/doc/mongodbwriter.md)|
 | | Hive | √ | √ |[读](https://github.com/alibaba/DataX/blob/master/hdfsreader/doc/hdfsreader.md) 、[写](https://github.com/alibaba/DataX/blob/master/hdfswriter/doc/hdfswriter.md)|
 | | Cassandra | √ | √ |[读](https://github.com/alibaba/DataX/blob/master/cassandrareader/doc/cassandrareader.md) 、[写](https://github.com/alibaba/DataX/blob/master/cassandrawriter/doc/cassandrawriter.md)|
 | 无结构化数据存储 | TxtFile | √ | √ |[读](https://github.com/alibaba/DataX/blob/master/txtfilereader/doc/txtfilereader.md) 、[写](https://github.com/alibaba/DataX/blob/master/txtfilewriter/doc/txtfilewriter.md)|

@@ -59,9 +63,33 @@ DataX目前已经有了比较全面的插件体系,主流的RDBMS数据库、N
 | 时间序列数据库 | OpenTSDB | √ | |[读](https://github.com/alibaba/DataX/blob/master/opentsdbreader/doc/opentsdbreader.md)|
 | | TSDB | √ | √ |[读](https://github.com/alibaba/DataX/blob/master/tsdbreader/doc/tsdbreader.md) 、[写](https://github.com/alibaba/DataX/blob/master/tsdbwriter/doc/tsdbhttpwriter.md)|
 
+# 阿里云DataWorks数据集成
+
+目前DataX的已有能力已经全部融合进阿里云的数据集成,并且比DataX更加高效、安全,同时数据集成具备DataX不具备的其它高级特性和功能。可以理解为数据集成是DataX全面升级的商业化版本,为企业提供稳定、可靠、安全的数据传输服务。与DataX相比,数据集成主要有以下几大突出特点:
+
+支持实时同步:
+
+- 功能简介:https://help.aliyun.com/document_detail/181912.html
+- 支持的数据源:https://help.aliyun.com/document_detail/146778.html
+- 支持数据处理:https://help.aliyun.com/document_detail/146777.html
+
+离线同步数据源种类大幅度扩充:
+
+- 新增比如:DB2、Kafka、Hologres、MetaQ、SAPHANA、达梦等等,持续扩充中
+- 离线同步支持的数据源:https://help.aliyun.com/document_detail/137670.html
+- 具备同步解决方案:
+    - 解决方案系统:https://help.aliyun.com/document_detail/171765.html
+    - 一键全增量:https://help.aliyun.com/document_detail/175676.html
+    - 整库迁移:https://help.aliyun.com/document_detail/137809.html
+    - 批量上云:https://help.aliyun.com/document_detail/146671.html
+- 更新更多能力请访问:https://help.aliyun.com/document_detail/137663.html
+
 # 我要开发新的插件
 
 请点击:[DataX插件开发宝典](https://github.com/alibaba/DataX/blob/master/dataxPluginDev.md)
 
 # 项目成员
 
 核心Contributions: 言柏 、枕水、秋奇、青砾、一斅、云时

@@ -108,7 +136,23 @@ This software is free to use under the Apache License [Apache license](https://g
 8. 对高并发、高稳定可用性、高性能、大数据处理有过实际项目及产品经验者优先考虑;
 9. 有大数据产品、云产品、中间件技术解决方案者优先考虑。
 ````
-钉钉用户群:23169395
+钉钉用户群:
+
+- DataX开源用户交流群
+    - <img src="https://github.com/alibaba/DataX/blob/master/images/DataX%E5%BC%80%E6%BA%90%E7%94%A8%E6%88%B7%E4%BA%A4%E6%B5%81%E7%BE%A4.jpg" width="20%" height="20%">
+
+- DataX开源用户交流群2
+    - <img src="https://github.com/alibaba/DataX/blob/master/images/DataX%E5%BC%80%E6%BA%90%E7%94%A8%E6%88%B7%E4%BA%A4%E6%B5%81%E7%BE%A42.jpg" width="20%" height="20%">
+
+- DataX开源用户交流群3
+    - <img src="https://github.com/alibaba/DataX/blob/master/images/DataX%E5%BC%80%E6%BA%90%E7%94%A8%E6%88%B7%E4%BA%A4%E6%B5%81%E7%BE%A43.jpg" width="20%" height="20%">
+
+- DataX开源用户交流群4
+    - <img src="https://github.com/alibaba/DataX/blob/master/images/DataX%E5%BC%80%E6%BA%90%E7%94%A8%E6%88%B7%E4%BA%A4%E6%B5%81%E7%BE%A44.jpg" width="20%" height="20%">
+
+- DataX开源用户交流群5
+    - <img src="https://github.com/alibaba/DataX/blob/master/images/DataX%E5%BC%80%E6%BA%90%E7%94%A8%E6%88%B7%E4%BA%A4%E6%B5%81%E7%BE%A45.jpg" width="20%" height="20%">
+
+- DataX开源用户交流群6
+    - <img src="https://user-images.githubusercontent.com/1905000/124073771-139cbd00-da75-11eb-9a3f-598cba145a76.png" width="20%" height="20%">
clickhousewriter/pom.xml (new file, 88 lines)
@@ -0,0 +1,88 @@
<?xml version="1.0" encoding="UTF-8"?>
<project xmlns="http://maven.apache.org/POM/4.0.0"
         xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
         xsi:schemaLocation="http://maven.apache.org/POM/4.0.0 http://maven.apache.org/xsd/maven-4.0.0.xsd">
    <parent>
        <artifactId>datax-all</artifactId>
        <groupId>com.alibaba.datax</groupId>
        <version>0.0.1-SNAPSHOT</version>
    </parent>

    <modelVersion>4.0.0</modelVersion>
    <artifactId>clickhousewriter</artifactId>
    <name>clickhousewriter</name>
    <packaging>jar</packaging>

    <dependencies>
        <dependency>
            <groupId>ru.yandex.clickhouse</groupId>
            <artifactId>clickhouse-jdbc</artifactId>
            <version>0.2.4</version>
        </dependency>
        <dependency>
            <groupId>com.alibaba.datax</groupId>
            <artifactId>datax-core</artifactId>
            <version>${datax-project-version}</version>
        </dependency>
        <dependency>
            <groupId>com.alibaba.datax</groupId>
            <artifactId>datax-common</artifactId>
            <version>${datax-project-version}</version>
        </dependency>
        <dependency>
            <groupId>org.slf4j</groupId>
            <artifactId>slf4j-api</artifactId>
        </dependency>

        <dependency>
            <groupId>ch.qos.logback</groupId>
            <artifactId>logback-classic</artifactId>
        </dependency>

        <dependency>
            <groupId>com.alibaba.datax</groupId>
            <artifactId>plugin-rdbms-util</artifactId>
            <version>${datax-project-version}</version>
        </dependency>
    </dependencies>
    <build>
        <resources>
            <resource>
                <directory>src/main/java</directory>
                <includes>
                    <include>**/*.properties</include>
                </includes>
            </resource>
        </resources>
        <plugins>
            <!-- compiler plugin -->
            <plugin>
                <artifactId>maven-compiler-plugin</artifactId>
                <configuration>
                    <source>${jdk-version}</source>
                    <target>${jdk-version}</target>
                    <encoding>${project-sourceEncoding}</encoding>
                </configuration>
            </plugin>
            <!-- assembly plugin -->
            <plugin>
                <artifactId>maven-assembly-plugin</artifactId>
                <configuration>
                    <descriptors>
                        <descriptor>src/main/assembly/package.xml</descriptor>
                    </descriptors>
                    <finalName>datax</finalName>
                </configuration>
                <executions>
                    <execution>
                        <id>dwzip</id>
                        <phase>package</phase>
                        <goals>
                            <goal>single</goal>
                        </goals>
                    </execution>
                </executions>
            </plugin>
        </plugins>
    </build>
</project>
clickhousewriter/src/main/assembly/package.xml (new file, 35 lines)
@@ -0,0 +1,35 @@
<assembly
        xmlns="http://maven.apache.org/plugins/maven-assembly-plugin/assembly/1.1.0"
        xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
        xsi:schemaLocation="http://maven.apache.org/plugins/maven-assembly-plugin/assembly/1.1.0 http://maven.apache.org/xsd/assembly-1.1.0.xsd">
    <id></id>
    <formats>
        <format>dir</format>
    </formats>
    <includeBaseDirectory>false</includeBaseDirectory>
    <fileSets>
        <fileSet>
            <directory>src/main/resources</directory>
            <includes>
                <include>plugin.json</include>
                <include>plugin_job_template.json</include>
            </includes>
            <outputDirectory>plugin/writer/clickhousewriter</outputDirectory>
        </fileSet>
        <fileSet>
            <directory>target/</directory>
            <includes>
                <include>clickhousewriter-0.0.1-SNAPSHOT.jar</include>
            </includes>
            <outputDirectory>plugin/writer/clickhousewriter</outputDirectory>
        </fileSet>
    </fileSets>

    <dependencySets>
        <dependencySet>
            <useProjectArtifact>false</useProjectArtifact>
            <outputDirectory>plugin/writer/clickhousewriter/libs</outputDirectory>
            <scope>runtime</scope>
        </dependencySet>
    </dependencySets>
</assembly>
clickhousewriter — ClickhouseWriter.java (new file, 329 lines)
@@ -0,0 +1,329 @@
package com.alibaba.datax.plugin.writer.clickhousewriter;
|
||||||
|
|
||||||
|
import com.alibaba.datax.common.element.Column;
|
||||||
|
import com.alibaba.datax.common.element.StringColumn;
|
||||||
|
import com.alibaba.datax.common.exception.CommonErrorCode;
|
||||||
|
import com.alibaba.datax.common.exception.DataXException;
|
||||||
|
import com.alibaba.datax.common.plugin.RecordReceiver;
|
||||||
|
import com.alibaba.datax.common.spi.Writer;
|
||||||
|
import com.alibaba.datax.common.util.Configuration;
|
||||||
|
import com.alibaba.datax.plugin.rdbms.util.DBUtilErrorCode;
|
||||||
|
import com.alibaba.datax.plugin.rdbms.util.DataBaseType;
|
||||||
|
import com.alibaba.datax.plugin.rdbms.writer.CommonRdbmsWriter;
|
||||||
|
import com.alibaba.fastjson.JSON;
|
||||||
|
import com.alibaba.fastjson.JSONArray;
|
||||||
|
|
||||||
|
import java.sql.Array;
|
||||||
|
import java.sql.Connection;
|
||||||
|
import java.sql.PreparedStatement;
|
||||||
|
import java.sql.SQLException;
|
||||||
|
import java.sql.Timestamp;
|
||||||
|
import java.sql.Types;
|
||||||
|
import java.util.List;
|
||||||
|
import java.util.regex.Pattern;
|
||||||
|
|
||||||
|
public class ClickhouseWriter extends Writer {
|
||||||
|
private static final DataBaseType DATABASE_TYPE = DataBaseType.ClickHouse;
|
||||||
|
|
||||||
|
public static class Job extends Writer.Job {
|
||||||
|
private Configuration originalConfig = null;
|
||||||
|
private CommonRdbmsWriter.Job commonRdbmsWriterMaster;
|
||||||
|
|
||||||
|
@Override
|
||||||
|
public void init() {
|
||||||
|
this.originalConfig = super.getPluginJobConf();
|
||||||
|
this.commonRdbmsWriterMaster = new CommonRdbmsWriter.Job(DATABASE_TYPE);
|
||||||
|
this.commonRdbmsWriterMaster.init(this.originalConfig);
|
||||||
|
}
|
||||||
|
|
||||||
|
@Override
|
||||||
|
public void prepare() {
|
||||||
|
this.commonRdbmsWriterMaster.prepare(this.originalConfig);
|
||||||
|
}
|
||||||
|
|
||||||
|
@Override
|
||||||
|
public List<Configuration> split(int mandatoryNumber) {
|
||||||
|
return this.commonRdbmsWriterMaster.split(this.originalConfig, mandatoryNumber);
|
||||||
|
}
|
||||||
|
|
||||||
|
@Override
|
||||||
|
public void post() {
|
||||||
|
this.commonRdbmsWriterMaster.post(this.originalConfig);
|
||||||
|
}
|
||||||
|
|
||||||
|
@Override
|
||||||
|
public void destroy() {
|
||||||
|
this.commonRdbmsWriterMaster.destroy(this.originalConfig);
|
||||||
|
}
|
||||||
|
}
|
||||||
|
|
||||||
|
public static class Task extends Writer.Task {
|
||||||
|
private Configuration writerSliceConfig;
|
||||||
|
|
||||||
|
private CommonRdbmsWriter.Task commonRdbmsWriterSlave;
|
||||||
|
|
||||||
|
@Override
|
||||||
|
public void init() {
|
||||||
|
this.writerSliceConfig = super.getPluginJobConf();
|
||||||
|
|
||||||
|
this.commonRdbmsWriterSlave = new CommonRdbmsWriter.Task(DATABASE_TYPE) {
|
||||||
|
@Override
|
||||||
|
protected PreparedStatement fillPreparedStatementColumnType(PreparedStatement preparedStatement, int columnIndex, int columnSqltype, Column column) throws SQLException {
|
||||||
|
try {
|
||||||
|
if (column.getRawData() == null) {
|
||||||
|
preparedStatement.setNull(columnIndex + 1, columnSqltype);
|
||||||
|
return preparedStatement;
|
||||||
|
}
|
||||||
|
|
||||||
|
java.util.Date utilDate;
|
||||||
|
switch (columnSqltype) {
|
||||||
|
case Types.CHAR:
|
||||||
|
case Types.NCHAR:
|
||||||
|
case Types.CLOB:
|
||||||
|
case Types.NCLOB:
|
||||||
|
case Types.VARCHAR:
|
||||||
|
case Types.LONGVARCHAR:
|
||||||
|
case Types.NVARCHAR:
|
||||||
|
case Types.LONGNVARCHAR:
|
||||||
|
preparedStatement.setString(columnIndex + 1, column
|
||||||
|
.asString());
|
||||||
|
break;
|
||||||
|
|
||||||
|
case Types.TINYINT:
|
||||||
|
case Types.SMALLINT:
|
||||||
|
case Types.INTEGER:
|
||||||
|
case Types.BIGINT:
|
||||||
|
case Types.DECIMAL:
|
||||||
|
case Types.FLOAT:
|
||||||
|
case Types.REAL:
|
||||||
|
case Types.DOUBLE:
|
||||||
|
String strValue = column.asString();
|
||||||
|
if (emptyAsNull && "".equals(strValue)) {
|
||||||
|
preparedStatement.setNull(columnIndex + 1, columnSqltype);
|
||||||
|
} else {
|
||||||
|
switch (columnSqltype) {
|
||||||
|
case Types.TINYINT:
|
||||||
|
case Types.SMALLINT:
|
||||||
|
case Types.INTEGER:
|
||||||
|
preparedStatement.setInt(columnIndex + 1, column.asBigInteger().intValue());
|
||||||
|
break;
|
||||||
|
case Types.BIGINT:
|
||||||
|
preparedStatement.setLong(columnIndex + 1, column.asLong());
|
||||||
|
break;
|
||||||
|
case Types.DECIMAL:
|
||||||
|
preparedStatement.setBigDecimal(columnIndex + 1, column.asBigDecimal());
|
||||||
|
break;
|
||||||
|
case Types.REAL:
|
||||||
|
case Types.FLOAT:
|
||||||
|
preparedStatement.setFloat(columnIndex + 1, column.asDouble().floatValue());
|
||||||
|
break;
|
||||||
|
case Types.DOUBLE:
|
||||||
|
preparedStatement.setDouble(columnIndex + 1, column.asDouble());
|
||||||
|
break;
|
||||||
|
}
|
||||||
|
}
|
||||||
|
break;
|
||||||
|
|
||||||
|
case Types.DATE:
|
||||||
|
if (this.resultSetMetaData.getRight().get(columnIndex)
|
||||||
|
.equalsIgnoreCase("year")) {
|
||||||
|
if (column.asBigInteger() == null) {
|
||||||
|
preparedStatement.setString(columnIndex + 1, null);
|
||||||
|
} else {
|
||||||
|
preparedStatement.setInt(columnIndex + 1, column.asBigInteger().intValue());
|
||||||
|
}
|
||||||
|
} else {
|
||||||
|
java.sql.Date sqlDate = null;
|
||||||
|
try {
|
||||||
|
utilDate = column.asDate();
|
||||||
|
} catch (DataXException e) {
|
||||||
|
throw new SQLException(String.format(
|
||||||
|
"Date 类型转换错误:[%s]", column));
|
||||||
|
}
|
||||||
|
|
||||||
|
if (null != utilDate) {
|
||||||
|
sqlDate = new java.sql.Date(utilDate.getTime());
|
||||||
|
}
|
||||||
|
preparedStatement.setDate(columnIndex + 1, sqlDate);
|
||||||
|
}
|
||||||
|
break;
|
||||||
|
|
||||||
|
case Types.TIME:
|
||||||
|
java.sql.Time sqlTime = null;
|
||||||
|
try {
|
||||||
|
utilDate = column.asDate();
|
||||||
|
} catch (DataXException e) {
|
||||||
|
throw new SQLException(String.format(
|
||||||
|
"Date 类型转换错误:[%s]", column));
|
||||||
|
}
|
||||||
|
|
||||||
|
if (null != utilDate) {
|
||||||
|
sqlTime = new java.sql.Time(utilDate.getTime());
|
||||||
|
}
|
||||||
|
preparedStatement.setTime(columnIndex + 1, sqlTime);
|
||||||
|
break;
|
||||||
|
|
||||||
|
case Types.TIMESTAMP:
|
||||||
|
Timestamp sqlTimestamp = null;
|
||||||
|
if (column instanceof StringColumn && column.asString() != null) {
|
||||||
|
String timeStampStr = column.asString();
|
||||||
|
// JAVA TIMESTAMP 类型入参必须是 "2017-07-12 14:39:00.123566" 格式
|
||||||
|
String pattern = "^\\d+-\\d+-\\d+ \\d+:\\d+:\\d+.\\d+";
|
||||||
|
boolean isMatch = Pattern.matches(pattern, timeStampStr);
|
||||||
|
if (isMatch) {
|
||||||
|
sqlTimestamp = Timestamp.valueOf(timeStampStr);
|
||||||
|
preparedStatement.setTimestamp(columnIndex + 1, sqlTimestamp);
|
||||||
|
break;
|
||||||
|
}
|
||||||
|
}
|
||||||
|
try {
|
||||||
|
utilDate = column.asDate();
|
||||||
|
} catch (DataXException e) {
|
||||||
|
throw new SQLException(String.format(
|
||||||
|
"Date 类型转换错误:[%s]", column));
|
||||||
|
}
|
||||||
|
|
||||||
|
if (null != utilDate) {
|
||||||
|
sqlTimestamp = new Timestamp(
|
||||||
|
utilDate.getTime());
|
||||||
|
}
|
||||||
|
preparedStatement.setTimestamp(columnIndex + 1, sqlTimestamp);
|
||||||
|
break;
|
||||||
|
|
||||||
|
case Types.BINARY:
|
||||||
|
case Types.VARBINARY:
|
||||||
|
case Types.BLOB:
|
||||||
|
case Types.LONGVARBINARY:
|
||||||
|
preparedStatement.setBytes(columnIndex + 1, column
|
||||||
|
.asBytes());
|
||||||
|
break;
|
||||||
|
|
||||||
|
case Types.BOOLEAN:
|
||||||
|
preparedStatement.setInt(columnIndex + 1, column.asBigInteger().intValue());
|
||||||
|
break;
|
||||||
|
|
||||||
|
// warn: bit(1) -> Types.BIT 可使用setBoolean
|
||||||
|
// warn: bit(>1) -> Types.VARBINARY 可使用setBytes
|
||||||
|
case Types.BIT:
|
||||||
|
if (this.dataBaseType == DataBaseType.MySql) {
|
||||||
|
Boolean asBoolean = column.asBoolean();
|
||||||
|
if (asBoolean != null) {
|
||||||
|
preparedStatement.setBoolean(columnIndex + 1, asBoolean);
|
||||||
|
} else {
|
||||||
|
preparedStatement.setNull(columnIndex + 1, Types.BIT);
|
||||||
|
}
|
||||||
|
} else {
|
||||||
|
preparedStatement.setString(columnIndex + 1, column.asString());
|
||||||
|
}
|
||||||
|
break;
|
||||||
|
|
||||||
|
default:
|
||||||
|
boolean isHandled = fillPreparedStatementColumnType4CustomType(preparedStatement,
|
||||||
|
columnIndex, columnSqltype, column);
|
||||||
|
if (isHandled) {
|
||||||
|
break;
|
||||||
|
}
|
||||||
|
throw DataXException
|
||||||
|
.asDataXException(
|
||||||
|
DBUtilErrorCode.UNSUPPORTED_TYPE,
|
||||||
|
String.format(
|
||||||
|
"您的配置文件中的列配置信息有误. 因为DataX 不支持数据库写入这种字段类型. 字段名:[%s], 字段类型:[%d], 字段Java类型:[%s]. 请修改表中该字段的类型或者不同步该字段.",
|
||||||
|
this.resultSetMetaData.getLeft()
|
||||||
|
.get(columnIndex),
|
||||||
|
this.resultSetMetaData.getMiddle()
|
||||||
|
.get(columnIndex),
|
||||||
|
this.resultSetMetaData.getRight()
|
||||||
|
.get(columnIndex)));
|
||||||
|
}
|
||||||
|
return preparedStatement;
|
||||||
|
} catch (DataXException e) {
|
||||||
|
// fix类型转换或者溢出失败时,将具体哪一列打印出来
|
||||||
|
if (e.getErrorCode() == CommonErrorCode.CONVERT_NOT_SUPPORT ||
|
||||||
|
e.getErrorCode() == CommonErrorCode.CONVERT_OVER_FLOW) {
|
||||||
|
throw DataXException
|
||||||
|
.asDataXException(
|
||||||
|
e.getErrorCode(),
|
||||||
|
String.format(
|
||||||
|
"类型转化错误. 字段名:[%s], 字段类型:[%d], 字段Java类型:[%s]. 请修改表中该字段的类型或者不同步该字段.",
|
||||||
|
this.resultSetMetaData.getLeft()
|
||||||
|
.get(columnIndex),
|
||||||
|
this.resultSetMetaData.getMiddle()
|
||||||
|
.get(columnIndex),
|
||||||
|
this.resultSetMetaData.getRight()
|
||||||
|
.get(columnIndex)));
|
||||||
|
} else {
|
||||||
|
throw e;
|
||||||
|
}
|
||||||
|
}
|
||||||
|
}
|
||||||
|
|
||||||
|
private Object toJavaArray(Object val) {
|
||||||
|
if (null == val) {
|
||||||
|
return null;
|
||||||
|
} else if (val instanceof JSONArray) {
|
||||||
|
Object[] valArray = ((JSONArray) val).toArray();
|
||||||
|
for (int i = 0; i < valArray.length; i++) {
|
||||||
|
valArray[i] = this.toJavaArray(valArray[i]);
|
||||||
|
}
|
||||||
|
return valArray;
|
||||||
|
} else {
|
||||||
|
return val;
|
||||||
|
}
|
||||||
|
}
|
||||||
|
|
||||||
|
boolean fillPreparedStatementColumnType4CustomType(PreparedStatement ps,
|
||||||
|
int columnIndex, int columnSqltype,
|
||||||
|
Column column) throws SQLException {
|
||||||
|
switch (columnSqltype) {
|
||||||
|
case Types.OTHER:
|
||||||
|
if (this.resultSetMetaData.getRight().get(columnIndex).startsWith("Tuple")) {
|
||||||
|
throw DataXException
|
||||||
|
.asDataXException(ClickhouseWriterErrorCode.TUPLE_NOT_SUPPORTED_ERROR, ClickhouseWriterErrorCode.TUPLE_NOT_SUPPORTED_ERROR.getDescription());
|
||||||
|
} else {
|
||||||
|
ps.setString(columnIndex + 1, column.asString());
|
||||||
|
}
|
||||||
|
return true;
|
||||||
|
|
||||||
|
case Types.ARRAY:
|
||||||
|
Connection conn = ps.getConnection();
|
||||||
|
List<Object> values = JSON.parseArray(column.asString(), Object.class);
|
||||||
|
for (int i = 0; i < values.size(); i++) {
|
||||||
|
values.set(i, this.toJavaArray(values.get(i)));
|
||||||
|
}
|
||||||
|
Array array = conn.createArrayOf("String", values.toArray());
|
||||||
|
ps.setArray(columnIndex + 1, array);
|
||||||
|
return true;
|
||||||
|
|
||||||
|
default:
|
||||||
|
break;
|
||||||
|
}
|
||||||
|
|
||||||
|
return false;
|
||||||
|
}
|
||||||
|
};
|
||||||
|
|
||||||
|
this.commonRdbmsWriterSlave.init(this.writerSliceConfig);
|
||||||
|
}
|
||||||
|
|
||||||
|
@Override
|
||||||
|
public void prepare() {
|
||||||
|
this.commonRdbmsWriterSlave.prepare(this.writerSliceConfig);
|
||||||
|
}
|
||||||
|
|
||||||
|
@Override
|
||||||
|
public void startWrite(RecordReceiver recordReceiver) {
|
||||||
|
this.commonRdbmsWriterSlave.startWrite(recordReceiver, this.writerSliceConfig, super.getTaskPluginCollector());
|
||||||
|
}
|
||||||
|
|
||||||
|
@Override
|
||||||
|
public void post() {
|
||||||
|
this.commonRdbmsWriterSlave.post(this.writerSliceConfig);
|
||||||
|
}
|
||||||
|
|
||||||
|
@Override
|
||||||
|
public void destroy() {
|
||||||
|
this.commonRdbmsWriterSlave.destroy(this.writerSliceConfig);
|
||||||
|
}
|
||||||
|
}
|
||||||
|
|
||||||
|
}
|
clickhousewriter — ClickhouseWriterErrorCode.java (new file, 31 lines)
@@ -0,0 +1,31 @@
package com.alibaba.datax.plugin.writer.clickhousewriter;

import com.alibaba.datax.common.spi.ErrorCode;

public enum ClickhouseWriterErrorCode implements ErrorCode {
    TUPLE_NOT_SUPPORTED_ERROR("ClickhouseWriter-00", "不支持TUPLE类型导入."),
    ;

    private final String code;
    private final String description;

    private ClickhouseWriterErrorCode(String code, String description) {
        this.code = code;
        this.description = description;
    }

    @Override
    public String getCode() {
        return this.code;
    }

    @Override
    public String getDescription() {
        return this.description;
    }

    @Override
    public String toString() {
        return String.format("Code:[%s], Description:[%s].", this.code, this.description);
    }
}
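For reference, this is how the error code above is raised in `ClickhouseWriter.Task` (excerpted from the writer source earlier in this commit) when a ClickHouse `Tuple` column is encountered; it is shown here purely as a usage illustration of the `ErrorCode` enum:

```java
// From ClickhouseWriter.Task: Tuple columns are not supported, so the task
// fails fast with the dedicated error code defined above.
if (this.resultSetMetaData.getRight().get(columnIndex).startsWith("Tuple")) {
    throw DataXException.asDataXException(
            ClickhouseWriterErrorCode.TUPLE_NOT_SUPPORTED_ERROR,
            ClickhouseWriterErrorCode.TUPLE_NOT_SUPPORTED_ERROR.getDescription());
} else {
    ps.setString(columnIndex + 1, column.asString());
}
```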
clickhousewriter/src/main/resources/plugin.json (new file, 6 lines)
@@ -0,0 +1,6 @@
{
    "name": "clickhousewriter",
    "class": "com.alibaba.datax.plugin.writer.clickhousewriter.ClickhouseWriter",
    "description": "useScene: prod. mechanism: Jdbc connection using the database, execute insert sql.",
    "developer": "jiye.tjy"
}
clickhousewriter/src/main/resources/plugin_job_template.json (new file, 21 lines)
@@ -0,0 +1,21 @@
{
    "name": "clickhousewriter",
    "parameter": {
        "username": "username",
        "password": "password",
        "column": ["col1", "col2", "col3"],
        "connection": [
            {
                "jdbcUrl": "jdbc:clickhouse://<host>:<port>[/<database>]",
                "table": ["table1", "table2"]
            }
        ],
        "preSql": [],
        "postSql": [],

        "batchSize": 65536,
        "batchByteSize": 134217728,
        "dryRun": false,
        "writeMode": "insert"
    }
}
@@ -41,12 +41,12 @@
         <dependency>
             <groupId>org.apache.httpcomponents</groupId>
             <artifactId>httpclient</artifactId>
-            <version>4.4</version>
+            <version>4.5</version>
         </dependency>
         <dependency>
             <groupId>org.apache.httpcomponents</groupId>
             <artifactId>fluent-hc</artifactId>
-            <version>4.4</version>
+            <version>4.5</version>
         </dependency>
         <dependency>
             <groupId>org.slf4j</groupId>
@@ -174,6 +174,9 @@ def parsePluginName(jdbcUrl, pluginType):
     db2Regex = re.compile('jdbc:(db2)://.*')
     if (db2Regex.match(jdbcUrl)):
         name = 'db2'
+    kingbaseesRegex = re.compile('jdbc:(kingbase8)://.*')
+    if (kingbaseesRegex.match(jdbcUrl)):
+        name = 'kingbasees'
     return "%s%s" % (name, pluginType)
 
 def renderDataXJson(paramsDict, readerOrWriter = 'reader', channel = 1):
@@ -427,7 +427,7 @@ public class JobContainer extends AbstractContainer {
         Long channelLimitedByteSpeed = this.configuration
                 .getLong(CoreConstant.DATAX_CORE_TRANSPORT_CHANNEL_SPEED_BYTE);
         if (channelLimitedByteSpeed == null || channelLimitedByteSpeed <= 0) {
-            DataXException.asDataXException(
+            throw DataXException.asDataXException(
                     FrameworkErrorCode.CONFIG_ERROR,
                     "在有总bps限速条件下,单个channel的bps值不能为空,也不能为非正数");
         }
@@ -448,7 +448,7 @@ public class JobContainer extends AbstractContainer {
         Long channelLimitedRecordSpeed = this.configuration.getLong(
                 CoreConstant.DATAX_CORE_TRANSPORT_CHANNEL_SPEED_RECORD);
         if (channelLimitedRecordSpeed == null || channelLimitedRecordSpeed <= 0) {
-            DataXException.asDataXException(FrameworkErrorCode.CONFIG_ERROR,
+            throw DataXException.asDataXException(FrameworkErrorCode.CONFIG_ERROR,
                     "在有总tps限速条件下,单个channel的tps值不能为空,也不能为非正数");
         }
@@ -111,7 +111,7 @@ public class SomeReader extends Reader {
 ```
 
 `Job`接口功能如下:
-- `init`: Job对象初始化工作,测试可以通过`super.getPluginJobConf()`获取与本插件相关的配置。读插件获得配置中`reader`部分,写插件获得`writer`部分。
+- `init`: Job对象初始化工作,此时可以通过`super.getPluginJobConf()`获取与本插件相关的配置。读插件获得配置中`reader`部分,写插件获得`writer`部分。
 - `prepare`: 全局准备工作,比如odpswriter清空目标表。
 - `split`: 拆分`Task`。参数`adviceNumber`框架建议的拆分数,一般是运行时所配置的并发度。值返回的是`Task`的配置列表。
 - `post`: 全局的后置工作,比如mysqlwriter同步完影子表后的rename操作。
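To make the `Job` lifecycle above concrete, here is a minimal sketch of the Job half of a reader plugin. It follows the DataX `Reader.Job` API used elsewhere in this commit (`getPluginJobConf()`, `split(adviceNumber)` returning a list of task configurations); the class name `SomeReader` mirrors the guide's example, and the splitting strategy shown is purely illustrative.

```java
import com.alibaba.datax.common.spi.Reader;
import com.alibaba.datax.common.util.Configuration;

import java.util.ArrayList;
import java.util.List;

public class SomeReader extends Reader {

    public static class Job extends Reader.Job {
        private Configuration jobConfig = null;

        @Override
        public void init() {
            // a reader plugin receives the "reader" part of the job configuration
            this.jobConfig = super.getPluginJobConf();
        }

        @Override
        public void prepare() {
            // global preparation, e.g. checking connectivity
        }

        @Override
        public List<Configuration> split(int adviceNumber) {
            // adviceNumber is the parallelism suggested by the framework;
            // return one Configuration per Task (a real plugin would give each
            // Task its own specialized copy, e.g. a table shard or an id range)
            List<Configuration> taskConfigs = new ArrayList<>();
            for (int i = 0; i < adviceNumber; i++) {
                taskConfigs.add(this.jobConfig);
            }
            return taskConfigs;
        }

        @Override
        public void post() {
            // global post-processing, e.g. renaming a shadow table
        }

        @Override
        public void destroy() {
            // release global resources
        }
    }
}
```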
@@ -155,7 +155,7 @@ public class SomeReader extends Reader {
 ```
 
 - `name`: 插件名称,大小写敏感。框架根据用户在配置文件中指定的名称来搜寻插件。 **十分重要** 。
-- `class`: 入口类的全限定名称,框架通过反射穿件入口类的实例。**十分重要** 。
+- `class`: 入口类的全限定名称,框架通过反射创建入口类的实例。**十分重要** 。
 - `description`: 描述信息。
 - `developer`: 开发人员。
@@ -435,7 +435,7 @@ DataX的内部类型在实现上会选用不同的java类型:
 
 #### 如何处理脏数据
 
-在`Reader.Task`和`Writer.Task`中,功过`AbstractTaskPlugin.getPluginCollector()`可以拿到一个`TaskPluginCollector`,它提供了一系列`collectDirtyRecord`的方法。当脏数据出现时,只需要调用合适的`collectDirtyRecord`方法,把被认为是脏数据的`Record`传入即可。
+在`Reader.Task`和`Writer.Task`中,通过`AbstractTaskPlugin.getTaskPluginCollector()`可以拿到一个`TaskPluginCollector`,它提供了一系列`collectDirtyRecord`的方法。当脏数据出现时,只需要调用合适的`collectDirtyRecord`方法,把被认为是脏数据的`Record`传入即可。
 
 用户可以在任务的配置中指定脏数据限制条数或者百分比限制,当脏数据超出限制时,框架会结束同步任务,退出。插件需要保证脏数据都被收集到,其他工作交给框架就好。
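A minimal sketch of that pattern inside a writer's `Task`: records that fail to convert or write are handed to the `TaskPluginCollector` instead of failing the whole job. The `collectDirtyRecord(record, exception)` call used here is one of the family of overloads mentioned above, and `writeOneRecord` is an illustrative placeholder.

```java
import com.alibaba.datax.common.element.Record;
import com.alibaba.datax.common.plugin.RecordReceiver;
import com.alibaba.datax.common.spi.Writer;

public class SomeWriterTask extends Writer.Task {

    @Override
    public void startWrite(RecordReceiver recordReceiver) {
        Record record;
        while ((record = recordReceiver.getFromReader()) != null) {
            try {
                writeOneRecord(record); // plugin-specific write logic (placeholder)
            } catch (Exception e) {
                // hand the record to the framework as dirty data; the framework
                // enforces the configured count/percentage limits
                super.getTaskPluginCollector().collectDirtyRecord(record, e);
            }
        }
    }

    private void writeOneRecord(Record record) throws Exception {
        // illustrative only
    }

    @Override
    public void init() {
    }

    @Override
    public void destroy() {
    }
}
```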
@@ -468,4 +468,4 @@ DataX的内部类型在实现上会选用不同的java类型:
 - 测试参数集(多组),系统参数(比如并发数),插件参数(比如batchSize)
 - 不同参数下同步速度(Rec/s, MB/s),机器负载(load, cpu)等,对数据源压力(load, cpu, mem等)。
 6. **约束限制**:是否存在其他的使用限制条件。
-7. **FQA**:用户经常会遇到的问题。
+7. **FAQ**:用户经常会遇到的问题。
@@ -50,7 +50,7 @@ DRDS的插件目前DataX只适配了Mysql引擎的场景,DRDS对于DataX而言
                         // 数据库连接密码
                         "password": "root",
                         "column": [
                             "id","name"
                         ],
                         "connection": [
                             {
@@ -42,7 +42,7 @@
         <dependency>
             <groupId>mysql</groupId>
             <artifactId>mysql-connector-java</artifactId>
-            <version>5.1.34</version>
+            <version>${mysql.driver.version}</version>
         </dependency>

@@ -44,7 +44,7 @@
         <dependency>
             <groupId>mysql</groupId>
             <artifactId>mysql-connector-java</artifactId>
-            <version>5.1.34</version>
+            <version>${mysql.driver.version}</version>
         </dependency>
     </dependencies>
@@ -50,7 +50,7 @@
         <dependency>
             <groupId>junit</groupId>
             <artifactId>junit</artifactId>
-            <version>4.11</version>
+            <version>4.13.1</version>
             <scope>test</scope>
         </dependency>
     </dependencies>
@@ -63,6 +63,7 @@ FtpWriter实现了从DataX协议转为FTP文件功能,FTP文件本身是无结
             "nullFormat": "null",
             "dateFormat": "yyyy-MM-dd",
             "fileFormat": "csv",
+            "suffix": ".csv",
             "header": []
         }
     }
@@ -200,6 +201,14 @@ FtpWriter实现了从DataX协议转为FTP文件功能,FTP文件本身是无结
 	* 必选:否 <br />
 
 	* 默认值:text <br />
 
+* **suffix**
+
+	* 描述:最后输出文件的后缀,当前支持 ".text"以及".csv"
+
+	* 必选:否 <br />
+
+	* 默认值:"" <br />
+
 * **header**
gdbreader/doc/gdbreader.md (new file, 260 lines)
@@ -0,0 +1,260 @@

# DataX GDBReader

## 1. 快速介绍

GDBReader插件实现读取GDB实例数据的功能,通过`Gremlin Client`连接远程GDB实例,按配置提供的`label`生成查询DSL,遍历点或边数据,包括属性数据,并将数据写入到Record中给到Writer使用。

## 2. 实现原理

GDBReader使用`Gremlin Client`连接GDB实例,按`label`分不同Task取点或边数据。
单个Task中按`label`遍历点或边的id,再切分范围分多次请求查询点或边和属性数据,最后将点或边数据根据配置转换成指定格式记录发送给下游写插件。

GDBReader按`label`切分多个Task并发,同一个`label`的数据批量异步获取来加快读取速度。如果配置读取的`label`列表为空,任务启动前会从GDB查询所有`label`再切分Task。
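To make the two levels of batching above concrete, here is a minimal sketch of one Task's read loop, modeled on the `GdbReader.Task.startRead` implementation included later in this commit. `graph`, `fetchLabel`, `rangeSplitSize`, `fetchBatchSize`, `propNames`, `rule` and `recordSender` are the fields and arguments set up in `Task.init()`/`startRead()`; error handling is omitted for brevity.

```java
// Walk the label's ids in windows of rangeSplitSize, then fetch each
// fetchBatchSize sub-range (vertices/edges plus their properties)
// asynchronously and forward the mapped records to the writer.
String start = "";
while (true) {
    List<String> ids = graph.fetchIds(fetchLabel, start, rangeSplitSize);
    if (ids.isEmpty()) {
        break;                                // this label is exhausted
    }
    start = ids.get(ids.size() - 1);          // resume after the last id seen

    List<ResultSet> pending = new LinkedList<>();
    for (int pos = 0; pos < ids.size(); pos += fetchBatchSize) {
        int rangeSize = Math.min(fetchBatchSize, ids.size() - pos);
        pending.add(graph.fetchElementsAsync(fetchLabel,
                ids.get(pos), ids.get(pos + rangeSize - 1), propNames));
    }

    for (ResultSet results : pending) {
        for (GdbElement element : graph.getElement(results)) {
            Record record = recordSender.createRecord();
            DefaultGdbMapper.getMapper(rule).accept(element, record);
            recordSender.sendToWriter(record);
        }
    }
    recordSender.flush();
}
```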
## 3. 功能说明

GDB中点和边不同,读取时需要区分点和边的配置。

### 3.1 点配置样例
```
|
||||||
|
{
|
||||||
|
"job": {
|
||||||
|
"setting": {
|
||||||
|
"speed": {
|
||||||
|
"channel": 1
|
||||||
|
}
|
||||||
|
"errorLimit": {
|
||||||
|
"record": 1
|
||||||
|
}
|
||||||
|
},
|
||||||
|
|
||||||
|
"content": [
|
||||||
|
{
|
||||||
|
"reader": {
|
||||||
|
"name": "gdbreader",
|
||||||
|
"parameter": {
|
||||||
|
"host": "10.218.145.24",
|
||||||
|
"port": 8182,
|
||||||
|
"username": "***",
|
||||||
|
"password": "***",
|
||||||
|
"fetchBatchSize": 100,
|
||||||
|
"rangeSplitSize": 1000,
|
||||||
|
"labelType": "VERTEX",
|
||||||
|
"labels": ["label1", "label2"],
|
||||||
|
"column": [
|
||||||
|
{
|
||||||
|
"name": "id",
|
||||||
|
"type": "string",
|
||||||
|
"columnType": "primaryKey"
|
||||||
|
},
|
||||||
|
{
|
||||||
|
"name": "label",
|
||||||
|
"type": "string",
|
||||||
|
"columnType": "primaryLabel"
|
||||||
|
},
|
||||||
|
{
|
||||||
|
"name": "age",
|
||||||
|
"type": "int",
|
||||||
|
"columnType": "vertexProperty"
|
||||||
|
}
|
||||||
|
]
|
||||||
|
}
|
||||||
|
},
|
||||||
|
"writer": {
|
||||||
|
"name": "streamwriter",
|
||||||
|
"parameter": {
|
||||||
|
"print": true
|
||||||
|
}
|
||||||
|
}
|
||||||
|
}
|
||||||
|
]
|
||||||
|
}
|
||||||
|
}
|
||||||
|
```
|
||||||
|
|
||||||
|
### 3.2 边配置样例
|
||||||
|
|
||||||
|
```
|
||||||
|
{
|
||||||
|
"job": {
|
||||||
|
"setting": {
|
||||||
|
"speed": {
|
||||||
|
"channel": 1
|
||||||
|
},
|
||||||
|
"errorLimit": {
|
||||||
|
"record": 1
|
||||||
|
}
|
||||||
|
},
|
||||||
|
|
||||||
|
"content": [
|
||||||
|
{
|
||||||
|
"reader": {
|
||||||
|
"name": "gdbreader",
|
||||||
|
"parameter": {
|
||||||
|
"host": "10.218.145.24",
|
||||||
|
"port": 8182,
|
||||||
|
"username": "***",
|
||||||
|
"password": "***",
|
||||||
|
"fetchBatchSize": 100,
|
||||||
|
"rangeSplitSize": 1000,
|
||||||
|
"labelType": "EDGE",
|
||||||
|
"labels": ["label1", "label2"],
|
||||||
|
"column": [
|
||||||
|
{
|
||||||
|
"name": "id",
|
||||||
|
"type": "string",
|
||||||
|
"columnType": "primaryKey"
|
||||||
|
},
|
||||||
|
{
|
||||||
|
"name": "label",
|
||||||
|
"type": "string",
|
||||||
|
"columnType": "primaryLabel"
|
||||||
|
},
|
||||||
|
{
|
||||||
|
"name": "srcId",
|
||||||
|
"type": "string",
|
||||||
|
"columnType": "srcPrimaryKey"
|
||||||
|
},
|
||||||
|
{
|
||||||
|
"name": "srcLabel",
|
||||||
|
"type": "string",
|
||||||
|
"columnType": "srcPrimaryLabel"
|
||||||
|
},
|
||||||
|
{
|
||||||
|
"name": "dstId",
|
||||||
|
"type": "string",
|
||||||
|
"columnType": "srcPrimaryKey"
|
||||||
|
},
|
||||||
|
{
|
||||||
|
"name": "dstLabel",
|
||||||
|
"type": "string",
|
||||||
|
"columnType": "srcPrimaryLabel"
|
||||||
|
},
|
||||||
|
{
|
||||||
|
"name": "name",
|
||||||
|
"type": "string",
|
||||||
|
"columnType": "edgeProperty"
|
||||||
|
},
|
||||||
|
{
|
||||||
|
"name": "weight",
|
||||||
|
"type": "double",
|
||||||
|
"columnType": "edgeProperty"
|
||||||
|
}
|
||||||
|
]
|
||||||
|
}
|
||||||
|
},
|
||||||
|
|
||||||
|
"writer": {
|
||||||
|
"name": "streamwriter",
|
||||||
|
"parameter": {
|
||||||
|
"print": true
|
||||||
|
}
|
||||||
|
}
|
||||||
|
}
|
||||||
|
]
|
||||||
|
}
|
||||||
|
}
|
||||||
|
```
|
||||||
|
|
||||||
|
### 3.3 参数说明
|
||||||
|
|
||||||
|
* **host**
|
||||||
|
* 描述:GDB实例连接地址,对应'实例管理'->'基本信息'页面的网络地址
|
||||||
|
* 必选:是
|
||||||
|
* 默认值:无
|
||||||
|
|
||||||
|
* **port**
|
||||||
|
* 描述:GDB实例连接地址对应的端口
|
||||||
|
* 必选:是
|
||||||
|
* 默认值:8182
|
||||||
|
|
||||||
|
* **username**
|
||||||
|
* 描述:GDB实例账号名
|
||||||
|
* 必选:是
|
||||||
|
* 默认值:无
|
||||||
|
|
||||||
|
* **password**
|
||||||
|
* 描述:GDB实例账号名对应的密码
|
||||||
|
* 必选:是
|
||||||
|
* 默认值:无
|
||||||
|
|
||||||
|
* **fetchBatchSize**
|
||||||
|
* 描述:一次GDB请求读取点或边的数量,响应包含点或边以及属性
|
||||||
|
* 必选:是
|
||||||
|
* 默认值:100
|
||||||
|
|
||||||
|
* **rangeSplitSize**
|
||||||
|
* 描述:id遍历,一次遍历请求扫描的id个数
|
||||||
|
* 必选:是
|
||||||
|
* 默认值:10 \* fetchBatchSize
|
||||||
|
|
||||||
|
* **labels**
|
||||||
|
* 描述:标签数组,即需要导出的点或边标签,支持读取多个标签,用数组表示。如果留空([]),表示GDB中所有点或边标签
|
||||||
|
* 必选:是
|
||||||
|
* 默认值:无
|
||||||
|
|
||||||
|
* **labelType**
|
||||||
|
* 描述:数据标签类型,支持点、边两种枚举值
|
||||||
|
* VERTEX:表示点
|
||||||
|
* EDGE:表示边
|
||||||
|
* 必选:是
|
||||||
|
* 默认值:无
|
||||||
|
|
||||||
|
* **column**
|
||||||
|
* 描述:点或边字段映射关系配置
|
||||||
|
* 必选:是
|
||||||
|
* 默认值:无
|
||||||
|
|
||||||
|
* **column -> name**
|
||||||
|
* 描述:点或边映射关系的字段名,指定属性时表示读取的属性名,读取其他字段时会被忽略
|
||||||
|
* 必选:是
|
||||||
|
* 默认值:无
|
||||||
|
|
||||||
|
* **column -> type**
|
||||||
|
* 描述:点或边映射关系的字段类型
|
||||||
|
* id, label在GDB中都是string类型,配置非string类型时可能会转换失败
|
||||||
|
* 普通属性支持基础类型,包括int, long, float, double, boolean, string
|
||||||
|
* GDBReader尽量将读取到的数据转换成配置要求的类型,但转换失败会导致该条记录错误
|
||||||
|
* 必选:是
|
||||||
|
* 默认值:无
|
||||||
|
|
||||||
|
* **column -> columnType**
|
||||||
|
* 描述:GDB点或边数据到列数据的映射关系,支持以下枚举值:
|
||||||
|
* primaryKey: 表示该字段是点或边的id
|
||||||
|
* primaryLabel: 表示该字段是点或边的label
|
||||||
|
* srcPrimaryKey: 表示该字段是边关联的起点id,只在读取边时使用
|
||||||
|
* srcPrimaryLabel: 表示该字段是边关联的起点label,只在读取边时使用
|
||||||
|
* dstPrimaryKey: 表示该字段是边关联的终点id,只在读取边时使用
|
||||||
|
* dstPrimaryLabel: 表示该字段是边关联的终点label,只在读取边时使用
|
||||||
|
* vertexProperty: 表示该字段是点的属性,只在读取点时使用,应用到SET属性时只读取其中的一个属性值
|
||||||
|
* vertexJsonProperty: 表示该字段是点的属性集合,只在读取点时使用。属性集合使用JSON格式输出,包含所有的属性,不能与其他vertexProperty配置一起使用
|
||||||
|
* edgeProperty: 表示该字段是边的属性,只在读取边时使用
|
||||||
|
* edgeJsonProperty: 表示该字段是边的属性集合,只在读取边时使用。属性集合使用JSON格式输出,包含所有的属性,不能与其他edgeProperty配置一起使用
|
||||||
|
* 必选:是
|
||||||
|
* 默认值:无
|
||||||
|
  * vertexJsonProperty格式示例,新增`c`字段区分SET属性,但是SET属性只包含单个属性值时会标记成普通属性
    ```
    {"properties":[
      {"k":"name","t":"string","v":"Jack","c":"set"},
      {"k":"name","t":"string","v":"Luck","c":"set"},
      {"k":"age","t":"int","v":"20","c":"single"}
    ]}
    ```
  * edgeJsonProperty格式示例,边不支持多值属性
    ```
    {"properties":[
      {"k":"created_at","t":"long","v":"153498653"},
      {"k":"weight","t":"double","v":"3.14"}
    ]}
    ```

## 4 性能报告

(TODO)

## 5 使用约束

无

## 6 FAQ

无
gdbreader/pom.xml (new file, 125 lines)
@@ -0,0 +1,125 @@
|
|||||||
|
<?xml version="1.0" encoding="UTF-8"?>
|
||||||
|
<project xmlns="http://maven.apache.org/POM/4.0.0" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:schemaLocation="http://maven.apache.org/POM/4.0.0 http://maven.apache.org/xsd/maven-4.0.0.xsd">
|
||||||
|
<parent>
|
||||||
|
<artifactId>datax-all</artifactId>
|
||||||
|
<groupId>com.alibaba.datax</groupId>
|
||||||
|
<version>0.0.1-SNAPSHOT</version>
|
||||||
|
</parent>
|
||||||
|
<modelVersion>4.0.0</modelVersion>
|
||||||
|
|
||||||
|
<artifactId>gdbreader</artifactId>
|
||||||
|
|
||||||
|
<groupId>com.alibaba.datax</groupId>
|
||||||
|
<version>0.0.1-SNAPSHOT</version>
|
||||||
|
|
||||||
|
<dependencies>
|
||||||
|
<dependency>
|
||||||
|
<groupId>com.alibaba.datax</groupId>
|
||||||
|
<artifactId>datax-common</artifactId>
|
||||||
|
<version>${datax-project-version}</version>
|
||||||
|
<exclusions>
|
||||||
|
<exclusion>
|
||||||
|
<artifactId>slf4j-log4j12</artifactId>
|
||||||
|
<groupId>org.slf4j</groupId>
|
||||||
|
</exclusion>
|
||||||
|
</exclusions>
|
||||||
|
</dependency>
|
||||||
|
<dependency>
|
||||||
|
<groupId>com.alibaba.datax</groupId>
|
||||||
|
<artifactId>datax-core</artifactId>
|
||||||
|
<version>${datax-project-version}</version>
|
||||||
|
<scope>test</scope>
|
||||||
|
<exclusions>
|
||||||
|
<exclusion>
|
||||||
|
<artifactId>slf4j-log4j12</artifactId>
|
||||||
|
<groupId>org.slf4j</groupId>
|
||||||
|
</exclusion>
|
||||||
|
</exclusions>
|
||||||
|
</dependency>
|
||||||
|
|
||||||
|
<dependency>
|
||||||
|
<groupId>org.slf4j</groupId>
|
||||||
|
<artifactId>slf4j-api</artifactId>
|
||||||
|
</dependency>
|
||||||
|
<dependency>
|
||||||
|
<groupId>ch.qos.logback</groupId>
|
||||||
|
<artifactId>logback-classic</artifactId>
|
||||||
|
</dependency>
|
||||||
|
|
||||||
|
<dependency>
|
||||||
|
<groupId>org.apache.tinkerpop</groupId>
|
||||||
|
<artifactId>gremlin-driver</artifactId>
|
||||||
|
<version>3.4.1</version>
|
||||||
|
</dependency>
|
||||||
|
<dependency>
|
||||||
|
<groupId>org.projectlombok</groupId>
|
||||||
|
<artifactId>lombok</artifactId>
|
||||||
|
<version>1.18.8</version>
|
||||||
|
</dependency>
|
||||||
|
<dependency>
|
||||||
|
<groupId>org.junit.jupiter</groupId>
|
||||||
|
<artifactId>junit-jupiter-api</artifactId>
|
||||||
|
<version>5.4.0</version>
|
||||||
|
<scope>test</scope>
|
||||||
|
</dependency>
|
||||||
|
<dependency>
|
||||||
|
<groupId>org.junit.jupiter</groupId>
|
||||||
|
<artifactId>junit-jupiter-engine</artifactId>
|
||||||
|
<version>5.4.0</version>
|
||||||
|
<scope>test</scope>
|
||||||
|
</dependency>
|
||||||
|
|
||||||
|
</dependencies>
|
||||||
|
|
||||||
|
<build>
|
||||||
|
<plugins>
|
||||||
|
<!-- compiler plugin -->
|
||||||
|
<plugin>
|
||||||
|
<artifactId>maven-compiler-plugin</artifactId>
|
||||||
|
<configuration>
|
||||||
|
<source>1.6</source>
|
||||||
|
<target>1.6</target>
|
||||||
|
<encoding>${project-sourceEncoding}</encoding>
|
||||||
|
</configuration>
|
||||||
|
</plugin>
|
||||||
|
<!-- test case plugin -->
|
||||||
|
<plugin>
|
||||||
|
<groupId>org.apache.maven.plugins</groupId>
|
||||||
|
<artifactId>maven-surefire-plugin</artifactId>
|
||||||
|
<version>2.22.0</version>
|
||||||
|
<configuration>
|
||||||
|
<includes>
|
||||||
|
<include>**/*Test*.class</include>
|
||||||
|
</includes>
|
||||||
|
</configuration>
|
||||||
|
</plugin>
|
||||||
|
<!-- assembly plugin -->
|
||||||
|
<plugin>
|
||||||
|
<artifactId>maven-assembly-plugin</artifactId>
|
||||||
|
<configuration>
|
||||||
|
<descriptors>
|
||||||
|
<descriptor>src/main/assembly/package.xml</descriptor>
|
||||||
|
</descriptors>
|
||||||
|
<finalName>datax</finalName>
|
||||||
|
</configuration>
|
||||||
|
<executions>
|
||||||
|
<execution>
|
||||||
|
<id>dwzip</id>
|
||||||
|
<phase>package</phase>
|
||||||
|
<goals>
|
||||||
|
<goal>single</goal>
|
||||||
|
</goals>
|
||||||
|
</execution>
|
||||||
|
</executions>
|
||||||
|
</plugin>
|
||||||
|
<plugin>
|
||||||
|
<groupId>org.apache.maven.plugins</groupId>
|
||||||
|
<artifactId>maven-compiler-plugin</artifactId>
|
||||||
|
<configuration>
|
||||||
|
<source>8</source>
|
||||||
|
<target>8</target>
|
||||||
|
</configuration>
|
||||||
|
</plugin>
|
||||||
|
</plugins>
|
||||||
|
</build>
|
||||||
|
</project>
|
gdbreader/src/main/assembly/package.xml (new file, 35 lines)
@@ -0,0 +1,35 @@
|
|||||||
|
<assembly
|
||||||
|
xmlns="http://maven.apache.org/plugins/maven-assembly-plugin/assembly/1.1.0"
|
||||||
|
xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
|
||||||
|
xsi:schemaLocation="http://maven.apache.org/plugins/maven-assembly-plugin/assembly/1.1.0 http://maven.apache.org/xsd/assembly-1.1.0.xsd">
|
||||||
|
<id></id>
|
||||||
|
<formats>
|
||||||
|
<format>dir</format>
|
||||||
|
</formats>
|
||||||
|
<includeBaseDirectory>false</includeBaseDirectory>
|
||||||
|
<fileSets>
|
||||||
|
<fileSet>
|
||||||
|
<directory>src/main/resources</directory>
|
||||||
|
<includes>
|
||||||
|
<include>plugin.json</include>
|
||||||
|
<include>plugin_job_template.json</include>
|
||||||
|
</includes>
|
||||||
|
<outputDirectory>plugin/reader/gdbreader</outputDirectory>
|
||||||
|
</fileSet>
|
||||||
|
<fileSet>
|
||||||
|
<directory>target/</directory>
|
||||||
|
<includes>
|
||||||
|
<include>gdbreader-0.0.1-SNAPSHOT.jar</include>
|
||||||
|
</includes>
|
||||||
|
<outputDirectory>plugin/reader/gdbreader</outputDirectory>
|
||||||
|
</fileSet>
|
||||||
|
</fileSets>
|
||||||
|
|
||||||
|
<dependencySets>
|
||||||
|
<dependencySet>
|
||||||
|
<useProjectArtifact>false</useProjectArtifact>
|
||||||
|
<outputDirectory>plugin/reader/gdbreader/libs</outputDirectory>
|
||||||
|
<scope>runtime</scope>
|
||||||
|
</dependencySet>
|
||||||
|
</dependencySets>
|
||||||
|
</assembly>
|
gdbreader — GdbReader.java (new file, 231 lines)
@@ -0,0 +1,231 @@
package com.alibaba.datax.plugin.reader.gdbreader;
|
||||||
|
|
||||||
|
import com.alibaba.datax.common.element.Record;
|
||||||
|
import com.alibaba.datax.common.exception.DataXException;
|
||||||
|
import com.alibaba.datax.common.plugin.RecordSender;
|
||||||
|
import com.alibaba.datax.common.spi.Reader;
|
||||||
|
import com.alibaba.datax.common.util.Configuration;
|
||||||
|
import com.alibaba.datax.plugin.reader.gdbreader.mapping.DefaultGdbMapper;
|
||||||
|
import com.alibaba.datax.plugin.reader.gdbreader.mapping.MappingRule;
|
||||||
|
import com.alibaba.datax.plugin.reader.gdbreader.mapping.MappingRuleFactory;
|
||||||
|
import com.alibaba.datax.plugin.reader.gdbreader.model.GdbElement;
|
||||||
|
import com.alibaba.datax.plugin.reader.gdbreader.model.GdbGraph;
|
||||||
|
import com.alibaba.datax.plugin.reader.gdbreader.model.ScriptGdbGraph;
|
||||||
|
import com.alibaba.datax.plugin.reader.gdbreader.util.ConfigHelper;
|
||||||
|
import org.apache.tinkerpop.gremlin.driver.ResultSet;
|
||||||
|
import org.slf4j.Logger;
|
||||||
|
import org.slf4j.LoggerFactory;
|
||||||
|
|
||||||
|
import java.util.LinkedList;
|
||||||
|
import java.util.List;
|
||||||
|
|
||||||
|
public class GdbReader extends Reader {
|
||||||
|
private final static int DEFAULT_FETCH_BATCH_SIZE = 200;
|
||||||
|
private static GdbGraph graph;
|
||||||
|
private static Key.ExportType exportType;
|
||||||
|
|
||||||
|
/**
|
||||||
|
* Job 中的方法仅执行一次,Task 中方法会由框架启动多个 Task 线程并行执行。
|
||||||
|
* <p/>
|
||||||
|
* 整个 Reader 执行流程是:
|
||||||
|
* <pre>
|
||||||
|
* Job类init-->prepare-->split
|
||||||
|
*
|
||||||
|
* Task类init-->prepare-->startRead-->post-->destroy
|
||||||
|
* Task类init-->prepare-->startRead-->post-->destroy
|
||||||
|
*
|
||||||
|
* Job类post-->destroy
|
||||||
|
* </pre>
|
||||||
|
*/
|
||||||
|
public static class Job extends Reader.Job {
|
||||||
|
private static final Logger LOG = LoggerFactory.getLogger(Job.class);
|
||||||
|
|
||||||
|
private Configuration jobConfig = null;
|
||||||
|
|
||||||
|
@Override
|
||||||
|
public void init() {
|
||||||
|
this.jobConfig = super.getPluginJobConf();
|
||||||
|
|
||||||
|
/**
|
||||||
|
* 注意:此方法仅执行一次。
|
||||||
|
* 最佳实践:通常在这里对用户的配置进行校验:是否缺失必填项?有无错误值?有没有无关配置项?...
|
||||||
|
* 并给出清晰的报错/警告提示。校验通常建议采用静态工具类进行,以保证本类结构清晰。
|
||||||
|
*/
|
||||||
|
|
||||||
|
ConfigHelper.assertGdbClient(jobConfig);
|
||||||
|
ConfigHelper.assertLabels(jobConfig);
|
||||||
|
try {
|
||||||
|
exportType = Key.ExportType.valueOf(jobConfig.getString(Key.EXPORT_TYPE));
|
||||||
|
} catch (NullPointerException | IllegalArgumentException e) {
|
||||||
|
throw DataXException.asDataXException(GdbReaderErrorCode.BAD_CONFIG_VALUE, Key.EXPORT_TYPE);
|
||||||
|
}
|
||||||
|
}
|
||||||
|
|
||||||
|
@Override
|
||||||
|
public void prepare() {
|
||||||
|
/**
|
||||||
|
* 注意:此方法仅执行一次。
|
||||||
|
* 最佳实践:如果 Job 中有需要进行数据同步之前的处理,可以在此处完成,如果没有必要则可以直接去掉。
|
||||||
|
*/
|
||||||
|
|
||||||
|
try {
|
||||||
|
graph = new ScriptGdbGraph(jobConfig, exportType);
|
||||||
|
} catch (RuntimeException e) {
|
||||||
|
throw DataXException.asDataXException(GdbReaderErrorCode.FAIL_CLIENT_CONNECT, e.getMessage());
|
||||||
|
}
|
||||||
|
}
|
||||||
|
|
||||||
|
@Override
|
||||||
|
public List<Configuration> split(int adviceNumber) {
|
||||||
|
/**
|
||||||
|
* 注意:此方法仅执行一次。
|
||||||
|
* 最佳实践:通常采用工具静态类完成把 Job 配置切分成多个 Task 配置的工作。
|
||||||
|
* 这里的 adviceNumber 是框架根据用户的同步速度的要求建议的切分份数,仅供参考,不是强制必须切分的份数。
|
||||||
|
*/
|
||||||
|
List<String> labels = ConfigHelper.assertLabels(jobConfig);
|
||||||
|
|
            /**
             * 配置label列表为空时,尝试查询GDB中所有label,添加到读取列表
             */
            if (labels.isEmpty()) {
                try {
                    labels.addAll(graph.getLabels().keySet());
                } catch (RuntimeException ex) {
                    throw DataXException.asDataXException(GdbReaderErrorCode.FAIL_FETCH_LABELS, ex.getMessage());
                }
            }

            if (labels.isEmpty()) {
                throw DataXException.asDataXException(GdbReaderErrorCode.FAIL_FETCH_LABELS, "none labels to read");
            }

            return ConfigHelper.splitConfig(jobConfig, labels);
        }

        @Override
        public void post() {
            /**
             * 注意:此方法仅执行一次。
             * 最佳实践:如果 Job 中有需要进行数据同步之后的后续处理,可以在此处完成。
             */
        }

        @Override
        public void destroy() {
            /**
             * 注意:此方法仅执行一次。
             * 最佳实践:通常配合 Job 中的 post() 方法一起完成 Job 的资源释放。
             */
            try {
                graph.close();
            } catch (Exception ex) {
                LOG.error("Failed to close client : {}", ex);
            }
        }

    }

    public static class Task extends Reader.Task {
        private static final Logger LOG = LoggerFactory.getLogger(Task.class);
        private static MappingRule rule;
        private Configuration taskConfig;
        private String fetchLabel = null;

        private int rangeSplitSize;
        private int fetchBatchSize;

        @Override
        public void init() {
            this.taskConfig = super.getPluginJobConf();

            /**
             * 注意:此方法每个 Task 都会执行一次。
             * 最佳实践:此处通过对 taskConfig 配置的读取,进而初始化一些资源为 startRead()做准备。
             */
            fetchLabel = taskConfig.getString(Key.LABEL);
            fetchBatchSize = taskConfig.getInt(Key.FETCH_BATCH_SIZE, DEFAULT_FETCH_BATCH_SIZE);
            rangeSplitSize = taskConfig.getInt(Key.RANGE_SPLIT_SIZE, fetchBatchSize * 10);
            rule = MappingRuleFactory.getInstance().create(taskConfig, exportType);
        }

        @Override
        public void prepare() {
            /**
             * 注意:此方法仅执行一次。
             * 最佳实践:如果 Job 中有需要进行数据同步之后的处理,可以在此处完成,如果没有必要则可以直接去掉。
             */
        }

        @Override
        public void startRead(RecordSender recordSender) {
            /**
             * 注意:此方法每个 Task 都会执行一次。
             * 最佳实践:此处适当封装确保简洁清晰完成数据读取工作。
             */

            String start = "";
            while (true) {
                List<String> ids;
                try {
                    ids = graph.fetchIds(fetchLabel, start, rangeSplitSize);
                    if (ids.isEmpty()) {
                        break;
                    }
                    start = ids.get(ids.size() - 1);
                } catch (Exception ex) {
                    throw DataXException.asDataXException(GdbReaderErrorCode.FAIL_FETCH_IDS, ex.getMessage());
                }

                // send range fetch async
                int count = ids.size();
                List<ResultSet> resultSets = new LinkedList<>();
                for (int pos = 0; pos < count; pos += fetchBatchSize) {
                    int rangeSize = Math.min(fetchBatchSize, count - pos);
                    String endId = ids.get(pos + rangeSize - 1);
                    String beginId = ids.get(pos);

                    List<String> propNames = rule.isHasProperty() ? rule.getPropertyNames() : null;
                    try {
                        resultSets.add(graph.fetchElementsAsync(fetchLabel, beginId, endId, propNames));
                    } catch (Exception ex) {
                        // just print error logs and continues
                        LOG.error("failed to request label: {}, start: {}, end: {}, e: {}", fetchLabel, beginId, endId, ex);
                    }
                }

                // get range fetch dsl results
                resultSets.forEach(results -> {
                    try {
                        List<GdbElement> elements = graph.getElement(results);
                        elements.forEach(element -> {
                            Record record = recordSender.createRecord();
                            DefaultGdbMapper.getMapper(rule).accept(element, record);
                            recordSender.sendToWriter(record);
                        });
                        recordSender.flush();
                    } catch (Exception ex) {
                        LOG.error("failed to send records e {}", ex);
                    }
                });
            }
        }

        @Override
        public void post() {
            /**
             * 注意:此方法每个 Task 都会执行一次。
             * 最佳实践:如果 Task 中有需要进行数据同步之后的后续处理,可以在此处完成。
             */
        }

        @Override
        public void destroy() {
            /**
             * 注意:此方法每个 Task 都会执行一次。
             * 最佳实践:通常配合Task 中的 post() 方法一起完成 Task 的资源释放。
             */
        }

    }

}
@@ -0,0 +1,39 @@
package com.alibaba.datax.plugin.reader.gdbreader;

import com.alibaba.datax.common.spi.ErrorCode;

public enum GdbReaderErrorCode implements ErrorCode {
    /**
     *
     */
    BAD_CONFIG_VALUE("GdbReader-00", "The value you configured is invalid."),
    FAIL_CLIENT_CONNECT("GdbReader-02", "GDB connection is abnormal."),
    UNSUPPORTED_TYPE("GdbReader-03", "Unsupported data type conversion."),
    FAIL_FETCH_LABELS("GdbReader-04", "Error pulling all labels, it is recommended to configure the specified label pull."),
    FAIL_FETCH_IDS("GdbReader-05", "Pull range id error."),
    ;

    private final String code;
    private final String description;

    private GdbReaderErrorCode(String code, String description) {
        this.code = code;
        this.description = description;
    }

    @Override
    public String getCode() {
        return this.code;
    }

    @Override
    public String getDescription() {
        return this.description;
    }

    @Override
    public String toString() {
        return String.format("Code:[%s], Description:[%s]. ", this.code,
                this.description);
    }
}
@@ -0,0 +1,86 @@
package com.alibaba.datax.plugin.reader.gdbreader;

public final class Key {

    /**
     * 此处声明插件用到的需要插件使用者提供的配置项
     */
    public final static String HOST = "host";
    public final static String PORT = "port";
    public final static String USERNAME = "username";
    public static final String PASSWORD = "password";

    public static final String LABEL = "labels";
    public static final String EXPORT_TYPE = "labelType";

    public static final String RANGE_SPLIT_SIZE = "RangeSplitSize";
    public static final String FETCH_BATCH_SIZE = "fetchBatchSize";

    public static final String COLUMN = "column";
    public static final String COLUMN_NAME = "name";
    public static final String COLUMN_TYPE = "type";
    public static final String COLUMN_NODE_TYPE = "columnType";

    public enum ExportType {
        /**
         * Import vertices
         */
        VERTEX,
        /**
         * Import edges
         */
        EDGE
    }

    public enum ColumnType {
        /**
         * vertex or edge id
         */
        primaryKey,

        /**
         * vertex or edge label
         */
        primaryLabel,

        /**
         * vertex property
         */
        vertexProperty,

        /**
         * collects all vertex property to Json list
         */
        vertexJsonProperty,

        /**
         * start vertex id of edge
         */
        srcPrimaryKey,

        /**
         * start vertex label of edge
         */
        srcPrimaryLabel,

        /**
         * end vertex id of edge
         */
        dstPrimaryKey,

        /**
         * end vertex label of edge
         */
        dstPrimaryLabel,

        /**
         * edge property
         */
        edgeProperty,

        /**
         * collects all edge property to Json list
         */
        edgeJsonProperty,
    }
}
@@ -0,0 +1,150 @@
/*
 * (C) 2019-present Alibaba Group Holding Limited.
 *
 * This program is free software; you can redistribute it and/or modify
 * it under the terms of the GNU General Public License version 2 as
 * published by the Free Software Foundation.
 */
package com.alibaba.datax.plugin.reader.gdbreader.mapping;

import com.alibaba.datax.common.element.Record;
import com.alibaba.datax.plugin.reader.gdbreader.model.GdbElement;
import org.apache.tinkerpop.gremlin.structure.util.reference.ReferenceProperty;
import org.apache.tinkerpop.gremlin.structure.util.reference.ReferenceVertexProperty;

import java.util.List;
import java.util.Map;
import java.util.function.BiConsumer;
import java.util.function.Function;
import java.util.stream.Collectors;

/**
 * @author : Liu Jianping
 * @date : 2019/9/6
 */

public class DefaultGdbMapper {

    public static BiConsumer<GdbElement, Record> getMapper(MappingRule rule) {
        return (gdbElement, record) -> rule.getColumns().forEach(columnMappingRule -> {
            Object value = null;
            ValueType type = columnMappingRule.getValueType();
            String name = columnMappingRule.getName();
            Map<String, Object> props = gdbElement.getProperties();

            switch (columnMappingRule.getColumnType()) {
                case dstPrimaryKey:
                    value = gdbElement.getTo();
                    break;
                case srcPrimaryKey:
                    value = gdbElement.getFrom();
                    break;
                case primaryKey:
                    value = gdbElement.getId();
                    break;
                case primaryLabel:
                    value = gdbElement.getLabel();
                    break;
                case dstPrimaryLabel:
                    value = gdbElement.getToLabel();
                    break;
                case srcPrimaryLabel:
                    value = gdbElement.getFromLabel();
                    break;
                case vertexProperty:
                    value = forVertexOnePropertyValue().apply(props.get(name));
                    break;
                case edgeProperty:
                    value = forEdgePropertyValue().apply(props.get(name));
                    break;
                case edgeJsonProperty:
                    value = forEdgeJsonProperties().apply(props);
                    break;
                case vertexJsonProperty:
                    value = forVertexJsonProperties().apply(props);
                    break;
                default:
                    break;
            }
            record.addColumn(type.applyObject(value));
        });
    }

    /**
     * parser ReferenceProperty value for edge
     *
     * @return property value
     */
    private static Function<Object, Object> forEdgePropertyValue() {
        return prop -> {
            if (prop instanceof ReferenceProperty) {
                return ((ReferenceProperty) prop).value();
            }
            return null;
        };
    }

    /**
     * parser ReferenceVertexProperty value for vertex
     *
     * @return the first property value in list
     */
    private static Function<Object, Object> forVertexOnePropertyValue() {
        return props -> {
            if (props instanceof List<?>) {
                // get the first one property if more than one
                Object o = ((List) props).get(0);
                if (o instanceof ReferenceVertexProperty) {
                    return ((ReferenceVertexProperty) o).value();
                }
            }
            return null;
        };
    }

    /**
     * parser all edge properties to json string
     *
     * @return json string
     */
    private static Function<Map<String, Object>, String> forEdgeJsonProperties() {
        return props -> "{\"properties\":[" +
                props.entrySet().stream().filter(p -> p.getValue() instanceof ReferenceProperty)
                        .map(p -> "{\"k\":\"" + ((ReferenceProperty) p.getValue()).key() + "\"," +
                                "\"t\":\"" + ((ReferenceProperty) p.getValue()).value().getClass().getSimpleName().toLowerCase() + "\"," +
                                "\"v\":\"" + String.valueOf(((ReferenceProperty) p.getValue()).value()) + "\"}")
                        .collect(Collectors.joining(",")) +
                "]}";
    }

    /**
     * parser all vertex properties to json string, include set-property
     *
     * @return json string
     */
    private static Function<Map<String, Object>, String> forVertexJsonProperties() {
        return props -> "{\"properties\":[" +
                props.entrySet().stream().filter(p -> p.getValue() instanceof List<?>)
                        .map(p -> forVertexPropertyStr().apply((List<?>) p.getValue()))
                        .collect(Collectors.joining(",")) +
                "]}";
    }

    /**
     * parser one vertex property to json string item, set 'cardinality'
     *
     * @return json string item
     */
    private static Function<List<?>, String> forVertexPropertyStr() {
        return vp -> {
            final String setFlag = vp.size() > 1 ? "set" : "single";
            return vp.stream().filter(p -> p instanceof ReferenceVertexProperty)
                    .map(p -> "{\"k\":\"" + ((ReferenceVertexProperty) p).key() + "\"," +
                            "\"t\":\"" + ((ReferenceVertexProperty) p).value().getClass().getSimpleName().toLowerCase() + "\"," +
                            "\"v\":\"" + String.valueOf(((ReferenceVertexProperty) p).value()) + "\"," +
                            "\"c\":\"" + setFlag + "\"}")
                    .collect(Collectors.joining(","));
        };
    }
}
@@ -0,0 +1,79 @@
/*
 * (C) 2019-present Alibaba Group Holding Limited.
 *
 * This program is free software; you can redistribute it and/or modify
 * it under the terms of the GNU General Public License version 2 as
 * published by the Free Software Foundation.
 */
package com.alibaba.datax.plugin.reader.gdbreader.mapping;

import com.alibaba.datax.common.exception.DataXException;
import com.alibaba.datax.plugin.reader.gdbreader.GdbReaderErrorCode;
import com.alibaba.datax.plugin.reader.gdbreader.Key.ColumnType;
import com.alibaba.datax.plugin.reader.gdbreader.Key.ExportType;
import lombok.Data;

import java.util.ArrayList;
import java.util.List;

/**
 * @author : Liu Jianping
 * @date : 2019/9/6
 */

@Data
public class MappingRule {
    private boolean hasRelation = false;
    private boolean hasProperty = false;
    private ExportType type = ExportType.VERTEX;

    /**
     * property names for property key-value
     */
    private List<String> propertyNames = new ArrayList<>();

    private List<ColumnMappingRule> columns = new ArrayList<>();

    void addColumn(ColumnType columnType, ValueType type, String name) {
        ColumnMappingRule rule = new ColumnMappingRule();
        rule.setColumnType(columnType);
        rule.setName(name);
        rule.setValueType(type);

        if (columnType == ColumnType.vertexProperty || columnType == ColumnType.edgeProperty) {
            propertyNames.add(name);
            hasProperty = true;
        }

        boolean hasTo = columnType == ColumnType.dstPrimaryKey || columnType == ColumnType.dstPrimaryLabel;
        boolean hasFrom = columnType == ColumnType.srcPrimaryKey || columnType == ColumnType.srcPrimaryLabel;
        if (hasTo || hasFrom) {
            hasRelation = true;
        }

        columns.add(rule);
    }

    void addJsonColumn(ColumnType columnType) {
        ColumnMappingRule rule = new ColumnMappingRule();
        rule.setColumnType(columnType);
        rule.setName("json");
        rule.setValueType(ValueType.STRING);

        if (!propertyNames.isEmpty()) {
            throw DataXException.asDataXException(GdbReaderErrorCode.BAD_CONFIG_VALUE, "JsonProperties should be only property");
        }

        columns.add(rule);
        hasProperty = true;
    }

    @Data
    protected static class ColumnMappingRule {
        private String name = null;

        private ValueType valueType = null;

        private ColumnType columnType = null;
    }
}
@@ -0,0 +1,76 @@
/*
 * (C) 2019-present Alibaba Group Holding Limited.
 *
 * This program is free software; you can redistribute it and/or modify
 * it under the terms of the GNU General Public License version 2 as
 * published by the Free Software Foundation.
 */
package com.alibaba.datax.plugin.reader.gdbreader.mapping;

import com.alibaba.datax.common.exception.DataXException;
import com.alibaba.datax.common.util.Configuration;
import com.alibaba.datax.plugin.reader.gdbreader.GdbReaderErrorCode;
import com.alibaba.datax.plugin.reader.gdbreader.Key;
import com.alibaba.datax.plugin.reader.gdbreader.Key.ColumnType;
import com.alibaba.datax.plugin.reader.gdbreader.Key.ExportType;
import com.alibaba.datax.plugin.reader.gdbreader.util.ConfigHelper;

import java.util.List;

/**
 * @author : Liu Jianping
 * @date : 2019/9/20
 */

public class MappingRuleFactory {
    private static final MappingRuleFactory instance = new MappingRuleFactory();

    public static MappingRuleFactory getInstance() {
        return instance;
    }

    public MappingRule create(Configuration config, ExportType exportType) {
        MappingRule rule = new MappingRule();

        rule.setType(exportType);
        List<Configuration> configurationList = config.getListConfiguration(Key.COLUMN);
        for (Configuration column : configurationList) {
            ColumnType columnType;
            try {
                columnType = ColumnType.valueOf(column.getString(Key.COLUMN_NODE_TYPE));
            } catch (NullPointerException | IllegalArgumentException e) {
                throw DataXException.asDataXException(GdbReaderErrorCode.BAD_CONFIG_VALUE, Key.COLUMN_NODE_TYPE);
            }

            if (exportType == ExportType.VERTEX) {
                // only id/label/property column allow when vertex
                ConfigHelper.assertConfig(Key.COLUMN_NODE_TYPE, () ->
                        columnType == ColumnType.primaryKey || columnType == ColumnType.primaryLabel
                                || columnType == ColumnType.vertexProperty || columnType == ColumnType.vertexJsonProperty);
            } else if (exportType == ExportType.EDGE) {
                // edge
                ConfigHelper.assertConfig(Key.COLUMN_NODE_TYPE, () ->
                        columnType == ColumnType.primaryKey || columnType == ColumnType.primaryLabel
                                || columnType == ColumnType.srcPrimaryKey || columnType == ColumnType.srcPrimaryLabel
                                || columnType == ColumnType.dstPrimaryKey || columnType == ColumnType.dstPrimaryLabel
                                || columnType == ColumnType.edgeProperty || columnType == ColumnType.edgeJsonProperty);
            }

            if (columnType == ColumnType.edgeProperty || columnType == ColumnType.vertexProperty) {
                String name = column.getString(Key.COLUMN_NAME);
                ValueType propType = ValueType.fromShortName(column.getString(Key.COLUMN_TYPE));

                ConfigHelper.assertConfig(Key.COLUMN_NAME, () -> name != null);
                if (propType == null) {
                    throw DataXException.asDataXException(GdbReaderErrorCode.UNSUPPORTED_TYPE, Key.COLUMN_TYPE);
                }
                rule.addColumn(columnType, propType, name);
            } else if (columnType == ColumnType.vertexJsonProperty || columnType == ColumnType.edgeJsonProperty) {
                rule.addJsonColumn(columnType);
            } else {
                rule.addColumn(columnType, ValueType.STRING, null);
            }
        }
        return rule;
    }
}
@@ -0,0 +1,128 @@
/*
 * (C) 2019-present Alibaba Group Holding Limited.
 *
 * This program is free software; you can redistribute it and/or modify
 * it under the terms of the GNU General Public License version 2 as
 * published by the Free Software Foundation.
 */
package com.alibaba.datax.plugin.reader.gdbreader.mapping;

import com.alibaba.datax.common.element.BoolColumn;
import com.alibaba.datax.common.element.Column;
import com.alibaba.datax.common.element.DoubleColumn;
import com.alibaba.datax.common.element.LongColumn;
import com.alibaba.datax.common.element.StringColumn;

import java.util.HashMap;
import java.util.Map;
import java.util.function.Function;

/**
 * @author : Liu Jianping
 * @date : 2019/9/6
 */

public enum ValueType {
    /**
     * transfer gdb element object value to DataX Column data
     * <p>
     * int, long -> LongColumn
     * float, double -> DoubleColumn
     * bool -> BooleanColumn
     * string -> StringColumn
     */
    INT(Integer.class, "int", ValueTypeHolder::longColumnMapper),
    INTEGER(Integer.class, "integer", ValueTypeHolder::longColumnMapper),
    LONG(Long.class, "long", ValueTypeHolder::longColumnMapper),
    DOUBLE(Double.class, "double", ValueTypeHolder::doubleColumnMapper),
    FLOAT(Float.class, "float", ValueTypeHolder::doubleColumnMapper),
    BOOLEAN(Boolean.class, "boolean", ValueTypeHolder::boolColumnMapper),
    STRING(String.class, "string", ValueTypeHolder::stringColumnMapper),
    ;

    private Class<?> type = null;
    private String shortName = null;
    private Function<Object, Column> columnFunc = null;

    ValueType(Class<?> type, String name, Function<Object, Column> columnFunc) {
        this.type = type;
        this.shortName = name;
        this.columnFunc = columnFunc;

        ValueTypeHolder.shortName2type.put(shortName, this);
    }

    public static ValueType fromShortName(String name) {
        return ValueTypeHolder.shortName2type.get(name);
    }

    public Column applyObject(Object value) {
        if (value == null) {
            return null;
        }
        return columnFunc.apply(value);
    }

    private static class ValueTypeHolder {
        private static Map<String, ValueType> shortName2type = new HashMap<>();

        private static LongColumn longColumnMapper(Object o) {
            long v;
            if (o instanceof Integer) {
                v = (int) o;
            } else if (o instanceof Long) {
                v = (long) o;
            } else if (o instanceof String) {
                v = Long.valueOf((String) o);
            } else {
                throw new RuntimeException("Failed to cast " + o.getClass() + " to Long");
            }

            return new LongColumn(v);
        }

        private static DoubleColumn doubleColumnMapper(Object o) {
            double v;
            if (o instanceof Integer) {
                v = (double) (int) o;
            } else if (o instanceof Long) {
                v = (double) (long) o;
            } else if (o instanceof Float) {
                v = (double) (float) o;
            } else if (o instanceof Double) {
                v = (double) o;
            } else if (o instanceof String) {
                v = Double.valueOf((String) o);
            } else {
                throw new RuntimeException("Failed to cast " + o.getClass() + " to Double");
            }

            return new DoubleColumn(v);
        }

        private static BoolColumn boolColumnMapper(Object o) {
            boolean v;
            if (o instanceof Integer) {
                v = ((int) o != 0);
            } else if (o instanceof Long) {
                v = ((long) o != 0);
            } else if (o instanceof Boolean) {
                v = (boolean) o;
            } else if (o instanceof String) {
                v = Boolean.valueOf((String) o);
            } else {
                throw new RuntimeException("Failed to cast " + o.getClass() + " to Boolean");
            }

            return new BoolColumn(v);
        }

        private static StringColumn stringColumnMapper(Object o) {
            if (o instanceof String) {
                return new StringColumn((String) o);
            } else {
                return new StringColumn(String.valueOf(o));
            }
        }
    }
}
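下面是一个最小示意(假设:`20`、`tom` 均为示例值,仅演示该枚举的典型用法):按 column 配置里的 type 短名取到对应的 ValueType,再把 GDB 返回的原始属性值转换成 DataX 的 Column。

```java
import com.alibaba.datax.common.element.Column;
import com.alibaba.datax.plugin.reader.gdbreader.mapping.ValueType;

public class ValueTypeDemo {
    public static void main(String[] args) {
        // "long"短名对应LONG,longColumnMapper会把Integer/Long/String统一转成LongColumn
        Column age = ValueType.fromShortName("long").applyObject(20);
        // "string"短名对应STRING,任意对象都会经String.valueOf包装成StringColumn
        Column name = ValueType.fromShortName("string").applyObject("tom");
        System.out.println(age.asLong() + ", " + name.asString());
    }
}
```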
@@ -0,0 +1,89 @@
/*
 * (C) 2019-present Alibaba Group Holding Limited.
 *
 * This program is free software; you can redistribute it and/or modify
 * it under the terms of the GNU General Public License version 2 as
 * published by the Free Software Foundation.
 */
package com.alibaba.datax.plugin.reader.gdbreader.model;

import com.alibaba.datax.common.util.Configuration;
import com.alibaba.datax.plugin.reader.gdbreader.Key;
import org.apache.tinkerpop.gremlin.driver.Client;
import org.apache.tinkerpop.gremlin.driver.Cluster;
import org.apache.tinkerpop.gremlin.driver.RequestOptions;
import org.apache.tinkerpop.gremlin.driver.Result;
import org.apache.tinkerpop.gremlin.driver.ResultSet;
import org.apache.tinkerpop.gremlin.driver.ser.Serializers;
import org.slf4j.Logger;
import org.slf4j.LoggerFactory;

import java.util.List;
import java.util.Map;
import java.util.concurrent.TimeUnit;

/**
 * @author : Liu Jianping
 * @date : 2019/9/6
 */

public abstract class AbstractGdbGraph implements GdbGraph {
    final static int DEFAULT_TIMEOUT = 30000;
    private static final Logger log = LoggerFactory.getLogger(AbstractGdbGraph.class);
    private Client client;

    AbstractGdbGraph() {
    }

    AbstractGdbGraph(Configuration config) {
        log.info("init graphdb client");
        String host = config.getString(Key.HOST);
        int port = config.getInt(Key.PORT);
        String username = config.getString(Key.USERNAME);
        String password = config.getString(Key.PASSWORD);

        try {
            Cluster cluster = Cluster.build(host).port(port).credentials(username, password)
                    .serializer(Serializers.GRAPHBINARY_V1D0)
                    .maxContentLength(1024 * 1024)
                    .resultIterationBatchSize(64)
                    .create();
            client = cluster.connect().init();

            warmClient();
        } catch (RuntimeException e) {
            log.error("Failed to connect to GDB {}:{}, due to {}", host, port, e);
            throw e;
        }
    }

    protected List<Result> runInternal(String dsl, Map<String, Object> params) throws Exception {
        return runInternalAsync(dsl, params).all().get(DEFAULT_TIMEOUT + 1000, TimeUnit.MILLISECONDS);
    }

    protected ResultSet runInternalAsync(String dsl, Map<String, Object> params) throws Exception {
        RequestOptions.Builder options = RequestOptions.build().timeout(DEFAULT_TIMEOUT);
        if (params != null && !params.isEmpty()) {
            params.forEach(options::addParameter);
        }
        return client.submitAsync(dsl, options.create()).get(DEFAULT_TIMEOUT, TimeUnit.MILLISECONDS);
    }

    private void warmClient() {
        try {
            runInternal("g.V('test')", null);
            log.info("warm graphdb client over");
        } catch (Exception e) {
            log.error("warmClient error");
            throw new RuntimeException(e);
        }
    }

    @Override
    public void close() throws Exception {
        if (client != null) {
            log.info("close graphdb client");
            client.close();
        }
    }
}
@@ -0,0 +1,39 @@
/*
 * (C) 2019-present Alibaba Group Holding Limited.
 *
 * This program is free software; you can redistribute it and/or modify
 * it under the terms of the GNU General Public License version 2 as
 * published by the Free Software Foundation.
 */
package com.alibaba.datax.plugin.reader.gdbreader.model;

import lombok.Data;

import java.util.HashMap;
import java.util.Map;

/**
 * @author : Liu Jianping
 * @date : 2019/9/6
 */

@Data
public class GdbElement {
    String id = null;
    String label = null;
    String to = null;
    String from = null;
    String toLabel = null;
    String fromLabel = null;

    Map<String, Object> properties = new HashMap<>();

    public GdbElement() {
    }

    public GdbElement(String id, String label) {
        this.id = id;
        this.label = label;
    }

}
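GdbElement 只是点/边数据的简单载体,下面是一个最小示意(假设:`v1`、`person`、`name=tom` 均为示例值):

```java
import com.alibaba.datax.plugin.reader.gdbreader.model.GdbElement;

public class GdbElementDemo {
    public static void main(String[] args) {
        // lombok的@Data自动生成getter/setter;(id, label)构造函数见上
        GdbElement element = new GdbElement("v1", "person");
        element.getProperties().put("name", "tom");
        System.out.println(element.getId() + " " + element.getLabel() + " " + element.getProperties());
    }
}
```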
@@ -0,0 +1,65 @@
/*
 * (C) 2019-present Alibaba Group Holding Limited.
 *
 * This program is free software; you can redistribute it and/or modify
 * it under the terms of the GNU General Public License version 2 as
 * published by the Free Software Foundation.
 */
package com.alibaba.datax.plugin.reader.gdbreader.model;

import org.apache.tinkerpop.gremlin.driver.ResultSet;

import java.util.List;
import java.util.Map;

/**
 * @author : Liu Jianping
 * @date : 2019/9/6
 */

public interface GdbGraph extends AutoCloseable {

    /**
     * Get All labels of GraphDB
     *
     * @return labels map included numbers
     */
    Map<String, Long> getLabels();

    /**
     * Get the Ids list of special 'label', size up to 'limit'
     *
     * @param label is Label of Vertex or Edge
     * @param start of Ids range to get
     * @param limit size of Ids list
     * @return Ids list
     */
    List<String> fetchIds(String label, String start, long limit);

    /**
     * Fetch element in async mode, just send query dsl to server
     *
     * @param label node label to filter
     * @param start range begin(included)
     * @param end range end(included)
     * @param propNames propKey list to fetch
     * @return future to get result later
     */
    ResultSet fetchElementsAsync(String label, String start, String end, List<String> propNames);

    /**
     * Get element from Response {@link ResultSet}
     *
     * @param results Response of Server
     * @return element sets
     */
    List<GdbElement> getElement(ResultSet results);

    /**
     * close graph client
     *
     * @throws Exception if fails
     */
    @Override
    void close() throws Exception;
}
@@ -0,0 +1,192 @@
/*
 * (C) 2019-present Alibaba Group Holding Limited.
 *
 * This program is free software; you can redistribute it and/or modify
 * it under the terms of the GNU General Public License version 2 as
 * published by the Free Software Foundation.
 */
package com.alibaba.datax.plugin.reader.gdbreader.model;

import com.alibaba.datax.common.util.Configuration;
import com.alibaba.datax.plugin.reader.gdbreader.Key.ExportType;
import org.apache.tinkerpop.gremlin.driver.Result;
import org.apache.tinkerpop.gremlin.driver.ResultSet;
import org.apache.tinkerpop.gremlin.structure.util.reference.ReferenceEdge;
import org.apache.tinkerpop.gremlin.structure.util.reference.ReferenceVertex;
import org.slf4j.Logger;
import org.slf4j.LoggerFactory;

import java.util.ArrayList;
import java.util.HashMap;
import java.util.LinkedList;
import java.util.List;
import java.util.Map;
import java.util.concurrent.TimeUnit;

/**
 * @author : Liu Jianping
 * @date : 2019/9/6
 */

public class ScriptGdbGraph extends AbstractGdbGraph {
    private static final Logger log = LoggerFactory.getLogger(ScriptGdbGraph.class);

    private final static String LABEL = "GDB___LABEL";
    private final static String START_ID = "GDB___ID";
    private final static String END_ID = "GDB___ID_END";
    private final static String LIMIT = "GDB___LIMIT";

    private final static String FETCH_VERTEX_IDS_DSL = "g.V().hasLabel(" + LABEL + ").has(id, gt(" + START_ID + ")).limit(" + LIMIT + ").id()";
    private final static String FETCH_EDGE_IDS_DSL = "g.E().hasLabel(" + LABEL + ").has(id, gt(" + START_ID + ")).limit(" + LIMIT + ").id()";

    private final static String FETCH_VERTEX_LABELS_DSL = "g.V().groupCount().by(label)";
    private final static String FETCH_EDGE_LABELS_DSL = "g.E().groupCount().by(label)";

    /**
     * fetch node range [START_ID, END_ID]
     */
    private final static String FETCH_RANGE_VERTEX_DSL = "g.V().hasLabel(" + LABEL + ").has(id, gte(" + START_ID + ")).has(id, lte(" + END_ID + "))";
    private final static String FETCH_RANGE_EDGE_DSL = "g.E().hasLabel(" + LABEL + ").has(id, gte(" + START_ID + ")).has(id, lte(" + END_ID + "))";
    private final static String PART_WITH_PROP_DSL = ".as('a').project('node', 'props').by(select('a')).by(select('a').propertyMap(";

    private final ExportType exportType;

    public ScriptGdbGraph(ExportType exportType) {
        super();
        this.exportType = exportType;
    }

    public ScriptGdbGraph(Configuration config, ExportType exportType) {
        super(config);
        this.exportType = exportType;
    }

    @Override
    public List<String> fetchIds(final String label, final String start, long limit) {
        Map<String, Object> params = new HashMap<String, Object>(3) {{
            put(LABEL, label);
            put(START_ID, start);
            put(LIMIT, limit);
        }};
        String fetchDsl = exportType == ExportType.VERTEX ? FETCH_VERTEX_IDS_DSL : FETCH_EDGE_IDS_DSL;

        List<String> ids = new ArrayList<>();
        try {
            List<Result> results = runInternal(fetchDsl, params);

            // transfer result to id string
            results.forEach(id -> ids.add(id.getString()));
        } catch (Exception e) {
            log.error("fetch range node failed, label {}, start {}", label, start);
            throw new RuntimeException(e);
        }
        return ids;
    }

    @Override
    public ResultSet fetchElementsAsync(final String label, final String start, final String end, final List<String> propNames) {
        Map<String, Object> params = new HashMap<>(3);
        params.put(LABEL, label);
        params.put(START_ID, start);
        params.put(END_ID, end);

        String prefixDsl = exportType == ExportType.VERTEX ? FETCH_RANGE_VERTEX_DSL : FETCH_RANGE_EDGE_DSL;
        StringBuilder fetchDsl = new StringBuilder(prefixDsl);
        if (propNames != null) {
            fetchDsl.append(PART_WITH_PROP_DSL);
            for (int i = 0; i < propNames.size(); i++) {
                String propName = "GDB___PK" + String.valueOf(i);
                params.put(propName, propNames.get(i));

                fetchDsl.append(propName);
                if (i != propNames.size() - 1) {
                    fetchDsl.append(", ");
                }
            }
            fetchDsl.append("))");
        }

        try {
            return runInternalAsync(fetchDsl.toString(), params);
        } catch (Exception e) {
            log.error("Failed to fetch range node startId {}, end {} , e {}", start, end, e);
            throw new RuntimeException(e);
        }
    }

    @Override
    @SuppressWarnings("unchecked")
    public List<GdbElement> getElement(ResultSet results) {
        List<GdbElement> elements = new LinkedList<>();
        try {
            List<Result> resultList = results.all().get(DEFAULT_TIMEOUT + 1000, TimeUnit.MILLISECONDS);

            resultList.forEach(n -> {
                Object o = n.getObject();
                GdbElement element = new GdbElement();
                if (o instanceof Map) {
                    // project response
                    Object node = ((Map) o).get("node");
                    Object props = ((Map) o).get("props");

                    mapNodeToElement(node, element);
                    mapPropToElement((Map<String, Object>) props, element);
                } else {
                    // range node response
                    mapNodeToElement(n.getObject(), element);
                }
                if (element.getId() != null) {
                    elements.add(element);
                }
            });
        } catch (Exception e) {
            log.error("Failed to get node: {}", e);
            throw new RuntimeException(e);
        }
        return elements;
    }

    private void mapNodeToElement(Object node, GdbElement element) {
        if (node instanceof ReferenceVertex) {
            ReferenceVertex v = (ReferenceVertex) node;

            element.setId((String) v.id());
            element.setLabel(v.label());
        } else if (node instanceof ReferenceEdge) {
            ReferenceEdge e = (ReferenceEdge) node;

            element.setId((String) e.id());
            element.setLabel(e.label());
            element.setTo((String) e.inVertex().id());
            element.setToLabel(e.inVertex().label());
            element.setFrom((String) e.outVertex().id());
            element.setFromLabel(e.outVertex().label());
        }
    }

    private void mapPropToElement(Map<String, Object> props, GdbElement element) {
        element.setProperties(props);
    }

    @Override
    public Map<String, Long> getLabels() {
        String dsl = exportType == ExportType.VERTEX ? FETCH_VERTEX_LABELS_DSL : FETCH_EDGE_LABELS_DSL;

        try {
            List<Result> results = runInternal(dsl, null);
            Map<String, Long> labelMap = new HashMap<>(2);

            Map<?, ?> labels = results.get(0).get(Map.class);
            labels.forEach((k, v) -> {
                String label = (String) k;
                Long count = (Long) v;
                labelMap.put(label, count);
            });

            return labelMap;
        } catch (Exception e) {
            log.error("Failed to fetch label list, please give special labels and run again, e {}", e);
            throw new RuntimeException(e);
        }
    }
}
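下面是一个最小调用示意(假设:conf.json 中已配置 host/port/username/password,标签 "person" 为示例值):fetchIds 会把 FETCH_VERTEX_IDS_DSL 中的三个占位符通过参数绑定为标签、起始 id 与上限,再同步取回 id 列表。

```java
import com.alibaba.datax.common.util.Configuration;
import com.alibaba.datax.plugin.reader.gdbreader.Key.ExportType;
import com.alibaba.datax.plugin.reader.gdbreader.model.GdbGraph;
import com.alibaba.datax.plugin.reader.gdbreader.model.ScriptGdbGraph;

import java.util.List;

public class FetchIdsDemo {
    public static void main(String[] args) throws Exception {
        Configuration config = Configuration.from(new java.io.File("conf.json"));
        try (GdbGraph graph = new ScriptGdbGraph(config, ExportType.VERTEX)) {
            // 提交的DSL:g.V().hasLabel(GDB___LABEL).has(id, gt(GDB___ID)).limit(GDB___LIMIT).id()
            // 绑定参数依次为标签"person"、起始id ""(从头开始)与上限1024
            List<String> ids = graph.fetchIds("person", "", 1024);
            ids.forEach(System.out::println);
        }
    }
}
```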
@@ -0,0 +1,77 @@
/*
 * (C) 2019-present Alibaba Group Holding Limited.
 *
 * This program is free software; you can redistribute it and/or modify
 * it under the terms of the GNU General Public License version 2 as
 * published by the Free Software Foundation.
 */
package com.alibaba.datax.plugin.reader.gdbreader.util;

import com.alibaba.datax.common.exception.DataXException;
import com.alibaba.datax.common.util.Configuration;
import com.alibaba.datax.plugin.reader.gdbreader.GdbReaderErrorCode;
import com.alibaba.datax.plugin.reader.gdbreader.Key;
import org.apache.commons.lang3.StringUtils;

import java.io.IOException;
import java.io.InputStream;
import java.util.ArrayList;
import java.util.List;
import java.util.function.Supplier;

/**
 * @author : Liu Jianping
 * @date : 2019/9/6
 */

public interface ConfigHelper {
    static void assertConfig(String key, Supplier<Boolean> f) {
        if (!f.get()) {
            throw DataXException.asDataXException(GdbReaderErrorCode.BAD_CONFIG_VALUE, key);
        }
    }

    static void assertHasContent(Configuration config, String key) {
        assertConfig(key, () -> StringUtils.isNotBlank(config.getString(key)));
    }

    static void assertGdbClient(Configuration config) {
        assertHasContent(config, Key.HOST);
        assertConfig(Key.PORT, () -> config.getInt(Key.PORT) > 0);

        assertHasContent(config, Key.USERNAME);
        assertHasContent(config, Key.PASSWORD);
    }

    static List<String> assertLabels(Configuration config) {
        Object labels = config.get(Key.LABEL);
        if (!(labels instanceof List)) {
            throw DataXException.asDataXException(GdbReaderErrorCode.BAD_CONFIG_VALUE, "labels should be List");
        }

        List<?> list = (List<?>) labels;
        List<String> configLabels = new ArrayList<>(0);
        list.forEach(n -> configLabels.add(String.valueOf(n)));

        return configLabels;
    }

    static List<Configuration> splitConfig(Configuration config, List<String> labels) {
        List<Configuration> configs = new ArrayList<>();
        for (String label : labels) {
            Configuration conf = config.clone();
            conf.set(Key.LABEL, label);

            configs.add(conf);
        }
        return configs;
    }

    static Configuration fromClasspath(String name) {
        try (InputStream is = Thread.currentThread().getContextClassLoader().getResourceAsStream(name)) {
            return Configuration.from(is);
        } catch (IOException e) {
            throw new IllegalArgumentException("File not found: " + name);
        }
    }
}
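一个最小示意(假设:labels 取值为示例值):splitConfig 为每个 label 克隆一份配置并把 "labels" 改写成单个标签,Job 再据此切分出对应数量的读取 Task。

```java
import com.alibaba.datax.common.util.Configuration;
import com.alibaba.datax.plugin.reader.gdbreader.util.ConfigHelper;

import java.util.Arrays;
import java.util.List;

public class SplitConfigDemo {
    public static void main(String[] args) {
        Configuration job = Configuration.from("{\"labels\": [\"person\", \"knows\"]}");
        // 每个label对应一份独立的Task配置,其中"labels"字段被改写为单个标签
        List<Configuration> tasks = ConfigHelper.splitConfig(job, Arrays.asList("person", "knows"));
        tasks.forEach(c -> System.out.println(c.getString("labels")));
    }
}
```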
6	gdbreader/src/main/resources/plugin.json	Normal file
@@ -0,0 +1,6 @@
{
    "name": "gdbreader",
    "class": "com.alibaba.datax.plugin.reader.gdbreader.GdbReader",
    "description": "useScene: prod. mechanism: connect GDB with gremlin-client, execute 'g.V().propertyMap() or g.E().propertyMap()' to get record",
    "developer": "alibaba"
}
77	gdbreader/src/main/resources/plugin_job_template.json	Normal file
@@ -0,0 +1,77 @@
{
    "job": {
        "setting": {
            "speed": {
                "channel": 1
            },
            "errorLimit": {
                "record": 1
            }
        },

        "content": [
            {
                "reader": {
                    "name": "gdbreader",
                    "parameter": {
                        "host": "10.218.145.24",
                        "port": 8182,
                        "username": "***",
                        "password": "***",
                        "labelType": "EDGE",
                        "labels": ["label1", "label2"],
                        "column": [
                            {
                                "name": "id",
                                "type": "string",
                                "columnType": "primaryKey"
                            },
                            {
                                "name": "label",
                                "type": "string",
                                "columnType": "primaryLabel"
                            },
                            {
                                "name": "srcId",
                                "type": "string",
                                "columnType": "srcPrimaryKey"
                            },
                            {
                                "name": "srcLabel",
                                "type": "string",
                                "columnType": "srcPrimaryLabel"
                            },
                            {
                                "name": "dstId",
                                "type": "string",
                                "columnType": "dstPrimaryKey"
                            },
                            {
                                "name": "dstLabel",
                                "type": "string",
                                "columnType": "dstPrimaryLabel"
                            },
                            {
                                "name": "name",
                                "type": "string",
                                "columnType": "edgeProperty"
                            },
                            {
                                "name": "weight",
                                "type": "double",
                                "columnType": "edgeProperty"
                            }
                        ]
                    }
                },

                "writer": {
                    "name": "streamwriter",
                    "parameter": {
                        "print": true
                    }
                }
            }
        ]
    }
}
@@ -41,6 +41,14 @@ GDBWriter通过DataX框架获取Reader生成的协议数据,使用`g.addV/E(GD
           {
             "random": "60,64",
             "type": "string"
+          },
+          {
+            "random": "100,1000",
+            "type": "long"
+          },
+          {
+            "random": "32,48",
+            "type": "string"
           }
         ],
         "sliceRecordCount": 1000
@@ -55,20 +63,32 @@ GDBWriter通过DataX框架获取Reader生成的协议数据,使用`g.addV/E(GD
           "password": "***",
           "writeMode": "INSERT",
           "labelType": "VERTEX",
-          "label": "${1}",
+          "label": "#{1}",
           "idTransRule": "none",
           "session": true,
           "maxRecordsInBatch": 64,
           "column": [
             {
               "name": "id",
-              "value": "${0}",
+              "value": "#{0}",
               "type": "string",
               "columnType": "primaryKey"
             },
             {
               "name": "vertex_propKey",
-              "value": "${2}",
+              "value": "#{2}",
+              "type": "string",
+              "columnType": "vertexSetProperty"
+            },
+            {
+              "name": "vertex_propKey",
+              "value": "#{3}",
+              "type": "long",
+              "columnType": "vertexSetProperty"
+            },
+            {
+              "name": "vertex_propKey2",
+              "value": "#{4}",
               "type": "string",
               "columnType": "vertexProperty"
             }
@@ -134,7 +154,7 @@ GDBWriter通过DataX框架获取Reader生成的协议数据,使用`g.addV/E(GD
           "password": "***",
           "writeMode": "INSERT",
           "labelType": "EDGE",
-          "label": "${3}",
+          "label": "#{3}",
           "idTransRule": "none",
           "srcIdTransRule": "labelPrefix",
           "dstIdTransRule": "labelPrefix",
@@ -144,25 +164,25 @@ GDBWriter通过DataX框架获取Reader生成的协议数据,使用`g.addV/E(GD
           "column": [
             {
               "name": "id",
-              "value": "${0}",
+              "value": "#{0}",
               "type": "string",
               "columnType": "primaryKey"
             },
             {
               "name": "id",
-              "value": "${1}",
+              "value": "#{1}",
               "type": "string",
               "columnType": "srcPrimaryKey"
             },
             {
               "name": "id",
-              "value": "${2}",
+              "value": "#{2}",
               "type": "string",
               "columnType": "dstPrimaryKey"
             },
             {
               "name": "edge_propKey",
-              "value": "${4}",
+              "value": "#{4}",
               "type": "string",
               "columnType": "edgeProperty"
             }
@@ -199,7 +219,7 @@ GDBWriter通过DataX框架获取Reader生成的协议数据,使用`g.addV/E(GD
 * 默认值:无
 
 * **label**
-  * 描述:类型名,即点/边名称; label支持从源列中读取,如${0},表示取第一列字段作为label名。源列索引从0开始;
+  * 描述:类型名,即点/边名称; label支持从源列中读取,如#{0},表示取第一列字段作为label名。源列索引从0开始;
   * 必选:是
   * 默认值:无
 
@@ -211,12 +231,12 @@ GDBWriter通过DataX框架获取Reader生成的协议数据,使用`g.addV/E(GD
   * 默认值:无
 
 * **srcLabel**
-  * 描述:当label为边时,表示起点的点名称;srcLabel支持从源列中读取,如${0},表示取第一列字段作为label名。源列索引从0开始;
+  * 描述:当label为边时,表示起点的点名称;srcLabel支持从源列中读取,如#{0},表示取第一列字段作为label名。源列索引从0开始;
   * 必选:labelType为边,srcIdTransRule为none时可不填写,否则必填;
   * 默认值:无
 
 * **dstLabel**
-  * 描述:当label为边时,表示终点的点名称;dstLabel支持从源列中读取,如${0},表示取第一列字段作为label名。源列索引从0开始;
+  * 描述:当label为边时,表示终点的点名称;dstLabel支持从源列中读取,如#{0},表示取第一列字段作为label名。源列索引从0开始;
   * 必选:labelType为边,dstIdTransRule为none时可不填写,否则必填;
   * 默认值:无
 
@@ -271,9 +291,9 @@ GDBWriter通过DataX框架获取Reader生成的协议数据,使用`g.addV/E(GD
 
 * **column -> value**
   * 描述:点/边映射关系的字段值;
-    * ${N}表示直接映射源端值,N为源端column索引,从0开始;${0}表示映射源端column第1个字段;
-    * test-${0} 表示源端值做拼接转换,${0}值前/后可添加固定字符串;
-    * ${0}-${1}表示做多字段拼接,也可在任意位置添加固定字符串,如test-${0}-test1-${1}-test2
+    * #{N}表示直接映射源端值,N为源端column索引,从0开始;#{0}表示映射源端column第1个字段;
+    * test-#{0} 表示源端值做拼接转换,#{0}值前/后可添加固定字符串;
+    * #{0}-#{1}表示做多字段拼接,也可在任意位置添加固定字符串,如test-#{0}-test1-#{1}-test2
   * 必选:是
   * 默认值:无
 
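例如(示意):若源端一条记录的字段依次为 `v1`、`tom`、`20`,则 `#{0}` 得到 `v1`,`test-#{1}` 得到 `test-tom`,`#{1}-#{2}` 得到 `tom-20`。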
@@ -290,6 +310,7 @@ GDBWriter通过DataX框架获取Reader生成的协议数据,使用`g.addV/E(GD
     * primaryKey:表示该字段是主键id
   * 点枚举值:
     * vertexProperty:labelType为点时,表示该字段是点的普通属性
+    * vertexSetProperty:labelType为点时,表示该字段是点的SET属性,value是SET属性中的一个属性值
     * vertexJsonProperty:labelType为点时,表示是点json属性,value结构请见备注**json properties示例**,点配置最多只允许出现一个json属性;
   * 边枚举值:
     * srcPrimaryKey:labelType为边时,表示该字段是起点主键id
@@ -305,6 +326,14 @@ GDBWriter通过DataX框架获取Reader生成的协议数据,使用`g.addV/E(GD
 > {"k":"age","t":"int","v":"20"},
 > {"k":"sex","t":"string","v":"male"}
 > ]}
+>
+> # json格式同样支持给点添加SET属性,格式如下
+> {"properties":[
+> {"k":"name","t":"string","v":"tom","c":"set"},
+> {"k":"name","t":"string","v":"jack","c":"set"},
+> {"k":"age","t":"int","v":"20"},
+> {"k":"sex","t":"string","v":"male"}
+> ]}
 > ```
 
 ## 4 性能报告
@@ -367,4 +396,5 @@ DataX压测机器
 - GDBWriter插件与用户查询DSL使用相同的GDB实例端口,导入时可能会影响查询性能
 
 ## FAQ
-无
+1. 使用SET属性需要升级GDB实例到`1.0.20`版本及以上。
+2. 边只支持普通单值属性,不能给边写SET属性数据。
@ -1,10 +1,5 @@
|
|||||||
package com.alibaba.datax.plugin.writer.gdbwriter;
|
package com.alibaba.datax.plugin.writer.gdbwriter;
|
||||||
|
|
||||||
import java.util.ArrayList;
|
|
||||||
import java.util.List;
|
|
||||||
import java.util.concurrent.*;
|
|
||||||
import java.util.function.Function;
|
|
||||||
|
|
||||||
import com.alibaba.datax.common.element.Record;
|
import com.alibaba.datax.common.element.Record;
|
||||||
import com.alibaba.datax.common.exception.DataXException;
|
import com.alibaba.datax.common.exception.DataXException;
|
||||||
import com.alibaba.datax.common.plugin.RecordReceiver;
|
import com.alibaba.datax.common.plugin.RecordReceiver;
|
||||||
@ -18,24 +13,33 @@ import com.alibaba.datax.plugin.writer.gdbwriter.mapping.MappingRule;
|
|||||||
import com.alibaba.datax.plugin.writer.gdbwriter.mapping.MappingRuleFactory;
|
import com.alibaba.datax.plugin.writer.gdbwriter.mapping.MappingRuleFactory;
|
||||||
import com.alibaba.datax.plugin.writer.gdbwriter.model.GdbElement;
|
import com.alibaba.datax.plugin.writer.gdbwriter.model.GdbElement;
|
||||||
import com.alibaba.datax.plugin.writer.gdbwriter.model.GdbGraph;
|
import com.alibaba.datax.plugin.writer.gdbwriter.model.GdbGraph;
|
||||||
|
|
||||||
import groovy.lang.Tuple2;
|
import groovy.lang.Tuple2;
|
||||||
import io.netty.util.concurrent.DefaultThreadFactory;
|
import io.netty.util.concurrent.DefaultThreadFactory;
|
||||||
import lombok.extern.slf4j.Slf4j;
|
import lombok.extern.slf4j.Slf4j;
|
||||||
import org.slf4j.Logger;
|
import org.slf4j.Logger;
|
||||||
import org.slf4j.LoggerFactory;
|
import org.slf4j.LoggerFactory;
|
||||||
|
|
||||||
public class GdbWriter extends Writer {
|
import java.util.ArrayList;
|
||||||
private static final Logger log = LoggerFactory.getLogger(GdbWriter.class);
|
import java.util.List;
|
||||||
|
import java.util.concurrent.ExecutorService;
|
||||||
|
import java.util.concurrent.Future;
|
||||||
|
import java.util.concurrent.LinkedBlockingDeque;
|
||||||
|
import java.util.concurrent.ThreadPoolExecutor;
|
||||||
|
import java.util.concurrent.TimeUnit;
|
||||||
|
import java.util.function.Function;
|
||||||
|
|
||||||
private static Function<Record, GdbElement> mapper = null;
|
public class GdbWriter extends Writer {
|
||||||
private static GdbGraph globalGraph = null;
|
private static final Logger log = LoggerFactory.getLogger(GdbWriter.class);
|
||||||
private static boolean session = false;
|
|
||||||
|
private static Function<Record, GdbElement> mapper = null;
|
||||||
|
private static GdbGraph globalGraph = null;
|
||||||
|
private static boolean session = false;
|
||||||
|
|
||||||
/**
|
/**
|
||||||
* Job 中的方法仅执行一次,Task 中方法会由框架启动多个 Task 线程并行执行。
|
* Job 中的方法仅执行一次,Task 中方法会由框架启动多个 Task 线程并行执行。
|
||||||
* <p/>
|
* <p/>
|
||||||
* 整个 Writer 执行流程是:
|
* 整个 Writer 执行流程是:
|
||||||
|
*
|
||||||
* <pre>
|
* <pre>
|
||||||
* Job类init-->prepare-->split
|
* Job类init-->prepare-->split
|
||||||
*
|
*
|
||||||
@ -46,17 +50,16 @@ public class GdbWriter extends Writer {
|
|||||||
* </pre>
|
* </pre>
|
||||||
*/
|
*/
|
||||||
public static class Job extends Writer.Job {
|
public static class Job extends Writer.Job {
|
||||||
private static final Logger LOG = LoggerFactory
|
private static final Logger LOG = LoggerFactory.getLogger(Job.class);
|
||||||
.getLogger(Job.class);
|
|
||||||
|
|
||||||
private Configuration jobConfig = null;
|
private Configuration jobConfig = null;
|
||||||
|
|
||||||
@Override
|
@Override
|
||||||
public void init() {
|
public void init() {
|
||||||
LOG.info("GDB datax plugin writer job init begin ...");
|
LOG.info("GDB datax plugin writer job init begin ...");
|
||||||
this.jobConfig = getPluginJobConf();
|
this.jobConfig = getPluginJobConf();
|
||||||
GdbWriterConfig.of(this.jobConfig);
|
GdbWriterConfig.of(this.jobConfig);
|
||||||
LOG.info("GDB datax plugin writer job init end.");
|
LOG.info("GDB datax plugin writer job init end.");
|
||||||
|
|
||||||
/**
|
/**
|
||||||
* 注意:此方法仅执行一次。
|
* 注意:此方法仅执行一次。
|
||||||
@ -71,37 +74,37 @@ public class GdbWriter extends Writer {
|
|||||||
* 注意:此方法仅执行一次。
|
* 注意:此方法仅执行一次。
|
||||||
* 最佳实践:如果 Job 中有需要进行数据同步之前的处理,可以在此处完成,如果没有必要则可以直接去掉。
|
* 最佳实践:如果 Job 中有需要进行数据同步之前的处理,可以在此处完成,如果没有必要则可以直接去掉。
|
||||||
*/
|
*/
|
||||||
super.prepare();
|
super.prepare();
|
||||||
|
|
||||||
MappingRule rule = MappingRuleFactory.getInstance().createV2(jobConfig);
|
final MappingRule rule = MappingRuleFactory.getInstance().createV2(this.jobConfig);
|
||||||
|
|
||||||
mapper = new DefaultGdbMapper().getMapper(rule);
|
mapper = new DefaultGdbMapper(this.jobConfig).getMapper(rule);
|
||||||
session = jobConfig.getBool(Key.SESSION_STATE, false);
|
session = this.jobConfig.getBool(Key.SESSION_STATE, false);
|
||||||
|
|
||||||
/**
|
/**
|
||||||
* client connect check before task
|
* client connect check before task
|
||||||
*/
|
*/
|
||||||
try {
|
try {
|
||||||
globalGraph = GdbGraphManager.instance().getGraph(jobConfig, false);
|
globalGraph = GdbGraphManager.instance().getGraph(this.jobConfig, false);
|
||||||
} catch (RuntimeException e) {
|
} catch (final RuntimeException e) {
|
||||||
throw DataXException.asDataXException(GdbWriterErrorCode.FAIL_CLIENT_CONNECT, e.getMessage());
|
throw DataXException.asDataXException(GdbWriterErrorCode.FAIL_CLIENT_CONNECT, e.getMessage());
|
||||||
}
|
}
|
||||||
}
|
}
|
||||||
|
|
||||||
@Override
|
@Override
|
||||||
public List<Configuration> split(int mandatoryNumber) {
|
public List<Configuration> split(final int mandatoryNumber) {
|
||||||
/**
|
/**
|
||||||
* 注意:此方法仅执行一次。
|
* 注意:此方法仅执行一次。
|
||||||
* 最佳实践:通常采用工具静态类完成把 Job 配置切分成多个 Task 配置的工作。
|
* 最佳实践:通常采用工具静态类完成把 Job 配置切分成多个 Task 配置的工作。
|
||||||
* 这里的 mandatoryNumber 是强制必须切分的份数。
|
* 这里的 mandatoryNumber 是强制必须切分的份数。
|
||||||
*/
|
*/
|
||||||
LOG.info("split begin...");
|
LOG.info("split begin...");
|
||||||
List<Configuration> configurationList = new ArrayList<Configuration>();
|
final List<Configuration> configurationList = new ArrayList<Configuration>();
|
||||||
for (int i = 0; i < mandatoryNumber; i++) {
|
for (int i = 0; i < mandatoryNumber; i++) {
|
||||||
configurationList.add(this.jobConfig.clone());
|
configurationList.add(this.jobConfig.clone());
|
||||||
}
|
}
|
||||||
LOG.info("split end...");
|
LOG.info("split end...");
|
||||||
return configurationList;
|
return configurationList;
|
||||||
}
|
}
|
||||||
|
|
||||||
@Override
|
@Override
|
||||||
@ -127,7 +130,7 @@ public class GdbWriter extends Writer {
|
|||||||
public static class Task extends Writer.Task {
|
public static class Task extends Writer.Task {
|
||||||
|
|
||||||
private Configuration taskConfig;
|
private Configuration taskConfig;
|
||||||
|
|
||||||
private int failed = 0;
|
private int failed = 0;
|
||||||
private int batchRecords;
|
private int batchRecords;
|
||||||
private ExecutorService submitService = null;
|
private ExecutorService submitService = null;
|
||||||
@ -139,24 +142,24 @@ public class GdbWriter extends Writer {
|
|||||||
* 注意:此方法每个 Task 都会执行一次。
|
* 注意:此方法每个 Task 都会执行一次。
|
||||||
* 最佳实践:此处通过对 taskConfig 配置的读取,进而初始化一些资源为 startWrite()做准备。
|
* 最佳实践:此处通过对 taskConfig 配置的读取,进而初始化一些资源为 startWrite()做准备。
|
||||||
*/
|
*/
|
||||||
this.taskConfig = super.getPluginJobConf();
|
this.taskConfig = super.getPluginJobConf();
|
||||||
batchRecords = taskConfig.getInt(Key.MAX_RECORDS_IN_BATCH, GdbWriterConfig.DEFAULT_RECORD_NUM_IN_BATCH);
|
this.batchRecords = this.taskConfig.getInt(Key.MAX_RECORDS_IN_BATCH, GdbWriterConfig.DEFAULT_RECORD_NUM_IN_BATCH);
|
||||||
submitService = new ThreadPoolExecutor(1, 1, 0L,
|
this.submitService = new ThreadPoolExecutor(1, 1, 0L, TimeUnit.MILLISECONDS, new LinkedBlockingDeque<>(),
|
||||||
TimeUnit.MILLISECONDS, new LinkedBlockingDeque<>(), new DefaultThreadFactory("submit-dsl"));
|
new DefaultThreadFactory("submit-dsl"));
|
||||||
|
|
||||||
if (!session) {
|
if (!session) {
|
||||||
graph = globalGraph;
|
this.graph = globalGraph;
|
||||||
} else {
|
} else {
|
||||||
/**
|
/**
|
||||||
* 分批创建session client,由于服务端groovy编译性能的限制
|
* 分批创建session client,由于服务端groovy编译性能的限制
|
||||||
*/
|
*/
|
||||||
try {
|
try {
|
||||||
Thread.sleep((getTaskId()/10)*10000);
|
Thread.sleep((getTaskId() / 10) * 10000);
|
||||||
} catch (Exception e) {
|
} catch (final Exception e) {
|
||||||
// ...
|
// ...
|
||||||
}
|
}
|
||||||
graph = GdbGraphManager.instance().getGraph(taskConfig, session);
|
this.graph = GdbGraphManager.instance().getGraph(this.taskConfig, session);
|
||||||
}
|
}
|
||||||
}
|
}
|
||||||
|
|
||||||
@Override
|
@Override
|
||||||
@@ -165,64 +168,69 @@ public class GdbWriter extends Writer {
|
|||||||
* 注意:此方法每个 Task 都会执行一次。
|
* 注意:此方法每个 Task 都会执行一次。
|
||||||
* 最佳实践:如果 Task 中有需要进行数据同步之前的处理,可以在此处完成,如果没有必要则可以直接去掉。
|
* 最佳实践:如果 Task 中有需要进行数据同步之前的处理,可以在此处完成,如果没有必要则可以直接去掉。
|
||||||
*/
|
*/
|
||||||
super.prepare();
|
super.prepare();
|
||||||
}
|
}
|
||||||
|
|
||||||
@Override
|
@Override
|
||||||
public void startWrite(RecordReceiver recordReceiver) {
|
public void startWrite(final RecordReceiver recordReceiver) {
|
||||||
/**
|
/**
|
||||||
* 注意:此方法每个 Task 都会执行一次。
|
* 注意:此方法每个 Task 都会执行一次。
|
||||||
* 最佳实践:此处适当封装确保简洁清晰完成数据写入工作。
|
* 最佳实践:此处适当封装确保简洁清晰完成数据写入工作。
|
||||||
*/
|
*/
|
||||||
Record r;
|
Record r;
|
||||||
Future<Boolean> future = null;
|
Future<Boolean> future = null;
|
||||||
List<Tuple2<Record, GdbElement>> records = new ArrayList<>(batchRecords);
|
List<Tuple2<Record, GdbElement>> records = new ArrayList<>(this.batchRecords);
|
||||||
|
|
||||||
while ((r = recordReceiver.getFromReader()) != null) {
|
while ((r = recordReceiver.getFromReader()) != null) {
|
||||||
records.add(new Tuple2<>(r, mapper.apply(r)));
|
try {
|
||||||
|
records.add(new Tuple2<>(r, mapper.apply(r)));
|
||||||
|
} catch (final Exception ex) {
|
||||||
|
getTaskPluginCollector().collectDirtyRecord(r, ex);
|
||||||
|
continue;
|
||||||
|
}
|
||||||
|
|
||||||
if (records.size() >= batchRecords) {
|
if (records.size() >= this.batchRecords) {
|
||||||
wait4Submit(future);
|
wait4Submit(future);
|
||||||
|
|
||||||
final List<Tuple2<Record, GdbElement>> batch = records;
|
final List<Tuple2<Record, GdbElement>> batch = records;
|
||||||
future = submitService.submit(() -> batchCommitRecords(batch));
|
future = this.submitService.submit(() -> batchCommitRecords(batch));
|
||||||
records = new ArrayList<>(batchRecords);
|
records = new ArrayList<>(this.batchRecords);
|
||||||
}
|
}
|
||||||
}
|
}
|
||||||
|
|
||||||
wait4Submit(future);
|
wait4Submit(future);
|
||||||
if (!records.isEmpty()) {
|
if (!records.isEmpty()) {
|
||||||
final List<Tuple2<Record, GdbElement>> batch = records;
|
final List<Tuple2<Record, GdbElement>> batch = records;
|
||||||
future = submitService.submit(() -> batchCommitRecords(batch));
|
future = this.submitService.submit(() -> batchCommitRecords(batch));
|
||||||
wait4Submit(future);
|
wait4Submit(future);
|
||||||
}
|
}
|
||||||
}
|
}
|
||||||
|
|
||||||
private void wait4Submit(Future<Boolean> future) {
|
private void wait4Submit(final Future<Boolean> future) {
|
||||||
if (future == null) {
|
if (future == null) {
|
||||||
return;
|
return;
|
||||||
}
|
}
|
||||||
|
|
||||||
try {
|
try {
|
||||||
future.get();
|
future.get();
|
||||||
} catch (Exception e) {
|
} catch (final Exception e) {
|
||||||
e.printStackTrace();
|
e.printStackTrace();
|
||||||
}
|
}
|
||||||
}
|
}
|
||||||
|
|
||||||
private boolean batchCommitRecords(final List<Tuple2<Record, GdbElement>> records) {
|
private boolean batchCommitRecords(final List<Tuple2<Record, GdbElement>> records) {
|
||||||
TaskPluginCollector collector = getTaskPluginCollector();
|
final TaskPluginCollector collector = getTaskPluginCollector();
|
||||||
try {
|
try {
|
||||||
List<Tuple2<Record, Exception>> errors = graph.add(records);
|
final List<Tuple2<Record, Exception>> errors = this.graph.add(records);
|
||||||
errors.forEach(t -> collector.collectDirtyRecord(t.getFirst(), t.getSecond()));
|
errors.forEach(t -> collector.collectDirtyRecord(t.getFirst(), t.getSecond()));
|
||||||
failed += errors.size();
|
this.failed += errors.size();
|
||||||
} catch (Exception e) {
|
} catch (final Exception e) {
|
||||||
records.forEach(t -> collector.collectDirtyRecord(t.getFirst(), e));
|
records.forEach(t -> collector.collectDirtyRecord(t.getFirst(), e));
|
||||||
failed += records.size();
|
this.failed += records.size();
|
||||||
}
|
}
|
||||||
|
|
||||||
records.clear();
|
records.clear();
|
||||||
return true;
|
return true;
|
||||||
}
|
}
|
||||||
|
|
||||||
@Override
|
@Override
|
||||||
@@ -231,7 +239,7 @@ public class GdbWriter extends Writer {
|
|||||||
* 注意:此方法每个 Task 都会执行一次。
|
* 注意:此方法每个 Task 都会执行一次。
|
||||||
* 最佳实践:如果 Task 中有需要进行数据同步之后的后续处理,可以在此处完成。
|
* 最佳实践:如果 Task 中有需要进行数据同步之后的后续处理,可以在此处完成。
|
||||||
*/
|
*/
|
||||||
log.info("Task done, dirty record count - {}", failed);
|
log.info("Task done, dirty record count - {}", this.failed);
|
||||||
}
|
}
|
||||||
|
|
||||||
@Override
|
@Override
|
||||||
@@ -241,9 +249,9 @@ public class GdbWriter extends Writer {
|
|||||||
* 最佳实践:通常配合Task 中的 post() 方法一起完成 Task 的资源释放。
|
* 最佳实践:通常配合Task 中的 post() 方法一起完成 Task 的资源释放。
|
||||||
*/
|
*/
|
||||||
if (session) {
|
if (session) {
|
||||||
graph.close();
|
this.graph.close();
|
||||||
}
|
}
|
||||||
submitService.shutdown();
|
this.submitService.shutdown();
|
||||||
}
|
}
|
||||||
|
|
||||||
}
|
}
|
||||||
|
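The Task.startWrite() loop above double-buffers: it maps records into the current batch while at most one previous batch is being committed on a single-thread executor. A minimal, self-contained sketch of that pattern, assuming a plain String record and a stand-in commit() instead of the plugin's graph.add():

    import java.util.ArrayList;
    import java.util.List;
    import java.util.concurrent.ExecutorService;
    import java.util.concurrent.Executors;
    import java.util.concurrent.Future;

    public class BatchSubmitSketch {
        private static final int BATCH = 16; // mirrors GdbWriterConfig.DEFAULT_RECORD_NUM_IN_BATCH
        private final ExecutorService submitService = Executors.newSingleThreadExecutor();

        public void write(final Iterable<String> source) throws Exception {
            List<String> buffer = new ArrayList<>(BATCH);
            Future<Boolean> pending = null;
            for (final String record : source) {
                buffer.add(record); // the real writer maps Record -> GdbElement here
                if (buffer.size() >= BATCH) {
                    wait4Submit(pending); // keep at most one batch in flight
                    final List<String> batch = buffer;
                    pending = this.submitService.submit(() -> commit(batch));
                    buffer = new ArrayList<>(BATCH);
                }
            }
            wait4Submit(pending);
            if (!buffer.isEmpty()) {
                final List<String> batch = buffer;
                wait4Submit(this.submitService.submit(() -> commit(batch)));
            }
            this.submitService.shutdown();
        }

        private static void wait4Submit(final Future<Boolean> future) throws Exception {
            if (future != null) {
                future.get(); // block until the in-flight batch finishes
            }
        }

        private static boolean commit(final List<String> batch) {
            return true; // stand-in for graph.add(records) plus dirty-record collection
        }

        public static void main(final String[] args) throws Exception {
            new BatchSubmitSketch().write(java.util.Arrays.asList("a", "b", "c"));
        }
    }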
@@ -27,7 +27,6 @@ public enum GdbWriterErrorCode implements ErrorCode {
|
|||||||
|
|
||||||
@Override
|
@Override
|
||||||
public String toString() {
|
public String toString() {
|
||||||
return String.format("Code:[%s], Description:[%s]. ", this.code,
|
return String.format("Code:[%s], Description:[%s]. ", this.code, this.description);
|
||||||
this.description);
|
|
||||||
}
|
}
|
||||||
}
|
}
|
@@ -6,136 +6,164 @@ public final class Key {
|
|||||||
* 此处声明插件用到的需要插件使用者提供的配置项
|
* 此处声明插件用到的需要插件使用者提供的配置项
|
||||||
*/
|
*/
|
||||||
|
|
||||||
public final static String HOST = "host";
|
public final static String HOST = "host";
|
||||||
public final static String PORT = "port";
|
public final static String PORT = "port";
|
||||||
public final static String USERNAME = "username";
|
public final static String USERNAME = "username";
|
||||||
public static final String PASSWORD = "password";
|
public static final String PASSWORD = "password";
|
||||||
|
|
||||||
/**
|
/**
|
||||||
* import type and mode
|
* import type and mode
|
||||||
*/
|
*/
|
||||||
public static final String IMPORT_TYPE = "labelType";
|
public static final String IMPORT_TYPE = "labelType";
|
||||||
public static final String UPDATE_MODE = "writeMode";
|
public static final String UPDATE_MODE = "writeMode";
|
||||||
|
|
||||||
/**
|
/**
|
||||||
* label prefix issue
|
* label prefix issue
|
||||||
*/
|
*/
|
||||||
public static final String ID_TRANS_RULE = "idTransRule";
|
public static final String ID_TRANS_RULE = "idTransRule";
|
||||||
public static final String SRC_ID_TRANS_RULE = "srcIdTransRule";
|
public static final String SRC_ID_TRANS_RULE = "srcIdTransRule";
|
||||||
public static final String DST_ID_TRANS_RULE = "dstIdTransRule";
|
public static final String DST_ID_TRANS_RULE = "dstIdTransRule";
|
||||||
|
|
||||||
public static final String LABEL = "label";
|
public static final String LABEL = "label";
|
||||||
public static final String SRC_LABEL = "srcLabel";
|
public static final String SRC_LABEL = "srcLabel";
|
||||||
public static final String DST_LABEL = "dstLabel";
|
public static final String DST_LABEL = "dstLabel";
|
||||||
|
|
||||||
public static final String MAPPING = "mapping";
|
public static final String MAPPING = "mapping";
|
||||||
|
|
||||||
/**
|
/**
|
||||||
* column define in Gdb
|
* column define in Gdb
|
||||||
*/
|
*/
|
||||||
public static final String COLUMN = "column";
|
public static final String COLUMN = "column";
|
||||||
public static final String COLUMN_NAME = "name";
|
public static final String COLUMN_NAME = "name";
|
||||||
public static final String COLUMN_VALUE = "value";
|
public static final String COLUMN_VALUE = "value";
|
||||||
public static final String COLUMN_TYPE = "type";
|
public static final String COLUMN_TYPE = "type";
|
||||||
public static final String COLUMN_NODE_TYPE = "columnType";
|
public static final String COLUMN_NODE_TYPE = "columnType";
|
||||||
|
|
||||||
/**
|
/**
|
||||||
* Gdb Vertex/Edge elements
|
* Gdb Vertex/Edge elements
|
||||||
*/
|
*/
|
||||||
public static final String ID = "id";
|
public static final String ID = "id";
|
||||||
public static final String FROM = "from";
|
public static final String FROM = "from";
|
||||||
public static final String TO = "to";
|
public static final String TO = "to";
|
||||||
public static final String PROPERTIES = "properties";
|
public static final String PROPERTIES = "properties";
|
||||||
public static final String PROP_KEY = "name";
|
public static final String PROP_KEY = "name";
|
||||||
public static final String PROP_VALUE = "value";
|
public static final String PROP_VALUE = "value";
|
||||||
public static final String PROP_TYPE = "type";
|
public static final String PROP_TYPE = "type";
|
||||||
|
|
||||||
public static final String PROPERTIES_JSON_STR = "propertiesJsonStr";
|
public static final String PROPERTIES_JSON_STR = "propertiesJsonStr";
|
||||||
public static final String MAX_PROPERTIES_BATCH_NUM = "maxPropertiesBatchNumber";
|
public static final String MAX_PROPERTIES_BATCH_NUM = "maxPropertiesBatchNumber";
|
||||||
|
|
||||||
/**
|
/**
|
||||||
* session less client configure for connect pool
|
* session less client configure for connect pool
|
||||||
*/
|
*/
|
||||||
public static final String MAX_IN_PROCESS_PER_CONNECTION = "maxInProcessPerConnection";
|
public static final String MAX_IN_PROCESS_PER_CONNECTION = "maxInProcessPerConnection";
|
||||||
public static final String MAX_CONNECTION_POOL_SIZE = "maxConnectionPoolSize";
|
public static final String MAX_CONNECTION_POOL_SIZE = "maxConnectionPoolSize";
|
||||||
public static final String MAX_SIMULTANEOUS_USAGE_PER_CONNECTION = "maxSimultaneousUsagePerConnection";
|
public static final String MAX_SIMULTANEOUS_USAGE_PER_CONNECTION = "maxSimultaneousUsagePerConnection";
|
||||||
|
|
||||||
public static final String MAX_RECORDS_IN_BATCH = "maxRecordsInBatch";
|
public static final String MAX_RECORDS_IN_BATCH = "maxRecordsInBatch";
|
||||||
public static final String SESSION_STATE = "session";
|
public static final String SESSION_STATE = "session";
|
||||||
|
|
||||||
public static enum ImportType {
|
/**
|
||||||
/**
|
* request length limit, include gdb element string length GDB字段长度限制配置,可分别配置各字段的限制,超过限制的记录会当脏数据处理
|
||||||
* Import vertices
|
*/
|
||||||
*/
|
public static final String MAX_GDB_STRING_LENGTH = "maxStringLengthLimit";
|
||||||
VERTEX,
|
public static final String MAX_GDB_ID_LENGTH = "maxIdStringLengthLimit";
|
||||||
/**
|
public static final String MAX_GDB_LABEL_LENGTH = "maxLabelStringLengthLimit";
|
||||||
* Import edges
|
public static final String MAX_GDB_PROP_KEY_LENGTH = "maxPropKeyStringLengthLimit";
|
||||||
*/
|
public static final String MAX_GDB_PROP_VALUE_LENGTH = "maxPropValueStringLengthLimit";
|
||||||
EDGE;
|
|
||||||
}
|
|
||||||
|
|
||||||
public static enum UpdateMode {
|
|
||||||
/**
|
|
||||||
* Insert new records, fail if exists
|
|
||||||
*/
|
|
||||||
INSERT,
|
|
||||||
/**
|
|
||||||
* Skip this record if exists
|
|
||||||
*/
|
|
||||||
SKIP,
|
|
||||||
/**
|
|
||||||
* Update property of this record if exists
|
|
||||||
*/
|
|
||||||
MERGE;
|
|
||||||
}
|
|
||||||
|
|
||||||
public static enum ColumnType {
|
public static final String MAX_GDB_REQUEST_LENGTH = "maxRequestLengthLimit";
|
||||||
/**
|
|
||||||
* vertex or edge id
|
|
||||||
*/
|
|
||||||
primaryKey,
|
|
||||||
|
|
||||||
/**
|
public static enum ImportType {
|
||||||
* vertex property
|
/**
|
||||||
*/
|
* Import vertices
|
||||||
vertexProperty,
|
*/
|
||||||
|
VERTEX,
|
||||||
|
/**
|
||||||
|
* Import edges
|
||||||
|
*/
|
||||||
|
EDGE;
|
||||||
|
}
|
||||||
|
|
||||||
/**
|
public static enum UpdateMode {
|
||||||
* start vertex id of edge
|
/**
|
||||||
*/
|
* Insert new records, fail if exists
|
||||||
srcPrimaryKey,
|
*/
|
||||||
|
INSERT,
|
||||||
|
/**
|
||||||
|
* Skip this record if exists
|
||||||
|
*/
|
||||||
|
SKIP,
|
||||||
|
/**
|
||||||
|
* Update property of this record if exists
|
||||||
|
*/
|
||||||
|
MERGE;
|
||||||
|
}
|
||||||
|
|
||||||
/**
|
public static enum ColumnType {
|
||||||
* end vertex id of edge
|
/**
|
||||||
*/
|
* vertex or edge id
|
||||||
dstPrimaryKey,
|
*/
|
||||||
|
primaryKey,
|
||||||
|
|
||||||
/**
|
/**
|
||||||
* edge property
|
* vertex property
|
||||||
*/
|
*/
|
||||||
edgeProperty,
|
vertexProperty,
|
||||||
|
|
||||||
/**
|
/**
|
||||||
* vertex json style property
|
* vertex setProperty
|
||||||
*/
|
*/
|
||||||
vertexJsonProperty,
|
vertexSetProperty,
|
||||||
|
|
||||||
/**
|
/**
|
||||||
* edge json style property
|
* start vertex id of edge
|
||||||
*/
|
*/
|
||||||
edgeJsonProperty
|
srcPrimaryKey,
|
||||||
}
|
|
||||||
|
|
||||||
public static enum IdTransRule {
|
/**
|
||||||
/**
|
* end vertex id of edge
|
||||||
* vertex or edge id with 'label' prefix
|
*/
|
||||||
*/
|
dstPrimaryKey,
|
||||||
labelPrefix,
|
|
||||||
|
|
||||||
/**
|
/**
|
||||||
* vertex or edge id raw
|
* edge property
|
||||||
*/
|
*/
|
||||||
none
|
edgeProperty,
|
||||||
}
|
|
||||||
|
/**
|
||||||
|
* vertex json style property
|
||||||
|
*/
|
||||||
|
vertexJsonProperty,
|
||||||
|
|
||||||
|
/**
|
||||||
|
* edge json style property
|
||||||
|
*/
|
||||||
|
edgeJsonProperty
|
||||||
|
}
|
||||||
|
|
||||||
|
public static enum IdTransRule {
|
||||||
|
/**
|
||||||
|
* vertex or edge id with 'label' prefix
|
||||||
|
*/
|
||||||
|
labelPrefix,
|
||||||
|
|
||||||
|
/**
|
||||||
|
* vertex or edge id raw
|
||||||
|
*/
|
||||||
|
none
|
||||||
|
}
|
||||||
|
|
||||||
|
public static enum PropertyType {
|
||||||
|
/**
|
||||||
|
* single Vertex Property
|
||||||
|
*/
|
||||||
|
single,
|
||||||
|
|
||||||
|
/**
|
||||||
|
* set Vertex Property
|
||||||
|
*/
|
||||||
|
set
|
||||||
|
}
|
||||||
|
|
||||||
}
|
}
|
||||||
|
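The constants above are the writer's job-configuration keys. Purely as an illustration (the field values are invented, and Configuration.from(String) is assumed from datax-common), a vertex import that references them might be assembled like this:

    import com.alibaba.datax.common.util.Configuration;
    import com.alibaba.datax.plugin.writer.gdbwriter.Key;

    public class KeyUsageSketch {
        public static void main(final String[] args) {
            // hypothetical parameter block; only the key names come from Key above
            final String json = "{"
                + "\"host\":\"gdb.example.com\",\"port\":8182,"
                + "\"username\":\"root\",\"password\":\"***\","
                + "\"labelType\":\"VERTEX\",\"label\":\"person\",\"idTransRule\":\"none\","
                + "\"column\":["
                + "{\"name\":\"id\",\"value\":\"#{0}\",\"type\":\"string\",\"columnType\":\"primaryKey\"},"
                + "{\"name\":\"age\",\"value\":\"#{1}\",\"type\":\"int\",\"columnType\":\"vertexProperty\"}"
                + "]}";
            final Configuration conf = Configuration.from(json); // assumed datax-common helper
            System.out.println(conf.getString(Key.LABEL));       // person
            System.out.println(conf.getInt(Key.PORT));           // 8182
        }
    }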
@@ -3,37 +3,37 @@
|
|||||||
*/
|
*/
|
||||||
package com.alibaba.datax.plugin.writer.gdbwriter.client;
|
package com.alibaba.datax.plugin.writer.gdbwriter.client;
|
||||||
|
|
||||||
|
import java.util.ArrayList;
|
||||||
|
import java.util.List;
|
||||||
|
|
||||||
import com.alibaba.datax.common.util.Configuration;
|
import com.alibaba.datax.common.util.Configuration;
|
||||||
import com.alibaba.datax.plugin.writer.gdbwriter.model.GdbGraph;
|
import com.alibaba.datax.plugin.writer.gdbwriter.model.GdbGraph;
|
||||||
import com.alibaba.datax.plugin.writer.gdbwriter.model.ScriptGdbGraph;
|
import com.alibaba.datax.plugin.writer.gdbwriter.model.ScriptGdbGraph;
|
||||||
|
|
||||||
import java.util.ArrayList;
|
|
||||||
import java.util.List;
|
|
||||||
|
|
||||||
/**
|
/**
|
||||||
* @author jerrywang
|
* @author jerrywang
|
||||||
*
|
*
|
||||||
*/
|
*/
|
||||||
public class GdbGraphManager implements AutoCloseable {
|
public class GdbGraphManager implements AutoCloseable {
|
||||||
private static final GdbGraphManager instance = new GdbGraphManager();
|
private static final GdbGraphManager INSTANCE = new GdbGraphManager();
|
||||||
|
|
||||||
private List<GdbGraph> graphs = new ArrayList<>();
|
|
||||||
|
|
||||||
public static GdbGraphManager instance() {
|
|
||||||
return instance;
|
|
||||||
}
|
|
||||||
|
|
||||||
public GdbGraph getGraph(Configuration config, boolean session) {
|
private List<GdbGraph> graphs = new ArrayList<>();
|
||||||
GdbGraph graph = new ScriptGdbGraph(config, session);
|
|
||||||
graphs.add(graph);
|
|
||||||
return graph;
|
|
||||||
}
|
|
||||||
|
|
||||||
@Override
|
public static GdbGraphManager instance() {
|
||||||
public void close() {
|
return INSTANCE;
|
||||||
for(GdbGraph graph : graphs) {
|
}
|
||||||
graph.close();
|
|
||||||
}
|
public GdbGraph getGraph(final Configuration config, final boolean session) {
|
||||||
graphs.clear();
|
final GdbGraph graph = new ScriptGdbGraph(config, session);
|
||||||
}
|
this.graphs.add(graph);
|
||||||
|
return graph;
|
||||||
|
}
|
||||||
|
|
||||||
|
@Override
|
||||||
|
public void close() {
|
||||||
|
for (final GdbGraph graph : this.graphs) {
|
||||||
|
graph.close();
|
||||||
|
}
|
||||||
|
this.graphs.clear();
|
||||||
|
}
|
||||||
}
|
}
|
||||||
|
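GdbGraphManager above is a process-wide registry: every graph handed out by getGraph() is remembered so close() can release them all. A compact sketch of that AutoCloseable registry pattern with a hypothetical resource type:

    import java.util.ArrayList;
    import java.util.List;

    public class ResourceRegistrySketch implements AutoCloseable {
        private static final ResourceRegistrySketch INSTANCE = new ResourceRegistrySketch();

        private final List<AutoCloseable> resources = new ArrayList<>();

        public static ResourceRegistrySketch instance() {
            return INSTANCE;
        }

        // every resource created through the registry is tracked for later cleanup
        public synchronized <T extends AutoCloseable> T register(final T resource) {
            this.resources.add(resource);
            return resource;
        }

        @Override
        public synchronized void close() {
            for (final AutoCloseable resource : this.resources) {
                try {
                    resource.close();
                } catch (final Exception ignored) {
                    // best-effort cleanup, mirroring graph.close() in the manager
                }
            }
            this.resources.clear();
        }
    }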
@@ -3,39 +3,43 @@
|
|||||||
*/
|
*/
|
||||||
package com.alibaba.datax.plugin.writer.gdbwriter.client;
|
package com.alibaba.datax.plugin.writer.gdbwriter.client;
|
||||||
|
|
||||||
|
import static com.alibaba.datax.plugin.writer.gdbwriter.util.ConfigHelper.assertConfig;
|
||||||
|
import static com.alibaba.datax.plugin.writer.gdbwriter.util.ConfigHelper.assertHasContent;
|
||||||
|
|
||||||
import com.alibaba.datax.common.util.Configuration;
|
import com.alibaba.datax.common.util.Configuration;
|
||||||
import com.alibaba.datax.plugin.writer.gdbwriter.Key;
|
import com.alibaba.datax.plugin.writer.gdbwriter.Key;
|
||||||
|
|
||||||
import static com.alibaba.datax.plugin.writer.gdbwriter.util.ConfigHelper.*;
|
|
||||||
|
|
||||||
/**
|
/**
|
||||||
* @author jerrywang
|
* @author jerrywang
|
||||||
*
|
*
|
||||||
*/
|
*/
|
||||||
public class GdbWriterConfig {
|
public class GdbWriterConfig {
|
||||||
public static final int DEFAULT_MAX_IN_PROCESS_PER_CONNECTION = 4;
|
public static final int DEFAULT_MAX_IN_PROCESS_PER_CONNECTION = 4;
|
||||||
public static final int DEFAULT_MAX_CONNECTION_POOL_SIZE = 8;
|
public static final int DEFAULT_MAX_CONNECTION_POOL_SIZE = 8;
|
||||||
public static final int DEFAULT_MAX_SIMULTANEOUS_USAGE_PER_CONNECTION = 8;
|
public static final int DEFAULT_MAX_SIMULTANEOUS_USAGE_PER_CONNECTION = 8;
|
||||||
public static final int DEFAULT_BATCH_PROPERTY_NUM = 30;
|
public static final int DEFAULT_BATCH_PROPERTY_NUM = 30;
|
||||||
public static final int DEFAULT_RECORD_NUM_IN_BATCH = 16;
|
public static final int DEFAULT_RECORD_NUM_IN_BATCH = 16;
|
||||||
|
|
||||||
private Configuration config;
|
public static final int MAX_STRING_LENGTH = 10240;
|
||||||
|
public static final int MAX_REQUEST_LENGTH = 65535 - 1000;
|
||||||
|
|
||||||
private GdbWriterConfig(Configuration config) {
|
private Configuration config;
|
||||||
this.config = config;
|
|
||||||
|
|
||||||
validate();
|
private GdbWriterConfig(final Configuration config) {
|
||||||
}
|
this.config = config;
|
||||||
|
|
||||||
private void validate() {
|
validate();
|
||||||
assertHasContent(config, Key.HOST);
|
}
|
||||||
assertConfig(Key.PORT, () -> config.getInt(Key.PORT) > 0);
|
|
||||||
|
|
||||||
assertHasContent(config, Key.USERNAME);
|
public static GdbWriterConfig of(final Configuration config) {
|
||||||
assertHasContent(config, Key.PASSWORD);
|
return new GdbWriterConfig(config);
|
||||||
}
|
}
|
||||||
|
|
||||||
public static GdbWriterConfig of(Configuration config) {
|
private void validate() {
|
||||||
return new GdbWriterConfig(config);
|
assertHasContent(this.config, Key.HOST);
|
||||||
}
|
assertConfig(Key.PORT, () -> this.config.getInt(Key.PORT) > 0);
|
||||||
|
|
||||||
|
assertHasContent(this.config, Key.USERNAME);
|
||||||
|
assertHasContent(this.config, Key.PASSWORD);
|
||||||
|
}
|
||||||
}
|
}
|
||||||
|
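GdbWriterConfig above follows a validate-on-construct factory: of() delegates to a private constructor which calls validate() exactly once, so an invalid configuration can never escape. A generic sketch of the same idea with hypothetical fields, independent of ConfigHelper:

    import java.util.Objects;

    public final class EndpointConfig {
        private final String host;
        private final int port;

        private EndpointConfig(final String host, final int port) {
            this.host = host;
            this.port = port;
            validate(); // fail fast before the instance is handed out
        }

        public static EndpointConfig of(final String host, final int port) {
            return new EndpointConfig(host, port);
        }

        private void validate() {
            Objects.requireNonNull(this.host, "host is required");
            if (this.host.isEmpty() || this.port <= 0) {
                throw new IllegalArgumentException("host must be non-empty and port positive");
            }
        }
    }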
@@ -3,6 +3,8 @@
|
|||||||
*/
|
*/
|
||||||
package com.alibaba.datax.plugin.writer.gdbwriter.mapping;
|
package com.alibaba.datax.plugin.writer.gdbwriter.mapping;
|
||||||
|
|
||||||
|
import static com.alibaba.datax.plugin.writer.gdbwriter.Key.ImportType.VERTEX;
|
||||||
|
|
||||||
import java.util.ArrayList;
|
import java.util.ArrayList;
|
||||||
import java.util.List;
|
import java.util.List;
|
||||||
import java.util.UUID;
|
import java.util.UUID;
|
||||||
@@ -12,179 +14,191 @@ import java.util.regex.Matcher;
|
|||||||
import java.util.regex.Pattern;
|
import java.util.regex.Pattern;
|
||||||
|
|
||||||
import com.alibaba.datax.common.element.Record;
|
import com.alibaba.datax.common.element.Record;
|
||||||
import com.alibaba.fastjson.JSONArray;
|
import com.alibaba.datax.common.util.Configuration;
|
||||||
import com.alibaba.fastjson.JSONObject;
|
|
||||||
import com.alibaba.datax.plugin.writer.gdbwriter.Key;
|
import com.alibaba.datax.plugin.writer.gdbwriter.Key;
|
||||||
import com.alibaba.datax.plugin.writer.gdbwriter.model.GdbEdge;
|
import com.alibaba.datax.plugin.writer.gdbwriter.model.GdbEdge;
|
||||||
import com.alibaba.datax.plugin.writer.gdbwriter.model.GdbElement;
|
import com.alibaba.datax.plugin.writer.gdbwriter.model.GdbElement;
|
||||||
import com.alibaba.datax.plugin.writer.gdbwriter.model.GdbVertex;
|
import com.alibaba.datax.plugin.writer.gdbwriter.model.GdbVertex;
|
||||||
|
import com.alibaba.fastjson.JSONArray;
|
||||||
|
import com.alibaba.fastjson.JSONObject;
|
||||||
|
|
||||||
import lombok.extern.slf4j.Slf4j;
|
import lombok.extern.slf4j.Slf4j;
|
||||||
|
|
||||||
import static com.alibaba.datax.plugin.writer.gdbwriter.Key.ImportType.VERTEX;
|
|
||||||
|
|
||||||
/**
|
/**
|
||||||
* @author jerrywang
|
* @author jerrywang
|
||||||
*
|
*
|
||||||
*/
|
*/
|
||||||
@Slf4j
|
@Slf4j
|
||||||
public class DefaultGdbMapper implements GdbMapper {
|
public class DefaultGdbMapper implements GdbMapper {
|
||||||
private static final Pattern STR_PATTERN = Pattern.compile("\\$\\{(\\d+)}");
|
private static final Pattern STR_DOLLAR_PATTERN = Pattern.compile("\\$\\{(\\d+)}");
|
||||||
private static final Pattern NORMAL_PATTERN = Pattern.compile("^\\$\\{(\\d+)}$");
|
private static final Pattern NORMAL_DOLLAR_PATTERN = Pattern.compile("^\\$\\{(\\d+)}$");
|
||||||
|
|
||||||
@Override
|
private static final Pattern STR_NUM_PATTERN = Pattern.compile("#\\{(\\d+)}");
|
||||||
public Function<Record, GdbElement> getMapper(MappingRule rule) {
|
private static final Pattern NORMAL_NUM_PATTERN = Pattern.compile("^#\\{(\\d+)}$");
|
||||||
return r -> {
|
|
||||||
GdbElement e = (rule.getImportType() == VERTEX) ? new GdbVertex() : new GdbEdge();
|
public DefaultGdbMapper() {}
|
||||||
forElement(rule).accept(r, e);
|
|
||||||
return e;
|
public DefaultGdbMapper(final Configuration config) {
|
||||||
|
MapperConfig.getInstance().updateConfig(config);
|
||||||
|
}
|
||||||
|
|
||||||
|
private static BiConsumer<Record, GdbElement> forElement(final MappingRule rule) {
|
||||||
|
final boolean numPattern = rule.isNumPattern();
|
||||||
|
final List<BiConsumer<Record, GdbElement>> properties = new ArrayList<>();
|
||||||
|
for (final MappingRule.PropertyMappingRule propRule : rule.getProperties()) {
|
||||||
|
final Function<Record, String> keyFunc = forStrColumn(numPattern, propRule.getKey());
|
||||||
|
|
||||||
|
if (propRule.getValueType() == ValueType.STRING) {
|
||||||
|
final Function<Record, String> valueFunc = forStrColumn(numPattern, propRule.getValue());
|
||||||
|
properties.add((r, e) -> {
|
||||||
|
e.addProperty(keyFunc.apply(r), valueFunc.apply(r), propRule.getPType());
|
||||||
|
});
|
||||||
|
} else {
|
||||||
|
final Function<Record, Object> valueFunc =
|
||||||
|
forObjColumn(numPattern, propRule.getValue(), propRule.getValueType());
|
||||||
|
properties.add((r, e) -> {
|
||||||
|
e.addProperty(keyFunc.apply(r), valueFunc.apply(r), propRule.getPType());
|
||||||
|
});
|
||||||
|
}
|
||||||
|
}
|
||||||
|
|
||||||
|
if (rule.getPropertiesJsonStr() != null) {
|
||||||
|
final Function<Record, String> jsonFunc = forStrColumn(numPattern, rule.getPropertiesJsonStr());
|
||||||
|
properties.add((r, e) -> {
|
||||||
|
final String propertiesStr = jsonFunc.apply(r);
|
||||||
|
final JSONObject root = (JSONObject)JSONObject.parse(propertiesStr);
|
||||||
|
final JSONArray propertiesList = root.getJSONArray("properties");
|
||||||
|
|
||||||
|
for (final Object object : propertiesList) {
|
||||||
|
final JSONObject jsonObject = (JSONObject)object;
|
||||||
|
final String key = jsonObject.getString("k");
|
||||||
|
final String name = jsonObject.getString("v");
|
||||||
|
final String type = jsonObject.getString("t");
|
||||||
|
final String card = jsonObject.getString("c");
|
||||||
|
|
||||||
|
if (key == null || name == null) {
|
||||||
|
continue;
|
||||||
|
}
|
||||||
|
addToProperties(e, key, name, type, card);
|
||||||
|
}
|
||||||
|
});
|
||||||
|
}
|
||||||
|
|
||||||
|
final BiConsumer<Record, GdbElement> ret = (r, e) -> {
|
||||||
|
final String label = forStrColumn(numPattern, rule.getLabel()).apply(r);
|
||||||
|
String id = forStrColumn(numPattern, rule.getId()).apply(r);
|
||||||
|
|
||||||
|
if (rule.getImportType() == Key.ImportType.EDGE) {
|
||||||
|
final String to = forStrColumn(numPattern, rule.getTo()).apply(r);
|
||||||
|
final String from = forStrColumn(numPattern, rule.getFrom()).apply(r);
|
||||||
|
if (to == null || from == null) {
|
||||||
|
log.error("invalid record to: {} , from: {}", to, from);
|
||||||
|
throw new IllegalArgumentException("to or from missed in edge");
|
||||||
|
}
|
||||||
|
((GdbEdge)e).setTo(to);
|
||||||
|
((GdbEdge)e).setFrom(from);
|
||||||
|
|
||||||
|
// generate UUID for edge
|
||||||
|
if (id == null) {
|
||||||
|
id = UUID.randomUUID().toString();
|
||||||
|
}
|
||||||
|
}
|
||||||
|
|
||||||
|
if (id == null || label == null) {
|
||||||
|
log.error("invalid record id: {} , label: {}", id, label);
|
||||||
|
throw new IllegalArgumentException("id or label missed");
|
||||||
|
}
|
||||||
|
|
||||||
|
e.setId(id);
|
||||||
|
e.setLabel(label);
|
||||||
|
|
||||||
|
properties.forEach(p -> p.accept(r, e));
|
||||||
};
|
};
|
||||||
}
|
return ret;
|
||||||
|
}
|
||||||
|
|
||||||
private static BiConsumer<Record, GdbElement> forElement(MappingRule rule) {
|
private static Function<Record, Object> forObjColumn(final boolean numPattern, final String rule, final ValueType type) {
|
||||||
List<BiConsumer<Record, GdbElement>> properties = new ArrayList<>();
|
final Pattern pattern = numPattern ? NORMAL_NUM_PATTERN : NORMAL_DOLLAR_PATTERN;
|
||||||
for (MappingRule.PropertyMappingRule propRule : rule.getProperties()) {
|
final Matcher m = pattern.matcher(rule);
|
||||||
Function<Record, String> keyFunc = forStrColumn(propRule.getKey());
|
if (m.matches()) {
|
||||||
|
final int index = Integer.valueOf(m.group(1));
|
||||||
|
return r -> type.applyColumn(r.getColumn(index));
|
||||||
|
} else {
|
||||||
|
return r -> type.fromStrFunc(rule);
|
||||||
|
}
|
||||||
|
}
|
||||||
|
|
||||||
if (propRule.getValueType() == ValueType.STRING) {
|
private static Function<Record, String> forStrColumn(final boolean numPattern, final String rule) {
|
||||||
final Function<Record, String> valueFunc = forStrColumn(propRule.getValue());
|
final List<BiConsumer<StringBuilder, Record>> list = new ArrayList<>();
|
||||||
properties.add((r, e) -> {
|
final Pattern pattern = numPattern ? STR_NUM_PATTERN : STR_DOLLAR_PATTERN;
|
||||||
String k = keyFunc.apply(r);
|
final Matcher m = pattern.matcher(rule);
|
||||||
String v = valueFunc.apply(r);
|
int last = 0;
|
||||||
if (k != null && v != null) {
|
while (m.find()) {
|
||||||
e.getProperties().put(k, v);
|
final String index = m.group(1);
|
||||||
}
|
// as simple integer index.
|
||||||
});
|
final int i = Integer.parseInt(index);
|
||||||
} else {
|
|
||||||
final Function<Record, Object> valueFunc = forObjColumn(propRule.getValue(), propRule.getValueType());
|
|
||||||
properties.add((r, e) -> {
|
|
||||||
String k = keyFunc.apply(r);
|
|
||||||
Object v = valueFunc.apply(r);
|
|
||||||
if (k != null && v != null) {
|
|
||||||
e.getProperties().put(k, v);
|
|
||||||
}
|
|
||||||
});
|
|
||||||
}
|
|
||||||
}
|
|
||||||
|
|
||||||
if (rule.getPropertiesJsonStr() != null) {
|
final int tmp = last;
|
||||||
Function<Record, String> jsonFunc = forStrColumn(rule.getPropertiesJsonStr());
|
final int start = m.start();
|
||||||
properties.add((r, e) -> {
|
list.add((sb, record) -> {
|
||||||
String propertiesStr = jsonFunc.apply(r);
|
sb.append(rule.subSequence(tmp, start));
|
||||||
JSONObject root = (JSONObject)JSONObject.parse(propertiesStr);
|
if (record.getColumn(i) != null && record.getColumn(i).getByteSize() > 0) {
|
||||||
JSONArray propertiesList = root.getJSONArray("properties");
|
sb.append(record.getColumn(i).asString());
|
||||||
|
}
|
||||||
|
});
|
||||||
|
|
||||||
for (Object object : propertiesList) {
|
last = m.end();
|
||||||
JSONObject jsonObject = (JSONObject)object;
|
}
|
||||||
String key = jsonObject.getString("k");
|
|
||||||
String name = jsonObject.getString("v");
|
|
||||||
String type = jsonObject.getString("t");
|
|
||||||
|
|
||||||
if (key == null || name == null) {
|
final int tmp = last;
|
||||||
continue;
|
list.add((sb, record) -> {
|
||||||
}
|
sb.append(rule.subSequence(tmp, rule.length()));
|
||||||
addToProperties(e, key, name, type);
|
});
|
||||||
}
|
|
||||||
});
|
|
||||||
}
|
|
||||||
|
|
||||||
BiConsumer<Record, GdbElement> ret = (r, e) -> {
|
return r -> {
|
||||||
String label = forStrColumn(rule.getLabel()).apply(r);
|
final StringBuilder sb = new StringBuilder();
|
||||||
String id = forStrColumn(rule.getId()).apply(r);
|
list.forEach(c -> c.accept(sb, r));
|
||||||
|
final String res = sb.toString();
|
||||||
|
return res.isEmpty() ? null : res;
|
||||||
|
};
|
||||||
|
}
|
||||||
|
|
||||||
if (rule.getImportType() == Key.ImportType.EDGE) {
|
private static boolean addToProperties(final GdbElement e, final String key, final String value, final String type, final String card) {
|
||||||
String to = forStrColumn(rule.getTo()).apply(r);
|
final Object pValue;
|
||||||
String from = forStrColumn(rule.getFrom()).apply(r);
|
final ValueType valueType = ValueType.fromShortName(type);
|
||||||
if (to == null || from == null) {
|
|
||||||
log.error("invalid record to: {} , from: {}", to, from);
|
|
||||||
throw new IllegalArgumentException("to or from missed in edge");
|
|
||||||
}
|
|
||||||
((GdbEdge)e).setTo(to);
|
|
||||||
((GdbEdge)e).setFrom(from);
|
|
||||||
|
|
||||||
// generate UUID for edge
|
if (valueType == ValueType.STRING) {
|
||||||
if (id == null) {
|
pValue = value;
|
||||||
id = UUID.randomUUID().toString();
|
} else if (valueType == ValueType.INT || valueType == ValueType.INTEGER) {
|
||||||
}
|
pValue = Integer.valueOf(value);
|
||||||
}
|
} else if (valueType == ValueType.LONG) {
|
||||||
|
pValue = Long.valueOf(value);
|
||||||
|
} else if (valueType == ValueType.DOUBLE) {
|
||||||
|
pValue = Double.valueOf(value);
|
||||||
|
} else if (valueType == ValueType.FLOAT) {
|
||||||
|
pValue = Float.valueOf(value);
|
||||||
|
} else if (valueType == ValueType.BOOLEAN) {
|
||||||
|
pValue = Boolean.valueOf(value);
|
||||||
|
} else {
|
||||||
|
log.error("invalid property key {}, value {}, type {}", key, value, type);
|
||||||
|
return false;
|
||||||
|
}
|
||||||
|
|
||||||
if (id == null || label == null) {
|
// apply vertexSetProperty
|
||||||
log.error("invalid record id: {} , label: {}", id, label);
|
if (Key.PropertyType.set.name().equals(card) && (e instanceof GdbVertex)) {
|
||||||
throw new IllegalArgumentException("id or label missed");
|
e.addProperty(key, pValue, Key.PropertyType.set);
|
||||||
}
|
} else {
|
||||||
|
e.addProperty(key, pValue);
|
||||||
|
}
|
||||||
|
return true;
|
||||||
|
}
|
||||||
|
|
||||||
e.setId(id);
|
@Override
|
||||||
e.setLabel(label);
|
public Function<Record, GdbElement> getMapper(final MappingRule rule) {
|
||||||
|
return r -> {
|
||||||
properties.forEach(p -> p.accept(r, e));
|
final GdbElement e = (rule.getImportType() == VERTEX) ? new GdbVertex() : new GdbEdge();
|
||||||
};
|
forElement(rule).accept(r, e);
|
||||||
return ret;
|
return e;
|
||||||
}
|
};
|
||||||
|
}
|
||||||
static Function<Record, Object> forObjColumn(String rule, ValueType type) {
|
|
||||||
Matcher m = NORMAL_PATTERN.matcher(rule);
|
|
||||||
if (m.matches()) {
|
|
||||||
int index = Integer.valueOf(m.group(1));
|
|
||||||
return r -> type.applyColumn(r.getColumn(index));
|
|
||||||
} else {
|
|
||||||
return r -> type.fromStrFunc(rule);
|
|
||||||
}
|
|
||||||
}
|
|
||||||
|
|
||||||
static Function<Record, String> forStrColumn(String rule) {
|
|
||||||
List<BiConsumer<StringBuilder, Record>> list = new ArrayList<>();
|
|
||||||
Matcher m = STR_PATTERN.matcher(rule);
|
|
||||||
int last = 0;
|
|
||||||
while (m.find()) {
|
|
||||||
String index = m.group(1);
|
|
||||||
// as simple integer index.
|
|
||||||
int i = Integer.parseInt(index);
|
|
||||||
|
|
||||||
final int tmp = last;
|
|
||||||
final int start = m.start();
|
|
||||||
list.add((sb, record) -> {
|
|
||||||
sb.append(rule.subSequence(tmp, start));
|
|
||||||
if(record.getColumn(i) != null && record.getColumn(i).getByteSize() > 0) {
|
|
||||||
sb.append(record.getColumn(i).asString());
|
|
||||||
}
|
|
||||||
});
|
|
||||||
|
|
||||||
last = m.end();
|
|
||||||
}
|
|
||||||
|
|
||||||
final int tmp = last;
|
|
||||||
list.add((sb, record) -> {
|
|
||||||
sb.append(rule.subSequence(tmp, rule.length()));
|
|
||||||
});
|
|
||||||
|
|
||||||
return r -> {
|
|
||||||
StringBuilder sb = new StringBuilder();
|
|
||||||
list.forEach(c -> c.accept(sb, r));
|
|
||||||
String res = sb.toString();
|
|
||||||
return res.isEmpty() ? null : res;
|
|
||||||
};
|
|
||||||
}
|
|
||||||
|
|
||||||
static boolean addToProperties(GdbElement e, String key, String value, String type) {
|
|
||||||
ValueType valueType = ValueType.fromShortName(type);
|
|
||||||
|
|
||||||
if(valueType == ValueType.STRING) {
|
|
||||||
e.getProperties().put(key, value);
|
|
||||||
} else if (valueType == ValueType.INT) {
|
|
||||||
e.getProperties().put(key, Integer.valueOf(value));
|
|
||||||
} else if (valueType == ValueType.LONG) {
|
|
||||||
e.getProperties().put(key, Long.valueOf(value));
|
|
||||||
} else if (valueType == ValueType.DOUBLE) {
|
|
||||||
e.getProperties().put(key, Double.valueOf(value));
|
|
||||||
} else if (valueType == ValueType.FLOAT) {
|
|
||||||
e.getProperties().put(key, Float.valueOf(value));
|
|
||||||
} else if (valueType == ValueType.BOOLEAN) {
|
|
||||||
e.getProperties().put(key, Boolean.valueOf(value));
|
|
||||||
} else {
|
|
||||||
log.error("invalid property key {}, value {}, type {}", key, value, type);
|
|
||||||
return false;
|
|
||||||
}
|
|
||||||
|
|
||||||
return true;
|
|
||||||
}
|
|
||||||
}
|
}
|
||||||
|
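forStrColumn() above splices record columns into a template: ${N} (or #{N} when the rule uses the numeric pattern) names the N-th column, and literal text between placeholders is kept verbatim. A standalone sketch of that substitution over a plain String[] record, without any DataX types:

    import java.util.regex.Matcher;
    import java.util.regex.Pattern;

    public class PlaceholderSketch {
        private static final Pattern DOLLAR = Pattern.compile("\\$\\{(\\d+)}");

        // Replaces each ${N} with columns[N]; literal text is kept as-is.
        static String render(final String template, final String[] columns) {
            final Matcher m = DOLLAR.matcher(template);
            final StringBuilder sb = new StringBuilder();
            int last = 0;
            while (m.find()) {
                sb.append(template, last, m.start());
                sb.append(columns[Integer.parseInt(m.group(1))]);
                last = m.end();
            }
            sb.append(template.substring(last));
            return sb.toString();
        }

        public static void main(final String[] args) {
            // e.g. label prefix plus raw id, as the writer does for idTransRule=labelPrefix
            System.out.println(render("person-${0}", new String[] {"1024", "Alice"})); // person-1024
        }
    }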
@@ -13,5 +13,5 @@ import com.alibaba.datax.plugin.writer.gdbwriter.model.GdbElement;
|
|||||||
*
|
*
|
||||||
*/
|
*/
|
||||||
public interface GdbMapper {
|
public interface GdbMapper {
|
||||||
Function<Record, GdbElement> getMapper(MappingRule rule);
|
Function<Record, GdbElement> getMapper(MappingRule rule);
|
||||||
}
|
}
|
||||||
|
@@ -0,0 +1,68 @@
|
|||||||
|
/*
|
||||||
|
* (C) 2019-present Alibaba Group Holding Limited.
|
||||||
|
*
|
||||||
|
* This program is free software; you can redistribute it and/or modify it under the terms of the GNU General Public
|
||||||
|
* License version 2 as published by the Free Software Foundation.
|
||||||
|
*/
|
||||||
|
package com.alibaba.datax.plugin.writer.gdbwriter.mapping;
|
||||||
|
|
||||||
|
import com.alibaba.datax.common.util.Configuration;
|
||||||
|
import com.alibaba.datax.plugin.writer.gdbwriter.Key;
|
||||||
|
import com.alibaba.datax.plugin.writer.gdbwriter.client.GdbWriterConfig;
|
||||||
|
|
||||||
|
/**
|
||||||
|
* @author : Liu Jianping
|
||||||
|
* @date : 2019/10/15
|
||||||
|
*/
|
||||||
|
|
||||||
|
public class MapperConfig {
|
||||||
|
private static MapperConfig instance = new MapperConfig();
|
||||||
|
private int maxIdLength;
|
||||||
|
private int maxLabelLength;
|
||||||
|
private int maxPropKeyLength;
|
||||||
|
private int maxPropValueLength;
|
||||||
|
|
||||||
|
private MapperConfig() {
|
||||||
|
this.maxIdLength = GdbWriterConfig.MAX_STRING_LENGTH;
|
||||||
|
this.maxLabelLength = GdbWriterConfig.MAX_STRING_LENGTH;
|
||||||
|
this.maxPropKeyLength = GdbWriterConfig.MAX_STRING_LENGTH;
|
||||||
|
this.maxPropValueLength = GdbWriterConfig.MAX_STRING_LENGTH;
|
||||||
|
}
|
||||||
|
|
||||||
|
public static MapperConfig getInstance() {
|
||||||
|
return instance;
|
||||||
|
}
|
||||||
|
|
||||||
|
public void updateConfig(final Configuration config) {
|
||||||
|
final int length = config.getInt(Key.MAX_GDB_STRING_LENGTH, GdbWriterConfig.MAX_STRING_LENGTH);
|
||||||
|
|
||||||
|
Integer sLength = config.getInt(Key.MAX_GDB_ID_LENGTH);
|
||||||
|
this.maxIdLength = sLength == null ? length : sLength;
|
||||||
|
|
||||||
|
sLength = config.getInt(Key.MAX_GDB_LABEL_LENGTH);
|
||||||
|
this.maxLabelLength = sLength == null ? length : sLength;
|
||||||
|
|
||||||
|
sLength = config.getInt(Key.MAX_GDB_PROP_KEY_LENGTH);
|
||||||
|
this.maxPropKeyLength = sLength == null ? length : sLength;
|
||||||
|
|
||||||
|
sLength = config.getInt(Key.MAX_GDB_PROP_VALUE_LENGTH);
|
||||||
|
this.maxPropValueLength = sLength == null ? length : sLength;
|
||||||
|
}
|
||||||
|
|
||||||
|
public int getMaxIdLength() {
|
||||||
|
return this.maxIdLength;
|
||||||
|
}
|
||||||
|
|
||||||
|
public int getMaxLabelLength() {
|
||||||
|
return this.maxLabelLength;
|
||||||
|
}
|
||||||
|
|
||||||
|
public int getMaxPropKeyLength() {
|
||||||
|
return this.maxPropKeyLength;
|
||||||
|
}
|
||||||
|
|
||||||
|
public int getMaxPropValueLength() {
|
||||||
|
return this.maxPropValueLength;
|
||||||
|
}
|
||||||
|
|
||||||
|
}
|
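updateConfig() above resolves each length limit with a simple precedence: the per-field key (e.g. maxIdStringLengthLimit) wins if present, otherwise the shared maxStringLengthLimit, otherwise the compiled-in 10240 default. A tiny sketch of that precedence on plain Integers:

    public class LimitPrecedenceSketch {
        // field-specific override beats the shared default, which beats the built-in constant
        static int resolve(final Integer fieldLimit, final Integer sharedLimit, final int builtIn) {
            final int shared = (sharedLimit == null) ? builtIn : sharedLimit;
            return (fieldLimit == null) ? shared : fieldLimit;
        }

        public static void main(final String[] args) {
            System.out.println(resolve(null, null, 10240)); // 10240 (built-in MAX_STRING_LENGTH)
            System.out.println(resolve(null, 4096, 10240)); // 4096  (maxStringLengthLimit)
            System.out.println(resolve(128, 4096, 10240));  // 128   (e.g. maxIdStringLengthLimit)
        }
    }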
@@ -7,6 +7,7 @@ import java.util.ArrayList;
|
|||||||
import java.util.List;
|
import java.util.List;
|
||||||
|
|
||||||
import com.alibaba.datax.plugin.writer.gdbwriter.Key.ImportType;
|
import com.alibaba.datax.plugin.writer.gdbwriter.Key.ImportType;
|
||||||
|
import com.alibaba.datax.plugin.writer.gdbwriter.Key.PropertyType;
|
||||||
|
|
||||||
import lombok.Data;
|
import lombok.Data;
|
||||||
|
|
||||||
@@ -16,26 +17,30 @@ import lombok.Data;
|
|||||||
*/
|
*/
|
||||||
@Data
|
@Data
|
||||||
public class MappingRule {
|
public class MappingRule {
|
||||||
private String id = null;
|
private String id = null;
|
||||||
|
|
||||||
private String label = null;
|
private String label = null;
|
||||||
|
|
||||||
private ImportType importType = null;
|
|
||||||
|
|
||||||
private String from = null;
|
|
||||||
|
|
||||||
private String to = null;
|
private ImportType importType = null;
|
||||||
|
|
||||||
private List<PropertyMappingRule> properties = new ArrayList<>();
|
private String from = null;
|
||||||
|
|
||||||
private String propertiesJsonStr = null;
|
private String to = null;
|
||||||
|
|
||||||
@Data
|
private List<PropertyMappingRule> properties = new ArrayList<>();
|
||||||
public static class PropertyMappingRule {
|
|
||||||
private String key = null;
|
private String propertiesJsonStr = null;
|
||||||
|
|
||||||
private String value = null;
|
private boolean numPattern = false;
|
||||||
|
|
||||||
private ValueType valueType = null;
|
@Data
|
||||||
}
|
public static class PropertyMappingRule {
|
||||||
|
private String key = null;
|
||||||
|
|
||||||
|
private String value = null;
|
||||||
|
|
||||||
|
private ValueType valueType = null;
|
||||||
|
|
||||||
|
private PropertyType pType = PropertyType.single;
|
||||||
|
}
|
||||||
}
|
}
|
||||||
|
@@ -3,18 +3,21 @@
|
|||||||
*/
|
*/
|
||||||
package com.alibaba.datax.plugin.writer.gdbwriter.mapping;
|
package com.alibaba.datax.plugin.writer.gdbwriter.mapping;
|
||||||
|
|
||||||
|
import java.util.List;
|
||||||
|
import java.util.regex.Matcher;
|
||||||
|
import java.util.regex.Pattern;
|
||||||
|
|
||||||
import com.alibaba.datax.common.exception.DataXException;
|
import com.alibaba.datax.common.exception.DataXException;
|
||||||
import com.alibaba.datax.common.util.Configuration;
|
import com.alibaba.datax.common.util.Configuration;
|
||||||
import com.alibaba.datax.plugin.writer.gdbwriter.GdbWriterErrorCode;
|
import com.alibaba.datax.plugin.writer.gdbwriter.GdbWriterErrorCode;
|
||||||
import com.alibaba.datax.plugin.writer.gdbwriter.Key;
|
import com.alibaba.datax.plugin.writer.gdbwriter.Key;
|
||||||
import com.alibaba.datax.plugin.writer.gdbwriter.Key.ImportType;
|
|
||||||
import com.alibaba.datax.plugin.writer.gdbwriter.Key.IdTransRule;
|
|
||||||
import com.alibaba.datax.plugin.writer.gdbwriter.Key.ColumnType;
|
import com.alibaba.datax.plugin.writer.gdbwriter.Key.ColumnType;
|
||||||
|
import com.alibaba.datax.plugin.writer.gdbwriter.Key.IdTransRule;
|
||||||
|
import com.alibaba.datax.plugin.writer.gdbwriter.Key.ImportType;
|
||||||
import com.alibaba.datax.plugin.writer.gdbwriter.mapping.MappingRule.PropertyMappingRule;
|
import com.alibaba.datax.plugin.writer.gdbwriter.mapping.MappingRule.PropertyMappingRule;
|
||||||
import com.alibaba.datax.plugin.writer.gdbwriter.util.ConfigHelper;
|
import com.alibaba.datax.plugin.writer.gdbwriter.util.ConfigHelper;
|
||||||
import lombok.extern.slf4j.Slf4j;
|
|
||||||
|
|
||||||
import java.util.List;
|
import lombok.extern.slf4j.Slf4j;
|
||||||
|
|
||||||
/**
|
/**
|
||||||
* @author jerrywang
|
* @author jerrywang
|
||||||
@@ -22,66 +25,94 @@ import java.util.List;
|
|||||||
*/
|
*/
|
||||||
@Slf4j
|
@Slf4j
|
||||||
public class MappingRuleFactory {
|
public class MappingRuleFactory {
|
||||||
private static final MappingRuleFactory instance = new MappingRuleFactory();
|
private static final MappingRuleFactory instance = new MappingRuleFactory();
|
||||||
|
private static final Pattern STR_PATTERN = Pattern.compile("\\$\\{(\\d+)}");
|
||||||
public static final MappingRuleFactory getInstance() {
|
private static final Pattern STR_NUM_PATTERN = Pattern.compile("#\\{(\\d+)}");
|
||||||
return instance;
|
|
||||||
}
|
|
||||||
|
|
||||||
@Deprecated
|
public static MappingRuleFactory getInstance() {
|
||||||
public MappingRule create(Configuration config, ImportType type) {
|
return instance;
|
||||||
MappingRule rule = new MappingRule();
|
|
||||||
rule.setId(config.getString(Key.ID));
|
|
||||||
rule.setLabel(config.getString(Key.LABEL));
|
|
||||||
if (type == ImportType.EDGE) {
|
|
||||||
rule.setFrom(config.getString(Key.FROM));
|
|
||||||
rule.setTo(config.getString(Key.TO));
|
|
||||||
}
|
|
||||||
|
|
||||||
rule.setImportType(type);
|
|
||||||
|
|
||||||
List<Configuration> configurations = config.getListConfiguration(Key.PROPERTIES);
|
|
||||||
if (configurations != null) {
|
|
||||||
for (Configuration prop : config.getListConfiguration(Key.PROPERTIES)) {
|
|
||||||
PropertyMappingRule propRule = new PropertyMappingRule();
|
|
||||||
propRule.setKey(prop.getString(Key.PROP_KEY));
|
|
||||||
propRule.setValue(prop.getString(Key.PROP_VALUE));
|
|
||||||
propRule.setValueType(ValueType.fromShortName(prop.getString(Key.PROP_TYPE).toLowerCase()));
|
|
||||||
rule.getProperties().add(propRule);
|
|
||||||
}
|
|
||||||
}
|
|
||||||
|
|
||||||
String propertiesJsonStr = config.getString(Key.PROPERTIES_JSON_STR, null);
|
|
||||||
if (propertiesJsonStr != null) {
|
|
||||||
rule.setPropertiesJsonStr(propertiesJsonStr);
|
|
||||||
}
|
|
||||||
|
|
||||||
return rule;
|
|
||||||
}
|
|
||||||
|
|
||||||
public MappingRule createV2(Configuration config) {
|
|
||||||
try {
|
|
||||||
ImportType type = ImportType.valueOf(config.getString(Key.IMPORT_TYPE));
|
|
||||||
return createV2(config, type);
|
|
||||||
} catch (NullPointerException e) {
|
|
||||||
throw DataXException.asDataXException(GdbWriterErrorCode.CONFIG_ITEM_MISS, Key.IMPORT_TYPE);
|
|
||||||
} catch (IllegalArgumentException e) {
|
|
||||||
throw DataXException.asDataXException(GdbWriterErrorCode.BAD_CONFIG_VALUE, Key.IMPORT_TYPE);
|
|
||||||
}
|
|
||||||
}
|
}
|
||||||
|
|
||||||
public MappingRule createV2(Configuration config, ImportType type) {
|
private static boolean isPattern(final String value, final MappingRule rule, final boolean checked) {
|
||||||
MappingRule rule = new MappingRule();
|
if (checked) {
|
||||||
|
return true;
|
||||||
|
}
|
||||||
|
|
||||||
ConfigHelper.assertHasContent(config, Key.LABEL);
|
if (value == null || value.isEmpty()) {
|
||||||
rule.setLabel(config.getString(Key.LABEL));
|
return false;
|
||||||
rule.setImportType(type);
|
}
|
||||||
|
|
||||||
IdTransRule srcTransRule = IdTransRule.none;
|
Matcher m = STR_PATTERN.matcher(value);
|
||||||
|
if (m.find()) {
|
||||||
|
rule.setNumPattern(false);
|
||||||
|
return true;
|
||||||
|
}
|
||||||
|
|
||||||
|
m = STR_NUM_PATTERN.matcher(value);
|
||||||
|
if (m.find()) {
|
||||||
|
rule.setNumPattern(true);
|
||||||
|
return true;
|
||||||
|
}
|
||||||
|
|
||||||
|
return false;
|
||||||
|
}
|
||||||
|
|
||||||
|
@Deprecated
|
||||||
|
public MappingRule create(final Configuration config, final ImportType type) {
|
||||||
|
final MappingRule rule = new MappingRule();
|
||||||
|
rule.setId(config.getString(Key.ID));
|
||||||
|
rule.setLabel(config.getString(Key.LABEL));
|
||||||
|
if (type == ImportType.EDGE) {
|
||||||
|
rule.setFrom(config.getString(Key.FROM));
|
||||||
|
rule.setTo(config.getString(Key.TO));
|
||||||
|
}
|
||||||
|
|
||||||
|
rule.setImportType(type);
|
||||||
|
|
||||||
|
final List<Configuration> configurations = config.getListConfiguration(Key.PROPERTIES);
|
||||||
|
if (configurations != null) {
|
||||||
|
for (final Configuration prop : config.getListConfiguration(Key.PROPERTIES)) {
|
||||||
|
final PropertyMappingRule propRule = new PropertyMappingRule();
|
||||||
|
propRule.setKey(prop.getString(Key.PROP_KEY));
|
||||||
|
propRule.setValue(prop.getString(Key.PROP_VALUE));
|
||||||
|
propRule.setValueType(ValueType.fromShortName(prop.getString(Key.PROP_TYPE).toLowerCase()));
|
||||||
|
rule.getProperties().add(propRule);
|
||||||
|
}
|
||||||
|
}
|
||||||
|
|
||||||
|
final String propertiesJsonStr = config.getString(Key.PROPERTIES_JSON_STR, null);
|
||||||
|
if (propertiesJsonStr != null) {
|
||||||
|
rule.setPropertiesJsonStr(propertiesJsonStr);
|
||||||
|
}
|
||||||
|
|
||||||
|
return rule;
|
||||||
|
}
|
||||||
|
|
||||||
|
public MappingRule createV2(final Configuration config) {
|
||||||
|
try {
|
||||||
|
final ImportType type = ImportType.valueOf(config.getString(Key.IMPORT_TYPE));
|
||||||
|
return createV2(config, type);
|
||||||
|
} catch (final NullPointerException e) {
|
||||||
|
throw DataXException.asDataXException(GdbWriterErrorCode.CONFIG_ITEM_MISS, Key.IMPORT_TYPE);
|
||||||
|
} catch (final IllegalArgumentException e) {
|
||||||
|
throw DataXException.asDataXException(GdbWriterErrorCode.BAD_CONFIG_VALUE, Key.IMPORT_TYPE);
|
||||||
|
}
|
||||||
|
}
|
||||||
|
|
||||||
|
public MappingRule createV2(final Configuration config, final ImportType type) {
|
||||||
|
final MappingRule rule = new MappingRule();
|
||||||
|
boolean patternChecked = false;
|
||||||
|
|
||||||
|
ConfigHelper.assertHasContent(config, Key.LABEL);
|
||||||
|
rule.setLabel(config.getString(Key.LABEL));
|
||||||
|
rule.setImportType(type);
|
||||||
|
patternChecked = isPattern(rule.getLabel(), rule, patternChecked);
|
||||||
|
|
||||||
|
IdTransRule srcTransRule = IdTransRule.none;
|
||||||
IdTransRule dstTransRule = IdTransRule.none;
|
IdTransRule dstTransRule = IdTransRule.none;
|
||||||
if (type == ImportType.EDGE) {
|
if (type == ImportType.EDGE) {
|
||||||
ConfigHelper.assertHasContent(config, Key.SRC_ID_TRANS_RULE);
|
ConfigHelper.assertHasContent(config, Key.SRC_ID_TRANS_RULE);
|
||||||
ConfigHelper.assertHasContent(config, Key.DST_ID_TRANS_RULE);
|
ConfigHelper.assertHasContent(config, Key.DST_ID_TRANS_RULE);
|
||||||
|
|
||||||
srcTransRule = IdTransRule.valueOf(config.getString(Key.SRC_ID_TRANS_RULE));
|
srcTransRule = IdTransRule.valueOf(config.getString(Key.SRC_ID_TRANS_RULE));
|
||||||
dstTransRule = IdTransRule.valueOf(config.getString(Key.DST_ID_TRANS_RULE));
|
dstTransRule = IdTransRule.valueOf(config.getString(Key.DST_ID_TRANS_RULE));
|
||||||
@@ -94,88 +125,96 @@ public class MappingRuleFactory {
|
|||||||
ConfigHelper.assertHasContent(config, Key.DST_LABEL);
|
ConfigHelper.assertHasContent(config, Key.DST_LABEL);
|
||||||
}
|
}
|
||||||
}
|
}
|
||||||
ConfigHelper.assertHasContent(config, Key.ID_TRANS_RULE);
|
ConfigHelper.assertHasContent(config, Key.ID_TRANS_RULE);
|
||||||
IdTransRule transRule = IdTransRule.valueOf(config.getString(Key.ID_TRANS_RULE));
|
final IdTransRule transRule = IdTransRule.valueOf(config.getString(Key.ID_TRANS_RULE));
|
||||||
|
|
||||||
List<Configuration> configurationList = config.getListConfiguration(Key.COLUMN);
|
final List<Configuration> configurationList = config.getListConfiguration(Key.COLUMN);
|
||||||
ConfigHelper.assertConfig(Key.COLUMN, () -> (configurationList != null && !configurationList.isEmpty()));
|
ConfigHelper.assertConfig(Key.COLUMN, () -> (configurationList != null && !configurationList.isEmpty()));
|
||||||
for (Configuration column : configurationList) {
|
for (final Configuration column : configurationList) {
|
||||||
ConfigHelper.assertHasContent(column, Key.COLUMN_NAME);
|
ConfigHelper.assertHasContent(column, Key.COLUMN_NAME);
|
||||||
ConfigHelper.assertHasContent(column, Key.COLUMN_VALUE);
|
ConfigHelper.assertHasContent(column, Key.COLUMN_VALUE);
|
||||||
ConfigHelper.assertHasContent(column, Key.COLUMN_TYPE);
|
ConfigHelper.assertHasContent(column, Key.COLUMN_TYPE);
|
||||||
ConfigHelper.assertHasContent(column, Key.COLUMN_NODE_TYPE);
|
ConfigHelper.assertHasContent(column, Key.COLUMN_NODE_TYPE);
|
||||||
|
|
||||||
String columnValue = column.getString(Key.COLUMN_VALUE);
|
final String columnValue = column.getString(Key.COLUMN_VALUE);
|
||||||
ColumnType columnType = ColumnType.valueOf(column.getString(Key.COLUMN_NODE_TYPE));
|
final ColumnType columnType = ColumnType.valueOf(column.getString(Key.COLUMN_NODE_TYPE));
|
||||||
if (columnValue == null || columnValue.isEmpty()) {
|
if (columnValue == null || columnValue.isEmpty()) {
|
||||||
// only allow edge empty id
|
// only allow edge empty id
|
||||||
ConfigHelper.assertConfig("empty column value",
|
ConfigHelper.assertConfig("empty column value",
|
||||||
() -> (type == ImportType.EDGE && columnType == ColumnType.primaryKey));
|
() -> (type == ImportType.EDGE && columnType == ColumnType.primaryKey));
|
||||||
}
|
}
|
||||||
|
patternChecked = isPattern(columnValue, rule, patternChecked);
|
||||||
|
|
||||||
if (columnType == ColumnType.primaryKey) {
|
if (columnType == ColumnType.primaryKey) {
|
||||||
ValueType propType = ValueType.fromShortName(column.getString(Key.COLUMN_TYPE));
|
final ValueType propType = ValueType.fromShortName(column.getString(Key.COLUMN_TYPE));
|
||||||
ConfigHelper.assertConfig("only string is allowed in primary key", () -> (propType == ValueType.STRING));
|
ConfigHelper.assertConfig("only string is allowed in primary key",
|
||||||
|
() -> (propType == ValueType.STRING));
|
||||||
|
|
||||||
if (transRule == IdTransRule.labelPrefix) {
|
if (transRule == IdTransRule.labelPrefix) {
|
||||||
rule.setId(config.getString(Key.LABEL) + columnValue);
|
rule.setId(config.getString(Key.LABEL) + columnValue);
|
||||||
} else {
|
|
||||||
rule.setId(columnValue);
|
|
||||||
}
|
|
||||||
} else if (columnType == ColumnType.edgeJsonProperty || columnType == ColumnType.vertexJsonProperty) {
|
|
||||||
// only support one json property in column
|
|
||||||
ConfigHelper.assertConfig("multi JsonProperty", () -> (rule.getPropertiesJsonStr() == null));
|
|
||||||
|
|
||||||
rule.setPropertiesJsonStr(columnValue);
|
|
||||||
} else if (columnType == ColumnType.vertexProperty || columnType == ColumnType.edgeProperty) {
|
|
||||||
PropertyMappingRule propertyMappingRule = new PropertyMappingRule();
|
|
||||||
|
|
||||||
propertyMappingRule.setKey(column.getString(Key.COLUMN_NAME));
|
|
||||||
propertyMappingRule.setValue(columnValue);
|
|
||||||
ValueType propType = ValueType.fromShortName(column.getString(Key.COLUMN_TYPE));
|
|
||||||
ConfigHelper.assertConfig("unsupported property type", () -> propType != null);
|
|
||||||
|
|
||||||
propertyMappingRule.setValueType(propType);
|
|
||||||
rule.getProperties().add(propertyMappingRule);
|
|
||||||
} else if (columnType == ColumnType.srcPrimaryKey) {
|
|
||||||
if (type != ImportType.EDGE) {
|
|
||||||
continue;
|
|
||||||
}
|
|
||||||
|
|
||||||
ValueType propType = ValueType.fromShortName(column.getString(Key.COLUMN_TYPE));
|
|
||||||
ConfigHelper.assertConfig("only string is allowed in primary key", () -> (propType == ValueType.STRING));
|
|
||||||
|
|
||||||
if (srcTransRule == IdTransRule.labelPrefix) {
|
|
||||||
rule.setFrom(config.getString(Key.SRC_LABEL) + columnValue);
|
|
||||||
} else {
|
} else {
|
||||||
rule.setFrom(columnValue);
|
rule.setId(columnValue);
|
||||||
}
|
}
|
||||||
                } else if (columnType == ColumnType.edgeJsonProperty || columnType == ColumnType.vertexJsonProperty) {
                    // only support one json property in column
                    ConfigHelper.assertConfig("multi JsonProperty", () -> (rule.getPropertiesJsonStr() == null));

                    rule.setPropertiesJsonStr(columnValue);
                } else if (columnType == ColumnType.vertexProperty || columnType == ColumnType.edgeProperty
                    || columnType == ColumnType.vertexSetProperty) {
                    final PropertyMappingRule propertyMappingRule = new PropertyMappingRule();

                    propertyMappingRule.setKey(column.getString(Key.COLUMN_NAME));
                    propertyMappingRule.setValue(columnValue);
                    final ValueType propType = ValueType.fromShortName(column.getString(Key.COLUMN_TYPE));
                    ConfigHelper.assertConfig("unsupported property type", () -> propType != null);

                    if (columnType == ColumnType.vertexSetProperty) {
                        propertyMappingRule.setPType(Key.PropertyType.set);
                    }
                    propertyMappingRule.setValueType(propType);
                    rule.getProperties().add(propertyMappingRule);
                } else if (columnType == ColumnType.srcPrimaryKey) {
                    if (type != ImportType.EDGE) {
                        continue;
                    }

                    final ValueType propType = ValueType.fromShortName(column.getString(Key.COLUMN_TYPE));
                    ConfigHelper.assertConfig("only string is allowed in primary key",
                        () -> (propType == ValueType.STRING));

                    if (srcTransRule == IdTransRule.labelPrefix) {
                        rule.setFrom(config.getString(Key.SRC_LABEL) + columnValue);
                    } else {
                        rule.setFrom(columnValue);
                    }
                } else if (columnType == ColumnType.dstPrimaryKey) {
                    if (type != ImportType.EDGE) {
                        continue;
                    }

                    final ValueType propType = ValueType.fromShortName(column.getString(Key.COLUMN_TYPE));
                    ConfigHelper.assertConfig("only string is allowed in primary key",
                        () -> (propType == ValueType.STRING));

                    if (dstTransRule == IdTransRule.labelPrefix) {
                        rule.setTo(config.getString(Key.DST_LABEL) + columnValue);
                    } else {
                        rule.setTo(columnValue);
                    }
                }
            }
        }

        if (rule.getImportType() == ImportType.EDGE) {
            if (rule.getId() == null) {
                rule.setId("");
                log.info("edge id is missed, uuid be default");
            }
            ConfigHelper.assertConfig("to needed in edge", () -> (rule.getTo() != null));
            ConfigHelper.assertConfig("from needed in edge", () -> (rule.getFrom() != null));
        }
        ConfigHelper.assertConfig("id needed", () -> (rule.getId() != null));

        return rule;
    }
}

@@ -8,6 +8,7 @@ import java.util.Map;
import java.util.function.Function;

import com.alibaba.datax.common.element.Column;

import lombok.extern.slf4j.Slf4j;

/**
@@ -16,56 +17,61 @@ import lombok.extern.slf4j.Slf4j;
 */
@Slf4j
public enum ValueType {
    /**
     * property value type
     */
    INT(Integer.class, "int", Column::asLong, Integer::valueOf),
    INTEGER(Integer.class, "integer", Column::asLong, Integer::valueOf),
    LONG(Long.class, "long", Column::asLong, Long::valueOf),
    DOUBLE(Double.class, "double", Column::asDouble, Double::valueOf),
    FLOAT(Float.class, "float", Column::asDouble, Float::valueOf),
    BOOLEAN(Boolean.class, "boolean", Column::asBoolean, Boolean::valueOf),
    STRING(String.class, "string", Column::asString, String::valueOf);

    private Class<?> type = null;
    private String shortName = null;
    private Function<Column, Object> columnFunc = null;
    private Function<String, Object> fromStrFunc = null;

    private ValueType(final Class<?> type, final String name, final Function<Column, Object> columnFunc,
        final Function<String, Object> fromStrFunc) {
        this.type = type;
        this.shortName = name;
        this.columnFunc = columnFunc;
        this.fromStrFunc = fromStrFunc;

        ValueTypeHolder.shortName2type.put(name, this);
    }

    public static ValueType fromShortName(final String name) {
        return ValueTypeHolder.shortName2type.get(name);
    }

    public Class<?> type() {
        return this.type;
    }

    public String shortName() {
        return this.shortName;
    }

    public Object applyColumn(final Column column) {
        try {
            if (column == null) {
                return null;
            }
            return this.columnFunc.apply(column);
        } catch (final Exception e) {
            log.error("applyColumn error {}, column {}", e.toString(), column);
            throw e;
        }
    }

    public Object fromStrFunc(final String str) {
        return this.fromStrFunc.apply(str);
    }

    private static class ValueTypeHolder {
        private static Map<String, ValueType> shortName2type = new HashMap<>();
    }
}
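A minimal sketch of how the enum above is typically driven from a column mapping: the configured short name ("long", "string", ...) selects a ValueType, and applyColumn converts a DataX Column into the property value written to GDB. The column values below are invented for illustration, and the sketch assumes it sits in the same package as ValueType.

```java
import com.alibaba.datax.common.element.LongColumn;
import com.alibaba.datax.common.element.StringColumn;

public class ValueTypeSketch {
    public static void main(String[] args) {
        // "long" is what a user would put into the writer's column "type" field
        final ValueType longType = ValueType.fromShortName("long");
        final Object age = longType.applyColumn(new LongColumn(42L));        // -> Long 42

        final ValueType stringType = ValueType.fromShortName("string");
        final Object name = stringType.applyColumn(new StringColumn("tom")); // -> "tom"

        System.out.println(age + " / " + name);
    }
}
```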
@@ -3,20 +3,24 @@
 */
package com.alibaba.datax.plugin.writer.gdbwriter.model;

import static com.alibaba.datax.plugin.writer.gdbwriter.client.GdbWriterConfig.DEFAULT_BATCH_PROPERTY_NUM;
import static com.alibaba.datax.plugin.writer.gdbwriter.client.GdbWriterConfig.MAX_REQUEST_LENGTH;

import java.util.Map;
import java.util.UUID;
import java.util.concurrent.TimeUnit;

import org.apache.tinkerpop.gremlin.driver.Client;
import org.apache.tinkerpop.gremlin.driver.Cluster;
import org.apache.tinkerpop.gremlin.driver.RequestOptions;
import org.apache.tinkerpop.gremlin.driver.ResultSet;
import org.apache.tinkerpop.gremlin.driver.ser.Serializers;

import com.alibaba.datax.common.util.Configuration;
import com.alibaba.datax.plugin.writer.gdbwriter.Key;
import com.alibaba.datax.plugin.writer.gdbwriter.client.GdbWriterConfig;

import lombok.extern.slf4j.Slf4j;

/**
 * @author jerrywang
@@ -24,128 +28,124 @@ import java.util.concurrent.TimeUnit;
 */
@Slf4j
public abstract class AbstractGdbGraph implements GdbGraph {
    private final static int DEFAULT_TIMEOUT = 30000;

    protected Client client = null;
    protected Key.UpdateMode updateMode = Key.UpdateMode.INSERT;
    protected int propertiesBatchNum = DEFAULT_BATCH_PROPERTY_NUM;
    protected boolean session = false;
    protected int maxRequestLength = MAX_REQUEST_LENGTH;

    protected AbstractGdbGraph() {}

    protected AbstractGdbGraph(final Configuration config, final boolean session) {
        initClient(config, session);
    }

    protected void initClient(final Configuration config, final boolean session) {
        this.updateMode = Key.UpdateMode.valueOf(config.getString(Key.UPDATE_MODE, "INSERT"));
        log.info("init graphdb client");
        final String host = config.getString(Key.HOST);
        final int port = config.getInt(Key.PORT);
        final String username = config.getString(Key.USERNAME);
        final String password = config.getString(Key.PASSWORD);
        int maxDepthPerConnection =
            config.getInt(Key.MAX_IN_PROCESS_PER_CONNECTION, GdbWriterConfig.DEFAULT_MAX_IN_PROCESS_PER_CONNECTION);
        int maxConnectionPoolSize =
            config.getInt(Key.MAX_CONNECTION_POOL_SIZE, GdbWriterConfig.DEFAULT_MAX_CONNECTION_POOL_SIZE);
        int maxSimultaneousUsagePerConnection = config.getInt(Key.MAX_SIMULTANEOUS_USAGE_PER_CONNECTION,
            GdbWriterConfig.DEFAULT_MAX_SIMULTANEOUS_USAGE_PER_CONNECTION);

        this.session = session;
        if (this.session) {
            maxConnectionPoolSize = GdbWriterConfig.DEFAULT_MAX_CONNECTION_POOL_SIZE;
            maxDepthPerConnection = GdbWriterConfig.DEFAULT_MAX_IN_PROCESS_PER_CONNECTION;
            maxSimultaneousUsagePerConnection = GdbWriterConfig.DEFAULT_MAX_SIMULTANEOUS_USAGE_PER_CONNECTION;
        }

        try {
            final Cluster cluster = Cluster.build(host).port(port).credentials(username, password)
                .serializer(Serializers.GRAPHBINARY_V1D0).maxContentLength(1048576)
                .maxInProcessPerConnection(maxDepthPerConnection).minInProcessPerConnection(0)
                .maxConnectionPoolSize(maxConnectionPoolSize).minConnectionPoolSize(maxConnectionPoolSize)
                .maxSimultaneousUsagePerConnection(maxSimultaneousUsagePerConnection).resultIterationBatchSize(64)
                .create();
            this.client = session ? cluster.connect(UUID.randomUUID().toString()).init() : cluster.connect().init();
            warmClient(maxConnectionPoolSize * maxDepthPerConnection);
        } catch (final RuntimeException e) {
            log.error("Failed to connect to GDB {}:{}, due to {}", host, port, e);
            throw e;
        }

        this.propertiesBatchNum = config.getInt(Key.MAX_PROPERTIES_BATCH_NUM, DEFAULT_BATCH_PROPERTY_NUM);
        this.maxRequestLength = config.getInt(Key.MAX_GDB_REQUEST_LENGTH, MAX_REQUEST_LENGTH);
    }

    /**
     * @param dsl
     * @param parameters
     */
    protected void runInternal(final String dsl, final Map<String, Object> parameters) throws Exception {
        final RequestOptions.Builder options = RequestOptions.build().timeout(DEFAULT_TIMEOUT);
        if (parameters != null && !parameters.isEmpty()) {
            parameters.forEach(options::addParameter);
        }

        final ResultSet results = this.client.submitAsync(dsl, options.create()).get(DEFAULT_TIMEOUT, TimeUnit.MILLISECONDS);
        results.all().get(DEFAULT_TIMEOUT + 1000, TimeUnit.MILLISECONDS);
    }

    void beginTx() {
        if (!this.session) {
            return;
        }

        final String dsl = "g.tx().open()";
        this.client.submit(dsl).all().join();
    }

    void doCommit() {
        if (!this.session) {
            return;
        }

        try {
            final String dsl = "g.tx().commit()";
            this.client.submit(dsl).all().join();
        } catch (final Exception e) {
            throw new RuntimeException(e);
        }
    }

    void doRollback() {
        if (!this.session) {
            return;
        }

        final String dsl = "g.tx().rollback()";
        this.client.submit(dsl).all().join();
    }

    private void warmClient(final int num) {
        try {
            beginTx();
            runInternal("g.V('test')", null);
            doCommit();
            log.info("warm graphdb client over");
        } catch (final Exception e) {
            log.error("warmClient error");
            doRollback();
            throw new RuntimeException(e);
        }
    }

    @Override
    public void close() {
        if (this.client != null) {
            log.info("close graphdb client");
            this.client.close();
        }
    }
}
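A minimal, self-contained sketch of the TinkerPop driver calls the class above builds on; the endpoint, port and credentials are placeholders, not values from the patch. A Cluster is built once, a Client is connected from it, and scripts are submitted with bound parameters rather than string concatenation, which is the same pattern runInternal() uses.

```java
import java.util.HashMap;
import java.util.Map;

import org.apache.tinkerpop.gremlin.driver.Client;
import org.apache.tinkerpop.gremlin.driver.Cluster;

public class GremlinClientSketch {
    public static void main(String[] args) {
        final Cluster cluster = Cluster.build("gdb-endpoint").port(8182)
            .credentials("user", "password").create();
        final Client client = cluster.connect().init();
        try {
            // Parameters travel alongside the script as bindings.
            final Map<String, Object> params = new HashMap<>();
            params.put("vid", "player-1");
            client.submit("g.V(vid).valueMap(true)", params).all().join()
                .forEach(r -> System.out.println(r.getObject()));
        } finally {
            client.close();
            cluster.close();
        }
    }
}
```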
@@ -3,7 +3,8 @@
 */
package com.alibaba.datax.plugin.writer.gdbwriter.model;

import com.alibaba.datax.plugin.writer.gdbwriter.mapping.MapperConfig;

import lombok.EqualsAndHashCode;
import lombok.ToString;

@@ -11,10 +12,33 @@ import lombok.ToString;
 * @author jerrywang
 *
 */
@EqualsAndHashCode(callSuper = true)
@ToString(callSuper = true)
public class GdbEdge extends GdbElement {
    private String from = null;
    private String to = null;

    public String getFrom() {
        return this.from;
    }

    public void setFrom(final String from) {
        final int maxIdLength = MapperConfig.getInstance().getMaxIdLength();
        if (from.length() > maxIdLength) {
            throw new IllegalArgumentException("from length over limit(" + maxIdLength + ")");
        }
        this.from = from;
    }

    public String getTo() {
        return this.to;
    }

    public void setTo(final String to) {
        final int maxIdLength = MapperConfig.getInstance().getMaxIdLength();
        if (to.length() > maxIdLength) {
            throw new IllegalArgumentException("to length over limit(" + maxIdLength + ")");
        }
        this.to = to;
    }
}
@@ -3,18 +3,107 @@
 */
package com.alibaba.datax.plugin.writer.gdbwriter.model;

import java.util.LinkedList;
import java.util.List;

import com.alibaba.datax.plugin.writer.gdbwriter.Key.PropertyType;
import com.alibaba.datax.plugin.writer.gdbwriter.mapping.MapperConfig;

/**
 * @author jerrywang
 *
 */
public class GdbElement {
    private String id = null;
    private String label = null;
    private List<GdbProperty> properties = new LinkedList<>();

    public String getId() {
        return this.id;
    }

    public void setId(final String id) {
        final int maxIdLength = MapperConfig.getInstance().getMaxIdLength();
        if (id.length() > maxIdLength) {
            throw new IllegalArgumentException("id length over limit(" + maxIdLength + ")");
        }
        this.id = id;
    }

    public String getLabel() {
        return this.label;
    }

    public void setLabel(final String label) {
        final int maxLabelLength = MapperConfig.getInstance().getMaxLabelLength();
        if (label.length() > maxLabelLength) {
            throw new IllegalArgumentException("label length over limit(" + maxLabelLength + ")");
        }
        this.label = label;
    }

    public List<GdbProperty> getProperties() {
        return this.properties;
    }

    public void addProperty(final String propKey, final Object propValue, final PropertyType card) {
        if (propKey == null || propValue == null) {
            return;
        }

        final int maxPropKeyLength = MapperConfig.getInstance().getMaxPropKeyLength();
        if (propKey.length() > maxPropKeyLength) {
            throw new IllegalArgumentException("property key length over limit(" + maxPropKeyLength + ")");
        }
        if (propValue instanceof String) {
            // compare the value against the value limit, not the key limit
            final int maxPropValueLength = MapperConfig.getInstance().getMaxPropValueLength();
            if (((String)propValue).length() > maxPropValueLength) {
                throw new IllegalArgumentException("property value length over limit(" + maxPropValueLength + ")");
            }
        }

        this.properties.add(new GdbProperty(propKey, propValue, card));
    }

    public void addProperty(final String propKey, final Object propValue) {
        addProperty(propKey, propValue, PropertyType.single);
    }

    @Override
    public String toString() {
        final StringBuffer sb = new StringBuffer(this.id + "[" + this.label + "]{");
        this.properties.forEach(n -> {
            sb.append(n.cardinality.name());
            sb.append("[");
            sb.append(n.key);
            sb.append(" - ");
            sb.append(String.valueOf(n.value));
            sb.append("]");
        });
        return sb.toString();
    }

    public static class GdbProperty {
        private String key;
        private Object value;
        private PropertyType cardinality;

        private GdbProperty(final String key, final Object value, final PropertyType card) {
            this.key = key;
            this.value = value;
            this.cardinality = card;
        }

        public PropertyType getCardinality() {
            return this.cardinality;
        }

        public String getKey() {
            return this.key;
        }

        public Object getValue() {
            return this.value;
        }
    }
}
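A short sketch of how the element model above is filled and printed; the ids, label and property values are invented for illustration.

```java
import com.alibaba.datax.plugin.writer.gdbwriter.Key;
import com.alibaba.datax.plugin.writer.gdbwriter.model.GdbElement;

public class GdbElementSketch {
    public static void main(String[] args) {
        final GdbElement v = new GdbElement();
        v.setId("player-1");
        v.setLabel("player");
        v.addProperty("name", "tom");                          // single cardinality by default
        v.addProperty("team", "lakers", Key.PropertyType.set); // set-cardinality property

        // prints something like: player-1[player]{single[name - tom]set[team - lakers]
        System.out.println(v);
    }
}
```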
@@ -3,18 +3,19 @@
 */
package com.alibaba.datax.plugin.writer.gdbwriter.model;

import java.util.List;

import com.alibaba.datax.common.element.Record;

import groovy.lang.Tuple2;

/**
 * @author jerrywang
 *
 */
public interface GdbGraph extends AutoCloseable {
    List<Tuple2<Record, Exception>> add(List<Tuple2<Record, GdbElement>> records);

    @Override
    void close();
}
@@ -3,15 +3,17 @@
 */
package com.alibaba.datax.plugin.writer.gdbwriter.model;

import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Random;

import com.alibaba.datax.common.element.Record;
import com.alibaba.datax.common.util.Configuration;
import com.alibaba.datax.plugin.writer.gdbwriter.Key;
import com.alibaba.datax.plugin.writer.gdbwriter.util.GdbDuplicateIdException;

import groovy.lang.Tuple2;
import lombok.extern.slf4j.Slf4j;

@@ -21,176 +23,198 @@ import lombok.extern.slf4j.Slf4j;
 */
@Slf4j
public class ScriptGdbGraph extends AbstractGdbGraph {
    private static final String VAR_PREFIX = "GDB___";
    private static final String VAR_ID = VAR_PREFIX + "id";
    private static final String VAR_LABEL = VAR_PREFIX + "label";
    private static final String VAR_FROM = VAR_PREFIX + "from";
    private static final String VAR_TO = VAR_PREFIX + "to";
    private static final String VAR_PROP_KEY = VAR_PREFIX + "PK";
    private static final String VAR_PROP_VALUE = VAR_PREFIX + "PV";
    private static final String ADD_V_START = "g.addV(" + VAR_LABEL + ").property(id, " + VAR_ID + ")";
    private static final String ADD_E_START =
        "g.addE(" + VAR_LABEL + ").property(id, " + VAR_ID + ").from(V(" + VAR_FROM + ")).to(V(" + VAR_TO + "))";

    private static final String UPDATE_V_START = "g.V(" + VAR_ID + ")";
    private static final String UPDATE_E_START = "g.E(" + VAR_ID + ")";

    private Random random;

    public ScriptGdbGraph() {
        this.random = new Random();
    }

    public ScriptGdbGraph(final Configuration config, final boolean session) {
        super(config, session);

        this.random = new Random();
        log.info("Init as ScriptGdbGraph.");
    }

    /**
     * Apply list of {@link GdbElement} to GDB, return the failed records
     *
     * @param records
     *            list of element to apply
     * @return
     */
    @Override
    public List<Tuple2<Record, Exception>> add(final List<Tuple2<Record, GdbElement>> records) {
        final List<Tuple2<Record, Exception>> errors = new ArrayList<>();
        try {
            beginTx();
            for (final Tuple2<Record, GdbElement> elementTuple2 : records) {
                try {
                    addInternal(elementTuple2.getSecond());
                } catch (final Exception e) {
                    errors.add(new Tuple2<>(elementTuple2.getFirst(), e));
                }
            }
            doCommit();
        } catch (final Exception ex) {
            doRollback();
            throw new RuntimeException(ex);
        }
        return errors;
    }

    private void addInternal(final GdbElement element) {
        try {
            addInternal(element, false);
        } catch (final GdbDuplicateIdException e) {
            if (this.updateMode == Key.UpdateMode.SKIP) {
                log.debug("Skip duplicate id {}", element.getId());
            } else if (this.updateMode == Key.UpdateMode.INSERT) {
                throw new RuntimeException(e);
            } else if (this.updateMode == Key.UpdateMode.MERGE) {
                if (element.getProperties().isEmpty()) {
                    return;
                }

                try {
                    addInternal(element, true);
                } catch (final GdbDuplicateIdException e1) {
                    log.error("duplicate id {} while update...", element.getId());
                    throw new RuntimeException(e1);
                }
            }
        }
    }

    private void addInternal(final GdbElement element, final boolean update) throws GdbDuplicateIdException {
        boolean firstAdd = !update;
        final boolean isVertex = (element instanceof GdbVertex);
        final List<GdbElement.GdbProperty> params = element.getProperties();
        final List<GdbElement.GdbProperty> subParams = new ArrayList<>(this.propertiesBatchNum);

        final int idLength = element.getId().length();
        int attachLength = element.getLabel().length();
        if (element instanceof GdbEdge) {
            attachLength += ((GdbEdge)element).getFrom().length();
            attachLength += ((GdbEdge)element).getTo().length();
        }

        int requestLength = idLength;
        for (final GdbElement.GdbProperty entry : params) {
            final String propKey = entry.getKey();
            final Object propValue = entry.getValue();

            int appendLength = propKey.length();
            if (propValue instanceof String) {
                appendLength += ((String)propValue).length();
            }

            if (checkSplitDsl(firstAdd, requestLength, attachLength, appendLength, subParams.size())) {
                setGraphDbElement(element, subParams, isVertex, firstAdd);
                firstAdd = false;
                subParams.clear();
                requestLength = idLength;
            }

            requestLength += appendLength;
            subParams.add(entry);
        }
        if (!subParams.isEmpty() || firstAdd) {
            checkSplitDsl(firstAdd, requestLength, attachLength, 0, 0);
            setGraphDbElement(element, subParams, isVertex, firstAdd);
        }
    }

    private boolean checkSplitDsl(final boolean firstAdd, final int requestLength, final int attachLength, final int appendLength,
        final int propNum) {
        final int length = firstAdd ? requestLength + attachLength : requestLength;
        if (length > this.maxRequestLength) {
            throw new IllegalArgumentException("request length over limit(" + this.maxRequestLength + ")");
        }
        return length + appendLength > this.maxRequestLength || propNum >= this.propertiesBatchNum;
    }

    private Tuple2<String, Map<String, Object>> buildDsl(final GdbElement element, final List<GdbElement.GdbProperty> properties,
        final boolean isVertex, final boolean firstAdd) {
        final Map<String, Object> params = new HashMap<>();
        final StringBuilder sb = new StringBuilder();
        if (isVertex) {
            sb.append(firstAdd ? ADD_V_START : UPDATE_V_START);
        } else {
            sb.append(firstAdd ? ADD_E_START : UPDATE_E_START);
        }

        for (int i = 0; i < properties.size(); i++) {
            final GdbElement.GdbProperty prop = properties.get(i);

            sb.append(".property(");
            if (prop.getCardinality() == Key.PropertyType.set) {
                sb.append("set, ");
            }
            sb.append(VAR_PROP_KEY).append(i).append(", ").append(VAR_PROP_VALUE).append(i).append(")");

            params.put(VAR_PROP_KEY + i, prop.getKey());
            params.put(VAR_PROP_VALUE + i, prop.getValue());
        }

        if (firstAdd) {
            params.put(VAR_LABEL, element.getLabel());
            if (!isVertex) {
                params.put(VAR_FROM, ((GdbEdge)element).getFrom());
                params.put(VAR_TO, ((GdbEdge)element).getTo());
            }
        }
        params.put(VAR_ID, element.getId());

        return new Tuple2<>(sb.toString(), params);
    }

    private void setGraphDbElement(final GdbElement element, final List<GdbElement.GdbProperty> properties, final boolean isVertex,
        final boolean firstAdd) throws GdbDuplicateIdException {
        int retry = 10;
        int idleTime = this.random.nextInt(10) + 10;
        final Tuple2<String, Map<String, Object>> elementDsl = buildDsl(element, properties, isVertex, firstAdd);

        while (retry > 0) {
            try {
                runInternal(elementDsl.getFirst(), elementDsl.getSecond());
                log.debug("AddElement {}", element.getId());
                return;
            } catch (final Exception e) {
                final String cause = e.getCause() == null ? "" : e.getCause().toString();
                if (cause.contains("rejected from") || cause.contains("Timeout waiting to lock key")) {
                    retry--;
                    try {
                        Thread.sleep(idleTime);
                    } catch (final InterruptedException e1) {
                        // ...
                    }
                    idleTime = Math.min(idleTime * 2, 2000);
                    continue;
                } else if (firstAdd && cause.contains("GraphDB id exists")) {
                    throw new GdbDuplicateIdException(e);
                }
                log.error("Add Failed id {}, dsl {}, params {}, e {}", element.getId(), elementDsl.getFirst(),
                    elementDsl.getSecond(), e);
                throw new RuntimeException(e);
            }
        }
        log.error("Add Failed id {}, dsl {}, params {}", element.getId(), elementDsl.getFirst(),
            elementDsl.getSecond());
        throw new RuntimeException("failed to queue new element to server");
    }
}
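For orientation, a hedged sketch of the parameterized Gremlin that buildDsl() assembles for a brand-new vertex with two single-cardinality properties. The variable names come from the VAR_* constants above; the concrete ids and values are invented, and the snippet only reproduces the shape of the script and its bindings for printing.

```java
import java.util.HashMap;
import java.util.Map;

public class BuildDslSketch {
    public static void main(String[] args) {
        // Shape of the script buildDsl() produces when firstAdd is true for a vertex.
        final String script =
            "g.addV(GDB___label).property(id, GDB___id)"
                + ".property(GDB___PK0, GDB___PV0)"
                + ".property(GDB___PK1, GDB___PV1)";

        // The actual values never appear in the script; they travel as bindings.
        final Map<String, Object> bindings = new HashMap<>();
        bindings.put("GDB___label", "player");
        bindings.put("GDB___id", "player-1");
        bindings.put("GDB___PK0", "name");
        bindings.put("GDB___PV0", "tom");
        bindings.put("GDB___PK1", "age");
        bindings.put("GDB___PV1", 32);

        System.out.println(script + "\n" + bindings);
    }
}
```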
@@ -7,53 +7,57 @@ import java.io.IOException;
import java.io.InputStream;
import java.util.function.Supplier;

import org.apache.commons.lang3.StringUtils;

import com.alibaba.datax.common.exception.DataXException;
import com.alibaba.datax.common.util.Configuration;
import com.alibaba.datax.plugin.writer.gdbwriter.GdbWriterErrorCode;
import com.alibaba.fastjson.JSON;
import com.alibaba.fastjson.JSONObject;

/**
 * @author jerrywang
 *
 */
public interface ConfigHelper {
    static void assertConfig(final String key, final Supplier<Boolean> f) {
        if (!f.get()) {
            throw DataXException.asDataXException(GdbWriterErrorCode.BAD_CONFIG_VALUE, key);
        }
    }

    static void assertHasContent(final Configuration config, final String key) {
        assertConfig(key, () -> StringUtils.isNotBlank(config.getString(key)));
    }

    /**
     * NOTE: {@code Configuration::get(String, Class<T>)} doesn't work.
     *
     * @param conf
     *            Configuration
     * @param key
     *            key path to configuration
     * @param cls
     *            Class of result type
     * @return the target configuration object of type T
     */
    static <T> T getConfig(final Configuration conf, final String key, final Class<T> cls) {
        final JSONObject j = (JSONObject)conf.get(key);
        return JSON.toJavaObject(j, cls);
    }

    /**
     * Create a configuration from the specified file on the classpath.
     *
     * @param name
     *            file name
     * @return Configuration instance.
     */
    static Configuration fromClasspath(final String name) {
        try (final InputStream is = Thread.currentThread().getContextClassLoader().getResourceAsStream(name)) {
            return Configuration.from(is);
        } catch (final IOException e) {
            throw new IllegalArgumentException("File not found: " + name);
        }
    }
}
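A small sketch of how these helpers are typically called from a writer's init path. The JSON string and key names are illustrative only, and the class is assumed to sit in the same package as ConfigHelper.

```java
import com.alibaba.datax.common.util.Configuration;

public class ConfigHelperSketch {
    public static void main(String[] args) {
        final Configuration conf = Configuration.from(
            "{\"host\": \"gdb-endpoint\", \"port\": 8182, \"username\": \"root\"}");

        // Fails fast with BAD_CONFIG_VALUE if the key is missing or blank.
        ConfigHelper.assertHasContent(conf, "host");
        ConfigHelper.assertConfig("port", () -> conf.getInt("port") > 0);

        System.out.println("config looks sane: " + conf.getString("host"));
    }
}
```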
@@ -1,9 +1,8 @@
/*
 * (C) 2019-present Alibaba Group Holding Limited.
 *
 * This program is free software; you can redistribute it and/or modify it under the terms of the GNU General Public
 * License version 2 as published by the Free Software Foundation.
 */
package com.alibaba.datax.plugin.writer.gdbwriter.util;

@@ -13,11 +12,11 @@ package com.alibaba.datax.plugin.writer.gdbwriter.util;
 */

public class GdbDuplicateIdException extends Exception {
    public GdbDuplicateIdException(Exception e) {
        super(e);
    }

    public GdbDuplicateIdException() {
        super();
    }
}
@@ -34,6 +34,14 @@ import java.util.Map;
public class HbaseSQLHelper {
    private static final Logger LOG = LoggerFactory.getLogger(HbaseSQLHelper.class);

    static {
        try {
            Class.forName("org.apache.phoenix.jdbc.PhoenixDriver");
        } catch (Throwable t) {
            throw new RuntimeException("failed to load org.apache.phoenix.jdbc.PhoenixDriver", t);
        }
    }

    public static org.apache.hadoop.conf.Configuration generatePhoenixConf(HbaseSQLReaderConfig readerConfig) {
        org.apache.hadoop.conf.Configuration conf = new org.apache.hadoop.conf.Configuration();
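A standalone sketch, not part of the patch, of what the static registration above enables: once the Phoenix fat-client driver class is loaded, the cluster is reachable through plain JDBC. The ZooKeeper quorum in the URL and the sample query are placeholders.

```java
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.ResultSet;
import java.sql.Statement;

public class PhoenixDriverSketch {
    public static void main(String[] args) throws Exception {
        // Explicit registration, mirroring the static block in HbaseSQLHelper.
        Class.forName("org.apache.phoenix.jdbc.PhoenixDriver");

        try (Connection conn = DriverManager.getConnection("jdbc:phoenix:zk-host-1,zk-host-2:2181");
             Statement stmt = conn.createStatement();
             ResultSet rs = stmt.executeQuery("SELECT TABLE_NAME FROM SYSTEM.CATALOG LIMIT 10")) {
            while (rs.next()) {
                System.out.println(rs.getString(1));
            }
        }
    }
}
```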
@@ -203,19 +203,20 @@ HbaseWriter 插件实现了从向Hbase中写取数据。在底层实现上,Hba
* 描述:要写入的hbase字段。index:指定该列对应reader端column的索引,从0开始;name:指定hbase表中的列,必须为 列族:列名 的格式;type:指定写入数据类型,用于转换HBase byte[]。配置格式如下:

```
"column": [
  {
    "index":1,
    "name": "cf1:q1",
    "type": "string"
  },
  {
    "index":2,
    "name": "cf1:q2",
    "type": "string"
  }
]

```

* 必选:是<br />
@@ -227,17 +228,17 @@ HbaseWriter 插件实现了从向Hbase中写取数据。在底层实现上,Hba
* 描述:要写入的hbase的rowkey列。index:指定该列对应reader端column的索引,从0开始,若为常量index为-1;type:指定写入数据类型,用于转换HBase byte[];value:配置常量,常作为多个字段的拼接符。hbasewriter会将rowkeyColumn中所有列按照配置顺序进行拼接作为写入hbase的rowkey,不能全为常量。配置格式如下:

```
"rowkeyColumn": [
  {
    "index":0,
    "type":"string"
  },
  {
    "index":-1,
    "type":"string",
    "value":"_"
  }
]

```

@@ -250,19 +251,19 @@ HbaseWriter 插件实现了从向Hbase中写取数据。在底层实现上,Hba
* 描述:指定写入hbase的时间戳。支持:当前时间、指定时间列,指定时间,三者选一。若不配置表示用当前时间。index:指定对应reader端column的索引,从0开始,需保证能转换为long,若是Date类型,会尝试用yyyy-MM-dd HH:mm:ss和yyyy-MM-dd HH:mm:ss SSS去解析;若为指定时间index为-1;value:指定时间的值,long值。配置格式如下:

```
"versionColumn":{
  "index":1
}

```

或者

```
"versionColumn":{
  "index":-1,
  "value":123456789
}

```
@@ -58,7 +58,9 @@ hbase20xsqlreader插件实现了从Phoenix(HBase SQL)读取数据,对应版本
* **queryServerAddress**

  * 描述:hbase20xsqlreader需要通过Phoenix轻客户端去连接Phoenix QueryServer,因此这里需要填写对应QueryServer地址。
        增强版/Lindorm 用户若需透传user, password参数,可以在queryServerAddress后增加对应可选属性.
        格式参考:http://127.0.0.1:8765;user=root;password=root

  * 必选:是 <br />

  * 默认值:无 <br />
@@ -14,7 +14,7 @@
 <packaging>jar</packaging>

 <properties>
-    <phoenix.version>5.1.0-HBase-2.0.0.2</phoenix.version>
+    <phoenix.version>5.2.5-HBase-2.x</phoenix.version>
 </properties>

 <dependencies>
@@ -120,7 +120,9 @@ HBase20xsqlwriter实现了向hbase中的SQL表(phoenix)批量导入数据的功

* **queryServerAddress**

  * 描述:Phoenix QueryServer地址,为必填项,格式:http://${hostName}:${ip},如http://172.16.34.58:8765。
        增强版/Lindorm 用户若需透传user, password参数,可以在queryServerAddress后增加对应可选属性.
        格式参考:http://127.0.0.1:8765;user=root;password=root
  * 必选:是
  * 默认值:无
@@ -14,7 +14,7 @@
 <packaging>jar</packaging>

 <properties>
-    <phoenix.version>5.1.0-HBase-2.0.0.2</phoenix.version>
+    <phoenix.version>5.2.5-HBase-2.x</phoenix.version>
     <commons-codec.version>1.8</commons-codec.version>
 </properties>
@@ -6,12 +6,12 @@ import com.alibaba.datax.common.exception.DataXException;
 import com.alibaba.datax.common.plugin.RecordReceiver;
 import com.alibaba.datax.common.plugin.TaskPluginCollector;
 import com.alibaba.datax.common.util.Configuration;
-import com.google.common.collect.Lists;
 import org.slf4j.Logger;
 import org.slf4j.LoggerFactory;

 import java.math.BigDecimal;
 import java.sql.*;
+import java.util.ArrayList;
 import java.util.Arrays;
 import java.util.List;

@@ -154,7 +154,7 @@ public class HBase20xSQLWriterTask {
      * 从接收器中获取每条记录,写入Phoenix
      */
     private void writeData(RecordReceiver lineReceiver) throws SQLException {
-        List<Record> buffer = Lists.newArrayListWithExpectedSize(batchSize);
+        List<Record> buffer = new ArrayList<Record>(batchSize);
         Record record = null;
         while ((record = lineReceiver.getFromReader()) != null) {
             // 校验列数量是否符合预期
@@ -81,10 +81,10 @@ public class HdfsWriter extends Writer {
             //writeMode check
             this.writeMode = this.writerSliceConfig.getNecessaryValue(Key.WRITE_MODE, HdfsWriterErrorCode.REQUIRED_VALUE);
             writeMode = writeMode.toLowerCase().trim();
-            Set<String> supportedWriteModes = Sets.newHashSet("append", "nonconflict");
+            Set<String> supportedWriteModes = Sets.newHashSet("append", "nonconflict", "truncate");
             if (!supportedWriteModes.contains(writeMode)) {
                 throw DataXException.asDataXException(HdfsWriterErrorCode.ILLEGAL_VALUE,
-                        String.format("仅支持append, nonConflict两种模式, 不支持您配置的 writeMode 模式 : [%s]",
+                        String.format("仅支持append, nonConflict, truncate三种模式, 不支持您配置的 writeMode 模式 : [%s]",
                                 writeMode));
             }
             this.writerSliceConfig.set(Key.WRITE_MODE, writeMode);
@@ -179,6 +179,9 @@ public class HdfsWriter extends Writer {
                     LOG.error(String.format("冲突文件列表为: [%s]", StringUtils.join(allFiles, ",")));
                     throw DataXException.asDataXException(HdfsWriterErrorCode.ILLEGAL_VALUE,
                             String.format("由于您配置了writeMode nonConflict,但您配置的path: [%s] 目录不为空, 下面存在其他文件或文件夹.", path));
+                }else if ("truncate".equalsIgnoreCase(writeMode) && isExistFile) {
+                    LOG.info(String.format("由于您配置了writeMode truncate, [%s] 下面的内容将被覆盖重写", path));
+                    hdfsHelper.deleteFiles(existFilePaths);
                 }
             }else{
                 throw DataXException.asDataXException(HdfsWriterErrorCode.ILLEGAL_VALUE,
BIN  images/DataX开源用户交流群.jpg   Normal file  (binary file not shown, 193 KiB)
BIN  images/DataX开源用户交流群2.jpg  Normal file  (binary file not shown, 195 KiB)
BIN  images/DataX开源用户交流群3.jpg  Normal file  (binary file not shown, 189 KiB)
BIN  images/DataX开源用户交流群4.jpg  Normal file  (binary file not shown, 191 KiB)
BIN  images/DataX开源用户交流群5.jpg  Normal file  (binary file not shown, 189 KiB)
@ -36,6 +36,7 @@ DataX本身作为离线数据同步框架,采用Framework + plugin架构构建
|
|||||||
| ------------ | ---------- | :-------: | :-------: |:-------: |
|
| ------------ | ---------- | :-------: | :-------: |:-------: |
|
||||||
| RDBMS 关系型数据库 | MySQL | √ | √ |[读](https://github.com/alibaba/DataX/blob/master/mysqlreader/doc/mysqlreader.md) 、[写](https://github.com/alibaba/DataX/blob/master/mysqlwriter/doc/mysqlwriter.md)|
|
| RDBMS 关系型数据库 | MySQL | √ | √ |[读](https://github.com/alibaba/DataX/blob/master/mysqlreader/doc/mysqlreader.md) 、[写](https://github.com/alibaba/DataX/blob/master/mysqlwriter/doc/mysqlwriter.md)|
|
||||||
| | Oracle | √ | √ |[读](https://github.com/alibaba/DataX/blob/master/oraclereader/doc/oraclereader.md) 、[写](https://github.com/alibaba/DataX/blob/master/oraclewriter/doc/oraclewriter.md)|
|
| | Oracle | √ | √ |[读](https://github.com/alibaba/DataX/blob/master/oraclereader/doc/oraclereader.md) 、[写](https://github.com/alibaba/DataX/blob/master/oraclewriter/doc/oraclewriter.md)|
|
||||||
|
| | OceanBase | √ | √ |[读](https://open.oceanbase.com/docs/community/oceanbase-database/V3.1.0/use-datax-to-full-migration-data-to-oceanbase) 、[写](https://open.oceanbase.com/docs/community/oceanbase-database/V3.1.0/use-datax-to-full-migration-data-to-oceanbase)|
|
||||||
| | SQLServer | √ | √ |[读](https://github.com/alibaba/DataX/blob/master/sqlserverreader/doc/sqlserverreader.md) 、[写](https://github.com/alibaba/DataX/blob/master/sqlserverwriter/doc/sqlserverwriter.md)|
|
| | SQLServer | √ | √ |[读](https://github.com/alibaba/DataX/blob/master/sqlserverreader/doc/sqlserverreader.md) 、[写](https://github.com/alibaba/DataX/blob/master/sqlserverwriter/doc/sqlserverwriter.md)|
|
||||||
| | PostgreSQL | √ | √ |[读](https://github.com/alibaba/DataX/blob/master/postgresqlreader/doc/postgresqlreader.md) 、[写](https://github.com/alibaba/DataX/blob/master/postgresqlwriter/doc/postgresqlwriter.md)|
|
| | PostgreSQL | √ | √ |[读](https://github.com/alibaba/DataX/blob/master/postgresqlreader/doc/postgresqlreader.md) 、[写](https://github.com/alibaba/DataX/blob/master/postgresqlwriter/doc/postgresqlwriter.md)|
|
||||||
| | DRDS | √ | √ |[读](https://github.com/alibaba/DataX/blob/master/drdsreader/doc/drdsreader.md) 、[写](https://github.com/alibaba/DataX/blob/master/drdswriter/doc/drdswriter.md)|
|
| | DRDS | √ | √ |[读](https://github.com/alibaba/DataX/blob/master/drdsreader/doc/drdsreader.md) 、[写](https://github.com/alibaba/DataX/blob/master/drdswriter/doc/drdswriter.md)|
|
||||||
|
241
kingbaseesreader/doc/kingbaseesreader.md
Normal file
241
kingbaseesreader/doc/kingbaseesreader.md
Normal file
@ -0,0 +1,241 @@
|
|||||||
|
|
||||||
|
# KingbaseesReader 插件文档
|
||||||
|
|
||||||
|
|
||||||
|
___
|
||||||
|
|
||||||
|
|
||||||
|
## 1 快速介绍
|
||||||
|
|
||||||
|
KingbaseesReader插件实现了从KingbaseES读取数据。在底层实现上,KingbaseesReader通过JDBC连接远程KingbaseES数据库,并执行相应的sql语句将数据从KingbaseES库中SELECT出来。
|
||||||
|
|
||||||
|
## 2 实现原理
|
||||||
|
|
||||||
|
简而言之,KingbaseesReader通过JDBC连接器连接到远程的KingbaseES数据库,并根据用户配置的信息生成查询SELECT SQL语句并发送到远程KingbaseES数据库,并将该SQL执行返回结果使用DataX自定义的数据类型拼装为抽象的数据集,并传递给下游Writer处理。
|
||||||
|
|
||||||
|
对于用户配置Table、Column、Where的信息,KingbaseesReader将其拼接为SQL语句发送到KingbaseES数据库;对于用户配置querySql信息,KingbaseesReader直接将其发送到KingbaseES数据库。
|
||||||
|
|
||||||
|
|
||||||
|
## 3 功能说明
|
||||||
|
|
||||||
|
### 3.1 配置样例
|
||||||
|
|
||||||
|
* 配置一个从KingbaseES数据库同步抽取数据到本地的作业:
|
||||||
|
|
||||||
|
```
|
||||||
|
{
|
||||||
|
"job": {
|
||||||
|
"setting": {
|
||||||
|
"speed": {
|
||||||
|
//设置传输速度,单位为byte/s,DataX运行会尽可能达到该速度但是不超过它.
|
||||||
|
"byte": 1048576
|
||||||
|
},
|
||||||
|
//出错限制
|
||||||
|
"errorLimit": {
|
||||||
|
//出错的record条数上限,当大于该值即报错。
|
||||||
|
"record": 0,
|
||||||
|
//出错的record百分比上限 1.0表示100%,0.02表示2%
|
||||||
|
"percentage": 0.02
|
||||||
|
}
|
||||||
|
},
|
||||||
|
"content": [
|
||||||
|
{
|
||||||
|
"reader": {
|
||||||
|
"name": "kingbaseesreader",
|
||||||
|
"parameter": {
|
||||||
|
// 数据库连接用户名
|
||||||
|
"username": "xx",
|
||||||
|
// 数据库连接密码
|
||||||
|
"password": "xx",
|
||||||
|
"column": [
|
||||||
|
"id","name"
|
||||||
|
],
|
||||||
|
//切分主键
|
||||||
|
"splitPk": "id",
|
||||||
|
"connection": [
|
||||||
|
{
|
||||||
|
"table": [
|
||||||
|
"table"
|
||||||
|
],
|
||||||
|
"jdbcUrl": [
|
||||||
|
"jdbc:kingbase8://host:port/database"
|
||||||
|
]
|
||||||
|
}
|
||||||
|
]
|
||||||
|
}
|
||||||
|
},
|
||||||
|
"writer": {
|
||||||
|
//writer类型
|
||||||
|
"name": "streamwriter",
|
||||||
|
//是否打印内容
|
||||||
|
"parameter": {
|
||||||
|
"print":true,
|
||||||
|
}
|
||||||
|
}
|
||||||
|
}
|
||||||
|
]
|
||||||
|
}
|
||||||
|
}
|
||||||
|
|
||||||
|
```
|
||||||
|
|
||||||
|
* 配置一个自定义SQL的数据库同步任务到本地内容的作业:
|
||||||
|
|
||||||
|
```
|
||||||
|
{
|
||||||
|
"job": {
|
||||||
|
"setting": {
|
||||||
|
"speed": 1048576
|
||||||
|
},
|
||||||
|
"content": [
|
||||||
|
{
|
||||||
|
"reader": {
|
||||||
|
"name": "kingbaseesreader",
|
||||||
|
"parameter": {
|
||||||
|
"username": "xx",
|
||||||
|
"password": "xx",
|
||||||
|
"where": "",
|
||||||
|
"connection": [
|
||||||
|
{
|
||||||
|
"querySql": [
|
||||||
|
"select db_id,on_line_flag from db_info where db_id < 10;"
|
||||||
|
],
|
||||||
|
"jdbcUrl": [
|
||||||
|
"jdbc:kingbase8://host:port/database", "jdbc:kingbase8://host:port/database"
|
||||||
|
]
|
||||||
|
}
|
||||||
|
]
|
||||||
|
}
|
||||||
|
},
|
||||||
|
"writer": {
|
||||||
|
"name": "streamwriter",
|
||||||
|
"parameter": {
|
||||||
|
"print": false,
|
||||||
|
"encoding": "UTF-8"
|
||||||
|
}
|
||||||
|
}
|
||||||
|
}
|
||||||
|
]
|
||||||
|
}
|
||||||
|
}
|
||||||
|
```
|
||||||
|
|
||||||
|
|
||||||
|
### 3.2 参数说明
|
||||||
|
|
||||||
|
* **jdbcUrl**
|
||||||
|
|
||||||
|
* 描述:描述的是到对端数据库的JDBC连接信息,使用JSON的数组描述,并支持一个库填写多个连接地址。之所以使用JSON数组描述连接信息,是因为阿里集团内部支持多个IP探测,如果配置了多个,KingbaseesReader可以依次探测ip的可连接性,直到选择一个合法的IP。如果全部连接失败,KingbaseesReader报错。 注意,jdbcUrl必须包含在connection配置单元中。对于阿里集团外部使用情况,JSON数组填写一个JDBC连接即可。
|
||||||
|
|
||||||
|
jdbcUrl按照KingbaseES官方规范,并可以填写连接附加控制信息。具体请参看[KingbaseES官方文档](https://help.kingbase.com.cn/doc-view-5683.html)。
|
||||||
|
|
||||||
|
* 必选:是 <br />
|
||||||
|
|
||||||
|
* 默认值:无 <br />
|
||||||
|
|
||||||
|
* **username**
|
||||||
|
|
||||||
|
* 描述:数据源的用户名 <br />
|
||||||
|
|
||||||
|
* 必选:是 <br />
|
||||||
|
|
||||||
|
* 默认值:无 <br />
|
||||||
|
|
||||||
|
* **password**
|
||||||
|
|
||||||
|
* 描述:数据源指定用户名的密码 <br />
|
||||||
|
|
||||||
|
* 必选:是 <br />
|
||||||
|
|
||||||
|
* 默认值:无 <br />
|
||||||
|
|
||||||
|
* **table**
|
||||||
|
|
||||||
|
* 描述:所选取的需要同步的表。使用JSON的数组描述,因此支持多张表同时抽取。当配置为多张表时,用户自己需保证多张表是同一schema结构,KingbaseesReader不予检查表是否同一逻辑表。注意,table必须包含在connection配置单元中。<br />
|
||||||
|
|
||||||
|
* 必选:是 <br />
|
||||||
|
|
||||||
|
* 默认值:无 <br />
|
||||||
|
|
||||||
|
* **column**
|
||||||
|
|
||||||
|
* 描述:所配置的表中需要同步的列名集合,使用JSON的数组描述字段信息。用户使用\*代表默认使用所有列配置,例如['\*']。
|
||||||
|
|
||||||
|
支持列裁剪,即列可以挑选部分列进行导出。
|
||||||
|
|
||||||
|
支持列换序,即列可以不按照表schema信息进行导出。
|
||||||
|
|
||||||
|
支持常量配置,用户需要按照KingbaseES语法格式:
|
||||||
|
["id", "'hello'::varchar", "true", "2.5::real", "power(2,3)"]
|
||||||
|
id为普通列名,'hello'::varchar为字符串常量,true为布尔值,2.5为浮点数, power(2,3)为函数。
|
||||||
|
|
||||||
|
**column必须显式指定需要同步的列集合,不允许为空!**
|
||||||
|
|
||||||
|
* 必选:是 <br />
|
||||||
|
|
||||||
|
* 默认值:无 <br />
|
||||||
|
|
||||||
|
* **splitPk**
|
||||||
|
|
||||||
|
* 描述:KingbaseesReader进行数据抽取时,如果指定splitPk,表示用户希望使用splitPk代表的字段进行数据分片,DataX因此会启动并发任务进行数据同步,这样可以大大提升数据同步的效能(切分方式可参考此参数说明后的示意)。
|
||||||
|
|
||||||
|
推荐splitPk用户使用表主键,因为表主键通常情况下比较均匀,因此切分出来的分片也不容易出现数据热点。
|
||||||
|
|
||||||
|
目前splitPk仅支持整型数据切分,`不支持浮点、字符串、日期等其他类型`。如果用户指定了不支持的类型,KingbaseesReader将报错!
|
||||||
|
|
||||||
|
splitPk设置为空,底层将视作用户不允许对单表进行切分,因此使用单通道进行抽取。
|
||||||
|
|
||||||
|
* 必选:否 <br />
|
||||||
|
|
||||||
|
* 默认值:空 <br />
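以下是一个示意性的切分过程,假设已经通过 `select min(id), max(id)` 拿到了 splitPk 的取值范围(仅作示意,DataX 实际的切分实现细节可能不同):

```java
import java.util.ArrayList;
import java.util.List;

// 仅作示意:把 [min, max] 区间内的整型 splitPk 均匀切成若干个左闭右开分片
public class SplitPkSketch {
    static List<String> splitRange(long min, long max, int adviceNumber, String splitPk) {
        List<String> conditions = new ArrayList<>();
        long step = Math.max(1, (max - min + 1) / adviceNumber);
        for (long lower = min; lower <= max; lower += step) {
            long upper = Math.min(lower + step, max + 1);
            conditions.add(String.format("%s >= %d and %s < %d", splitPk, lower, splitPk, upper));
        }
        return conditions;
    }

    public static void main(String[] args) {
        // 假设 id 的范围是 [1, 100],希望切成 4 个分片
        splitRange(1, 100, 4, "id").forEach(System.out::println);
    }
}
```

每个分片对应一个并发 Task,各自在查询条件上追加自己的主键区间。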
|
||||||
|
|
||||||
|
* **where**
|
||||||
|
|
||||||
|
* 描述:筛选条件,KingbaseesReader根据指定的column、table、where条件拼接SQL,并根据这个SQL进行数据抽取。在实际业务场景中,往往会选择当天的数据进行同步,可以将where条件指定为gmt_create > $bizdate 。注意:不可以将where条件指定为limit 10,limit不是SQL的合法where子句。<br />
|
||||||
|
|
||||||
|
where条件可以有效地进行业务增量同步。 where条件不配置或者为空,视作全表同步数据。
|
||||||
|
|
||||||
|
* 必选:否 <br />
|
||||||
|
|
||||||
|
* 默认值:无 <br />
|
||||||
|
|
||||||
|
* **querySql**
|
||||||
|
|
||||||
|
* 描述:在有些业务场景下,where这一配置项不足以描述所筛选的条件,用户可以通过该配置项来自定义筛选SQL。当用户配置了这一项之后,DataX系统就会忽略table、column这些配置项,直接使用这个配置项的内容对数据进行筛选,例如需要进行多表join后同步数据,使用select a,b from table_a join table_b on table_a.id = table_b.id <br />
|
||||||
|
|
||||||
|
`当用户配置querySql时,KingbaseesReader直接忽略table、column、where条件的配置`。
|
||||||
|
|
||||||
|
* 必选:否 <br />
|
||||||
|
|
||||||
|
* 默认值:无 <br />
|
||||||
|
|
||||||
|
* **fetchSize**
|
||||||
|
|
||||||
|
* 描述:该配置项定义了插件和数据库服务器端每次批量获取的数据条数,该值决定了DataX和服务器端的网络交互次数,适当调大能够明显提升数据抽取性能(用法可参考此参数说明后的示意)。<br />
|
||||||
|
|
||||||
|
`注意,该值过大(>2048)可能造成DataX进程OOM。`
|
||||||
|
|
||||||
|
* 必选:否 <br />
|
||||||
|
|
||||||
|
* 默认值:1024 <br />
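fetchSize 最终会落到 JDBC 的 `Statement.setFetchSize` 上,下面是一个示意(连接串、表名沿用本文样例中的假设值):

```java
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.ResultSet;
import java.sql.Statement;

// 仅作示意:fetchSize 控制驱动每次从服务端批量拉取的行数,而不是一次性加载整个结果集
public class FetchSizeSketch {
    public static void main(String[] args) throws Exception {
        try (Connection conn = DriverManager.getConnection(
                "jdbc:kingbase8://host:port/database", "xx", "xx");
             Statement stmt = conn.createStatement()) {
            stmt.setFetchSize(1024);                              // 对应插件的默认值 1024
            try (ResultSet rs = stmt.executeQuery("select id, name from table")) {
                while (rs.next()) {
                    // 逐行消费,驱动按 fetchSize 分批从服务端取数
                    System.out.println(rs.getLong(1) + "\t" + rs.getString(2));
                }
            }
        }
    }
}
```

注意:部分 JDBC 驱动只有在关闭自动提交等条件下才会真正按 fetchSize 流式取数,KingbaseES 驱动的具体行为请以其官方文档为准。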
|
||||||
|
|
||||||
|
|
||||||
|
### 3.3 类型转换
|
||||||
|
|
||||||
|
目前KingbaseesReader支持大部分KingbaseES类型,但仍有个别类型尚未支持,请注意检查你的类型。
|
||||||
|
|
||||||
|
下面列出KingbaseesReader针对KingbaseES类型转换列表:
|
||||||
|
|
||||||
|
|
||||||
|
| DataX 内部类型| KingbaseES 数据类型 |
|
||||||
|
| -------- | ----- |
|
||||||
|
| Long |bigint, bigserial, integer, smallint, serial |
|
||||||
|
| Double |double precision, money, numeric, real |
|
||||||
|
| String |varchar, char, text, bit, inet|
|
||||||
|
| Date |date, time, timestamp |
|
||||||
|
| Boolean |bool|
|
||||||
|
| Bytes |bytea|
|
||||||
|
|
||||||
|
请注意:
|
||||||
|
|
||||||
|
* `除上述罗列字段类型外,其他类型均不支持; money,inet,bit需用户使用a_inet::varchar类似的语法转换`。
|
88
kingbaseesreader/pom.xml
Normal file
@ -0,0 +1,88 @@
|
|||||||
|
<?xml version="1.0" encoding="UTF-8"?>
|
||||||
|
<project xmlns="http://maven.apache.org/POM/4.0.0"
|
||||||
|
xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
|
||||||
|
xsi:schemaLocation="http://maven.apache.org/POM/4.0.0 http://maven.apache.org/xsd/maven-4.0.0.xsd">
|
||||||
|
<parent>
|
||||||
|
<artifactId>datax-all</artifactId>
|
||||||
|
<groupId>com.alibaba.datax</groupId>
|
||||||
|
<version>0.0.1-SNAPSHOT</version>
|
||||||
|
</parent>
|
||||||
|
<modelVersion>4.0.0</modelVersion>
|
||||||
|
|
||||||
|
<artifactId>kingbaseesreader</artifactId>
|
||||||
|
<name>kingbaseesreader</name>
|
||||||
|
<packaging>jar</packaging>
|
||||||
|
|
||||||
|
<dependencies>
|
||||||
|
<dependency>
|
||||||
|
<groupId>com.alibaba.datax</groupId>
|
||||||
|
<artifactId>datax-common</artifactId>
|
||||||
|
<version>${datax-project-version}</version>
|
||||||
|
<exclusions>
|
||||||
|
<exclusion>
|
||||||
|
<artifactId>slf4j-log4j12</artifactId>
|
||||||
|
<groupId>org.slf4j</groupId>
|
||||||
|
</exclusion>
|
||||||
|
</exclusions>
|
||||||
|
</dependency>
|
||||||
|
|
||||||
|
<dependency>
|
||||||
|
<groupId>org.slf4j</groupId>
|
||||||
|
<artifactId>slf4j-api</artifactId>
|
||||||
|
</dependency>
|
||||||
|
|
||||||
|
<dependency>
|
||||||
|
<groupId>ch.qos.logback</groupId>
|
||||||
|
<artifactId>logback-classic</artifactId>
|
||||||
|
</dependency>
|
||||||
|
|
||||||
|
<dependency>
|
||||||
|
<groupId>com.alibaba.datax</groupId>
|
||||||
|
<artifactId>plugin-rdbms-util</artifactId>
|
||||||
|
<version>${datax-project-version}</version>
|
||||||
|
</dependency>
|
||||||
|
|
||||||
|
<dependency>
|
||||||
|
<groupId>com.kingbase8</groupId>
|
||||||
|
<artifactId>kingbase8</artifactId>
|
||||||
|
<version>8.2.0</version>
|
||||||
|
<scope>system</scope>
|
||||||
|
<systemPath>${basedir}/src/main/libs/kingbase8-8.2.0.jar</systemPath>
|
||||||
|
</dependency>
|
||||||
|
|
||||||
|
</dependencies>
|
||||||
|
|
||||||
|
<build>
|
||||||
|
<plugins>
|
||||||
|
<!-- compiler plugin -->
|
||||||
|
<plugin>
|
||||||
|
<artifactId>maven-compiler-plugin</artifactId>
|
||||||
|
<configuration>
|
||||||
|
<source>${jdk-version}</source>
|
||||||
|
<target>${jdk-version}</target>
|
||||||
|
<encoding>${project-sourceEncoding}</encoding>
|
||||||
|
</configuration>
|
||||||
|
</plugin>
|
||||||
|
<!-- assembly plugin -->
|
||||||
|
<plugin>
|
||||||
|
<artifactId>maven-assembly-plugin</artifactId>
|
||||||
|
<configuration>
|
||||||
|
<descriptors>
|
||||||
|
<descriptor>src/main/assembly/package.xml</descriptor>
|
||||||
|
</descriptors>
|
||||||
|
<finalName>datax</finalName>
|
||||||
|
</configuration>
|
||||||
|
<executions>
|
||||||
|
<execution>
|
||||||
|
<id>dwzip</id>
|
||||||
|
<phase>package</phase>
|
||||||
|
<goals>
|
||||||
|
<goal>single</goal>
|
||||||
|
</goals>
|
||||||
|
</execution>
|
||||||
|
</executions>
|
||||||
|
</plugin>
|
||||||
|
</plugins>
|
||||||
|
</build>
|
||||||
|
|
||||||
|
</project>
|
42
kingbaseesreader/src/main/assembly/package.xml
Normal file
@ -0,0 +1,42 @@
|
|||||||
|
<assembly
|
||||||
|
xmlns="http://maven.apache.org/plugins/maven-assembly-plugin/assembly/1.1.0"
|
||||||
|
xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
|
||||||
|
xsi:schemaLocation="http://maven.apache.org/plugins/maven-assembly-plugin/assembly/1.1.0 http://maven.apache.org/xsd/assembly-1.1.0.xsd">
|
||||||
|
<id></id>
|
||||||
|
<formats>
|
||||||
|
<format>dir</format>
|
||||||
|
</formats>
|
||||||
|
<includeBaseDirectory>false</includeBaseDirectory>
|
||||||
|
<fileSets>
|
||||||
|
<fileSet>
|
||||||
|
<directory>src/main/resources</directory>
|
||||||
|
<includes>
|
||||||
|
<include>plugin.json</include>
|
||||||
|
<include>plugin_job_template.json</include>
|
||||||
|
</includes>
|
||||||
|
<outputDirectory>plugin/reader/kingbaseesreader</outputDirectory>
|
||||||
|
</fileSet>
|
||||||
|
<fileSet>
|
||||||
|
<directory>target/</directory>
|
||||||
|
<includes>
|
||||||
|
<include>kingbaseesreader-0.0.1-SNAPSHOT.jar</include>
|
||||||
|
</includes>
|
||||||
|
<outputDirectory>plugin/reader/kingbaseesreader</outputDirectory>
|
||||||
|
</fileSet>
|
||||||
|
<fileSet>
|
||||||
|
<directory>src/main/libs</directory>
|
||||||
|
<includes>
|
||||||
|
<include>*.*</include>
|
||||||
|
</includes>
|
||||||
|
<outputDirectory>plugin/reader/kingbaseesreader/libs</outputDirectory>
|
||||||
|
</fileSet>
|
||||||
|
</fileSets>
|
||||||
|
|
||||||
|
<dependencySets>
|
||||||
|
<dependencySet>
|
||||||
|
<useProjectArtifact>false</useProjectArtifact>
|
||||||
|
<outputDirectory>plugin/reader/kingbaseesreader/libs</outputDirectory>
|
||||||
|
<scope>runtime</scope>
|
||||||
|
</dependencySet>
|
||||||
|
</dependencySets>
|
||||||
|
</assembly>
|
@ -0,0 +1,7 @@
|
|||||||
|
package com.alibaba.datax.plugin.reader.kingbaseesreader;
|
||||||
|
|
||||||
|
public class Constant {
|
||||||
|
|
||||||
|
public static final int DEFAULT_FETCH_SIZE = 1000;
|
||||||
|
|
||||||
|
}
|
@ -0,0 +1,86 @@
|
|||||||
|
package com.alibaba.datax.plugin.reader.kingbaseesreader;
|
||||||
|
|
||||||
|
import com.alibaba.datax.common.exception.DataXException;
|
||||||
|
import com.alibaba.datax.common.plugin.RecordSender;
|
||||||
|
import com.alibaba.datax.common.spi.Reader;
|
||||||
|
import com.alibaba.datax.common.util.Configuration;
|
||||||
|
import com.alibaba.datax.plugin.rdbms.reader.CommonRdbmsReader;
|
||||||
|
import com.alibaba.datax.plugin.rdbms.util.DBUtilErrorCode;
|
||||||
|
import com.alibaba.datax.plugin.rdbms.util.DataBaseType;
|
||||||
|
|
||||||
|
import java.util.List;
|
||||||
|
|
||||||
|
public class KingbaseesReader extends Reader {
|
||||||
|
|
||||||
|
private static final DataBaseType DATABASE_TYPE = DataBaseType.KingbaseES;
|
||||||
|
|
||||||
|
public static class Job extends Reader.Job {
|
||||||
|
|
||||||
|
private Configuration originalConfig;
|
||||||
|
private CommonRdbmsReader.Job commonRdbmsReaderMaster;
|
||||||
|
|
||||||
|
@Override
|
||||||
|
public void init() {
|
||||||
|
this.originalConfig = super.getPluginJobConf();
|
||||||
|
int fetchSize = this.originalConfig.getInt(com.alibaba.datax.plugin.rdbms.reader.Constant.FETCH_SIZE,
|
||||||
|
Constant.DEFAULT_FETCH_SIZE);
|
||||||
|
if (fetchSize < 1) {
|
||||||
|
throw DataXException.asDataXException(DBUtilErrorCode.REQUIRED_VALUE,
|
||||||
|
String.format("您配置的fetchSize有误,根据DataX的设计,fetchSize : [%d] 设置值不能小于 1.", fetchSize));
|
||||||
|
}
|
||||||
|
this.originalConfig.set(com.alibaba.datax.plugin.rdbms.reader.Constant.FETCH_SIZE, fetchSize);
|
||||||
|
|
||||||
|
this.commonRdbmsReaderMaster = new CommonRdbmsReader.Job(DATABASE_TYPE);
|
||||||
|
this.commonRdbmsReaderMaster.init(this.originalConfig);
|
||||||
|
}
|
||||||
|
|
||||||
|
@Override
|
||||||
|
public List<Configuration> split(int adviceNumber) {
|
||||||
|
return this.commonRdbmsReaderMaster.split(this.originalConfig, adviceNumber);
|
||||||
|
}
|
||||||
|
|
||||||
|
@Override
|
||||||
|
public void post() {
|
||||||
|
this.commonRdbmsReaderMaster.post(this.originalConfig);
|
||||||
|
}
|
||||||
|
|
||||||
|
@Override
|
||||||
|
public void destroy() {
|
||||||
|
this.commonRdbmsReaderMaster.destroy(this.originalConfig);
|
||||||
|
}
|
||||||
|
|
||||||
|
}
|
||||||
|
|
||||||
|
public static class Task extends Reader.Task {
|
||||||
|
|
||||||
|
private Configuration readerSliceConfig;
|
||||||
|
private CommonRdbmsReader.Task commonRdbmsReaderSlave;
|
||||||
|
|
||||||
|
@Override
|
||||||
|
public void init() {
|
||||||
|
this.readerSliceConfig = super.getPluginJobConf();
|
||||||
|
this.commonRdbmsReaderSlave = new CommonRdbmsReader.Task(DATABASE_TYPE, super.getTaskGroupId(), super.getTaskId());
|
||||||
|
this.commonRdbmsReaderSlave.init(this.readerSliceConfig);
|
||||||
|
}
|
||||||
|
|
||||||
|
@Override
|
||||||
|
public void startRead(RecordSender recordSender) {
|
||||||
|
int fetchSize = this.readerSliceConfig.getInt(com.alibaba.datax.plugin.rdbms.reader.Constant.FETCH_SIZE);
|
||||||
|
|
||||||
|
this.commonRdbmsReaderSlave.startRead(this.readerSliceConfig, recordSender,
|
||||||
|
super.getTaskPluginCollector(), fetchSize);
|
||||||
|
}
|
||||||
|
|
||||||
|
@Override
|
||||||
|
public void post() {
|
||||||
|
this.commonRdbmsReaderSlave.post(this.readerSliceConfig);
|
||||||
|
}
|
||||||
|
|
||||||
|
@Override
|
||||||
|
public void destroy() {
|
||||||
|
this.commonRdbmsReaderSlave.destroy(this.readerSliceConfig);
|
||||||
|
}
|
||||||
|
|
||||||
|
}
|
||||||
|
|
||||||
|
}
|
BIN
kingbaseesreader/src/main/libs/kingbase8-8.2.0.jar
Normal file
Binary file not shown.
6
kingbaseesreader/src/main/resources/plugin.json
Normal file
@ -0,0 +1,6 @@
|
|||||||
|
{
|
||||||
|
"name": "kingbaseesreader",
|
||||||
|
"class": "com.alibaba.datax.plugin.reader.kingbaseesreader.KingbaseesReader",
|
||||||
|
"description": "useScene: prod. mechanism: Jdbc connection using the database, execute select sql, retrieve data from the ResultSet. warn: The more you know about the database, the less problems you encounter.",
|
||||||
|
"developer": "alibaba"
|
||||||
|
}
|
13
kingbaseesreader/src/main/resources/plugin_job_template.json
Normal file
@ -0,0 +1,13 @@
|
|||||||
|
{
|
||||||
|
"name": "kingbaseesreader",
|
||||||
|
"parameter": {
|
||||||
|
"username": "",
|
||||||
|
"password": "",
|
||||||
|
"connection": [
|
||||||
|
{
|
||||||
|
"table": [],
|
||||||
|
"jdbcUrl": []
|
||||||
|
}
|
||||||
|
]
|
||||||
|
}
|
||||||
|
}
|
208
kingbaseeswriter/doc/kingbaseeswriter.md
Normal file
@ -0,0 +1,208 @@
|
|||||||
|
# DataX KingbaseesWriter
|
||||||
|
|
||||||
|
|
||||||
|
---
|
||||||
|
|
||||||
|
|
||||||
|
## 1 快速介绍
|
||||||
|
|
||||||
|
KingbaseesWriter插件实现了写入数据到 KingbaseES主库目的表的功能。在底层实现上,KingbaseesWriter通过JDBC连接远程 KingbaseES 数据库,并执行相应的 insert into ... sql 语句将数据写入 KingbaseES,内部会分批次提交入库。
|
||||||
|
|
||||||
|
KingbaseesWriter面向ETL开发工程师,他们使用KingbaseesWriter从数仓导入数据到KingbaseES。同时 KingbaseesWriter亦可以作为数据迁移工具为DBA等用户提供服务。
|
||||||
|
|
||||||
|
|
||||||
|
## 2 实现原理
|
||||||
|
|
||||||
|
KingbaseesWriter通过 DataX 框架获取 Reader 生成的协议数据,根据你配置的 table、column 信息生成相应的SQL插入语句(拼接方式可参考下文示意)
|
||||||
|
|
||||||
|
|
||||||
|
* `insert into...`(当主键/唯一性索引冲突时,冲突的行将无法写入)
|
||||||
|
|
||||||
|
<br />
|
||||||
|
|
||||||
|
注意:
|
||||||
|
1. 目的表所在数据库必须是主库才能写入数据;整个任务至少需具备 insert into...的权限,是否需要其他权限,取决于你任务配置中在 preSql 和 postSql 中指定的语句。
|
||||||
|
2. KingbaseesWriter和MysqlWriter不同,不支持配置writeMode参数。
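下面用一个简化示意说明 table、column 是如何被拼成带占位符的 insert 语句的(仅作示意,真实实现由 plugin-rdbms-util 中的 CommonRdbmsWriter 完成,细节可能与此不同):

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

// 仅作示意:根据 table 和 column 生成 "insert into ... values (?, ?)" 形式的预编译 SQL
public class InsertSqlSketch {
    static String buildInsertSql(String table, List<String> columns) {
        String placeholders = String.join(", ", Collections.nCopies(columns.size(), "?"));
        return "insert into " + table
                + " (" + String.join(", ", columns) + ") values (" + placeholders + ")";
    }

    public static void main(String[] args) {
        // 对应下文样例中的配置:table="test",column=["id","name"]
        System.out.println(buildInsertSql("test", Arrays.asList("id", "name")));
        // 输出:insert into test (id, name) values (?, ?)
    }
}
```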
|
||||||
|
|
||||||
|
|
||||||
|
## 3 功能说明
|
||||||
|
|
||||||
|
### 3.1 配置样例
|
||||||
|
|
||||||
|
* 这里使用一份从内存产生到 KingbaseesWriter导入的数据。
|
||||||
|
|
||||||
|
```json
|
||||||
|
{
|
||||||
|
"job": {
|
||||||
|
"setting": {
|
||||||
|
"speed": {
|
||||||
|
"channel": 1
|
||||||
|
}
|
||||||
|
},
|
||||||
|
"content": [
|
||||||
|
{
|
||||||
|
"reader": {
|
||||||
|
"name": "streamreader",
|
||||||
|
"parameter": {
|
||||||
|
"column" : [
|
||||||
|
{
|
||||||
|
"value": "DataX",
|
||||||
|
"type": "string"
|
||||||
|
},
|
||||||
|
{
|
||||||
|
"value": 19880808,
|
||||||
|
"type": "long"
|
||||||
|
},
|
||||||
|
{
|
||||||
|
"value": "1988-08-08 08:08:08",
|
||||||
|
"type": "date"
|
||||||
|
},
|
||||||
|
{
|
||||||
|
"value": true,
|
||||||
|
"type": "bool"
|
||||||
|
},
|
||||||
|
{
|
||||||
|
"value": "test",
|
||||||
|
"type": "bytes"
|
||||||
|
}
|
||||||
|
],
|
||||||
|
"sliceRecordCount": 1000
|
||||||
|
}
|
||||||
|
},
|
||||||
|
"writer": {
|
||||||
|
"name": "kingbaseeswriter",
|
||||||
|
"parameter": {
|
||||||
|
"username": "xx",
|
||||||
|
"password": "xx",
|
||||||
|
"column": [
|
||||||
|
"id",
|
||||||
|
"name"
|
||||||
|
],
|
||||||
|
"preSql": [
|
||||||
|
"delete from test"
|
||||||
|
],
|
||||||
|
"connection": [
|
||||||
|
{
|
||||||
|
"jdbcUrl": "jdbc:kingbase8://127.0.0.1:3002/datax",
|
||||||
|
"table": [
|
||||||
|
"test"
|
||||||
|
]
|
||||||
|
}
|
||||||
|
]
|
||||||
|
}
|
||||||
|
}
|
||||||
|
}
|
||||||
|
]
|
||||||
|
}
|
||||||
|
}
|
||||||
|
|
||||||
|
```
|
||||||
|
|
||||||
|
|
||||||
|
### 3.2 参数说明
|
||||||
|
|
||||||
|
* **jdbcUrl**
|
||||||
|
|
||||||
|
* 描述:目的数据库的 JDBC 连接信息,jdbcUrl必须包含在connection配置单元中。
|
||||||
|
|
||||||
|
注意:1、在一个数据库上只能配置一个值。
|
||||||
|
2、jdbcUrl按照KingbaseES官方规范,并可以填写连接附加参数信息。具体请参看KingbaseES官方文档或者咨询对应 DBA。
|
||||||
|
|
||||||
|
|
||||||
|
* 必选:是 <br />
|
||||||
|
|
||||||
|
* 默认值:无 <br />
|
||||||
|
|
||||||
|
* **username**
|
||||||
|
|
||||||
|
* 描述:目的数据库的用户名 <br />
|
||||||
|
|
||||||
|
* 必选:是 <br />
|
||||||
|
|
||||||
|
* 默认值:无 <br />
|
||||||
|
|
||||||
|
* **password**
|
||||||
|
|
||||||
|
* 描述:目的数据库的密码 <br />
|
||||||
|
|
||||||
|
* 必选:是 <br />
|
||||||
|
|
||||||
|
* 默认值:无 <br />
|
||||||
|
|
||||||
|
* **table**
|
||||||
|
|
||||||
|
* 描述:目的表的表名称。支持写入一个或者多个表。当配置为多张表时,必须确保所有表结构保持一致。
|
||||||
|
|
||||||
|
注意:table 和 jdbcUrl 必须包含在 connection 配置单元中
|
||||||
|
|
||||||
|
* 必选:是 <br />
|
||||||
|
|
||||||
|
* 默认值:无 <br />
|
||||||
|
|
||||||
|
* **column**
|
||||||
|
|
||||||
|
* 描述:目的表需要写入数据的字段,字段之间用英文逗号分隔。例如: "column": ["id","name","age"]。如果要依次写入全部列,使用\*表示, 例如: "column": ["\*"]
|
||||||
|
|
||||||
|
注意:1、我们强烈不推荐你这样配置,因为当你目的表字段个数、类型等有改动时,你的任务可能运行不正确或者失败
|
||||||
|
2、此处 column 不能配置任何常量值
|
||||||
|
|
||||||
|
* 必选:是 <br />
|
||||||
|
|
||||||
|
* 默认值:无 <br />
|
||||||
|
|
||||||
|
* **preSql**
|
||||||
|
|
||||||
|
* 描述:写入数据到目的表前,会先执行这里的标准语句。如果 Sql 中有你需要操作到的表名称,请使用 `@table` 表示,这样在实际执行 Sql 语句时,会对变量按照实际表名称进行替换。比如你的任务是要写入到目的端的100个同构分表(表名称为:datax_00,datax_01, ... datax_98,datax_99),并且你希望导入数据前,先对表中数据进行删除操作,那么你可以这样配置:`"preSql":["delete from @table"]`,效果是:在向每个表写入数据前,会先执行把 @table 替换为实际表名后的 delete 语句(替换过程可参考此参数说明后的示意)<br />
|
||||||
|
|
||||||
|
* 必选:否 <br />
|
||||||
|
|
||||||
|
* 默认值:无 <br />
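@table 的替换可以理解为一次简单的字符串替换,下面是一个示意(仅作示意,表名均为假设):

```java
import java.util.Arrays;
import java.util.List;

// 仅作示意:对每个目的表,把 preSql/postSql 中的 @table 替换为真实表名后再执行
public class PreSqlSketch {
    public static void main(String[] args) {
        String preSql = "delete from @table";
        List<String> tables = Arrays.asList("datax_00", "datax_01", "datax_99");
        for (String table : tables) {
            String actualSql = preSql.replace("@table", table);
            System.out.println(actualSql); // 写入对应表之前分别执行:delete from datax_00 等
        }
    }
}
```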
|
||||||
|
|
||||||
|
* **postSql**
|
||||||
|
|
||||||
|
* 描述:写入数据到目的表后,会执行这里的标准语句。(原理同 preSql ) <br />
|
||||||
|
|
||||||
|
* 必选:否 <br />
|
||||||
|
|
||||||
|
* 默认值:无 <br />
|
||||||
|
|
||||||
|
* **batchSize**
|
||||||
|
|
||||||
|
* 描述:一次性批量提交的记录数大小,该值可以极大减少DataX与KingbaseES的网络交互次数,并提升整体吞吐量。但是该值设置过大可能会导致DataX进程OOM(分批提交方式可参考此参数说明后的示意)。<br />
|
||||||
|
|
||||||
|
* 必选:否 <br />
|
||||||
|
|
||||||
|
* 默认值:1024 <br />
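batchSize 的效果等价于 JDBC 的分批提交,下面是一个示意(连接串、表结构沿用本文样例中的假设值):

```java
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.PreparedStatement;

// 仅作示意:每攒满 batchSize 条记录执行一次 executeBatch,减少网络往返
public class BatchInsertSketch {
    public static void main(String[] args) throws Exception {
        int batchSize = 1024;
        try (Connection conn = DriverManager.getConnection(
                "jdbc:kingbase8://127.0.0.1:3002/datax", "xx", "xx");
             PreparedStatement ps = conn.prepareStatement(
                     "insert into test (id, name) values (?, ?)")) {
            for (int i = 0; i < 10000; i++) {
                ps.setInt(1, i);
                ps.setString(2, "name-" + i);
                ps.addBatch();
                if ((i + 1) % batchSize == 0) {
                    ps.executeBatch(); // 满一批就提交一次
                }
            }
            ps.executeBatch();         // 提交最后不足一批的记录
        }
    }
}
```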
|
||||||
|
|
||||||
|
### 3.3 类型转换
|
||||||
|
|
||||||
|
目前 KingbaseesWriter支持大部分 KingbaseES类型,但仍有部分类型尚未支持,请注意检查你的类型。
|
||||||
|
|
||||||
|
下面列出 KingbaseesWriter针对 KingbaseES类型转换列表:
|
||||||
|
|
||||||
|
| DataX 内部类型| KingbaseES 数据类型 |
|
||||||
|
| -------- | ----- |
|
||||||
|
| Long |bigint, bigserial, integer, smallint, serial |
|
||||||
|
| Double |double precision, money, numeric, real |
|
||||||
|
| String |varchar, char, text, bit|
|
||||||
|
| Date |date, time, timestamp |
|
||||||
|
| Boolean |bool|
|
||||||
|
| Bytes |bytea|
|
||||||
|
|
||||||
|
|
||||||
|
## FAQ
|
||||||
|
|
||||||
|
***
|
||||||
|
|
||||||
|
**Q: KingbaseesWriter 执行 postSql 语句报错,那么数据导入到目标数据库了吗?**
|
||||||
|
|
||||||
|
A: DataX 导入过程存在三块逻辑,pre 操作、导入操作、post 操作,其中任意一环报错,DataX 作业报错。由于 DataX 不能保证在同一个事务完成上述几个操作,因此有可能数据已经落入到目标端。
|
||||||
|
|
||||||
|
***
|
||||||
|
|
||||||
|
**Q: 按照上述说法,那么有部分脏数据导入数据库,如果影响到线上数据库怎么办?**
|
||||||
|
|
||||||
|
A: 目前有两种解法,第一种配置 pre 语句,该 sql 可以清理当天导入数据, DataX 每次导入时候可以把上次清理干净并导入完整数据。
|
||||||
|
第二种,向临时表导入数据,完成后再 rename 到线上表。
|
||||||
|
|
||||||
|
***
|
84
kingbaseeswriter/pom.xml
Normal file
@ -0,0 +1,84 @@
|
|||||||
|
<project xmlns="http://maven.apache.org/POM/4.0.0" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
|
||||||
|
xsi:schemaLocation="http://maven.apache.org/POM/4.0.0 http://maven.apache.org/xsd/maven-4.0.0.xsd">
|
||||||
|
<modelVersion>4.0.0</modelVersion>
|
||||||
|
<parent>
|
||||||
|
<groupId>com.alibaba.datax</groupId>
|
||||||
|
<artifactId>datax-all</artifactId>
|
||||||
|
<version>0.0.1-SNAPSHOT</version>
|
||||||
|
</parent>
|
||||||
|
<artifactId>kingbaseeswriter</artifactId>
|
||||||
|
<name>kingbaseeswriter</name>
|
||||||
|
<packaging>jar</packaging>
|
||||||
|
<description>write data into kingbasees database</description>
|
||||||
|
|
||||||
|
<dependencies>
|
||||||
|
<dependency>
|
||||||
|
<groupId>com.alibaba.datax</groupId>
|
||||||
|
<artifactId>datax-common</artifactId>
|
||||||
|
<version>${datax-project-version}</version>
|
||||||
|
<exclusions>
|
||||||
|
<exclusion>
|
||||||
|
<artifactId>slf4j-log4j12</artifactId>
|
||||||
|
<groupId>org.slf4j</groupId>
|
||||||
|
</exclusion>
|
||||||
|
</exclusions>
|
||||||
|
</dependency>
|
||||||
|
|
||||||
|
<dependency>
|
||||||
|
<groupId>org.slf4j</groupId>
|
||||||
|
<artifactId>slf4j-api</artifactId>
|
||||||
|
</dependency>
|
||||||
|
|
||||||
|
<dependency>
|
||||||
|
<groupId>ch.qos.logback</groupId>
|
||||||
|
<artifactId>logback-classic</artifactId>
|
||||||
|
</dependency>
|
||||||
|
|
||||||
|
<dependency>
|
||||||
|
<groupId>com.alibaba.datax</groupId>
|
||||||
|
<artifactId>plugin-rdbms-util</artifactId>
|
||||||
|
<version>${datax-project-version}</version>
|
||||||
|
</dependency>
|
||||||
|
|
||||||
|
<dependency>
|
||||||
|
<groupId>com.kingbase8</groupId>
|
||||||
|
<artifactId>kingbase8</artifactId>
|
||||||
|
<version>8.2.0</version>
|
||||||
|
<scope>system</scope>
|
||||||
|
<systemPath>${basedir}/src/main/libs/kingbase8-8.2.0.jar</systemPath>
|
||||||
|
</dependency>
|
||||||
|
|
||||||
|
</dependencies>
|
||||||
|
<build>
|
||||||
|
<plugins>
|
||||||
|
<!-- compiler plugin -->
|
||||||
|
<plugin>
|
||||||
|
<artifactId>maven-compiler-plugin</artifactId>
|
||||||
|
<configuration>
|
||||||
|
<source>${jdk-version}</source>
|
||||||
|
<target>${jdk-version}</target>
|
||||||
|
<encoding>${project-sourceEncoding}</encoding>
|
||||||
|
</configuration>
|
||||||
|
</plugin>
|
||||||
|
<!-- assembly plugin -->
|
||||||
|
<plugin>
|
||||||
|
<artifactId>maven-assembly-plugin</artifactId>
|
||||||
|
<configuration>
|
||||||
|
<descriptors>
|
||||||
|
<descriptor>src/main/assembly/package.xml</descriptor>
|
||||||
|
</descriptors>
|
||||||
|
<finalName>datax</finalName>
|
||||||
|
</configuration>
|
||||||
|
<executions>
|
||||||
|
<execution>
|
||||||
|
<id>dwzip</id>
|
||||||
|
<phase>package</phase>
|
||||||
|
<goals>
|
||||||
|
<goal>single</goal>
|
||||||
|
</goals>
|
||||||
|
</execution>
|
||||||
|
</executions>
|
||||||
|
</plugin>
|
||||||
|
</plugins>
|
||||||
|
</build>
|
||||||
|
</project>
|
42
kingbaseeswriter/src/main/assembly/package.xml
Normal file
@ -0,0 +1,42 @@
|
|||||||
|
<assembly
|
||||||
|
xmlns="http://maven.apache.org/plugins/maven-assembly-plugin/assembly/1.1.0"
|
||||||
|
xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
|
||||||
|
xsi:schemaLocation="http://maven.apache.org/plugins/maven-assembly-plugin/assembly/1.1.0 http://maven.apache.org/xsd/assembly-1.1.0.xsd">
|
||||||
|
<id></id>
|
||||||
|
<formats>
|
||||||
|
<format>dir</format>
|
||||||
|
</formats>
|
||||||
|
<includeBaseDirectory>false</includeBaseDirectory>
|
||||||
|
<fileSets>
|
||||||
|
<fileSet>
|
||||||
|
<directory>src/main/resources</directory>
|
||||||
|
<includes>
|
||||||
|
<include>plugin.json</include>
|
||||||
|
<include>plugin_job_template.json</include>
|
||||||
|
</includes>
|
||||||
|
<outputDirectory>plugin/writer/kingbaseeswriter</outputDirectory>
|
||||||
|
</fileSet>
|
||||||
|
<fileSet>
|
||||||
|
<directory>target/</directory>
|
||||||
|
<includes>
|
||||||
|
<include>kingbaseeswriter-0.0.1-SNAPSHOT.jar</include>
|
||||||
|
</includes>
|
||||||
|
<outputDirectory>plugin/writer/kingbaseeswriter</outputDirectory>
|
||||||
|
</fileSet>
|
||||||
|
<fileSet>
|
||||||
|
<directory>src/main/libs</directory>
|
||||||
|
<includes>
|
||||||
|
<include>*.*</include>
|
||||||
|
</includes>
|
||||||
|
<outputDirectory>plugin/writer/kingbaseeswriter/libs</outputDirectory>
|
||||||
|
</fileSet>
|
||||||
|
</fileSets>
|
||||||
|
|
||||||
|
<dependencySets>
|
||||||
|
<dependencySet>
|
||||||
|
<useProjectArtifact>false</useProjectArtifact>
|
||||||
|
<outputDirectory>plugin/writer/kingbaseeswriter/libs</outputDirectory>
|
||||||
|
<scope>runtime</scope>
|
||||||
|
</dependencySet>
|
||||||
|
</dependencySets>
|
||||||
|
</assembly>
|
@ -0,0 +1,100 @@
|
|||||||
|
package com.alibaba.datax.plugin.writer.kingbaseeswriter;
|
||||||
|
|
||||||
|
import com.alibaba.datax.common.exception.DataXException;
|
||||||
|
import com.alibaba.datax.common.plugin.RecordReceiver;
|
||||||
|
import com.alibaba.datax.common.spi.Writer;
|
||||||
|
import com.alibaba.datax.common.util.Configuration;
|
||||||
|
import com.alibaba.datax.plugin.rdbms.util.DBUtilErrorCode;
|
||||||
|
import com.alibaba.datax.plugin.rdbms.util.DataBaseType;
|
||||||
|
import com.alibaba.datax.plugin.rdbms.writer.CommonRdbmsWriter;
|
||||||
|
import com.alibaba.datax.plugin.rdbms.writer.Key;
|
||||||
|
|
||||||
|
import java.util.List;
|
||||||
|
|
||||||
|
public class KingbaseesWriter extends Writer {
|
||||||
|
private static final DataBaseType DATABASE_TYPE = DataBaseType.KingbaseES;
|
||||||
|
|
||||||
|
public static class Job extends Writer.Job {
|
||||||
|
private Configuration originalConfig = null;
|
||||||
|
private CommonRdbmsWriter.Job commonRdbmsWriterMaster;
|
||||||
|
|
||||||
|
@Override
|
||||||
|
public void init() {
|
||||||
|
this.originalConfig = super.getPluginJobConf();
|
||||||
|
|
||||||
|
// warn:not like mysql, KingbaseES only support insert mode, don't use
|
||||||
|
String writeMode = this.originalConfig.getString(Key.WRITE_MODE);
|
||||||
|
if (null != writeMode) {
|
||||||
|
throw DataXException.asDataXException(DBUtilErrorCode.CONF_ERROR,
|
||||||
|
String.format("写入模式(writeMode)配置有误. 因为KingbaseES不支持配置参数项 writeMode: %s, KingbaseES仅使用insert sql 插入数据. 请检查您的配置并作出修改.", writeMode));
|
||||||
|
}
|
||||||
|
|
||||||
|
this.commonRdbmsWriterMaster = new CommonRdbmsWriter.Job(DATABASE_TYPE);
|
||||||
|
this.commonRdbmsWriterMaster.init(this.originalConfig);
|
||||||
|
}
|
||||||
|
|
||||||
|
@Override
|
||||||
|
public void prepare() {
|
||||||
|
this.commonRdbmsWriterMaster.prepare(this.originalConfig);
|
||||||
|
}
|
||||||
|
|
||||||
|
@Override
|
||||||
|
public List<Configuration> split(int mandatoryNumber) {
|
||||||
|
return this.commonRdbmsWriterMaster.split(this.originalConfig, mandatoryNumber);
|
||||||
|
}
|
||||||
|
|
||||||
|
@Override
|
||||||
|
public void post() {
|
||||||
|
this.commonRdbmsWriterMaster.post(this.originalConfig);
|
||||||
|
}
|
||||||
|
|
||||||
|
@Override
|
||||||
|
public void destroy() {
|
||||||
|
this.commonRdbmsWriterMaster.destroy(this.originalConfig);
|
||||||
|
}
|
||||||
|
|
||||||
|
}
|
||||||
|
|
||||||
|
public static class Task extends Writer.Task {
|
||||||
|
private Configuration writerSliceConfig;
|
||||||
|
private CommonRdbmsWriter.Task commonRdbmsWriterSlave;
|
||||||
|
|
||||||
|
@Override
|
||||||
|
public void init() {
|
||||||
|
this.writerSliceConfig = super.getPluginJobConf();
|
||||||
|
this.commonRdbmsWriterSlave = new CommonRdbmsWriter.Task(DATABASE_TYPE){
|
||||||
|
@Override
|
||||||
|
public String calcValueHolder(String columnType){
|
||||||
|
if("serial".equalsIgnoreCase(columnType)){
|
||||||
|
return "?::int";
|
||||||
|
}else if("bit".equalsIgnoreCase(columnType)){
|
||||||
|
return "?::bit varying";
|
||||||
|
}
|
||||||
|
return "?::" + columnType;
|
||||||
|
}
|
||||||
|
};
|
||||||
|
this.commonRdbmsWriterSlave.init(this.writerSliceConfig);
|
||||||
|
}
|
||||||
|
|
||||||
|
@Override
|
||||||
|
public void prepare() {
|
||||||
|
this.commonRdbmsWriterSlave.prepare(this.writerSliceConfig);
|
||||||
|
}
|
||||||
|
|
||||||
|
public void startWrite(RecordReceiver recordReceiver) {
|
||||||
|
this.commonRdbmsWriterSlave.startWrite(recordReceiver, this.writerSliceConfig, super.getTaskPluginCollector());
|
||||||
|
}
|
||||||
|
|
||||||
|
@Override
|
||||||
|
public void post() {
|
||||||
|
this.commonRdbmsWriterSlave.post(this.writerSliceConfig);
|
||||||
|
}
|
||||||
|
|
||||||
|
@Override
|
||||||
|
public void destroy() {
|
||||||
|
this.commonRdbmsWriterSlave.destroy(this.writerSliceConfig);
|
||||||
|
}
|
||||||
|
|
||||||
|
}
|
||||||
|
|
||||||
|
}
|
BIN
kingbaseeswriter/src/main/libs/kingbase8-8.2.0.jar
Normal file
Binary file not shown.
6
kingbaseeswriter/src/main/resources/plugin.json
Normal file
@ -0,0 +1,6 @@
|
|||||||
|
{
|
||||||
|
"name": "kingbaseeswriter",
|
||||||
|
"class": "com.alibaba.datax.plugin.writer.kingbaseeswriter.KingbaseesWriter",
|
||||||
|
"description": "useScene: prod. mechanism: Jdbc connection using the database, execute insert sql. warn: The more you know about the database, the less problems you encounter.",
|
||||||
|
"developer": "alibaba"
|
||||||
|
}
|
17
kingbaseeswriter/src/main/resources/plugin_job_template.json
Normal file
@ -0,0 +1,17 @@
|
|||||||
|
{
|
||||||
|
"name": "kingbaseeswriter",
|
||||||
|
"parameter": {
|
||||||
|
"username": "",
|
||||||
|
"password": "",
|
||||||
|
"column": [],
|
||||||
|
"preSql": [],
|
||||||
|
"connection": [
|
||||||
|
{
|
||||||
|
"jdbcUrl": "",
|
||||||
|
"table": []
|
||||||
|
}
|
||||||
|
],
|
||||||
|
"preSql": [],
|
||||||
|
"postSql": []
|
||||||
|
}
|
||||||
|
}
|
6
kuduwriter/README.md
Normal file
@ -0,0 +1,6 @@
|
|||||||
|
# datax-kudu-plugin
|
||||||
|
datax kudu的writer插件
|
||||||
|
|
||||||
|
|
||||||
|
|
||||||
|
仅在kudu11进行过测试
|
BIN
kuduwriter/doc/image-20200901193148188.png
Normal file
Binary file not shown.
After Width: | Height: | Size: 40 KiB |
143
kuduwriter/doc/kuduwirter.md
Normal file
@ -0,0 +1,143 @@
|
|||||||
|
# datax-kudu-plugins
|
||||||
|
datax kudu的writer插件
|
||||||
|
|
||||||
|
|
||||||
|
|
||||||
|
eg:
|
||||||
|
|
||||||
|
```json
|
||||||
|
{
|
||||||
|
"name": "kuduwriter",
|
||||||
|
"parameter": {
|
||||||
|
"kuduConfig": {
|
||||||
|
"kudu.master_addresses": "***",
|
||||||
|
"timeout": 60000,
|
||||||
|
"sessionTimeout": 60000
|
||||||
|
|
||||||
|
},
|
||||||
|
"table": "",
|
||||||
|
"replicaCount": 3,
|
||||||
|
"truncate": false,
|
||||||
|
"writeMode": "upsert",
|
||||||
|
"partition": {
|
||||||
|
"range": {
|
||||||
|
"column1": [
|
||||||
|
{
|
||||||
|
"lower": "2020-08-25",
|
||||||
|
"upper": "2020-08-26"
|
||||||
|
},
|
||||||
|
{
|
||||||
|
"lower": "2020-08-26",
|
||||||
|
"upper": "2020-08-27"
|
||||||
|
},
|
||||||
|
{
|
||||||
|
"lower": "2020-08-27",
|
||||||
|
"upper": "2020-08-28"
|
||||||
|
}
|
||||||
|
]
|
||||||
|
},
|
||||||
|
"hash": {
|
||||||
|
"column": [
|
||||||
|
"column1"
|
||||||
|
],
|
||||||
|
"number": 3
|
||||||
|
}
|
||||||
|
},
|
||||||
|
"column": [
|
||||||
|
{
|
||||||
|
"index": 0,
|
||||||
|
"name": "c1",
|
||||||
|
"type": "string",
|
||||||
|
"primaryKey": true
|
||||||
|
},
|
||||||
|
{
|
||||||
|
"index": 1,
|
||||||
|
"name": "c2",
|
||||||
|
"type": "string",
|
||||||
|
"compress": "DEFAULT_COMPRESSION",
|
||||||
|
"encoding": "AUTO_ENCODING",
|
||||||
|
"comment": "注解xxxx"
|
||||||
|
}
|
||||||
|
],
|
||||||
|
"batchSize": 1024,
|
||||||
|
"bufferSize": 2048,
|
||||||
|
"skipFail": false,
|
||||||
|
"encoding": "UTF-8"
|
||||||
|
}
|
||||||
|
}
|
||||||
|
```
|
||||||
|
|
||||||
|
必须参数:
|
||||||
|
|
||||||
|
```json
|
||||||
|
"writer": {
|
||||||
|
"name": "kuduwriter",
|
||||||
|
"parameter": {
|
||||||
|
"kuduConfig": {
|
||||||
|
"kudu.master_addresses": "***"
|
||||||
|
},
|
||||||
|
"table": "***",
|
||||||
|
"column": [
|
||||||
|
{
|
||||||
|
"name": "c1",
|
||||||
|
"type": "string",
|
||||||
|
"primaryKey": true
|
||||||
|
},
|
||||||
|
{
|
||||||
|
"name": "c2",
|
||||||
|
"type": "string",
|
||||||
|
},
|
||||||
|
{
|
||||||
|
"name": "c3",
|
||||||
|
"type": "string"
|
||||||
|
},
|
||||||
|
{
|
||||||
|
"name": "c4",
|
||||||
|
"type": "string"
|
||||||
|
}
|
||||||
|
]
|
||||||
|
}
|
||||||
|
}
|
||||||
|
```
|
||||||
|
|
||||||
|
主键列请写到最前面
|
||||||
|
|
||||||
|
|
||||||
|
|
||||||
|

|
||||||
|
|
||||||
|
##### 配置列表
|
||||||
|
|
||||||
|
| name | default | description | 是否必须 |
|
||||||
|
| -------------- | ------------------- | ------------------------------------------------------------ | -------- |
|
||||||
|
| kuduConfig | | kudu配置 (kudu.master_addresses等) | 是 |
|
||||||
|
| table | | 导入目标表名 | 是 |
|
||||||
|
| partition | | 分区 | 否 |
|
||||||
|
| column | | 列 | 是 |
|
||||||
|
| name | | 列名 | 是 |
|
||||||
|
| type | string | 列的类型,现支持INT, FLOAT, STRING, BIGINT, DOUBLE, BOOLEAN, LONG。 | 否 |
|
||||||
|
| index | 升序排列 | 列索引位置(要么全部列都写,要么都不写),如reader中取到的某一字段在第二位置(eg: name, id, age)但kudu目标表结构不同(eg:id,name, age),此时就需要将index赋值为(1,0,2),默认顺序(0,1,2),映射示例见表格下方 | 否 |
|
||||||
|
| primaryKey | false | 是否为主键(请将所有的主键列写在前面),不表明主键将不会检查过滤脏数据 | 否 |
|
||||||
|
| compress | DEFAULT_COMPRESSION | 压缩格式 | 否 |
|
||||||
|
| encoding | AUTO_ENCODING | 编码 | 否 |
|
||||||
|
| replicaCount | 3 | 保留副本个数 | 否 |
|
||||||
|
| hash | | hash分区 | 否 |
|
||||||
|
| number | 3 | hash分区个数 | 否 |
|
||||||
|
| range | | range分区 | 否 |
|
||||||
|
| lower | | range分区下限 (eg: sql建表:partition value='haha' 对应:“lower”:“haha”,“upper”:“haha\000”) | 否 |
|
||||||
|
| upper | | range分区上限(eg: sql建表:partition "10" <= VALUES < "20" 对应:“lower”:“10”,“upper”:“20”) | 否 |
|
||||||
|
| truncate | false | 是否清空表,本质上是删表重建 | 否 |
|
||||||
|
| writeMode | upsert | upsert,insert,update | 否 |
|
||||||
|
| batchSize | 512 | 每累积 batchSize 行数据 flush 一次结果(最好不要超过1024) | 否 |
|
||||||
|
| bufferSize | 3072 | 缓冲区大小 | 否 |
|
||||||
|
| skipFail | false | 是否跳过插入不成功的数据 | 否 |
|
||||||
|
| timeout | 60000 | client超时时间,如创建表,删除表操作的超时时间。单位:ms | 否 |
|
||||||
|
| sessionTimeout | 60000 | session超时时间 单位:ms | 否 |
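关于 index 的列映射,可以用下面的小例子理解(仅作示意,字段名、取值均为假设):

```java
import java.util.Arrays;
import java.util.List;

// 仅作示意:index 表示 kudu 目标表的第 i 列应取 reader 读出记录中的第 index[i] 个字段
public class ColumnIndexSketch {
    public static void main(String[] args) {
        // reader 读到的一行,字段顺序为 name, id, age
        List<String> readerRow = Arrays.asList("zhangsan", "1001", "18");
        // kudu 目标表列顺序为 id, name, age,因此 column 配置里的 index 依次为 1, 0, 2
        int[] index = {1, 0, 2};
        String[] kuduColumns = {"id", "name", "age"};
        for (int i = 0; i < kuduColumns.length; i++) {
            System.out.println(kuduColumns[i] + " <- " + readerRow.get(index[i]));
        }
    }
}
```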
|
||||||
|
|
||||||
|
|
||||||
|
|
||||||
|
|
||||||
|
|
||||||
|
|
||||||
|
|
||||||
|
|
82
kuduwriter/pom.xml
Normal file
@ -0,0 +1,82 @@
|
|||||||
|
<?xml version="1.0" encoding="UTF-8"?>
|
||||||
|
<project xmlns="http://maven.apache.org/POM/4.0.0"
|
||||||
|
xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
|
||||||
|
xsi:schemaLocation="http://maven.apache.org/POM/4.0.0 http://maven.apache.org/xsd/maven-4.0.0.xsd">
|
||||||
|
<parent>
|
||||||
|
<artifactId>datax-all</artifactId>
|
||||||
|
<groupId>com.alibaba.datax</groupId>
|
||||||
|
<version>0.0.1-SNAPSHOT</version>
|
||||||
|
</parent>
|
||||||
|
<modelVersion>4.0.0</modelVersion>
|
||||||
|
|
||||||
|
<artifactId>kuduwriter</artifactId>
|
||||||
|
<dependencies>
|
||||||
|
<dependency>
|
||||||
|
<groupId>com.alibaba.datax</groupId>
|
||||||
|
<artifactId>datax-common</artifactId>
|
||||||
|
<version>${datax-project-version}</version>
|
||||||
|
<exclusions>
|
||||||
|
<exclusion>
|
||||||
|
<artifactId>slf4j-log4j12</artifactId>
|
||||||
|
<groupId>org.slf4j</groupId>
|
||||||
|
</exclusion>
|
||||||
|
</exclusions>
|
||||||
|
</dependency>
|
||||||
|
<dependency>
|
||||||
|
<groupId>org.apache.kudu</groupId>
|
||||||
|
<artifactId>kudu-client</artifactId>
|
||||||
|
<version>1.11.1</version>
|
||||||
|
</dependency>
|
||||||
|
<dependency>
|
||||||
|
<groupId>junit</groupId>
|
||||||
|
<artifactId>junit</artifactId>
|
||||||
|
<version>4.13.1</version>
|
||||||
|
<scope>test</scope>
|
||||||
|
</dependency>
|
||||||
|
<dependency>
|
||||||
|
<groupId>com.alibaba.datax</groupId>
|
||||||
|
<artifactId>datax-core</artifactId>
|
||||||
|
<version>${datax-project-version}</version>
|
||||||
|
<exclusions>
|
||||||
|
<exclusion>
|
||||||
|
<groupId>com.alibaba.datax</groupId>
|
||||||
|
<artifactId>datax-service-face</artifactId>
|
||||||
|
</exclusion>
|
||||||
|
</exclusions>
|
||||||
|
<scope>test</scope>
|
||||||
|
</dependency>
|
||||||
|
</dependencies>
|
||||||
|
|
||||||
|
<build>
|
||||||
|
<plugins>
|
||||||
|
<!-- compiler plugin -->
|
||||||
|
<plugin>
|
||||||
|
<artifactId>maven-compiler-plugin</artifactId>
|
||||||
|
<configuration>
|
||||||
|
<source>${jdk-version}</source>
|
||||||
|
<target>${jdk-version}</target>
|
||||||
|
<encoding>${project-sourceEncoding}</encoding>
|
||||||
|
</configuration>
|
||||||
|
</plugin>
|
||||||
|
<!-- assembly plugin -->
|
||||||
|
<plugin>
|
||||||
|
<artifactId>maven-assembly-plugin</artifactId>
|
||||||
|
<configuration>
|
||||||
|
<descriptors>
|
||||||
|
<descriptor>src/main/assembly/package.xml</descriptor>
|
||||||
|
</descriptors>
|
||||||
|
<finalName>datax</finalName>
|
||||||
|
</configuration>
|
||||||
|
<executions>
|
||||||
|
<execution>
|
||||||
|
<id>dwzip</id>
|
||||||
|
<phase>package</phase>
|
||||||
|
<goals>
|
||||||
|
<goal>single</goal>
|
||||||
|
</goals>
|
||||||
|
</execution>
|
||||||
|
</executions>
|
||||||
|
</plugin>
|
||||||
|
</plugins>
|
||||||
|
</build>
|
||||||
|
</project>
|
35
kuduwriter/src/main/assembly/package.xml
Normal file
@ -0,0 +1,35 @@
|
|||||||
|
<assembly
|
||||||
|
xmlns="http://maven.apache.org/plugins/maven-assembly-plugin/assembly/1.1.0"
|
||||||
|
xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
|
||||||
|
xsi:schemaLocation="http://maven.apache.org/plugins/maven-assembly-plugin/assembly/1.1.0 http://maven.apache.org/xsd/assembly-1.1.0.xsd">
|
||||||
|
<id></id>
|
||||||
|
<formats>
|
||||||
|
<format>dir</format>
|
||||||
|
</formats>
|
||||||
|
<includeBaseDirectory>false</includeBaseDirectory>
|
||||||
|
<fileSets>
|
||||||
|
<fileSet>
|
||||||
|
<directory>src/main/resources</directory>
|
||||||
|
<includes>
|
||||||
|
<include>plugin.json</include>
|
||||||
|
<include>plugin_job_template.json</include>
|
||||||
|
</includes>
|
||||||
|
<outputDirectory>plugin/writer/kuduwriter</outputDirectory>
|
||||||
|
</fileSet>
|
||||||
|
<fileSet>
|
||||||
|
<directory>target/</directory>
|
||||||
|
<includes>
|
||||||
|
<include>kuduwriter-0.0.1-SNAPSHOT.jar</include>
|
||||||
|
</includes>
|
||||||
|
<outputDirectory>plugin/writer/kuduwriter</outputDirectory>
|
||||||
|
</fileSet>
|
||||||
|
</fileSets>
|
||||||
|
|
||||||
|
<dependencySets>
|
||||||
|
<dependencySet>
|
||||||
|
<useProjectArtifact>false</useProjectArtifact>
|
||||||
|
<outputDirectory>plugin/writer/kuduwriter/libs</outputDirectory>
|
||||||
|
<scope>runtime</scope>
|
||||||
|
</dependencySet>
|
||||||
|
</dependencySets>
|
||||||
|
</assembly>
|
@ -0,0 +1,37 @@
|
|||||||
|
package com.q1.datax.plugin.writer.kudu11xwriter;
|
||||||
|
|
||||||
|
import com.alibaba.datax.common.exception.DataXException;
|
||||||
|
|
||||||
|
import java.util.Arrays;
|
||||||
|
|
||||||
|
/**
|
||||||
|
* @author daizihao
|
||||||
|
* @create 2020-08-31 19:12
|
||||||
|
**/
|
||||||
|
public enum ColumnType {
|
||||||
|
INT("int"),
|
||||||
|
FLOAT("float"),
|
||||||
|
STRING("string"),
|
||||||
|
BIGINT("bigint"),
|
||||||
|
DOUBLE("double"),
|
||||||
|
BOOLEAN("boolean"),
|
||||||
|
LONG("long");
|
||||||
|
private String mode;
|
||||||
|
ColumnType(String mode) {
|
||||||
|
this.mode = mode.toLowerCase();
|
||||||
|
}
|
||||||
|
|
||||||
|
public String getMode() {
|
||||||
|
return mode;
|
||||||
|
}
|
||||||
|
|
||||||
|
public static ColumnType getByTypeName(String modeName) {
|
||||||
|
for (ColumnType modeType : values()) {
|
||||||
|
if (modeType.mode.equalsIgnoreCase(modeName)) {
|
||||||
|
return modeType;
|
||||||
|
}
|
||||||
|
}
|
||||||
|
throw DataXException.asDataXException(Kudu11xWriterErrorcode.ILLEGAL_VALUE,
|
||||||
|
String.format("Kuduwriter does not support the type:%s, currently supported types are:%s", modeName, Arrays.asList(values())));
|
||||||
|
}
|
||||||
|
}
|
@ -0,0 +1,21 @@
|
|||||||
|
package com.q1.datax.plugin.writer.kudu11xwriter;
|
||||||
|
|
||||||
|
/**
|
||||||
|
* @author daizihao
|
||||||
|
* @create 2020-08-31 14:42
|
||||||
|
**/
|
||||||
|
public class Constant {
|
||||||
|
public static final String DEFAULT_ENCODING = "UTF-8";
|
||||||
|
// public static final String DEFAULT_DATA_FORMAT = "yyyy-MM-dd HH:mm:ss";
|
||||||
|
|
||||||
|
public static final String COMPRESSION = "DEFAULT_COMPRESSION";
|
||||||
|
public static final String ENCODING = "AUTO_ENCODING";
|
||||||
|
public static final Long ADMIN_TIMEOUTMS = 60000L;
|
||||||
|
public static final Long SESSION_TIMEOUTMS = 60000L;
|
||||||
|
|
||||||
|
|
||||||
|
public static final String INSERT_MODE = "upsert";
|
||||||
|
public static final long DEFAULT_WRITE_BATCH_SIZE = 512L;
|
||||||
|
public static final long DEFAULT_MUTATION_BUFFER_SPACE = 3072L;
|
||||||
|
|
||||||
|
}
|
@ -0,0 +1,34 @@
|
|||||||
|
package com.q1.datax.plugin.writer.kudu11xwriter;
|
||||||
|
|
||||||
|
import com.alibaba.datax.common.exception.DataXException;
|
||||||
|
|
||||||
|
import java.util.Arrays;
|
||||||
|
|
||||||
|
/**
|
||||||
|
* @author daizihao
|
||||||
|
* @create 2020-08-31 14:47
|
||||||
|
**/
|
||||||
|
public enum InsertModeType {
|
||||||
|
Insert("insert"),
|
||||||
|
Upsert("upsert"),
|
||||||
|
Update("update");
|
||||||
|
private String mode;
|
||||||
|
|
||||||
|
InsertModeType(String mode) {
|
||||||
|
this.mode = mode.toLowerCase();
|
||||||
|
}
|
||||||
|
|
||||||
|
public String getMode() {
|
||||||
|
return mode;
|
||||||
|
}
|
||||||
|
|
||||||
|
public static InsertModeType getByTypeName(String modeName) {
|
||||||
|
for (InsertModeType modeType : values()) {
|
||||||
|
if (modeType.mode.equalsIgnoreCase(modeName)) {
|
||||||
|
return modeType;
|
||||||
|
}
|
||||||
|
}
|
||||||
|
throw DataXException.asDataXException(Kudu11xWriterErrorcode.ILLEGAL_VALUE,
|
||||||
|
String.format("Kuduwriter does not support the mode :[%s], currently supported mode types are :%s", modeName, Arrays.asList(values())));
|
||||||
|
}
|
||||||
|
}
|
@ -0,0 +1,45 @@
|
|||||||
|
package com.q1.datax.plugin.writer.kudu11xwriter;
|
||||||
|
|
||||||
|
/**
|
||||||
|
* @author daizihao
|
||||||
|
* @create 2020-08-31 14:17
|
||||||
|
**/
|
||||||
|
public class Key {
|
||||||
|
public final static String KUDU_CONFIG = "kuduConfig";
|
||||||
|
public final static String KUDU_MASTER = "kudu.master_addresses";
|
||||||
|
public final static String KUDU_ADMIN_TIMEOUT = "timeout";
|
||||||
|
public final static String KUDU_SESSION_TIMEOUT = "sessionTimeout";
|
||||||
|
|
||||||
|
public final static String TABLE = "table";
|
||||||
|
public final static String PARTITION = "partition";
|
||||||
|
public final static String COLUMN = "column";
|
||||||
|
|
||||||
|
public static final String NAME = "name";
|
||||||
|
public static final String TYPE = "type";
|
||||||
|
public static final String INDEX = "index";
|
||||||
|
public static final String PRIMARYKEY = "primaryKey";
|
||||||
|
public static final String COMPRESSION = "compress";
|
||||||
|
public static final String COMMENT = "comment";
|
||||||
|
public final static String ENCODING = "encoding";
|
||||||
|
|
||||||
|
|
||||||
|
|
||||||
|
public static final String NUM_REPLICAS = "replicaCount";
|
||||||
|
public static final String HASH = "hash";
|
||||||
|
public static final String HASH_NUM = "number";
|
||||||
|
|
||||||
|
public static final String RANGE = "range";
|
||||||
|
public static final String LOWER = "lower";
|
||||||
|
public static final String UPPER = "upper";
|
||||||
|
|
||||||
|
|
||||||
|
|
||||||
|
public static final String TRUNCATE = "truncate";
|
||||||
|
|
||||||
|
public static final String INSERT_MODE = "writeMode";
|
||||||
|
|
||||||
|
public static final String WRITE_BATCH_SIZE = "batchSize";
|
||||||
|
|
||||||
|
public static final String MUTATION_BUFFER_SPACE = "bufferSize";
|
||||||
|
public static final String SKIP_FAIL = "skipFail";
|
||||||
|
}
|
@ -0,0 +1,369 @@
|
|||||||
|
package com.q1.datax.plugin.writer.kudu11xwriter;
|
||||||
|
|
||||||
|
import com.alibaba.datax.common.element.Column;
|
||||||
|
import com.alibaba.datax.common.exception.DataXException;
|
||||||
|
import com.alibaba.datax.common.util.Configuration;
|
||||||
|
import com.alibaba.fastjson.JSON;
|
||||||
|
import org.apache.commons.lang3.StringUtils;
|
||||||
|
import org.apache.commons.lang3.Validate;
|
||||||
|
import org.apache.kudu.ColumnSchema;
|
||||||
|
import org.apache.kudu.Schema;
|
||||||
|
import org.apache.kudu.Type;
|
||||||
|
import org.apache.kudu.client.*;
|
||||||
|
import org.apache.kudu.shaded.org.checkerframework.checker.units.qual.K;
|
||||||
|
import org.slf4j.Logger;
|
||||||
|
import org.slf4j.LoggerFactory;
|
||||||
|
import sun.rmi.runtime.Log;
|
||||||
|
|
||||||
|
import java.nio.charset.Charset;
|
||||||
|
import java.util.*;
|
||||||
|
import java.util.concurrent.SynchronousQueue;
|
||||||
|
import java.util.concurrent.ThreadFactory;
|
||||||
|
import java.util.concurrent.ThreadPoolExecutor;
|
||||||
|
import java.util.concurrent.TimeUnit;
|
||||||
|
import java.util.concurrent.atomic.AtomicInteger;
|
||||||
|
|
||||||
|
/**
|
||||||
|
* @author daizihao
|
||||||
|
* @create 2020-08-27 18:30
|
||||||
|
**/
|
||||||
|
public class Kudu11xHelper {
|
||||||
|
|
||||||
|
private static final Logger LOG = LoggerFactory.getLogger(Kudu11xHelper.class);
|
||||||
|
|
||||||
|
public static Map<String, Object> getKuduConfiguration(String kuduConfig) {
|
||||||
|
if (StringUtils.isBlank(kuduConfig)) {
|
||||||
|
throw DataXException.asDataXException(Kudu11xWriterErrorcode.REQUIRED_VALUE,
|
||||||
|
"Connection configuration information required.");
|
||||||
|
}
|
||||||
|
Map<String, Object> kConfiguration;
|
||||||
|
try {
|
||||||
|
kConfiguration = JSON.parseObject(kuduConfig, HashMap.class);
|
||||||
|
Validate.isTrue(kConfiguration != null, "kuduConfig is null!");
|
||||||
|
kConfiguration.put(Key.KUDU_ADMIN_TIMEOUT, kConfiguration.getOrDefault(Key.KUDU_ADMIN_TIMEOUT, Constant.ADMIN_TIMEOUTMS));
|
||||||
|
kConfiguration.put(Key.KUDU_SESSION_TIMEOUT, kConfiguration.getOrDefault(Key.KUDU_SESSION_TIMEOUT, Constant.SESSION_TIMEOUTMS));
|
||||||
|
} catch (Exception e) {
|
||||||
|
throw DataXException.asDataXException(Kudu11xWriterErrorcode.GET_KUDU_CONNECTION_ERROR, e);
|
||||||
|
}
|
||||||
|
|
||||||
|
return kConfiguration;
|
||||||
|
}
|
||||||
|
|
||||||
|
public static KuduClient getKuduClient(String kuduConfig) {
|
||||||
|
Map<String, Object> conf = Kudu11xHelper.getKuduConfiguration(kuduConfig);
|
||||||
|
KuduClient kuduClient = null;
|
||||||
|
try {
|
||||||
|
String masterAddress = (String) conf.get(Key.KUDU_MASTER);
|
||||||
|
kuduClient = new KuduClient.KuduClientBuilder(masterAddress)
|
||||||
|
.defaultAdminOperationTimeoutMs((Long) conf.get(Key.KUDU_ADMIN_TIMEOUT))
|
||||||
|
.defaultOperationTimeoutMs((Long) conf.get(Key.KUDU_SESSION_TIMEOUT))
|
||||||
|
.build();
|
||||||
|
} catch (Exception e) {
|
||||||
|
throw DataXException.asDataXException(Kudu11xWriterErrorcode.GET_KUDU_CONNECTION_ERROR, e);
|
||||||
|
}
|
||||||
|
return kuduClient;
|
||||||
|
}
|
||||||
|
|
||||||
|
public static KuduTable getKuduTable(Configuration configuration, KuduClient kuduClient) {
|
||||||
|
String tableName = configuration.getString(Key.TABLE);
|
||||||
|
|
||||||
|
KuduTable table = null;
|
||||||
|
try {
|
||||||
|
if (kuduClient.tableExists(tableName)) {
|
||||||
|
table = kuduClient.openTable(tableName);
|
||||||
|
} else {
|
||||||
|
synchronized (Kudu11xHelper.class) {
|
||||||
|
if (!kuduClient.tableExists(tableName)) {
|
||||||
|
Schema schema = Kudu11xHelper.getSchema(configuration);
|
||||||
|
CreateTableOptions tableOptions = new CreateTableOptions();
|
||||||
|
|
||||||
|
Kudu11xHelper.setTablePartition(configuration, tableOptions, schema);
|
||||||
|
//副本数
|
||||||
|
Integer numReplicas = configuration.getInt(Key.NUM_REPLICAS, 3);
|
||||||
|
tableOptions.setNumReplicas(numReplicas);
|
||||||
|
table = kuduClient.createTable(tableName, schema, tableOptions);
|
||||||
|
} else {
|
||||||
|
table = kuduClient.openTable(tableName);
|
||||||
|
}
|
||||||
|
}
|
||||||
|
}
|
||||||
|
|
||||||
|
|
||||||
|
} catch (Exception e) {
|
||||||
|
throw DataXException.asDataXException(Kudu11xWriterErrorcode.GET_KUDU_TABLE_ERROR, e);
|
||||||
|
}
|
||||||
|
return table;
|
||||||
|
}
|
||||||
|
|
||||||
|
public static void createTable(Configuration configuration) {
|
||||||
|
String tableName = configuration.getString(Key.TABLE);
|
||||||
|
String kuduConfig = configuration.getString(Key.KUDU_CONFIG);
|
||||||
|
KuduClient kuduClient = Kudu11xHelper.getKuduClient(kuduConfig);
|
||||||
|
try {
|
||||||
|
Schema schema = Kudu11xHelper.getSchema(configuration);
|
||||||
|
CreateTableOptions tableOptions = new CreateTableOptions();
|
||||||
|
|
||||||
|
Kudu11xHelper.setTablePartition(configuration, tableOptions, schema);
|
||||||
|
//副本数
|
||||||
|
Integer numReplicas = configuration.getInt(Key.NUM_REPLICAS, 3);
|
||||||
|
tableOptions.setNumReplicas(numReplicas);
|
||||||
|
kuduClient.createTable(tableName, schema, tableOptions);
|
||||||
|
} catch (Exception e) {
|
||||||
|
throw DataXException.asDataXException(Kudu11xWriterErrorcode.GREATE_KUDU_TABLE_ERROR, e);
|
||||||
|
} finally {
|
||||||
|
AtomicInteger i = new AtomicInteger(10);
|
||||||
|
while (i.get() > 0) {
|
||||||
|
try {
|
||||||
|
if (kuduClient.isCreateTableDone(tableName)) {
|
||||||
|
Kudu11xHelper.closeClient(kuduClient);
|
||||||
|
LOG.info("Table " + tableName + " is created!");
|
||||||
|
break;
|
||||||
|
}
|
||||||
|
i.decrementAndGet();
|
||||||
|
LOG.error("timeout!");
|
||||||
|
} catch (KuduException e) {
|
||||||
|
LOG.info("Wait for the table to be created..... " + i);
|
||||||
|
try {
|
||||||
|
Thread.sleep(100L);
|
||||||
|
} catch (InterruptedException ex) {
|
||||||
|
ex.printStackTrace();
|
||||||
|
}
|
||||||
|
i.decrementAndGet();
|
||||||
|
}
|
||||||
|
}
|
||||||
|
try {
|
||||||
|
if (kuduClient != null) {
|
||||||
|
kuduClient.close();
|
||||||
|
}
|
||||||
|
} catch (KuduException e) {
|
||||||
|
LOG.info("Kudu client has been shut down!");
|
||||||
|
}
|
||||||
|
}
|
||||||
|
}
|
||||||
|
|
||||||
|
public static ThreadPoolExecutor createRowAddThreadPool(int coreSize) {
|
||||||
|
return new ThreadPoolExecutor(coreSize,
|
||||||
|
coreSize,
|
||||||
|
60L,
|
||||||
|
TimeUnit.SECONDS,
|
||||||
|
new SynchronousQueue<Runnable>(),
|
||||||
|
new ThreadFactory() {
|
||||||
|
private final ThreadGroup group = System.getSecurityManager() == null ? Thread.currentThread().getThreadGroup() : System.getSecurityManager().getThreadGroup();
|
||||||
|
private final AtomicInteger threadNumber = new AtomicInteger(1);
|
||||||
|
|
||||||
|
@Override
|
||||||
|
public Thread newThread(Runnable r) {
|
||||||
|
Thread t = new Thread(group, r,
|
||||||
|
"pool-kudu_rows_add-thread-" + threadNumber.getAndIncrement(),
|
||||||
|
0);
|
||||||
|
if (t.isDaemon())
|
||||||
|
t.setDaemon(false);
|
||||||
|
if (t.getPriority() != Thread.NORM_PRIORITY)
|
||||||
|
t.setPriority(Thread.NORM_PRIORITY);
|
||||||
|
return t;
|
||||||
|
}
|
||||||
|
}, new ThreadPoolExecutor.CallerRunsPolicy());
|
||||||
|
}
|
||||||
|
|
||||||
|
public static List<List<Configuration>> getColumnLists(List<Configuration> columns) {
|
||||||
|
int quota = 8;
|
||||||
|
int num = (columns.size() - 1) / quota + 1;
|
||||||
|
int gap = columns.size() / num;
|
||||||
|
List<List<Configuration>> columnLists = new ArrayList<>(num);
        for (int j = 0; j < num - 1; j++) {
            List<Configuration> destList = new ArrayList<>(columns.subList(j * gap, (j + 1) * gap));
            columnLists.add(destList);
        }
        List<Configuration> destList = new ArrayList<>(columns.subList(gap * (num - 1), columns.size()));
        columnLists.add(destList);
        return columnLists;
    }

    public static boolean isTableExists(Configuration configuration) {
        String tableName = configuration.getString(Key.TABLE);
        String kuduConfig = configuration.getString(Key.KUDU_CONFIG);
        KuduClient kuduClient = Kudu11xHelper.getKuduClient(kuduConfig);
        try {
            return kuduClient.tableExists(tableName);
        } catch (Exception e) {
            throw DataXException.asDataXException(Kudu11xWriterErrorcode.GET_KUDU_CONNECTION_ERROR, e);
        } finally {
            Kudu11xHelper.closeClient(kuduClient);
        }
    }

    public static void closeClient(KuduClient kuduClient) {
        try {
            if (kuduClient != null) {
                kuduClient.close();
            }
        } catch (KuduException e) {
            LOG.warn("The kudu client was not stopped gracefully!");
        }
    }

    public static Schema getSchema(Configuration configuration) {
        List<Configuration> columns = configuration.getListConfiguration(Key.COLUMN);
        List<ColumnSchema> columnSchemas = new ArrayList<>();
        Schema schema = null;
        if (columns == null || columns.isEmpty()) {
            throw DataXException.asDataXException(Kudu11xWriterErrorcode.REQUIRED_VALUE, "column is not defined, eg: column:[{\"name\": \"cf0:column0\",\"type\": \"string\"},{\"name\": \"cf1:column1\",\"type\": \"long\"}]");
        }
        try {
            for (Configuration column : columns) {
                // normalize the configured type name to the Kudu type name: BIGINT/LONG -> INT64, INT -> INT32
                String type = "BIGINT".equals(column.getNecessaryValue(Key.TYPE, Kudu11xWriterErrorcode.REQUIRED_VALUE).toUpperCase()) ||
                        "LONG".equals(column.getNecessaryValue(Key.TYPE, Kudu11xWriterErrorcode.REQUIRED_VALUE).toUpperCase()) ?
                        "INT64" : "INT".equals(column.getNecessaryValue(Key.TYPE, Kudu11xWriterErrorcode.REQUIRED_VALUE).toUpperCase()) ?
                        "INT32" : column.getNecessaryValue(Key.TYPE, Kudu11xWriterErrorcode.REQUIRED_VALUE).toUpperCase();
                String name = column.getNecessaryValue(Key.NAME, Kudu11xWriterErrorcode.REQUIRED_VALUE);
                Boolean key = column.getBool(Key.PRIMARYKEY, false);
                String encoding = column.getString(Key.ENCODING, Constant.ENCODING).toUpperCase();
                String compression = column.getString(Key.COMPRESSION, Constant.COMPRESSION).toUpperCase();
                String comment = column.getString(Key.COMMENT, "");

                columnSchemas.add(new ColumnSchema.ColumnSchemaBuilder(name, Type.getTypeForName(type))
                        .key(key)
                        .encoding(ColumnSchema.Encoding.valueOf(encoding))
                        .compressionAlgorithm(ColumnSchema.CompressionAlgorithm.valueOf(compression))
                        .comment(comment)
                        .build());
            }
            schema = new Schema(columnSchemas);
        } catch (Exception e) {
            throw DataXException.asDataXException(Kudu11xWriterErrorcode.REQUIRED_VALUE, e);
        }
        return schema;
    }

    public static Integer getPrimaryKeyIndexUntil(List<Configuration> columns) {
        int i = 0;
        while (i < columns.size()) {
            Configuration col = columns.get(i);
            if (!col.getBool(Key.PRIMARYKEY, false)) {
                break;
            }
            i++;
        }
        return i;
    }

    public static void setTablePartition(Configuration configuration,
                                         CreateTableOptions tableOptions,
                                         Schema schema) {
        Configuration partition = configuration.getConfiguration(Key.PARTITION);
        if (partition == null) {
            ColumnSchema columnSchema = schema.getColumns().get(0);
            tableOptions.addHashPartitions(Collections.singletonList(columnSchema.getName()), 3);
            return;
        }
        // range partition
        Configuration range = partition.getConfiguration(Key.RANGE);
        if (range != null) {
            List<String> rangeColums = new ArrayList<>(range.getKeys());
            tableOptions.setRangePartitionColumns(rangeColums);
            for (String rangeColum : rangeColums) {
                List<Configuration> lowerAndUppers = range.getListConfiguration(rangeColum);
                for (Configuration lowerAndUpper : lowerAndUppers) {
                    PartialRow lower = schema.newPartialRow();
                    lower.addString(rangeColum, lowerAndUpper.getNecessaryValue(Key.LOWER, Kudu11xWriterErrorcode.REQUIRED_VALUE));
                    PartialRow upper = schema.newPartialRow();
                    upper.addString(rangeColum, lowerAndUpper.getNecessaryValue(Key.UPPER, Kudu11xWriterErrorcode.REQUIRED_VALUE));
                    tableOptions.addRangePartition(lower, upper);
                }
            }
            LOG.info("Set range partition complete!");
        }

        // hash partition
        Configuration hash = partition.getConfiguration(Key.HASH);
        if (hash != null) {
            List<String> hashColums = hash.getList(Key.COLUMN, String.class);
            Integer hashPartitionNum = configuration.getInt(Key.HASH_NUM, 3);
            tableOptions.addHashPartitions(hashColums, hashPartitionNum);
            LOG.info("Set hash partition complete!");
        }
    }

    public static void validateParameter(Configuration configuration) {
        LOG.info("Start validating parameters!");
        configuration.getNecessaryValue(Key.KUDU_CONFIG, Kudu11xWriterErrorcode.REQUIRED_VALUE);
        configuration.getNecessaryValue(Key.TABLE, Kudu11xWriterErrorcode.REQUIRED_VALUE);
        String encoding = configuration.getString(Key.ENCODING, Constant.DEFAULT_ENCODING);
        if (!Charset.isSupported(encoding)) {
            throw DataXException.asDataXException(Kudu11xWriterErrorcode.ILLEGAL_VALUE,
                    String.format("Encoding is not supported: [%s].", encoding));
        }
        configuration.set(Key.ENCODING, encoding);
        String insertMode = configuration.getString(Key.INSERT_MODE, Constant.INSERT_MODE);
        try {
            InsertModeType.getByTypeName(insertMode);
        } catch (Exception e) {
            insertMode = Constant.INSERT_MODE;
        }
        configuration.set(Key.INSERT_MODE, insertMode);

        Long writeBufferSize = configuration.getLong(Key.WRITE_BATCH_SIZE, Constant.DEFAULT_WRITE_BATCH_SIZE);
        configuration.set(Key.WRITE_BATCH_SIZE, writeBufferSize);

        Long mutationBufferSpace = configuration.getLong(Key.MUTATION_BUFFER_SPACE, Constant.DEFAULT_MUTATION_BUFFER_SPACE);
        configuration.set(Key.MUTATION_BUFFER_SPACE, mutationBufferSpace);

        Boolean isSkipFail = configuration.getBool(Key.SKIP_FAIL, false);
        configuration.set(Key.SKIP_FAIL, isSkipFail);
        List<Configuration> columns = configuration.getListConfiguration(Key.COLUMN);
        List<Configuration> goalColumns = new ArrayList<>();
        // validate the "column" parameter
        int indexFlag = 0;
        boolean primaryKey = true;
        int primaryKeyFlag = 0;
        for (int i = 0; i < columns.size(); i++) {
            Configuration col = columns.get(i);
            String index = col.getString(Key.INDEX);
            if (index == null) {
                index = String.valueOf(i);
                col.set(Key.INDEX, index);
                indexFlag++;
            }
            if (primaryKey != col.getBool(Key.PRIMARYKEY, false)) {
                primaryKey = col.getBool(Key.PRIMARYKEY, false);
                primaryKeyFlag++;
            }
            goalColumns.add(col);
        }
        if (indexFlag != 0 && indexFlag != columns.size()) {
            throw DataXException.asDataXException(Kudu11xWriterErrorcode.ILLEGAL_VALUE,
                    "\"index\" must be set for every column or omitted for every column!");
        }
        if (primaryKeyFlag > 1) {
            throw DataXException.asDataXException(Kudu11xWriterErrorcode.ILLEGAL_VALUE,
                    "\"primaryKey\" columns must be listed before all non-key columns!");
        }
        configuration.set(Key.COLUMN, goalColumns);
        LOG.info("validate parameter complete!");
    }

    public static void truncateTable(Configuration configuration) {
        String kuduConfig = configuration.getString(Key.KUDU_CONFIG);
        String userTable = configuration.getString(Key.TABLE);
        LOG.info(String.format("Because you have configured truncate to true, KuduWriter begins to truncate table %s.", userTable));
        KuduClient kuduClient = Kudu11xHelper.getKuduClient(kuduConfig);

        try {
            if (kuduClient.tableExists(userTable)) {
                kuduClient.deleteTable(userTable);
                LOG.info(String.format("table %s has been deleted.", userTable));
            }
        } catch (KuduException e) {
            throw DataXException.asDataXException(Kudu11xWriterErrorcode.DELETE_KUDU_ERROR, e);
        } finally {
            Kudu11xHelper.closeClient(kuduClient);
        }
    }
}
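For reference, the sketch below shows the writer parameter block that setTablePartition and validateParameter above consume; it mirrors the plugin_job_template.json added later in this commit. The key names (partition.range / partition.hash, lower/upper, column.index/name/type/primaryKey) come from the Key constants read in the code; the table and column names are illustrative only. Primary-key columns must appear first, and "index" must be set on all columns or on none.

{
    "table": "example_table",
    "partition": {
        "range": {
            "dt": [
                { "lower": "2020-08-25", "upper": "2020-08-26" }
            ]
        },
        "hash": {
            "column": ["dt"],
            "number": 3
        }
    },
    "column": [
        { "index": 0, "name": "dt", "type": "string", "primaryKey": true },
        { "index": 1, "name": "val", "type": "long" }
    ]
}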
@ -0,0 +1,85 @@
package com.q1.datax.plugin.writer.kudu11xwriter;

import com.alibaba.datax.common.exception.DataXException;
import com.alibaba.datax.common.plugin.RecordReceiver;
import com.alibaba.datax.common.spi.Writer;
import com.alibaba.datax.common.util.Configuration;
import org.slf4j.Logger;
import org.slf4j.LoggerFactory;

import java.util.ArrayList;
import java.util.List;

/**
 * @author daizihao
 * @create 2020-08-27 16:58
 **/
public class Kudu11xWriter extends Writer {
    public static class Job extends Writer.Job {
        private static final Logger LOG = LoggerFactory.getLogger(Job.class);
        private Configuration config = null;

        @Override
        public void init() {
            this.config = this.getPluginJobConf();
            Kudu11xHelper.validateParameter(this.config);
        }

        @Override
        public void prepare() {
            Boolean truncate = config.getBool(Key.TRUNCATE, false);
            if (truncate) {
                Kudu11xHelper.truncateTable(this.config);
            }

            if (!Kudu11xHelper.isTableExists(config)) {
                Kudu11xHelper.createTable(config);
            }
        }

        @Override
        public List<Configuration> split(int i) {
            List<Configuration> splitResultConfigs = new ArrayList<>();
            for (int j = 0; j < i; j++) {
                splitResultConfigs.add(config.clone());
            }
            return splitResultConfigs;
        }

        @Override
        public void destroy() {
        }
    }

    public static class Task extends Writer.Task {
        private Configuration taskConfig;
        private KuduWriterTask kuduTaskProxy;
        private static final Logger LOG = LoggerFactory.getLogger(Task.class);

        @Override
        public void init() {
            this.taskConfig = super.getPluginJobConf();
            this.kuduTaskProxy = new KuduWriterTask(this.taskConfig);
        }

        @Override
        public void startWrite(RecordReceiver lineReceiver) {
            this.kuduTaskProxy.startWriter(lineReceiver, super.getTaskPluginCollector());
        }

        @Override
        public void destroy() {
            try {
                if (kuduTaskProxy.session != null) {
                    kuduTaskProxy.session.close();
                }
            } catch (Exception e) {
                LOG.warn("The kudu session was not stopped gracefully!");
            }
            Kudu11xHelper.closeClient(kuduTaskProxy.kuduClient);
        }
    }
}
@ -0,0 +1,39 @@
package com.q1.datax.plugin.writer.kudu11xwriter;

import com.alibaba.datax.common.spi.ErrorCode;

/**
 * @author daizihao
 * @create 2020-08-27 19:25
 **/
public enum Kudu11xWriterErrorcode implements ErrorCode {
    REQUIRED_VALUE("Kuduwriter-00", "You are missing a required parameter value."),
    ILLEGAL_VALUE("Kuduwriter-01", "The parameter value you filled in is not valid."),
    GET_KUDU_CONNECTION_ERROR("Kuduwriter-02", "Error getting Kudu connection."),
    GET_KUDU_TABLE_ERROR("Kuduwriter-03", "Error getting Kudu table."),
    CLOSE_KUDU_CONNECTION_ERROR("Kuduwriter-04", "Error closing Kudu connection."),
    CLOSE_KUDU_SESSION_ERROR("Kuduwriter-06", "Error closing Kudu session."),
    PUT_KUDU_ERROR("Kuduwriter-07", "IO exception occurred when writing to Kudu."),
    DELETE_KUDU_ERROR("Kuduwriter-08", "An exception occurred while deleting the Kudu table."),
    GREATE_KUDU_TABLE_ERROR("Kuduwriter-09", "Error creating Kudu table."),
    PARAMETER_NUM_ERROR("Kuduwriter-10", "The number of parameters does not match.");

    private final String code;
    private final String description;

    Kudu11xWriterErrorcode(String code, String description) {
        this.code = code;
        this.description = description;
    }

    @Override
    public String getCode() {
        return code;
    }

    @Override
    public String getDescription() {
        return description;
    }
}
@ -0,0 +1,216 @@
package com.q1.datax.plugin.writer.kudu11xwriter;

import com.alibaba.datax.common.element.Column;
import com.alibaba.datax.common.element.Record;
import com.alibaba.datax.common.exception.DataXException;
import com.alibaba.datax.common.plugin.RecordReceiver;
import com.alibaba.datax.common.plugin.TaskPluginCollector;
import com.alibaba.datax.common.util.Configuration;
import com.alibaba.datax.common.util.RetryUtil;
import org.apache.commons.lang3.StringUtils;
import org.apache.kudu.client.*;
import org.slf4j.Logger;
import org.slf4j.LoggerFactory;

import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.concurrent.*;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.concurrent.atomic.AtomicLong;
import java.util.concurrent.atomic.LongAdder;

/**
 * @author daizihao
 * @create 2020-08-31 16:55
 **/
public class KuduWriterTask {
    private final static Logger LOG = LoggerFactory.getLogger(KuduWriterTask.class);

    private List<Configuration> columns;
    private List<List<Configuration>> columnLists;
    private ThreadPoolExecutor pool;
    private String encoding;
    private Double batchSize;
    private Boolean isUpsert;
    private Boolean isSkipFail;
    public KuduClient kuduClient;
    public KuduSession session;
    private KuduTable table;
    private Integer primaryKeyIndexUntil;

    private final Object lock = new Object();

    public KuduWriterTask(Configuration configuration) {
        columns = configuration.getListConfiguration(Key.COLUMN);
        columnLists = Kudu11xHelper.getColumnLists(columns);
        pool = Kudu11xHelper.createRowAddThreadPool(columnLists.size());

        this.encoding = configuration.getString(Key.ENCODING);
        this.batchSize = configuration.getDouble(Key.WRITE_BATCH_SIZE);
        this.isUpsert = !configuration.getString(Key.INSERT_MODE).equalsIgnoreCase("insert");
        this.isSkipFail = configuration.getBool(Key.SKIP_FAIL);
        long mutationBufferSpace = configuration.getLong(Key.MUTATION_BUFFER_SPACE);

        this.kuduClient = Kudu11xHelper.getKuduClient(configuration.getString(Key.KUDU_CONFIG));
        this.table = Kudu11xHelper.getKuduTable(configuration, kuduClient);
        this.session = kuduClient.newSession();
        session.setFlushMode(SessionConfiguration.FlushMode.MANUAL_FLUSH);
        session.setMutationBufferSpace((int) mutationBufferSpace);
        this.primaryKeyIndexUntil = Kudu11xHelper.getPrimaryKeyIndexUntil(columns);
    }

    public void startWriter(RecordReceiver lineReceiver, TaskPluginCollector taskPluginCollector) {
        LOG.info("kuduwriter began to write!");
        Record record;
        LongAdder counter = new LongAdder();
        try {
            while ((record = lineReceiver.getFromReader()) != null) {
                if (record.getColumnNumber() != columns.size()) {
                    throw DataXException.asDataXException(Kudu11xWriterErrorcode.PARAMETER_NUM_ERROR, " number of record fields: " + record.getColumnNumber() + " number of configuration fields: " + columns.size());
                }
                boolean isDirtyRecord = false;

                // a record with a blank primary-key field is treated as dirty
                for (int i = 0; i < primaryKeyIndexUntil && !isDirtyRecord; i++) {
                    Column column = record.getColumn(i);
                    isDirtyRecord = StringUtils.isBlank(column.asString());
                }

                if (isDirtyRecord) {
                    taskPluginCollector.collectDirtyRecord(record, "primarykey field is null");
                    continue;
                }

                CountDownLatch countDownLatch = new CountDownLatch(columnLists.size());
                Upsert upsert = table.newUpsert();
                Insert insert = table.newInsert();
                PartialRow row;
                if (isUpsert) {
                    // upsert (overwrite) mode
                    row = upsert.getRow();
                } else {
                    // insert (append) mode
                    row = insert.getRow();
                }
                List<Future<?>> futures = new ArrayList<>();
                for (List<Configuration> columnList : columnLists) {
                    Record finalRecord = record;
                    Future<?> future = pool.submit(() -> {
                        try {
                            for (Configuration col : columnList) {
                                String name = col.getString(Key.NAME);
                                ColumnType type = ColumnType.getByTypeName(col.getString(Key.TYPE, "string"));
                                Column column = finalRecord.getColumn(col.getInt(Key.INDEX));
                                String rawData = column.asString();
                                if (rawData == null) {
                                    synchronized (lock) {
                                        row.setNull(name);
                                    }
                                    continue;
                                }
                                switch (type) {
                                    case INT:
                                        synchronized (lock) {
                                            row.addInt(name, Integer.parseInt(rawData));
                                        }
                                        break;
                                    case LONG:
                                    case BIGINT:
                                        synchronized (lock) {
                                            row.addLong(name, Long.parseLong(rawData));
                                        }
                                        break;
                                    case FLOAT:
                                        synchronized (lock) {
                                            row.addFloat(name, Float.parseFloat(rawData));
                                        }
                                        break;
                                    case DOUBLE:
                                        synchronized (lock) {
                                            row.addDouble(name, Double.parseDouble(rawData));
                                        }
                                        break;
                                    case BOOLEAN:
                                        synchronized (lock) {
                                            row.addBoolean(name, Boolean.parseBoolean(rawData));
                                        }
                                        break;
                                    case STRING:
                                    default:
                                        synchronized (lock) {
                                            row.addString(name, rawData);
                                        }
                                }
                            }
                        } finally {
                            countDownLatch.countDown();
                        }
                    });
                    futures.add(future);
                }
                countDownLatch.await();
                for (Future<?> future : futures) {
                    future.get();
                }
                try {
                    RetryUtil.executeWithRetry(() -> {
                        if (isUpsert) {
                            // upsert (overwrite) mode
                            session.apply(upsert);
                        } else {
                            // insert (append) mode
                            session.apply(insert);
                        }
                        // flush when the buffered count approaches the configured batch size
                        if (counter.longValue() > (batchSize * 0.8)) {
                            session.flush();
                            counter.reset();
                        }
                        counter.increment();
                        return true;
                    }, 5, 500L, true);

                } catch (Exception e) {
                    LOG.error("Record Write Failure!", e);
                    if (isSkipFail) {
                        LOG.warn("Since you have configured \"skipFail\" to be true, this record will be skipped!");
                        taskPluginCollector.collectDirtyRecord(record, e.getMessage());
                    } else {
                        throw DataXException.asDataXException(Kudu11xWriterErrorcode.PUT_KUDU_ERROR, e.getMessage());
                    }
                }
            }
        } catch (Exception e) {
            LOG.error("write failure! the task will exit!");
            throw DataXException.asDataXException(Kudu11xWriterErrorcode.PUT_KUDU_ERROR, e.getMessage());
        }
        AtomicInteger i = new AtomicInteger(10);
        try {
            while (i.get() > 0) {
                if (session.hasPendingOperations()) {
                    session.flush();
                    break;
                }
                Thread.sleep(20L);
                i.decrementAndGet();
            }
        } catch (Exception e) {
            LOG.info("Waiting for data to be written to kudu...... " + i + "s");
        } finally {
            try {
                pool.shutdown();
                // force a final flush
                session.flush();
            } catch (KuduException e) {
                LOG.error("kuduwriter flush error! The results may be incomplete!");
                throw DataXException.asDataXException(Kudu11xWriterErrorcode.PUT_KUDU_ERROR, e.getMessage());
            }
        }
    }
}
@ -0,0 +1,9 @@
package com.q1.kudu.conf;

/**
 * @author daizihao
 * @create 2020-09-16 11:39
 **/
public class KuduConfig {
}
7 kuduwriter/src/main/resources/plugin.json Normal file
@ -0,0 +1,7 @@
{
    "name": "kuduwriter",
    "class": "com.q1.datax.plugin.writer.kudu11xwriter.Kudu11xWriter",
    "description": "use put: prod. mechanism: use kudu java api put data.",
    "developer": "com.q1.daizihao"
}
59 kuduwriter/src/main/resources/plugin_job_template.json Normal file
@ -0,0 +1,59 @@
{
    "name": "kuduwriter",
    "parameter": {
        "kuduConfig": {
            "kudu.master_addresses": "***",
            "timeout": 60000,
            "sessionTimeout": 60000
        },
        "table": "",
        "replicaCount": 3,
        "truncate": false,
        "writeMode": "upsert",
        "partition": {
            "range": {
                "column1": [
                    {
                        "lower": "2020-08-25",
                        "upper": "2020-08-26"
                    },
                    {
                        "lower": "2020-08-26",
                        "upper": "2020-08-27"
                    },
                    {
                        "lower": "2020-08-27",
                        "upper": "2020-08-28"
                    }
                ]
            },
            "hash": {
                "column": [
                    "column1"
                ],
                "number": 3
            }
        },
        "column": [
            {
                "index": 0,
                "name": "c1",
                "type": "string",
                "primaryKey": true
            },
            {
                "index": 1,
                "name": "c2",
                "type": "string",
                "compress": "DEFAULT_COMPRESSION",
                "encoding": "AUTO_ENCODING",
                "comment": "comment xxxx"
            }
        ],
        "batchSize": 1024,
        "bufferSize": 2048,
        "skipFail": false,
        "encoding": "UTF-8"
    }
}
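To show where this template fits, the sketch below places the kuduwriter block above inside a complete DataX job next to a reader. The job/setting/content/reader/writer layout is the standard DataX job structure; the mysqlreader side and its connection details are purely illustrative and are not part of this commit.

{
    "job": {
        "setting": {
            "speed": { "channel": 1 }
        },
        "content": [
            {
                "reader": {
                    "name": "mysqlreader",
                    "parameter": {
                        "username": "root",
                        "password": "***",
                        "column": ["c1", "c2"],
                        "connection": [
                            {
                                "table": ["example_table"],
                                "jdbcUrl": ["jdbc:mysql://127.0.0.1:3306/example_db"]
                            }
                        ]
                    }
                },
                "writer": {
                    "name": "kuduwriter",
                    "parameter": {
                        "kuduConfig": { "kudu.master_addresses": "***" },
                        "table": "example_kudu_table",
                        "writeMode": "upsert",
                        "column": [
                            { "index": 0, "name": "c1", "type": "string", "primaryKey": true },
                            { "index": 1, "name": "c2", "type": "string" }
                        ]
                    }
                }
            }
        ]
    }
}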
40 kuduwriter/src/test/java/com/dai/test.java Normal file
@ -0,0 +1,40 @@
package com.dai;

import com.alibaba.datax.common.exception.DataXException;
import com.alibaba.datax.common.util.RetryUtil;
import com.q1.datax.plugin.writer.kudu11xwriter.*;

import static org.apache.kudu.client.AsyncKuduClient.LOG;

/**
 * @author daizihao
 * @create 2020-08-28 11:03
 **/
public class test {
    static boolean isSkipFail;

    public static void main(String[] args) {
        try {
            while (true) {
                try {
                    RetryUtil.executeWithRetry(() -> {
                        throw new RuntimeException();
                    }, 5, 1000L, true);
                } catch (Exception e) {
                    LOG.error("Data write failed!", e);
                    System.out.println(isSkipFail);
                    if (isSkipFail) {
                        LOG.warn("Because you have configured skipFail to true, this data will be skipped!");
                    } else {
                        System.out.println("Exception rethrown");
                        throw e;
                    }
                }
            }
        } catch (Exception e) {
            LOG.error("write failed! the task will exit!");
            throw DataXException.asDataXException(Kudu11xWriterErrorcode.PUT_KUDU_ERROR, e);
        }
    }
}
@ -8,7 +8,7 @@ MongoDBReader 插件利用 MongoDB 的java客户端MongoClient进行MongoDB的
MongoDBReader通过Datax框架从MongoDB并行的读取数据,通过主控的JOB程序按照指定的规则对MongoDB中的数据进行分片,并行读取,然后将MongoDB支持的类型通过逐一判断转换成Datax支持的类型。

#### 3 功能说明
- * 该示例从ODPS读一份数据到MongoDB。
+ * 该示例从MongoDB读一份数据到ODPS。

{
    "job": {
@ -132,6 +132,7 @@ MongoDBReader通过Datax框架从MongoDB并行的读取数据,通过主控的J
* name:Column的名字。【必填】
* type:Column的类型。【选填】
* splitter:因为MongoDB支持数组类型,但是Datax框架本身不支持数组类型,所以mongoDB读出来的数组类型要通过这个分隔符合并成字符串。【选填】
+ * query: MongoDB的额外查询条件。【选填】

#### 5 类型转换
@ -146,4 +147,4 @@ MongoDBReader通过Datax框架从MongoDB并行的读取数据,通过主控的J

#### 6 性能报告
#### 7 测试报告
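As a usage sketch for the newly documented query parameter, the snippet below shows it inside a mongodbreader parameter block. The surrounding keys (address, userName, userPassword, dbName, collectionName, column) follow the usual mongodbreader layout and are assumptions here rather than part of this diff; only query is introduced by this change, and its value is an illustrative filter.

"reader": {
    "name": "mongodbreader",
    "parameter": {
        "address": ["127.0.0.1:27017"],
        "userName": "",
        "userPassword": "",
        "dbName": "example_db",
        "collectionName": "example_collection",
        "column": [
            { "name": "unique_id", "type": "string" },
            { "name": "tags", "type": "Array", "splitter": " " }
        ],
        "query": "{\"status\": \"active\"}"
    }
}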
@ -139,7 +139,7 @@ MongoDBWriter通过Datax框架获取Reader生成的数据,然后将Datax支持
* splitter:特殊分隔符,当且仅当要处理的字符串要用分隔符分隔为字符数组时,才使用这个参数,通过这个参数指定的分隔符,将字符串分隔存储到MongoDB的数组中。【选填】
* upsertInfo:指定了传输数据时更新的信息。【选填】
* isUpsert:当设置为true时,表示针对相同的upsertKey做更新操作。【选填】
- * upsertKey:upsertKey指定了没行记录的业务主键。用来做更新时使用。【选填】
+ * upsertKey:upsertKey指定了每行记录的业务主键。用来做更新时使用。【选填】

#### 5 类型转换
@ -154,4 +154,4 @@ MongoDBWriter通过Datax框架获取Reader生成的数据,然后将Datax支持

#### 6 性能报告
#### 7 测试报告
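For the upsert parameters described above, here is a minimal sketch of a mongodbwriter parameter block. The upsertInfo object with isUpsert and upsertKey uses exactly the parameter names documented in this hunk; the remaining keys and values are illustrative assumptions, not part of this commit.

"writer": {
    "name": "mongodbwriter",
    "parameter": {
        "address": ["127.0.0.1:27017"],
        "dbName": "example_db",
        "collectionName": "example_collection",
        "column": [
            { "name": "unique_id", "type": "string" },
            { "name": "score", "type": "double" }
        ],
        "upsertInfo": {
            "isUpsert": "true",
            "upsertKey": "unique_id"
        }
    }
}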
Some files were not shown because too many files have changed in this diff