From 87a4428c5d1f3e78bc7438b6367aede3fd4ec24f Mon Sep 17 00:00:00 2001
From: shf
Date: Thu, 18 Jan 2018 18:46:27 +0800
Subject: [PATCH 1/8] Announcement
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit
---
 README.md | 4 ++++
 1 file changed, 4 insertions(+)
 create mode 100644 README.md
diff --git a/README.md b/README.md
new file mode 100644
index 00000000..f316b022
--- /dev/null
+++ b/README.md
@@ -0,0 +1,4 @@
+## Announcement
+ New DataX features are being merged, so the code on GitHub is temporarily unavailable. We will push it to GitHub as soon as possible, so stay tuned!
+ We apologize for the inconvenience.
+ The DataX open source project team

From 4a43947f7fb87c655c91476bf36379cb2a261087 Mon Sep 17 00:00:00 2001
From: "mingya.wmy"
Date: Fri, 2 Feb 2018 17:50:46 +0800
Subject: [PATCH 2/8] Delete README.md
---
 README.md | 4 ----
 1 file changed, 4 deletions(-)
 delete mode 100644 README.md
diff --git a/README.md b/README.md
deleted file mode 100644
index f316b022..00000000
--- a/README.md
+++ /dev/null
@@ -1,4 +0,0 @@
-## Announcement
- New DataX features are being merged, so the code on GitHub is temporarily unavailable. We will push it to GitHub as soon as possible, so stay tuned!
- We apologize for the inconvenience.
- The DataX open source project team

From c6c09310dbf82d8c977366a2b1bc8c387274f3e3 Mon Sep 17 00:00:00 2001
From: "mingya.wmy"
Date: Fri, 2 Feb 2018 18:22:09 +0800
Subject: [PATCH 3/8] Update README.md
---
 README.md | 43 ++++++++++++++++++++++---------------------
 introduction.md | 41 +++++++++++++++++++++--------------------
 2 files changed, 43 insertions(+), 41 deletions(-)
diff --git a/README.md b/README.md
index d0f81879..b5539021 100644
--- a/README.md
+++ b/README.md
@@ -16,7 +16,7 @@ DataX本身作为数据同步框架,将不同数据源的同步抽象为从源
 # DataX in Detail
-##### See: [DataX-Introduction](https://github.com/alibaba/DataX/wiki/DataX-Introduction)
+##### See: [DataX-Introduction](https://github.com/alibaba/DataX/blob/master/introduction.md)
@@ -28,31 +28,32 @@ DataX本身作为数据同步框架,将不同数据源的同步抽象为从源
-# Support Data Channels
+# Support Data Channels
 DataX now has a fairly comprehensive plugin ecosystem: mainstream RDBMS databases, NoSQL stores, and big data computing systems are all integrated. The currently supported data sources are listed in the table below; for details, see the [DataX Data Source Reference Guide](https://github.com/alibaba/DataX/wiki/DataX-all-data-channels)
 | Type | Data source | Reader (read) | Writer (write) | Docs |
 | ------------ | ---------- | :-------: | :-------: |:-------: |
-| RDBMS (relational databases) | MySQL | √ | √ |![Read](https://github.com/alibaba/DataX/blob/master/mysqlreader/doc/mysqlreader.md), ![Write](https://github.com/alibaba/DataX/blob/master/mysqlwriter/doc/mysqlwriter.md)|
-| | Oracle | √ | √ |![Read](), ![Write]()|
-| | SQLServer | √ | √ |![Read](), ![Write]()|
-| | PostgreSQL | √ | √ |![Read](), ![Write]()|
-| | DRDS | √ | √ |![Read](), ![Write]()|
-| | Dameng | √ | √ |![Read](), ![Write]()|
-| | Generic RDBMS (supports all relational databases) | √ | √ |![Read](), ![Write]()|
-| Alibaba Cloud data warehouse storage | ODPS | √ | √ |![Read](), ![Write]()|
-| | ADS | | √ |![Read](), ![Write]()|
-| | OSS | √ | √ |![Read](), ![Write]()|
-| | OCS | √ | √ |![Read](), ![Write]()|
-| NoSQL data storage | OTS | √ | √ |![Read](), ![Write]()|
-| | Hbase0.94 | √ | √ |![Read](), ![Write]()|
-| | Hbase1.1 | √ | √ |![Read](), ![Write]()|
-| | MongoDB | √ | √ |![Read](), ![Write]()|
-| Unstructured data storage | TxtFile | √ | √ |![Read](), ![Write]()|
-| | FTP | √ | √ |![Read](), ![Write]()|
-| | HDFS | √ | √ |![Read](), ![Write]()|
-| | Elasticsearch | | √ |![Read](), ![Write]()|
+| RDBMS (relational databases) | MySQL | √ | √ |[Read](https://github.com/alibaba/DataX/blob/master/mysqlreader/doc/mysqlreader.md), [Write](https://github.com/alibaba/DataX/blob/master/mysqlwriter/doc/mysqlwriter.md)|
+| | Oracle | √ | √ |[Read](https://github.com/alibaba/DataX/blob/master/oraclereader/doc/oraclereader.md), [Write](https://github.com/alibaba/DataX/blob/master/oraclewriter/doc/oraclewriter.md)|
+| | SQLServer | √ | √ |[Read](https://github.com/alibaba/DataX/blob/master/sqlserverreader/doc/sqlserverreader.md), [Write](https://github.com/alibaba/DataX/blob/master/sqlserverwriter/doc/sqlserverwriter.md)|
+| | PostgreSQL | √ | √ |[Read](https://github.com/alibaba/DataX/blob/master/postgresqlreader/doc/postgresqlreader.md), [Write](https://github.com/alibaba/DataX/blob/master/postgresqlwriter/doc/postgresqlwriter.md)|
+| | DRDS | √ | √ |[Read](https://github.com/alibaba/DataX/blob/master/drdsreader/doc/drdsreader.md), [Write](https://github.com/alibaba/DataX/blob/master/drdswriter/doc/drdswriter.md)|
+| | Dameng | √ | √ |[Read](), [Write]()|
+| | Generic RDBMS (supports all relational databases) | √ | √ |[Read](), [Write]()|
+| Alibaba Cloud data warehouse storage | ODPS | √ | √ |[Read](https://github.com/alibaba/DataX/blob/master/odpsreader/doc/odpsreader.md), [Write](https://github.com/alibaba/DataX/blob/master/odpswriter/doc/odpswriter.md)|
+| | ADS | | √ |[Write](https://github.com/alibaba/DataX/blob/master/adswriter/doc/adswriter.md)|
+| | OSS | √ | √ |[Read](https://github.com/alibaba/DataX/blob/master/ossreader/doc/ossreader.md), [Write](https://github.com/alibaba/DataX/blob/master/osswriter/doc/osswriter.md)|
+| | OCS | √ | √ |[Read](https://github.com/alibaba/DataX/blob/master/ocsreader/doc/ocsreader.md), [Write](https://github.com/alibaba/DataX/blob/master/ocswriter/doc/ocswriter.md)|
+| NoSQL data storage | OTS | √ | √ |[Read](https://github.com/alibaba/DataX/blob/master/otsreader/doc/otsreader.md), [Write](https://github.com/alibaba/DataX/blob/master/otswriter/doc/otswriter.md)|
+| | Hbase0.94 | √ | √ |[Read](https://github.com/alibaba/DataX/blob/master/hbase094xreader/doc/hbase094xreader.md), [Write](https://github.com/alibaba/DataX/blob/master/hbase094xwriter/doc/hbase094xwriter.md)|
+| | Hbase1.1 | √ | √ |[Read](https://github.com/alibaba/DataX/blob/master/hbase11xreader/doc/hbase11xreader.md), [Write](https://github.com/alibaba/DataX/blob/master/hbase11xwriter/doc/hbase11xwriter.md)|
+| | MongoDB | √ | √ |[Read](https://github.com/alibaba/DataX/blob/master/mongoreader/doc/mongoreader.md), [Write](https://github.com/alibaba/DataX/blob/master/mongowriter/doc/mongowriter.md)|
+| | Hive | √ | √ |[Read](https://github.com/alibaba/DataX/blob/master/hdfsreader/doc/hdfsreader.md), [Write](https://github.com/alibaba/DataX/blob/master/hdfswriter/doc/hdfswriter.md)|
+| Unstructured data storage | TxtFile | √ | √ |[Read](https://github.com/alibaba/DataX/blob/master/txtfilereader/doc/txtfilereader.md), [Write](https://github.com/alibaba/DataX/blob/master/txtfilewriter/doc/txtfilewriter.md)|
+| | FTP | √ | √ |[Read](https://github.com/alibaba/DataX/blob/master/ftpreader/doc/ftpreader.md), [Write](https://github.com/alibaba/DataX/blob/master/ftpwriter/doc/ftpwriter.md)|
+| | HDFS | √ | √ |[Read](https://github.com/alibaba/DataX/blob/master/hdfsreader/doc/hdfsreader.md), [Write](https://github.com/alibaba/DataX/blob/master/hdfswriter/doc/hdfswriter.md)|
+| | Elasticsearch | | √ |[Write](https://github.com/alibaba/DataX/blob/master/elasticsearchwriter/doc/elasticsearchwriter.md)|
 # I want to develop a new plugin
 See: [DataX Plugin Development Guide](xxx)
diff --git a/introduction.md b/introduction.md
index f989d6d2..b27607c7 100644
--- a/introduction.md
+++ b/introduction.md
@@ -34,25 +34,26 @@ DataX本身作为离线数据同步框架,采用Framework + plugin架构构建
 | Type | Data source | Reader (read) | Writer (write) | Docs |
 | ------------ | ---------- | :-------: | :-------: |:-------: |
-| RDBMS (relational databases) | MySQL | √ | √ |![Read](https://github.com/alibaba/DataX/blob/master/mysqlreader/doc/mysqlreader.md), ![Write](https://github.com/alibaba/DataX/blob/master/mysqlwriter/doc/mysqlwriter.md)|
-| | Oracle | √ | √ |![Read](), ![Write]()|
-| | SQLServer | √ | √ |![Read](), ![Write]()|
-| | PostgreSQL | √ | √ |![Read](), ![Write]()|
-| | DRDS | √ | √ |![Read](), ![Write]()|
-| | Dameng | √ | √ |![Read](), ![Write]()|
-| | Generic RDBMS (supports all relational databases) | √ | √ |![Read](), ![Write]()|
-| Alibaba Cloud data warehouse storage | ODPS | √ | √ |![Read](), ![Write]()|
-| | ADS | | √ |![Read](), ![Write]()|
-| | OSS | √ | √ |![Read](), ![Write]()|
-| | OCS | √ | √ |![Read](), ![Write]()|
-| NoSQL data storage | OTS | √ | √ |![Read](), ![Write]()|
-| | Hbase0.94 | √ | √ |![Read](), ![Write]()|
-| | Hbase1.1 | √ | √ |![Read](), ![Write]()|
-| | MongoDB | √ | √ |![Read](), ![Write]()|
-| Unstructured data storage | TxtFile | √ | √ |![Read](), ![Write]()|
-| | FTP | √ | √ |![Read](), ![Write]()|
-| | HDFS | √ | √ |![Read](), ![Write]()|
-| | Elasticsearch | | √ |![Read](), ![Write]()|
+| RDBMS (relational databases) | MySQL | √ | √ |[Read](https://github.com/alibaba/DataX/blob/master/mysqlreader/doc/mysqlreader.md), [Write](https://github.com/alibaba/DataX/blob/master/mysqlwriter/doc/mysqlwriter.md)|
+| | Oracle | √ | √ |[Read](https://github.com/alibaba/DataX/blob/master/oraclereader/doc/oraclereader.md), [Write](https://github.com/alibaba/DataX/blob/master/oraclewriter/doc/oraclewriter.md)|
+| | SQLServer | √ | √ |[Read](https://github.com/alibaba/DataX/blob/master/sqlserverreader/doc/sqlserverreader.md), [Write](https://github.com/alibaba/DataX/blob/master/sqlserverwriter/doc/sqlserverwriter.md)|
+| | PostgreSQL | √ | √ |[Read](https://github.com/alibaba/DataX/blob/master/postgresqlreader/doc/postgresqlreader.md), [Write](https://github.com/alibaba/DataX/blob/master/postgresqlwriter/doc/postgresqlwriter.md)|
+| | DRDS | √ | √ |[Read](https://github.com/alibaba/DataX/blob/master/drdsreader/doc/drdsreader.md), [Write](https://github.com/alibaba/DataX/blob/master/drdswriter/doc/drdswriter.md)|
+| | Dameng | √ | √ |[Read](), [Write]()|
+| | Generic RDBMS (supports all relational databases) | √ | √ |[Read](), [Write]()|
+| Alibaba Cloud data warehouse storage | ODPS | √ | √ |[Read](https://github.com/alibaba/DataX/blob/master/odpsreader/doc/odpsreader.md), [Write](https://github.com/alibaba/DataX/blob/master/odpswriter/doc/odpswriter.md)|
+| | ADS | | √ |[Write](https://github.com/alibaba/DataX/blob/master/adswriter/doc/adswriter.md)|
+| | OSS | √ | √ |[Read](https://github.com/alibaba/DataX/blob/master/ossreader/doc/ossreader.md), [Write](https://github.com/alibaba/DataX/blob/master/osswriter/doc/osswriter.md)|
+| | OCS | √ | √ |[Read](https://github.com/alibaba/DataX/blob/master/ocsreader/doc/ocsreader.md), [Write](https://github.com/alibaba/DataX/blob/master/ocswriter/doc/ocswriter.md)|
+| NoSQL data storage | OTS | √ | √ |[Read](https://github.com/alibaba/DataX/blob/master/otsreader/doc/otsreader.md), [Write](https://github.com/alibaba/DataX/blob/master/otswriter/doc/otswriter.md)|
+| | Hbase0.94 | √ | √ |[Read](https://github.com/alibaba/DataX/blob/master/hbase094xreader/doc/hbase094xreader.md), [Write](https://github.com/alibaba/DataX/blob/master/hbase094xwriter/doc/hbase094xwriter.md)|
+| | Hbase1.1 | √ | √ |[Read](https://github.com/alibaba/DataX/blob/master/hbase11xreader/doc/hbase11xreader.md), [Write](https://github.com/alibaba/DataX/blob/master/hbase11xwriter/doc/hbase11xwriter.md)|
+| | MongoDB | √ | √ |[Read](https://github.com/alibaba/DataX/blob/master/mongoreader/doc/mongoreader.md), [Write](https://github.com/alibaba/DataX/blob/master/mongowriter/doc/mongowriter.md)|
+| | Hive | √ | √ |[Read](https://github.com/alibaba/DataX/blob/master/hdfsreader/doc/hdfsreader.md), [Write](https://github.com/alibaba/DataX/blob/master/hdfswriter/doc/hdfswriter.md)|
+| Unstructured data storage | TxtFile | √ | √ |[Read](https://github.com/alibaba/DataX/blob/master/txtfilereader/doc/txtfilereader.md), [Write](https://github.com/alibaba/DataX/blob/master/txtfilewriter/doc/txtfilewriter.md)|
+| | FTP | √ | √ |[Read](https://github.com/alibaba/DataX/blob/master/ftpreader/doc/ftpreader.md), [Write](https://github.com/alibaba/DataX/blob/master/ftpwriter/doc/ftpwriter.md)|
+| | HDFS | √ | √ |[Read](https://github.com/alibaba/DataX/blob/master/hdfsreader/doc/hdfsreader.md), [Write](https://github.com/alibaba/DataX/blob/master/hdfswriter/doc/hdfswriter.md)|
+| | Elasticsearch | | √ |[Write](https://github.com/alibaba/DataX/blob/master/elasticsearchwriter/doc/elasticsearchwriter.md)|
 DataX Framework provides a simple interface for interacting with plugins and a simple plugin integration mechanism: adding any single plugin lets it connect seamlessly to the other data sources. For details, see the [DataX Data Source Guide](https://github.com/alibaba/DataX/wiki/DataX-all-data-channels)
@@ -147,4 +148,4 @@ DataX 3.0 开源版本支持单机多线程模式完成同步作业运行,本
 - After the job finishes, the overall run statistics are printed
- ![datax_end_info](https://cloud.githubusercontent.com/assets/1067175/17850930/0484d3ac-6892-11e6-9c1d-b102ad210a32.png)
\ No newline at end of file
+ ![datax_end_info](https://cloud.githubusercontent.com/assets/1067175/17850930/0484d3ac-6892-11e6-9c1d-b102ad210a32.png)

From fbe6c727d711f63b1ebaacb2585c5820889d20a3 Mon Sep 17 00:00:00 2001
From: "mingya.wmy"
Date: Mon, 5 Feb 2018 21:43:21 +0800
Subject: [PATCH 4/8] Update userGuid.md
---
 userGuid.md | 5 +++--
 1 file changed, 3 insertions(+), 2 deletions(-)
diff --git a/userGuid.md b/userGuid.md
index 5b17e068..7b9e7817 100644
--- a/userGuid.md
+++ b/userGuid.md
@@ -17,7 +17,7 @@ DataX本身作为数据同步框架,将不同数据源的同步抽象为从源
 * Tool deployment
-  * Method 1: download the DataX package directly: [DataX](https://github.com/alibaba/DataX)
+  * Method 1: download the DataX package directly: [DataX download link](http://datax-opensource.oss-cn-hangzhou.aliyuncs.com/datax.tar.gz)
 After downloading, extract it to a local directory and enter the bin directory to run a sync job:
@@ -25,7 +25,8 @@ DataX本身作为数据同步框架,将不同数据源的同步抽象为从源
 $ cd {YOUR_DATAX_HOME}/bin
 $ python datax.py {YOUR_JOB.json}
 ```
-
+ Self-check script:
+ /home{YOUR_DATAX_HOME}/bin/datax.py {YOUR_DATAX_HOME}/job/job.json
 * Method 2: download the DataX source code and build it yourself: [DataX source code](https://github.com/alibaba/DataX)
 (1) Download the DataX source code:

From 4608abe947d9540ca3c11774d59a9c2768bbd891 Mon Sep 17 00:00:00 2001
From: "mingya.wmy"
Date: Mon, 5 Feb 2018 21:45:15 +0800
Subject: [PATCH 5/8] Update userGuid.md
---
 userGuid.md | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/userGuid.md b/userGuid.md
index 7b9e7817..25fb95fa 100644
--- a/userGuid.md
+++ b/userGuid.md
@@ -26,7 +26,7 @@ DataX本身作为数据同步框架,将不同数据源的同步抽象为从源
 $ python datax.py {YOUR_JOB.json}
 ```
 Self-check script:
- /home{YOUR_DATAX_HOME}/bin/datax.py {YOUR_DATAX_HOME}/job/job.json
+    python {YOUR_DATAX_HOME}/bin/datax.py {YOUR_DATAX_HOME}/job/job.json
 * Method 2: download the DataX source code and build it yourself: [DataX source code](https://github.com/alibaba/DataX)
 (1) Download the DataX source code:

From 26a6035683e4c06b685971a5c29a941dd6ca62a1 Mon Sep 17 00:00:00 2001
From: "mingya.wmy"
Date: Mon, 14 May 2018 10:22:19 +0800
Subject: [PATCH 6/8] Update README.md
---
 README.md | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)
diff --git a/README.md b/README.md
index b5539021..8513db79 100644
--- a/README.md
+++ b/README.md
@@ -106,6 +106,7 @@ This software is free to use under the Apache License [Apache license](https://g
 ````
 DingTalk users, please scan the QR code below to join the discussion:
-![DataX-OpenSource-Dingding](https://raw.githubusercontent.com/alibaba/DataX/master/images/datax-opensource-dingding.png)
+![DataX-OpenSource-Dingding](https://img.alicdn.com/tfs/TB1nPz6rbGYBuNjy0FoXXciBFXa-310-425.png)
+

From 9a119ad0ff72f25022245e14e2f6572b23e0babb Mon Sep 17 00:00:00 2001
From: "mingya.wmy"
Date: Wed, 23 May 2018 11:27:44 +0800
Subject: [PATCH 7/8] change the logic of getting orc column count in DFSUtil.java
---
 .idea/misc.xml | 13 +++++++++++++
 .idea/vcs.xml | 6 ++++++
 .../datax/plugin/reader/hdfsreader/DFSUtil.java | 7 +------
 3 files changed, 20 insertions(+), 6 deletions(-)
 create mode 100644 .idea/misc.xml
 create mode 100644 .idea/vcs.xml
diff --git a/.idea/misc.xml b/.idea/misc.xml
new file mode 100644
index 00000000..d30d09e2
--- /dev/null
+++ b/.idea/misc.xml
@@ -0,0 +1,13 @@
+
+
+
+
+
+
+
+
\ No newline at end of file
diff --git a/.idea/vcs.xml b/.idea/vcs.xml
new file mode 100644
index 00000000..94a25f7f
--- /dev/null
+++ b/.idea/vcs.xml
@@ -0,0 +1,6 @@
+
+
+
+
+
+
\ No newline at end of file
diff --git a/hdfsreader/src/main/java/com/alibaba/datax/plugin/reader/hdfsreader/DFSUtil.java b/hdfsreader/src/main/java/com/alibaba/datax/plugin/reader/hdfsreader/DFSUtil.java
index 364dfead..c39d3847 100644
--- a/hdfsreader/src/main/java/com/alibaba/datax/plugin/reader/hdfsreader/DFSUtil.java
+++ b/hdfsreader/src/main/java/com/alibaba/datax/plugin/reader/hdfsreader/DFSUtil.java
@@ -486,15 +486,10 @@ public class DFSUtil {
     }
     private int getAllColumnsCount(String filePath) {
-        int columnsCount;
-        final String colFinal = "_col";
         Path path = new Path(filePath);
         try {
             Reader reader = OrcFile.createReader(path, OrcFile.readerOptions(hadoopConf));
-            String type_struct = reader.getObjectInspector().getTypeName();
-            columnsCount = (type_struct.length() - type_struct.replace(colFinal, "").length())
-                    / colFinal.length();
-            return columnsCount;
+            return reader.getTypes().get(0).getSubtypesCount();
         } catch (IOException e) {
             String message = "读取orcfile column列数失败,请联系系统管理员";
             throw DataXException.asDataXException(HdfsReaderErrorCode.READ_FILE_ERROR, message);

From 84cfeb51775379e0a2aba62ace62ec1f1ab2072b Mon Sep 17 00:00:00 2001
From: "mingya.wmy"
Date: Wed, 23 May 2018 11:48:34 +0800
Subject: [PATCH 8/8] rm some idea files
---
 .idea/misc.xml | 13 -------------
 .idea/vcs.xml | 6 ------
 2 files changed, 19 deletions(-)
 delete mode 100644 .idea/misc.xml
 delete mode 100644 .idea/vcs.xml
diff --git a/.idea/misc.xml b/.idea/misc.xml
deleted file mode 100644
index d30d09e2..00000000
--- a/.idea/misc.xml
+++ /dev/null
@@ -1,13 +0,0 @@
-
-
-
-
-
-
-
-
\ No newline at end of file
diff --git a/.idea/vcs.xml b/.idea/vcs.xml
deleted file mode 100644
index 94a25f7f..00000000
--- a/.idea/vcs.xml
+++ /dev/null
@@ -1,6 +0,0 @@
-
-
-
-
-
-
\ No newline at end of file