Mirror of https://github.com/alibaba/DataX.git (synced 2025-05-02 04:59:51 +08:00)

Merge branch 'master' of https://github.com/alibaba/DataX

Commit e0d6d6618b

46 README.md

    @@ -16,7 +16,7 @@ DataX itself, as a data synchronization framework, abstracts the synchronization of different data sources as from the source
     
     # DataX in Detail
     
    -##### See: [DataX-Introduction](https://github.com/alibaba/DataX/wiki/DataX-Introduction)
    +##### See: [DataX-Introduction](https://github.com/alibaba/DataX/blob/master/introduction.md)
     
     
     


    @@ -28,31 +28,32 @@ DataX itself, as a data synchronization framework, abstracts the synchronization of different data sources as from the source
     
     
     
     # Support Data Channels
     
     DataX now has a fairly comprehensive plugin system: mainstream RDBMS databases, NoSQL stores, and big-data computing systems are all supported. The currently supported data sources are shown below; for details, click: [DataX Data Source Reference Guide](https://github.com/alibaba/DataX/wiki/DataX-all-data-channels)
     
     | Type | Data source | Reader (read) | Writer (write) | Docs |
     | ------------ | ---------- | :-------: | :-------: |:-------: |
    -| RDBMS (relational databases) | MySQL | √ | √ | , |
    -| | Oracle | √ | √ |![Read](), ![Write]()|
    -| | SQLServer | √ | √ |![Read](), ![Write]()|
    -| | PostgreSQL | √ | √ |![Read](), ![Write]()|
    -| | DRDS | √ | √ |![Read](), ![Write]()|
    -| | Dameng (达梦) | √ | √ |![Read](), ![Write]()|
    -| | Generic RDBMS (supports all relational databases) | √ | √ |![Read](), ![Write]()|
    -| Alibaba Cloud data warehouse storage | ODPS | √ | √ |![Read](), ![Write]()|
    -| | ADS | | √ |![Read](), ![Write]()|
    -| | OSS | √ | √ |![Read](), ![Write]()|
    -| | OCS | √ | √ |![Read](), ![Write]()|
    -| NoSQL data storage | OTS | √ | √ |![Read](), ![Write]()|
    -| | Hbase0.94 | √ | √ |![Read](), ![Write]()|
    -| | Hbase1.1 | √ | √ |![Read](), ![Write]()|
    -| | MongoDB | √ | √ |![Read](), ![Write]()|
    -| Unstructured data storage | TxtFile | √ | √ |![Read](), ![Write]()|
    -| | FTP | √ | √ |![Read](), ![Write]()|
    -| | HDFS | √ | √ |![Read](), ![Write]()|
    -| | Elasticsearch | | √ |![Read](), ![Write]()|
    +| RDBMS (relational databases) | MySQL | √ | √ |[Read](https://github.com/alibaba/DataX/blob/master/mysqlreader/doc/mysqlreader.md), [Write](https://github.com/alibaba/DataX/blob/master/mysqlwriter/doc/mysqlwriter.md)|
    +| | Oracle | √ | √ |[Read](https://github.com/alibaba/DataX/blob/master/oraclereader/doc/oraclereader.md), [Write](https://github.com/alibaba/DataX/blob/master/oraclewriter/doc/oraclewriter.md)|
    +| | SQLServer | √ | √ |[Read](https://github.com/alibaba/DataX/blob/master/sqlserverreader/doc/sqlserverreader.md), [Write](https://github.com/alibaba/DataX/blob/master/sqlserverwriter/doc/sqlserverwriter.md)|
    +| | PostgreSQL | √ | √ |[Read](https://github.com/alibaba/DataX/blob/master/postgresqlreader/doc/postgresqlreader.md), [Write](https://github.com/alibaba/DataX/blob/master/postgresqlwriter/doc/postgresqlwriter.md)|
    +| | DRDS | √ | √ |[Read](https://github.com/alibaba/DataX/blob/master/drdsreader/doc/drdsreader.md), [Write](https://github.com/alibaba/DataX/blob/master/drdswriter/doc/drdswriter.md)|
    +| | Dameng (达梦) | √ | √ |[Read](), [Write]()|
    +| | Generic RDBMS (supports all relational databases) | √ | √ |[Read](), [Write]()|
    +| Alibaba Cloud data warehouse storage | ODPS | √ | √ |[Read](https://github.com/alibaba/DataX/blob/master/odpsreader/doc/odpsreader.md), [Write](https://github.com/alibaba/DataX/blob/master/odpswriter/doc/odpswriter.md)|
    +| | ADS | | √ |[Write](https://github.com/alibaba/DataX/blob/master/adswriter/doc/adswriter.md)|
    +| | OSS | √ | √ |[Read](https://github.com/alibaba/DataX/blob/master/ossreader/doc/ossreader.md), [Write](https://github.com/alibaba/DataX/blob/master/osswriter/doc/osswriter.md)|
    +| | OCS | √ | √ |[Read](https://github.com/alibaba/DataX/blob/master/ocsreader/doc/ocsreader.md), [Write](https://github.com/alibaba/DataX/blob/master/ocswriter/doc/ocswriter.md)|
    +| NoSQL data storage | OTS | √ | √ |[Read](https://github.com/alibaba/DataX/blob/master/otsreader/doc/otsreader.md), [Write](https://github.com/alibaba/DataX/blob/master/otswriter/doc/otswriter.md)|
    +| | Hbase0.94 | √ | √ |[Read](https://github.com/alibaba/DataX/blob/master/hbase094xreader/doc/hbase094xreader.md), [Write](https://github.com/alibaba/DataX/blob/master/hbase094xwriter/doc/hbase094xwriter.md)|
    +| | Hbase1.1 | √ | √ |[Read](https://github.com/alibaba/DataX/blob/master/hbase11xreader/doc/hbase11xreader.md), [Write](https://github.com/alibaba/DataX/blob/master/hbase11xwriter/doc/hbase11xwriter.md)|
    +| | MongoDB | √ | √ |[Read](https://github.com/alibaba/DataX/blob/master/mongoreader/doc/mongoreader.md), [Write](https://github.com/alibaba/DataX/blob/master/mongowriter/doc/mongowriter.md)|
    +| | Hive | √ | √ |[Read](https://github.com/alibaba/DataX/blob/master/hdfsreader/doc/hdfsreader.md), [Write](https://github.com/alibaba/DataX/blob/master/hdfswriter/doc/hdfswriter.md)|
    +| Unstructured data storage | TxtFile | √ | √ |[Read](https://github.com/alibaba/DataX/blob/master/txtfilereader/doc/txtfilereader.md), [Write](https://github.com/alibaba/DataX/blob/master/txtfilewriter/doc/txtfilewriter.md)|
    +| | FTP | √ | √ |[Read](https://github.com/alibaba/DataX/blob/master/ftpreader/doc/ftpreader.md), [Write](https://github.com/alibaba/DataX/blob/master/ftpwriter/doc/ftpwriter.md)|
    +| | HDFS | √ | √ |[Read](https://github.com/alibaba/DataX/blob/master/hdfsreader/doc/hdfsreader.md), [Write](https://github.com/alibaba/DataX/blob/master/hdfswriter/doc/hdfswriter.md)|
    +| | Elasticsearch | | √ |[Write](https://github.com/alibaba/DataX/blob/master/elasticsearchwriter/doc/elasticsearchwriter.md)|
     
     # I want to develop a new plugin
     
     Click here: [DataX Plugin Development Handbook](https://github.com/alibaba/DataX/blob/master/dataxPluginDev.md)

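The updated table has two check-mark columns, and a sync job pairs a Reader for the source with a Writer for the target. The Java sketch below encodes those two columns as sets so the pairing rule can be checked mechanically. The class, its method, and the keys `DM` (for 达梦) and `GenericRDBMS` are hypothetical labels chosen here, not DataX identifiers. For example, it shows that ADS and Elasticsearch can only be targets, because the table lists no Reader for them.

```java
import java.util.Collections;
import java.util.HashSet;
import java.util.Set;

/**
 * Hypothetical helper encoding the Reader/Writer columns of the updated table.
 * Keys "DM" (达梦) and "GenericRDBMS" are illustrative labels, not DataX names.
 */
public class ChannelSupport {

    // Rows with a check mark in the Reader column.
    static final Set<String> READERS = Set.of(
            "MySQL", "Oracle", "SQLServer", "PostgreSQL", "DRDS", "DM", "GenericRDBMS",
            "ODPS", "OSS", "OCS", "OTS", "Hbase0.94", "Hbase1.1", "MongoDB",
            "Hive", "TxtFile", "FTP", "HDFS");

    // Rows with a check mark in the Writer column: every Reader row plus the
    // write-only ADS and Elasticsearch rows.
    static final Set<String> WRITERS;

    static {
        Set<String> writers = new HashSet<>(READERS);
        writers.add("ADS");
        writers.add("Elasticsearch");
        WRITERS = Collections.unmodifiableSet(writers);
    }

    /** A job needs a Reader plugin for the source and a Writer plugin for the target. */
    static boolean canSync(String source, String target) {
        return READERS.contains(source) && WRITERS.contains(target);
    }

    public static void main(String[] args) {
        System.out.println(canSync("MySQL", "Elasticsearch")); // true
        System.out.println(canSync("Elasticsearch", "MySQL")); // false: no Elasticsearch Reader
    }
}
```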

    @@ -105,6 +106,7 @@ This software is free to use under the Apache License [Apache license](https://g
     ````
     DingTalk users, please scan the QR code below to join the discussion:
     
    +[image: DingTalk QR code]
     
     
     


    @@ -486,15 +486,10 @@ public class DFSUtil {
         }
     
         private int getAllColumnsCount(String filePath) {
    -        int columnsCount;
    -        final String colFinal = "_col";
             Path path = new Path(filePath);
             try {
                 Reader reader = OrcFile.createReader(path, OrcFile.readerOptions(hadoopConf));
    -            String type_struct = reader.getObjectInspector().getTypeName();
    -            columnsCount = (type_struct.length() - type_struct.replace(colFinal, "").length())
    -                    / colFinal.length();
    -            return columnsCount;
    +            return reader.getTypes().get(0).getSubtypesCount();
             } catch (IOException e) {
                 String message = "Failed to read the column count of the orcfile; please contact the system administrator";
                 throw DataXException.asDataXException(HdfsReaderErrorCode.READ_FILE_ERROR, message);

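The hunk above drops a heuristic that counted occurrences of `_col` in the ORC type name. It now returns `reader.getTypes().get(0).getSubtypesCount()`, which, as we understand the ORC type list, is the number of child types of the root struct, i.e. its top-level columns.

The dependency-free sketch below replays both ideas on type-name strings. It assumes the `struct<name:type,...>` form with Hive's generic `_colN` names. `OrcColumnCountDemo` and its methods are hypothetical, and `countTopLevelFields` is a simplified stand-in for the ORC call, not ORC's own parser. It shows one way the old heuristic can miscount: nested `_col` names inflate the count, and real column names yield zero.

```java
/**
 * Hypothetical, dependency-free illustration of the DFSUtil change.
 * Inputs are type-name strings such as "struct<_col0:int,_col1:string>".
 */
public class OrcColumnCountDemo {

    /** Same arithmetic as the removed code: number of "_col" occurrences. */
    static int countByColSubstring(String typeStruct) {
        final String colFinal = "_col";
        return (typeStruct.length() - typeStruct.replace(colFinal, "").length())
                / colFinal.length();
    }

    /**
     * Counts the fields of the outermost struct by counting commas at nesting
     * depth 1. A simplified stand-in for the root type's subtype count; it is
     * not ORC's type parser and ignores quoted or escaped field names.
     */
    static int countTopLevelFields(String typeStruct) {
        if (!typeStruct.startsWith("struct<") || !typeStruct.endsWith(">")) {
            throw new IllegalArgumentException("not a struct type: " + typeStruct);
        }
        if (typeStruct.equals("struct<>")) {
            return 0; // struct with no fields
        }
        int depth = 0;
        int fields = 1;
        for (char c : typeStruct.toCharArray()) {
            if (c == '<' || c == '(') {
                depth++; // entering struct<...>, map<...>, decimal(...), etc.
            } else if (c == '>' || c == ')') {
                depth--;
            } else if (c == ',' && depth == 1) {
                fields++; // separator between two top-level fields
            }
        }
        return fields;
    }

    public static void main(String[] args) {
        String nested = "struct<_col0:int,_col1:decimal(10,2),_col2:struct<_col0:int>>";
        String named = "struct<id:int,name:string>";
        System.out.println(countByColSubstring(nested) + " vs " + countTopLevelFields(nested)); // 4 vs 3
        System.out.println(countByColSubstring(named) + " vs " + countTopLevelFields(named)); // 0 vs 2
    }
}
```

The parenthesis handling matters because a type such as `decimal(10,2)` contains a comma that is not a field separator.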

    @@ -34,25 +34,26 @@ DataX itself, as an offline data synchronization framework, is built on a Framework + plugin architecture
     
     | Type | Data source | Reader (read) | Writer (write) | Docs |
     | ------------ | ---------- | :-------: | :-------: |:-------: |
    -| RDBMS (relational databases) | MySQL | √ | √ | , |
    -| | Oracle | √ | √ |![Read](), ![Write]()|
    -| | SQLServer | √ | √ |![Read](), ![Write]()|
    -| | PostgreSQL | √ | √ |![Read](), ![Write]()|
    -| | DRDS | √ | √ |![Read](), ![Write]()|
    -| | Dameng (达梦) | √ | √ |![Read](), ![Write]()|
    -| | Generic RDBMS (supports all relational databases) | √ | √ |![Read](), ![Write]()|
    -| Alibaba Cloud data warehouse storage | ODPS | √ | √ |![Read](), ![Write]()|
    -| | ADS | | √ |![Read](), ![Write]()|
    -| | OSS | √ | √ |![Read](), ![Write]()|
    -| | OCS | √ | √ |![Read](), ![Write]()|
    -| NoSQL data storage | OTS | √ | √ |![Read](), ![Write]()|
    -| | Hbase0.94 | √ | √ |![Read](), ![Write]()|
    -| | Hbase1.1 | √ | √ |![Read](), ![Write]()|
    -| | MongoDB | √ | √ |![Read](), ![Write]()|
    -| Unstructured data storage | TxtFile | √ | √ |![Read](), ![Write]()|
    -| | FTP | √ | √ |![Read](), ![Write]()|
    -| | HDFS | √ | √ |![Read](), ![Write]()|
    -| | Elasticsearch | | √ |![Read](), ![Write]()|
    +| RDBMS (relational databases) | MySQL | √ | √ |[Read](https://github.com/alibaba/DataX/blob/master/mysqlreader/doc/mysqlreader.md), [Write](https://github.com/alibaba/DataX/blob/master/mysqlwriter/doc/mysqlwriter.md)|
    +| | Oracle | √ | √ |[Read](https://github.com/alibaba/DataX/blob/master/oraclereader/doc/oraclereader.md), [Write](https://github.com/alibaba/DataX/blob/master/oraclewriter/doc/oraclewriter.md)|
    +| | SQLServer | √ | √ |[Read](https://github.com/alibaba/DataX/blob/master/sqlserverreader/doc/sqlserverreader.md), [Write](https://github.com/alibaba/DataX/blob/master/sqlserverwriter/doc/sqlserverwriter.md)|
    +| | PostgreSQL | √ | √ |[Read](https://github.com/alibaba/DataX/blob/master/postgresqlreader/doc/postgresqlreader.md), [Write](https://github.com/alibaba/DataX/blob/master/postgresqlwriter/doc/postgresqlwriter.md)|
    +| | DRDS | √ | √ |[Read](https://github.com/alibaba/DataX/blob/master/drdsreader/doc/drdsreader.md), [Write](https://github.com/alibaba/DataX/blob/master/drdswriter/doc/drdswriter.md)|
    +| | Dameng (达梦) | √ | √ |[Read](), [Write]()|
    +| | Generic RDBMS (supports all relational databases) | √ | √ |[Read](), [Write]()|
    +| Alibaba Cloud data warehouse storage | ODPS | √ | √ |[Read](https://github.com/alibaba/DataX/blob/master/odpsreader/doc/odpsreader.md), [Write](https://github.com/alibaba/DataX/blob/master/odpsswriter/doc/odpswriter.md)|
    +| | ADS | | √ |[Write](https://github.com/alibaba/DataX/blob/master/adswriter/doc/adswriter.md)|
    +| | OSS | √ | √ |[Read](https://github.com/alibaba/DataX/blob/master/ossreader/doc/ossreader.md), [Write](https://github.com/alibaba/DataX/blob/master/osswriter/doc/osswriter.md)|
    +| | OCS | √ | √ |[Read](https://github.com/alibaba/DataX/blob/master/ocsreader/doc/ocsreader.md), [Write](https://github.com/alibaba/DataX/blob/master/ocswriter/doc/ocswriter.md)|
    +| NoSQL data storage | OTS | √ | √ |[Read](https://github.com/alibaba/DataX/blob/master/otsreader/doc/otsreader.md), [Write](https://github.com/alibaba/DataX/blob/master/otswriter/doc/otswriter.md)|
    +| | Hbase0.94 | √ | √ |[Read](https://github.com/alibaba/DataX/blob/master/hbase094xreader/doc/hbase094xreader.md), [Write](https://github.com/alibaba/DataX/blob/master/hbase094xwriter/doc/hbase094xwriter.md)|
    +| | Hbase1.1 | √ | √ |[Read](https://github.com/alibaba/DataX/blob/master/hbase11xreader/doc/hbase11xreader.md), [Write](https://github.com/alibaba/DataX/blob/master/hbase11xwriter/doc/hbase11xwriter.md)|
    +| | MongoDB | √ | √ |[Read](https://github.com/alibaba/DataX/blob/master/mongoreader/doc/mongoreader.md), [Write](https://github.com/alibaba/DataX/blob/master/mongowriter/doc/mongowriter.md)|
    +| | Hive | √ | √ |[Read](https://github.com/alibaba/DataX/blob/master/hdfsreader/doc/hdfsreader.md), [Write](https://github.com/alibaba/DataX/blob/master/hdfswriter/doc/hdfswriter.md)|
    +| Unstructured data storage | TxtFile | √ | √ |[Read](https://github.com/alibaba/DataX/blob/master/txtfilereader/doc/txtfilereader.md), [Write](https://github.com/alibaba/DataX/blob/master/txtfilewriter/doc/txtfilewriter.md)|
    +| | FTP | √ | √ |[Read](https://github.com/alibaba/DataX/blob/master/ftpreader/doc/ftpreader.md), [Write](https://github.com/alibaba/DataX/blob/master/ftpwriter/doc/ftpwriter.md)|
    +| | HDFS | √ | √ |[Read](https://github.com/alibaba/DataX/blob/master/hdfsreader/doc/hdfsreader.md), [Write](https://github.com/alibaba/DataX/blob/master/hdfswriter/doc/hdfswriter.md)|
    +| | Elasticsearch | | √ |[Write](https://github.com/alibaba/DataX/blob/master/elasticsearchwriter/doc/elasticsearchwriter.md)|
     
     
     The DataX Framework exposes a simple interface for interacting with plugins and a simple plugin-integration mechanism: adding a plugin for any one data source lets it connect seamlessly with all the other data sources. For details, see: [DataX Data Source Guide](https://github.com/alibaba/DataX/wiki/DataX-all-data-channels)

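The context paragraph of this hunk describes the Framework + plugin design. Each data source only needs a Reader and/or Writer plugin, and the framework connects any Reader to any Writer.

The sketch below models that idea with a queue standing in for the framework's channel. `SketchReader`, `SketchWriter`, `runJob`, and the in-memory plugins are hypothetical. They are deliberately much simpler than DataX's real plugin SPI, which splits work into Job and Task phases. The point is only that neither plugin knows which peer it is paired with.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;

/**
 * Hypothetical sketch of the Framework + plugin idea.
 * These interfaces are illustrative only and are NOT DataX's plugin SPI.
 */
public class PluginPipelineSketch {

    /** A source plugin: pushes records into the channel. */
    interface SketchReader {
        void startRead(BlockingQueue<List<Object>> channel) throws InterruptedException;
    }

    /** A target plugin: drains records from the channel until END arrives. */
    interface SketchWriter {
        void startWrite(BlockingQueue<List<Object>> channel) throws InterruptedException;
    }

    /** Unique end-of-stream marker, compared by identity. */
    static final List<Object> END = new ArrayList<>();

    /** The "framework": wires any reader to any writer through one channel. */
    static void runJob(SketchReader reader, SketchWriter writer) throws InterruptedException {
        BlockingQueue<List<Object>> channel = new LinkedBlockingQueue<>();
        Thread readerThread = new Thread(() -> {
            try {
                reader.startRead(channel);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            } finally {
                channel.offer(END); // always signal end-of-stream so the writer terminates
            }
        });
        readerThread.start();
        writer.startWrite(channel); // the writer runs on the calling thread
        readerThread.join();
    }

    /** Example reader: emits rows from an in-memory list. */
    static final class ListReader implements SketchReader {
        private final List<List<Object>> rows;

        ListReader(List<List<Object>> rows) {
            this.rows = rows;
        }

        @Override
        public void startRead(BlockingQueue<List<Object>> channel) throws InterruptedException {
            for (List<Object> row : rows) {
                channel.put(row);
            }
        }
    }

    /** Example writer: collects every row it receives. */
    static final class CollectingWriter implements SketchWriter {
        final List<List<Object>> received = new ArrayList<>();

        @Override
        public void startWrite(BlockingQueue<List<Object>> channel) throws InterruptedException {
            while (true) {
                List<Object> row = channel.take();
                if (row == END) {
                    return;
                }
                received.add(row);
            }
        }
    }

    public static void main(String[] args) throws InterruptedException {
        CollectingWriter writer = new CollectingWriter();
        runJob(new ListReader(List.of(List.of(1, "a"), List.of(2, "b"))), writer);
        System.out.println(writer.received); // [[1, a], [2, b]]
    }
}
```

An unbounded `LinkedBlockingQueue` keeps the sketch short and guarantees the end-of-stream marker can always be enqueued. A production channel would typically be bounded to provide back-pressure.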

    @@ -147,4 +148,4 @@ The open-source DataX 3.0 release supports running sync jobs in single-machine multi-threaded mode; this
     
     - After the job finishes, print the overall run statistics
     
    -[image]
    +[image]


    @@ -17,7 +17,7 @@ DataX itself, as a data synchronization framework, abstracts the synchronization of different data sources as from the source
     
     * Tool deployment
     
    -* Method 1: download the DataX toolkit directly: [DataX](https://github.com/alibaba/DataX)
    +* Method 1: download the DataX toolkit directly: [DataX download link](http://datax-opensource.oss-cn-hangzhou.aliyuncs.com/datax.tar.gz)
     
     After downloading, extract it to a local directory and enter the bin directory; you can then run sync jobs:
     


    @@ -25,7 +25,8 @@ DataX itself, as a data synchronization framework, abstracts the synchronization of different data sources as from the source
     $ cd {YOUR_DATAX_HOME}/bin
     $ python datax.py {YOUR_JOB.json}
     ```
    -
    +Self-check script:
    +python {YOUR_DATAX_HOME}/bin/datax.py {YOUR_DATAX_HOME}/job/job.json
     * Method 2: download the DataX source code and build it yourself: [DataX source code](https://github.com/alibaba/DataX)
     
     (1) Download the DataX source code:

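This hunk adds a self-check that runs the bundled sample job with `python {YOUR_DATAX_HOME}/bin/datax.py {YOUR_DATAX_HOME}/job/job.json`. To trigger the same command from Java, for example from a test harness, a minimal `ProcessBuilder` sketch follows. It assumes a `python` executable on `PATH` and an unpacked DataX directory passed as the first argument. The class and method names are hypothetical.

```java
import java.io.IOException;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.List;

/**
 * Hypothetical launcher for the DataX self-check command.
 * Assumes "python" is on PATH and args[0] is an unpacked DataX home directory.
 */
public class DataxSelfCheck {

    /** Builds: python {home}/bin/datax.py {jobJson}. Kept separate so it can be checked without running Python. */
    static List<String> buildCommand(Path dataxHome, Path jobJson) {
        return List.of(
                "python",
                dataxHome.resolve("bin").resolve("datax.py").toString(),
                jobJson.toString());
    }

    /** The self-check runs the sample job under {home}/job/job.json. */
    static List<String> buildSelfCheckCommand(Path dataxHome) {
        return buildCommand(dataxHome, dataxHome.resolve("job").resolve("job.json"));
    }

    public static void main(String[] args) throws IOException, InterruptedException {
        if (args.length != 1) {
            System.err.println("usage: java DataxSelfCheck <DATAX_HOME>");
            System.exit(2);
        }
        Path home = Paths.get(args[0]);
        Process process = new ProcessBuilder(buildSelfCheckCommand(home))
                .inheritIO() // stream datax.py output to this console
                .start();
        int exitCode = process.waitFor();
        System.out.println("datax.py exited with code " + exitCode);
        System.exit(exitCode);
    }
}
```

For example, `java DataxSelfCheck /opt/datax` (a hypothetical install path) would run `python /opt/datax/bin/datax.py /opt/datax/job/job.json` and exit with the script's status.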