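### DataX MongoDBReader
#### 1 Quick Introduction
The MongoDBReader plugin reads from MongoDB through `MongoClient`, MongoDB's Java client. Recent versions of MongoDB have reduced lock granularity from the database level to the document level. Combined with MongoDB's powerful indexing, this is generally enough to meet the need for high-performance reads from MongoDB.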
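#### 2 Implementation
MongoDBReader reads data from MongoDB in parallel through the DataX framework. The controlling Job program splits the data in MongoDB into shards according to the specified rules and reads the shards in parallel. Each MongoDB type is then checked one by one and converted to a type that DataX supports.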
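
The split step can be pictured with a small sketch. It is an illustration only: the source does not state the plugin's actual splitting rule, so the even `(skip, limit)` ranges and the name `split_ranges` are assumptions.

```python
def split_ranges(total_docs, channels):
    """Divide total_docs documents into at most `channels` contiguous (skip, limit) ranges.

    Hypothetical helper: the real plugin may split on a different key or rule.
    """
    if total_docs < 0 or channels < 1:
        raise ValueError("total_docs must be >= 0 and channels must be >= 1")
    base, extra = divmod(total_docs, channels)
    ranges = []
    skip = 0
    for i in range(channels):
        # The first `extra` shards take one additional document each.
        limit = base + (1 if i < extra else 0)
        if limit == 0:
            break  # fewer documents than channels: stop creating empty shards
        ranges.append((skip, limit))
        skip += limit
    return ranges


print(split_ranges(10, 3))  # [(0, 4), (4, 3), (7, 3)]
```

Each range could then be handed to one reader task, so the shards cover every document exactly once.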
#### 3 Function Description
* This example reads data from MongoDB and writes it to ODPS.

```json
{
    "job": {
        "setting": {
            "speed": {
                "channel": 2
            }
        },
        "content": [
            {
                "reader": {
                    "name": "mongodbreader",
                    "parameter": {
                        "address": ["127.0.0.1:27017"],
                        "userName": "",
                        "userPassword": "",
                        "dbName": "tag_per_data",
                        "collectionName": "tag_data12",
                        "column": [
                            {
                                "name": "unique_id",
                                "type": "string"
                            },
                            {
                                "name": "sid",
                                "type": "string"
                            },
                            {
                                "name": "user_id",
                                "type": "string"
                            },
                            {
                                "name": "auction_id",
                                "type": "string"
                            },
                            {
                                "name": "content_type",
                                "type": "string"
                            },
                            {
                                "name": "pool_type",
                                "type": "string"
                            },
                            {
                                "name": "frontcat_id",
                                "type": "Array",
                                "splitter": ""
                            },
                            {
                                "name": "categoryid",
                                "type": "Array",
                                "splitter": ""
                            },
                            {
                                "name": "gmt_create",
                                "type": "string"
                            },
                            {
                                "name": "taglist",
                                "type": "Array",
                                "splitter": " "
                            },
                            {
                                "name": "property",
                                "type": "string"
                            },
                            {
                                "name": "scorea",
                                "type": "int"
                            },
                            {
                                "name": "scoreb",
                                "type": "int"
                            },
                            {
                                "name": "scorec",
                                "type": "int"
                            }
                        ]
                    }
                },
                "writer": {
                    "name": "odpswriter",
                    "parameter": {
                        "project": "tb_ai_recommendation",
                        "table": "jianying_tag_datax_read_test01",
                        "column": [
                            "unique_id",
                            "sid",
                            "user_id",
                            "auction_id",
                            "content_type",
                            "pool_type",
                            "frontcat_id",
                            "categoryid",
                            "gmt_create",
                            "taglist",
                            "property",
                            "scorea",
                            "scoreb"
                        ],
                        "accessId": "**************",
                        "accessKey": "********************",
                        "truncate": true,
                        "odpsServer": "xxx/api",
                        "tunnelServer": "xxx",
                        "accountType": "aliyun"
                    }
                }
            }
        ]
    }
}
```

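
Before submitting a job like the one above, it can help to compare the reader and writer column lists. The following is a minimal sketch, assuming the writer expects one target column per reader column (the source does not state this). `JOB_JSON` is a trimmed, hypothetical copy of the job, and `unmatched_reader_columns` is an illustrative helper, not part of DataX.

```python
import json

# Trimmed, hypothetical copy of the job above: three reader columns, two writer columns.
JOB_JSON = """
{
  "job": {
    "content": [
      {
        "reader": {
          "name": "mongodbreader",
          "parameter": {
            "column": [
              {"name": "unique_id", "type": "string"},
              {"name": "scoreb", "type": "int"},
              {"name": "scorec", "type": "int"}
            ]
          }
        },
        "writer": {
          "name": "odpswriter",
          "parameter": {"column": ["unique_id", "scoreb"]}
        }
      }
    ]
  }
}
"""


def unmatched_reader_columns(job):
    """Return reader column names that do not appear in the writer column list."""
    content = job["job"]["content"][0]
    reader_names = [c["name"] for c in content["reader"]["parameter"]["column"]]
    writer_names = set(content["writer"]["parameter"]["column"])
    return [name for name in reader_names if name not in writer_names]


job = json.loads(JOB_JSON)
print(unmatched_reader_columns(job))  # ['scorec']
```

Run against the full job above, the same check would report `scorec`, which the reader lists but the writer does not.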
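#### 4 Parameter Description
* address: The MongoDB address information. Because MongoDB may be deployed as a cluster, the IP and port information must be given as a JSON array. [Required]
* userName: The MongoDB user name. [Optional]
* userPassword: The MongoDB password. [Optional]
* collectionName: The MongoDB collection name. [Required]
* column: The MongoDB document column names. [Required]
* name: The name of the column. [Required]
* type: The type of the column. [Optional]
* splitter: MongoDB supports array types but the DataX framework itself does not, so an array read from MongoDB is joined into a single string using this separator. [Optional]
* query: An additional MongoDB query condition. [Optional]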
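
The effect of `splitter` on an array column can be sketched as a plain string join. This is an illustration, not the plugin's code: the helper name `flatten_array` and the use of `str()` to render each element are assumptions.

```python
def flatten_array(values, splitter):
    """Join the elements of a MongoDB array into one string using the given separator.

    Hypothetical helper: how the plugin stringifies each element is not stated
    in this document, so str() is used here as an assumption.
    """
    return splitter.join(str(value) for value in values)


print(flatten_array(["red", "blue", "green"], " "))  # red blue green
print(flatten_array([1, 2, 3], ""))                  # 123
```

As the second call shows, an empty separator makes element boundaries unrecoverable in the resulting string, so a non-empty separator is the safer choice when the values need to be split again downstream.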
#### 5 Type Conversion
| DataX internal type | MongoDB data type |
| -------- | ----- |
| Long | int, long |
| Double | double |
| String | string, array |
| Date | date |
| Boolean | boolean |
| Bytes | bytes |
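
The table can be read as a lookup from the MongoDB type to the DataX internal type. The sketch below encodes exactly those rows. The case-insensitive matching and the error raised for unlisted types are assumptions, and `datax_type` is an illustrative name rather than a DataX API.

```python
# Rows of the conversion table, keyed by lower-cased MongoDB type name.
MONGO_TO_DATAX = {
    "int": "Long",
    "long": "Long",
    "double": "Double",
    "string": "String",
    "array": "String",
    "date": "Date",
    "boolean": "Boolean",
    "bytes": "Bytes",
}


def datax_type(mongo_type):
    """Look up the DataX internal type for a MongoDB type name from the table."""
    key = mongo_type.strip().lower()  # assumption: type names are matched case-insensitively
    if key not in MONGO_TO_DATAX:
        raise ValueError(f"type not listed in the conversion table: {mongo_type!r}")
    return MONGO_TO_DATAX[key]


print(datax_type("Array"))  # String
print(datax_type("int"))    # Long
```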
#### 6 Performance Report
#### 7 Test Report