Overview
- 1. DataX is an offline data synchronization tool/platform that is widely used inside Alibaba.
- 2. A job is configured as a reader + writer pair, which is how data is synchronized between all kinds of heterogeneous data sources (see the job skeleton below).
- 3. Supported data sources include mysql/sqlserver/oracle/hdfs/hive/hbase, plus any data source that can be reached through a configured JDBC connection.
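Every DataX job has the same overall shape: a job.setting block for throttling plus a job.content array pairing one reader with one writer. A minimal skeleton, with illustrative plugin names and the per-plugin parameter blocks left empty to be filled in:

```json
{
    "job": {
        "setting": {
            "speed": {
                "channel": 1
            }
        },
        "content": [{
            "reader": {
                "name": "mysqlreader",
                "parameter": {}
            },
            "writer": {
                "name": "hdfswriter",
                "parameter": {}
            }
        }]
    }
}
```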
Generic relational database plugin (RDBMS)
- 1. Connects to the remote RDBMS database over JDBC.
- 2. Data can be selected with a custom query.
- 3. Any number of additional relational databases can be supported by registering their JDBC drivers.
- 4. Registering a database driver:
Go into the directory for rdbmsreader, where ${DATAX_HOME} is the DataX home directory: ${DATAX_HOME}/plugin/reader/rdbmsreader
The rdbmsreader plugin directory contains a plugin.json configuration file. Register your concrete database driver there, in the drivers array; when a task runs, the plugin dynamically picks a suitable driver from this list to connect to the database:

```json
{
    "name": "rdbmsreader",
    "class": "com.alibaba.datax.plugin.reader.rdbmsreader.RdbmsReader",
    "description": "useScene: prod. mechanism: Jdbc connection using the database, execute select sql, retrieve data from the ResultSet. warn: The more you know about the database, the less problems you encounter.",
    "developer": "alibaba",
    "drivers": [
        "dm.jdbc.driver.DmDriver",
        "com.ibm.db2.jcc.DB2Driver",
        "com.sybase.jdbc3.jdbc.SybDriver",
        "com.edb.Driver"
    ]
}
```
The rdbmsreader plugin directory also contains a libs subdirectory; place your concrete database driver jar under libs:
```
$ tree .
|-- libs
|   |-- Dm7JdbcDriver16.jar
|   |-- commons-collections-3.0.jar
|   |-- commons-io-2.4.jar
|   |-- commons-lang3-3.3.2.jar
|   |-- commons-math3-3.1.1.jar
|   |-- datax-common-0.0.1-SNAPSHOT.jar
|   |-- datax-service-face-1.0.23-20160120.024328-1.jar
|   |-- db2jcc4.jar
|   |-- druid-1.0.15.jar
|   |-- edb-jdbc16.jar
|   |-- fastjson-1.1.46.sec01.jar
|   |-- guava-r05.jar
|   |-- hamcrest-core-1.3.jar
|   |-- jconn3-1.0.0-SNAPSHOT.jar
|   |-- logback-classic-1.0.13.jar
|   |-- logback-core-1.0.13.jar
|   |-- plugin-rdbms-util-0.0.1-SNAPSHOT.jar
|   `-- slf4j-api-1.7.10.jar
|-- plugin.json
|-- plugin_job_template.json
`-- rdbmsreader-0.0.1-SNAPSHOT.jar
```
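With the driver class registered in plugin.json and its jar in libs, a job is launched through DataX's standard entry point. Here job.json stands for a job file you have written (such as the examples below):

```
$ python ${DATAX_HOME}/bin/datax.py job.json
```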
Examples
Hive to MySQL
```json
{
"job": {
"setting": {
"speed": {
"channel": 3
}
},
"content": [{
"reader": {
"name": "hdfsreader",
"parameter": {
"path": "/user/hive/warehouse/dim.db/dim_brand/*",
"defaultFS": "hdfs://Ucluster",
"column": [
{"index":0,"type":"string"},
{"index":1,"type":"string"},
{"index":2,"type":"string"},
{"index":3,"type":"string"}
],
"fileType": "orc",
"encoding": "UTF-8",
"nullFormat":"\\N",
"fieldDelimiter": ",",
"hadoopConfig": {
"dfs.nameservices": "Ucluster",
"dfs.ha.namenodes.Ucluster": "nn1,nn2",
"dfs.namenode.rpc-address.Ucluster.nn1": "uhadoop-mzwc2w-master1:8020",
"dfs.namenode.rpc-address.Ucluster.nn2": "uhadoop-mzwc2w-master2:8020",
"dfs.client.failover.proxy.provider.Ucluster":"org.apache.hadoop.hdfs.server.namenode.ha.ConfiguredFailoverProxyProvider"
}
}
},
"writer": {
"name": "mysqlwriter",
"parameter": {
"username": "root",
"password": "root",
"writeMode": "replace",
"batchSize": 1024,
"column": [
"dim_brand_id"
,"brand_code"
,"brand_name"
,"update_time"
],
"session": [ ],
"preSql": [ ],
"postSql":[ ],
"connection": [{
"jdbcUrl": "jdbc:mysql://localhost:3306/dim?useUnicode=true&characterEncoding=utf8",
"table": [
"dim_brand"
]
}]
}
}
}]
}
}
```
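Two details in the writer above are worth calling out. writeMode "replace" makes mysqlwriter load data with REPLACE INTO, so rows are deduplicated against the target table's primary key or unique index. The empty session/preSql/postSql arrays are hooks for SQL executed around the load; for example, to empty the target table before every run you could change just the preSql line (a sketch, using the table from this example):

```json
"preSql": ["truncate table dim_brand"]
```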
SAP HANA Reader
```json
{
"job": {
"setting": {
"speed": {
"byte": 1048576
},
"errorLimit": {
"record": 0,
"percentage": 0.02
}
},
"content": [
{
"reader": {
"name": "rdbmsreader",
"parameter": {
"username": "HC2BW4",
"password": "xxxxx",
"connection": [
{
"querySql": [
"select * from HC2BW4.ZHC_WZCD limit 20"
],
"jdbcUrl": [
"jdbc:sap://10.0.30.115:30041/"
]
}
],
"column": [ ],
"fetchSize": 1024,
"where": " "
}
},
"writer": {
"name": "streamwriter",
"parameter": {
"print": true
}
}
}
]
}
}
```
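One caveat for this example: HANA is not among the drivers pre-registered in the plugin.json shown earlier. Assuming the standard SAP HANA JDBC jar (ngdbc.jar) has been dropped into the plugin's libs directory, its driver class would be appended to the drivers array, roughly:

```json
"drivers": [
    "dm.jdbc.driver.DmDriver",
    "com.ibm.db2.jcc.DB2Driver",
    "com.sybase.jdbc3.jdbc.SybDriver",
    "com.edb.Driver",
    "com.sap.db.jdbc.Driver"
]
```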