DataX Usage Notes



Overview

  • 1. An offline data synchronization tool/platform that is widely used inside Alibaba.
  • 2. Data synchronization between heterogeneous data sources is achieved by configuring a reader + writer pair in a job file (a run-command sketch follows this list).
  • 3. Supported data sources include MySQL, SQL Server, Oracle, HDFS, Hive, HBase, and any other source that can be reached through a JDBC connection.
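
A job is described by a single JSON file that pairs one reader with one writer and is submitted through the datax.py launcher. A minimal invocation sketch, assuming a standard installation under ${DATAX_HOME} and a job file named job.json (placeholder name):

# Run the synchronization job described in job.json
$ python ${DATAX_HOME}/bin/datax.py job.json

# Print an empty job template for a given reader/writer pair
$ python ${DATAX_HOME}/bin/datax.py -r rdbmsreader -w mysqlwriter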

Generic Relational Database Plugin (RDBMS)

  • 1. Connects to a remote RDBMS database via JDBC.
  • 2. Data can be selected with a custom query (querySql).
  • 3. Any relational database can be added by registering its database driver.
  • 4. Registering a database driver
    • Go to the directory of the rdbmsreader plugin, where ${DATAX_HOME} is the DataX home directory, i.e.: ${DATAX_HOME}/plugin/reader/rdbmsreader

    • The rdbmsreader plugin directory contains a plugin.json configuration file. Register your specific database driver classes there, inside the drivers array.
      At job execution time the rdbmsreader plugin dynamically selects the appropriate registered driver to connect to the database.

      {
        "name": "rdbmsreader",
        "class": "com.alibaba.datax.plugin.reader.rdbmsreader.RdbmsReader",
        "description": "useScene: prod. mechanism: Jdbc connection using the database, execute select sql, retrieve data from the ResultSet. warn: The more you know about the database, the less problems you encounter.",
        "developer": "alibaba",
        "drivers": [
          "dm.jdbc.driver.DmDriver",
          "com.ibm.db2.jcc.DB2Driver",
          "com.sybase.jdbc3.jdbc.SybDriver",
          "com.edb.Driver"
        ]
      }
      
    • The rdbmsreader plugin directory also contains a libs subdirectory; copy the jar file of your database driver into it (a copy-command sketch follows the listing below).

      $ tree
      .
      |-- libs
      |   |-- Dm7JdbcDriver16.jar
      |   |-- commons-collections-3.0.jar
      |   |-- commons-io-2.4.jar
      |   |-- commons-lang3-3.3.2.jar
      |   |-- commons-math3-3.1.1.jar
      |   |-- datax-common-0.0.1-SNAPSHOT.jar
      |   |-- datax-service-face-1.0.23-20160120.024328-1.jar
      |   |-- db2jcc4.jar
      |   |-- druid-1.0.15.jar
      |   |-- edb-jdbc16.jar
      |   |-- fastjson-1.1.46.sec01.jar
      |   |-- guava-r05.jar
      |   |-- hamcrest-core-1.3.jar
      |   |-- jconn3-1.0.0-SNAPSHOT.jar
      |   |-- logback-classic-1.0.13.jar
      |   |-- logback-core-1.0.13.jar
      |   |-- plugin-rdbms-util-0.0.1-SNAPSHOT.jar
      |   `-- slf4j-api-1.7.10.jar
      |-- plugin.json
      |-- plugin_job_template.json
      `-- rdbmsreader-0.0.1-SNAPSHOT.jar
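
    • A minimal copy-command sketch for installing a new driver, where your-driver.jar is a placeholder and the writer-side path is assumed by analogy with the reader layout:

      # Reader-side copy of the vendor JDBC driver jar
      $ cp your-driver.jar ${DATAX_HOME}/plugin/reader/rdbmsreader/libs/
      # The writer plugin keeps its own copy of the driver
      $ cp your-driver.jar ${DATAX_HOME}/plugin/writer/rdbmswriter/libs/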
      

Examples

Hive to MySQL

{
  "job": {
    "setting": {
      "speed": {
        "channel": 3
      }
    },
    "content": [{
      "reader": {
        "name": "hdfsreader",
        "parameter": {
          "path": "/user/hive/warehouse/dim.db/dim_brand/*",
          "defaultFS": "hdfs://Ucluster",
          "column": [
            {"index":0,"type":"string"},
            {"index":1,"type":"string"},
            {"index":2,"type":"string"},
            {"index":3,"type":"string"}
          ],
          "fileType": "orc",
          "encoding": "UTF-8",
          "nullFormat":"\\N",
          "fieldDelimiter": ",",
          "hadoopConfig": {
            "dfs.nameservices": "Ucluster",
            "dfs.ha.namenodes.Ucluster": "nn1,nn2",
            "dfs.namenode.rpc-address.Ucluster.nn1": "uhadoop-mzwc2w-master1:8020",
            "dfs.namenode.rpc-address.Ucluster.nn2": "uhadoop-mzwc2w-master2:8020",
            "dfs.client.failover.proxy.provider.Ucluster":"org.apache.hadoop.hdfs.server.namenode.ha.ConfiguredFailoverProxyProvider"
          }
        }
      },
      "writer": {
        "name": "mysqlwriter",
        "parameter": {
          "username": "root",
          "password": "root",
          "writeMode": "replace",
          "batchSize": 1024,
          "column": [
            "dim_brand_id"
            ,"brand_code"
            ,"brand_name"
            ,"update_time"
          ],
          "session": [ ],
          "preSql": [ ],
          "postSql":[ ],
          "connection": [{
            "jdbcUrl": "jdbc:mysql://localhost:3306/dim?useUnicode=true&characterEncoding=utf8",
            "table": [
              "dim_brand"
            ]
          }]
        }
      }
    }]
  }
}
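
The mysqlwriter above uses writeMode "replace": a row whose primary key or unique key already exists in dim_brand is overwritten on each run. For a full refresh, the (currently empty) preSql array can clear the target table before loading. A minimal sketch, assuming the dim_brand table from this example, to be placed in the writer's parameter block:

"preSql": [
  "truncate table dim_brand"
]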

SAP HANA Reader

{
  "job": {
    "setting": {
      "speed": {
        "byte": 1048576
      },
      "errorLimit": {
        "record": 0,
        "percentage": 0.02
      }
    },
    "content": [
      {
        "reader": {
          "name": "rdbmsreader",
          "parameter": {
            "username": "HC2BW4",
            "password": "xxxxx",
            "connection": [
              {
                "querySql": [
                  "select * from HC2BW4.ZHC_WZCD limit 20"
                ],
                "jdbcUrl": [
                  "jdbc:sap://10.0.30.115:30041/"
                ]
              }
            ],
            "column": [ ],
            "fetchSize": 1024,
            "where": " "
          }
        },
        "writer": {
          "name": "streamwriter",
          "parameter": {
            "print": true
          }
        }
      }
    ]
  }
}
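
For rdbmsreader to reach SAP HANA, the HANA JDBC driver must be registered with the steps described above: copy the driver jar into ${DATAX_HOME}/plugin/reader/rdbmsreader/libs/ and add its class to the drivers array in plugin.json. A sketch, assuming the standard SAP HANA JDBC driver (ngdbc.jar, class com.sap.db.jdbc.Driver); the other keys of plugin.json stay as shown earlier:

"drivers": [
  "dm.jdbc.driver.DmDriver",
  "com.ibm.db2.jcc.DB2Driver",
  "com.sybase.jdbc3.jdbc.SybDriver",
  "com.edb.Driver",
  "com.sap.db.jdbc.Driver"
]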
