Sqoop Usage and Introduction


Sqoop is the bridge between relational databases and Hadoop storage in a Hadoop environment: it supports moving data back and forth between a variety of relational data sources and Hive, HDFS, and HBase. Typically the relational tables live in a backup copy of the production database and need to be loaded every day. Depending on the daily data volume, Sqoop can import a whole table at a time, which is fine when the amount of new data per day is small; for larger volumes Sqoop also provides an incremental import mechanism.
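
As a minimal sketch of such a full-table import (the host, database name sqoop_datas, table orders, and credentials are hypothetical placeholders, not values from this article):

    sqoop import \
      --connect jdbc:mysql://localhost:3306/sqoop_datas \
      --username sqoop_user \
      --password 123456 \
      --table orders \
      --target-dir /user/hadoop/orders \
      -m 1

Note that a second run fails if the target directory already exists; that is where --append and the incremental arguments described later come in.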

Below are some commonly used Sqoop commands, followed by their parameters:

     

No.  Command            Tool class           Description
1    import             ImportTool           Import data from a relational database (a table or a query) into HDFS
2    export             ExportTool           Export data from HDFS into a relational database
3    codegen            CodeGenTool          Generate Java code for a database table and package it into a jar
4    create-hive-table  CreateHiveTableTool  Create a Hive table
5    eval               EvalSqlTool          Run a SQL statement and view its result
6    import-all-tables  ImportAllTablesTool  Import every table of a database into HDFS
7    job                JobTool              Work with saved jobs
8    list-databases     ListDatabasesTool    List all database names
9    list-tables        ListTablesTool       List all tables in a database
10   merge              MergeTool            Merge the results of incremental imports
11   metastore          MetastoreTool        Run a standalone Sqoop metastore
12   help               HelpTool             Show help
13   version            VersionTool          Show the version
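
For example, the list-* and eval commands are convenient for checking connectivity before running any import; the connection details below are placeholders (-P, a standard Sqoop option, prompts for the password instead of putting it on the command line):

    sqoop list-databases --connect jdbc:mysql://localhost:3306/ --username sqoop_user -P
    sqoop list-tables    --connect jdbc:mysql://localhost:3306/sqoop_datas --username sqoop_user -P
    sqoop eval --connect jdbc:mysql://localhost:3306/sqoop_datas --username sqoop_user -P \
               --query "SELECT COUNT(*) FROM orders"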

     

Next come Sqoop's general arguments, followed by the arguments specific to each of the 13 commands above. The general arguments are divided into Common arguments, Incremental import arguments, Output line formatting arguments, Input parsing arguments, Hive arguments, HBase arguments, and Generic Hadoop command-line arguments.

     

1. Common arguments: general options, mainly for the relational database connection

No.  Parameter             Description                                                     Example
1    --connect             JDBC URL of the relational database                             jdbc:mysql://localhost/sqoop_datas
2    --connection-manager  Connection manager class, rarely needed
3    --driver              JDBC driver class
4    --hadoop-home         Hadoop installation directory                                   /home/hadoop
5    --help                Show help
6    --password            Password for the relational database
7    --username            Username for the relational database
8    --verbose             Print more information (lowers the log level); takes no value
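
These connection arguments are shared by every Sqoop command. A sketch with hypothetical values, explicitly forcing a driver class (normally unnecessary for MySQL, since Sqoop picks a connector based on the JDBC URL):

    sqoop list-tables \
      --connect jdbc:mysql://dbhost:3306/sqoop_datas \
      --driver com.mysql.jdbc.Driver \
      --username sqoop_user \
      --password 123456 \
      --verbose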

     

Import control arguments:

Argument                          Description
--append                          Append data to an existing dataset in HDFS
--as-avrodatafile                 Imports data to Avro Data Files
--as-sequencefile                 Imports data to SequenceFiles
--as-textfile                     Imports data as plain text (default)
--boundary-query <statement>      Boundary query to use for creating splits
--columns <col,col,col…>          Columns to import from table
--direct                          Use direct import fast path
--direct-split-size <n>           Split the input stream every n bytes when importing in direct mode
--inline-lob-limit <n>            Set the maximum size for an inline LOB
-m,--num-mappers <n>              Use n map tasks to import in parallel
-e,--query <statement>            Import the results of statement
--split-by <column-name>          Column of the table used to split work units
--table <table-name>              Table to read
--target-dir <dir>                HDFS destination dir
--warehouse-dir <dir>             HDFS parent for table destination
--where <where clause>            WHERE clause to use during import
-z,--compress                     Enable compression
--compression-codec <c>           Use Hadoop codec (default gzip)
--null-string <null-string>       The string to be written for a null value for string columns
--null-non-string <null-string>   The string to be written for a null value for non-string columns
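
Combining a few of these, a free-form query import might look like the following sketch (table, columns, and paths are hypothetical). A --query import must contain the literal token $CONDITIONS and, when more than one mapper is used, a --split-by column:

    sqoop import \
      --connect jdbc:mysql://localhost:3306/sqoop_datas \
      --username sqoop_user --password 123456 \
      --query 'SELECT id, amount, created_at FROM orders WHERE $CONDITIONS' \
      --split-by id \
      --target-dir /user/hadoop/orders_query \
      -m 4 \
      -z --compression-codec org.apache.hadoop.io.compress.SnappyCodec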

     

     

Incremental import arguments:

Argument               Description
--check-column (col)   Specifies the column to be examined when determining which rows to import
--incremental (mode)   Specifies how Sqoop determines which rows are new; legal values for mode are append and lastmodified
--last-value (value)   Specifies the maximum value of the check column from the previous import
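
A sketch of an append-mode incremental import (column name and last value are hypothetical): only rows whose id exceeds --last-value are pulled, and Sqoop prints the new last value to record for the next run. lastmodified mode works the same way with a timestamp column:

    sqoop import \
      --connect jdbc:mysql://localhost:3306/sqoop_datas \
      --username sqoop_user --password 123456 \
      --table orders \
      --target-dir /user/hadoop/orders \
      --incremental append \
      --check-column id \
      --last-value 10000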

     

     

Output line formatting arguments:

Argument                          Description
--enclosed-by <char>              Sets a required field enclosing character
--escaped-by <char>               Sets the escape character
--fields-terminated-by <char>     Sets the field separator character
--lines-terminated-by <char>      Sets the end-of-line character
--mysql-delimiters                Uses MySQL's default delimiter set: fields: , lines: \n escaped-by: \ optionally-enclosed-by: '
--optionally-enclosed-by <char>   Sets a field enclosing character
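
A sketch that writes tab-separated text with quoted fields (the delimiters here are chosen arbitrarily, and the table and paths remain hypothetical):

    sqoop import \
      --connect jdbc:mysql://localhost:3306/sqoop_datas \
      --username sqoop_user --password 123456 \
      --table orders \
      --target-dir /user/hadoop/orders_tsv \
      --fields-terminated-by '\t' \
      --lines-terminated-by '\n' \
      --optionally-enclosed-by '"'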

     

     

Hive arguments:

Argument                     Description
--hive-home <dir>            Override $HIVE_HOME
--hive-import                Import tables into Hive (uses Hive's default delimiters if none are set)
--hive-overwrite             Overwrite existing data in the Hive table
--create-hive-table          If set, the job fails if the target Hive table already exists; false by default
--hive-table <table-name>    Sets the table name to use when importing to Hive
--hive-drop-import-delims    Drops \n, \r, and \01 from string fields when importing to Hive
--hive-delims-replacement    Replaces \n, \r, and \01 in string fields with a user-defined string when importing to Hive
--hive-partition-key         Name of the Hive field on which partitions are sharded
--hive-partition-value <v>   String value that serves as the partition key for the data imported into Hive in this job
--map-column-hive <map>      Override default mapping from SQL type to Hive type for configured columns
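
A sketch of importing straight into a Hive table (the Hive table name orders_hive is hypothetical). --hive-drop-import-delims strips the \n, \r, and \01 characters that would otherwise break Hive's default line and field delimiters:

    sqoop import \
      --connect jdbc:mysql://localhost:3306/sqoop_datas \
      --username sqoop_user --password 123456 \
      --table orders \
      --hive-import \
      --hive-table orders_hive \
      --hive-overwrite \
      --hive-drop-import-delims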

     

     

HBase arguments:

Argument                     Description
--column-family <family>     Sets the target column family for the import
--hbase-create-table         If specified, create missing HBase tables
--hbase-row-key <col>        Specifies which input column to use as the row key
--hbase-table <table-name>   Specifies an HBase table to use as the target instead of HDFS
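
A sketch of loading the same hypothetical table into HBase instead of HDFS; each imported row becomes a put keyed by the --hbase-row-key column:

    sqoop import \
      --connect jdbc:mysql://localhost:3306/sqoop_datas \
      --username sqoop_user --password 123456 \
      --table orders \
      --hbase-table orders \
      --column-family cf \
      --hbase-row-key id \
      --hbase-create-table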

     

     

Code generation arguments:

Argument                Description
--bindir <dir>          Output directory for compiled objects
--class-name <name>     Sets the generated class name; overrides --package-name; when combined with --jar-file, sets the input class
--jar-file <file>       Disable code generation; use the specified jar
--outdir <dir>          Output directory for generated code
--package-name <name>   Put auto-generated classes in this package
--map-column-java <m>   Override default mapping from SQL type to Java type for configured columns
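
A codegen sketch with hypothetical output paths; it produces the row class and jar that import and export would otherwise generate on the fly:

    sqoop codegen \
      --connect jdbc:mysql://localhost:3306/sqoop_datas \
      --username sqoop_user --password 123456 \
      --table orders \
      --bindir /tmp/sqoop-codegen/bin \
      --outdir /tmp/sqoop-codegen/src \
      --package-name com.example.sqoop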



