mirror of
https://github.com/apache/sqoop.git
synced 2025-05-04 11:31:35 +08:00

From: Christopher Douglas <cdouglas@apache.org> git-svn-id: https://svn.apache.org/repos/asf/incubator/sqoop/trunk@1149834 13f79535-47bb-0310-9956-ffa450edef68
////
Licensed to the Apache Software Foundation (ASF) under one or more
contributor license agreements. See the NOTICE file distributed with
this work for additional information regarding copyright ownership.
The ASF licenses this file to You under the Apache License, Version 2.0
(the "License"); you may not use this file except in compliance with
the License. You may obtain a copy of the License at

http://www.apache.org/licenses/LICENSE-2.0

Unless required by applicable law or agreed to in writing, software
distributed under the License is distributed on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
See the License for the specific language governing permissions and
limitations under the License.
////
|
|
|
|
|
|
Automatic Full-database Import
|
|
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
If you want to import all the tables in a database, you can use the
+--all-tables+ argument to do so:

----
$ sqoop --connect jdbc:mysql://database.example.com/employees --all-tables
----

This will query the database for the available tables, generate an ORM
class for each table, and run a MapReduce job to import each one.
Hadoop uses the DBInputFormat to read from a database into a Mapper
instance. Reading a table into a MapReduce program requires a class
to hold the fields of one row of the table. One of the benefits
of Sqoop is that it generates this class definition for you, based on
the table definition in the database.

The generated +.java+ files are, by default, placed in the current
directory. You can supply a different directory with the +--outdir+
parameter. Sqoop then compiles them into +.class+ and +.jar+ files for
use by the MapReduce job that it launches. The compiled files are
created in a temporary directory. You can redirect this target with
+--bindir+.
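
As an illustration, both options can be combined in a single
invocation. The sketch below only assembles and prints the command so
that it can be reviewed first. The directory names +generated/src+ and
+generated/classes+ and the variable names are hypothetical, chosen
for this example; they are not Sqoop defaults.

```shell
# Hypothetical directories (assumptions for this example, not Sqoop defaults).
SRC_DIR="generated/src"       # passed to --outdir: generated .java files
BIN_DIR="generated/classes"   # passed to --bindir: compiled .class and .jar files

# Assemble the arguments using only options described in this section.
SQOOP_ARGS="--connect jdbc:mysql://database.example.com/employees --all-tables --outdir $SRC_DIR --bindir $BIN_DIR"

# Print the command for review; run it yourself once the paths look right.
echo "sqoop $SQOOP_ARGS"
```

Keeping the generated sources and the compiled artifacts in separate,
known directories makes them easier to find after the import finishes.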

Each table will be imported into a separate directory in HDFS, with
the same name as the table. For instance, if your Hadoop username is
aaron, the above command would generate the following
directories in HDFS:

----
/user/aaron/employee_names
/user/aaron/payroll_checks
/user/aaron/job_descriptions
/user/aaron/office_supplies
----

You can change the base directory under which the tables are loaded
with the +--warehouse-dir+ parameter. For example:

----
$ sqoop --connect jdbc:mysql://database.example.com/employees --all-tables \
    --warehouse-dir /common/warehouse
----

This would create the following directories instead:

----
/common/warehouse/employee_names
/common/warehouse/payroll_checks
/common/warehouse/job_descriptions
/common/warehouse/office_supplies
----
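
The directory layout described above can be summarized in a small
shell sketch. The function name +table_dir+ and the variable
+HDFS_USER+ are hypothetical helpers for this illustration, not part
of Sqoop. The function only computes the expected path and does not
contact HDFS.

```shell
# table_dir TABLE [WAREHOUSE_DIR]
# Prints the HDFS directory a table is expected to land in, following the
# layout described in this section: /user/<username>/<table> by default,
# or <warehouse-dir>/<table> when a base directory is given.
# HDFS_USER is a hypothetical variable standing in for the Hadoop username.
table_dir() {
  table=$1
  base=${2:-/user/${HDFS_USER:-$USER}}
  # Drop one trailing slash from the base so the path has no "//".
  printf '%s/%s\n' "${base%/}" "$table"
}

HDFS_USER=aaron
for t in employee_names payroll_checks; do
  table_dir "$t"                     # default location
  table_dir "$t" /common/warehouse   # with --warehouse-dir /common/warehouse
done
```

For each table, the loop prints the default location followed by the
location under +/common/warehouse+, matching the listings above.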

By default, the data is written to text files in HDFS. Columns are
separated by commas, and each row is terminated by a newline. See the
section on "Controlling the Output Format" below for information on
how to change these delimiters.
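
To make the default text layout concrete, the following sketch splits
comma-delimited, newline-terminated rows in the shell. The sample rows
and the column names (+id+, +name+, +dept+) are hypothetical and do
not come from a real import. The naive split assumes that no field
contains a comma or a newline.

```shell
# parse_rows: read comma-delimited rows on stdin, one row per line, and
# print each field with a label. Column names are hypothetical.
parse_rows() {
  while IFS=, read -r id name dept; do
    printf 'id=%s name=%s dept=%s\n' "$id" "$name" "$dept"
  done
}

# Two made-up rows in the default layout: commas between columns,
# a newline at the end of each row.
printf '1,Alice,Engineering\n2,Bob,Payroll\n' | parse_rows
```

If the data itself may contain commas, a split like this one breaks,
which is one reason to change the delimiters.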

If you want to leverage compression and binary file formats, the
+--as-sequencefile+ argument to Sqoop will import the table
to a set of SequenceFiles instead. This stores each field of each
database record in a separate object in a SequenceFile.
This representation is also likely to perform better when used
as input to subsequent MapReduce programs, because it does not require
parsing. For completeness, Sqoop provides an +--as-textfile+ option,
which is implied by default. An +--as-textfile+ argument on the command
line overrides a previous +--as-sequencefile+ argument.
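
The interaction between the two format arguments can be modeled with a
short shell sketch. This is not Sqoop's actual argument parser. The
function name +effective_format+ is hypothetical, and the sketch
assumes that the last format argument wins in both directions, whereas
the text above only states this for +--as-textfile+ following
+--as-sequencefile+.

```shell
# effective_format ARGS...
# Prints "text" or "sequence" for a list of command-line arguments.
# Assumption: the last format argument on the command line wins.
effective_format() {
  format=text   # --as-textfile is implied by default
  for arg in "$@"; do
    case $arg in
      --as-sequencefile) format=sequence ;;
      --as-textfile)     format=text ;;
    esac
  done
  printf '%s\n' "$format"
}

effective_format --all-tables --as-sequencefile
effective_format --all-tables --as-sequencefile --as-textfile
```

The first call reports +sequence+. The second reports +text+, because
the later +--as-textfile+ overrides the earlier argument.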

The SequenceFile format will embed the records from the database as
objects using the code generated by Sqoop. It is important that you
retain the +.java+ file for this class, as you will need to be able to
instantiate the same type to read the objects back later, in other
user-defined applications.