mirror of
https://github.com/apache/sqoop.git
synced 2025-05-10 22:00:16 +08:00

From: Thomas White <tomwhite@apache.org> git-svn-id: https://svn.apache.org/repos/asf/incubator/sqoop/trunk@1149840 13f79535-47bb-0310-9956-ffa450edef68
78 lines
3.3 KiB
Plaintext
////
Licensed to the Apache Software Foundation (ASF) under one or more
contributor license agreements. See the NOTICE file distributed with
this work for additional information regarding copyright ownership.
The ASF licenses this file to You under the Apache License, Version 2.0
(the "License"); you may not use this file except in compliance with
the License. You may obtain a copy of the License at

http://www.apache.org/licenses/LICENSE-2.0

Unless required by applicable law or agreed to in writing, software
distributed under the License is distributed on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
See the License for the specific language governing permissions and
limitations under the License.
////

Direct-mode Imports
-------------------

While Sqoop's JDBC-based import method can read from a variety of
databases using a generic driver, it is not the fastest method
available. Sqoop can read from certain database systems faster by
using their built-in export tools.

For example, Sqoop can read from a MySQL database by using the +mysqldump+
tool distributed with MySQL. You can take advantage of this faster
import method by running Sqoop with the +--direct+ argument. This
argument, combined with a connect string that begins with +jdbc:mysql://+,
tells Sqoop to select the faster access method.

If your delimiters exactly match the delimiters used by +mysqldump+,
then Sqoop will use a fast path that copies the data directly from
+mysqldump+'s output into HDFS. Otherwise, Sqoop will parse +mysqldump+'s
output into fields and transcode them into the user-specified delimiter set.
This incurs additional processing, so performance may suffer.
For convenience, the +--mysql-delimiters+ argument sets all the output
delimiters to be consistent with +mysqldump+'s format.

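As a rough sketch, the two arguments described above can be combined in a
single invocation. The host name +db.example.com+, the database +corp+, and
the table +employees+ are placeholders rather than values from this guide.
The snippet only prints the command, so it can be reviewed before it is run
on a host where Sqoop is installed.

```shell
# Placeholder connection details (assumptions for illustration only).
CONNECT="jdbc:mysql://db.example.com/corp"
TABLE="employees"

# --direct selects the mysqldump-based import path, and --mysql-delimiters
# keeps the output delimiters consistent with mysqldump's own format.
CMD="sqoop --connect $CONNECT --table $TABLE --direct --mysql-delimiters"

# Print the command instead of executing it.
echo "$CMD"
```
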
Sqoop also provides a direct-mode backend for PostgreSQL that uses the
+COPY TO STDOUT+ protocol from +psql+. No specific delimiter set provides
better performance; Sqoop will forward delimiter control arguments to
+psql+.

The "Supported Databases" section provides a full list of database vendors
that have direct-mode support from Sqoop.

When writing to HDFS, direct mode will open a single output file to receive
the results of the import. You can instruct Sqoop to use multiple output
files by using the +--direct-split-size+ argument, which takes a size in
bytes. Sqoop will generate files of approximately this size. For example,
+--direct-split-size 1000000+ will generate files of approximately 1 MB
each. If you compress the HDFS files with +--compress+, splitting the
output this way allows subsequent MapReduce programs to use multiple
mappers across your data in parallel.

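To get a feel for the effect of +--direct-split-size+, the number of output
files can be estimated with a ceiling division. This is only a
back-of-the-envelope sketch: the table size below is a made-up figure, and
because Sqoop produces files of approximately the requested size, the real
count may differ.

```shell
# Hypothetical size of the imported data in bytes (assumption).
TABLE_BYTES=10500000
# Value that would be passed to --direct-split-size.
SPLIT_BYTES=1000000

# Ceiling division: round up so a partial final file is counted.
FILES=$(( (TABLE_BYTES + SPLIT_BYTES - 1) / SPLIT_BYTES ))

echo "about $FILES files of roughly $SPLIT_BYTES bytes each"
```

With these figures the estimate comes to 11 files.
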
Tool-specific arguments
|
|
~~~~~~~~~~~~~~~~~~~~~~~
|
|
|
|
Sqoop will generate a set of command-line arguments with which it invokes
the underlying direct-mode tool (e.g., +mysqldump+). You can specify
additional arguments for the tool by passing them to Sqoop after a single
'+-+' argument. For example:

----
$ sqoop --connect jdbc:mysql://localhost/db --table foo --direct - --lock-tables
----

The +--lock-tables+ argument (and anything else to the right of the +-+
argument) will be passed directly to +mysqldump+.

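The split at the +-+ separator can be illustrated with a small shell
function. This is a sketch of the idea only, not Sqoop's actual argument
parser; the function name +split_args+ and its variables are hypothetical.

```shell
# Split an argument list at the first lone "-" (illustration only;
# this is not how Sqoop itself is implemented).
split_args() {
  sqoop_args=""
  tool_args=""
  seen_sep=0
  for arg in "$@"; do
    if [ "$seen_sep" -eq 0 ] && [ "$arg" = "-" ]; then
      # First lone "-": everything after it belongs to the tool.
      seen_sep=1
    elif [ "$seen_sep" -eq 0 ]; then
      sqoop_args="$sqoop_args $arg"
    else
      tool_args="$tool_args $arg"
    fi
  done
  # Strip the leading space added by the concatenation above.
  sqoop_args="${sqoop_args# }"
  tool_args="${tool_args# }"
}

split_args --connect jdbc:mysql://localhost/db --table foo --direct - --lock-tables
echo "Sqoop:     $sqoop_args"
echo "mysqldump: $tool_args"
```

For the command line shown earlier, everything up to +--direct+ stays with
Sqoop and only +--lock-tables+ ends up in the list for +mysqldump+.
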