mirror of https://github.com/apache/sqoop.git synced 2025-05-04 06:41:12 +08:00

SQOOP-142. Document requirements for direct import

Updated the documentation with details on direct mode execution
requirements.

From: Arvind Prabhakar <arvind@cloudera.com>

git-svn-id: https://svn.apache.org/repos/asf/incubator/sqoop/trunk@1150005 13f79535-47bb-0310-9956-ffa450edef68
Andrew Bayer 2011-07-22 20:04:26 +00:00
parent e33bdbced1
commit 70caf779b0
5 changed files with 32 additions and 61 deletions


@@ -33,7 +33,7 @@ Import control options
 Columns to export from table
 --direct::
-Use direct import fast path (mysql only)
+Use direct import fast path (MySQL and PostgreSQL)
 --direct-split-size (n)::
 Split the input stream every 'n' bytes when importing in direct mode.
@@ -41,8 +41,8 @@ Import control options
 --inline-lob-limit (n)::
 Set the maximum size for an inline LOB
--m::
 --num-mappers (n)::
+-m::
 Use 'n' map tasks to import in parallel
 --query (statement)::
@@ -68,3 +68,9 @@ Import control options
 -z::
 Uses gzip to compress data as it is written to HDFS
+--null-string::
+The string to be written for a null value for string columns
+--null-non-string::
+The string to be written for a null value for non-string columns
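Taken together, the options in this hunk compose into a single command line; a hypothetical invocation follows (the connect string, table name, and null token are placeholders for illustration, not part of this commit):

```shell
# Hypothetical import using the options documented above; '\\N' is a
# common choice of null token for downstream Hive consumption
sqoop import \
  --connect jdbc:mysql://db.example.com/sales \
  --table orders \
  --num-mappers 4 \
  --null-string '\\N' \
  --null-non-string '\\N' \
  --compress
```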


@@ -28,7 +28,7 @@ Export control options
 ~~~~~~~~~~~~~~~~~~~~~~
 --direct::
-Use direct import fast path (mysql only)
+Use direct import fast path (MySQL)
 --export-dir (dir)::
 HDFS source path for the export
@@ -70,6 +70,10 @@ after a lone '--' on the command-line.
 In MySQL direct mode, additional arguments are passed directly to
 mysqldump.
+Note: When using MySQL direct mode, the MySQL bulk utilities
++mysqldump+ and +mysqlimport+ should be available on the task nodes and
+present in the shell path of the task process.
 ENVIRONMENT
 -----------
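The requirement stated in the note added here can be checked ahead of time on each task node; a minimal sketch (the script itself is illustrative — only the utility names come from the commit):

```shell
# Print found/MISSING status for each named utility by probing PATH,
# the same lookup the task process performs when it shells out
check_tools() {
  for tool in "$@"; do
    if command -v "$tool" >/dev/null 2>&1; then
      echo "$tool: found"
    else
      echo "$tool: MISSING"
    fi
  done
}

# For MySQL direct mode, both bulk utilities must resolve on every
# task node before the job is submitted:
check_tools mysqldump mysqlimport
```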


@@ -24,62 +24,7 @@ The +--connect+ and +--table+ options are required.
 include::common-args.txt[]
 Import control options
 ~~~~~~~~~~~~~~~~~~~~~~
---append::
-Append data to an existing HDFS dataset
---as-sequencefile::
-Imports data to SequenceFiles
---as-textfile::
-Imports data as plain text (default)
---columns (col,col,col...)::
-Columns to export from table
---direct::
-Use direct import fast path (mysql only)
---direct-split-size (n)::
-Split the input stream every 'n' bytes when importing in direct mode.
---inline-lob-limit (n)::
-Set the maximum size for an inline LOB
---num-mappers (n)::
--m::
-Use 'n' map tasks to import in parallel
---query (statement)::
-Imports the results of +statement+ instead of a table
---split-by (column-name)::
-Column of the table used to split the table for parallel import
---table (table-name)::
-The table to import
---target-dir (dir)::
-Explicit HDFS target directory for the import.
---warehouse-dir (dir)::
-Tables are uploaded to the HDFS path +/warehouse/dir/(tablename)/+
---where (clause)::
-Import only the rows for which _clause_ is true.
-e.g.: `--where "user_id > 400 AND hidden == 0"`
---compress::
--z::
-Uses gzip to compress data as it is written to HDFS
---null-string::
-The string to be written for a null value for string columns
---null-non-string::
-The string to be written for a null value for non-string columns
+include::import-args.txt[]
 include::output-args.txt[]
@@ -126,6 +71,14 @@ after a lone '--' on the command-line.
 In MySQL direct mode, additional arguments are passed directly to
 mysqldump.
+Note: When using MySQL direct mode, the MySQL bulk utilities
++mysqldump+ and +mysqlimport+ should be available on the task nodes and
+present in the shell path of the task process.
+Note: When using PostgreSQL direct mode, the PostgreSQL client utility
++psql+ should be available on the task nodes and present in the shell path
+of the task process.
 ENVIRONMENT
 -----------
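With +psql+ present on the task nodes, a PostgreSQL direct-mode import is an ordinary import plus the +--direct+ flag; a hypothetical invocation (connect string, username, and table are placeholders, not from the commit):

```shell
# Hypothetical direct-mode import from PostgreSQL; the job fails at
# task time if psql is not on the PATH of the task process
sqoop import \
  --connect jdbc:postgresql://db.example.com/warehouse \
  --username report \
  --table customers \
  --direct
```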


@@ -83,6 +83,9 @@ MySQL provides a direct mode for exports as well, using the
 to specify this codepath. This may be
 higher-performance than the standard JDBC codepath.
+NOTE: When using export in direct mode with MySQL, the MySQL bulk utility
++mysqlimport+ must be available in the shell path of the task process.
 The +\--input-null-string+ and +\--input-null-non-string+ arguments are
 optional. If +\--input-null-string+ is not specified, then the string
 "null" will be interpreted as null for string-type columns.
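A direct-mode export combining the flags discussed above might look as follows; the connect string, table, and export directory are placeholders for illustration:

```shell
# Hypothetical direct-mode export: mysqlimport performs the bulk load,
# so it must be on the PATH of each task process
sqoop export \
  --connect jdbc:mysql://db.example.com/sales \
  --table order_summary \
  --export-dir /user/hive/warehouse/order_summary \
  --direct \
  --input-null-string '\\N' \
  --input-null-non-string '\\N'
```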


@@ -33,7 +33,7 @@ $ sqoop import (generic-args) (import-args)
 $ sqoop-import (generic-args) (import-args)
 ----
-While the Hadoop generic arguments must preceed any import arguments,
+While the Hadoop generic arguments must precede any import arguments,
 you can type the import arguments in any order with respect to one
 another.
@@ -246,6 +246,11 @@ data to a temporary directory and then rename the files into the normal
 target directory in a manner that does not conflict with existing filenames
 in that directory.
+NOTE: When using the direct mode of import, certain database client utilities
+are expected to be present in the shell path of the task process. For MySQL
+the utilities +mysqldump+ and +mysqlimport+ are required, whereas for
+PostgreSQL the utility +psql+ is required.
 Incremental Imports
 ^^^^^^^^^^^^^^^^^^^