SQOOP-142. Document requirements for direct import

Updated the documentation with details on direct mode execution requirements.

From: Arvind Prabhakar <arvind@cloudera.com>

git-svn-id: https://svn.apache.org/repos/asf/incubator/sqoop/trunk@1150005 13f79535-47bb-0310-9956-ffa450edef68
parent e33bdbced1
commit 70caf779b0
@@ -33,7 +33,7 @@ Import control options
   Columns to export from table
 
 --direct::
-  Use direct import fast path (mysql only)
+  Use direct import fast path (MySQL and PostgreSQL)
 
 --direct-split-size (n)::
   Split the input stream every 'n' bytes when importing in direct mode.
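For context, a direct-mode import invocation might look like the sketch below. The connect string, database, and table names are illustrative placeholders, as is the split size value (64 MB expressed in bytes):

----
$ sqoop import \
    --connect jdbc:mysql://db.example.com/corp \
    --table EMPLOYEES \
    --direct \
    --direct-split-size 67108864
----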
@@ -41,8 +41,8 @@ Import control options
 --inline-lob-limit (n)::
   Set the maximum size for an inline LOB
 
--m::
 --num-mappers (n)::
+-m::
   Use 'n' map tasks to import in parallel
 
 --query (statement)::
@@ -68,3 +68,9 @@ Import control options
 -z::
   Uses gzip to compress data as it is written to HDFS
 
+--null-string::
+  The string to be written for a null value for string columns
+
+--null-non-string::
+  The string to be written for a null value for non-string columns
+
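As a sketch of the newly documented options, the hypothetical import below writes a substitute token for null values in both string and non-string columns. The connect string and table name are placeholders, and the quoted '\\N' encoding is shown only as one common convention:

----
$ sqoop import \
    --connect jdbc:mysql://db.example.com/corp \
    --table EMPLOYEES \
    --null-string '\\N' \
    --null-non-string '\\N'
----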
@@ -28,7 +28,7 @@ Export control options
 ~~~~~~~~~~~~~~~~~~~~~~
 
 --direct::
-  Use direct import fast path (mysql only)
+  Use direct import fast path (MySQL)
 
 --export-dir (dir)::
   HDFS source path for the export
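A direct-mode export might be invoked as in this sketch; the connect string, table name, and HDFS source path are placeholders:

----
$ sqoop export \
    --connect jdbc:mysql://db.example.com/corp \
    --table EMPLOYEES \
    --export-dir /results/employee_data \
    --direct
----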
@@ -70,6 +70,10 @@ after a lone '--' on the command-line.
 In MySQL direct mode, additional arguments are passed directly to
 mysqldump.
 
+Note: When using MySQL direct mode, the MySQL bulk utilities
++mysqldump+ and +mysqlimport+ should be available on the task nodes and
+present in the shell path of the task process.
+
 ENVIRONMENT
 -----------
 
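To illustrate the '--' pass-through described above, a hypothetical direct-mode import could forward a character-set option to +mysqldump+ like this. All names are placeholders; +--default-character-set+ is a standard mysqldump flag:

----
$ sqoop import \
    --connect jdbc:mysql://db.example.com/corp \
    --table EMPLOYEES \
    --direct \
    -- --default-character-set=latin1
----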
@@ -24,62 +24,7 @@ The +--connect+ and +--table+ options are required.
 
 include::common-args.txt[]
 
-Import control options
-~~~~~~~~~~~~~~~~~~~~~~
-
---append::
-  Append data to an existing HDFS dataset
-
---as-sequencefile::
-  Imports data to SequenceFiles
-
---as-textfile::
-  Imports data as plain text (default)
-
---columns (col,col,col...)::
-  Columns to export from table
-
---direct::
-  Use direct import fast path (mysql only)
-
---direct-split-size (n)::
-  Split the input stream every 'n' bytes when importing in direct mode.
-
---inline-lob-limit (n)::
-  Set the maximum size for an inline LOB
-
---num-mappers (n)::
--m::
-  Use 'n' map tasks to import in parallel
-
---query (statement)::
-  Imports the results of +statement+ instead of a table
-
---split-by (column-name)::
-  Column of the table used to split the table for parallel import
-
---table (table-name)::
-  The table to import
-
---target-dir (dir)::
-  Explicit HDFS target directory for the import.
-
---warehouse-dir (dir)::
-  Tables are uploaded to the HDFS path +/warehouse/dir/(tablename)/+
-
---where (clause)::
-  Import only the rows for which _clause_ is true.
-  e.g.: `--where "user_id > 400 AND hidden == 0"`
-
---compress::
--z::
-  Uses gzip to compress data as it is written to HDFS
-
---null-string::
-  The string to be written for a null value for string columns
-
---null-non-string::
-  The string to be written for a null value for non-string columns
+include::import-args.txt[]
 
 include::output-args.txt[]
 
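The options consolidated into +import-args.txt+ above include the parallelism and filtering controls; a sketch of their combined use follows, with the column, table, predicate, and path names all hypothetical:

----
$ sqoop import \
    --connect jdbc:mysql://db.example.com/corp \
    --table EMPLOYEES \
    --split-by employee_id \
    --num-mappers 8 \
    --where "start_date > '2010-01-01'" \
    --target-dir /shared/employee_import
----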
@@ -126,6 +71,14 @@ after a lone '--' on the command-line.
 In MySQL direct mode, additional arguments are passed directly to
 mysqldump.
 
+Note: When using MySQL direct mode, the MySQL bulk utilities
++mysqldump+ and +mysqlimport+ should be available on the task nodes and
+present in the shell path of the task process.
+
+Note: When using PostgreSQL direct mode, the PostgreSQL client utility
++psql+ should be available on the task nodes and present in the shell path
+of the task process.
+
 ENVIRONMENT
 -----------
 
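A PostgreSQL direct-mode import, which relies on the +psql+ utility noted above, might look like this sketch; the connect string and table name are placeholders:

----
$ sqoop import \
    --connect jdbc:postgresql://db.example.com/corp \
    --table EMPLOYEES \
    --direct
----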
@@ -83,6 +83,9 @@ MySQL provides a direct mode for exports as well, using the
 to specify this codepath. This may be
 higher-performance than the standard JDBC codepath.
 
+NOTE: When using export in direct mode with MySQL, the MySQL bulk utility
++mysqlimport+ must be available in the shell path of the task process.
+
 The +\--input-null-string+ and +\--input-null-non-string+ arguments are
 optional. If +\--input-null-string+ is not specified, then the string
 "null" will be interpreted as null for string-type columns.
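A hypothetical export that overrides both null encodings on input might read as follows; the paths and names are placeholders, and the '\\N' token is shown only as one common convention:

----
$ sqoop export \
    --connect jdbc:mysql://db.example.com/corp \
    --table EMPLOYEES \
    --export-dir /results/employee_data \
    --input-null-string '\\N' \
    --input-null-non-string '\\N'
----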
@@ -33,7 +33,7 @@ $ sqoop import (generic-args) (import-args)
 $ sqoop-import (generic-args) (import-args)
 ----
 
-While the Hadoop generic arguments must preceed any import arguments,
+While the Hadoop generic arguments must precede any import arguments,
 you can type the import arguments in any order with respect to one
 another.
 
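For instance, in the hypothetical invocation below the Hadoop generic option -D comes before any import arguments, while --connect and --table could appear in either order; the property value is illustrative:

----
$ sqoop import -D mapred.job.name=employee-import \
    --connect jdbc:mysql://db.example.com/corp \
    --table EMPLOYEES
----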
@@ -246,6 +246,11 @@ data to a temporary directory and then rename the files into the normal
 target directory in a manner that does not conflict with existing filenames
 in that directory.
 
+NOTE: When using the direct mode of import, certain database client utilities
+are expected to be present in the shell path of the task process. For MySQL
+the utilities +mysqldump+ and +mysqlimport+ are required, whereas for
+PostgreSQL the utility +psql+ is required.
+
 Incremental Imports
 ^^^^^^^^^^^^^^^^^^^
 
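A quick way to verify this requirement on a task node is a shell check such as the following sketch:

----
$ for util in mysqldump mysqlimport psql; do
    command -v "$util" >/dev/null || echo "$util: not found in PATH"
  done
----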