SQOOP-142. Document requirements for direct import
Updated the documentation with details on direct mode execution requirements.

From: Arvind Prabhakar <arvind@cloudera.com>

git-svn-id: https://svn.apache.org/repos/asf/incubator/sqoop/trunk@1150005 13f79535-47bb-0310-9956-ffa450edef68
parent e33bdbced1
commit 70caf779b0
@@ -33,7 +33,7 @@ Import control options
 Columns to export from table
 
 --direct::
-Use direct import fast path (mysql only)
+Use direct import fast path (MySQL and PostgreSQL)
 
 --direct-split-size (n)::
 Split the input stream every 'n' bytes when importing in direct mode.
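
For illustration, a direct-mode import over the fast path documented above might be invoked as in the following sketch; the JDBC URL, database, and table name are hypothetical:

----
# Hypothetical direct-mode import; uses the database's bulk tooling instead of plain JDBC
$ sqoop import --connect jdbc:mysql://db.example.com/corp \
    --table EMPLOYEES --direct
----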
@@ -41,8 +41,8 @@ Import control options
 --inline-lob-limit (n)::
 Set the maximum size for an inline LOB
 
--m::
 --num-mappers (n)::
+-m::
 Use 'n' map tasks to import in parallel
 
 --query (statement)::
@@ -63,8 +63,14 @@ Import control options
 --where (clause)::
 Import only the rows for which _clause_ is true.
 e.g.: `--where "user_id > 400 AND hidden == 0"`
 
 --compress::
 -z::
 Uses gzip to compress data as it is written to HDFS
 
+--null-string::
+The string to be written for a null value for string columns
+
+--null-non-string::
+The string to be written for a null value for non-string columns
+
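
As a sketch of the newly documented null-handling options, an import can override the string written for SQL NULL values; the connection string and table are hypothetical, and '\\N' is only an example substitution:

----
# Hypothetical import that writes \N for NULL in both string and non-string columns
$ sqoop import --connect jdbc:mysql://db.example.com/corp \
    --table EMPLOYEES \
    --null-string '\\N' --null-non-string '\\N'
----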
@@ -28,7 +28,7 @@ Export control options
 ~~~~~~~~~~~~~~~~~~~~~~
 
 --direct::
-Use direct import fast path (mysql only)
+Use direct import fast path (MySQL)
 
 --export-dir (dir)::
 HDFS source path for the export
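
A direct-mode export over this fast path might look like the sketch below; the JDBC URL, table name, and HDFS source path are hypothetical:

----
# Hypothetical direct-mode export from an HDFS directory into a MySQL table
$ sqoop export --connect jdbc:mysql://db.example.com/corp \
    --table BAR --export-dir /results/bar_data --direct
----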
@@ -70,6 +70,10 @@ after a lone '--' on the command-line.
 In MySQL direct mode, additional arguments are passed directly to
 mysqldump.
 
+Note: When using MySQL direct mode, the MySQL bulk utilities
++mysqldump+ and +mysqlimport+ should be available on the task nodes and
+present in the shell path of the task process.
+
 ENVIRONMENT
 -----------
 
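
To illustrate the pass-through described in this hunk, arguments after a lone '--' are handed to mysqldump unchanged; the character-set flag below is one such mysqldump option, and the connection details are hypothetical:

----
# Hypothetical direct import; everything after the lone -- goes to mysqldump
$ sqoop import --connect jdbc:mysql://db.example.com/corp \
    --table EMPLOYEES --direct -- --default-character-set=latin1
----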
@@ -24,62 +24,7 @@ The +--connect+ and +--table+ options are required.
 
 include::common-args.txt[]
 
-Import control options
-~~~~~~~~~~~~~~~~~~~~~~
-
---append::
-Append data to an existing HDFS dataset
-
---as-sequencefile::
-Imports data to SequenceFiles
-
---as-textfile::
-Imports data as plain text (default)
-
---columns (col,col,col...)::
-Columns to export from table
-
---direct::
-Use direct import fast path (mysql only)
-
---direct-split-size (n)::
-Split the input stream every 'n' bytes when importing in direct mode.
-
---inline-lob-limit (n)::
-Set the maximum size for an inline LOB
-
---num-mappers (n)::
--m::
-Use 'n' map tasks to import in parallel
-
---query (statement)::
-Imports the results of +statement+ instead of a table
-
---split-by (column-name)::
-Column of the table used to split the table for parallel import
-
---table (table-name)::
-The table to import
-
---target-dir (dir)::
-Explicit HDFS target directory for the import.
-
---warehouse-dir (dir)::
-Tables are uploaded to the HDFS path +/warehouse/dir/(tablename)/+
-
---where (clause)::
-Import only the rows for which _clause_ is true.
-e.g.: `--where "user_id > 400 AND hidden == 0"`
-
---compress::
--z::
-Uses gzip to compress data as it is written to HDFS
-
---null-string::
-The string to be written for a null value for string columns
-
---null-non-string::
-The string to be written for a null value for non-string columns
-
+include::import-args.txt[]
+
 include::output-args.txt[]
 
@@ -126,6 +71,14 @@ after a lone '--' on the command-line.
 In MySQL direct mode, additional arguments are passed directly to
 mysqldump.
 
+Note: When using MySQL direct mode, the MySQL bulk utilities
++mysqldump+ and +mysqlimport+ should be available on the task nodes and
+present in the shell path of the task process.
+
+Note: When using PostgreSQL direct mode, the PostgreSQL client utility
++psql+ should be available on the task nodes and present in the shell path
+of the task process.
+
 ENVIRONMENT
 -----------
 
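
One way to verify the requirement added above is a plain shell check on each task node, confirming that the client utilities resolve on the PATH of the account running the tasks:

----
# Generic shell check, not a Sqoop command
$ which mysqldump mysqlimport   # needed for MySQL direct mode
$ which psql                    # needed for PostgreSQL direct mode
----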
@@ -83,6 +83,9 @@ MySQL provides a direct mode for exports as well, using the
 to specify this codepath. This may be
 higher-performance than the standard JDBC codepath.
 
+NOTE: When using export in direct mode with MySQL, the MySQL bulk utility
++mysqlimport+ must be available in the shell path of the task process.
+
 The +\--input-null-string+ and +\--input-null-non-string+ arguments are
 optional. If +\--input-null-string+ is not specified, then the string
 "null" will be interpreted as null for string-type columns.
@@ -33,7 +33,7 @@ $ sqoop import (generic-args) (import-args)
 $ sqoop-import (generic-args) (import-args)
 ----
 
-While the Hadoop generic arguments must preceed any import arguments,
+While the Hadoop generic arguments must precede any import arguments,
 you can type the import arguments in any order with respect to one
 another.
 
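
As a sketch of the ordering rule whose wording is corrected here, Hadoop generic options such as -D must appear before the tool-specific import arguments; the property value and connection details are illustrative only:

----
# Generic -D option first, then the import arguments in any order
$ sqoop import -D mapred.job.name=employee-import \
    --connect jdbc:mysql://db.example.com/corp --table EMPLOYEES
----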
@@ -246,6 +246,11 @@ data to a temporary directory and then rename the files into the normal
 target directory in a manner that does not conflict with existing filenames
 in that directory.
 
+NOTE: When using the direct mode of import, certain database client utilities
+are expected to be present in the shell path of the task process. For MySQL
+the utilities +mysqldump+ and +mysqlimport+ are required, whereas for
+PostgreSQL the utility +psql+ is required.
+
 Incremental Imports
 ^^^^^^^^^^^^^^^^^^^
 
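
For example, a PostgreSQL direct-mode import like the hypothetical one below depends on +psql+ being present on the task process's PATH, per the note added above:

----
# Hypothetical PostgreSQL direct-mode import; requires psql on the task nodes
$ sqoop import --connect jdbc:postgresql://db.example.com/corp \
    --table EMPLOYEES --direct
----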