SQOOP-355. Improve Sqoop Documentation for Avro data file support.
(Doug Cutting via Arvind Prabhakar)

git-svn-id: https://svn.apache.org/repos/asf/incubator/sqoop/trunk@1178574 13f79535-47bb-0310-9956-ffa450edef68
commit 7c6b8b3e7d
parent f4a4fbb304
@@ -29,7 +29,7 @@ process is a set of files containing a copy of the imported table.
 The import process is performed in parallel. For this reason, the
 output will be in multiple files. These files may be delimited text
 files (for example, with commas or tabs separating each field), or
-binary SequenceFiles containing serialized record data.
+binary Avro or SequenceFiles containing serialized record data.
 
 A by-product of the import process is a generated Java class which
 can encapsulate one row of the imported table. This class is used
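For reference, the generated record class mentioned in this hunk can also be produced on its own with the +codegen+ tool. A minimal sketch; the connect string and table name are placeholders:

----
# Generate (but do not import) the Java class that encapsulates one row
$ sqoop codegen --connect jdbc:mysql://db.example.com/corp \
    --table EMPLOYEES
----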
@@ -22,5 +22,5 @@
 The +import+ tool imports an individual table from an RDBMS to HDFS.
 Each row from a table is represented as a separate record in HDFS.
 Records can be stored as text files (one record per line), or in
-binary representation in SequenceFiles.
+binary representation as Avro or SequenceFiles.
 
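To make the storage formats in this hunk concrete: the on-disk representation is selected with a format flag at import time (+\--as-textfile+ is the default; +\--as-sequencefile+ and +\--as-avrodatafile+ choose the binary formats). A sketch with placeholder connection details:

----
# Import a table as Avro data files rather than delimited text
$ sqoop import --connect jdbc:mysql://db.example.com/corp \
    --table EMPLOYEES --as-avrodatafile
----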
@@ -344,11 +344,17 @@ manipulated by custom MapReduce programs (reading from SequenceFiles
 is higher-performance than reading from text files, as records do not
 need to be parsed).
 
-By default, data is not compressed. You can compress
-your data by using the deflate (gzip) algorithm with the +-z+ or
-+\--compress+ argument, or specify any Hadoop compression codec using the
-+\--compression-codec+ argument. This applies to both SequenceFiles or text
-files.
+Avro data files are a compact, efficient binary format that provides
+interoperability with applications written in other programming
+languages. Avro also supports versioning, so that when, e.g., columns
+are added or removed from a table, previously imported data files can
+be processed along with new ones.
+
+By default, data is not compressed. You can compress your data by
+using the deflate (gzip) algorithm with the +-z+ or +\--compress+
+argument, or specify any Hadoop compression codec using the
++\--compression-codec+ argument. This applies to SequenceFile, text,
+and Avro files.
 
 Large Objects
 ^^^^^^^^^^^^^
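A hedged illustration of the compression arguments described in this hunk; the connect string and table are placeholders, and the codec class shown is just one common Hadoop choice:

----
# Compress with the default deflate (gzip) algorithm
$ sqoop import --connect jdbc:mysql://db.example.com/corp \
    --table EMPLOYEES --as-avrodatafile -z

# Or name a specific Hadoop compression codec
$ sqoop import --connect jdbc:mysql://db.example.com/corp \
    --table EMPLOYEES \
    --compression-codec org.apache.hadoop.io.compress.GzipCodec
----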
@@ -304,8 +304,8 @@ This would run a MapReduce job where the value in the +id+ column
 of each row is used to join rows; rows in the +newer+ dataset will
 be used in preference to rows in the +older+ dataset.
 
-This can be used with both SequenceFile- and text-based incremental
-imports. The file types of the newer and older datasets must be the
-same.
+This can be used with both SequenceFile-, Avro- and text-based
+incremental imports. The file types of the newer and older datasets
+must be the same.
 
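For context, the join described in this hunk is driven by the +merge+ tool. A sketch assuming hypothetical dataset directories (+newer+, +older+) and a record class and jar produced by an earlier import's code generation:

----
# Flatten two incremental-import datasets, keying on the id column
$ sqoop merge --new-data newer --onto older --target-dir merged \
    --jar-file datatypes.jar --class-name Foo --merge-key id
----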