
SQOOP-355. Improve Sqoop Documentation for Avro data file support.

(Doug Cutting via Arvind Prabhakar)


git-svn-id: https://svn.apache.org/repos/asf/incubator/sqoop/trunk@1178574 13f79535-47bb-0310-9956-ffa450edef68
Arvind Prabhakar 2011-10-03 20:55:13 +00:00
parent f4a4fbb304
commit 7c6b8b3e7d
4 changed files with 16 additions and 10 deletions


@@ -29,7 +29,7 @@ process is a set of files containing a copy of the imported table.
 The import process is performed in parallel. For this reason, the
 output will be in multiple files. These files may be delimited text
 files (for example, with commas or tabs separating each field), or
-binary SequenceFiles containing serialized record data.
+binary Avro or SequenceFiles containing serialized record data.
 A by-product of the import process is a generated Java class which
 can encapsulate one row of the imported table. This class is used
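To make the documented format choice concrete, here is a sketch of an import selecting Avro output (the JDBC URL and table name are placeholders; +\--as-avrodatafile+, +\--as-sequencefile+, and +\--as-textfile+ are the format arguments the import tool accepts):

  $ sqoop import --connect jdbc:mysql://db.example.com/corp \
      --table EMPLOYEES \
      --as-avrodatafile

Omitting the format argument leaves the default, delimited text; +\--as-sequencefile+ selects binary SequenceFiles instead.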


@@ -22,5 +22,5 @@
 The +import+ tool imports an individual table from an RDBMS to HDFS.
 Each row from a table is represented as a separate record in HDFS.
 Records can be stored as text files (one record per line), or in
-binary representation in SequenceFiles.
+binary representation as Avro or SequenceFiles.
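The resulting Avro files in HDFS can be inspected without the generated Java class. A minimal sketch, assuming an Avro tools jar is available locally and that the output file name matches a default single-table import (both are assumptions here, including the version number):

  $ hadoop fs -copyToLocal EMPLOYEES/part-m-00000.avro .
  $ java -jar avro-tools-1.5.4.jar tojson part-m-00000.avro

The +tojson+ command prints each record as a line of JSON, which is a quick way to confirm that the schema and data survived the import.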


@@ -344,11 +344,17 @@ manipulated by custom MapReduce programs (reading from SequenceFiles
 is higher-performance than reading from text files, as records do not
 need to be parsed).
 
-By default, data is not compressed. You can compress
-your data by using the deflate (gzip) algorithm with the +-z+ or
-+\--compress+ argument, or specify any Hadoop compression codec using the
-+\--compression-codec+ argument. This applies to both SequenceFiles or text
-files.
+Avro data files are a compact, efficient binary format that provides
+interoperability with applications written in other programming
+languages. Avro also supports versioning, so that when, e.g., columns
+are added or removed from a table, previously imported data files can
+be processed along with new ones.
+
+By default, data is not compressed. You can compress your data by
+using the deflate (gzip) algorithm with the +-z+ or +\--compress+
+argument, or specify any Hadoop compression codec using the
++\--compression-codec+ argument. This applies to SequenceFile, text,
+and Avro files.
 
 Large Objects
 ^^^^^^^^^^^^^
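For illustration, the compression arguments from the new paragraph combine with an Avro import like this (the connect string and table name are placeholders; +-z+ and +\--compression-codec+ are the documented flags):

  $ sqoop import --connect jdbc:mysql://db.example.com/corp \
      --table EMPLOYEES --as-avrodatafile -z

  $ sqoop import --connect jdbc:mysql://db.example.com/corp \
      --table EMPLOYEES --as-avrodatafile \
      --compression-codec org.apache.hadoop.io.compress.GzipCodec

The first form uses the default deflate (gzip) codec; the second names a Hadoop codec class explicitly, and either applies to SequenceFile, text, and Avro output alike.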


@@ -304,8 +304,8 @@ This would run a MapReduce job where the value in the +id+ column
 of each row is used to join rows; rows in the +newer+ dataset will
 be used in preference to rows in the +older+ dataset.
 
-This can be used with both SequenceFile- and text-based incremental
-imports. The file types of the newer and older datasets must be the
-same.
+This can be used with both SequenceFile-, Avro- and text-based
+incremental imports. The file types of the newer and older datasets
+must be the same.
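As a sketch of the merge behavior this paragraph documents (the directory names, jar, and class name below are hypothetical), the merge tool joins the two datasets on the key column:

  $ sqoop merge --new-data newer --onto older --target-dir merged \
      --jar-file datatypes.jar --class-name Foo --merge-key id

Rows from the +newer+ directory override rows with the same +id+ in +older+, regardless of whether the datasets were imported as text, SequenceFiles, or Avro, provided both use the same format.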