mirror of https://github.com/apache/sqoop.git synced 2025-05-02 08:11:21 +08:00

SQOOP-3390: Document S3Guard usage with Sqoop

(Boglarka Egyed via Szabolcs Vasas)
Szabolcs Vasas 2018-10-24 16:44:10 +02:00
parent 15097756c7
commit 9e328a53e1


@ -161,4 +161,34 @@ $ sqoop import \
--external-table-dir s3a://example-bucket/external-directory
----
Data from an RDBMS can also be imported in Parquet file format into an external Hive table backed by S3.
Hadoop S3Guard usage with Sqoop
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Amazon S3 offers eventual consistency for PUTS and DELETES in all regions, which means that the visibility of files
is not guaranteed within a specific time after their creation. Because of this behavior, the imported data may not be
visible immediately after a sqoop import finishes. To learn more about the core concepts of Amazon S3,
please see the official documentation at https://docs.aws.amazon.com/AmazonS3/latest/dev/Introduction.html#CoreConcepts.
S3Guard is an experimental feature of the S3A client in Hadoop which can use a database as a store of metadata about
the objects in an S3 bucket. To learn more about S3Guard, please see the Hadoop documentation at
https://hadoop.apache.org/docs/r3.0.3/hadoop-aws/tools/hadoop-aws/s3guard.html.
S3Guard can be enabled during sqoop imports by setting the properties described in the linked documentation.
Example usage with S3Guard enabled:
----
$ sqoop import \
-Dfs.s3a.access.key=$AWS_ACCESS_KEY \
-Dfs.s3a.secret.key=$AWS_SECRET_KEY \
-Dfs.s3a.metadatastore.impl=org.apache.hadoop.fs.s3a.s3guard.DynamoDBMetadataStore \
-Dfs.s3a.s3guard.ddb.region=$BUCKET_REGION \
-Dfs.s3a.s3guard.ddb.table.create=true \
--connect $CONN \
--username $USER \
--password $PASSWORD \
--table $TABLENAME \
--target-dir s3a://example-bucket/target-directory
----
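The S3Guard properties passed with -D above are ordinary Hadoop configuration properties, so they can presumably be
set once in the cluster configuration instead of being repeated on every command line. The following is a minimal
sketch only, assuming that the Hadoop configuration read by Sqoop includes a core-site.xml you are able to edit;
the region value us-west-2 is a placeholder and not taken from this document:
----
<!-- Sketch: S3Guard settings equivalent to the -D options in the example above. -->
<property>
  <name>fs.s3a.metadatastore.impl</name>
  <value>org.apache.hadoop.fs.s3a.s3guard.DynamoDBMetadataStore</value>
</property>
<property>
  <!-- Placeholder region; use the region of your own bucket. -->
  <name>fs.s3a.s3guard.ddb.region</name>
  <value>us-west-2</value>
</property>
<property>
  <!-- Lets the S3A client create the DynamoDB table if it does not exist. -->
  <name>fs.s3a.s3guard.ddb.table.create</name>
  <value>true</value>
</property>
----
With these properties in place, the sqoop import command would only need the credential, connection and target
directory options. Values passed with -D on the command line should still take precedence for a single job.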