Mirror of https://github.com/apache/sqoop.git

commit 936116ff07 (parent 821cb6bfd3): rephrased new lines a bit
@@ -131,34 +131,32 @@ Create an external Hive table:
 $ sqoop import --hive-import --create-hive-table --connect $CONN --table $TABLENAME --username $USER --password $PASS --external-table-dir /tmp/foobar_example --hive-table foobar
 ----
 
-Type Mapping in a Hive import using parquet files
-+++++++++++++++++++++++++++++++++++++++++++++++++
+Decimals in Hive imports using parquet files
+++++++++++++++++++++++++++++++++++++++++++++
 
-As mentioned above, a hive import is a two-step process in Sqoop:
-- Sqoop imports the data with the import tool onto HDFS first.
-- Then, Sqoop generates a Hive statement and executes it, effectively creating a table in Hive.
+As mentioned above, a Hive import is a two-step process in Sqoop:
+first, the data is imported onto HDFS, then a statement is generated and executed to create a Hive table.
 
-Since Sqoop is using an avro schema to write parquet files, the SQL types of the source table's column are first
-converted into avro types and an avro schema is created. This schema is then used in a regular Parquet import.
-After the data was imported onto HDFS successfully, in the second step, Sqoop uses the Avro
-schema generated for the parquet import to create the Hive query and maps the Avro types to Hive
-types.
+Since Sqoop is using an avro schema to write parquet files, first an Avro schema is generated from the SQL types.
+This schema is then used in a regular Parquet import. After the data was imported onto HDFS successfully,
+Sqoop uses the Avro schema to create a Hive command to create a table in Hive and maps the Avro types to Hive
+types in this process.
 
-Decimals are converted to String in a parquet import per default, so Decimal columns appear as String
+Decimal SQL types are converted to Strings in a parquet import per default, so Decimal columns appear as String
 columns in Hive per default. You can change this behavior and use logical types instead, so that Decimals
-will be mapped to the Hive type Decimal as well. This has to be enabled with the
-+sqoop.parquet.decimal_padding.enable+ property. As noted in the section discussing
+will be properly mapped to the Hive type Decimal as well. This has to be enabled with the
++sqoop.parquet.logical_types.decimal.enable+ property. As noted in the section discussing
 'Padding number types in avro and parquet import', you should also specify the default precision and scale and
-enable decimal padding.
+enable padding.
 
 A limitation of Hive is that the maximum precision and scale is 38. When converting SQL types to the Hive Decimal
 type, precision and scale will be modified to meet this limitation, automatically. The data itself however, will
-only have to adhere to the limitations of the Parquet import, thus values with a precision and scale bigger than
-38 will be present on storage on HDFS, but they won't be visible in Hive, (since Hive is a schema-on-read tool).
+only have to adhere to the limitations of the Parquet file format, thus values with a precision and scale bigger than
+38 will be present on storage, but they won't be readable by Hive, (since Hive is a schema-on-read tool).
 
-Enable padding and specifying a default precision and scale in a Hive Import:
+Enabling padding and specifying a default precision and scale in a Hive Import:
 ----
-$ sqoop import -Dsqoop.parquet.decimal_padding.enable=true -Dsqoop.parquet.logical_types.decimal.enable=true
+$ sqoop import -Dsqoop.avro.decimal_padding.enable=true -Dsqoop.parquet.logical_types.decimal.enable=true
     -Dsqoop.avro.logical_types.decimal.default.precision=38 -Dsqoop.avro.logical_types.decimal.default.scale=10
     --hive-import --connect $CONN --table $TABLENAME --username $USER --password $PASS --as-parquetfile
 ----
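
The two-step process described above can be illustrated with a manual equivalent. The sketch below is not the exact command or statement Sqoop generates; the table name +foobar+, the column list and the target directory +/tmp/foobar_parquet+ are assumptions made for the example.

----
# Step 1 (illustrative): import the source table as parquet files onto HDFS.
$ sqoop import --connect $CONN --table foobar --username $USER --password $PASS \
    --as-parquetfile --target-dir /tmp/foobar_parquet

# Step 2 (illustrative): a Hive statement comparable to the one Sqoop generates and
# executes, creating a table over the imported files. Note the DECIMAL source column
# showing up as STRING, which is the default mapping described above.
$ hive -e "CREATE EXTERNAL TABLE foobar (id INT, name STRING, price STRING)
           STORED AS PARQUET LOCATION '/tmp/foobar_parquet'"
----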
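
The intermediate Avro schema mentioned above is what carries the type information into the parquet files. A rough way to see it, assuming +parquet-tools+ is available (the file name below is a placeholder, not Sqoop's actual output naming):

----
# Illustrative: print the schema embedded in one of the imported parquet files.
$ parquet-tools schema /tmp/foobar_parquet/part-m-00000.parquet

# With the defaults, a DECIMAL(20,5) source column shows up as a plain string/binary
# field; with logical types enabled it carries a decimal annotation along the lines of
#   {"type": "bytes", "logicalType": "decimal", "precision": 20, "scale": 5}
----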
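
The effect of the logical-types property can be checked on the Hive side by comparing the reported table definition. This is only a sketch; the table name, the column names and the DECIMAL(20,5) source type are assumptions.

----
# Imported with the defaults: the Decimal column is reported as a string.
$ hive -e "DESCRIBE foobar"
#   id      int
#   price   string

# Imported with -Dsqoop.parquet.logical_types.decimal.enable=true (plus padding and the
# default precision and scale, as in the example above): the Decimal type is preserved.
$ hive -e "DESCRIBE foobar"
#   id      int
#   price   decimal(20,5)
----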
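
The precision and scale cap of 38 only constrains the Hive schema that Sqoop generates, not the data written during the parquet import. As an illustration (the column name and the numbers are hypothetical), a source column declared as DECIMAL(50,10) could end up clamped like this:

----
# The generated Hive table cannot declare more than 38 digits of precision,
# so an oversized source column is reduced to fit:
$ hive -e "CREATE EXTERNAL TABLE foobar_wide (amount DECIMAL(38,10))
           STORED AS PARQUET LOCATION '/tmp/foobar_parquet'"

# Rows whose values need more than 38 digits are still present in the parquet files on
# HDFS, but Hive, being schema-on-read, will not be able to return them correctly.
----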