From 936116ff07d9ba8f19f16f07c3ae4f9c2dabaf01 Mon Sep 17 00:00:00 2001
From: Fero Szabo
Date: Thu, 6 Dec 2018 16:13:33 +0100
Subject: [PATCH] rephrased new lines a bit

---
 src/docs/user/hive.txt | 34 ++++++++++++++++------------------
 1 file changed, 16 insertions(+), 18 deletions(-)

diff --git a/src/docs/user/hive.txt b/src/docs/user/hive.txt
index 03c2bff0..498602c2 100644
--- a/src/docs/user/hive.txt
+++ b/src/docs/user/hive.txt
@@ -131,34 +131,32 @@ Create an external Hive table:
 $ sqoop import --hive-import --create-hive-table --connect $CONN --table $TABLENAME --username $USER --password $PASS --external-table-dir /tmp/foobar_example --hive-table foobar
 ----
 
-Type Mapping in a Hive import using parquet files
-+++++++++++++++++++++++++++++++++++++++++++++++++
+Decimals in Hive imports using parquet files
+++++++++++++++++++++++++++++++++++++++++++++
 
-As mentioned above, a hive import is a two-step process in Sqoop:
-- Sqoop imports the data with the import tool onto HDFS first.
-- Then, Sqoop generates a Hive statement and executes it, effectively creating a table in Hive.
+As mentioned above, a Hive import is a two-step process in Sqoop:
+first, the data is imported onto HDFS, then a statement is generated and executed to create a Hive table.
 
-Since Sqoop is using an avro schema to write parquet files, the SQL types of the source table's column are first
-converted into avro types and an avro schema is created. This schema is then used in a regular Parquet import.
-After the data was imported onto HDFS successfully, in the second step, Sqoop uses the Avro
-schema generated for the parquet import to create the Hive query and maps the Avro types to Hive
-types.
+Since Sqoop uses an Avro schema to write Parquet files, an Avro schema is first generated from the SQL types.
+This schema is then used in a regular Parquet import. After the data has been imported onto HDFS successfully,
+Sqoop uses the Avro schema to generate the statement that creates the Hive table and maps the Avro types to Hive
+types in the process.
 
-Decimals are converted to String in a parquet import per default, so Decimal columns appear as String
+Decimal SQL types are converted to Strings in a parquet import per default, so Decimal columns appear as String
 columns in Hive per default. You can change this behavior and use logical types instead, so that Decimals
-will be mapped to the Hive type Decimal as well. This has to be enabled with the
-+sqoop.parquet.decimal_padding.enable+ property. As noted in the section discussing
+will be properly mapped to the Hive type Decimal as well. This has to be enabled with the
++sqoop.parquet.logical_types.decimal.enable+ property. As noted in the section discussing
 'Padding number types in avro and parquet import', you should also specify the default precision and scale and
-enable decimal padding.
+enable padding.
 
 A limitation of Hive is that the maximum precision and scale is 38. When converting SQL types to the Hive
 Decimal type, precision and scale will be modified to meet this limitation, automatically. The data itself however, will
-only have to adhere to the limitations of the Parquet import, thus values with a precision and scale bigger than
-38 will be present on storage on HDFS, but they won't be visible in Hive, (since Hive is a schema-on-read tool).
+only have to adhere to the limitations of the Parquet file format, so values with a precision or scale greater than
+38 will be present in storage, but they won't be readable by Hive (since Hive is a schema-on-read tool).
 
-Enable padding and specifying a default precision and scale in a Hive Import:
+Enabling padding and specifying a default precision and scale in a Hive Import:
 ----
-$ sqoop import -Dsqoop.parquet.decimal_padding.enable=true -Dsqoop.parquet.logical_types.decimal.enable=true
+$ sqoop import -Dsqoop.avro.decimal_padding.enable=true -Dsqoop.parquet.logical_types.decimal.enable=true
 -Dsqoop.avro.logical_types.decimal.default.precision=38 -Dsqoop.avro.logical_types.decimal.default.scale=10
 --hive-import --connect $CONN --table $TABLENAME --username $USER --password $PASS --as-parquetfile
 ----
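
Illustration (not part of the patch above): a rough sketch of the mapping the new text describes, using a hypothetical source column PRICE DECIMAL(10,2) and the properties from the example command. With the logical-type properties enabled, the generated Avro schema annotates the column with the decimal logical type roughly as shown below (the exact underlying Avro type may differ), and Hive then exposes the column as DECIMAL instead of STRING:

----
Avro field in the generated schema (sketch, hypothetical column):
{"name" : "PRICE", "type" : ["null", {"type" : "bytes", "logicalType" : "decimal", "precision" : 10, "scale" : 2}], "default" : null}

Resulting Hive column:
PRICE DECIMAL(10,2)
----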