mirror of https://github.com/apache/sqoop.git
synced 2025-05-21 19:31:13 +08:00

SQOOP-1481: SQOOP2: Document the public apis and end-end design for the SQ2 Connector
(Abraham Elmahrek via Jarek Jarcec Cecho)

parent 4c964a97be
commit b600036573
@@ -329,14 +329,14 @@ Create new job object.
 +------------------------+------------------------------------------------------------------+
 | Argument               | Description                                                      |
 +========================+==================================================================+
-| ``-x``, ``--xid <x>``  | Create new job object for connection with id ``<x>``             |
+| ``-f``, ``--from <x>`` | Create new job object with a FROM connection with id ``<x>``     |
 +------------------------+------------------------------------------------------------------+
-| ``-t``, ``--type <t>`` | Create new job object with type ``<t>`` (``import``, ``export``) |
+| ``-t``, ``--to <t>``   | Create new job object with a TO connection with id ``<t>``       |
 +------------------------+------------------------------------------------------------------+

 Example: ::

-  create job --xid 1
+  create job --from 1 --to 2

 Update Command
 --------------
@@ -26,17 +26,15 @@ using the code of built-in connector ( ``GenericJdbcConnector`` ) as example.
 What is Connector?
 ++++++++++++++++++

-Connector provides interaction with external databases.
-Connector reads data from databases for import,
-and write data to databases for export.
-Interaction with Hadoop is taken cared by common modules of Sqoop 2 framework.
+The connector provides the facilities to interact with external data sources.
+The connector can read from, or write to, a data source.

 When do we add a new connector?
 ===============================
 You add a new connector when you need to extract data from a new data source, or load
 data to a new target.
 In addition to the connector API, Sqoop 2 also has an engine interface.
 At the moment the only engine is MapReduce, but we may support additional engines in the future.
 Since many parallel execution engines are capable of reading/writing data
 there may be a question of whether support for specific data stores should be done
 through a new connector or new engine.
@@ -51,57 +49,73 @@ Connector Implementation

 The ``SqoopConnector`` class defines functionality
 which must be provided by Connectors.
-Each Connector must extends ``SqoopConnector`` and overrides methods shown below.
+Each Connector must extend ``SqoopConnector`` and override the methods shown below.
 ::

   public abstract String getVersion();
   public abstract ResourceBundle getBundle(Locale locale);
-  public abstract Class getConnectionConfigurationClass();
-  public abstract Class getJobConfigurationClass(MJob.Type jobType);
-  public abstract Importer getImporter();
-  public abstract Exporter getExporter();
+  public abstract Class getLinkConfigurationClass();
+  public abstract Class getJobConfigurationClass(Direction direction);
+  public abstract From getFrom();
+  public abstract To getTo();
   public abstract Validator getValidator();
   public abstract MetadataUpgrader getMetadataUpgrader();

-The ``getImporter`` method returns Importer_ instance
-which is a placeholder for the modules needed for import.
-
-The ``getExporter`` method returns Exporter_ instance
-which is a placeholder for the modules needed for export.
+Connectors can optionally override the following methods:
+::
+
+  public List<Direction> getSupportedDirections();
+
+The ``getFrom`` method returns a From_ instance
+which is a placeholder for the modules needed to read from a data source.
+
+The ``getTo`` method returns a To_ instance
+which is a placeholder for the modules needed to write to a data source.

 Methods such as ``getBundle`` , ``getLinkConfigurationClass`` ,
 ``getJobConfigurationClass`` and ``getValidator``
 are concerned with `Connector configurations`_ .

-Importer
-========
-
-Connector's ``getImporter`` method returns ``Importer`` instance
-which is a placeholder for the modules needed for import
-such as Partitioner_ and Extractor_ .
-Built-in ``GenericJdbcConnector`` defines ``Importer`` like this.
-::
-
-  private static final Importer IMPORTER = new Importer(
-      GenericJdbcImportInitializer.class,
-      GenericJdbcImportPartitioner.class,
-      GenericJdbcImportExtractor.class,
-      GenericJdbcImportDestroyer.class);
+The ``getSupportedDirections`` method returns a list of directions
+that a connector supports. This should be some subset of:
+::
+
+  public List<Direction> getSupportedDirections() {
+    return Arrays.asList(new Direction[]{
+        Direction.FROM,
+        Direction.TO
+    });
+  }
+
+From
+====
+
+The connector's ``getFrom`` method returns a ``From`` instance
+which is a placeholder for the modules needed for reading
+from a data source, such as Partitioner_ and Extractor_ .
+The built-in ``GenericJdbcConnector`` defines ``From`` like this.
+::
+
+  private static final From FROM = new From(
+      GenericJdbcFromInitializer.class,
+      GenericJdbcPartitioner.class,
+      GenericJdbcExtractor.class,
+      GenericJdbcFromDestroyer.class);

   ...

   @Override
-  public Importer getImporter() {
-    return IMPORTER;
+  public From getFrom() {
+    return FROM;
   }


 Extractor
 ---------

-Extractor (E for ETL) extracts data from external database and
-writes it to Sqoop framework for import.
+The Extractor (E for ETL) extracts data from an external database.

 The Extractor must override the ``extract`` method.
 ::
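The required and optional methods above can be seen together in a minimal connector skeleton. This is a hedged sketch, not the real Sqoop 2 API: ``Direction``, ``From`` and ``To`` below are simplified local stand-ins (holding only the class references the text describes), and ``ExampleConnector`` is a hypothetical connector mirroring the shape of ``GenericJdbcConnector``.

```java
import java.util.Arrays;
import java.util.List;

// Simplified stand-ins for the Sqoop 2 types referenced in the text.
enum Direction { FROM, TO }

class From {
    final Class<?> initializer, partitioner, extractor, destroyer;
    From(Class<?> init, Class<?> part, Class<?> ext, Class<?> dest) {
        initializer = init; partitioner = part; extractor = ext; destroyer = dest;
    }
}

class To {
    final Class<?> initializer, loader, destroyer;
    To(Class<?> init, Class<?> load, Class<?> dest) {
        initializer = init; loader = load; destroyer = dest;
    }
}

// Hypothetical connector wiring its FROM and TO module holders, mirroring
// the GenericJdbcConnector structure quoted in the text. Object.class is a
// placeholder where a real connector names its Initializer/Partitioner/etc.
class ExampleConnector {
    private static final From FROM =
        new From(Object.class, Object.class, Object.class, Object.class);
    private static final To TO =
        new To(Object.class, Object.class, Object.class);

    public String getVersion() { return "2.0"; }

    public From getFrom() { return FROM; }

    public To getTo() { return TO; }

    // Optional override: this connector supports both directions.
    public List<Direction> getSupportedDirections() {
        return Arrays.asList(Direction.FROM, Direction.TO);
    }
}
```

A direction-only connector (for example, one that can only be read from) would return a single-element list here instead.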
@@ -114,7 +128,13 @@ Extractor must overrides ``extract`` method.

 The ``extract`` method extracts data from the database in some way and
 writes it to ``DataWriter`` (provided by context) as `Intermediate representation`_ .

-Extractor must iterates in the ``extract`` method until the data from database exhausts.
+Extractors use the writers provided by the ``ExtractorContext`` to send a record through the
+framework.
+::
+
+  context.getDataWriter().writeArrayRecord(array);
+
+The extractor must iterate through the entire dataset in the ``extract`` method.
 ::

   while (resultSet.next()) {
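The iterate-until-exhausted pattern can be sketched end to end. ``DataWriter`` and ``ExtractorContext`` below are simplified local stand-ins for the framework classes the text mentions, and an in-memory iterator stands in for a JDBC ``ResultSet``; this is an illustration of the loop shape, not the real signatures.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;

// Stand-in for the framework's DataWriter: collects the records it is handed.
class DataWriter {
    final List<Object[]> written = new ArrayList<>();
    void writeArrayRecord(Object[] array) { written.add(array); }
}

// Stand-in for ExtractorContext, which exposes the writer to the extractor.
class ExtractorContext {
    private final DataWriter writer = new DataWriter();
    DataWriter getDataWriter() { return writer; }
}

class SketchExtractor {
    // Iterate until the source is exhausted, writing each record to the
    // framework, as the text describes for the real extract(...) method.
    void extract(ExtractorContext context, Iterator<Object[]> rows) {
        while (rows.hasNext()) {  // analogous to resultSet.next()
            context.getDataWriter().writeArrayRecord(rows.next());
        }
    }
}
```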
@@ -127,9 +147,9 @@ Extractor must iterates in the ``extract`` method until the data from database e
 Partitioner
 -----------

-Partitioner creates ``Partition`` instances based on configurations.
+The Partitioner creates ``Partition`` instances based on configurations.
 The number of ``Partition`` instances is decided
-based on the value users specified as the numbers of ectractors
+based on the value users specified as the number of extractors
 in the job configuration.

 ``Partition`` instances are passed to the Extractor_ as the argument of the ``extract`` method.
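How a partitioner might carve the source into the requested number of partitions can be illustrated with a simple numeric range split. The classes here are local stand-ins, not the Sqoop 2 API: a real partitioner receives configuration objects and returns subclasses of ``Partition``, while this hypothetical ``SketchPartitioner`` just splits an integer key range.

```java
import java.util.ArrayList;
import java.util.List;

// Stand-in for a Partition: each one describes one chunk of the source data,
// to be handed to one extractor.
class RangePartition {
    final long low, high;  // half-open range [low, high)
    RangePartition(long low, long high) { this.low = low; this.high = high; }
}

class SketchPartitioner {
    // Split [min, max) into at most numExtractors near-equal partitions,
    // one per extractor, as the job configuration requests.
    List<RangePartition> getPartitions(long min, long max, int numExtractors) {
        List<RangePartition> partitions = new ArrayList<>();
        long span = max - min;
        long step = Math.max(1, (span + numExtractors - 1) / numExtractors); // ceiling division
        for (long low = min; low < max; low += step) {
            partitions.add(new RangePartition(low, Math.min(low + step, max)));
        }
        return partitions;
    }
}
```

Splitting the range [0, 10) for 3 extractors this way yields three partitions covering [0, 4), [4, 8) and [8, 10).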
@@ -157,35 +177,35 @@ for doing preparation such as adding dependent jar files.
 The Destroyer is instantiated after the MapReduce job is finished, for clean up.


-Exporter
-========
+To
+==

-Connector's ``getExporter`` method returns ``Exporter`` instance
-which is a placeholder for the modules needed for export
-such as Loader_ .
-Built-in ``GenericJdbcConnector`` defines ``Exporter`` like this.
+The Connector's ``getTo`` method returns a ``To`` instance
+which is a placeholder for the modules needed for writing
+to a data source, such as Loader_ .
+The built-in ``GenericJdbcConnector`` defines ``To`` like this.
 ::

-  private static final Exporter EXPORTER = new Exporter(
-      GenericJdbcExportInitializer.class,
-      GenericJdbcExportLoader.class,
-      GenericJdbcExportDestroyer.class);
+  private static final To TO = new To(
+      GenericJdbcToInitializer.class,
+      GenericJdbcLoader.class,
+      GenericJdbcToDestroyer.class);

   ...

   @Override
-  public Exporter getExporter() {
-    return EXPORTER;
+  public To getTo() {
+    return TO;
   }


 Loader
 ------

-Loader (L for ETL) receives data from Sqoop framework and
-loads it to external database.
+A loader (L for ETL) receives data from the Sqoop framework and
+loads it to an external database.

-Loader must overrides ``load`` method.
+Loaders must override the ``load`` method.
 ::

   public abstract void load(LoaderContext context,
@@ -195,7 +215,7 @@ Loader must overrides ``load`` method.

 The ``load`` method reads data from ``DataReader`` (provided by context)
 in `Intermediate representation`_ and loads it to the database in some way.

-Loader must iterates in the ``load`` method until the data from ``DataReader`` exhausts.
+The Loader must iterate in the ``load`` method until the data from ``DataReader`` is exhausted.
 ::

   while ((array = context.getDataReader().readArrayRecord()) != null) {
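The mirror-image read loop on the TO side can be sketched the same way. ``DataReader`` and ``LoaderContext`` here are simplified local stand-ins that drain an in-memory queue; in Sqoop 2 the framework supplies them, and a real loader would insert each record into the target database instead of a list.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;

// Stand-in for the framework's DataReader: returns null when exhausted.
class DataReader {
    private final Deque<Object[]> records;
    DataReader(Deque<Object[]> records) { this.records = records; }
    Object[] readArrayRecord() { return records.pollFirst(); }
}

// Stand-in for LoaderContext, which exposes the reader to the loader.
class LoaderContext {
    private final DataReader reader;
    LoaderContext(DataReader reader) { this.reader = reader; }
    DataReader getDataReader() { return reader; }
}

class SketchLoader {
    final List<Object[]> loaded = new ArrayList<>(); // stands in for the target table

    // Read until readArrayRecord() returns null, i.e. the data is exhausted,
    // exactly the loop shape shown in the text.
    void load(LoaderContext context) {
        Object[] array;
        while ((array = context.getDataReader().readArrayRecord()) != null) {
            loaded.add(array); // a real loader would INSERT into the database here
        }
    }
}
```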
@@ -218,7 +238,7 @@ Connector Configurations
 Connector specifications
 ========================

-Framework of the Sqoop loads definitions of connectors
+Sqoop loads definitions of connectors
 from the file named ``sqoopconnector.properties``
 which each connector implementation provides.
 ::
@@ -231,7 +251,7 @@ which each connector implementation provides.
 Configurations
 ==============

-Implementation of ``SqoopConnector`` overrides methods such as
+Implementations of ``SqoopConnector`` override methods such as
 ``getLinkConfigurationClass`` and ``getJobConfigurationClass``
 returning configuration classes.
 ::
@@ -242,12 +262,12 @@ returning configuration class.
   }

   @Override
-  public Class getJobConfigurationClass(MJob.Type jobType) {
-    switch (jobType) {
-      case IMPORT:
-        return ImportJobConfiguration.class;
-      case EXPORT:
-        return ExportJobConfiguration.class;
+  public Class getJobConfigurationClass(Direction direction) {
+    switch (direction) {
+      case FROM:
+        return FromJobConfiguration.class;
+      case TO:
+        return ToJobConfiguration.class;
       default:
         return null;
     }
@@ -260,7 +280,7 @@ Annotations such as
 are provided for defining configurations of each connector
 using these models.

-``ConfigurationClass`` is place holder for ``FormClasses`` .
+``ConfigurationClass`` is a placeholder for ``FormClasses`` .
 ::

   @ConfigurationClass
@@ -323,16 +343,12 @@ Validator validates configurations set by users.
 Internal of Sqoop2 MapReduce Job
 ++++++++++++++++++++++++++++++++

-Sqoop 2 provides common MapReduce modules such as ``SqoopMapper`` and ``SqoopReducer``
-for the both of import and export.
-
-- For import, ``Extractor`` provided by connector extracts data from databases,
-  and ``Loader`` provided by Sqoop2 loads data into Hadoop.
-
-- For export, ``Extractor`` provided by Sqoop2 extracts data from Hadoop,
-  and ``Loader`` provided by connector loads data into databases.
-
-The diagram below describes the initialization phase of IMPORT job.
+Sqoop 2 provides common MapReduce modules such as ``SqoopMapper`` and ``SqoopReducer``.
+
+When reading from a data source, the ``Extractor`` provided by the FROM connector extracts data from a database,
+and the ``Loader``, provided by the TO connector, loads data into another database.
+
+The diagram below describes the initialization phase of a job.
 ``SqoopInputFormat`` creates splits using ``Partitioner`` .
 ::
@@ -351,8 +367,8 @@ The diagram below describes the initialization phase of IMPORT job.
       |-------------------------------------------------->|SqoopSplit|
       |                 |           |                     `----+-----'

-The diagram below describes the map phase of IMPORT job.
-``SqoopMapper`` invokes extractor's ``extract`` method.
+The diagram below describes the map phase of a job.
+``SqoopMapper`` invokes the FROM connector's extractor's ``extract`` method.
 ::

       ,-----------.
@@ -378,8 +394,8 @@ The diagram below describes the map phase of IMPORT job.
       |        |           | context.write
       |        |           |-------------------------->

-The diagram below describes the reduce phase of EXPORT job.
-``OutputFormat`` invokes loader's ``load`` method (via ``SqoopOutputFormatLoadExecutor`` ).
+The diagram below describes the reduce phase of a job.
+``OutputFormat`` invokes the TO connector's loader's ``load`` method (via ``SqoopOutputFormatLoadExecutor`` ).
 ::

       ,-------.                 ,---------------------.
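The map and reduce phases above amount to a hand-off: the FROM side's extractor pushes records into the framework, which feeds them to the TO side's loader. A toy end-to-end version of that hand-off follows, with a plain in-memory queue standing in for all the MapReduce machinery (``SqoopMapper``, ``SqoopOutputFormatLoadExecutor``); this is an illustration of the data flow only, not the real execution model.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.List;
import java.util.Queue;

// Toy pipeline: a queue stands in for the intermediate representation that
// the MapReduce machinery would normally shuttle between the two connectors.
class ToyPipeline {
    private final Queue<Object[]> intermediate = new ArrayDeque<>();
    final List<Object[]> target = new ArrayList<>();

    // Map-phase stand-in: the FROM connector's extractor writes every
    // source record into the framework.
    void runExtract(List<Object[]> source) {
        for (Object[] record : source) {
            intermediate.add(record); // like context.getDataWriter().writeArrayRecord(record)
        }
    }

    // Reduce/output-phase stand-in: the TO connector's loader reads until
    // the intermediate data is exhausted.
    void runLoad() {
        Object[] record;
        while ((record = intermediate.poll()) != null) { // like readArrayRecord() returning null
            target.add(record);
        }
    }
}
```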
@@ -418,7 +418,7 @@ So far, the resource contains only explanations for fields of forms. For example
     "name":"generic-jdbc-connector",
     "class":"org.apache.sqoop.connector.jdbc.GenericJdbcConnector",
     "job-forms":{
-      "IMPORT":[
+      "FROM":[
         {
           "id":2,
           "inputs":[
@@ -475,7 +475,7 @@ So far, the resource contains only explanations for fields of forms. For example
           "type":"CONNECTION"
         }
       ],
-      "EXPORT":[
+      "TO":[
         {
           "id":3,
           "inputs":[
@@ -624,35 +624,7 @@ the id of the form to track these parameter inputs.
     },
     "framework-version":"1",
     "job-forms":{
-      "IMPORT":[
-        {
-          "id":5,
-          "inputs":[
-            {
-              "id":18,
-              "values":"HDFS",
-              "name":"output.storageType",
-              "type":"ENUM",
-              "sensitive":false
-            },
-            {
-              "id":19,
-              "values":"TEXT_FILE,SEQUENCE_FILE",
-              "name":"output.outputFormat",
-              "type":"ENUM",
-              "sensitive":false
-            },
-            {
-              "id":20,
-              "name":"output.outputDirectory",
-              "type":"STRING",
-              "size":255,
-              "sensitive":false
-            }
-          ],
-          "name":"output",
-          "type":"CONNECTION"
-        },
+      "FROM":[
         {
           "id":6,
           "inputs":[
@@ -673,23 +645,9 @@ the id of the form to track these parameter inputs.
           "type":"CONNECTION"
         }
       ],
-      "EXPORT":[
+      "TO":[
         {
-          "id":7,
-          "inputs":[
-            {
-              "id":23,
-              "name":"input.inputDirectory",
-              "type":"STRING",
-              "size":255,
-              "sensitive":false
-            }
-          ],
-          "name":"input",
-          "type":"CONNECTION"
-        },
-        {
-          "id":8,
+          "id":1,
           "inputs":[
             {
               "id":24,
@@ -1272,7 +1230,7 @@ Provide the id of the job in the url [jid] part. If you provide ``all`` in the [
         }
       ],
       "connector-id":1,
-      "type":"IMPORT",
+      "type":"FROM",
       "framework":[
         {
           "id":5,
|
@ -65,7 +65,7 @@ public List<Direction> getSupportedDirections() {
|
|||||||
/**
|
/**
|
||||||
* @return Get job configuration group class per direction type or null if not supported
|
* @return Get job configuration group class per direction type or null if not supported
|
||||||
*/
|
*/
|
||||||
public abstract Class getJobConfigurationClass(Direction jobType);
|
public abstract Class getJobConfigurationClass(Direction direction);
|
||||||
|
|
||||||
/**
|
/**
|
||||||
* @return an <tt>From</tt> that provides classes for performing import.
|
* @return an <tt>From</tt> that provides classes for performing import.
|
||||||