mirror of https://github.com/apache/sqoop.git synced 2025-05-04 23:29:44 +08:00

SQOOP-1908: Sqoop2: Document external connector support

(Veena Basavaraj via Abraham Elmahrek)
Abraham Elmahrek 2015-01-20 19:03:50 -08:00
parent 1f89de2172
commit e41bc6e31f
2 changed files with 103 additions and 1 deletions


@ -62,6 +62,19 @@ Connectors can optionally override the following methods:
public List<Direction> getSupportedDirections();
public Class<? extends IntermediateDataFormat<?>> getIntermediateDataFormat()
The ``getVersion`` method returns the current version of the connector.
It is important to provide a unique identifier every time a connector jar is released externally.
For the Sqoop built-in connectors, the version refers to the Sqoop build/release version. External
connectors can use the same or a similar mechanism to set this version. The version number is critical for
the connector upgrade logic used in Sqoop.
::
@Override
public String getVersion() {
return VersionInfo.getBuildVersion();
}
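For an external connector that does not ship with the Sqoop build, one possible "similar mechanism" is to read the version from the connector jar's manifest. The sketch below is an illustration only, not part of the Sqoop API: the class name ``FooConnectorVersion``, the helper ``resolveVersion`` and the fallback string are hypothetical, and it assumes the jar is built with an ``Implementation-Version`` manifest entry.

```java
// Hypothetical helper; not part of the Sqoop connector SDK.
final class FooConnectorVersion {

    // Assumed fallback when no manifest version is available,
    // for example when running from unpackaged class files.
    static final String FALLBACK_VERSION = "unknown";

    // Returns the Implementation-Version recorded in the manifest of the
    // jar that contains connectorClass, or FALLBACK_VERSION if absent.
    static String resolveVersion(Class<?> connectorClass) {
        Package pkg = connectorClass.getPackage();
        String version = (pkg == null) ? null : pkg.getImplementationVersion();
        return (version == null || version.isEmpty()) ? FALLBACK_VERSION : version;
    }
}
```

A connector's ``getVersion`` could then return ``FooConnectorVersion.resolveVersion(FooConnector.class)``, provided the build bumps the manifest version on every external release.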
The ``getFrom`` method returns a From_ instance
which is a ``Transferable`` entity that encapsulates the operations
@ -237,6 +250,72 @@ Loader must iterate in the ``load`` method until the data from ``DataReader`` is
NOTE: we do not yet support a stage for connector developers to control how to balance the loading/writing of data across the multiple loaders. In the future we may add this to the connector API to allow custom logic for balancing the loading across multiple reducers.
Sqoop Connector Identifier : sqoopconnector.properties
======================================================
Every Sqoop 2 connector needs to have a ``sqoopconnector.properties`` file in the packaged jar to be identified by Sqoop.
A typical ``sqoopconnector.properties`` for a Sqoop 2 connector looks like the following.
::
# Sqoop Foo Connector Properties
org.apache.sqoop.connector.class = org.apache.sqoop.connector.foo.FooConnector
org.apache.sqoop.connector.name = sqoop-foo-connector
If the above file does not exist, Sqoop will not load the jar, and the connector cannot be registered in the Sqoop repository for creating Sqoop jobs.
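As a quick sanity check before packaging, the properties file can be parsed with the standard ``java.util.Properties`` class. This is a minimal sketch, not Sqoop's own loading code: the class ``ConnectorPropertiesCheck`` and its methods are hypothetical, and treating both keys from the example above as mandatory is an assumption.

```java
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.Properties;

// Hypothetical pre-packaging check; not part of Sqoop.
final class ConnectorPropertiesCheck {

    // Keys taken from the example sqoopconnector.properties above.
    static final String CLASS_KEY = "org.apache.sqoop.connector.class";
    static final String NAME_KEY = "org.apache.sqoop.connector.name";

    // Parses the text of a properties file into a Properties object.
    static Properties parse(String text) {
        Properties props = new Properties();
        try {
            props.load(new StringReader(text));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return props;
    }

    // Assumption: both keys must be present and non-blank.
    static boolean hasRequiredKeys(Properties props) {
        return isSet(props, CLASS_KEY) && isSet(props, NAME_KEY);
    }

    private static boolean isSet(Properties props, String key) {
        String value = props.getProperty(key);
        return value != null && !value.trim().isEmpty();
    }
}
```

``Properties.load`` ignores whitespace around the ``=`` separator, so the spacing used in the example file parses as expected.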
Sqoop Connector Build-time Dependencies
=======================================
Sqoop provides the connector-sdk module, identified by the package ``org.apache.sqoop.connector``. It provides the public-facing APIs for external connectors
to extend. It also provides common utilities that connectors can use for converting data to and from the Sqoop intermediate data format.
The common-test module, identified by the package ``org.apache.sqoop.common.test``, provides utilities related to the built-in connectors, such as the JDBC, HDFS,
and Kafka connectors, that external connectors can use to create end-to-end integration tests for Sqoop jobs.
The test module, identified by the package ``org.apache.sqoop.test``, provides various minicluster utilities that integration tests can extend to run
a Sqoop job with the given Sqoop connector, using it as either a ``FROM`` or a ``TO`` data source.
Hence the pom.xml for the Sqoop Kite connector built using the kite-sdk might look something like the following.
::
    <dependencies>
      <!-- Sqoop modules -->
      <dependency>
        <groupId>org.apache.sqoop</groupId>
        <artifactId>connector-sdk</artifactId>
      </dependency>
      <!-- Testing specified modules -->
      <dependency>
        <groupId>org.testng</groupId>
        <artifactId>testng</artifactId>
        <scope>test</scope>
      </dependency>
      <dependency>
        <groupId>org.mockito</groupId>
        <artifactId>mockito-all</artifactId>
        <scope>test</scope>
      </dependency>
      <dependency>
        <groupId>org.apache.sqoop</groupId>
        <artifactId>sqoop-common-test</artifactId>
      </dependency>
      <dependency>
        <groupId>org.apache.sqoop</groupId>
        <artifactId>test</artifactId>
      </dependency>
      <!-- Connector required modules -->
      <dependency>
        <groupId>org.kitesdk</groupId>
        <artifactId>kite-data-core</artifactId>
      </dependency>
      ....
    </dependencies>
Configurables
+++++++++++++
@ -370,6 +449,27 @@ Sqoop 2 provides a list of standard input validators that can be used by differe
The validation logic is executed when users creating Sqoop jobs enter values for the link and job configs associated with the ``From`` and ``To`` instances of the connectors associated with the job.
Loading External Connectors
+++++++++++++++++++++++++++
To load a new connector, say sqoop-foo-connector, into Sqoop 2, follow these steps:
1. Create a ``sqoop-foo-connector.jar``. Make sure the jar contains the ``sqoopconnector.properties`` file so that Sqoop picks it up.
2. Add this jar to a folder on your installation machine, then set the key ``org.apache.sqoop.connector.external.loadpath`` in the ``sqoop.properties`` file, located under the ``server/conf`` directory of the Sqoop 2 installation, to the path of this folder.
::
#
# External connectors load path
# "/path/to/external/connectors/": Add all the connector JARs in the specified folder
#
org.apache.sqoop.connector.external.loadpath=/path/to/connector
3. Start the Sqoop 2 server. During server initialization, this jar should be loaded onto the Sqoop 2 classpath and registered in the Sqoop 2 repository.
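Before starting the server, it can help to confirm that the jars in the load path actually carry the marker file from step 1. The following is a rough sketch using only the JDK; it is not how the Sqoop server itself scans the load path. The class ``ExternalConnectorScan`` and its methods are hypothetical, and it assumes ``sqoopconnector.properties`` sits at the root of the jar.

```java
import java.io.File;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.List;
import java.util.jar.JarFile;

// Hypothetical pre-flight check; not part of Sqoop.
final class ExternalConnectorScan {

    // Assumption: the marker file is stored at the root of the jar.
    static final String MARKER = "sqoopconnector.properties";

    // Returns true if the jar contains the marker entry.
    static boolean containsMarker(File jar) {
        try (JarFile jarFile = new JarFile(jar)) {
            return jarFile.getEntry(MARKER) != null;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // Lists the *.jar files directly under dir that contain the marker.
    // Returns an empty list if dir does not exist or is not a directory.
    static List<File> findConnectorJars(File dir) {
        List<File> result = new ArrayList<>();
        File[] jars = dir.listFiles((parent, name) -> name.endsWith(".jar"));
        if (jars == null) {
            return result;
        }
        for (File jar : jars) {
            if (containsMarker(jar)) {
                result.add(jar);
            }
        }
        return result;
    }
}
```

Calling ``ExternalConnectorScan.findConnectorJars(new File("/path/to/connector"))`` with the folder configured in ``org.apache.sqoop.connector.external.loadpath`` would list the jars that pass this check; an empty result suggests the marker file was not packaged.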
Sqoop 2 MapReduce Job Execution Lifecycle with Connector API
++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
@ -457,6 +557,8 @@ The diagram below describes the reduce phase of a job.
| | | | | | | write into Data Source
| | | | | | |----------------------->
More details can be found in `Sqoop MR Execution Engine`_
.. _`Sqoop MR Execution Engine`: https://cwiki.apache.org/confluence/display/SQOOP/Sqoop+MR+Execution+Engine
.. _`Intermediate Data Format representation`: https://cwiki.apache.org/confluence/display/SQOOP/Sqoop2+Intermediate+representation


@ -57,7 +57,7 @@ If you are keen on contributing to Sqoop and get your hands dirty building conne
- `Building Sqoop 2 <BuildingSqoop2.html>`_
- `Sqoop Development Environment Setup <DevEnv.html>`_
- `Developing a Sqoop Connector with Connection API <ConnectorDevelopment.html>`_
- `Developing a Sqoop Connector with Connector API <ConnectorDevelopment.html>`_
- `Developing Sqoop application with REST API <RESTAPI.html>`_
- `Developing Sqoop application using Sqoop Java Client API <ClientAPI.html>`_