SQOOP-1655: SQOOP2 DOC: Document getSchema() and its use in the connector dev guide

(Gwen Shapira via Jarek Jarcec Cecho)
2025-05-16 00:41:23 +08:00 · 2014-11-02 15:17:05 -08:00 · 2014-11-02 15:17:05 -08:00 · 85d5476f7e
commit 85d5476f7e
parent 8b51236c21
2 changed files with 17 additions and 8 deletions
--- a/docs/src/site/sphinx/ConnectorDevelopment.rst
+++ b/docs/src/site/sphinx/ConnectorDevelopment.rst
@@ -70,7 +70,7 @@ Connectors can optionally override the following methods:
 The ``getFrom`` method returns From_ instance
 which is a placeholder for the modules needed to read from a data source.
-The ``getTo`` method returns Exporter_ instance
+The ``getTo`` method returns To_ instance
 which is a placeholder for the modules needed to write to a data source.
 Methods such as ``getBundle`` , ``getConnectionConfigurationClass`` ,
@@ -170,11 +170,22 @@ Connectors can define the design of ``Partition`` on their own.
 Initializer and Destroyer
 -------------------------
 .. _Initializer:
 .. _Destroyer:
 Initializer is instantiated before the submission of MapReduce job
-for doing preparation such as adding dependent jar files.
+for doing preparation such as connecting to the data source, creating temporary tables or adding dependent jar files.
-Destroyer is instantiated after MapReduce job is finished for clean up.
+In addition to the ``initialize()`` method where the preparation activities occur, the Initializer must implement a ``getSchema()`` method.
+This method is used by the framework to match the data extracted by the ``From`` connector with the data as the ``To`` connector expects it.
+In the case of a relational or columnar database, the returned Schema object will include a collection of columns with their data types.
+If the data source is schema-less, such as a file, an empty Schema object can be returned (i.e., a Schema object without any columns).
+Note that Sqoop2 currently does not support ETL between two schema-less sources. For each job, we expect that either the connector providing
+the ``From`` instance or the connector providing the ``To`` instance will have a schema. If both instances have a schema, Sqoop2 will load data by column name.
+That is, data in column "A" in the data source will be loaded to column "A" in the target.
+Destroyer is instantiated after MapReduce job is finished for clean up, for example dropping temporary tables and closing connections.
 To
@@ -226,10 +237,8 @@ Loader must iterate in the ``load`` method until the data from ``DataReader`` is
 Initializer and Destroyer
 -------------------------
-Initializer is instantiated before the submission of MapReduce job
+Initializer_ and Destroyer_ of a ``To`` instance are used in a similar way to those of a ``From`` instance.
-for doing preparation such as adding dependent jar files.
+Refer to the previous section for more details.
 Destroyer is instantiated after MapReduce job is finished for clean up.
 Connector Configurations
--- a/docs/src/site/sphinx/index.rst
+++ b/docs/src/site/sphinx/index.rst
@@ -59,7 +59,7 @@ Developer Guide
 - `Building Sqoop2 <BuildingSqoop2.html>`_
 - `Development Environment Setup <DevEnv.html>`_
 - `Java Client API Guide <ClientAPI.html>`_
-- `Developping Connector <ConnectorDevelopment.html>`_
+- `Developing a Connector <ConnectorDevelopment.html>`_
 - `REST API Guide <RESTAPI.html>`_
 Overview
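
The column-name matching that the new ``getSchema()`` text in ConnectorDevelopment.rst describes can be illustrated with a minimal, self-contained sketch. The ``Schema`` class and ``reorder`` helper below are hypothetical stand-ins written for this illustration and are not Sqoop2's actual classes or signatures. How the framework behaves when only one side has a schema, or when a target column has no same-named source column, is not stated in the patch, so both cases are handled by assumption here.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical stand-ins for illustration only; these are NOT Sqoop2's classes.
public class SchemaMatchSketch {

    /** A schema reduced to an ordered list of column names; no columns means schema-less. */
    public static final class Schema {
        private final List<String> columns;

        public Schema(String... columns) {
            this.columns = Arrays.asList(columns.clone());
        }

        public boolean isEmpty() {
            return columns.isEmpty();
        }

        public List<String> columns() {
            return columns;
        }
    }

    /** Rearranges one row read in From-column order into To-column order. */
    public static Object[] reorder(Schema from, Schema to, Object[] row) {
        if (from.isEmpty() && to.isEmpty()) {
            // The guide states that ETL between two schema-less sources is not supported.
            throw new IllegalArgumentException("at least one side must provide a schema");
        }
        if (from.isEmpty() || to.isEmpty()) {
            // Assumption of this sketch: with a single schema there is nothing
            // to match by name, so the row is passed through unchanged.
            return row.clone();
        }
        Object[] out = new Object[to.columns().size()];
        for (int i = 0; i < out.length; i++) {
            int source = from.columns().indexOf(to.columns().get(i));
            // Assumption of this sketch: a target column without a same-named
            // source column is left as null.
            out[i] = source >= 0 ? row[source] : null;
        }
        return out;
    }

    public static void main(String[] args) {
        Schema from = new Schema("A", "B");
        Schema to = new Schema("B", "A");
        Object[] out = reorder(from, to, new Object[] {1, "x"});
        System.out.println(Arrays.toString(out)); // prints [x, 1]
    }
}
```

In this sketch the value read from source column "A" lands in target column "A" even though the two schemas list their columns in a different order, which is the behavior the patch describes for jobs where both instances have a schema.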