sqoop/src/docs/user/connecting.txt


////
   Licensed to Cloudera, Inc. under one or more
   contributor license agreements.  See the NOTICE file distributed with
   this work for additional information regarding copyright ownership.
   Cloudera, Inc. licenses this file to You under the Apache License, Version 2.0
   (the "License"); you may not use this file except in compliance with
   the License.  You may obtain a copy of the License at

       http://www.apache.org/licenses/LICENSE-2.0

   Unless required by applicable law or agreed to in writing, software
   distributed under the License is distributed on an "AS IS" BASIS,
   WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
   See the License for the specific language governing permissions and
   limitations under the License.
////


Connecting to a Database Server
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

Sqoop is designed to import tables from a database into HDFS. To do
so, you must specify a _connect string_ that describes how to connect to the
database. The _connect string_ is similar to a URL, and is communicated to
Sqoop with the +\--connect+ argument. This describes the server and
database to connect to; it may also specify the port. For example:

----
$ sqoop import --connect jdbc:mysql://database.example.com/employees
----

This string will connect to a MySQL database named +employees+ on the
host +database.example.com+. It's important that you *do not* use the URL
+localhost+ if you intend to use Sqoop with a distributed Hadoop
cluster. The connect string you supply will be used on TaskTracker nodes
throughout your MapReduce cluster; if you specify the
literal name +localhost+, each node will connect to a different
database (or more likely, no database at all). Instead, you should use
the full hostname or IP address of the database host that can be seen
by all your remote nodes.

You might need to authenticate against the database before you can
access it. You can use the +\--username+ and +\--password+ or +-P+ parameters
to supply a username and a password to the database. For example:

----
$ sqoop import --connect jdbc:mysql://database.example.com/employees \
    --username aaron --password 12345
----

.Password security
WARNING: The +\--password+ parameter is insecure, as other users may
be able to read your password from the command-line arguments via
the output of programs such as `ps`. The *+-P+* argument will read
a password from a console prompt, and is the preferred method of
entering credentials. Credentials may still be transferred between
nodes of the MapReduce cluster using insecure means.

Sqoop automatically supports several databases, including MySQL.
Connect strings beginning with +jdbc:mysql://+ are handled
automatically in Sqoop, though you may need to install the driver
yourself. (A full list of databases with built-in support is provided
in the "Supported Databases" section.)

You can use Sqoop with any other
JDBC-compliant database. First, download the appropriate JDBC
driver for the type of database you want to import, and install the .jar
file in the +/usr/hadoop/lib+ directory on all machines in your Hadoop
cluster, or some other directory which is in the classpath
on all nodes. Each driver +.jar+ file also has a specific driver class which defines
the entry-point to the driver. For example, MySQL's Connector/J library has
a driver class of +com.mysql.jdbc.Driver+. Refer to your database
vendor-specific documentation to determine the main driver class.
This class must be provided as an argument to Sqoop with +\--driver+.

For example, to connect to a SQLServer database, first download the driver from
microsoft.com and install it in your Hadoop lib path.

Then run Sqoop. For example:

----
$ sqoop import --driver com.microsoft.jdbc.sqlserver.SQLServerDriver \
    --connect <connect-string> ...
----