5
0
mirror of https://github.com/apache/sqoop.git synced 2025-05-08 02:50:38 +08:00
sqoop/doc/connecting.txt
Andrew Bayer 4b193ec488 MAPREDUCE-906. Update Sqoop documentation. Contributed by Aaron Kimball
From: Christopher Douglas <cdouglas@apache.org>

git-svn-id: https://svn.apache.org/repos/asf/incubator/sqoop/trunk@1149834 13f79535-47bb-0310-9956-ffa450edef68
2011-07-22 20:03:26 +00:00

86 lines
3.8 KiB
Plaintext

////
Licensed to the Apache Software Foundation (ASF) under one or more
contributor license agreements. See the NOTICE file distributed with
this work for additional information regarding copyright ownership.
The ASF licenses this file to You under the Apache License, Version 2.0
(the "License"); you may not use this file except in compliance with
the License. You may obtain a copy of the License at
http://www.apache.org/licenses/LICENSE-2.0
Unless required by applicable law or agreed to in writing, software
distributed under the License is distributed on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
See the License for the specific language governing permissions and
limitations under the License.
////
Connecting to a Database Server
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Sqoop is designed to import tables from a database into HDFS. As such,
it requires a _connect string_ that describes how to connect to the
database. The _connect string_ looks like a URL, and is communicated to
Sqoop with the +--connect+ argument. This describes the server and
database to connect to; it may also specify the port. e.g.:
----
$ sqoop --connect jdbc:mysql://database.example.com/employees
----
This string will connect to a MySQL database named +employees+ on the
host +database.example.com+. It's important that you *do not* use the URL
+localhost+ if you intend to use Sqoop with a distributed Hadoop
cluster. The connect string you supply will be used on TaskTracker nodes
throughout your MapReduce cluster; if they're told to connect to the
literal name +localhost+, they'll each reach a different
database (or more likely, no database at all)! Instead, you should use
the full hostname or IP address of the database host that can be seen
by all your remote nodes.
You may need to authenticate against the database before you can
access it. The +--username+ and +--password+ or +-P+ parameters can
be used to supply a username and a password to the database. e.g.:
----
$ sqoop --connect jdbc:mysql://database.example.com/employees \
--username aaron --password 12345
----
.Password security
WARNING: The +--password+ parameter is insecure, as other users may
be able to read your password from the command-line arguments via
the output of programs such as `ps`. The *+-P+* argument will read
a password from a console prompt, and is the preferred method of
entering credentials. Credentials may still be transferred between
nodes of the MapReduce cluster using insecure means.
Sqoop automatically supports several databases, including MySQL. Connect strings beginning
with +jdbc:mysql://+ are handled automatically Sqoop, though you may need
to install the driver yourself. (A full list of databases with
built-in support is provided in the "Supported Databases" section, below.)
You can use Sqoop with any other
JDBC-compliant database as well. First, download the appropriate JDBC
driver for the database you want to import from, and install the .jar
file in the +/usr/hadoop/lib+ directory on all machines in your Hadoop
cluster, or some other directory which is in the classpath
on all nodes. Each driver jar also has a specific driver class which defines
the entry-point to the driver. For example, MySQL's Connector/J library has
a driver class of +com.mysql.jdbc.Driver+. Refer to your database
vendor-specific documentation to determine the main driver class.
This class must be provided as an argument to Sqoop with +--driver+.
For example, to connect to a postgres database, first download the driver from
link:http://jdbc.postgresql.org[http://jdbc.postgresql.org] and
install it in your Hadoop lib path.
Then run Sqoop with something like:
----
$ sqoop --connect jdbc:postgresql://postgres-server.example.com/employees \
--driver org.postgresql.Driver
----