mirror of
https://github.com/apache/sqoop.git
synced 2025-05-08 02:50:38 +08:00

From: Christopher Douglas <cdouglas@apache.org> git-svn-id: https://svn.apache.org/repos/asf/incubator/sqoop/trunk@1149834 13f79535-47bb-0310-9956-ffa450edef68
86 lines
3.8 KiB
Plaintext
86 lines
3.8 KiB
Plaintext
|
|
////
|
|
Licensed to the Apache Software Foundation (ASF) under one or more
|
|
contributor license agreements. See the NOTICE file distributed with
|
|
this work for additional information regarding copyright ownership.
|
|
The ASF licenses this file to You under the Apache License, Version 2.0
|
|
(the "License"); you may not use this file except in compliance with
|
|
the License. You may obtain a copy of the License at
|
|
|
|
http://www.apache.org/licenses/LICENSE-2.0
|
|
|
|
Unless required by applicable law or agreed to in writing, software
|
|
distributed under the License is distributed on an "AS IS" BASIS,
|
|
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
|
|
See the License for the specific language governing permissions and
|
|
limitations under the License.
|
|
////
|
|
|
|
|
|
Connecting to a Database Server
|
|
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
|
|
|
|
Sqoop is designed to import tables from a database into HDFS. As such,
|
|
it requires a _connect string_ that describes how to connect to the
|
|
database. The _connect string_ looks like a URL, and is communicated to
|
|
Sqoop with the +--connect+ argument. This describes the server and
|
|
database to connect to; it may also specify the port. e.g.:
|
|
|
|
----
|
|
$ sqoop --connect jdbc:mysql://database.example.com/employees
|
|
----
|
|
|
|
This string will connect to a MySQL database named +employees+ on the
|
|
host +database.example.com+. It's important that you *do not* use the URL
|
|
+localhost+ if you intend to use Sqoop with a distributed Hadoop
|
|
cluster. The connect string you supply will be used on TaskTracker nodes
|
|
throughout your MapReduce cluster; if they're told to connect to the
|
|
literal name +localhost+, they'll each reach a different
|
|
database (or more likely, no database at all)! Instead, you should use
|
|
the full hostname or IP address of the database host that can be seen
|
|
by all your remote nodes.
|
|
|
|
You may need to authenticate against the database before you can
|
|
access it. The +--username+ and +--password+ or +-P+ parameters can
|
|
be used to supply a username and a password to the database. e.g.:
|
|
|
|
----
|
|
$ sqoop --connect jdbc:mysql://database.example.com/employees \
|
|
--username aaron --password 12345
|
|
----
|
|
|
|
.Password security
|
|
WARNING: The +--password+ parameter is insecure, as other users may
|
|
be able to read your password from the command-line arguments via
|
|
the output of programs such as `ps`. The *+-P+* argument will read
|
|
a password from a console prompt, and is the preferred method of
|
|
entering credentials. Credentials may still be transferred between
|
|
nodes of the MapReduce cluster using insecure means.
|
|
|
|
Sqoop automatically supports several databases, including MySQL. Connect strings beginning
|
|
with +jdbc:mysql://+ are handled automatically Sqoop, though you may need
|
|
to install the driver yourself. (A full list of databases with
|
|
built-in support is provided in the "Supported Databases" section, below.)
|
|
|
|
You can use Sqoop with any other
|
|
JDBC-compliant database as well. First, download the appropriate JDBC
|
|
driver for the database you want to import from, and install the .jar
|
|
file in the +/usr/hadoop/lib+ directory on all machines in your Hadoop
|
|
cluster, or some other directory which is in the classpath
|
|
on all nodes. Each driver jar also has a specific driver class which defines
|
|
the entry-point to the driver. For example, MySQL's Connector/J library has
|
|
a driver class of +com.mysql.jdbc.Driver+. Refer to your database
|
|
vendor-specific documentation to determine the main driver class.
|
|
This class must be provided as an argument to Sqoop with +--driver+.
|
|
|
|
For example, to connect to a postgres database, first download the driver from
|
|
link:http://jdbc.postgresql.org[http://jdbc.postgresql.org] and
|
|
install it in your Hadoop lib path.
|
|
Then run Sqoop with something like:
|
|
|
|
----
|
|
$ sqoop --connect jdbc:postgresql://postgres-server.example.com/employees \
|
|
--driver org.postgresql.Driver
|
|
----
|
|
|