mirror of
https://github.com/apache/sqoop.git
synced 2025-05-04 02:21:10 +08:00

Add documentation for all SqoopTool implementations. Add database compatibility notes. Separate user guide from the development guide. From: Aaron Kimball <aaron@cloudera.com> git-svn-id: https://svn.apache.org/repos/asf/incubator/sqoop/trunk@1149902 13f79535-47bb-0310-9956-ffa450edef68
88 lines
3.8 KiB
Plaintext
////
Licensed to Cloudera, Inc. under one or more
contributor license agreements. See the NOTICE file distributed with
this work for additional information regarding copyright ownership.
Cloudera, Inc. licenses this file to You under the Apache License, Version 2.0
(the "License"); you may not use this file except in compliance with
the License. You may obtain a copy of the License at

http://www.apache.org/licenses/LICENSE-2.0

Unless required by applicable law or agreed to in writing, software
distributed under the License is distributed on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
See the License for the specific language governing permissions and
limitations under the License.
////


Connecting to a Database Server
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

Sqoop is designed to import tables from a database into HDFS. To do
so, you must specify a _connect string_ that describes how to connect to the
database. The connect string is similar to a URL and is passed to
Sqoop with the +\--connect+ argument. It identifies the server and
database to connect to, and may also specify the port. For example:

----
$ sqoop import --connect jdbc:mysql://database.example.com/employees
----

This string will connect to a MySQL database named +employees+ on the
host +database.example.com+. It's important that you *do not* use the hostname
+localhost+ if you intend to use Sqoop with a distributed Hadoop
cluster. The connect string you supply will be used on TaskTracker nodes
throughout your MapReduce cluster; if you specify the
literal name +localhost+, each node will connect to a different
database (or, more likely, no database at all). Instead, use
the full hostname or IP address of the database host, as seen
by all your remote nodes.
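
As a minimal sketch of the two points above (an explicit port in the connect
string, and avoiding +localhost+), the following shell fragment assembles a
connect string and rejects loopback host names before printing the Sqoop
command. The variable names, the +check_db_host+ helper, and port 3306
(MySQL's default) are illustrative assumptions, not part of Sqoop:

```shell
# Illustrative values; DB_HOST and DB_NAME reuse the example above.
DB_HOST=database.example.com
DB_PORT=3306            # assumption: MySQL's default port
DB_NAME=employees

# Hypothetical helper: fail for names that each node resolves to itself.
check_db_host() {
  case "$1" in
    localhost|127.0.0.1|::1)
      echo "error: '$1' refers to the local machine on every node" >&2
      return 1
      ;;
  esac
  return 0
}

CONNECT="jdbc:mysql://${DB_HOST}:${DB_PORT}/${DB_NAME}"

# Print the command for review; remove 'echo' to actually run Sqoop.
check_db_host "$DB_HOST" && echo sqoop import --connect "$CONNECT"
```

The check only catches the literal loopback names; it does not verify that
the host is actually reachable from every node in the cluster.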

You might need to authenticate against the database before you can
access it. Use the +\--username+ argument to supply a username, and either
+\--password+ or +-P+ to supply a password. For example:

----
$ sqoop import --connect jdbc:mysql://database.example.com/employees \
    --username aaron --password 12345
----

.Password security
WARNING: The +\--password+ parameter is insecure, as other users may
be able to read your password from the command-line arguments via
the output of programs such as `ps`. The *+-P+* argument will read
a password from a console prompt, and is the preferred method of
entering credentials. Credentials may still be transferred between
nodes of the MapReduce cluster using insecure means.
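
For instance, the earlier command can be written with +-P+ in place of
+\--password+, so that the password is prompted for rather than placed on
the command line. This is a sketch only: the +SQOOP_ARGS+ variable is an
illustrative assumption, and the command is printed rather than executed:

```shell
# Same connection as the earlier example, but with -P instead of
# --password, so no password appears in the argument list.
SQOOP_ARGS="import --connect jdbc:mysql://database.example.com/employees --username aaron -P"

# Print the command for review; remove 'echo' to run it, at which
# point Sqoop prompts for the password on the console.
echo sqoop $SQOOP_ARGS
```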

Sqoop automatically supports several databases, including MySQL.
Connect strings beginning with +jdbc:mysql://+ are handled
automatically in Sqoop, though you may need to install the driver
yourself. (A full list of databases with built-in support is provided
in the "Supported Databases" section.)

You can use Sqoop with any other
JDBC-compliant database. First, download the appropriate JDBC
driver for the type of database you want to import from, and install the +.jar+
file in the +/usr/hadoop/lib+ directory on all machines in your Hadoop
cluster, or in another directory that is on the classpath
on all nodes. Each driver +.jar+ file also has a specific driver class that defines
the entry point to the driver. For example, MySQL's Connector/J library has
a driver class of +com.mysql.jdbc.Driver+. Refer to your database
vendor's documentation to determine the main driver class.
This class must be provided as an argument to Sqoop with +\--driver+.
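
The installation step can be sketched as a loop that copies the driver to
each node. Everything specific here is a placeholder: the jar file name,
the node names, and the use of +scp+ are assumptions for illustration, and
the commands are printed rather than executed:

```shell
DRIVER_JAR=your-jdbc-driver.jar               # hypothetical driver file name
LIB_DIR=/usr/hadoop/lib                       # directory named in the text
NODES="node1.example.com node2.example.com"   # hypothetical node names

# Print one copy command per node; remove 'echo' to perform the copy.
print_copy_commands() {
  for node in $NODES; do
    echo scp "$DRIVER_JAR" "${node}:${LIB_DIR}/"
  done
}

print_copy_commands
```

Any other mechanism that places the same file on the classpath of every
node would serve equally well.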

For example, to connect to a SQL Server database, first download the driver from
microsoft.com and install it in your Hadoop lib path.

Then run Sqoop. For example:

----
$ sqoop import --driver com.microsoft.jdbc.sqlserver.SQLServerDriver \
    --connect <connect-string> ...
----