This change introduces a new option that can be used to pass custom
connection parameters while creating JDBC connections. If no connection
parameters are specified, the system defaults to the old behavior.
From: Arvind Prabhakar <arvind@cloudera.com>
git-svn-id: https://svn.apache.org/repos/asf/incubator/sqoop/trunk@1150051 13f79535-47bb-0310-9956-ffa450edef68
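The fallback described in the change above could look roughly like the following sketch. The class and method names are hypothetical and are not taken from the Sqoop source; only the java.sql.DriverManager calls are standard JDBC.

```java
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.SQLException;
import java.util.Properties;

// Hypothetical helper, for illustration only.
final class ConnectionParamsSketch {

  /** Combines credentials with caller-supplied driver parameters. */
  static Properties buildProperties(String user, String password,
      Properties custom) {
    Properties props = new Properties();
    if (user != null) {
      props.setProperty("user", user);
    }
    if (password != null) {
      props.setProperty("password", password);
    }
    if (custom != null) {
      // Custom entries are copied last, so in this sketch they win
      // over the credentials if the same key appears twice.
      for (String name : custom.stringPropertyNames()) {
        props.setProperty(name, custom.getProperty(name));
      }
    }
    return props;
  }

  /** Opens a connection; without custom parameters the old path is used. */
  static Connection open(String url, String user, String password,
      Properties custom) throws SQLException {
    if (custom == null || custom.isEmpty()) {
      // No custom parameters: connect as before.
      return (user == null)
          ? DriverManager.getConnection(url)
          : DriverManager.getConnection(url, user, password);
    }
    return DriverManager.getConnection(url,
        buildProperties(user, password, custom));
  }
}
```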
This change introduces a new Connection Manager for SQL Server along
with a basic test case to exercise part of the functionality. It also
addresses the problem noted in SQOOP-229 by overriding the
getCurTimestampQuery method as suggested.
(Patrick Angeles via Arvind Prabhakar)
From: Arvind Prabhakar <arvind@cloudera.com>
git-svn-id: https://svn.apache.org/repos/asf/incubator/sqoop/trunk@1150049 13f79535-47bb-0310-9956-ffa450edef68
This patch adds a checkstyle module to detect trailing whitespace.
It also removes existing instances of trailing whitespace from the
code.
From: Ahmed Radwan <ahmed@cloudera.com>
git-svn-id: https://svn.apache.org/repos/asf/incubator/sqoop/trunk@1150046 13f79535-47bb-0310-9956-ffa450edef68
This patch fixes a bug that prevents importing data into
an existing hive table with the 'hive-overwrite' argument set.
From: Ahmed Radwan <ahmed@cloudera.com>
git-svn-id: https://svn.apache.org/repos/asf/incubator/sqoop/trunk@1150043 13f79535-47bb-0310-9956-ffa450edef68
Adds setter methods and a field-based equals() implementation to
the generated classes, making them easier to use from user code.
(Michael Häusler via ahmed)
From: Ahmed Radwan <ahmed@cloudera.com>
git-svn-id: https://svn.apache.org/repos/asf/incubator/sqoop/trunk@1150037 13f79535-47bb-0310-9956-ffa450edef68
The SqoopRecord.toString() and SqoopRecord.toString(DelimiterSet) methods
always append an end-of-record delimiter. Sqoop uses its own OutputFormat
when rendering these to text files, so that the user's delimiters are
preserved.
Other users could use this OutputFormat when working with SqoopRecord
instances in their own MapReduce code, but records should also work
cleanly with TextOutputFormat when the intent is newline-terminated
records.
This patch allows users to suppress end-of-record delimiter generation when
formatting records with toString.
(Aaron Kimball via Arvind Prabhakar)
From: Arvind Prabhakar <arvind@cloudera.com>
git-svn-id: https://svn.apache.org/repos/asf/incubator/sqoop/trunk@1150025 13f79535-47bb-0310-9956-ffa450edef68
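A minimal sketch of the idea in the change above, assuming a boolean flag controls whether the end-of-record delimiter is appended. The class, method, and parameter names are hypothetical; this is not the generated SqoopRecord code.

```java
// Hypothetical formatter, for illustration only.
final class RecordFormatSketch {

  /**
   * Joins fields with fieldDelim and appends recordDelim only when
   * useRecordDelim is true.
   */
  static String format(String[] fields, char fieldDelim,
      char recordDelim, boolean useRecordDelim) {
    StringBuilder sb = new StringBuilder();
    for (int i = 0; i < fields.length; i++) {
      if (i > 0) {
        sb.append(fieldDelim);
      }
      sb.append(fields[i]);
    }
    if (useRecordDelim) {
      sb.append(recordDelim);
    }
    return sb.toString();
  }
}
```

With the flag set to false, an output format that writes its own newline after each record, such as TextOutputFormat, would not produce doubled record terminators.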
This change introduces a setField(fieldName, fieldVal) method for
SqoopRecord instances which would allow an arbitrary programmatic
"setter" function without requiring reflection.
From: Arvind Prabhakar <arvind@cloudera.com>
git-svn-id: https://svn.apache.org/repos/asf/incubator/sqoop/trunk@1150021 13f79535-47bb-0310-9956-ffa450edef68
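One way a generated class can offer a name-based setter without reflection is to dispatch on the field name directly. The sketch below assumes a record with two illustrative fields, id and name; the class is hypothetical and the code Sqoop actually generates may differ.

```java
// Hypothetical generated-style record, for illustration only.
final class EmployeeRecordSketch {
  private Integer id;
  private String name;

  /** Sets a field by name using plain comparisons instead of reflection. */
  void setField(String fieldName, Object fieldVal) {
    if ("id".equals(fieldName)) {
      this.id = (Integer) fieldVal;
    } else if ("name".equals(fieldName)) {
      this.name = (String) fieldVal;
    } else {
      throw new IllegalArgumentException("No such field: " + fieldName);
    }
  }

  Integer getId() {
    return id;
  }

  String getName() {
    return name;
  }
}
```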
The change will look for ToolPlugin definitions in the
sqoop.tool.plugins configuration entry, or conf/tools.d. Each
ToolPlugin returns a list of ToolDesc entries, which are then
registered with SqoopTool.register() before the user's arguments
are parsed. The user can then run 'sqoop <custom-tool> args...'
as if it were part of the natural Sqoop system.
(Aaron Kimball via Arvind Prabhakar)
From: Arvind Prabhakar <arvind@cloudera.com>
git-svn-id: https://svn.apache.org/repos/asf/incubator/sqoop/trunk@1150020 13f79535-47bb-0310-9956-ffa450edef68
This change allows Sqoop unit tests to be run against a real cluster.
(Konstantin Boudnik via arvind)
From: Arvind Prabhakar <arvind@cloudera.com>
git-svn-id: https://svn.apache.org/repos/asf/incubator/sqoop/trunk@1150012 13f79535-47bb-0310-9956-ffa450edef68
This change adds the ability to specify the maximum number of records
fetched from the database. This avoids problems that may arise when
importing large tables.
(Michael Häusler via ahmed)
From: Ahmed Radwan <ahmed@cloudera.com>
git-svn-id: https://svn.apache.org/repos/asf/incubator/sqoop/trunk@1150011 13f79535-47bb-0310-9956-ffa450edef68
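Assuming the limit in the change above corresponds to the JDBC fetch size (an assumption; the commit message does not say so), applying it to a statement could be sketched as follows. The helper class is hypothetical; Statement.setFetchSize is standard JDBC.

```java
import java.sql.SQLException;
import java.sql.Statement;

// Hypothetical helper, for illustration only.
final class FetchSizeSketch {

  /**
   * Applies the requested fetch size as a hint to the driver.
   * A null value leaves the driver default untouched.
   */
  static void applyFetchSize(Statement stmt, Integer fetchSize)
      throws SQLException {
    if (fetchSize != null) {
      stmt.setFetchSize(fetchSize);
    }
  }
}
```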
Changes include explicitly setting the Zookeeper client port and increasing
the memory limit from 256m to 512m in build.xml.
From: Arvind Prabhakar <arvind@cloudera.com>
git-svn-id: https://svn.apache.org/repos/asf/incubator/sqoop/trunk@1150008 13f79535-47bb-0310-9956-ffa450edef68
This change removes the test that asserts the presence of a non-default hosts
file configuration. It also adds the necessary comments to the PostgresqlTest
to allow configuring the server for default hosts file configuration.
From: Arvind Prabhakar <arvind@cloudera.com>
git-svn-id: https://svn.apache.org/repos/asf/incubator/sqoop/trunk@1150003 13f79535-47bb-0310-9956-ffa450edef68
This change introduces the ability to use a staging table for intermediate
storage during execution for regular export jobs in insert mode. This allows
all of the exported data to first be populated in the staging table and then
inserted into the destination table in a single transaction. Thus if a failure
were to occur during export, it is less likely to corrupt the destination
table data. Moreover, the staging table is emptied before the export
job starts populating it, which ensures that re-running the job does not
require any special clean up.
From: Arvind Prabhakar <arvind@cloudera.com>
git-svn-id: https://svn.apache.org/repos/asf/incubator/sqoop/trunk@1150002 13f79535-47bb-0310-9956-ffa450edef68
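The staging flow described above (empty the staging table, populate it, then move the rows to the destination in one transaction) can be sketched with plain JDBC. The class, method names, and SQL text are illustrative assumptions, not the statements Sqoop issues.

```java
import java.sql.Connection;
import java.sql.SQLException;
import java.sql.Statement;

// Hypothetical helper, for illustration only. Table names are
// concatenated as-is, so they must come from a trusted source.
final class StagingExportSketch {

  static String clearSql(String table) {
    return "DELETE FROM " + table;
  }

  static String migrateSql(String staging, String dest) {
    return "INSERT INTO " + dest + " SELECT * FROM " + staging;
  }

  /** Empties the staging table before the export starts filling it. */
  static void clearStaging(Connection conn, String staging)
      throws SQLException {
    try (Statement stmt = conn.createStatement()) {
      stmt.executeUpdate(clearSql(staging));
    }
  }

  /** Moves all staged rows to the destination in a single transaction. */
  static void migrate(Connection conn, String staging, String dest)
      throws SQLException {
    boolean oldAutoCommit = conn.getAutoCommit();
    conn.setAutoCommit(false);
    try (Statement stmt = conn.createStatement()) {
      stmt.executeUpdate(migrateSql(staging, dest));
      stmt.executeUpdate(clearSql(staging));
      conn.commit();
    } catch (SQLException e) {
      // Nothing reaches the destination table if any step fails.
      conn.rollback();
      throw e;
    } finally {
      conn.setAutoCommit(oldAutoCommit);
    }
  }
}
```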
This change allows Sqoop to load options from an options file. An
options file is specified using --options-file. All options that
are otherwise specified on the command line should be specified
in this file in the order they would otherwise appear on the command
line. Options files can contain empty lines and comments for
readability. More than one options file may be used for a single
tool invocation if so preferred. Leading and trailing spaces are
ignored unless they appear within single or double quotes. Quoted
options extending into multiple lines are not supported.
From: Arvind Prabhakar <arvind@cloudera.com>
git-svn-id: https://svn.apache.org/repos/asf/incubator/sqoop/trunk@1149999 13f79535-47bb-0310-9956-ffa450edef68
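The options-file rules above can be illustrated with a small parser sketch. It assumes one option or value per line and '#' as the comment marker; both are assumptions not stated in the commit message, and the class is hypothetical rather than Sqoop's own implementation.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical parser, for illustration only.
final class OptionsFileSketch {

  /** Turns the lines of an options file into an argument list. */
  static List<String> parse(List<String> lines) {
    List<String> options = new ArrayList<>();
    for (String raw : lines) {
      String line = raw.trim();
      // Empty lines and comment lines are skipped.
      if (line.isEmpty() || line.startsWith("#")) {
        continue;
      }
      options.add(unquote(line));
    }
    return options;
  }

  /** Strips one pair of matching quotes, keeping the spaces inside them. */
  private static String unquote(String s) {
    if (s.length() >= 2) {
      char first = s.charAt(0);
      char last = s.charAt(s.length() - 1);
      if ((first == '"' || first == '\'') && first == last) {
        return s.substring(1, s.length() - 1);
      }
    }
    return s;
  }
}
```

Quoted values that span several lines are not handled, in line with the restriction noted in the change.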
This change modifies the Sqoop build to use Ivy for retrieving HBase and
Zookeeper dependencies. Along with this update, the version numbers
for HBase and Hadoop have been incremented to match the CDH3 Beta 3
versions. As a result, a couple of tests had to be modified in order
to accommodate the changed behavior of the Hadoop classes.
From: Arvind Prabhakar <arvind@cloudera.com>
git-svn-id: https://svn.apache.org/repos/asf/incubator/sqoop/trunk@1149997 13f79535-47bb-0310-9956-ffa450edef68
This change removes the ShimLoader and the various Shim classes such as
CDH3Shim. It introduces two new classes, ConfigurationConstants and
ConfigurationHelper, that provide a single place for interface-related
details, such as configuration keys, that are likely to change from
version to version of Hadoop.
From: Arvind Prabhakar <arvind@cloudera.com>
git-svn-id: https://svn.apache.org/repos/asf/incubator/sqoop/trunk@1149994 13f79535-47bb-0310-9956-ffa450edef68
Hive allows the use of keywords as column and table names as long as they are
escaped using back-ticks. This change makes Sqoop always escape table and
column names using back-ticks thereby allowing Sqoop to work with Hive tables
that use keywords for either the table name or column names.
(Lars Francke via arvind)
From: Arvind Prabhakar <arvind@cloudera.com>
git-svn-id: https://svn.apache.org/repos/asf/incubator/sqoop/trunk@1149989 13f79535-47bb-0310-9956-ffa450edef68
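As a rough illustration of the escaping described above, the sketch below wraps every table and column name in back-ticks when building a Hive DDL statement. The class and method names are hypothetical, and names that themselves contain a back-tick are not handled.

```java
// Hypothetical helper, for illustration only.
final class HiveIdentifierSketch {

  /** Wraps an identifier in back-ticks so keywords can be used as names. */
  static String escape(String identifier) {
    return "`" + identifier + "`";
  }

  /** Builds a CREATE TABLE statement from {name, type} column pairs. */
  static String createTableSql(String table, String[][] columns) {
    StringBuilder sb = new StringBuilder("CREATE TABLE ")
        .append(escape(table)).append(" (");
    for (int i = 0; i < columns.length; i++) {
      if (i > 0) {
        sb.append(", ");
      }
      sb.append(escape(columns[i][0])).append(' ').append(columns[i][1]);
    }
    return sb.append(')').toString();
  }
}
```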
The current CDH3 build includes version 0.20.3 of Hadoop, which is now
mapped to the CDH3Shim loader. This change also updates build.xml and
the OracleUtils test class so that the connect string for Oracle tests
can be overridden.
From: Arvind Prabhakar <arvind@cloudera.com>
git-svn-id: https://svn.apache.org/repos/asf/incubator/sqoop/trunk@1149984 13f79535-47bb-0310-9956-ffa450edef68
Adds 'merge' tool.
Adds MergeJob, Merge*Mapper, MergeReducer.
Merge-specific arguments added to SqoopOptions, BaseSqoopTool.
Add TestMerge to test that this tool functions as expected.
From: Aaron Kimball <aaron@cloudera.com>
git-svn-id: https://svn.apache.org/repos/asf/incubator/sqoop/trunk@1149980 13f79535-47bb-0310-9956-ffa450edef68
Added ManagerFactory.accept(SessionData) API to allow ManagerFactory
to inspect the chosen SqoopTool.
Deprecated ManagerFactory.accept(SqoopOptions).
From: Aaron Kimball <aaron@cloudera.com>
git-svn-id: https://svn.apache.org/repos/asf/incubator/sqoop/trunk@1149958 13f79535-47bb-0310-9956-ffa450edef68
Add com.cloudera.sqoop.io.NamedFifo class to represent named FIFO objects.
Added TestNamedFifo as unit test.
MySQLExportMapper now uses this utility.
From: Aaron Kimball <aaron@cloudera.com>
git-svn-id: https://svn.apache.org/repos/asf/incubator/sqoop/trunk@1149954 13f79535-47bb-0310-9956-ffa450edef68
Restart MiniHBaseCluster between each test to prevent triggering
livelock at end of test battery.
From: Aaron Kimball <aaron@cloudera.com>
git-svn-id: https://svn.apache.org/repos/asf/incubator/sqoop/trunk@1149950 13f79535-47bb-0310-9956-ffa450edef68
Modify ImportTool, SessionTool, to support incremental imports.
Add TestIncrementalImport to unit test incremental imports.
SqoopOptions now implements Cloneable.
SQOOP-44. Bugfix in ClassWriter: fix NPE if the case of column names specified
with --columns does not match the case reported by the database.
From: Aaron Kimball <aaron@cloudera.com>
git-svn-id: https://svn.apache.org/repos/asf/incubator/sqoop/trunk@1149944 13f79535-47bb-0310-9956-ffa450edef68
Copy DataDrivenDBInputFormat, its dependencies, and tests into Sqoop
in the com.cloudera.sqoop.mapreduce.db package.
Reformatted code to match Sqoop style guide and eliminate findbugs warnings.
Changed existing Sqoop code to use this implementation rather than Hadoop's.
Modified TestDataDrivenDBInputFormat to use mem-only hsqldb to prevent
Hudson race condition.
From: Aaron Kimball <aaron@cloudera.com>
git-svn-id: https://svn.apache.org/repos/asf/incubator/sqoop/trunk@1149941 13f79535-47bb-0310-9956-ffa450edef68
Added SessionStorage API.
Added SessionData API.
Added SessionStorageFactory.
Added SessionTool for create/delete/execute/show/list operations on sessions.
SqoopOptions can read and write all "sticky" state to a Properties instance.
Added HsqldbSessionStorage to implement SessionStorage API.
Added AutoHsqldbStorage to auto-instantiate a local metastore for the user.
Added client metastore connection parameters to sqoop-site.xml.
Added metastore tool (MetastoreTool).
Added HsqldbMetaStore for standalone metastore instance.
Added metastore properties to sqoop-default.xml.
Added TestSessions unit tests of session API.
Renamed conf/sqoop-default.xml to conf/sqoop-site-template.xml.
Added conf/.gitignore for sqoop-site.xml.
Tests run:
Tested all metastore operations on an import session.
Tested that ~/.sqoop/-based storage will be auto-created by the metastore.
Tested that 'sqoop metastore'-based metastores can be connected to
by external clients.
Tested that 'sqoop metastore --shutdown' will gracefully shut down a running
metastore instance.
Tested that passwords are not stored in the metastore by default, and the
user is prompted for the password when executing that saved session.
From: Aaron Kimball <aaron@cloudera.com>
git-svn-id: https://svn.apache.org/repos/asf/incubator/sqoop/trunk@1149940 13f79535-47bb-0310-9956-ffa450edef68
Added FieldMappable and FieldMapProcessor interfaces.
Added ProcessingException class.
Added NullOutputCommitter class.
SqoopRecord now has delegate() method which calls a FieldMapProcessor.
ClassWriter now generates getFieldMap() method for SqoopRecords.
Added HBasePutProcessor to transform SqoopRecords into Put commands,
implementing FieldMapProcessor.
Added PutTransformer interface class and ToStringPutTransformer implementation.
Added DelegatingOutputFormat that uses a FieldMapProcessor.
Added HBase deps to build.xml via hbase.home property.
Added HBase, ZooKeeper to the dependency net added by configure-sqoop.
Added HBaseImportJob, HBaseImportMapper.
ImportJobBase now has jobSetup() step executed just before job submission.
ImportJobContext now holds a reference to the ConnManager.
DataDrivenImportJob retrieves ConnManager from ImportJobContext; it no longer
creates a new one.
Added HBase table import configuration parameters to SqoopOptions, ImportTool.
SqlManager.importQuery() needs to set ConnManager in ImportJobContext.
Added HBase import user documentation.
Described PutTransformer API in developer docs.
Added HBase unit tests.
Added ANT_ARGUMENTS env variable to Hudson test scripts to allow freeform parameters.
Added HBASE_HOME and ZOOKEEPER_HOME variables to hudson scripts.
From: Aaron Kimball <aaron@cloudera.com>
git-svn-id: https://svn.apache.org/repos/asf/incubator/sqoop/trunk@1149935 13f79535-47bb-0310-9956-ffa450edef68
Ensure that tests involving dates/times use proper ANSI SQL Date/Time escape
formatting (yyyy-mm-dd or hh:mm:ss). After Java 1.6u17, dates of the form
yyyy-m-dd or hh:m:ss are not parsed by java.sql.Date/Time and throw
IllegalArgumentException.
From: Aaron Kimball <aaron@cloudera.com>
git-svn-id: https://svn.apache.org/repos/asf/incubator/sqoop/trunk@1149934 13f79535-47bb-0310-9956-ffa450edef68
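A small sketch of producing the zero-padded escape formats mentioned above before handing them to java.sql.Date.valueOf and java.sql.Time.valueOf. The helper class is hypothetical; the padded forms are accepted regardless of how a given JDK treats unpadded input.

```java
import java.sql.Date;
import java.sql.Time;
import java.util.Locale;

// Hypothetical helper, for illustration only.
final class SqlEscapeFormatSketch {

  /** Builds a java.sql.Date from a zero-padded yyyy-mm-dd string. */
  static Date date(int year, int month, int day) {
    return Date.valueOf(
        String.format(Locale.ROOT, "%04d-%02d-%02d", year, month, day));
  }

  /** Builds a java.sql.Time from a zero-padded hh:mm:ss string. */
  static Time time(int hour, int minute, int second) {
    return Time.valueOf(
        String.format(Locale.ROOT, "%02d:%02d:%02d", hour, minute, second));
  }
}
```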
Add --update-key argument to sqoop-export tool.
Refactor ExportOutputFormat into AsyncSqlOutputFormat.
Added UpdateOutputFormat.
ClassWriter now allows alternate serialization order in database write() method.
SqoopOptions holds column list for alternate db serialization order.
Added TestExportUpdate unit test battery.
AsyncSqlRecordWriter now allows "batch" execution mode.
Updated documentation for export updates.
From: Aaron Kimball <aaron@cloudera.com>
git-svn-id: https://svn.apache.org/repos/asf/incubator/sqoop/trunk@1149933 13f79535-47bb-0310-9956-ffa450edef68
Add ConnManager.importQuery() API.
Change BaseSqoopTool.DEBUG_SQL_CMD_ARG to SQL_QUERY_ARG to reflect
the broader applicability of the argument.
Change 'debugSqlCmd' member of SqoopOptions to 'sqlQuery'.
CompilationManager now sets jar name based on specified class name.
Add tests for query support.
Add documentation for query-based import.
From: Aaron Kimball <aaron@cloudera.com>
git-svn-id: https://svn.apache.org/repos/asf/incubator/sqoop/trunk@1149932 13f79535-47bb-0310-9956-ffa450edef68
Moves TaskId to com.cloudera.sqoop.util.
Add com.cloudera.sqoop.lib.DelimiterSet.
Rewrite FieldFormatter, RecordParser, to use DelimiterSet.
Add generated class version id to SqoopRecord.
From: Aaron Kimball <aaron@cloudera.com>
git-svn-id: https://svn.apache.org/repos/asf/incubator/sqoop/trunk@1149907 13f79535-47bb-0310-9956-ffa450edef68
Introduce LobFile format for storing large objects.
Implemented LobFile.Reader, LobFile.Writer classes.
Added a performance test of LobFile reading/writing speed.
Build system: fix cobertura build deps.
Remove unused utility classes from o.a.h.s.io.
Use LobFile for external storage in {B,C}lobRef.
Added LobReaderCache.
Converted BlobRef to read from LobFiles (through LobReaderCache).
LargeObjectLoader writes to LobFiles.
Common code from BlobRef and ClobRef factored out into LobRef abstract
base class.
Updated Test{B,C}lobRef and TestLargeObjectLoader for new external LOB storage.
Updated *ImportMappers to close LargeObjectLoaders when they're done.
Added performance tests to build.
Added script to run perf tests; factored out common logic into config script.
Fixed ivy dependency resolution to use multiple configuration inheritance.
Added LobFileStressTest.
Added readme with instructions to src/perftest directory.
Added CodecMap that abstracts compression codec classes to names.
From: Aaron Kimball <aaron@cloudera.com>
git-svn-id: https://svn.apache.org/repos/asf/incubator/sqoop/trunk@1149897 13f79535-47bb-0310-9956-ffa450edef68
Introduced SqoopTool interface.
Added cli package for option parsing:
includes RelatedOptions, ToolOptions, SqoopParser.
'Sqoop' is now a wrapper that invokes a SqoopTool.
Added setter methods for all fields of SqoopOptions.
Added commons-cli 1.2 build dependency.
Argument parsing is removed from SqoopOptions and pushed into individual tools.
Added HelpTool to display basic usage information for Sqoop and
usage for subcommands.
Added ImportTool to perform imports.
Added EvalSqlTool.
Added ExportTool.
Added ImportAllTablesTool.
Added ListDatabasesTool, ListTablesTool.
Added CodeGenTool.
Added CreateHiveTableTool.
Small changes to orm.ClassWriter.
Auto-generate bin scripts for all tools; include in release package.
Allow user to provide build properties in a file.
Shim use of GenericOptionsParser to allow cross-compilation.
Fix Hive testcases to pass under CDH.
From: Aaron Kimball <aaron@cloudera.com>
git-svn-id: https://svn.apache.org/repos/asf/incubator/sqoop/trunk@1149894 13f79535-47bb-0310-9956-ffa450edef68
OracleConnManager now nullifies references to connection instances
after returning them to the connection cache or closing them.
From: Aaron Kimball <aaron@cloudera.com>
git-svn-id: https://svn.apache.org/repos/asf/incubator/sqoop/trunk@1149891 13f79535-47bb-0310-9956-ffa450edef68
All classes which depend on MapReduce APIs which change from
interfaces to classes between 0.20 and 0.22 are moved to distribution-
specific shim jars.
"Common" shim classes are now compiled multiple times against different
Hadoop distributions.
Shim classes are broken out into separate jars; ShimLoader now picks
the appropriate jar to load at runtime.
Configuration constants moved into HadoopShim.
BlobRef/ClobRef methods changed to use Mapper.Context for binary compatibility.
From: Aaron Kimball <aaron@cloudera.com>
git-svn-id: https://svn.apache.org/repos/asf/incubator/sqoop/trunk@1149884 13f79535-47bb-0310-9956-ffa450edef68
Version-incompatible code now moved to HadoopShim subclasses.
HadoopShim singleton instance dynamically loaded based on VersionInfo.
Separate MRUnit builds from Apache and CDH placed in /lib subdirs.
Modified 'ant package' target to properly include all shims.
From: Aaron Kimball <aaron@cloudera.com>
git-svn-id: https://svn.apache.org/repos/asf/incubator/sqoop/trunk@1149880 13f79535-47bb-0310-9956-ffa450edef68
Using --direct in conjunction with --export-dir on a MySQL database will use
mysqlimport to emit the data to the database.
DirectMySQLManager now creates instances of MySQLExportJob.
src/test/.../MySQLUtils is renamed to MySQLTestUtils to avoid conflict with
src/java/.../MySQLUtils added by this patch.
MySQLUtils contains methods factored out of import-specific code for sharing
with exports.
From: Aaron Kimball <aaron@cloudera.com>
git-svn-id: https://svn.apache.org/repos/asf/incubator/sqoop/trunk@1149877 13f79535-47bb-0310-9956-ffa450edef68
Some spurious warnings (and inconsequential warnings in test code)
have been disabled by src/test/findbugsExcludeFile.xml.
From: Aaron Kimball <aaron@cloudera.com>
git-svn-id: https://svn.apache.org/repos/asf/incubator/sqoop/trunk@1149874 13f79535-47bb-0310-9956-ffa450edef68
OracleManager now caches Connection instances for subsequent OracleManager
instances.
Refactored uses of ConnManager to call close() before discarding them.
This allows the Oracle JUnit tests to sleep less frequently to wait for Oracle
to reap closed server-side connection resources, improving Oracle test speed
by 50%.
Sleeping cannot be fully eliminated because MapReduce-side Connections are not
governed by this caching mechanism.
Also added some debugging advice on this topic to the comment in OracleManagerTest.
From: Aaron Kimball <aaron@cloudera.com>
git-svn-id: https://svn.apache.org/repos/asf/incubator/sqoop/trunk@1149872 13f79535-47bb-0310-9956-ffa450edef68
Uses CombineFileInputFormat to run exports over a target number
of mappers independent of the number of input files.
From: Aaron Kimball <aaron@cloudera.com>
git-svn-id: https://svn.apache.org/repos/asf/incubator/sqoop/trunk@1149869 13f79535-47bb-0310-9956-ffa450edef68
CLOB/BLOB data may now be stored in additional files in HDFS which are
accessible through streams if the data cannot be fully materialized in RAM.
Adds tests for external large objects.
Refactored large object loading into the map() method from readFields().
From: Aaron Kimball <aaron@cloudera.com>
git-svn-id: https://svn.apache.org/repos/asf/incubator/sqoop/trunk@1149866 13f79535-47bb-0310-9956-ffa450edef68
Major refactoring of DataDrivenImportJob to support mysqldump in mappers.
ImportJobBase added below DataDrivenImportJob.
MySQLDumpImportJob added on top of ImportJobBase.
LocalMySQLManager renamed to DirectMySQLManager; it now just runs MySQLDumpImportJob.
MySQLDumpImportJob configures MySQLDumpMapper to run mysqldump instances on
multiple nodes and is split-aware (via MySQLDumpInputFormat).
TestImportJob works with new ImportJobBase framework.
Added test that imports a subset of columns in mysql imports.
From: Aaron Kimball <aaron@cloudera.com>
git-svn-id: https://svn.apache.org/repos/asf/incubator/sqoop/trunk@1149865 13f79535-47bb-0310-9956-ffa450edef68