How to install Sqoop
Sqoop stands for "SQL to Hadoop" and is a connector to the relational world outside of Hadoop. The install is nice and easy; there's a couple of ways to do it, but we're going to try the easiest way, which is to add the Cloudera Distribution for Hadoop (CDH) repository and use the built-in apt-get commands.
Step 1: Add the CDH repository and update
1) Create a file called /etc/apt/sources.list.d/cloudera.list
2) Find out what distribution of Linux you are running:
$ lsb_release -a
3) Add the following to the file created in step 1, replacing MY_DIST with the distribution name found in step 2:
deb http://archive.cloudera.com/debian MY_DIST-cdh3 contrib
deb-src http://archive.cloudera.com/debian MY_DIST-cdh3 contrib
e.g. running lucid you would need:
deb http://archive.cloudera.com/debian lucid-cdh3 contrib
deb-src http://archive.cloudera.com/debian lucid-cdh3 contrib
4) Run sudo apt-get update (steps 1-4 can also be done in one go, as sketched below)
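If you'd rather not edit the file by hand, here's a minimal sketch that does steps 1-4 in one go. It assumes an Ubuntu/Debian host where lsb_release -cs prints the codename:
$ DISTRO=$(lsb_release -cs)
$ echo "deb http://archive.cloudera.com/debian ${DISTRO}-cdh3 contrib" | sudo tee /etc/apt/sources.list.d/cloudera.list
$ echo "deb-src http://archive.cloudera.com/debian ${DISTRO}-cdh3 contrib" | sudo tee -a /etc/apt/sources.list.d/cloudera.list
$ sudo apt-get update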
5) Install curl:
$ sudo apt-get install curl
6) Add the Cloudera public key to your repository:
$ curl -s http://archive.cloudera.com/debian/archive.key | sudo apt-key add -
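To check the key registered (the exact output format varies between apt versions):
$ apt-key list | grep -i cloudera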
Step 2: Update the local repository and install Sqoop
$ sudo apt-get update
$ sudo apt-get install sqoop
In fact, now that we have added the CDH3 repository, we can install many of the other parts of the Hadoop eco-system (e.g. Hive, Pig etc.) just by using the sudo apt-get install command. That's it, Sqoop is installed. If you have MySQL locally on the same machine you can run commands such as:
$ sqoop export --connect jdbc:mysql://localhost/htest --table hresult --username mscbi --password mscbi --export-dir /user/bigdatablog/outdir
This will export the contents of the files in /user/bigdatablog/outdir to a table called hresult in the database htest.
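Going the other way is just as easy; here's a sketch of an import into HDFS, reusing the same hypothetical table and credentials as above (the target directory is made up for illustration):
$ sqoop import --connect jdbc:mysql://localhost/htest --table hresult --username mscbi --password mscbi --target-dir /user/bigdatablog/indir -m 1
The -m 1 runs a single mapper, which keeps things simple on a one-machine setup.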
Step 3: Install the SQL Server connector (if you need it)
There's a connector for SQL Server distributed by Microsoft; you need to download the JDBC driver from microsoft.com. Then installation is, again, pretty simple. Run the following:
$ export SQOOP_HOME=/usr/lib/sqoop
$ export SQOOP_CONF_DIR=/usr/lib/sqoop/conf
Also add the commands to your .bashrc file (see installing Hadoop for instructions on how to edit this file), for example as sketched below.
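One way to persist the two variables (this simply appends to ~/.bashrc; adjust if you use a different shell):
$ echo 'export SQOOP_HOME=/usr/lib/sqoop' >> ~/.bashrc
$ echo 'export SQOOP_CONF_DIR=/usr/lib/sqoop/conf' >> ~/.bashrc
$ source ~/.bashrc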
Go to microsoft.com and get the tar.gz file for the SQL Server JDBC driver, then go to the Downloads directory, unzip the tar and copy the jar file to /usr/lib/sqoop/lib:
$ cd Downloads (assumes starting from the home directory)
$ tar -zxvf sqljdbc_3.0.1301.101_enu.tar.gz (check the version number you downloaded)
$ sudo cp sqljdbc_3.0/enu/sqljdbc4.jar /usr/lib/sqoop/lib
Now we have the JDBC driver in place, we're ready to install the SQL Server connector. Assuming you downloaded it in the step above, unzip the tar, move the directory to /usr/local, set the environment variable and permissions, then install it:
$ tar -zxvf sqoop-sqlserver-1.0.tar.gz
$ sudo mv sqoop-sqlserver-1.0 /usr/local
$ export MSSQL_CONNECTOR_HOME=/usr/local/sqoop-sqlserver-1.0
In the Install Hadoop example, we set up a Hadoop group; change the ownership of the sqoop directories now and you'll save time fiddling around with permissions later.
$ sudo chown -R bigdatablog:hadoop /usr/lib/sqoop
$ sudo chown -R bigdatablog:hadoop /etc/sqoop
Now install the connector:
$ cd /usr/local/sqoop-sqlserver-1.0
$ ./install.sh
That should be it. The following connects to the server UKTAG1218, logging on as ID = dba with password dba (BTW, it's not a production system with that ID and password):
$ sqoop export --connect 'jdbc:sqlserver://uktag1218;username=dba;password=dba;database=MS_test' --export-dir outdir --fields-terminated-by '\t' --table htest
This will take the files in the HDFS directory outdir and load them into a table called htest in a database called MS_test. Note that --fields-terminated-by '\t' splits the file into columns separated by tabs; the default is to load CSV files.
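To get a feel for what Sqoop expects, here's a purely illustrative peek at a tab-separated input file (the file name and rows are hypothetical):
$ hadoop fs -cat outdir/part-m-00000 | head -2
1	alpha	100
2	beta	200
Each line becomes one row of htest, with each tab-separated field mapped to a column in order.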