@raghujuluri: June 2013

Using Oracle NoSQL Database with Cloudera Distribution for Hadoop

By Deepak Vohra

Get a test project up and running to explore the basic principles involved.

Introduced in 2011, Oracle NoSQL Database is a highly available, highly scalable key/value (nonrelational) database that supports CRUD operations through a Java API. A related technology, the Hadoop MapReduce framework, provides a distributed environment for developing applications that process large quantities of data in parallel on large clusters.

In this article we discuss integrating Oracle NoSQL Database with Cloudera Distribution for Hadoop (CDH) on Windows via an Oracle JDeveloper project. We will also demonstrate processing the NoSQL Database data in Hadoop using a MapReduce job.

Setup

The following software is required for this project. Download and install anything on the list you don’t already have according to the respective instructions.

Oracle NoSQL Database, Community Edition
Oracle JDeveloper 11.1.2.1
Cygwin
CDH2 (or Apache Hadoop 0.22.0)
Java SE 7

Install Java SE 7 in a directory whose full path contains no spaces, and set the JAVA_HOME environment variable to that directory.
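
A quick way to confirm the setting is to read JAVA_HOME from a small Java program. The sketch below only checks the two conditions named above (the variable is set and its path has no spaces); the class and method names are made up for illustration.

```java
class JavaHomeCheckSketch {

    // Returns true when the path is set, non-empty, and contains no space characters.
    static boolean isUsableJavaHome(String javaHome) {
        return javaHome != null
                && !javaHome.trim().isEmpty()
                && javaHome.indexOf(' ') < 0;
    }

    public static void main(String[] args) {
        // System.getenv returns null when the variable is not defined.
        String javaHome = System.getenv("JAVA_HOME");
        System.out.println("JAVA_HOME = " + javaHome);
        System.out.println("usable    = " + isUsableJavaHome(javaHome));
    }
}
```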

Configuring Oracle NoSQL Database in Oracle JDeveloper

First, we’ll need to configure the NoSQL database server as an external tool in JDeveloper. Select Tools>External Tools. In the External Tools window select New. In the Create External Tool wizard select Tool Type: External Program and click Next. In Program Options, specify the following program options.

Field	Value
Program Executable	C:\JDK7\Java\jdk1.7.0_05\bin\java.exe
Arguments	-jar ./lib/kvstore-1.2.123.jar kvlite
Run Directory	C:\OracleNoSQL\kv-1.2.123

Click Finish in the Create External Tool wizard.

Oracle NoSQL Database is now configured as an external tool; the external tool name may vary based on whether other tools requiring the same program executable are also configured. Click on OK in External Tools.

Next, select Tools>Java 1. The Oracle NoSQL Database server starts up and a key-value (KV) store is created.

The NoSQL Database store has the following args by default:

Arg	Value
-root	kvroot
-store	kvstore
-host	localhost
-port	5000
-admin	5001

On subsequent runs of the external tool for the NoSQL Database server, the existing KV store is opened with the same configuration with which it was created.
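
Before running the examples it can help to confirm that something is listening on the store's port. The sketch below is a plain TCP check using only the JDK; it does not use the Oracle NoSQL API, and an open port shows only that a process accepted the connection, not that the store is fully initialized. The class and method names are illustrative.

```java
import java.io.IOException;
import java.net.InetSocketAddress;
import java.net.Socket;

class PortCheckSketch {

    // Tries to open a TCP connection; returns false on refusal or timeout.
    static boolean isListening(String host, int port, int timeoutMillis) {
        try (Socket socket = new Socket()) {
            socket.connect(new InetSocketAddress(host, port), timeoutMillis);
            return true;
        } catch (IOException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        // 5000 and 5001 are the -port and -admin defaults listed in the table above.
        System.out.println("port 5000 open: " + isListening("localhost", 5000, 1000));
        System.out.println("port 5001 open: " + isListening("localhost", 5001, 1000));
    }
}
```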

Running the HelloBigDataWorld Example

The NoSQL Database package includes some examples in the C:\OracleNoSQL\kv-1.2.123\examples directory. We will run the following examples in this article:

hello.HelloBigDataWorld
hadoop.CountMinorKeys

The HelloBigDataWorld example can be run using an external tool configuration or as a Java application.

Using as an External Tool

To run HelloBigDataWorld as an external tool select Tools>External Tools and create a new external tool configuration with the same procedure as with the NoSQL Database server. We need to create two configurations, one for compiling the HelloBigDataWorld file and another for running the compiled application. Specify the following program options for compiling HelloBigDataWorld.

Program Option	Value
Program Executable	C:\JDK7\Java\jdk1.7.0_05\bin\javac.exe
Arguments	-cp ./examples;./lib/kvclient-1.2.123.jar examples/hello/HelloBigDataWorld.java
Run Directory	C:/OracleNoSQL/kv-1.2.123

Enter the program options for compiling the hello/HelloBigDataWorld.java file as listed above, then click Finish.

An external tool named Javac is created. Select Tools>Javac to compile the hello/HelloBigDataWorld.java file. Next, create an external tool for running the hello.HelloBigDataWorld class file using the following configuration.

Program Option	Value
Program Executable	C:\JDK7\Java\jdk1.7.0_05\bin\java.exe
Arguments	-cp ./examples;./lib/kvclient-1.2.123.jar hello.HelloBigDataWorld
Run Directory	C:/OracleNoSQL/kv-1.2.123

The classpath should include the kvclient-1.2.123.jar file. Click Finish.

To run the hello.HelloBigDataWorld class select Tools>Java. The hello.HelloBigDataWorld application runs and a short message is written.

Running in a Java Application

Next, we will run the hello.HelloBigDataWorld application as a Java application in an Oracle JDeveloper project. To create a new application:

Select Java Desktop Application in New Gallery.
Specify an Application Name (e.g., NoSQLDB) and select the default directory. Click Next.
Specify a Project Name (e.g., NoSQLDB) and click Finish.

Next, create a Java class in the project.

Select Java Class in New Gallery and click OK.
In Create Java Class specify class name as “HelloBigDataWorld” and package as “hello”. Click OK. The hello.HelloBigDataWorld class is added to the application.
Copy the contents of the hello/HelloBigDataWorld.java file from the C:\OracleNoSQL\kv-1.2.123\examples directory into the new class file in Oracle JDeveloper.

In the example application, a new oracle.kv.KVStore is created using the KVStoreFactory class:

store = KVStoreFactory.getStore(new KVStoreConfig(storeName, hostName + ":" + hostPort));

Key/value pairs are created and stored in the KV store:

final String keyString = "Hello";
final String valueString = "Big Data World!";
store.put(Key.createKey(keyString), Value.createValue(valueString.getBytes()));

The key/value pair is retrieved from the store and printed. The KV store is then closed.

final ValueVersion valueVersion = store.get(Key.createKey(keyString));
System.out.println(keyString + " " + new String(valueVersion.getValue().getValue())+ "\n ");
store.close();
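
The put and get calls above pass the value through as a byte array, so the string has to be encoded on the way in and decoded on the way out. The sketch below illustrates that round trip with a hypothetical in-memory map standing in for the KV store; it is not the oracle.kv API. It also names the charset explicitly, because getBytes() and new String(byte[]) without a charset use the platform default.

```java
import java.nio.charset.StandardCharsets;
import java.util.HashMap;
import java.util.Map;

class ValueBytesSketch {

    // Hypothetical in-memory stand-in for a key/value store (NOT oracle.kv.KVStore).
    private final Map<String, byte[]> data = new HashMap<String, byte[]>();

    // Encode the string value to bytes before storing it.
    void put(String key, String value) {
        data.put(key, value.getBytes(StandardCharsets.UTF_8));
    }

    // Decode the stored bytes back to a string; null when the key is absent.
    String get(String key) {
        byte[] raw = data.get(key);
        return raw == null ? null : new String(raw, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        ValueBytesSketch store = new ValueBytesSketch();
        store.put("Hello", "Big Data World!");
        System.out.println("Hello " + store.get("Hello"));
    }
}
```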

To run the HelloBigDataWorld class add the C:\OracleNoSQL\kv-1.2.123\lib\kvclient-1.2.123.jar file to the Libraries and Classpath.

To run the application right-click on the class and select Run. The hello.HelloBigDataWorld class runs and one line of output is generated. The example application creates only one key/value pair.

In the next section we will run the hadoop.CountMinorKeys example. To prepare for that, rerun the HelloBigDataWorld example with different key strings to create additional key/value pairs in the KV store, since a put with an existing key replaces its value.

Processing NoSQL Database Data in Hadoop

Next, we will run the Hadoop example in C:\OracleNoSQL\kv-1.2.123\examples\hadoop\CountMinorKeys.java. Create a Java class called hadoop/CountMinorKeys.java and copy the \examples\hadoop\CountMinorKeys.java file to that class.

Add the CDH jar file to the project.

Configuring the Hadoop Cluster

Next, we will configure the Hadoop cluster. In CDH2 there are three configuration files: core-site.xml, mapred-site.xml, and hdfs-site.xml. In conf/core-site.xml specify the fs.default.name parameter, which is the URI of the NameNode.

        <property>
          <name>fs.default.name</name>
          <value>hdfs://localhost:9100</value>
        </property>

This property goes inside the <configuration> element of conf/core-site.xml.

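In conf/mapred-site.xml specify the mapred.job.tracker parameter, which gives the host (or IP address) and port of the JobTracker. Specify the host as localhost and the port as 9101.

        <property>
          <name>mapred.job.tracker</name>
          <value>localhost:9101</value>
        </property>

This property goes inside the <configuration> element of conf/mapred-site.xml.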

Specify the dfs.replication parameter in the conf/hdfs-site.xml configuration file. The dfs.replication parameter specifies how many machines a single file should be replicated to before becoming available. The value should not exceed the number of DataNodes. (We use one DataNode in this example.)

        <property>
          <name>dfs.replication</name>
          <value>1</value>
        </property>

This property goes inside the <configuration> element of conf/hdfs-site.xml.
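
Hadoop's *-site.xml files share one layout: a <configuration> root containing <property> elements, each with a <name> and a <value>. As a rough illustration of that layout, the sketch below reads name/value pairs with the JDK's XML parser. It does not use Hadoop's own Configuration class, the class name is made up, and for brevity the sample string combines the three properties, which in practice live in three separate files.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import java.util.LinkedHashMap;
import java.util.Map;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;

class HadoopConfSketch {

    // Collects every <property><name>/<value> pair in document order.
    static Map<String, String> readProperties(String xml) throws Exception {
        DocumentBuilder builder = DocumentBuilderFactory.newInstance().newDocumentBuilder();
        Document doc = builder.parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
        Map<String, String> properties = new LinkedHashMap<String, String>();
        NodeList nodes = doc.getElementsByTagName("property");
        for (int i = 0; i < nodes.getLength(); i++) {
            Element property = (Element) nodes.item(i);
            String name = property.getElementsByTagName("name").item(0).getTextContent().trim();
            String value = property.getElementsByTagName("value").item(0).getTextContent().trim();
            properties.put(name, value);
        }
        return properties;
    }

    public static void main(String[] args) throws Exception {
        // Sample only: the values are the ones used in this article.
        String xml = "<configuration>"
                + "<property><name>fs.default.name</name><value>hdfs://localhost:9100</value></property>"
                + "<property><name>mapred.job.tracker</name><value>localhost:9101</value></property>"
                + "<property><name>dfs.replication</name><value>1</value></property>"
                + "</configuration>";
        for (Map.Entry<String, String> entry : readProperties(xml).entrySet()) {
            System.out.println(entry.getKey() + " = " + entry.getValue());
        }
    }
}
```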

Having configured the Hadoop cluster, we can start it. First, however, we need to format the NameNode, which creates the Hadoop Distributed File System (HDFS) storage used by the job. Run the following commands in Cygwin.

> cd hadoop-0.20.1+169.127
> bin/hadoop namenode -format

A storage directory, \tmp\hadoop-dvohra\dfs, is created.

We also need to create a deployment profile for the hadoop.CountMinorKeys application.

Select the project node in Application Navigator and select File>New.
In New Gallery select Deployment Profiles>JAR File and click OK.
In Create Deployment Profile, specify a Deployment Profile Name (hadoop) and click OK.
In Edit JAR Deployment Profile Properties, accept the default settings and click OK.
A new deployment profile is created. Click OK.

To deploy the deployment profile, right-click on the NoSQLDB project and select Deploy>hadoop.

In Deployment Action, select Deploy to JAR file and click Next. Click Finish in Summary. The hadoop.jar gets deployed to the deploy directory in the JDeveloper project. Copy the hadoop.jar to the C:\cygwin\home\dvohra\hadoop-0.20.1+169.127 directory as the application shall be run from the hadoop-0.20.1+169.127 directory in Cygwin.

Starting the Hadoop Cluster

Typically a multi-node Hadoop cluster consists of the following nodes.

Node Name	Function	Type
NameNode	For the HDFS storage layer management. We formatted the NameNode to create a storage layer in the previous section.	master
JobTracker	MapReduce data processing management; assigns tasks	master
DataNode	Stores filesystem data, HDFS storage layer processing	slave
TaskTracker	MapReduce processing	slave
Secondary NameNode	Stores modifications to the filesystem and periodically merges the changes with the current HDFS state.

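Next, we shall start the nodes in the cluster. Each of the commands below runs its daemon in the foreground, so use a separate Cygwin window for each one. To start the NameNode run the following commands in Cygwin.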

> cd hadoop-0.20.1+169.127
> bin/hadoop namenode

Start the Secondary NameNode with the following commands:

> cd hadoop-0.20.1+169.127
> bin/hadoop secondarynamenode

Start the DataNode:

> cd hadoop-0.20.1+169.127
> bin/hadoop datanode

Start the JobTracker:

> cd hadoop-0.20.1+169.127
> bin/hadoop jobtracker

Start the TaskTracker:

> cd hadoop-0.20.1+169.127
> bin/hadoop tasktracker

Running a MapReduce Job

Next, we shall run the hadoop.CountMinorKeys application, for which we created the hadoop.jar file. The hadoop.CountMinorKeys application runs a MapReduce job on the Oracle NoSQL Database data in the KV store and generates its output in HDFS. The NoSQL Database Java API is in the kvclient-1.2.123.jar file. Copy kvclient-1.2.123.jar from the C:\OracleNoSQL\kv-1.2.123\lib directory to the C:\cygwin\home\dvohra\hadoop-0.20.1+169.127\lib directory, which is on the Hadoop classpath. Run hadoop.jar with the following commands in Cygwin.

> cd hadoop-0.20.1+169.127
> bin/hadoop jar hadoop.jar hadoop.CountMinorKeys kvstore dvohra-PC:5000 hdfs://localhost:9100/tmp/hadoop/output/

The MapReduce job runs and the output is generated in the hdfs://localhost:9100/tmp/hadoop/output/ directory.
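
The output location is an ordinary hierarchical URI whose host and port match the fs.default.name value configured earlier. As a small illustration, the sketch below takes the URI apart with java.net.URI and builds the path of a file inside the output directory; it does not contact HDFS, and the class and method names are made up.

```java
import java.net.URI;

class HdfsUriSketch {

    // Resolves a file name against an output directory URI.
    // URI.resolve replaces the last path segment unless the base ends with "/",
    // so a trailing slash is added when it is missing.
    static URI outputFile(String outputDir, String fileName) {
        String base = outputDir.endsWith("/") ? outputDir : outputDir + "/";
        return URI.create(base).resolve(fileName);
    }

    public static void main(String[] args) {
        URI dir = URI.create("hdfs://localhost:9100/tmp/hadoop/output/");
        System.out.println("scheme: " + dir.getScheme());
        System.out.println("host:   " + dir.getHost());
        System.out.println("port:   " + dir.getPort());
        System.out.println("path:   " + dir.getPath());
        System.out.println("file:   " + outputFile(dir.toString(), "part-r-00000"));
    }
}
```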

List the files in the /tmp/hadoop/output directory with the following command.

> bin/hadoop dfs -ls hdfs://localhost:9100/tmp/hadoop/output

The MapReduce job output is generated in the part-r-00000 file, which gets listed with the previous command.

Copy the part-r-00000 file to the local filesystem with the command:

> bin/hadoop dfs -get hdfs://localhost:9100/tmp/hadoop/output/part-r-00000 part-r-00000
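
Once the file is on the local filesystem it can be read like any text file. The sketch below assumes the job wrote Hadoop's usual text output, one key and one count per line separated by a tab; that format is an assumption here, as are the sample keys and the class name.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.util.LinkedHashMap;
import java.util.Map;

class PartFileSketch {

    // Parses "key<TAB>count" lines; lines without a tab are skipped.
    static Map<String, Long> parse(Reader source) throws IOException {
        Map<String, Long> counts = new LinkedHashMap<String, Long>();
        BufferedReader reader = new BufferedReader(source);
        String line;
        while ((line = reader.readLine()) != null) {
            int tab = line.lastIndexOf('\t');
            if (tab < 0) {
                continue;
            }
            String key = line.substring(0, tab);
            long count = Long.parseLong(line.substring(tab + 1).trim());
            counts.put(key, count);
        }
        return counts;
    }

    public static void main(String[] args) throws IOException {
        // Hypothetical sample content; real data comes from part-r-00000.
        String sample = "/Hello\t1\n/Hi\t2\n";
        for (Map.Entry<String, Long> entry : parse(new StringReader(sample)).entrySet()) {
            System.out.println(entry.getKey() + " -> " + entry.getValue());
        }
    }
}
```

To read the downloaded file instead of the sample string, pass new java.io.FileReader("part-r-00000") to parse.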

The MapReduce job output can be viewed in Oracle JDeveloper; it lists the number of records for each major key in the KV store, which was populated by the first example application, hello.HelloBigDataWorld.
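
To make the counting concrete, here is a plain-Java sketch of the same idea: reduce each key to its major path (the "map" step) and sum the occurrences per major path (the "reduce" step). It uses neither Hadoop nor the oracle.kv classes and is not the CountMinorKeys source. It assumes keys are written as strings of the form /major/.../-/minor/..., with "/-/" separating the major path from the minor path; the sample keys are invented.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

class MajorKeyCountSketch {

    // "Map" step: keep only the major path of a key string.
    // Keys without a "/-/" separator have no minor path and are returned unchanged.
    static String majorPath(String keyPath) {
        int separator = keyPath.indexOf("/-/");
        return separator >= 0 ? keyPath.substring(0, separator) : keyPath;
    }

    // "Reduce" step: count how many records share each major path.
    static Map<String, Integer> countByMajorKey(List<String> keyPaths) {
        Map<String, Integer> counts = new TreeMap<String, Integer>();
        for (String keyPath : keyPaths) {
            String major = majorPath(keyPath);
            Integer current = counts.get(major);
            counts.put(major, current == null ? 1 : current + 1);
        }
        return counts;
    }

    public static void main(String[] args) {
        // Hypothetical keys: one record under /Hello, two under /Smith/Bob.
        List<String> keys = Arrays.asList("/Hello", "/Smith/Bob/-/phone", "/Smith/Bob/-/email");
        for (Map.Entry<String, Integer> entry : countByMajorKey(keys).entrySet()) {
            System.out.println(entry.getKey() + "\t" + entry.getValue());
        }
    }
}
```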

Congratulations, your project is complete!

Create Eclipse (Juno) Plugin for Hadoop

Visit https://github.com/mkamithkumar/hadoop-eclipse-plugins to check whether an Eclipse plugin is already available for your Hadoop version. If not, follow the steps below to build your own.

Prerequisites:

Install Java (jdk1.7.0_21 is used here)
Install Hadoop; see https://docs.google.com/document/d/1v-J19xwJn-Pw9F8OCgLn04dqKwYkGIOxyqRAIGpmHk0/edit?pli=1
Install Eclipse (Juno is used here)

Steps:

1. Open Eclipse from the terminal or the dashboard.

2. In Eclipse, click File>Import, select General>Existing Projects into Workspace, and click Next.

3. Set Select root directory to ${YOUR_HADOOP_HOME}/src/contrib/eclipse-plugin, then under Projects select MapReduceTools and click Finish. (Do not check any options in this window, such as Copy projects into workspace.)

4. The MapReduceTools project appears in the Project Explorer window. Right-click it and select Build Path>Configure Build Path.

5. In the window that opens, under Java Build Path>Libraries, double-click hadoop-core-{version}.jar and browse to ${YOUR_HADOOP_HOME}/hadoop-core-{version}.jar. Click OK to confirm.

6. Now open build.xml under MapReduceTools project and add below lines just after

/*some jar files inclusion*/

7. In build.xml add below line just after .

8. In build.xml replace lines

with

9. Save build.xml and close this file.

10. Now open MANIFEST.MF under the MapReduceTools/META-INF folder and replace the Bundle-ClassPath: line with the lines below (each continuation line must begin with a single space). Save and close the file.

Bundle-ClassPath: classes/,
 lib/hadoop-core.jar,
 lib/commons-cli-1.2.jar,
 lib/commons-configuration-1.6.jar,
 lib/commons-httpclient-3.0.1.jar,
 lib/commons-lang-2.4.jar,
 lib/jackson-core-asl-1.8.8.jar,
 lib/jackson-mapper-asl-1.8.8.jar
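
A manifest header that spans several lines is only read as one value when every continuation line starts with a space. One way to check an edited Bundle-ClassPath is to parse it with the JDK's java.util.jar.Manifest class, as in the sketch below. The class name is made up, and the sample uses only the first three entries from the list above.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.util.jar.Manifest;

class BundleClassPathSketch {

    // Parses manifest text and returns the Bundle-ClassPath entries in order.
    static String[] bundleClassPath(String manifestText) throws IOException {
        Manifest manifest = new Manifest(
                new ByteArrayInputStream(manifestText.getBytes(StandardCharsets.UTF_8)));
        String value = manifest.getMainAttributes().getValue("Bundle-ClassPath");
        return value == null ? new String[0] : value.split("\\s*,\\s*");
    }

    public static void main(String[] args) throws IOException {
        // Continuation lines begin with one space; the last line ends with a newline.
        String text = "Manifest-Version: 1.0\n"
                + "Bundle-ClassPath: classes/,\n"
                + " lib/hadoop-core.jar,\n"
                + " lib/commons-cli-1.2.jar\n";
        for (String entry : bundleClassPath(text)) {
            System.out.println(entry);
        }
    }
}
```

To check the real file, read MANIFEST.MF with a java.io.FileInputStream and pass that stream to the Manifest constructor.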

11. Next, open a terminal and edit the build-contrib.xml file:

hduser@Hadoop:~$ sudo gedit ${YOUR_HADOOP_HOME}/src/contrib/build-contrib.xml

12. In build-contrib.xml, add the two lines below

<!-- This is Hadoop Version -->

<!-- This is eclipse folder path -->

just after

13. Now return to Eclipse, right-click the build.xml file in Project Explorer, and click Run As>Ant Build. After a successful build the jar file can be found in the

${YOUR_HADOOP_HOME}/build/contrib/eclipse-plugin folder.

14. Copy the compiled jar file to the Eclipse plugins directory and enjoy.

Reference:

http://iredlof.com/part-4-compile-hadoop-v1-0-4-eclipse-plugin-on-ubuntu-12-10/

Friday, June 28, 2013

Oracle NoSQL Database with Cloudera Distribution for Hadoop

Sunday, June 02, 2013

Create Eclipse(Juno) Plugin for Hadoop
