Friday, June 28, 2013


Using Oracle NoSQL Database with Cloudera Distribution for Hadoop

By Deepak Vohra
Get a test project up and running to explore the basic principles involved.
Introduced in 2011, Oracle NoSQL Database is a highly available, highly scalable, key/value storage based (nonrelational) database that provides support for CRUD operations via a Java API. A related technology, the Hadoop MapReduce framework, provides a distributed environment for developing applications that process large quantities of data in parallel on large clusters.
In this article we discuss integrating Oracle NoSQL Database with Cloudera Distribution for Hadoop (CDH) on Windows via an Oracle JDeveloper project. We will also demonstrate processing the NoSQL Database data in Hadoop using a MapReduce job.

Setup

The following software is required for this project; download and install anything on the list you don’t already have according to the respective instructions:
  • JDK 7 (Java 1.7)
  • Oracle NoSQL Database (kv-1.2.123)
  • Cloudera Distribution for Hadoop (hadoop-0.20.1+169.127)
  • Oracle JDeveloper
  • Cygwin
Install Java 1.7 in a directory (without spaces in its name) that is on the directory path, and set the JAVA_HOME environment variable.

Configuring Oracle NoSQL Database in Oracle JDeveloper

First, we’ll need to configure the NoSQL database server as an external tool in JDeveloper. Select Tools>External Tools. In the External Tools window select New. In the Create External Tool wizard select Tool Type: External Program and click Next. In Program Options specify the following program options.
Program Executable: C:\JDK7\Java\jdk1.7.0_05\bin\java.exe
Arguments: -jar ./lib/kvstore-1.2.123.jar kvlite
Run Directory: C:\OracleNoSQL\kv-1.2.123

Click Finish in Create External Tools:
nosql-hadoop-f1 
Oracle NoSQL Database is now configured as an external tool; the external tool name may vary based on whether other tools requiring the same program executable are also configured. Click OK in External Tools.
Next, select Tools>Java 1. The Oracle NoSQL Database server starts up and a key-value (KV) store is created. 
nosql-hadoop-f2
The NoSQL Database store has the following arguments by default:
-root: kvroot
-store: kvstore
-host: localhost
-port: 5000
-admin: 5001

On subsequent runs of the external tool for the NoSQL Database server the existing KV store is opened with the same configuration with which it was created: 

nosql-hadoop-f3


Running the HelloBigDataWorld Example

The NoSQL Database package includes some examples in the C:\OracleNoSQL\kv-1.2.123\examples directory. We will run the following examples in this article:
  • hello.HelloBigDataWorld
  • hadoop.CountMinorKeys 
The HelloBigDataWorld example can be run using an external tool configuration or as a Java application.
Using as an External Tool
To run HelloBigDataWorld as an external tool select Tools>External Tools and create a new external tool configuration with the same procedure as with the NoSQL Database server. We need to create two configurations, one for compiling the HelloBigDataWorld file and another for running the compiled application. Specify the following program options for compiling HelloBigDataWorld.
Program Executable: C:\JDK7\Java\jdk1.7.0_05\bin\javac.exe
Arguments: -cp ./examples;./lib/kvclient-1.2.123.jar examples/hello/HelloBigDataWorld.java
Run Directory: C:/OracleNoSQL/kv-1.2.123

The program options for compiling the hello/HelloBigDataWorld.java file are shown below. Click Finish.
 nosql-hadoop-f4
An external tool Javac gets created. Select Tools>Javac to compile the hello/HelloBigDataWorld.java file. Next, create an external tool for running the hello.HelloBigDataWorld class using the following configuration.
Program Executable: C:\JDK7\Java\jdk1.7.0_05\bin\java.exe
Arguments: -cp ./examples;./lib/kvclient-1.2.123.jar hello.HelloBigDataWorld
Run Directory: C:/OracleNoSQL/kv-1.2.123

The classpath should include the kvclient-1.2.123.jar file. Click Finish.
nosql-hadoop-f5
To run the hello.HelloBigDataWorld class select Tools>Java. The hello.HelloBigDataWorld application runs and a short message is written.
 nosql-hadoop-f6
Running in a Java Application
Next, we will run the hello.HelloBigDataWorld application as a Java application in an Oracle JDeveloper project. To create a new application:
  • Select Java Desktop Application in New Gallery.
  • Specify an Application Name (e.g., NoSQLDB) and select the default directory. Click Next.
  • Specify a Project Name (e.g., NoSQLDB) and click Finish.
Next, create a Java class in the project.
  • Select Java Class in New Gallery and click OK.
  • In Create Java Class specify class name as “HelloBigDataWorld” and package as “hello”. Click OK. The hello.HelloBigDataWorld class is added to the application.
  • Copy the contents of the hello/HelloBigDataWorld.java file from the C:\OracleNoSQL\kv-1.2.123\examples directory into the class file in Oracle JDeveloper.
In the example application, a new oracle.kv.KVStore is created using the KVStoreFactory class:
store = KVStoreFactory.getStore(new KVStoreConfig(storeName, hostName + ":" + hostPort));
Key/value pairs are created and stored in the KV store:
final String keyString = "Hello";
final String valueString = "Big Data World!";
store.put(Key.createKey(keyString), Value.createValue(valueString.getBytes()));
The key/value pair is retrieved from the store and output. Subsequently the KV store is closed.
final ValueVersion valueVersion = store.get(Key.createKey(keyString));
System.out.println(keyString + " " + new String(valueVersion.getValue().getValue())+ "\n ");
store.close();
The hello.HelloBigDataWorld class is shown below.
 nosql-hadoop-f7
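The put/get sequence above can also be modeled without a running store. The following stand-in sketch is NOT the Oracle NoSQL API: a plain HashMap plays the role of the KV store, and only the byte-array value handling is mirrored, so it compiles and runs without the kvclient jar:

```java
import java.util.HashMap;
import java.util.Map;

// Stand-in for the put/get round trip in HelloBigDataWorld.
// NOT the Oracle NoSQL API: a HashMap plays the role of the KV store,
// and values are stored as byte arrays, as with Value.createValue(bytes).
public class HelloRoundTrip {
    private final Map<String, byte[]> store = new HashMap<>();

    // Analogous to store.put(Key.createKey(key), Value.createValue(value.getBytes()))
    public void put(String key, String value) {
        store.put(key, value.getBytes());
    }

    // Analogous to new String(store.get(Key.createKey(key)).getValue().getValue())
    public String get(String key) {
        byte[] bytes = store.get(key);
        return bytes == null ? null : new String(bytes);
    }

    public static void main(String[] args) {
        HelloRoundTrip s = new HelloRoundTrip();
        s.put("Hello", "Big Data World!");
        System.out.println("Hello " + s.get("Hello"));
    }
}
```

The real example differs mainly in that KVStoreFactory.getStore() connects to the kvlite server started earlier, so the stored pairs persist across runs.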
To run the HelloBigDataWorld class add the C:\OracleNoSQL\kv-1.2.123\lib\kvclient-1.2.123.jar file to the Libraries and Classpath.
 nosql-hadoop-f8
To run the application right-click on the class and select Run. The hello.HelloBigDataWorld class runs and one line of output is generated.  The example application creates only one key/value pair.
In the next section we will run the hadoop.CountMinorKeys.java example. To prepare for that, rerun the HelloBigDataWorld example to create additional key/value pairs in the KV store:
 nosql-hadoop-f9 

Processing NoSQL Database Data in Hadoop

Next, we will run the Hadoop example in C:\OracleNoSQL\kv-1.2.123\examples\hadoop\CountMinorKeys.java. Create a Java class hadoop/CountMinorKeys.java and copy the contents of the \examples\hadoop\CountMinorKeys.java file into that class.
 nosql-hadoop-f10
Add the CDH jar file to the project.
 nosql-hadoop-f11
Configuring the Hadoop Cluster
Next, we will configure the Hadoop cluster. In CDH2 there are three configuration files: core-site.xml, mapred-site.xml, and hdfs-site.xml. In conf/core-site.xml specify the fs.default.name parameter, which is the URI of the NameNode.

<configuration>
  <property>
    <name>fs.default.name</name>
    <value>hdfs://localhost:9100</value>
  </property>
</configuration>
The core-site.xml is shown below.
 nosql-hadoop-f12
In conf/mapred-site.xml specify the mapred.job.tracker parameter, the host (or IP) and port of the JobTracker; specify host as localhost and port as 9101.

<configuration>
  <property>
    <name>mapred.job.tracker</name>
    <value>localhost:9101</value>
  </property>
</configuration>
The conf/mapred-site.xml is shown below.
 nosql-hadoop-f13
Specify the dfs.replication parameter in the conf/hdfs-site.xml configuration file. The dfs.replication parameter specifies how many machines a single file should be replicated to before becoming available. The value should not exceed the number of DataNodes. (We use one DataNode in this example.)

<configuration>
  <property>
    <name>dfs.replication</name>
    <value>1</value>
  </property>
</configuration>
The conf/hdfs-site.xml is shown below.
nosql-hadoop-f14

Having configured a Hadoop cluster, we now start the cluster. But, first, we need to create a Hadoop Distributed File System (HDFS) for the files used in processing the Hadoop data. Run the following command in Cygwin.
> cd hadoop-0.20.1+169.127
> bin/hadoop namenode -format
A storage directory, \tmp\hadoop-dvohra\dfs, is created.
nosql-hadoop-f15
 
We also need to create a deployment profile for the hadoop.CountMinorKeys application.
  • Select the project node in Application Navigator and select File>New.
  • In New Gallery select Deployment Profiles JAR File and click OK.
  • In Create Deployment Profile, specify Deployment Profile Name (hadoop) and click OK.
  • In Edit JAR Deployment Profile Properties, select the default settings and click OK.
  • A new deployment profile is created. Click OK.
To deploy the deployment profile right-click on the NoSQL project and select Deploy>hadoop.
 nosql-hadoop-f16

In Deployment Action, select Deploy to JAR file and click Next. Click Finish in Summary. The hadoop.jar gets deployed to the deploy directory in the JDeveloper project. Copy the hadoop.jar to the C:\cygwin\home\dvohra\hadoop-0.20.1+169.127 directory, since the application will be run from the hadoop-0.20.1+169.127 directory in Cygwin.
Starting the Hadoop Cluster
Typically a multi-node Hadoop cluster consists of the following nodes.
  • NameNode (master): manages the HDFS storage layer. We formatted the NameNode to create a storage layer in the previous section.
  • JobTracker (master): manages MapReduce data processing and assigns tasks.
  • DataNode (slave): stores filesystem data; handles HDFS storage layer processing.
  • TaskTracker (slave): performs MapReduce processing.
  • Secondary NameNode: stores modifications to the filesystem and periodically merges the changes with the current HDFS state.

Next, we shall start the nodes in the cluster. To start the NameNode run the following commands in Cygwin.
> cd hadoop-0.20.1+169.127
> bin/hadoop namenode
 nosql-hadoop-f17
Start the Secondary NameNode with the following commands:
> cd hadoop-0.20.1+169.127
> bin/hadoop secondarynamenode
nosql-hadoop-f18

Start the DataNode:
> cd hadoop-0.20.1+169.127
> bin/hadoop datanode
nosql-hadoop-f19

Start the JobTracker:
> cd hadoop-0.20.1+169.127
> bin/hadoop jobtracker
nosql-hadoop-f20

Start the TaskTracker:
> cd hadoop-0.20.1+169.127
> bin/hadoop tasktracker
 nosql-hadoop-f21


Running a MapReduce Job

Next, we shall run the hadoop.CountMinorKeys application for which we created the hadoop.jar file. The hadoop.CountMinorKeys application runs a MapReduce job on the Oracle NoSQL Database data in the KV store and generates output in the Hadoop HDFS. The NoSQL Database server Java API is in the kvclient-1.2.123.jar file. Copy the kvclient-1.2.123.jar from the C:\OracleNoSQL\kv-1.2.123\lib directory to the C:\cygwin\home\dvohra\hadoop-0.20.1+169.127\lib directory, which is in the classpath of Hadoop. Run the hadoop.jar with the following commands in Cygwin.
> cd hadoop-0.20.1+169.127
> bin/hadoop jar hadoop.jar hadoop.CountMinorKeys kvstore dvohra-PC:5000 hdfs://localhost:9100/tmp/hadoop/output/
The MapReduce job runs and the output is generated in the hdfs://localhost:9100/tmp/hadoop/output/ directory.
 nosql-hadoop-f22
List the files in the /tmp/hadoop/output directory with the following command.
> bin/hadoop dfs -ls hdfs://localhost:9100/tmp/hadoop/output
The MapReduce job output is generated in the part-r-00000 file, which gets listed with the previous command.
nosql-hadoop-f23

Get the part-r-00000 file to the local filesystem with the command:
> bin/hadoop dfs -get hdfs://localhost:9100/tmp/hadoop/output/part-r-00000 part-r-00000
The MapReduce job output is shown in Oracle JDeveloper; the output lists the number of records for each major key in the KV store, which was created with the first example application, hello.HelloBigDataWorld.
 nosql-hadoop-f24
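Stripped of the Hadoop and NoSQL APIs, the aggregation CountMinorKeys performs amounts to grouping record key paths by their major key and counting the records in each group. A self-contained sketch of that logic follows; the "/major/-/minor" key strings are illustrative only, not the store's actual serialized key format:

```java
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Sketch of the aggregation performed by hadoop.CountMinorKeys:
// group key paths by major key and count the records under each.
// The "/major/-/minor" layout below is illustrative only.
public class CountMinorKeysSketch {

    public static Map<String, Integer> countByMajorKey(List<String> keyPaths) {
        Map<String, Integer> counts = new LinkedHashMap<>();
        for (String path : keyPaths) {
            int sep = path.indexOf("/-/");                       // major/minor separator
            String major = sep >= 0 ? path.substring(0, sep) : path;
            counts.merge(major, 1, Integer::sum);                // increment the group count
        }
        return counts;
    }

    public static void main(String[] args) {
        List<String> keys = Arrays.asList("/Hello/-/0", "/Hello/-/1", "/Other/-/0");
        // One line per major key, in the spirit of the part-r-00000 output.
        countByMajorKey(keys).forEach((k, v) -> System.out.println(k + "\t" + v));
    }
}
```

In the real example Hadoop does the grouping itself: the mapper emits each record's major key and the reducer sums the per-key counts.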
Congratulations, your project is complete! 

Sunday, June 02, 2013

Create Eclipse(Juno) Plugin for Hadoop



Visit https://github.com/mkamithkumar/hadoop-eclipse-plugins to check whether an Eclipse plugin for your Hadoop version is already available. If not, follow the steps below to build your own.

Prerequisites:


Steps:

1. Open Eclipse from a terminal or the dashboard.
2. In Eclipse, click File>Import, select General>Existing Projects into Workspace, and click Next.
3. Set Select root directory to ${YOUR_HADOOP_HOME}/src/contrib/eclipse-plugin, then under Projects select MapReduceTools and click Finish (make sure no options such as Copy projects into workspace are checked).
4. The MapReduceTools project will appear in the Project Explorer window. Right-click it and select Build Path>Configure Build Path.
5. In the window that opens, under Java Build Path>Libraries, double-click hadoop-core-{version}.jar and point it to ${YOUR_HADOOP_HOME}/hadoop-core-{version}.jar. Click OK to confirm.
6. Now open build.xml under MapReduceTools project and add below lines just after
/*some jar files inclusion*/
7. In build.xml add below line just after .
8. In build.xml replace lines
with

9. Save build.xml and close this file.
10. Now open MANIFEST.MF under the MapReduceTools/META-INF folder and replace the Bundle-ClassPath: line with the following; then save and close the file.
Bundle-ClassPath: classes/,
lib/hadoop-core.jar,
lib/commons-cli-1.2.jar,
lib/commons-configuration-1.6.jar,
lib/commons-httpclient-3.0.1.jar,
lib/commons-lang-2.4.jar,
lib/jackson-core-asl-1.8.8.jar,
lib/jackson-mapper-asl-1.8.8.jar



11. Another file, build-contrib.xml, has to be edited; open it from a terminal:
hduser@Hadoop:~$ sudo gedit ${YOUR_HADOOP_HOME}/src/contrib/build-contrib.xml

12. In build-contrib.xml, add the below 2 lines

<!-- This is Hadoop Version -->
<!-- This is eclipse folder path -->

just after

13. Now move back to Eclipse, right-click the build.xml file under Project Explorer, and click Run As>Ant Build. After a successful build the jar file can be obtained from the ${YOUR_HADOOP_HOME}/build/contrib/eclipse-plugin folder.

14. Copy the compiled jar file to the Eclipse plugins directory and enjoy.
