Thursday, July 26, 2012

HIVE - HADOOP : MSQL AS METASTORE - Part III

HIVE - HADOOP : MSQL AS METASTORE - Part III


In my previous article (Hive with Derby ) described the drawback of using Embedded Metastore i.e only one Hive Session can open connection to this Metastore so at any point of time only one user will be active others will be passive , to over come this use Standalone Database as MetaStore.


If you use Standalone Database then Hive Supports Multiple Sessions therefore Multiple Users can access at same instant, this is referred as Local Metastore. Any JDBC-Compliant database could be used as Metastore. In previous article i demonstrated how to install and start Derby Database and also how to configure hive to connect to this Derby  with a example.




In the Current blog, i wanted to demonstrate how to install , execute Mysql Server and configuration to be made in Hive to connect to Mysql server and store metadata.Mysql is world's most used open source Relational Database Management System (RDBMS).


MYSQL Installation :


1) Download MySql software from  MYSQL Software  
2) Modify the password for MySql 


if you face exception during password change.. ERROR 1045 (28000): Access denied for user 'root'@'localhost' (using password: NO) 



/etc/init.d/mysqld stop


mysqld_safe --skip-grant-tables &
mysql -u root
mysql> use mysql;
mysql> update user set password=PASSWORD("newrootpassword") where User='root';
mysql> flush privileges;
mysql> quit
/etc/init.d/mysqld stop
/etc/init.d/mysqld start





For Example:



[root@slc01mcd ~]# mysqld_safe --skip-grant-tables &
[1] 12943
[root@slc01mcd ~]# Starting mysqld daemon with databases from /var/lib/mysql
mysql -u root
Welcome to the MySQL monitor.  Commands end with ; or \g.
Your MySQL connection id is 1
Server version: 5.0.77 Source distribution


Type 'help;' or '\h' for help. Type '\c' to clear the buffer.


mysql> use mysql;
Reading table information for completion of table and column names
You can turn off this feature to get a quicker startup with -A


Database changed
mysql> update user set password=PASSWORD("Welcome1") where User='root';
Query OK, 2 rows affected (0.00 sec)
Rows matched: 3  Changed: 2  Warnings: 0


mysql> flush privileges;
Query OK, 0 rows affected (0.00 sec)


mysql>  quit
Bye

Once the Password set .. check the verify the tables present in MySql DB

[root@slc01mcd ~]# /etc/init.d/mysqld start
Starting MySQL:                                            [  OK  ]
[root@slc01mcd ~]# mysql --user=root -p
Enter password: 
Welcome to the MySQL monitor.  Commands end with ; or \g.
Your MySQL connection id is 2
Server version: 5.0.77 Source distribution

Type 'help;' or '\h' for help. Type '\c' to clear the buffer.

mysql> show tables;
ERROR 1046 (3D000): No database selected

/homeHADOOP/hive-0.9.0/conf/hive-site.xml

HIVE Configuration (hive-site.xml)

  hive.metastore.local
  true
  controls whether to connect to remove metastore server or open a new metastore server in Hive Client JVM

  javax.jdo.option.ConnectionURL
jdbc:derby://hostname:1527/metastore_db;create=true
  JDBC connect string for a JDBC metastore

  javax.jdo.option.ConnectionDriverName
org.apache.derby.jdbc.ClientDriver
  Driver class name for a JDBC metastore

  javax.jdo.option.ConnectionUserName
  root
  username to use against metastore database

  javax.jdo.option.ConnectionPassword
  Welcome1
  password to use against metastore database

Jpox.properties

javax.jdo.PersistenceManagerFactoryClass=org.jpox.PersistenceManagerFactoryImpl
org.jpox.autoCreateSchema=false
org.jpox.validateTables=false
org.jpox.validateColumns=false
org.jpox.validateConstraints=false
org.jpox.storeManagerType=rdbms
org.jpox.autoCreateSchema=true
org.jpox.autoStartMechanismMode=checked
org.jpox.transactionIsolation=read_committed
javax.jdo.option.DetachAllOnCommit=true
javax.jdo.option.NontransactionalRead=true
javax.jdo.option.ConnectionDriverName=org.apache.derby.jdbc.ClientDriver
javax.jdo.option.ConnectionURL=jdbc:derby://hostname:1527/metastore_db;create=true
javax.jdo.option.ConnectionUserName=root
javax.jdo.option.ConnectionPassword=Welcome1

Copy JDBC driver Jar file for Mysql to the hadoop lib floder

cp mysql-connector-java-5.1.11/*.jar  /homeHADOOP/hive-0.9.0/lib

Sample Program :

Hive> show tables;

OK
Time taken: 0.048 seconds

Execute max_cgpa.q script (Previous article Hive - Part-I )


[root@slc01mcd hive-0.9.0]# hive -f max_cgpa.q 

WARNING: org.apache.hadoop.metrics.jvm.EventCounter is deprecated. Please use org.apache.hadoop.log.metrics.EventCounter in all the log4j.properties files.
Logging initialized using configuration in jar:file:/scratch/rjuluri/HADOOP/hive-0.9.0/lib/hive-common-0.9.0.jar!/hive-log4j.properties
Hive history file=/tmp/root/hive_job_log_root_201207180701_1297257265.txt
OK
Time taken: 3.819 seconds
Copying data from file:/scratch/rjuluri/HADOOP/hive-0.9.0/hivesample.txt
Copying file: file:/scratch/rjuluri/HADOOP/hive-0.9.0/hivesample.txt
Loading data to table default.maxcgpa1
Deleted file:/user/hive/warehouse/maxcgpa1
OK
Time taken: 0.383 seconds
Total MapReduce jobs = 1
Launching Job 1 out of 1
Number of reduce tasks not specified. Estimated from input data size: 1
In order to change the average load for a reducer (in bytes):
  set hive.exec.reducers.bytes.per.reducer=
In order to limit the maximum number of reducers:
  set hive.exec.reducers.max=
In order to set a constant number of reducers:
  set mapred.reduce.tasks=
WARNING: org.apache.hadoop.metrics.jvm.EventCounter is deprecated. Please use org.apache.hadoop.log.metrics.EventCounter in all the log4j.properties files.
Execution log at: /tmp/root/root_20120718070101_e67d60e9-5f66-482a-8151-eebc5cfefaf4.log
Job running in-process (local Hadoop)
Hadoop job information for null: number of mappers: 0; number of reducers: 0
2012-07-18 07:01:23,161 null map = 0%,  reduce = 0%
2012-07-18 07:01:26,174 null map = 100%,  reduce = 0%
2012-07-18 07:01:29,186 null map = 100%,  reduce = 100%
Ended Job = job_local_0001

Execution completed successfully

Mapred Local Task Succeeded . Convert the Join into MapJoin
OK
cse     8.6
ece     9.0
Time taken: 10.476 seconds

[root@slc01mcd hive-0.9.0]# hive show tables;

WARNING: org.apache.hadoop.metrics.jvm.EventCounter is deprecated. Please use org.apache.hadoop.log.metrics.EventCounter in all the log4j.properties files.
Logging initialized using configuration in jar:file:/scratch/rjuluri/HADOOP/hive-0.9.0/lib/hive-common-0.9.0.jar!/hive-log4j.properties
Hive history file=/tmp/root/hive_job_log_root_201207180701_545279755.txt
hive> show tables;
OK
maxcgpa1
Time taken: 2.769 seconds

Now Verify whether newly created table maxcgpa1 is in MySql databse or not ..

mysql> show databases;
+--------------------+
| Database           |
+--------------------+
| information_schema | 
| mysql              | 
| test               | 
+--------------------+
3 rows in set (0.00 sec)

This Shows that MySql Database is used as MetaStore by Hive from Now on .. and Multiple Sessions (Users) can Connect to Hive .. and perform hadoop operations ...





1 comment:

Alfred Avina said...

If we consider the Big data modernization solutions, then adaptive learning is an excellent way to make it successful.

Popular Posts