@raghujuluri: HIVE - HADOOP : MSQL AS METASTORE

HIVE - HADOOP : MSQL AS METASTORE - Part III

In my previous article (Hive with Derby ) described the drawback of using Embedded Metastore i.e only one Hive Session can open connection to this Metastore so at any point of time only one user will be active others will be passive , to over come this use Standalone Database as MetaStore.

If you use Standalone Database then Hive Supports Multiple Sessions therefore Multiple Users can access at same instant, this is referred as Local Metastore. Any JDBC-Compliant database could be used as Metastore. In previous article i demonstrated how to install and start Derby Database and also how to configure hive to connect to this Derby with a example.

In the Current blog, i wanted to demonstrate how to install , execute Mysql Server and configuration to be made in Hive to connect to Mysql server and store metadata.Mysql is world's most used open source Relational Database Management System (RDBMS).

MYSQL Installation :

1) Download MySql software from MYSQL Software
2) Modify the password for MySql

if you face exception during password change.. ERROR 1045 (28000): Access denied for user 'root'@'localhost' (using password: NO)

/etc/init.d/mysqld stop

mysqld_safe --skip-grant-tables &
mysql -u root
mysql> use mysql;
mysql> update user set password=PASSWORD("newrootpassword") where User='root';
mysql> flush privileges;
mysql> quit
/etc/init.d/mysqld stop
/etc/init.d/mysqld start

For Example:

[root@slc01mcd ~]# mysqld_safe --skip-grant-tables &
[1] 12943
[root@slc01mcd ~]# Starting mysqld daemon with databases from /var/lib/mysql
mysql -u root
Welcome to the MySQL monitor. Commands end with ; or \g.
Your MySQL connection id is 1
Server version: 5.0.77 Source distribution

Type 'help;' or '\h' for help. Type '\c' to clear the buffer.

mysql> use mysql;
Reading table information for completion of table and column names
You can turn off this feature to get a quicker startup with -A

Database changed
mysql> update user set password=PASSWORD("Welcome1") where User='root';
Query OK, 2 rows affected (0.00 sec)
Rows matched: 3 Changed: 2 Warnings: 0

mysql> flush privileges;
Query OK, 0 rows affected (0.00 sec)

mysql> quit
Bye

Once the Password set .. check the verify the tables present in MySql DB

[root@slc01mcd ~]# /etc/init.d/mysqld start

Starting MySQL: [ OK ]

[root@slc01mcd ~]# mysql --user=root -p

Enter password:

Welcome to the MySQL monitor. Commands end with ; or \g.

Your MySQL connection id is 2

Server version: 5.0.77 Source distribution

Type 'help;' or '\h' for help. Type '\c' to clear the buffer.

mysql> show tables;

ERROR 1046 (3D000): No database selected

/homeHADOOP/hive-0.9.0/conf/hive-site.xml

HIVE Configuration (hive-site.xml)

hive.metastore.local

true

controls whether to connect to remove metastore server or open a new metastore server in Hive Client JVM

javax.jdo.option.ConnectionURL

jdbc:derby://hostname:1527/metastore_db;create=true

JDBC connect string for a JDBC metastore

javax.jdo.option.ConnectionDriverName

org.apache.derby.jdbc.ClientDriver

Driver class name for a JDBC metastore

javax.jdo.option.ConnectionUserName

root

username to use against metastore database

javax.jdo.option.ConnectionPassword

Welcome1

password to use against metastore database

Jpox.properties

javax.jdo.PersistenceManagerFactoryClass=org.jpox.PersistenceManagerFactoryImpl

org.jpox.autoCreateSchema=false

org.jpox.validateTables=false

org.jpox.validateColumns=false

org.jpox.validateConstraints=false

org.jpox.storeManagerType=rdbms

org.jpox.autoCreateSchema=true

org.jpox.autoStartMechanismMode=checked

org.jpox.transactionIsolation=read_committed

javax.jdo.option.DetachAllOnCommit=true

javax.jdo.option.NontransactionalRead=true

javax.jdo.option.ConnectionDriverName=org.apache.derby.jdbc.ClientDriver

javax.jdo.option.ConnectionURL=jdbc:derby://hostname:1527/metastore_db;create=true

javax.jdo.option.ConnectionUserName=root

javax.jdo.option.ConnectionPassword=Welcome1

Copy JDBC driver Jar file for Mysql to the hadoop lib floder

cp mysql-connector-java-5.1.11/*.jar /homeHADOOP/hive-0.9.0/lib

Sample Program :

Hive> show tables;

Time taken: 0.048 seconds

Execute max_cgpa.q script (Previous article Hive - Part-I )

[root@slc01mcd hive-0.9.0]# hive -f max_cgpa.q

WARNING: org.apache.hadoop.metrics.jvm.EventCounter is deprecated. Please use org.apache.hadoop.log.metrics.EventCounter in all the log4j.properties files.

Logging initialized using configuration in jar:file:/scratch/rjuluri/HADOOP/hive-0.9.0/lib/hive-common-0.9.0.jar!/hive-log4j.properties

Hive history file=/tmp/root/hive_job_log_root_201207180701_1297257265.txt

Time taken: 3.819 seconds

Copying data from file:/scratch/rjuluri/HADOOP/hive-0.9.0/hivesample.txt

Copying file: file:/scratch/rjuluri/HADOOP/hive-0.9.0/hivesample.txt

Loading data to table default.maxcgpa1

Deleted file:/user/hive/warehouse/maxcgpa1

Time taken: 0.383 seconds

Total MapReduce jobs = 1

Launching Job 1 out of 1

Number of reduce tasks not specified. Estimated from input data size: 1

In order to change the average load for a reducer (in bytes):

set hive.exec.reducers.bytes.per.reducer=

In order to limit the maximum number of reducers:

set hive.exec.reducers.max=

In order to set a constant number of reducers:

set mapred.reduce.tasks=

WARNING: org.apache.hadoop.metrics.jvm.EventCounter is deprecated. Please use org.apache.hadoop.log.metrics.EventCounter in all the log4j.properties files.

Execution log at: /tmp/root/root_20120718070101_e67d60e9-5f66-482a-8151-eebc5cfefaf4.log

Job running in-process (local Hadoop)

Hadoop job information for null: number of mappers: 0; number of reducers: 0

2012-07-18 07:01:23,161 null map = 0%, reduce = 0%

2012-07-18 07:01:26,174 null map = 100%, reduce = 0%

2012-07-18 07:01:29,186 null map = 100%, reduce = 100%

Ended Job = job_local_0001

Execution completed successfully

Mapred Local Task Succeeded . Convert the Join into MapJoin

cse 8.6

ece 9.0

Time taken: 10.476 seconds

[root@slc01mcd hive-0.9.0]# hive show tables;

WARNING: org.apache.hadoop.metrics.jvm.EventCounter is deprecated. Please use org.apache.hadoop.log.metrics.EventCounter in all the log4j.properties files.

Logging initialized using configuration in jar:file:/scratch/rjuluri/HADOOP/hive-0.9.0/lib/hive-common-0.9.0.jar!/hive-log4j.properties

Hive history file=/tmp/root/hive_job_log_root_201207180701_545279755.txt

hive> show tables;

maxcgpa1

Time taken: 2.769 seconds

Now Verify whether newly created table maxcgpa1 is in MySql databse or not ..

mysql> show databases;

+--------------------+

| Database |

+--------------------+

| information_schema |

| mysql |

| test |

+--------------------+

3 rows in set (0.00 sec)

This Shows that MySql Database is used as MetaStore by Hive from Now on .. and Multiple Sessions (Users) can Connect to Hive .. and perform hadoop operations ...

@raghujuluri

Thursday, July 26, 2012

HIVE - HADOOP : MSQL AS METASTORE - Part III

1 comment:

Blog Archive

About Me

Popular Posts