HIVE - HADOOP : MSQL AS METASTORE - Part III
In my previous article (Hive with Derby ) described the drawback of using Embedded Metastore i.e only one Hive Session can open connection to this Metastore so at any point of time only one user will be active others will be passive , to over come this use Standalone Database as MetaStore.
If you use Standalone Database then Hive Supports Multiple Sessions therefore Multiple Users can access at same instant, this is referred as Local Metastore. Any JDBC-Compliant database could be used as Metastore. In previous article i demonstrated how to install and start Derby Database and also how to configure hive to connect to this Derby with a example.
In the Current blog, i wanted to demonstrate how to install , execute Mysql Server and configuration to be made in Hive to connect to Mysql server and store metadata.Mysql is world's most used open source Relational Database Management System (RDBMS).
MYSQL Installation :
1) Download MySql software from MYSQL Software
2) Modify the password for MySql
if you face exception during password change.. ERROR 1045 (28000): Access denied for user 'root'@'localhost' (using password: NO)
/etc/init.d/mysqld stop
mysqld_safe --skip-grant-tables &
mysql -u root
mysql> use mysql;
mysql> update user set password=PASSWORD("newrootpassword") where User='root';
mysql> flush privileges;
mysql> quit
/etc/init.d/mysqld stop
/etc/init.d/mysqld start
For Example:
[root@slc01mcd ~]# mysqld_safe --skip-grant-tables &
[1] 12943
[root@slc01mcd ~]# Starting mysqld daemon with databases from /var/lib/mysql
mysql -u root
Welcome to the MySQL monitor. Commands end with ; or \g.
Your MySQL connection id is 1
Server version: 5.0.77 Source distribution
Type 'help;' or '\h' for help. Type '\c' to clear the buffer.
mysql> use mysql;
Reading table information for completion of table and column names
You can turn off this feature to get a quicker startup with -A
Database changed
mysql> update user set password=PASSWORD("Welcome1") where User='root';
Query OK, 2 rows affected (0.00 sec)
Rows matched: 3 Changed: 2 Warnings: 0
mysql> flush privileges;
Query OK, 0 rows affected (0.00 sec)
mysql> quit
Bye
hive.metastore.local
true
controls whether to connect to remove metastore server or open a new metastore server in Hive Client JVM
javax.jdo.option.ConnectionURL
jdbc:derby://hostname:1527/metastore_db;create=true
JDBC connect string for a JDBC metastore
javax.jdo.option.ConnectionDriverName
org.apache.derby.jdbc.ClientDriver
Driver class name for a JDBC metastore
javax.jdo.option.ConnectionUserName
root
username to use against metastore database
javax.jdo.option.ConnectionPassword
Welcome1
password to use against metastore database
In my previous article (Hive with Derby ) described the drawback of using Embedded Metastore i.e only one Hive Session can open connection to this Metastore so at any point of time only one user will be active others will be passive , to over come this use Standalone Database as MetaStore.
If you use Standalone Database then Hive Supports Multiple Sessions therefore Multiple Users can access at same instant, this is referred as Local Metastore. Any JDBC-Compliant database could be used as Metastore. In previous article i demonstrated how to install and start Derby Database and also how to configure hive to connect to this Derby with a example.
In the Current blog, i wanted to demonstrate how to install , execute Mysql Server and configuration to be made in Hive to connect to Mysql server and store metadata.Mysql is world's most used open source Relational Database Management System (RDBMS).
MYSQL Installation :
1) Download MySql software from MYSQL Software
2) Modify the password for MySql
if you face exception during password change.. ERROR 1045 (28000): Access denied for user 'root'@'localhost' (using password: NO)
/etc/init.d/mysqld stop
mysqld_safe --skip-grant-tables &
mysql -u root
mysql> use mysql;
mysql> update user set password=PASSWORD("newrootpassword") where User='root';
mysql> flush privileges;
mysql> quit
/etc/init.d/mysqld stop
/etc/init.d/mysqld start
For Example:
[root@slc01mcd ~]# mysqld_safe --skip-grant-tables &
[1] 12943
[root@slc01mcd ~]# Starting mysqld daemon with databases from /var/lib/mysql
mysql -u root
Welcome to the MySQL monitor. Commands end with ; or \g.
Your MySQL connection id is 1
Server version: 5.0.77 Source distribution
Type 'help;' or '\h' for help. Type '\c' to clear the buffer.
mysql> use mysql;
Reading table information for completion of table and column names
You can turn off this feature to get a quicker startup with -A
Database changed
mysql> update user set password=PASSWORD("Welcome1") where User='root';
Query OK, 2 rows affected (0.00 sec)
Rows matched: 3 Changed: 2 Warnings: 0
mysql> flush privileges;
Query OK, 0 rows affected (0.00 sec)
mysql> quit
Bye
Once the Password set .. check the verify the tables present in MySql DB
[root@slc01mcd ~]# /etc/init.d/mysqld start
Starting MySQL: [ OK ]
[root@slc01mcd ~]# mysql --user=root -p
Enter password:
Welcome to the MySQL monitor. Commands end with ; or \g.
Your MySQL connection id is 2
Server version: 5.0.77 Source distribution
Type 'help;' or '\h' for help. Type '\c' to clear the buffer.
mysql> show tables;
ERROR 1046 (3D000): No database selected
/homeHADOOP/hive-0.9.0/conf/hive-site.xml
HIVE Configuration (hive-site.xml)
Jpox.properties
javax.jdo.PersistenceManagerFactoryClass=org.jpox.PersistenceManagerFactoryImpl
org.jpox.autoCreateSchema=false
org.jpox.validateTables=false
org.jpox.validateColumns=false
org.jpox.validateConstraints=false
org.jpox.storeManagerType=rdbms
org.jpox.autoCreateSchema=true
org.jpox.autoStartMechanismMode=checked
org.jpox.transactionIsolation=read_committed
javax.jdo.option.DetachAllOnCommit=true
javax.jdo.option.NontransactionalRead=true
javax.jdo.option.ConnectionDriverName=org.apache.derby.jdbc.ClientDriver
javax.jdo.option.ConnectionURL=jdbc:derby://hostname:1527/metastore_db;create=true
javax.jdo.option.ConnectionUserName=root
javax.jdo.option.ConnectionPassword=Welcome1
Copy JDBC driver Jar file for Mysql to the hadoop lib floder
cp mysql-connector-java-5.1.11/*.jar
/homeHADOOP/hive-0.9.0/lib
Sample Program :
Hive> show tables;
OK
Time taken: 0.048 seconds
Execute max_cgpa.q script (Previous article Hive - Part-I )
[root@slc01mcd hive-0.9.0]# hive -f max_cgpa.q
WARNING: org.apache.hadoop.metrics.jvm.EventCounter is deprecated. Please use org.apache.hadoop.log.metrics.EventCounter in all the log4j.properties files.
Logging initialized using configuration in jar:file:/scratch/rjuluri/HADOOP/hive-0.9.0/lib/hive-common-0.9.0.jar!/hive-log4j.properties
Hive history file=/tmp/root/hive_job_log_root_201207180701_1297257265.txt
OK
Time taken: 3.819 seconds
Copying data from file:/scratch/rjuluri/HADOOP/hive-0.9.0/hivesample.txt
Copying file: file:/scratch/rjuluri/HADOOP/hive-0.9.0/hivesample.txt
Loading data to table default.maxcgpa1
Deleted file:/user/hive/warehouse/maxcgpa1
OK
Time taken: 0.383 seconds
Total MapReduce jobs = 1
Launching Job 1 out of 1
Number of reduce tasks not specified. Estimated from input data size: 1
In order to change the average load for a reducer (in bytes):
set hive.exec.reducers.bytes.per.reducer=
In order to limit the maximum number of reducers:
set hive.exec.reducers.max=
In order to set a constant number of reducers:
set mapred.reduce.tasks=
WARNING: org.apache.hadoop.metrics.jvm.EventCounter is deprecated. Please use org.apache.hadoop.log.metrics.EventCounter in all the log4j.properties files.
Execution log at: /tmp/root/root_20120718070101_e67d60e9-5f66-482a-8151-eebc5cfefaf4.log
Job running in-process (local Hadoop)
Hadoop job information for null: number of mappers: 0; number of reducers: 0
2012-07-18 07:01:23,161 null map = 0%, reduce = 0%
2012-07-18 07:01:26,174 null map = 100%, reduce = 0%
2012-07-18 07:01:29,186 null map = 100%, reduce = 100%
Ended Job = job_local_0001
Execution completed successfully
Mapred Local Task Succeeded . Convert the Join into MapJoin
OK
cse 8.6
ece 9.0
Time taken: 10.476 seconds
[root@slc01mcd hive-0.9.0]# hive show tables;
WARNING: org.apache.hadoop.metrics.jvm.EventCounter is deprecated. Please use org.apache.hadoop.log.metrics.EventCounter in all the log4j.properties files.
Logging initialized using configuration in jar:file:/scratch/rjuluri/HADOOP/hive-0.9.0/lib/hive-common-0.9.0.jar!/hive-log4j.properties
Hive history file=/tmp/root/hive_job_log_root_201207180701_545279755.txt
hive> show tables;
OK
maxcgpa1
Time taken: 2.769 seconds
Now Verify whether newly created table maxcgpa1 is in MySql databse or not ..
mysql> show databases;
+--------------------+
| Database |
+--------------------+
| information_schema |
| mysql |
| test |
+--------------------+
3 rows in set (0.00 sec)
This Shows that MySql Database is used as MetaStore by Hive from Now on .. and Multiple Sessions (Users) can Connect to Hive .. and perform hadoop operations ...
1 comment:
If we consider the Big data modernization solutions, then adaptive learning is an excellent way to make it successful.
Post a Comment