Wednesday, April 13, 2016

Transactions in Hive

Transactions in Hive are introduced in Hive 0.13, but they only partially fulfill the ACID properties like atomicity, consistency, durability, at the partition level. Here, Isolation can be provided by turning on one of the locking mechanisms available with zookeeper or in memory.

But in Hive 0.14, new API’s have been added to completely fulfill the ACID properties while performing any transaction.

Transactions are provided at the row-level in Hive 0.14. The different row level transactions available in Hive 0.14 are as follows:

Insert
Delete
Update

There are numerous limitations with the present transactions available in Hive 0.14. ORC is the file format supported by Hive transaction. It is now essential to have ORC file format for performing transactions in Hive. The table needs to be bucketed in order to support transactions.

Row-level Transactions Available in Hive 0.14

Before creating a Hive table that supports transactions, the transaction features present in Hive needs to be turned on, as by default they are turned off.

hive> set hive.support.concurrency = true;
hive>set hive.enforce.bucketing = true; 
hive>set hive.exec.dynamic.partition.mode = nonstrict; 
hive>set hive.txn.manager = org.apache.hadoop.hive.ql.lockmgr.DbTxnManager; 
hive>set hive.compactor.initiator.on = true; 
hive>set hive.compactor.worker.threads = a positive number on at least one instance of the Thrift metastore service;

If the above properties are not set properly, the ‘Insert’ operation will work but ‘Update’ and ‘Delete’ will not work

hive> CREATE TABLE emp(emp_id int,emp_name string,emp_loc string) clustered by (emp_id) into 5 buckets stored as orc TBLPROPERTIES('transactional'='true');

We are bucketing the table by ‘emp_id’ and the table format is ‘orc’, also we are enabling the transactions in the table by specifying it inside the TBLPROPERTIES as ‘transactional’=’true’

hive> INSERT INTO table emp values(1,'raju','hyd'),(2,'ramu','che'),(3,'venkat','che'),(4,'reddy','del'),(5,'stanford','uk'),(6,'JNTUA','atp'),(7,'cambridge','us');

The above command is used to insert row wise data into the Hive table. Here, each row is seperated by ‘( )’ brackets.

hive> select * from emp

hive> UPDATE emp set emp_id = 8 where emp_id = 7;

In this table, we have bucketed the ‘emp_id’ column and performing the Update operation on the same column, so we have go the error

hive> UPDATE emp set emp_name = 'Reddy' where emp_id = 6;

hive> select * from emp


hive> delete from emp where emp_id=5;



No comments:

Post a Comment