Wednesday, April 13, 2016

MySQL Metastore and Apache Derby

Hive stores the metadata related to tables and databases into the external RDBMS like Apache Derby or MYSQL and metadata.

Metastore and database: 

The metastore service provides the interface to the Hive.
The database stores the data definitions and mappings to the data.
The metastore (which consists of services and database) can be configured in different ways. Embedded Apache Derby is used as the default Hive metastore in the Hive configuration. This configuration is called embedded metastore and is good for the sake of development and unit testing, but won’t scale to a production environment as only a single user can connect to the derby database at any instant of time. Starting second instance of the Hive driver will throw an error message.

Apache Derby:

Apache Derby, an Apache DB subproject, is an open source relational database implemented entirely in Java. Some key features include:

Derby is based on the Java, JDBC, and SQL standards.
Derby provides an embedded JDBC driver that lets you embed Derby in any Java-based solution.
Derby also supports the more familiar client/server mode with the Derby Network Client JDBC driver and Derby Network Server.
Derby is easy to install, deploy, and use.
Most importantly Derby is single instance database, which means only one user can access the derby instance at one time and this had been a motivational factor to include Mysql as the default metastore.

Advantages Of using Mysql as a metastore in Hive-

It is Stable
It keeps a track of metadata.

It can support multiple instances of Hive.

In order to change the default metastore from Derby to Mysql we need to change the property in Hive-site.xml.


Since Hive-0.10, we get only hive-default.xml. We need to explicitly create Hive-site.xml to override the default property containing the configuration of Apache Derby.

No comments:

Post a Comment