Hive has two types of tables:
Managed table
External table
Managed table:
A managed table is also called an internal table. It is the default table type in Hive: when we create a table without the EXTERNAL keyword, we get a managed table.
Hive stores a managed table's data in a directory that it controls in HDFS.
By default, this is a subdirectory of /user/hive/warehouse (the value of hive.metastore.warehouse.dir).
If we drop a managed table, Hive deletes both the table data in HDFS and the table's metadata in the metastore.
hive> create table employee(ename String, esal Int) row format delimited fields terminated by ',';
hive> load data local inpath '<path>' into table employee;
Check the contents of the table in HDFS using the command below (hadoop dfs is deprecated in favor of hdfs dfs):
> hdfs dfs -ls hdfs://localhost:9000/user/hive/warehouse/employee
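The drop behavior described above can be checked from the Hive CLI. The following is a minimal sketch, assuming the employee table created above and the default warehouse path used in this post; the exact output layout varies between Hive versions.

```sql
-- Confirm the table type and storage location.
-- The output should include "Table Type: MANAGED_TABLE" and a "Location:" line.
DESCRIBE FORMATTED employee;

-- Dropping a managed table removes its metadata and its data directory.
DROP TABLE employee;

-- The Hive CLI can run HDFS commands directly with "dfs".
-- After the drop, the table directory is expected to be gone, so this listing should fail.
dfs -ls /user/hive/warehouse/employee;
```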
External table:
An external table is meant for data that is also used outside Hive. We use an external table when dropping the table should delete only its metadata and leave the data as it is: dropping an external table removes only the table’s schema from the metastore.
hive> create external table employee_ext(ename String, esal Int) row format delimited fields terminated by ',';
hive> load data local inpath '<path>' into table employee_ext;
Check the HDFS location of the table using the command below:
> hdfs dfs -ls hdfs://localhost:9000/user/hive/warehouse/employee_ext
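To see that an external table's data survives a drop, the same check can be repeated for employee_ext. This is a sketch under the assumption that the table was created as shown above, without a LOCATION clause, so its files sit under the warehouse path used in this post.

```sql
-- The output should include "Table Type: EXTERNAL_TABLE".
DESCRIBE FORMATTED employee_ext;

-- Dropping an external table removes only the metadata from the metastore.
DROP TABLE employee_ext;

-- The data files are expected to still be listed after the drop.
dfs -ls /user/hive/warehouse/employee_ext;
```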
When to use managed and external tables
Managed table
The data is temporary
Hive should manage the table data completely, with no external program using it
You do not need the data after the table is dropped
External table
The data is also used outside of Hive. For example, the data files are read and processed by an existing program that doesn’t lock the files
Hive should not own the data or control its settings, directories, etc., because another program or process handles those things
You are not creating the table from an existing table (CREATE TABLE ... AS SELECT)
You want to be able to recreate the table later with the same schema and point it at the location of the existing data
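The last point, recreating a table over data that is already in HDFS, might look like the sketch below. It assumes that delimited data files from an earlier employee_ext table are still present in the directory named in the LOCATION clause; that path is taken from the examples in this post and should be replaced with wherever the files actually live.

```sql
-- Re-attach a schema to existing files; no data is copied or moved.
CREATE EXTERNAL TABLE employee_ext (ename STRING, esal INT)
ROW FORMAT DELIMITED FIELDS TERMINATED BY ','
LOCATION '/user/hive/warehouse/employee_ext';

-- The existing rows should be queryable immediately, without a LOAD DATA step.
SELECT * FROM employee_ext LIMIT 5;
```

Because the table only points at the directory, any other program that writes files in the same format to that location makes new rows visible to Hive as well.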