Apache Hive Interview Questions and Answers
Intermediate level (1 to 5 years of experience) questions and answers
Ques 1. Explain the key features of Apache Hive.
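Key features include SQL-like querying through HiveQL, schema-on-read over data stored in HDFS or compatible storage, extensibility through user-defined functions and custom SerDes, and scalability, since queries run as distributed jobs on engines such as MapReduce, Tez, or Spark.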
Ques 2. Differentiate between Hive and HBase.
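Hive is a data warehousing solution for batch-oriented, SQL-like analytical queries over large datasets, whereas HBase is a column-oriented NoSQL database that provides low-latency, real-time read/write access to individual rows in large datasets.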
Ques 3. Explain the difference between Hive and traditional relational databases.
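Hive is schema-on-read: the table schema is applied when the data is queried, not validated when the data is loaded. Traditional relational databases are schema-on-write and check data against the schema at load time. Hive is also designed for batch analytics over large datasets rather than low-latency transactional workloads.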
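Schema-on-read is easiest to see with an external table defined over files that already exist. The following is a minimal sketch; the table name, columns, delimiter, and HDFS path are all hypothetical.

```sql
-- Hypothetical table over tab-delimited files that already sit in HDFS.
-- Creating the table does not read or validate the files.
CREATE EXTERNAL TABLE web_logs (
  ip           STRING,
  request_time STRING,
  status_code  INT
)
ROW FORMAT DELIMITED FIELDS TERMINATED BY '\t'
STORED AS TEXTFILE
LOCATION '/data/web_logs';

-- The schema is applied only now, at query time.
-- A status_code field that cannot be parsed as INT is returned as NULL.
SELECT ip, status_code FROM web_logs LIMIT 10;
```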
Ques 4. How can you load data into Hive from an external table?
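'LOAD DATA INPATH' moves data files from an HDFS path into a table's storage location. To copy rows from an external table into another Hive table, use 'INSERT INTO' or 'INSERT OVERWRITE' with a SELECT over the external table.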
Example:
LOAD DATA INPATH '/path/to/data' INTO TABLE table_name;
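To copy rows out of an external table rather than move files, an INSERT with a SELECT can be used. This is a sketch only: staging_ext and managed_table are hypothetical tables assumed to have compatible columns.

```sql
-- Copy rows from a (hypothetical) external table into a managed table,
-- replacing whatever the managed table currently holds.
INSERT OVERWRITE TABLE managed_table
SELECT column1, column2
FROM staging_ext;

-- With LOCAL, the file is read from the client's file system and copied,
-- instead of being moved within HDFS. OVERWRITE replaces existing data.
LOAD DATA LOCAL INPATH '/path/to/local/data' OVERWRITE INTO TABLE table_name;
```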
Ques 5. What is the purpose of Hive UDFs (User-Defined Functions)?
Hive UDFs allow users to define custom functions to perform operations not supported by built-in functions.
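Once a UDF has been written and packaged (typically as a Java class in a JAR), it is registered and called from HiveQL. In the sketch below the JAR path, the class name, and the function name are hypothetical placeholders.

```sql
-- Make the (hypothetical) JAR containing the UDF class available to the session.
ADD JAR /path/to/my-udfs.jar;

-- Register the class under a function name for the current session only.
CREATE TEMPORARY FUNCTION normalize_name AS 'com.example.hive.udf.NormalizeName';

-- Call it like any built-in function.
SELECT normalize_name(column2) FROM table_name;
```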
Ques 6. Explain Hive's internal architecture.
Hive consists of a driver that manages the lifecycle of a query, a query compiler, a query optimizer, an execution engine, and a metastore for storing table and partition metadata.
Ques 7. How can you perform data sorting in Hive?
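Sorting is done in the 'SELECT' statement. 'ORDER BY' produces a total order over the whole result, while 'SORT BY' sorts rows only within each reducer. To store the data of a bucketed table in sorted order, 'CREATE TABLE' accepts 'CLUSTERED BY ... SORTED BY ... INTO n BUCKETS'.
Example:
SELECT column1, column2 FROM table_name SORT BY column1;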
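As a rough side-by-side of the sorting clauses available in a SELECT, assuming the same illustrative table_name and columns:

```sql
-- Total order over the whole result; the final sort runs on a single reducer.
SELECT column1, column2 FROM table_name ORDER BY column1;

-- Rows with the same column1 go to the same reducer,
-- and each reducer's output is sorted by column1.
SELECT column1, column2 FROM table_name DISTRIBUTE BY column1 SORT BY column1;

-- Shorthand for DISTRIBUTE BY and SORT BY on the same column.
SELECT column1, column2 FROM table_name CLUSTER BY column1;
```

ORDER BY is the simplest choice for small results; DISTRIBUTE BY with SORT BY scales better when a per-reducer ordering is enough.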
Ques 8. What are the differences between Hive and Pig?
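Hive uses a declarative, SQL-like language (HiveQL), while Pig uses a procedural data-flow scripting language called Pig Latin. Hive is suited to data warehousing and reporting on structured data, while Pig is more often used for building data processing pipelines.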
Ques 9. How can you join tables in Hive?
You can perform joins in Hive using standard SQL syntax, such as INNER JOIN, LEFT OUTER JOIN, RIGHT OUTER JOIN, and FULL OUTER JOIN.
Example:
SELECT * FROM table1 t1 INNER JOIN table2 t2 ON t1.id = t2.id;
Ques 10. How can you handle null values in Hive?
You can use the 'COALESCE' function or 'CASE' statement to handle null values in Hive queries.
Example:
SELECT column1, COALESCE(column2, 'NA') FROM table_name;
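The same null replacement can be written with a CASE expression or with the built-in NVL function. A short sketch using the same illustrative table and columns:

```sql
-- CASE form: spell out the NULL check explicitly.
SELECT column1,
       CASE WHEN column2 IS NULL THEN 'NA' ELSE column2 END AS column2_filled
FROM table_name;

-- NVL form: returns the second argument when the first is NULL.
SELECT column1, NVL(column2, 'NA') AS column2_filled
FROM table_name;
```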
Ques 11. Explain dynamic partitioning in Hive.
Dynamic partitioning in Hive creates partitions automatically during data insertion, based on the values of the partition columns in the inserted rows, instead of requiring each partition value to be named in the statement.
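A minimal sketch of a dynamic-partition insert follows. The tables sales_partitioned (assumed to be partitioned by sale_date) and sales_staging are hypothetical, as are their columns.

```sql
-- Allow dynamic partitions, and allow all partition columns to be dynamic.
SET hive.exec.dynamic.partition = true;
SET hive.exec.dynamic.partition.mode = nonstrict;

-- The dynamic partition column (sale_date) must come last in the SELECT list.
-- Hive creates one partition per distinct sale_date value it encounters.
INSERT OVERWRITE TABLE sales_partitioned PARTITION (sale_date)
SELECT order_id, amount, sale_date
FROM sales_staging;
```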
Ques 12. How does Hive handle schema evolution?
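Hive supports schema evolution, for example adding new columns to an existing table with 'ALTER TABLE ... ADD COLUMNS'. The existing data is not rewritten, and rows written before the change return NULL for the new columns.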
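Adding a column might look like the sketch below, where column3 is a hypothetical new column. On partitioned tables, existing partitions may also need the CASCADE option, depending on the Hive version.

```sql
-- Only the table metadata changes; data files are left untouched.
ALTER TABLE table_name ADD COLUMNS (column3 STRING COMMENT 'added later');

-- Rows written before the change return NULL for column3.
SELECT column1, column3 FROM table_name;
```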
Ques 13. How can you enable Hive vectorization?
You can enable Hive vectorization by setting the 'hive.vectorized.execution.enabled' configuration property to true.
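In a session, this is typically done with SET statements. The sketch below assumes a table stored in a columnar format such as ORC, which vectorized execution has traditionally required.

```sql
-- Process rows in batches instead of one at a time.
SET hive.vectorized.execution.enabled = true;
-- Optionally vectorize the reduce side as well.
SET hive.vectorized.execution.reduce.enabled = true;

-- When vectorization applies, the plan should report a vectorized execution mode.
EXPLAIN SELECT column1, COUNT(*) FROM table_name GROUP BY column1;
```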
Ques 14. How can you perform data sampling in Hive?
You can use the 'TABLESAMPLE' clause in the 'SELECT' statement to perform data sampling in Hive.
Example:
SELECT * FROM table_name TABLESAMPLE(BUCKET 1 OUT OF 4 ON column1);
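Besides bucket sampling, TABLESAMPLE also accepts a percentage or a row count. The sketch below uses illustrative values; the sample is taken at the level of input blocks or splits, so the result size is approximate.

```sql
-- Sample roughly 10 percent of the input data.
SELECT * FROM table_name TABLESAMPLE(10 PERCENT);

-- Take up to 100 rows from each input split.
SELECT * FROM table_name TABLESAMPLE(100 ROWS);
```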
Ques 15. Explain the use of Hive's EXPLAIN statement.
The 'EXPLAIN' statement in Hive provides the execution plan of a query, helping in query optimization and troubleshooting.
Example:
EXPLAIN SELECT * FROM table_name;
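For a query with a join and an aggregation, the plan is more informative, and EXPLAIN EXTENDED adds further detail. A sketch, reusing the illustrative table1 and table2:

```sql
-- Shows the stage dependencies and the operator tree of each stage,
-- with extra detail on the operators in the plan.
EXPLAIN EXTENDED
SELECT t1.id, COUNT(*) AS cnt
FROM table1 t1
INNER JOIN table2 t2 ON t1.id = t2.id
GROUP BY t1.id;
```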