Hadoop Interview Questions and Answers
Ques 36. What is the difference between Input Split and HDFS Block?
An Input Split is a logical division of the data, whereas an HDFS Block is a physical division of the data.
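As a rough illustration of the difference, the sketch below cuts a small byte string into fixed-size physical blocks and then reads line records per logical split. It is a simplified model loosely based on how line-oriented input is commonly handled (a split skips a partial first line and reads past its end to finish its last record); the function names and the tiny 10-byte block size are assumptions made for the example, not Hadoop APIs.

```python
def physical_blocks(data: bytes, block_size: int) -> list[bytes]:
    """Cut data at fixed byte offsets, ignoring record boundaries."""
    return [data[i:i + block_size] for i in range(0, len(data), block_size)]


def records_for_split(data: bytes, start: int, length: int) -> list[bytes]:
    """Return the whole line records owned by the split [start, start + length)."""
    end = start + length
    pos = start
    if start != 0:
        # Skip the first (possibly partial) line: the previous split owns it.
        newline = data.find(b"\n", start)
        if newline == -1:
            return []
        pos = newline + 1
    records = []
    # Keep reading while a record starts at or before the end of the split,
    # even if that record finishes in the next physical block.
    while pos <= end and pos < len(data):
        newline = data.find(b"\n", pos)
        if newline == -1:
            newline = len(data)
        records.append(data[pos:newline])
        pos = newline + 1
    return records


DATA = b"alpha\nbravo\ncharlie\ndelta\n"
BLOCK_SIZE = 10  # deliberately tiny so that records straddle block boundaries

blocks = physical_blocks(DATA, BLOCK_SIZE)
splits = [records_for_split(DATA, i, BLOCK_SIZE) for i in range(0, len(DATA), BLOCK_SIZE)]
```

Here the first block ends in the middle of "bravo", yet the first split still yields "bravo" as a complete record. Blocks are about bytes on disk; splits are about records handed to a mapper.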
Ques 37. What is the difference between HDFS and NAS?
HDFS data blocks are distributed across the local drives of all machines in a cluster, whereas NAS data is stored on dedicated hardware.
Ques 38. What is the difference between Hadoop and other data processing tools?
Unlike many other data processing tools, Hadoop lets you increase or decrease the number of mappers without worrying about the volume of data to be processed.
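To make this concrete, here is a minimal sketch of how the mapper count could be estimated from the split size. It assumes the usual rule that the split size is max(min size, min(max size, block size)) and that roughly one mapper runs per split; the function names are hypothetical, and real Hadoop applies further details (for example, some slack on the last split) that are ignored here.

```python
MB = 1024 * 1024


def compute_split_size(block_size: int, min_size: int = 1,
                       max_size: int = 2**63 - 1) -> int:
    """Clamp the block size between the configured minimum and maximum split size."""
    return max(min_size, min(max_size, block_size))


def estimate_mappers(file_size: int, block_size: int, min_size: int = 1,
                     max_size: int = 2**63 - 1) -> int:
    """Approximate mapper count: one mapper per split, rounded up."""
    split_size = compute_split_size(block_size, min_size, max_size)
    return -(-file_size // split_size)  # integer ceiling division


default_mappers = estimate_mappers(1024 * MB, 128 * MB)
more_mappers = estimate_mappers(1024 * MB, 128 * MB, max_size=64 * MB)
fewer_mappers = estimate_mappers(1024 * MB, 128 * MB, min_size=256 * MB)
```

Under these assumptions, a 1 GB file with 128 MB blocks gives 8 mappers by default, 16 when the maximum split size is lowered to 64 MB, and 4 when the minimum split size is raised to 256 MB.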
Ques 39. What is distributed cache in Hadoop?
Distributed cache is a facility provided by the MapReduce framework to cache files (text, archives, etc.) that a job needs at execution time. The framework copies the necessary files to a slave node before any task of the job executes on that node.
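The idea can be sketched with a small local simulation. This is not the Hadoop API: the functions `localize_cache_files` and `lookup_country`, the file `countries.txt`, and the node directory names are all hypothetical, and plain directories stand in for cluster nodes. It only shows the pattern of copying a file to each node once, before tasks run, so that tasks read a local copy.

```python
import shutil
import tempfile
from pathlib import Path


def localize_cache_files(cache_files, node_local_dir):
    """Copy each cache file into the node's local directory (once per node)."""
    node_local_dir = Path(node_local_dir)
    node_local_dir.mkdir(parents=True, exist_ok=True)
    local_files = {}
    for src in map(Path, cache_files):
        dst = node_local_dir / src.name
        if not dst.exists():  # later tasks on the same node reuse the copy
            shutil.copy(src, dst)
        local_files[src.name] = dst
    return local_files


def lookup_country(code, local_files):
    """A stand-in task that reads the node-local copy of the cached file."""
    lines = local_files["countries.txt"].read_text().splitlines()
    table = dict(line.split("=", 1) for line in lines)
    return table.get(code, "unknown")


def demo():
    """Simulate two nodes, each localizing the file before running a task."""
    with tempfile.TemporaryDirectory() as tmp:
        shared = Path(tmp) / "shared" / "countries.txt"
        shared.parent.mkdir()
        shared.write_text("IN=India\nFR=France\n")
        results = []
        for node in ("node1", "node2"):
            local = localize_cache_files([shared], Path(tmp) / node / "cache")
            results.append(lookup_country("FR", local))
        return results
```

Calling `demo()` runs the same lookup on both simulated nodes, each against its own local copy of the file.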
Ques 40. What is the functionality of JobTracker in Hadoop? How many instances of a JobTracker run on Hadoop cluster?
JobTracker is a service used to submit and track MapReduce jobs in Hadoop. Only one JobTracker process runs on a Hadoop cluster, and it runs in its own JVM process.
Functionalities of JobTracker in Hadoop:
- When a client application submits jobs to the JobTracker, the JobTracker talks to the NameNode to find the location of the data.
- It locates TaskTracker nodes with available slots at or near the data.
- It assigns the work to the chosen TaskTracker nodes.
- The TaskTracker nodes are responsible for notifying the JobTracker when a task fails; the JobTracker then decides what to do. It may resubmit the task on another node, or it may mark that specific record as one to avoid.
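The steps above can be sketched as a toy simulation. The classes below are simplified stand-ins that only borrow the names JobTracker, NameNode, and TaskTracker; the `healthy` flag, the `submit` method, and the retry limit are assumptions made for illustration and do not reflect Hadoop's actual interfaces.

```python
from dataclasses import dataclass


@dataclass
class TaskTracker:
    name: str
    free_slots: int
    healthy: bool = True  # hypothetical flag used to simulate a failing node

    def run(self, task: str) -> bool:
        """Report success (True) or failure (False) back to the caller."""
        return self.healthy


class NameNode:
    def __init__(self, block_locations):
        self.block_locations = block_locations  # path -> node names holding the data

    def locate(self, path):
        return self.block_locations[path]


class JobTracker:
    def __init__(self, name_node, trackers):
        self.name_node = name_node
        self.trackers = {t.name: t for t in trackers}

    def pick_tracker(self, path, exclude):
        """Choose a tracker with a free slot, preferring nodes that hold the data."""
        candidates = [t for t in self.trackers.values()
                      if t.free_slots > 0 and t.name not in exclude]
        if not candidates:
            return None
        data_nodes = set(self.name_node.locate(path))
        # False sorts before True, so data-local trackers win.
        return min(candidates, key=lambda t: t.name not in data_nodes)

    def submit(self, path, max_attempts=3):
        """Assign the task; on failure, resubmit it on another node."""
        tried = []
        while len(tried) < max_attempts:
            tracker = self.pick_tracker(path, exclude=tried)
            if tracker is None:
                break
            tried.append(tracker.name)
            tracker.free_slots -= 1
            succeeded = tracker.run(path)
            tracker.free_slots += 1
            if succeeded:
                return tried  # names of the nodes tried, last one succeeded
        raise RuntimeError(f"task for {path} failed on: {tried}")


name_node = NameNode({"/logs/part-0": ["node1", "node2"]})
job_tracker = JobTracker(name_node, [
    TaskTracker("node1", free_slots=2, healthy=False),
    TaskTracker("node2", free_slots=1),
    TaskTracker("node3", free_slots=4),
])
attempts = job_tracker.submit("/logs/part-0")
```

In this run the task is first placed on node1, which holds the data but fails, and is then resubmitted on node2, the other node holding the data.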