Sunday, 5 March 2017



Hadoop includes three main components:
  • HDFS
  • MapReduce
  • YARN
HDFS:
  • HDFS stands for Hadoop Distributed File System; it stores and manages very large data sets.
  • HDFS stores data in a distributed manner across the machines of a cluster, and it is Hadoop's primary storage system.
  • HDFS lets you write and read files, but a file cannot be updated in place once it has been written (HDFS follows a write-once, read-many model).
  • When a file is moved into HDFS, it is automatically split into blocks (128 MB each by default in Hadoop 2.x), and each block is replicated on three different servers (the default replication factor).
  • HDFS is implemented with a master/slave architecture: the master is the NameNode and the slaves are the DataNodes.
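To make the splitting and replication concrete, here is a minimal Python sketch. It assumes a 128 MB block size and a replication factor of 3 (the Hadoop 2.x defaults for `dfs.blocksize` and `dfs.replication`) and places replicas round-robin on hypothetical server names. Real HDFS uses a rack-aware placement policy, so this only illustrates the idea.

```python
BLOCK_SIZE = 128 * 1024 * 1024  # assumed block size in bytes (Hadoop 2.x default)
REPLICATION = 3                 # assumed replication factor (HDFS default)


def split_into_blocks(file_size):
    """Return the size of each block a file of file_size bytes is split into."""
    blocks = []
    remaining = file_size
    while remaining > 0:
        blocks.append(min(BLOCK_SIZE, remaining))
        remaining -= BLOCK_SIZE
    return blocks


def place_replicas(num_blocks, servers, replication=REPLICATION):
    """Assign each block to `replication` distinct servers, round-robin.

    Simplified, hypothetical policy; real HDFS placement is rack-aware.
    """
    if replication > len(servers):
        raise ValueError("need at least as many servers as replicas")
    placement = {}
    for block_id in range(num_blocks):
        placement[block_id] = [servers[(block_id + i) % len(servers)]
                               for i in range(replication)]
    return placement


if __name__ == "__main__":
    sizes = split_into_blocks(300 * 1024 * 1024)  # a 300 MB file
    print([s // (1024 * 1024) for s in sizes])    # block sizes in MB
    servers = ["server1", "server2", "server3", "server4"]
    for block_id, replicas in place_replicas(len(sizes), servers).items():
        print(block_id, replicas)
```

A 300 MB file becomes blocks of 128, 128 and 44 MB; the last block only occupies as much space as the data it holds, and each block ends up on three different servers.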
NameNode:
  • The NameNode is the heart of HDFS and acts as its master.
  • It maintains the file system namespace of HDFS (the directory tree and the files in it).
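  • The NameNode stores the metadata about files and their data blocks, not the data itself; this metadata is stored permanently on the NameNode's local disk (as the namespace image and the edit log).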
  • Because it holds only metadata and not the actual data, the NameNode needs comparatively little disk space.
  • In a high-availability setup there are two NameNodes: an active NameNode and a standby NameNode.
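The metadata the NameNode keeps can be pictured as two maps: one from file paths to their block IDs (the namespace) and one from block IDs to the DataNodes holding replicas. The Python sketch below is a hypothetical in-memory model, not the NameNode's actual data structures; the class and method names are made up for illustration.

```python
class NameNodeMetadata:
    """Hypothetical in-memory model of NameNode metadata (illustration only)."""

    def __init__(self):
        self.namespace = {}        # file path -> ordered list of block IDs
        self.block_locations = {}  # block ID -> list of DataNodes with a replica
        self._next_block_id = 0

    def add_file(self, path, block_placements):
        """Register a file whose blocks are stored on the given DataNodes."""
        if path in self.namespace:
            raise FileExistsError(path)
        block_ids = []
        for datanodes in block_placements:
            block_id = self._next_block_id
            self._next_block_id += 1
            self.block_locations[block_id] = list(datanodes)
            block_ids.append(block_id)
        self.namespace[path] = block_ids

    def get_block_locations(self, path):
        """Return (block ID, DataNodes) pairs a client would read from."""
        return [(b, self.block_locations[b]) for b in self.namespace[path]]


if __name__ == "__main__":
    meta = NameNodeMetadata()
    meta.add_file("/data/logs.txt", [["dn1", "dn2", "dn3"], ["dn2", "dn3", "dn4"]])
    print(meta.get_block_locations("/data/logs.txt"))
```

The NameNode only answers the question "where are the blocks?"; the client then reads the data directly from the DataNodes.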
Secondary NameNode:
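  • The main role of the Secondary NameNode is to periodically copy the namespace image and the edit log from the NameNode and merge them into a new namespace image (a checkpoint).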
  • The Secondary NameNode needs a large amount of memory (about as much as the NameNode) to perform this merge.
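  • If the NameNode fails, the namespace image saved by the Secondary NameNode can be used to restart the NameNode. The Secondary NameNode is not a hot standby, however, so changes made after its last checkpoint may be missing.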
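The merge is essentially a replay: the edit log is applied on top of the last namespace image to produce a new image. The Python sketch below models the image as a dictionary and the edit log as a list of operations; the operation names (`create`, `delete`, `rename`) are hypothetical and far simpler than the real fsimage and edits file formats.

```python
def apply_edit(image, edit):
    """Apply one hypothetical edit-log entry to a namespace image (dict)."""
    op = edit[0]
    if op == "create":
        _, path, blocks = edit
        image[path] = list(blocks)
    elif op == "delete":
        _, path = edit
        image.pop(path, None)
    elif op == "rename":
        _, src, dst = edit
        image[dst] = image.pop(src)
    else:
        raise ValueError(f"unknown operation: {op}")


def checkpoint(fsimage, edit_log):
    """Merge an edit log into a copy of the namespace image.

    Returns the new image and leaves the old one untouched; after a
    successful checkpoint the edit log can start over empty.
    """
    new_image = {path: list(blocks) for path, blocks in fsimage.items()}
    for edit in edit_log:
        apply_edit(new_image, edit)
    return new_image


if __name__ == "__main__":
    fsimage = {"/a.txt": [1, 2]}
    edits = [("create", "/b.txt", [3]),
             ("rename", "/a.txt", "/c.txt"),
             ("delete", "/b.txt")]
    print(checkpoint(fsimage, edits))
```

Because the merged image already contains every logged change, restarting from it is faster than replaying a long edit log; anything logged after the checkpoint is not in it.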
DataNode:
  • DataNodes are also known as slaves.
  • DataNodes provide the actual storage: they hold the data blocks on their local disks.
  • DataNodes carry out block operations such as creation, deletion, and replication according to instructions from the NameNode.
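In HDFS the NameNode does not call DataNodes directly; it sends its instructions in the replies to the DataNodes' periodic heartbeats. Below is a minimal Python sketch of a DataNode acting on such instructions. The `DataNode` class and the command format are hypothetical, chosen only to illustrate the idea.

```python
class DataNode:
    """Hypothetical DataNode that changes its blocks only when instructed."""

    def __init__(self, name):
        self.name = name
        self.blocks = {}  # block ID -> block contents (bytes)

    def heartbeat(self):
        """Report status to the NameNode: here, just the block IDs held."""
        return {"datanode": self.name, "blocks": sorted(self.blocks)}

    def handle_commands(self, commands, cluster):
        """Execute commands returned by the NameNode in a heartbeat reply."""
        for cmd in commands:
            if cmd["op"] == "delete":
                self.blocks.pop(cmd["block"], None)
            elif cmd["op"] == "replicate":
                # Copy a local block to another DataNode in the cluster.
                target = cluster[cmd["target"]]
                target.blocks[cmd["block"]] = self.blocks[cmd["block"]]
            else:
                raise ValueError(f"unknown command: {cmd['op']}")


if __name__ == "__main__":
    dn1, dn2 = DataNode("dn1"), DataNode("dn2")
    cluster = {"dn1": dn1, "dn2": dn2}
    dn1.blocks[7] = b"some data"
    # Pretend the NameNode's heartbeat reply asks dn1 to copy block 7 to dn2.
    dn1.handle_commands([{"op": "replicate", "block": 7, "target": "dn2"}], cluster)
    print(dn2.heartbeat())
```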
MapReduce:
1. Mapper Task:
  • The input data is divided into small parts (input splits), which are distributed across the nodes of the cluster.
  • Each map task processes one split in parallel and produces intermediate key-value pairs, which are passed on to the reduce tasks.
2. Reducer Task:
  • It combines the intermediate results of all map tasks, grouped by key, and writes the final output in the required format.
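The classic way to see the two tasks working together is word count. The following is a single-process Python simulation, not Hadoop's Java API: `map_task` emits `(word, 1)` pairs for one input split, a shuffle step groups the pairs by key, and `reduce_task` sums the counts for each word. All function names are hypothetical.

```python
from collections import defaultdict


def map_task(split):
    """Map: emit a (word, 1) pair for every word in one input split."""
    return [(word.lower(), 1) for word in split.split()]


def shuffle(mapped_outputs):
    """Group the intermediate pairs from all map tasks by key."""
    groups = defaultdict(list)
    for pairs in mapped_outputs:
        for key, value in pairs:
            groups[key].append(value)
    return groups


def reduce_task(key, values):
    """Reduce: combine all values for one key into a single result."""
    return key, sum(values)


def word_count(splits):
    # On a real cluster the map tasks run in parallel on many nodes.
    mapped = [map_task(s) for s in splits]
    grouped = shuffle(mapped)
    return dict(sorted(reduce_task(k, v) for k, v in grouped.items()))


if __name__ == "__main__":
    splits = ["Hadoop stores data", "Hadoop processes data in parallel"]
    print(word_count(splits))
```

Here the two splits produce `{'data': 2, 'hadoop': 2, 'in': 1, 'parallel': 1, 'processes': 1, 'stores': 1}`; the keys are sorted before reducing, as Hadoop also sorts map output by key.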
YARN:
  • YARN stands for Yet Another Resource Negotiator.
  • YARN is the resource management layer of Hadoop, responsible for managing the resources in the cluster and scheduling applications.
  • It is also known as MapReduce 2 (MRv2).
  • YARN is responsible for managing and monitoring workloads.
  • YARN has the following two components:
               1. ResourceManager (runs on the master node)
               2. NodeManager (runs on every slave node, alongside the DataNode)
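To illustrate the division of work, here is a simplified Python sketch in which a ResourceManager tracks the free memory on each NodeManager and grants containers on nodes with enough capacity. The class names mirror the YARN components, but the first-fit policy and the memory-only resource model are assumptions for illustration; real YARN uses pluggable schedulers (such as the Capacity and Fair schedulers) and also accounts for CPU cores.

```python
class NodeManager:
    """Hypothetical NodeManager tracking the memory (MB) free on its node."""

    def __init__(self, host, memory_mb):
        self.host = host
        self.free_mb = memory_mb
        self.containers = []

    def launch(self, app_id, memory_mb):
        """Start a container for an application and reserve its memory."""
        self.free_mb -= memory_mb
        self.containers.append((app_id, memory_mb))


class ResourceManager:
    """Hypothetical ResourceManager using a simple first-fit policy."""

    def __init__(self, node_managers):
        self.node_managers = node_managers

    def allocate(self, app_id, memory_mb):
        """Place one container on the first node with enough free memory.

        Returns the chosen host, or None if no node can fit the request.
        """
        for nm in self.node_managers:
            if nm.free_mb >= memory_mb:
                nm.launch(app_id, memory_mb)
                return nm.host
        return None


if __name__ == "__main__":
    rm = ResourceManager([NodeManager("worker1", 4096), NodeManager("worker2", 2048)])
    for _ in range(3):
        print(rm.allocate("app_1", 2048))
    print(rm.allocate("app_2", 2048))  # no node has 2048 MB left
```

The first three requests land on worker1, worker1 and worker2, and the fourth gets `None`. Real YARN would keep such a request pending until resources free up rather than rejecting it; the sketch returns `None` to stay short.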