HADOOP: Contribute a limited/specific amount of storage as a slave to the cluster
In this task (4.1) we will discuss how a slave node can contribute a limited or specific amount of its storage to the Hadoop cluster.
Let's first dig briefly into the main concepts of Hadoop covered so far:
Hadoop is an open-source software framework used for storing and processing Big Data in a distributed manner on large clusters of commodity hardware. Hadoop is licensed under the Apache v2 license.
Hadoop was developed based on the paper Google published on its MapReduce system, and it applies concepts of functional programming. Hadoop is written in the Java programming language and is one of the top-level Apache projects. It was originally developed by Doug Cutting and Michael J. Cafarella.
Features of Hadoop
Reliability: When machines work as a single unit, if one machine fails, another machine takes over its responsibility and works in a reliable, fault-tolerant fashion. The Hadoop infrastructure has inbuilt fault-tolerance features, and hence Hadoop is highly reliable.
Economical: Hadoop runs on commodity hardware (like your PC or laptop). For example, in a small Hadoop cluster, all your DataNodes can have normal configurations like 8–16 GB of RAM, 5–10 TB of hard disk, and Xeon processors.
Had I used hardware-based RAID with Oracle servers for the same purpose, I would have spent at least five times more. So the cost of ownership of a Hadoop-based project is minimized: a Hadoop environment is easier to maintain and economical as well. Also, since Hadoop is open-source software, there is no licensing cost.
Scalability: Hadoop has the inbuilt capability of integrating seamlessly with cloud-based services. So, if you are installing Hadoop on the cloud, you don't need to worry about scalability: you can procure more hardware and expand your setup within minutes whenever required.
Flexibility: Hadoop is very flexible in its ability to deal with all kinds of data. We discussed "Variety" in our previous blog on the Big Data Tutorial: data can be of any kind, and Hadoop can store and process it all, whether it is structured, semi-structured, or unstructured.
These four characteristics make Hadoop a front-runner as a solution to Big Data challenges. Now that we know what Hadoop is, let us look at its core components.
NameNode:
- It is the master daemon that maintains and manages the DataNodes (slave nodes)
- It records the metadata of all the blocks stored in the cluster, e.g. location of blocks stored, size of the files, permissions, hierarchy, etc.
- It records every change that takes place to the file system metadata
- If a file is deleted in HDFS, the NameNode will immediately record this in the EditLog
- It regularly receives a Heartbeat and a block report from all the DataNodes in the cluster to ensure that the DataNodes are alive
- It keeps a record of all the blocks in HDFS and the DataNodes on which they are stored
- It has high availability and federation features, which will be discussed in detail in the HDFS architecture
DataNode:
- It is the slave daemon which runs on each slave machine
- The actual data is stored on DataNodes
- It is responsible for serving read and write requests from the clients
- It is also responsible for creating blocks, deleting blocks, and replicating the same based on the decisions taken by the NameNode
- It sends heartbeats to the NameNode periodically to report the overall health of HDFS; by default, this frequency is set to 3 seconds
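The 3-second heartbeat mentioned above is controlled by the dfs.heartbeat.interval property in hdfs-site.xml. As a sketch, it could be set explicitly like this (the value shown is simply the default):

```xml
<!-- hdfs-site.xml: DataNode heartbeat interval in seconds (3 is the default) -->
<property>
  <name>dfs.heartbeat.interval</name>
  <value>3</value>
</property>
```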
Initially, when the DataNodes are connected to the NameNode, whatever volume we have given to the root volume is contributed in full to the cluster. Now let's see how a DataNode can contribute only a specific amount of storage to the NameNode.
First, we need to create a volume, let's say a 1 GB volume, in the same zone as our cluster's instances. Then attach the newly created volume to one of the DataNodes (slave nodes).
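If the cluster runs on AWS EC2, the same create-and-attach steps can also be scripted with the AWS CLI instead of the console. This is only a sketch: the availability zone, instance ID, and device name below are placeholders, not values from this walkthrough.

```shell
# Placeholders for illustration; substitute your own values.
AZ="us-east-1a"                    # must match the DataNode's availability zone
INSTANCE_ID="i-0123456789abcdef0"  # the slave (DataNode) instance

# Create a 1 GB EBS volume (requires a configured AWS CLI):
#   VOLUME_ID=$(aws ec2 create-volume --size 1 --availability-zone "$AZ" \
#       --query VolumeId --output text)
# Attach it to the DataNode instance as /dev/xvdf:
#   aws ec2 attach-volume --volume-id "$VOLUME_ID" \
#       --instance-id "$INSTANCE_ID" --device /dev/xvdf
echo "Would create a 1 GB volume in $AZ and attach it to $INSTANCE_ID"
```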
As visible, our newly created 1 GB volume has been attached to this slave. The next step is to create a partition on the recently attached volume.
Here, inside fdisk:
- ‘n’ creates a new partition
- ‘p’ prints the partition table
- ‘w’ writes the table to the disk and exits
As visible, the formatting completed successfully. The next step is to create a directory, mount the new partition on it, and then point the DataNode's HDFS data directory at this newly created directory in place of the existing one.
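Putting these steps together, here is a sketch assuming the new partition appeared as /dev/xvdf1 and using /dn1 as the mount point (both names are assumptions for illustration). The privileged commands are shown as comments because they must run as root on the DataNode itself.

```shell
# On the DataNode, as root:
#   mkfs.ext4 /dev/xvdf1      # format the new partition
#   mkdir /dn1                # directory that will hold the HDFS blocks
#   mount /dev/xvdf1 /dn1     # mount the formatted partition

# Then point the DataNode at the new directory by setting
# dfs.datanode.data.dir in hdfs-site.xml; the property would look like:
cat > datanode-dir-snippet.xml <<'EOF'
<property>
  <name>dfs.datanode.data.dir</name>
  <value>/dn1</value>
</property>
EOF
cat datanode-dir-snippet.xml
```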
Now stop the DataNode and then start it once more (hadoop-daemon.sh stop datanode, then hadoop-daemon.sh start datanode); after entering hadoop dfsadmin -report, we will be able to see that only the 1 GB of storage is contributed to the NameNode.
Thank you for reading patiently! I hope I was able to deliver to the best of my knowledge.