Apache Hadoop is a framework for processing large data sets across clusters of machines using a simple programming model. Hadoop is designed to scale up to thousands of machines, each offering local computation and storage. In this article, we will discuss how to install and configure a single-node Hadoop cluster.
An enterprise-level Hadoop setup requires a multi-node cluster configuration, but for a development environment and learning purposes, a single-node Hadoop cluster is sufficient. Let's walk through the installation on Ubuntu.
Installing Hadoop
Step 1: Download and Install the JDK
Install the JDK on Ubuntu using the apt-get repository. First, update and upgrade the apt package index with the following commands.
$ sudo apt-get update
$ sudo apt-get upgrade
Then, install the JDK with the following command.
$ sudo apt-get install default-jdk
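Once the installation completes, you can verify the JDK is available by checking the Java version:

$ java -version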
Step 2: Download and Extract the Installation Package
Download the required version of the Hadoop binary from the Hadoop Releases page. In this tutorial, we will use version 2.10.0 and download the package with the wget tool.
$ wget https://archive.apache.org/dist/hadoop/common/hadoop-2.10.0/hadoop-2.10.0.tar.gz
Then, extract the downloaded file to the required path by running the following command.
$ tar -xvf hadoop-2.10.0.tar.gz
Step 3: Add the Hadoop and Java Paths to the Bash File
Open the .bashrc file and add the following lines.
export HADOOP_HOME=<HADOOP_BINARY_PATH>
export JAVA_HOME=<JAVA_BINARY_PATH>
export PATH=$PATH:$HADOOP_HOME/bin
To apply the changes to the current session (new sessions pick them up automatically), source the .bashrc file:

$ source ~/.bashrc
Step 4: Configure Hadoop
As part of configuring the Hadoop cluster, we need to edit the following files: core-site.xml, hdfs-site.xml, mapred-site.xml, yarn-site.xml, and hadoop-env.sh. All of them live in the etc/hadoop directory inside the extracted Hadoop folder. Let's see each configuration file one by one.
core-site.xml
The core-site.xml file contains the settings for the Hadoop core, such as I/O settings and the filesystem used by HDFS and MapReduce. Inside the <configuration> tag, edit the name and value of each property as required. In this tutorial, we keep the defaults.
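If you do want to set the default filesystem explicitly, a minimal single-node core-site.xml looks like the sketch below; hdfs://localhost:9000 is the conventional single-node value and can be changed as needed.

<configuration>
  <property>
    <name>fs.defaultFS</name>
    <value>hdfs://localhost:9000</value>
  </property>
</configuration>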
hdfs-site.xml
The hdfs-site.xml file contains the settings for the HDFS core, such as the NameNode and DataNode storage directories, the replication factor, the block size, and more.
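On a single-node cluster, the replication factor is usually lowered to 1, since there is only one DataNode to hold the blocks. A minimal hdfs-site.xml for this setup:

<configuration>
  <property>
    <name>dfs.replication</name>
    <value>1</value>
  </property>
</configuration>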
mapred-site.xml
The mapred-site.xml file contains the settings for the MapReduce framework, such as the number of CPU cores and the memory allocated to mapper and reducer processes.
This file is not available directly in the configuration directory; it ships as a template, which we can rename as follows.
$ mv mapred-site.xml.template mapred-site.xml
Then, open the file and edit it as required.
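For example, to run MapReduce jobs on YARN (the usual choice in Hadoop 2.x), set the framework name inside the <configuration> tag:

<configuration>
  <property>
    <name>mapreduce.framework.name</name>
    <value>yarn</value>
  </property>
</configuration>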
yarn-site.xml
The yarn-site.xml file contains the settings for YARN, such as memory management and scheduler selection.
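A minimal single-node yarn-site.xml enables the shuffle auxiliary service that MapReduce jobs rely on:

<configuration>
  <property>
    <name>yarn.nodemanager.aux-services</name>
    <value>mapreduce_shuffle</value>
  </property>
</configuration>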
hadoop-env.sh
This is a simple script file that contains the environment variables required to run the Hadoop cluster.
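At a minimum, set JAVA_HOME in this file. The path below assumes the Ubuntu default-jdk package installed earlier; adjust it to your own JDK location.

export JAVA_HOME=/usr/lib/jvm/default-java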
Step 5: Format the NameNode.
Formatting the NameNode initializes HDFS and applies the settings made earlier. This can be done with the hadoop executable as follows.
$ bin/hadoop namenode -format
Note: Formatting the NameNode again will erase all data stored in HDFS, so be sure before doing it.
Step 6: Start the Daemons.
The following daemons need to be started: NameNode, DataNode, ResourceManager, NodeManager, and JobHistoryServer.
If you want to start them all together, run the following script from the sbin directory:
$ ./start-all.sh
To check all the running daemons, run
$ jps
This will give you output similar to the following:
22343 NameNode
28534 NodeManager
28378 ResourceManager
22654 JobHistoryServer
24342 Jps
24653 DataNode
If you see this output, the Hadoop single-node configuration is done. You can check the NameNode web interface on port 50070 (http://localhost:50070/dfshealth.html).
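As an optional sanity check, you can create a directory in HDFS and list the filesystem root (the /user/hadoop path here is just an example):

$ bin/hdfs dfs -mkdir -p /user/hadoop
$ bin/hdfs dfs -ls /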
That is all. The single-node Hadoop cluster configuration is complete.
Conclusion
We have discussed how to install and configure a single-node Hadoop cluster. In our next article, we will discuss how to configure a multi-node Hadoop cluster. Stay tuned and subscribe to DigitalVarys for more articles and study materials on DevOps, Agile, DevSecOps, and app development.
Certified Cloud Automation Architect and DevSecOps expert, skilled in optimizing IT workflows with Six Sigma and Value Stream Management. Proficient in both technical and leadership roles, I deliver robust solutions and lead teams to success.