# HashPrompt

Since manual failover mechanism was unable to automatically trigger a failover in cases of namenode failure, automatic failover mechanism made sure to provide a hot backup during a failover. This was overcome by zookeeper. We have already covered zookeeper installation & configuration in my previous post.
To configure an automatic failover ha cluster we need more than one odd number of nodes to ensure that always a majority of zookeeper servers are running in case of any namenode failure. We also need a running zookeeper ensemble. Refer the blog here to install and configure zookeeper in clustered mode.

Setup Summary

I have implemented this fully distributed mode cluster setup of automatic failover HA in an ESXi server. We have hadoop 2.6 installed in all nodes that are running on CentOS 5.11 release. This setup consists of three namenodes, two datanodes and one client machine. Each machine is configured to use 1GB RAM and 20GB hard disk. This setup demonstrates only to set up a fully distributed hadoop cluster with automatic failover using zookeeper. Below table describes my setup with hostname and ip-address of all configured machines.

namenode1 ha-nn01 192.168.56.101
namenode2 ha-nn02 192.168.56.102
namenode3 ha-nn03 192.168.56.103
datanode1 ha-dn01 192.168.56.104
datanode1 ha-dn02 192.168.56.105
client ha-client 192.168.56.106

Downloads

Download the below packages and place them in all nodes.

Download Apache Hadoop 2.6 here.

Download Oracle JDK 8 here.

Note: Before moving directly towards the installation and configuration part, read the pre-checks listed below.

1. Disable firewall on all nodes.

2. Disable selinux on all nodes.

3. Update the hostname and their repective ip-addresses of all nodes in /etc/hosts file on all nodes.

4. It is recommended to use Oracle JDK.

INSTALLATION & CONFIGURATION

Hadoop installation & configuration includes user settings, java installation, passwordless ssh configuration and lastly, hadoop installation and configuration.

User & Group Settings

Location: ha-nn01, ha-nn02, ha-nn03, ha-dn01, ha-dn02, ha-client

First we will create a group named 'hadoop'. Next we create a user 'huser' to perform all hadoop administrative tasks and setup a password for it.

# groupadd hadoop

# useradd -m -d /home/huser -g hadoop huser

# passwd huser

Note: Henceforth we will be using the newly created 'huser' user to perform all hadoop tasks.

Java Installation

Location: ha-nn01, ha-nn02, ha-nn03, ha-dn01, ha-dn02, ha-client

To install java, you can refer the blog here.

Passwordless SSH Configuration

Passwordless ssh environment is needed by the namenode to start HDFS & MapReduce related daemons in all nodes.

Location: ha-nn01, ha-nn02, ha-nn03

huser@ha-nn01:~$ ssh-keygen -t rsa
huser@ha-nn01:~$ ssh-copy-id -i ~/.ssh/id_rsa.pub huser@ha-nn01
huser@ha-nn01:~$ ssh-copy-id -i ~/.ssh/id_rsa.pub huser@ha-nn02
huser@ha-nn01:~$ ssh-copy-id -i ~/.ssh/id_rsa.pub huser@ha-nn03
huser@ha-nn01:~$ ssh-copy-id -i ~/.ssh/id_rsa.pub huser@ha-dn01
huser@ha-nn01:~$ ssh-copy-id -i ~/.ssh/id_rsa.pub huser@ha-dn02

Testing Passwordless SSH
Run the below commands from ha-nn01, ha-nn02 & ha-nn03 to test passwordless logins.
huser:~$ ssh ha-nn01
huser:~$ ssh ha-nn02
huser:~$ ssh ha-nn03
huser:~$ ssh ha-dn01
huser:~$ ssh ha-dn01

ZooKeeper Installation

Location: ha-nn01, ha-nn02, ha-nn03

For zookeeper quorum installation and configuration, you can refer the blog here.

Note: Zookeeper installation and configuration needs to be done only on all namenodes.

Hadoop Installation

We will be installing lastest stable release of Apache Hadoop 2.6.0.

Location: ha-nn01, ha-nn02, ha-nn03, ha-dn01, ha-dn02, ha-client

We will first place the downloaded tarball in /opt directory, untar it and change the ownership of that directory to 'huser' user.

root:~# cd /opt

root:~# tar -xzvf hadoop-2.6.0.tar.gz

root:~# chown -R huser:hadoop hadoop-2.6.0/

Next we will login as 'huser' user and set the environment variables in .bashrc file.

huser:~$ vi ~/.bashrc

###JAVA CONFIGURATION###

JAVA_HOME=/usr/java/jdk1.8.0_25/

export PATH=$PATH:$JAVA_HOME/bin

###HADOOP CONFIGURATION###

HADOOP_PREFIX=/opt/hadoop-2.6.0/

export PATH=$PATH:$HADOOP_PREFIX/bin:$HADOOP_PREFIX/sbin

After making necessary changes to the .bashrc file activate the configured environment settings for 'huser' user by running the below command.

huser:~$ exec bash

Testing Hadoop Installation

Execute the below command to test the successful hadoop installation. It should produce .

huser:~$ hadoop version

Hadoop 2.6.0
Subversion https://git-wip-us.apache.org/repos/asf/hadoop.git -r e3496499ecb8d220fba99dc5ed4c99c8f9e33bb1
Compiled by jenkins on 2014-11-13T21:10Z
Compiled with protoc 2.5.0
From source with checksum 18e43357c8f927c0695f1e9522859d6a

This command was run using /opt/hadoop-2.6.0/share/hadoop/common/hadoop-common-2.6.0.jar

Note: Installation of hadoop has to be done on all nodes.

Hadoop Configuration for Automatic Failover

There are a couple of files that need to be configured to make hadoop with automatic failover cluster up and running. All our configuration files reside in /opt/hadoop-2.6.0/etc/hadoop/ directory.

hadoop-env.sh
Location: ha-nn01, ha-nn02, ha-nn03, ha-dn01, ha-dn02, ha-client

huser:~$ vi /opt/hadoop-2.6.0/etc/hadoop/hadoop-env.sh

export JAVA_HOME=/opt/jdk1.8.0_25

export HADOOP_LOG_DIR=/var/log/hadoop/

Create a directory for logs as specified in hadoop-env.sh file with required 'huser' user permissions.

root:~# mkdir /var/log/hadoop

root:~# chown -R huser:hadoop /var/log/hadoop

core-site.xml

Location: ha-nn01, ha-nn02, ha-nn03, ha-dn01, ha-dn02, ha-client

huser:~$ vi /opt/hadoop-2.6.0/etc/hadoop/core-site.xml

<configuration>
<property>
<name>fs.default.name</name>
<value>hdfs://auto-ha</value>
</property>

</configuration>

hdfs-site.xml

Location: ha-nn01, ha-nn02, ha-nn03, ha-dn01, ha-dn02, ha-client

<configuration>
<property>
<name>dfs.replication</name>
<value>2</value>
</property>
<property>
<name>dfs.name.dir</name>
<value>file:///hdfs/name</value>
</property>
<property>
<name>dfs.data.dir</name>
<value>file:///hdfs/data</value>
</property>
<property>
<name>dfs.permissions</name>
<value>false</value>
</property>
<property>
<name>dfs.nameservices</name>
<value>auto-ha</value>
</property>
<property>
<name>dfs.ha.namenodes.auto-ha</name>
<value>nn01,nn02</value>
</property>
<property>
<name>dfs.namenode.rpc-address.auto-ha.nn01</name>
<value>ha-nn01:8020</value>
</property>
<property>
<name>dfs.namenode.http-address.auto-ha.nn01</name>
<value>ha-nn01:50070</value>
</property>
<property>
<name>dfs.namenode.rpc-address.auto-ha.nn02</name>
<value>ha-nn02:8020</value>
</property>
<property>
<name>dfs.namenode.http-address.auto-ha.nn02</name>
<value>ha-nn02:50070</value>
</property>
<property>
<name>dfs.namenode.shared.edits.dir</name>
<value>file:///mnt/</value>
</property>
<property>
<name>dfs.ha.fencing.methods</name>
<value>sshfence</value>
</property>
<property>
<name>dfs.ha.fencing.ssh.private-key-files</name>
<value>/home/huser/.ssh/id_rsa</value>
</property>
<property>
<name>dfs.ha.automatic-failover.enabled.auto-ha</name>
<value>true</value>
</property>
<property>
<name>ha.zookeeper.quorum</name>
<value>ha-nn01.hadoop.lab:2181,ha-nn02.hadoop.lab:2181,ha-nn03.hadoop.lab:2181</value>
</property>
</configuration>

Note:

1. Replication factor is set to '2' as I have only two datanodes.

2. Create a directory /hdfs/name in all namenodes with required 'huser' user permissions.

root:~# mkdir -p /hdfs/name

root:~# chown -R huser:hadoop /hdfs/name

3. Create a directory /hdfs/data in all datanodes with required 'huser' user permissions.

root:~# mkdir -p /hdfs/data
root:~# chown -R huser:hadoop /hdfs/data

4. Shared edits directory is a permanently mounted nfs share on all namenodes only.

5. In ha-client host add the below property to hdfs-site.xml file.

<property>
<name>dfs.client.failover.proxy.provider.auto-ha</name>
<value>org.apache.hadoop.hdfs.server.namenode.ha.ConfiguredFailoverProxyProvider</value>
</property>

6. We can explicitly enable automatic-failover for the nameservice-id 'auto-ha' by setting the property 'dfs.ha.automatic-failover.enabled.auto-ha' to 'true'.

slaves

This file contains only the hostnames of datanodes.

Location: ha-nn01, ha-nn02, ha-nn03

huser:~$ vi /opt/hadoop2.6.0/etc/hadoop/slaves

ha-dn01

ha-dn02

Finally after completing the configuration part, we will be initializing and starting our automatic-failover hadoop cluster.

Initializing HA state in ZooKeeper

Zookeeper needs to initialize the required state by running the below command from any one of the namenodes.

Location: ha-nn01

huser@ha-nn01:~$ hdfs zkfc -formatZK

Formatting & Starting Namenodes

Both the namenodes need to be formatted to start HDFS filesystem.

Location: ha-nn01

huser@ha-nn01:~$ hadoop namenode -format

huser@ha-nn01:~$ hadoop-daemon.sh start namenode

Location: ha-nn02

huser@ha-nn02:~$ hadoop namenode -bootstrapStandby
huser@ha-nn02:~$ hadoop-daemon.sh start namenode

Note: By default both the namenodes will be in 'standby' state.

Starting ZKFC Services

Zookeeper Failover Controller service needs to be started in order to make any one namenode as 'active'. Run the below command on both namenodes.

huser@ha-nn01:~$ hadoop-daemon.sh start zkfc

huser@ha-nn02:~$ hadoop-daemon.sh start zkfc

Note: As soon as the zkfc service is started you can see that one of the namenode is in active state using below command from any one of the namenodes.

huser@ha-nn01:~$ hdfs haadmin -getServiceState nn01

huser@ha-nn01:~$ hdfs haadmin -getServiceState nn02

Starting Datanodes

To start the datanodes run the below mentioned command from any one of the namenodes.

huser@ha-nn01:~$ hadoop-daemons.sh start datanode

Verifying Automatic Failover

To verify the automatic failover, we need to locate the active namenode using command line or by visiting the namenode web interfaces.

Using command line

huser@ha-nn01:~$ hdfs haadmin -getServiceState nn01
huser@ha-nn01:~$ hdfs haadmin -getServiceState nn02

Using web interface

ha-nn01:50070

ha-nn02:50070

After locating the active namenode, we can cause a failure on that node to initiate a failover automatically. One can fail the active namenode by running 'jps' command and kill the namenode daemon by determining it's pid. Within a few seconds the other namenode will automatically become active.

Fully Distributed Hadoop Cluster - Automatic Failover HA Cluster with ZooKeeper & QJM

16 comments:

UnknownMay 6, 2015 at 4:00 PM
Great article! ZooKeeper's architecture supports high availability through redundant services. The clients can thus ask another ZooKeeper leader if the first fails to answer. ZooKeeper nodes store their data in a hierarchical name space, much like a file system or a tree data structure. Clients can read and write from/to the nodes and in this way have a shared configuration service. More at Hadoop Online Training
UnknownSeptember 23, 2015 at 11:25 PM
The Hadoop tutorial you have explained is most useful for begineers who are taking Hadoop Administrator Online Training
Thank you for sharing Such a good tutorials on Hadoop Cluster
UnknownOctober 1, 2015 at 3:46 PM
Hi and thanks for this helpful article.
It helps me a lot, but i'm stuck when i'm try to start the datanodes on the slaves nodes.
I've opened the following ports :
2181, 2888, 3888 and 8485 (for zookeeper)
8020, 50020 and 50070 (for hadoop)

Is there any other firewall ports to open ?
UnknownJuly 15, 2016 at 3:07 PM
Thanks for sharing this article.. You may also refer http://www.s4techno.com/blog/2016/07/11/hadoop-administrator-interview-questions/..
BillJuly 26, 2016 at 8:15 PM
Hi nice article, just a quick question on the session Note: #4 ( Shared edits directory is a permanently mounted nfs share on all namenodes only) where is this edits directory coming from. Do I need nfs server with a share directory name edit?

Thanks
vinodJanuary 11, 2017 at 1:47 AM
Yeah you need to configure nfs share point or you can configure same on ha-nn03
UnknownApril 17, 2017 at 11:09 AM
This blog having clear explanation and examples so easy to understand .. thanks a lot for sharing this blog to us..

hadoop training in velachery | big data training in velachery
UnknownApril 17, 2017 at 3:53 PM
After reading this blog i very strong in this topics and this blog really helpful to all... explanation are very clear so very easy to understand... thanks a lot for sharing this blog

big data training institute in tambaram | hadoop training in chennai tambaram
Karthika ShreeJune 23, 2017 at 3:41 PM
That was a great message in my carrier, and It's wonderful commands like mind relaxes with understand words of knowledge by information's.
Hadoop Training in Chennai
Health Care TipsJuly 21, 2017 at 11:41 AM
Nice information. But if you guys want some more detailed info than pls visit- http://www.prohut.net.

MS SharePoint (Microsoft SharePoint Training) 2007 or 2010 is a popular web platform developed by Microsoft for small to large organisations.

It is designed as a centralized replacement for multiple web applications, and supports various combinations of enterprise website requirements.

It is typically associated with web content management and document management systems.

To know more about other professional courses please visit http://www.prohut.net or http://www.prohut.net/sharepoint-training.html to know about sharepoint training
UnknownMarch 23, 2018 at 1:59 PM
I just want to know about Fully Distributed Hadoop and found this post is perfect one ,Thanks for sharing the informative post of Fully Distributed Hadoop and able to understand the concepts easily,Thoroughly enjoyed reading
Also Check out the : https://www.credosystemz.com/training-in-chennai/best-hadoop-training-in-chennai/
AnonymousMarch 22, 2022 at 1:07 PM
Thanks for sharing the excellent guide.
Mark JohnsonJuly 18, 2023 at 4:07 PM
Thank you for sharing. Reset ATT email login password is an essential task for mark Jones, who works in email support. With years of experience, Mark is adept at providing quick and efficient solutions to customers who are unable to a Reset ATT email password . He has a thorough understanding of the password reset process and ensures that customers can regain access to their accounts with minimal hassle.
jacobsglennJuly 18, 2023 at 4:58 PM
So, whether you need help with research papers, essays, or any other academic task, exploring the realm of assignments help can undoubtedly make a positive difference in your academic performance and overall learning experience.
Camila MartinJuly 20, 2023 at 4:55 PM
Greetings of the day. My name is Emily Griffin, I have 7 years of work experience as a professional technical engineer. Are you facing the quicken online service unavailable ? Do not hesitate to contact us to continue your error-free financial journey.
liamg9208July 27, 2023 at 5:14 PM
Greetings My name is Liam Johnson. I am an experienced IT professional with a specialisation in email management. With extensive knowledge of troubleshooting, I can quickly identify and resolve issues related to email connectivity, including cases of outlook is disconnected how to reconnect . It's not the biggest issue but if you are facing this type of problem don't be concerned. You can contact me.
https://www.limksys.com/outlook-status-is-disconnected

# HashPrompt

Pages

About Me

Sunday, January 11, 2015

Fully Distributed Hadoop Cluster - Automatic Failover HA Cluster with Zookeeper & NFS

Setup Summary

INSTALLATION & CONFIGURATION

User & Group Settings

Java Installation

Passwordless SSH Configuration

Hadoop Installation

Hadoop Configuration for Automatic Failover

Initializing HA state in ZooKeeper

Formatting & Starting Namenodes

Starting ZKFC Services

Starting Datanodes

Verifying Automatic Failover

16 comments: