Monday, January 12, 2015

Fully Distributed Hadoop Cluster - Automatic Failover HA Cluster with Zookeeper & QJM

After configuring an automatic failover HA with ZooKeeper and NFS, we will now configure an automatic failover HA with ZooKeeper and QJM.
We will use the Quorum Journal Manager (QJM) to share edit logs between the active and standby namenodes. Any namespace modification made by the active namenode is recorded by the journal nodes. The journalnode daemons can run alongside other daemons such as the namenode or jobtracker. It is highly recommended to use at least three journal nodes, so that edit log information is written to a majority of the machines.

Setup Summary

The below table describes my setup.

  Role                     Hostname   IP-Address
  namenode1/journal-node1  ha-nn01    192.168.56.101
  namenode2/journal-node2  ha-nn02    192.168.56.102
  namenode3/journal-node3  ha-nn03    192.168.56.103
  datanode1                ha-dn01    192.168.56.104
  datanode2                ha-dn02    192.168.56.105
  client                   ha-client  192.168.56.106

According to the above table, I have three namenodes, each of which also runs a journal node, plus two datanodes and one client machine.

Downloads
Download the below packages and place them on all nodes.
Download Apache Hadoop 2.6 here.
Download Oracle JDK 8 here.

Note: Before moving directly to the installation and configuration part, read the pre-checks listed below (sample commands and a matching hosts file are shown after this list).
1. Disable the firewall on all nodes.
2. Disable SELinux on all nodes.
3. Add the hostnames and their respective IP addresses of all nodes to the /etc/hosts file on all nodes.
4. It is recommended to use Oracle JDK.
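
For reference, on CentOS the pre-checks might look like the following; this is only a sketch (the firewall and SELinux commands vary by distribution), together with a sample /etc/hosts matching the table above.

root:~# service iptables stop
root:~# chkconfig iptables off
root:~# setenforce 0
Also set SELINUX=disabled in /etc/selinux/config so SELinux stays disabled after a reboot.

root:~# vi /etc/hosts
192.168.56.101  ha-nn01
192.168.56.102  ha-nn02
192.168.56.103  ha-nn03
192.168.56.104  ha-dn01
192.168.56.105  ha-dn02
192.168.56.106  ha-client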

INSTALLATION & CONFIGURATION

Setting up the cluster involves user and group settings, Java installation, passwordless SSH configuration and, lastly, Hadoop installation and configuration.

User & Group Settings

Location: ha-nn01, ha-nn02, ha-nn03, ha-dn01, ha-dn02, ha-client

First we will create a group named 'hadoop'. Next we create a user 'huser' to perform all hadoop administrative tasks and set a password for it.

# groupadd hadoop
# useradd -m -d /home/huser -g hadoop huser
# passwd huser

Note: Henceforth we will be using the newly created 'huser' user to perform all hadoop tasks.
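
To confirm that the user and group were created correctly, a quick check:

# id huser

The output should show 'huser' with 'hadoop' as its primary group.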

Java Installation

Location: ha-nn01, ha-nn02, ha-nn03, ha-dn01, ha-dn02, ha-client

To install Java, you can refer to the blog here.


Passwordless SSH Configuration

A passwordless SSH environment is needed by the namenodes to start the HDFS & MapReduce related daemons on all nodes.

Location: ha-nn01, ha-nn02, ha-nn03

huser@ha-nn01:~$ ssh-keygen -t rsa
huser@ha-nn01:~$ ssh-copy-id -i ~/.ssh/id_rsa.pub huser@ha-nn01
huser@ha-nn01:~$ ssh-copy-id -i ~/.ssh/id_rsa.pub huser@ha-nn02
huser@ha-nn01:~$ ssh-copy-id -i ~/.ssh/id_rsa.pub huser@ha-nn03
huser@ha-nn01:~$ ssh-copy-id -i ~/.ssh/id_rsa.pub huser@ha-dn01 
huser@ha-nn01:~$ ssh-copy-id -i ~/.ssh/id_rsa.pub huser@ha-dn02

Testing Passwordless SSH
Run the below commands from ha-nn01, ha-nn02 & ha-nn03 to test passwordless logins.
huser:~$ ssh ha-nn01
huser:~$ ssh ha-nn02
huser:~$ ssh ha-nn03
huser:~$ ssh ha-dn01
huser:~$ ssh ha-dn02
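
Optionally, all logins can be checked non-interactively in one go; a small sketch, assuming the keys were copied as above:

huser:~$ for h in ha-nn01 ha-nn02 ha-nn03 ha-dn01 ha-dn02; do ssh -o BatchMode=yes $h hostname; done

Each hostname should be printed without any password prompt.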


ZooKeeper Installation
Location: ha-nn01, ha-nn02, ha-nn03

For ZooKeeper quorum installation and configuration, you can refer to the blog here.

Note: ZooKeeper installation and configuration needs to be done only on the namenodes.

Hadoop Installation

We will be installing the latest stable release of Apache Hadoop, 2.6.0.

Location: ha-nn01, ha-nn02, ha-nn03, ha-dn01, ha-dn02, ha-client

We will first place the downloaded tarball in the /opt directory, untar it and change the ownership of the extracted directory to the 'huser' user.

root:~# cd /opt
root:~# tar -xzvf hadoop-2.6.0.tar.gz
root:~# chown -R huser:hadoop hadoop-2.6.0/

Next we will log in as the 'huser' user and set the environment variables in the .bashrc file.

huser:~$ vi ~/.bashrc
###JAVA CONFIGURATION###
export JAVA_HOME=/usr/java/jdk1.8.0_25/
export PATH=$PATH:$JAVA_HOME/bin

###HADOOP CONFIGURATION###
export HADOOP_PREFIX=/opt/hadoop-2.6.0/
export PATH=$PATH:$HADOOP_PREFIX/bin:$HADOOP_PREFIX/sbin

After making the necessary changes to the .bashrc file, activate the configured environment settings for the 'huser' user by running the below command.
huser:~$ exec bash

Testing Hadoop Installation
Execute the below command to verify the Hadoop installation. It should produce output similar to the following.
huser:~$ hadoop version
Hadoop 2.6.0
Subversion https://git-wip-us.apache.org/repos/asf/hadoop.git -r e3496499ecb8d220fba99dc5ed4c99c8f9e33bb1
Compiled by jenkins on 2014-11-13T21:10Z
Compiled with protoc 2.5.0
From source with checksum 18e43357c8f927c0695f1e9522859d6a

This command was run using /opt/hadoop-2.6.0/share/hadoop/common/hadoop-common-2.6.0.jar

Note: Installation of hadoop has to be done on all nodes.


Hadoop Configuration

There are a handful of files that need to be configured to bring the automatic failover cluster with QJM up and running. All our configuration files reside in the /opt/hadoop-2.6.0/etc/hadoop/ directory.

hadoop-env.sh
Location: ha-nn01, ha-nn02, ha-nn03, ha-dn01, ha-dn02, ha-client

huser:~$ vi /opt/hadoop-2.6.0/etc/hadoop/hadoop-env.sh


export JAVA_HOME=/opt/jdk1.8.0_25
export HADOOP_LOG_DIR=/var/log/hadoop/

Create the log directory specified in the hadoop-env.sh file and give the 'huser' user the required permissions.

root:~# mkdir /var/log/hadoop
root:~# chown -R huser:hadoop /var/log/hadoop


core-site.xml
Location: ha-nn01, ha-nn02, ha-nn03, ha-dn01, ha-dn02, ha-client

huser:~$ vi /opt/hadoop-2.6.0/etc/hadoop/core-site.xml

<configuration>
 <property>
  <name>fs.default.name</name>
  <value>hdfs://auto-ha</value>
 </property>
</configuration>


hdfs-site.xml
Location: ha-nn01, ha-nn02, ha-nn03, ha-dn01, ha-dn02, ha-client

<configuration>
 <property>
  <name>dfs.replication</name>
  <value>2</value>
 </property>
 <property>
  <name>dfs.name.dir</name>
  <value>file:///hdfs/name</value>
 </property>
 <property>
  <name>dfs.data.dir</name>
  <value>file:///hdfs/data</value>
 </property>
 <property>
  <name>dfs.permissions</name>
  <value>false</value>
 </property>
 <property>
  <name>dfs.nameservices</name>
  <value>auto-ha</value>
 </property>
 <property>
  <name>dfs.ha.namenodes.auto-ha</name>
  <value>nn01,nn02</value>
 </property>
 <property>
  <name>dfs.namenode.rpc-address.auto-ha.nn01</name>
  <value>ha-nn01:8020</value>
 </property>
 <property>
  <name>dfs.namenode.http-address.auto-ha.nn01</name>
  <value>ha-nn01:50070</value>
 </property>
 <property>
  <name>dfs.namenode.rpc-address.auto-ha.nn02</name>
  <value>ha-nn02:8020</value>
 </property>
 <property>
  <name>dfs.namenode.http-address.auto-ha.nn02</name>
  <value>ha-nn02:50070</value>
 </property>
 <property>
  <name>dfs.namenode.shared.edits.dir</name>
  <value>qjournal://ha-nn01:8485;ha-nn02:8485;ha-nn03:8485/auto-ha</value>
 </property>
 <property>
  <name>dfs.journalnode.edits.dir</name>
  <value>/hdfs/journalnode</value>
 </property>
 <property>
  <name>dfs.ha.fencing.methods</name>
  <value>sshfence</value>
 </property>
 <property>
  <name>dfs.ha.fencing.ssh.private-key-files</name>
  <value>/home/huser/.ssh/id_rsa</value>
 </property>
 <property>
  <name>dfs.ha.automatic-failover.enabled.auto-ha</name>
  <value>true</value>
 </property>
 <property>
   <name>ha.zookeeper.quorum</name>
   <value>ha-nn01.hadoop.lab:2181,ha-nn02.hadoop.lab:2181,ha-nn03.hadoop.lab:2181</value>
 </property>
</configuration>


Note:
1. Replication factor is set to '2' as I have only two datanodes.

2. Create a directory /hdfs/name in all namenodes with required 'huser' user permissions.
root:~# mkdir -p /hdfs/name
root:~# chown -R huser:hadoop /hdfs/name

3. Create a directory /hdfs/data in all datanodes with required 'huser' user permissions.
root:~# mkdir -p /hdfs/data
root:~# chown -R huser:hadoop /hdfs/data

4. Create a directory /hdfs/journalnode in all namenodes with required 'huser' user permissions.
root:~# mkdir /hdfs/journalnode
root:~# chown -R huser:hadoop /hdfs/journalnode

5. On the ha-client host, add the below property to the hdfs-site.xml file.
<property>
 <name>dfs.client.failover.proxy.provider.auto-ha</name>
 <value>org.apache.hadoop.hdfs.server.namenode.ha.ConfiguredFailoverProxyProvider</value>
</property>

6. We can explicitly enable automatic-failover for the nameservice-id 'auto-ha' by setting the property 'dfs.ha.automatic-failover.enabled.auto-ha' to 'true'.
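
Before starting any daemons, the HA-related settings can be sanity-checked from any node; a quick check, assuming the configuration files above are in place:

huser@ha-nn01:~$ hdfs getconf -confKey dfs.nameservices
huser@ha-nn01:~$ hdfs getconf -namenodes

The first command should print 'auto-ha' and the second should list ha-nn01 and ha-nn02.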


slaves
This file contains only the hostnames of datanodes.

Location: ha-nn01, ha-nn02, ha-nn03

huser:~$ vi /opt/hadoop-2.6.0/etc/hadoop/slaves

ha-dn01
ha-dn02

Finally after completing the configuration part, we will be initializing and starting our automatic-failover hadoop cluster.

Initializing HA state in ZooKeeper

The required HA state needs to be initialized in ZooKeeper by running the below command from any one of the namenodes.

Location: ha-nn01

huser@ha-nn01:~$ hdfs zkfc -formatZK


Starting Journal Nodes

Since we are using the Quorum Journal Manager to share edit logs, the journalnode daemon needs to be started on all three namenodes.

Location: ha-nn01, ha-nn02, ha-nn03

huser@ha-nn01:~$ hadoop-daemon.sh start journalnode
huser@ha-nn02:~$ hadoop-daemon.sh start journalnode
huser@ha-nn03:~$ hadoop-daemon.sh start journalnode
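
After starting the daemons, a quick way to confirm they are up is to run jps on each namenode and look for a JournalNode entry:

huser@ha-nn01:~$ jps

The journalnode also listens on port 8485, as referenced in dfs.namenode.shared.edits.dir above.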

Formatting & Starting Namenodes

The first namenode needs to be formatted to create the HDFS filesystem; the second namenode is then bootstrapped from it.

Location: ha-nn01

huser@ha-nn01:~$ hadoop namenode -format
huser@ha-nn01:~$ hadoop-daemon.sh start namenode

Location: ha-nn02

huser@ha-nn02:~$ hadoop namenode -bootstrapStandby
huser@ha-nn02:~$ hadoop-daemon.sh start namenode

Note: By default both the namenodes will be in 'standby' state.

Starting ZKFC Services

The ZooKeeper Failover Controller (ZKFC) service needs to be started in order to make one of the namenodes 'active'. Run the below command on both namenodes.

huser@ha-nn01:~$ hadoop-daemon.sh start zkfc
huser@ha-nn02:~$ hadoop-daemon.sh start zkfc

Note: As soon as the zkfc service is started, you can see that one of the namenodes is in the 'active' state by running the below commands from any one of the namenodes.
huser@ha-nn01:~$ hdfs haadmin -getServiceState nn01
huser@ha-nn01:~$ hdfs haadmin -getServiceState nn02
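
For example, if nn01 won the initial election, the output would look like this (which namenode becomes active may vary):

huser@ha-nn01:~$ hdfs haadmin -getServiceState nn01
active
huser@ha-nn01:~$ hdfs haadmin -getServiceState nn02
standby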


Starting Datanodes

To start the datanodes run the below mentioned command from any one of the namenodes.
huser@ha-nn01:~$ hadoop-daemons.sh start datanode
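
Once the datanodes are up, their registration with the namenodes can be verified; a quick check:

huser@ha-nn01:~$ hdfs dfsadmin -report

The report should list both ha-dn01 and ha-dn02 as live datanodes.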

Verifying Automatic Failover

To verify automatic failover, we first need to locate the active namenode, either from the command line or by visiting the namenode web interfaces.

Using command line
huser@ha-nn01:~$ hdfs haadmin -getServiceState nn01
huser@ha-nn01:~$ hdfs haadmin -getServiceState nn02

Using web interface
ha-nn01:50070
ha-nn02:50070

After locating the active namenode, we can cause a failure on that node to trigger a failover automatically. One way to fail the active namenode is to find the pid of the namenode daemon with the 'jps' command and kill it, as sketched below. Within a few seconds the other namenode will automatically become active.
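
For example, assuming ha-nn01 is currently the active namenode, a minimal failover test might look like this (the pid is whatever jps reports for the NameNode process):

huser@ha-nn01:~$ jps | grep NameNode
huser@ha-nn01:~$ kill -9 <namenode-pid>
huser@ha-nn01:~$ hdfs haadmin -getServiceState nn02

After a few seconds the last command should report nn02 as 'active'.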


Related Links

Fully Distributed Hadoop Cluster - Automatic Failover HA Cluster with ZooKeeper & NFS

Sunday, January 11, 2015

ZooKeeper Installation & Configuration

In my previous post we configured a manual failover hadoop cluster. Now that we are clear on the hadoop high availability features, we will proceed towards configuring an automatic failover cluster. Prior to that we need ZooKeeper.


What is ZooKeeper?

An excerpt from Apache ZooKeeper website -

ZooKeeper is a centralized service for maintaining configuration information, naming, providing distributed synchronization, and providing group services. All of these kinds of services are used in some form or another by distributed applications. Each time they are implemented there is a lot of work that goes into fixing the bugs and race conditions that are inevitable. Because of the difficulty of implementing these kinds of services, applications initially usually skimp on them, which make them brittle in the presence of change and difficult to manage. Even when done correctly, different implementations of these services lead to management complexity when the applications are deployed.
Learn more about ZooKeeper on the ZooKeeper Wiki.

Apache ZooKeeper is an open source Apache Software Foundation project that provides solutions for various coordination problems in large distributed systems. It serves as a highly reliable, centralised coordination service running on a cluster of servers. It is highly recommended to run ZooKeeper on a minimum of three separate servers in replicated mode; this group of ZooKeeper servers is called a ZooKeeper ensemble. Production environments across the world normally opt for a five-node ZooKeeper ensemble.

Within a ZooKeeper ensemble, the replicated group of servers serving the same application domain is called a quorum. The servers in a quorum share the same ZooKeeper configuration file and run in a leader-follower model: one of the ZooKeeper servers acts as the leader and the others as followers. If the leader fails, one of the followers is elected as the new leader.

Apache ZooKeeper is implemented in Java. It also ships with C, Java, Perl and Python client bindings.

Download
We have downloaded the latest stable release i.e. zookeeper-3.4.8.


Setup Summary

It is recommended to have an odd number of ZooKeeper servers so that a majority of ZooKeeper servers is always available even if one of them fails. Below is the setup summary.
zkdn1    192.168.56.103
zkdn2    192.168.56.104
zkdn3    192.168.56.105

ZooKeeper Installation

After downloading the tarball, we will place it in the /opt system directory and extract it. Perform the below mentioned steps on all three ZooKeeper nodes. We are using the 'huser' user (with ownership of the /opt directory) to install ZooKeeper on all ZooKeeper server nodes.

Extract the tarball and set the permissions for the 'huser' user.
Location: zkdn1, zkdn2, zkdn3

root~# cd /opt
root~# tar -xzvf zookeeper-3.4.8.tar.gz
root~# chown -R huser:hadoop /opt/zookeeper-3.4.8

Next, create a zoo.cfg file in the conf directory and add the settings shown below.
Location: zkdn1, zkdn2, zkdn3

huser~$ vi /opt/zookeeper-3.4.8/conf/zoo.cfg
tickTime=2000
dataDir=/opt/zoodata
clientPort=2181
initLimit=10

syncLimit=5

server.1=zkdn1:2888:3888
server.2=zkdn2:2888:3888
server.3=zkdn3:2888:3888

Add the executable path to the .bashrc file on all three ZooKeeper nodes.
huser~$ vi ~/.bashrc
export PATH=$PATH:/opt/zookeeper-3.4.8/bin

Note:

  • tickTime is the basic time unit, in milliseconds, used by ZooKeeper for heartbeats and session registration. The minimum session timeout is twice the tickTime.
  • initLimit is the maximum time, specified in ticks, that a follower server in the quorum may take to connect to the leader.
  • syncLimit is the maximum time, in ticks, that a follower server may be out of date with respect to the leader.
  • dataDir points to the directory where ZooKeeper stores its data (in-memory data snapshots & transaction logs of data updates).
  • clientPort is the port that listens for client connections; it is where ZooKeeper clients initiate a connection.
  • The last three parameters are specified in "server.id=host:port:port" format. In our configuration we have used identifier "1" for the quorum server "zkdn1", identifier "2" for "zkdn2" and identifier "3" for "zkdn3". These identifiers must be unique for each server and must also be specified in a file named "myid". Port 2888 is used for peer-to-peer communication within the quorum, such as connecting followers to the leader. Port 3888 is used for leader election. All of this communication happens over TCP.

Create a "zoodata" directory with required 'huser' permissions in /opt as declared in /opt/zookeeper-3.4.8/conf/zoo.cfg file and create a file named 'myid' in it. Add 1 to the myid file in zkdn1, 2 to the myid file in zkdn2 and 3 to the myid file in zkdn3.
Location: zkdn1, zkdn2, zkdn3

root:~# mkdir -p /opt/zoodata
root:~# chown -R huser:hadoop /opt/zoodata/

Location: zkdn1
huser@zkdn1~$ vi /opt/zoodata/myid
1

Location: zkdn2
huser@zkdn2~$ vi /opt/zoodata/myid
2

Location: zkdn3
huser@zkdn3~$ vi /opt/zoodata/myid
3

Note: The myid file contains a unique number between 1 and 255 that represents the ZooKeeper server id.
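
Equivalently, the myid files can be written in one shot; a sketch using the same paths:

huser@zkdn1~$ echo 1 > /opt/zoodata/myid
huser@zkdn2~$ echo 2 > /opt/zoodata/myid
huser@zkdn3~$ echo 3 > /opt/zoodata/myid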

Starting ZooKeeper

Run the below command to start the ZooKeeper service on all three ZooKeeper nodes.
huser~$ zkServer.sh start

One can confirm that the ZooKeeper service is running by executing the jps command and checking that the QuorumPeerMain process is present on each node.

Run the zkServer.sh script to check the status of the ZooKeeper servers.
huser~$ zkServer.sh status

zkdn1
ZooKeeper JMX enabled by default
Using config: /opt/zookeeper-3.4.8/bin/../conf/zoo.cfg
Mode: follower

zkdn2
ZooKeeper JMX enabled by default
Using config: /opt/zookeeper-3.4.8/bin/../conf/zoo.cfg
Mode: follower

zkdn3
ZooKeeper JMX enabled by default
Using config: /opt/zookeeper-3.4.8/bin/../conf/zoo.cfg
Mode: leader

Now the ZooKeeper instance is running on all servers, with the "zkdn3" server being the "leader" and zkdn1 and zkdn2 as "followers".

We can also connect to a running ZooKeeper instance using the zkCli.sh script.
huser~$ zkCli.sh -server zkdn1:2181

Using the ZooKeeper model, various distributed processes coordinate with each other through a shared hierarchical namespace of data registers. These data registers, known as znodes, can each store at most 1MB of data. Znodes can be persistent or ephemeral. A persistent znode exists for the lifetime of ZooKeeper's namespace unless it is deleted manually; the znode and its data persist even after the client that created it dies. Persistent znodes are used by applications that want to store configuration data. An ephemeral znode, on the other hand, ceases to exist and is deleted once the creating client's session ends, and unlike persistent znodes, ephemeral znodes are not allowed to have children. Services such as distributed group membership can be implemented using ephemeral znodes.
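
As an illustration of persistent versus ephemeral znodes, a short zkCli.sh session might look like the following; the znode names here are arbitrary examples:

huser~$ zkCli.sh -server zkdn1:2181
[zk: zkdn1:2181(CONNECTED) 0] create /demo "persistent-data"
[zk: zkdn1:2181(CONNECTED) 1] create -e /demo-ephemeral "session-data"
[zk: zkdn1:2181(CONNECTED) 2] ls /
[zk: zkdn1:2181(CONNECTED) 3] get /demo

After the client session ends, /demo and its data survive, while /demo-ephemeral is removed automatically.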

Fully Distributed Hadoop Cluster - Automatic Failover HA Cluster with Zookeeper & NFS

The manual failover mechanism is unable to automatically trigger a failover when a namenode fails. The automatic failover mechanism overcomes this with the help of ZooKeeper and provides a hot standby. We have already covered ZooKeeper installation & configuration in my previous post.
To configure an automatic failover HA cluster we need an odd number of ZooKeeper nodes (at least three) to ensure that a majority of ZooKeeper servers is always running even if one of them fails. We also need a running ZooKeeper ensemble. Refer to the blog here to install and configure ZooKeeper in clustered mode.


Setup Summary

I have implemented this fully distributed automatic failover HA cluster setup on an ESXi server. All nodes run Hadoop 2.6 on the CentOS 5.11 release. This setup consists of three namenodes, two datanodes and one client machine. Each machine is configured with 1GB RAM and a 20GB hard disk. This setup only demonstrates how to set up a fully distributed hadoop cluster with automatic failover using ZooKeeper. The below table describes my setup with the hostname and IP address of all configured machines.

namenode1  ha-nn01    192.168.56.101
namenode2  ha-nn02    192.168.56.102
namenode3  ha-nn03    192.168.56.103
datanode1  ha-dn01    192.168.56.104
datanode2  ha-dn02    192.168.56.105
client     ha-client  192.168.56.106

Downloads
Download the below packages and place them on all nodes.
Download Apache Hadoop 2.6 here.
Download Oracle JDK 8 here.

Note: Before moving directly to the installation and configuration part, read the pre-checks listed below.
1. Disable the firewall on all nodes.
2. Disable SELinux on all nodes.
3. Add the hostnames and their respective IP addresses of all nodes to the /etc/hosts file on all nodes.
4. It is recommended to use Oracle JDK.

INSTALLATION & CONFIGURATION

Setting up the cluster involves user and group settings, Java installation, passwordless SSH configuration and, lastly, Hadoop installation and configuration.

User & Group Settings

Location: ha-nn01, ha-nn02, ha-nn03, ha-dn01, ha-dn02, ha-client

First we will create a group named 'hadoop'. Next we create a user 'huser' to perform all hadoop administrative tasks and set a password for it.

# groupadd hadoop
# useradd -m -d /home/huser -g hadoop huser
# passwd huser

Note: Henceforth we will be using the newly created 'huser' user to perform all hadoop tasks.

Java Installation

Location: ha-nn01, ha-nn02, ha-nn03, ha-dn01, ha-dn02, ha-client

To install Java, you can refer to the blog here.

Passwordless SSH Configuration

A passwordless SSH environment is needed by the namenodes to start the HDFS & MapReduce related daemons on all nodes.

Location: ha-nn01, ha-nn02, ha-nn03

huser@ha-nn01:~$ ssh-keygen -t rsa
huser@ha-nn01:~$ ssh-copy-id -i ~/.ssh/id_rsa.pub huser@ha-nn01
huser@ha-nn01:~$ ssh-copy-id -i ~/.ssh/id_rsa.pub huser@ha-nn02
huser@ha-nn01:~$ ssh-copy-id -i ~/.ssh/id_rsa.pub huser@ha-nn03
huser@ha-nn01:~$ ssh-copy-id -i ~/.ssh/id_rsa.pub huser@ha-dn01 
huser@ha-nn01:~$ ssh-copy-id -i ~/.ssh/id_rsa.pub huser@ha-dn02

Testing Passwordless SSH
Run the below commands from ha-nn01, ha-nn02 & ha-nn03 to test passwordless logins.
huser:~$ ssh ha-nn01
huser:~$ ssh ha-nn02
huser:~$ ssh ha-nn03
huser:~$ ssh ha-dn01
huser:~$ ssh ha-dn02

ZooKeeper Installation
Location: ha-nn01, ha-nn02, ha-nn03

For ZooKeeper quorum installation and configuration, you can refer to the blog here.

Note: ZooKeeper installation and configuration needs to be done only on the namenodes.

Hadoop Installation

We will be installing the latest stable release of Apache Hadoop, 2.6.0.

Location: ha-nn01, ha-nn02, ha-nn03, ha-dn01, ha-dn02, ha-client

We will first place the downloaded tarball in the /opt directory, untar it and change the ownership of the extracted directory to the 'huser' user.

root:~# cd /opt
root:~# tar -xzvf hadoop-2.6.0.tar.gz
root:~# chown -R huser:hadoop hadoop-2.6.0/

Next we will log in as the 'huser' user and set the environment variables in the .bashrc file.

huser:~$ vi ~/.bashrc
###JAVA CONFIGURATION###
export JAVA_HOME=/usr/java/jdk1.8.0_25/
export PATH=$PATH:$JAVA_HOME/bin

###HADOOP CONFIGURATION###
export HADOOP_PREFIX=/opt/hadoop-2.6.0/
export PATH=$PATH:$HADOOP_PREFIX/bin:$HADOOP_PREFIX/sbin

After making the necessary changes to the .bashrc file, activate the configured environment settings for the 'huser' user by running the below command.
huser:~$ exec bash

Testing Hadoop Installation
Execute the below command to verify the Hadoop installation. It should produce output similar to the following.
huser:~$ hadoop version
Hadoop 2.6.0
Subversion https://git-wip-us.apache.org/repos/asf/hadoop.git -r e3496499ecb8d220fba99dc5ed4c99c8f9e33bb1
Compiled by jenkins on 2014-11-13T21:10Z
Compiled with protoc 2.5.0
From source with checksum 18e43357c8f927c0695f1e9522859d6a

This command was run using /opt/hadoop-2.6.0/share/hadoop/common/hadoop-common-2.6.0.jar

Note: Installation of hadoop has to be done on all nodes.


Hadoop Configuration for Automatic Failover

There are a handful of files that need to be configured to bring the automatic failover cluster up and running. All our configuration files reside in the /opt/hadoop-2.6.0/etc/hadoop/ directory.

hadoop-env.sh
Location: ha-nn01, ha-nn02, ha-nn03, ha-dn01, ha-dn02, ha-client

huser:~$ vi /opt/hadoop-2.6.0/etc/hadoop/hadoop-env.sh


export JAVA_HOME=/opt/jdk1.8.0_25
export HADOOP_LOG_DIR=/var/log/hadoop/

Create the log directory specified in the hadoop-env.sh file and give the 'huser' user the required permissions.

root:~# mkdir /var/log/hadoop
root:~# chown -R huser:hadoop /var/log/hadoop

core-site.xml
Location: ha-nn01, ha-nn02, ha-nn03, ha-dn01, ha-dn02, ha-client

huser:~$ vi /opt/hadoop-2.6.0/etc/hadoop/core-site.xml

<configuration>
 <property>
  <name>fs.default.name</name>
  <value>hdfs://auto-ha</value>
 </property>

</configuration>

hdfs-site.xml
Location: ha-nn01, ha-nn02, ha-nn03, ha-dn01, ha-dn02, ha-client

<configuration>
 <property>
  <name>dfs.replication</name>
  <value>2</value>
 </property>
 <property>
  <name>dfs.name.dir</name>
  <value>file:///hdfs/name</value>
 </property>
 <property>
  <name>dfs.data.dir</name>
  <value>file:///hdfs/data</value>
 </property>
 <property>
  <name>dfs.permissions</name>
  <value>false</value>
 </property>
 <property>
  <name>dfs.nameservices</name>
  <value>auto-ha</value>
 </property>
 <property>
  <name>dfs.ha.namenodes.auto-ha</name>
  <value>nn01,nn02</value>
 </property>
 <property>
  <name>dfs.namenode.rpc-address.auto-ha.nn01</name>
  <value>ha-nn01:8020</value>
 </property>
 <property>
  <name>dfs.namenode.http-address.auto-ha.nn01</name>
  <value>ha-nn01:50070</value>
 </property>
 <property>
  <name>dfs.namenode.rpc-address.auto-ha.nn02</name>
  <value>ha-nn02:8020</value>
 </property>
 <property>
  <name>dfs.namenode.http-address.auto-ha.nn02</name>
  <value>ha-nn02:50070</value>
 </property>
 <property>
  <name>dfs.namenode.shared.edits.dir</name>
  <value>file:///mnt/</value>
 </property>
 <property>
  <name>dfs.ha.fencing.methods</name>
  <value>sshfence</value>
 </property>
 <property>
  <name>dfs.ha.fencing.ssh.private-key-files</name>
  <value>/home/huser/.ssh/id_rsa</value>
 </property>
 <property>
  <name>dfs.ha.automatic-failover.enabled.auto-ha</name>
  <value>true</value>
 </property>
 <property>
   <name>ha.zookeeper.quorum</name>
   <value>ha-nn01.hadoop.lab:2181,ha-nn02.hadoop.lab:2181,ha-nn03.hadoop.lab:2181</value>
 </property>
</configuration>


Note:
1. Replication factor is set to '2' as I have only two datanodes.

2. Create a directory /hdfs/name in all namenodes with required 'huser' user permissions.
root:~# mkdir -p /hdfs/name
root:~# chown -R huser:hadoop /hdfs/name

3. Create a directory /hdfs/data in all datanodes with required 'huser' user permissions.
root:~# mkdir -p /hdfs/data
root:~# chown -R huser:hadoop /hdfs/data

4. The shared edits directory is a permanently mounted NFS share on all namenodes only (a sample mount is shown after these notes).

5. In ha-client host add the below property to hdfs-site.xml file.
<property>
 <name>dfs.client.failover.proxy.provider.auto-ha</name>
 <value>org.apache.hadoop.hdfs.server.namenode.ha.ConfiguredFailoverProxyProvider</value>
</property>

6. We can explicitly enable automatic-failover for the nameservice-id 'auto-ha' by setting the property 'dfs.ha.automatic-failover.enabled.auto-ha' to 'true'.
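
For reference, the permanently mounted share described in note 4 could be set up as follows; this is only a sketch, assuming a hypothetical NFS server named 'ha-nfs' exporting '/export/hdfs-edits' (adjust the server and export path to your environment).

root:~# mount -t nfs ha-nfs:/export/hdfs-edits /mnt

To make the mount permanent across reboots, an /etc/fstab entry such as the following can be added on each namenode:
ha-nfs:/export/hdfs-edits  /mnt  nfs  defaults  0 0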


slaves
This file contains only the hostnames of datanodes.

Location: ha-nn01, ha-nn02, ha-nn03

huser:~$ vi /opt/hadoop-2.6.0/etc/hadoop/slaves

ha-dn01
ha-dn02

Finally after completing the configuration part, we will be initializing and starting our automatic-failover hadoop cluster.

Initializing HA state in ZooKeeper

The required HA state needs to be initialized in ZooKeeper by running the below command from any one of the namenodes.

Location: ha-nn01

huser@ha-nn01:~$ hdfs zkfc -formatZK

Formatting & Starting Namenodes

The first namenode needs to be formatted to create the HDFS filesystem; the second namenode is then bootstrapped from it.

Location: ha-nn01

huser@ha-nn01:~$ hadoop namenode -format
huser@ha-nn01:~$ hadoop-daemon.sh start namenode

Location: ha-nn02

huser@ha-nn02:~$ hadoop namenode -bootstrapStandby
huser@ha-nn02:~$ hadoop-daemon.sh start namenode

Note: By default both the namenodes will be in 'standby' state.

Starting ZKFC Services

The ZooKeeper Failover Controller (ZKFC) service needs to be started in order to make one of the namenodes 'active'. Run the below command on both namenodes.

huser@ha-nn01:~$ hadoop-daemon.sh start zkfc
huser@ha-nn02:~$ hadoop-daemon.sh start zkfc

Note: As soon as the zkfc service is started, you can see that one of the namenodes is in the 'active' state by running the below commands from any one of the namenodes.
huser@ha-nn01:~$ hdfs haadmin -getServiceState nn01
huser@ha-nn01:~$ hdfs haadmin -getServiceState nn02

Starting Datanodes

To start the datanodes run the below mentioned command from any one of the namenodes.
huser@ha-nn01:~$ hadoop-daemons.sh start datanode

Verifying Automatic Failover

To verify automatic failover, we first need to locate the active namenode, either from the command line or by visiting the namenode web interfaces.

Using command line
huser@ha-nn01:~$ hdfs haadmin -getServiceState nn01
huser@ha-nn01:~$ hdfs haadmin -getServiceState nn02

Using web interface
ha-nn01:50070
ha-nn02:50070

After locating the active namenode, we can cause a failure on that node to trigger a failover automatically. One way to fail the active namenode is to find the pid of the namenode daemon with the 'jps' command and kill it. Within a few seconds the other namenode will automatically become active.


Related Links