Sunday, January 11, 2015

Fully Distributed Hadoop Cluster - Manual Failover HA with NFS

In my last post we configured a Hadoop Federation cluster in fully distributed mode. In this post we will set up a fully distributed, manual-failover Hadoop HA cluster. I will skip the Hadoop and Java installation steps, as we have already gone through them a couple of times in previous posts. We will use the hardware configuration listed in the table below.

namenode1 ha-nn01   192.168.56.101
namenode2 ha-nn02   192.168.56.102
datanode1 ha-dn01   192.168.56.103
datanode2 ha-dn02   192.168.56.104
client    ha-client 192.168.56.105

We already have two namenodes, two datanodes and a client node, all running CentOS release 5.11 and ready with the required user configuration, a passwordless SSH environment, the appropriate Java configuration, and a Hadoop installation with all variables and paths declared.
Note: If not, please follow my last post on “Fully Distributed Hadoop Federation Cluster” up to the “hadoop installation and testing” step.
Also make sure that you follow the note in the “Downloads” section.
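Every node must be able to resolve the other hostnames in the table above. A minimal /etc/hosts fragment for all five machines (assuming no conflicting entries; DNS would work equally well) would look like:

```
192.168.56.101  ha-nn01
192.168.56.102  ha-nn02
192.168.56.103  ha-dn01
192.168.56.104  ha-dn02
192.168.56.105  ha-client
```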

Hadoop Configuration


Let us move directly to the configuration required to set up manual failover HA.

hadoop-env.sh
Location: ha-nn01, ha-nn02, ha-dn01, ha-dn02, ha-client
huser:~$ vi /opt/hadoop-2.6.0/etc/hadoop/hadoop-env.sh
export JAVA_HOME=/usr/java/jdk1.8.0_25/
export HADOOP_LOG_DIR=/var/log/hadoop/

Create the log directory at the path specified by the HADOOP_LOG_DIR parameter in hadoop-env.sh and change its ownership to the “huser” user.
$ sudo mkdir /var/log/hadoop
$ sudo chown -R huser:hadoop /var/log/hadoop

Note: On the client machine there is no need to specify or create a log directory. Likewise, JAVA_HOME need not be declared if Java is already configured system-wide.

core-site.xml
Location: ha-nn01, ha-nn02, ha-dn01, ha-dn02, ha-client
huser:~$ sudo vi /opt/hadoop-2.6.0/etc/hadoop/core-site.xml
<configuration>
 <property>
  <name>fs.default.name</name>
  <value>hdfs://man-ha</value>
 </property>
</configuration>

hdfs-site.xml
Location: ha-nn01, ha-nn02, ha-dn01, ha-dn02
huser:~$ sudo vi /opt/hadoop-2.6.0/etc/hadoop/hdfs-site.xml
<configuration>
 <property>
  <name>dfs.replication</name>
  <value>2</value>
 </property>
 <property>
  <name>dfs.name.dir</name>
  <value>file:///hdfs/name</value>
 </property>
 <property>
  <name>dfs.data.dir</name>
  <value>file:///hdfs/data</value>
 </property>
 <property>
  <name>dfs.permissions</name>
  <value>false</value>
 </property>
 <property>
  <name>dfs.nameservices</name>
  <value>man-ha</value>
 </property>
 <property>
  <name>dfs.ha.namenodes.man-ha</name>
  <value>nn01,nn02</value>
 </property>
 <property>
  <name>dfs.namenode.rpc-address.man-ha.nn01</name>
  <value>ha-nn01:8020</value>
 </property>
 <property>
  <name>dfs.namenode.http-address.man-ha.nn01</name>
  <value>ha-nn01:50070</value>
 </property>
 <property>
  <name>dfs.namenode.rpc-address.man-ha.nn02</name>
  <value>ha-nn02:8020</value>
 </property>
 <property>
  <name>dfs.namenode.http-address.man-ha.nn02</name>
  <value>ha-nn02:50070</value>
 </property>
 <property>
  <name>dfs.namenode.shared.edits.dir</name>
  <value>file:///mnt/</value>
 </property>
 <property>
  <name>dfs.ha.fencing.methods</name>
  <value>sshfence</value>
 </property>
 <property>
  <name>dfs.ha.fencing.ssh.private-key-files</name>
  <value>/home/huser/.ssh/id_rsa</value>
 </property>
</configuration>

Copy hdfs-site.xml to the client node and add the property below to it.

Location: ha-client
huser@ha-client:~$ sudo vi /opt/hadoop-2.6.0/etc/hadoop/hdfs-site.xml
<property>
 <name>dfs.client.failover.proxy.provider.man-ha</name>
 <value>org.apache.hadoop.hdfs.server.namenode.ha.ConfiguredFailoverProxyProvider</value>
</property>


Note: The value of the "dfs.namenode.shared.edits.dir" property must point to a shared NFS-mounted directory. To create a permanent NFS mount, click here.
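As a rough sketch of such a setup (the NFS server hostname ha-nfs and export path /nfsshare are assumptions; the linked post has the full procedure), the server would export a directory to the cluster subnet, and both namenodes would mount it at /mnt, matching the file:///mnt/ value above:

```
# /etc/exports on the NFS server (hypothetical host "ha-nfs")
/nfsshare  192.168.56.0/24(rw,sync,no_root_squash)

# /etc/fstab entry on ha-nn01 and ha-nn02, mounting the share at /mnt
ha-nfs:/nfsshare  /mnt  nfs  rw,hard,intr  0 0
```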

slaves
Location: ha-nn01, ha-nn02
huser:~$ vi /opt/hadoop-2.6.0/etc/hadoop/slaves
ha-dn01
ha-dn02

This completes the configuration required to deploy manual failover Hadoop HA. Get ready to fire it up. The administration steps below will help us bring up and manage the cluster.

Formatting, starting & activating namenodes & datanodes

We will format the namenodes one by one and start the namenode daemon manually on each. Make sure that the shared edits directory is mounted on both namenodes.
Location: ha-nn01
huser@ha-nn01:~$ hadoop namenode -format
huser@ha-nn01:~$ hadoop-daemon.sh start namenode

Location: ha-nn02
huser@ha-nn02:~$ hadoop namenode -bootstrapStandby
huser@ha-nn02:~$ hadoop-daemon.sh start namenode

At this point both namenodes will be in standby state. To transition the desired namenode to the “active” state, run the command below from either namenode machine.
huser@ha-nn02:~$ hdfs haadmin -transitionToActive nn01

Note: In the example above we transitioned the namenode ID “nn01” to the active state. Use the namenode ID in this field, not the namenode's hostname.
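A later manual failover from nn01 to nn02 uses the same tool. As a sketch, either let haadmin coordinate the handover (the -failover subcommand also applies the configured sshfence method), or issue the two transitions explicitly:

```
# Coordinated failover, including fencing of the old active:
huser@ha-nn01:~$ hdfs haadmin -failover nn01 nn02

# Or perform the two transitions by hand:
huser@ha-nn01:~$ hdfs haadmin -transitionToStandby nn01
huser@ha-nn01:~$ hdfs haadmin -transitionToActive nn02
```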

Finally, we will start the datanode daemons on all slave nodes using the command below.
huser@ha-nn01:~$ hadoop-daemons.sh start datanode

Alternatively, we can start the datanode daemon independently on each slave node.
huser@ha-dn01:~$ hadoop-daemon.sh start datanode
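To confirm that the daemons actually came up, jps on each node should list the expected process: NameNode on ha-nn01 and ha-nn02, DataNode on the slaves. A sketch from a datanode (the pids shown are illustrative):

```
huser@ha-dn01:~$ jps
2741 DataNode
2802 Jps
```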


Monitoring

The status of the namenodes can be checked from either namenode using the command below.
huser@ha-nn02:~$ hdfs haadmin -getServiceState nn01
where nn01 is the namenode ID for ha-nn01.

It is also possible to view a namenode's status in a browser from any namenode or the client machine by pointing the URL to ha-nn01:50070 or ha-nn02:50070.

Filesystem operations such as create, copy, list and delete should be performed using the absolute HDFS path.

Example:
To copy a file from the client's local filesystem to the cluster through the active namenode ha-nn01:
huser@ha-client:~$ hadoop dfs -copyFromLocal largefile hdfs://ha-nn01/test/
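Since the client's hdfs-site.xml defines the failover proxy provider for the man-ha nameservice, the logical nameservice URI should also work, sparing the client from having to know which namenode is currently active (a sketch, using the same file and target path as above):

```
huser@ha-client:~$ hadoop dfs -copyFromLocal largefile hdfs://man-ha/test/
```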




Related links

Single-Node Hadoop Cluster on Ubuntu 14.04

Multi-Node Hadoop Cluster on Ubuntu 14.04

Multi-Node Hadoop Cluster on Oracle Solaris 11 using Zones

Fully Distributed Hadoop Cluster - Automatic Failover HA with ZooKeeper & NFS
