Thursday, June 05, 2014

Single-Node Hadoop Cluster on Ubuntu 14.04

In this tutorial I will demonstrate how to install and run a Single-Node Hadoop Cluster on Ubuntu 14.04.

JAVA INSTALLATION
Hadoop requires Java, so install the OpenJDK 7 package first.
user@hadoop-lab:~$ sudo apt-get install openjdk-7-jdk

HADOOP USER & GROUP CREATION
Create a dedicated user account and group for hadoop.
user@hadoop-lab:~$ sudo groupadd hadoop
user@hadoop-lab:~$ sudo useradd -m -d /home/huser/ -g hadoop huser
user@hadoop-lab:~$ sudo passwd huser

Log in as the “huser” user to do the remaining configuration.
user@hadoop-lab:~$ su - huser

SSH INSTALLATION & PASSPHRASELESS SSH CONFIGURATION
The master node manages slave nodes (starting and stopping their services) over SSH, so even a single-node setup needs passphraseless SSH to localhost.
huser@hadoop-lab:~$ sudo apt-get install openssh-server
huser@hadoop-lab:~$ ssh-keygen -t rsa -P ""
huser@hadoop-lab:~$ cat /home/huser/.ssh/id_rsa.pub >> /home/huser/.ssh/authorized_keys

Testing SSH Setup
huser@hadoop-lab:~$ ssh localhost

HADOOP INSTALLATION
Download Apache Hadoop Release 1.2.1 from the Apache Download Mirrors site into huser's home directory, extract it, move it to /usr/local, and assign hadoop user and group ownership.
huser@hadoop-lab:~$ tar -xzvf hadoop-1.2.1.tar.gz
huser@hadoop-lab:~$ sudo mv /home/huser/hadoop-1.2.1 /usr/local/hadoop/
huser@hadoop-lab:~$ sudo chown -R huser:hadoop /usr/local/hadoop/

HADOOP USER ENVIRONMENT CONFIGURATION
Set user environment variables for java and hadoop home directories.
huser@hadoop-lab:~$ vi .bashrc
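The entries to append to .bashrc might look like the following sketch; the JAVA_HOME path is an assumption based on the openjdk-7 package installed earlier (adjust it if your JVM lives elsewhere):

```shell
# Append to /home/huser/.bashrc (JAVA_HOME path assumes openjdk-7 on amd64)
export JAVA_HOME=/usr/lib/jvm/java-7-openjdk-amd64
export HADOOP_HOME=/usr/local/hadoop
# Put the hadoop scripts (hadoop, start-all.sh, stop-all.sh, ...) on the PATH
export PATH=$PATH:$HADOOP_HOME/bin
```

Run `source ~/.bashrc` afterwards so the variables take effect in the current shell.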


HADOOP CONFIGURATION
Hadoop Environment Setup
Setup Java Home for Hadoop and disable IPv6.
huser@hadoop-lab$ vi /usr/local/hadoop/conf/hadoop-env.sh
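The two settings in hadoop-env.sh could look like this sketch; the JAVA_HOME path is again an assumption based on the openjdk-7 package, and forcing the IPv4 stack is the usual way to keep Hadoop 1.x off IPv6:

```shell
# In /usr/local/hadoop/conf/hadoop-env.sh
export JAVA_HOME=/usr/lib/jvm/java-7-openjdk-amd64   # assumed OpenJDK 7 path
# Prefer IPv4; Hadoop 1.x does not work reliably over IPv6
export HADOOP_OPTS=-Djava.net.preferIPv4Stack=true
```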


The main hadoop configurations are stored in 3 files listed below.
core-site.xml
mapred-site.xml
hdfs-site.xml

core-site.xml
Contains default values for core Hadoop properties.
huser@hadoop-lab$ vi /usr/local/hadoop/conf/core-site.xml
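A minimal core-site.xml for a single-node setup might look like the sketch below; port 54310 is an assumption (9000 is equally common), and hadoop.tmp.dir points at the directory created in the next step:

```xml
<configuration>
  <property>
    <name>hadoop.tmp.dir</name>
    <value>/usr/local/hadoop/tmp</value>
    <description>Base for Hadoop's temporary directories.</description>
  </property>
  <property>
    <name>fs.default.name</name>
    <value>hdfs://localhost:54310</value>
    <description>URI of the default file system (the NameNode).</description>
  </property>
</configuration>
```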


huser@hadoop-lab:~$ mkdir /usr/local/hadoop/tmp

mapred-site.xml
Contains configuration information for MapReduce properties.
huser@hadoop-lab$ vi /usr/local/hadoop/conf/mapred-site.xml
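For a single node the only property mapred-site.xml needs is the JobTracker address; the port 54311 below is an assumption, chosen to match the NameNode port convention used above:

```xml
<configuration>
  <property>
    <name>mapred.job.tracker</name>
    <value>localhost:54311</value>
    <description>Host and port of the MapReduce JobTracker.</description>
  </property>
</configuration>
```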


hdfs-site.xml
Contains server-side configuration for the Hadoop Distributed File System (HDFS).
huser@hadoop-lab$ vi /usr/local/hadoop/conf/hdfs-site.xml
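A sketch of hdfs-site.xml for this setup, pointing dfs.name.dir and dfs.data.dir at the /hdfs directories created in the next step; replication is set to 1 because there is only one DataNode:

```xml
<configuration>
  <property>
    <name>dfs.replication</name>
    <value>1</value>
    <description>Single node, so keep only one copy of each block.</description>
  </property>
  <property>
    <name>dfs.name.dir</name>
    <value>/hdfs/name</value>
    <description>Where the NameNode stores its metadata.</description>
  </property>
  <property>
    <name>dfs.data.dir</name>
    <value>/hdfs/data</value>
    <description>Where the DataNode stores its blocks.</description>
  </property>
</configuration>
```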


huser@hadoop-lab:~$ sudo mkdir -p /hdfs/name
huser@hadoop-lab:~$ sudo mkdir /hdfs/data
huser@hadoop-lab:~$ sudo chown -R huser:hadoop /hdfs

FORMATTING NAMENODE
Before we start adding files to HDFS we need to format the NameNode. If prompted, type 'Y'; the prompt is case-sensitive, so a lowercase 'y' is rejected.
huser@hadoop-lab:~$ hadoop namenode -format

STARTING SERVICES
After the namenode has been formatted, it is time to launch hadoop.
huser@hadoop-lab:~$ start-all.sh

Since Hadoop is written in Java, we can use the jps (Java Process Status) tool to check the Hadoop processes running in the JVM.
huser@hadoop-lab:~$ jps
The output should list the five Hadoop daemons (NameNode, SecondaryNameNode, DataNode, JobTracker and TaskTracker) plus the Jps process itself.


We can also have a look at the web interfaces of HDFS and MapReduce.
HDFS (NameNode) : http://localhost:50070
MapReduce (JobTracker) : http://localhost:50030


That's it. I hope you enjoyed building a Single-Node Hadoop Cluster on Ubuntu 14.04. If you have any suggestions, questions or comments, please leave a comment below.

Follow the link below to create:
Multi-Node Hadoop Cluster on Ubuntu 14.04
