Thursday, June 05, 2014

Single-Node Hadoop Cluster on Ubuntu 14.04

In this tutorial I will demonstrate how to install and run a single-node Hadoop cluster on Ubuntu 14.04.

JAVA INSTALLATION
Hadoop requires Java, so install OpenJDK 7 first.
user@hadoop-lab:~$ sudo apt-get install openjdk-7-jdk

HADOOP USER & GROUP CREATION
Create a dedicated user account and group for hadoop.
user@hadoop-lab:~$ sudo groupadd hadoop
user@hadoop-lab:~$ sudo useradd -m -d /home/huser/ -g hadoop huser
user@hadoop-lab:~$ sudo passwd huser

Log in as the "huser" user for the remaining configuration.
user@hadoop-lab:~$ su - huser

SSH INSTALLATION & PASSPHRASELESS SSH CONFIGURATION
The master node manages slave nodes (starting and stopping services) using SSH.
huser@hadoop-lab:~$ sudo apt-get install openssh-server
huser@hadoop-lab:~$ ssh-keygen -t rsa
huser@hadoop-lab:~$ cat /home/huser/.ssh/id_rsa.pub >> /home/huser/.ssh/authorized_keys

Testing SSH Setup
huser@hadoop-lab:~$ ssh localhost

HADOOP INSTALLATION
Download Apache Hadoop release 1.2.1 from the Apache download mirrors into huser's home directory, extract it, move it to /usr/local, and assign hadoop user and group ownership.
huser@hadoop-lab:~$ tar -xzvf hadoop-1.2.1.tar.gz
huser@hadoop-lab:~$ sudo mv /home/huser/hadoop-1.2.1 /usr/local/hadoop/
huser@hadoop-lab:~$ sudo chown -R huser:hadoop /usr/local/hadoop/

HADOOP USER ENVIRONMENT CONFIGURATION
Set user environment variables for the Java and Hadoop home directories.
huser@hadoop-lab:~$ vi .bashrc
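Typical lines to append to .bashrc for this layout might look like the following (the JAVA_HOME path assumes a 64-bit OpenJDK 7 install; adjust it to match your system):

```shell
# Java and Hadoop home directories
export JAVA_HOME=/usr/lib/jvm/java-7-openjdk-amd64
export HADOOP_HOME=/usr/local/hadoop

# Put the Hadoop scripts (hadoop, start-all.sh, stop-all.sh) on the PATH
export PATH=$PATH:$HADOOP_HOME/bin
```

Run `source ~/.bashrc` (or log out and back in) for the changes to take effect.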


HADOOP CONFIGURATION
Hadoop Environment Setup
Set the Java home for Hadoop and tell it to prefer IPv4, since Hadoop 1.x does not play well with IPv6 on Ubuntu.
huser@hadoop-lab$ vi /usr/local/hadoop/conf/hadoop-env.sh
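The two edits usually made in hadoop-env.sh are the ones below (again, the JAVA_HOME path assumes 64-bit OpenJDK 7):

```shell
# The Java implementation to use
export JAVA_HOME=/usr/lib/jvm/java-7-openjdk-amd64

# Prefer IPv4 so the daemons do not bind to IPv6 addresses
export HADOOP_OPTS=-Djava.net.preferIPv4Stack=true
```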


The main Hadoop configuration is stored in the three files listed below.
core-site.xml
mapred-site.xml
hdfs-site.xml

core-site.xml
Contains default values for core Hadoop properties.
huser@hadoop-lab$ vi /usr/local/hadoop/conf/core-site.xml
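A minimal core-site.xml for this setup might look like the following; it points the default file system at a NameNode on localhost and sets the temporary directory created in the next step. The port 9000 is an assumption here (some guides use 54310 instead):

```xml
<?xml version="1.0"?>
<configuration>
  <!-- Base directory for Hadoop's temporary files -->
  <property>
    <name>hadoop.tmp.dir</name>
    <value>/usr/local/hadoop/tmp</value>
  </property>
  <!-- URI of the default file system (the NameNode) -->
  <property>
    <name>fs.default.name</name>
    <value>hdfs://localhost:9000</value>
  </property>
</configuration>
```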


huser@hadoop-lab:~$ mkdir /usr/local/hadoop/tmp

mapred-site.xml
Contains configuration information for MapReduce properties.
huser@hadoop-lab$ vi /usr/local/hadoop/conf/mapred-site.xml
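For a single-node setup, mapred-site.xml only needs to name the JobTracker. The port 9001 is an assumption (54311 is another common choice); keep it consistent with whatever you pick:

```xml
<?xml version="1.0"?>
<configuration>
  <!-- Host and port the JobTracker listens on -->
  <property>
    <name>mapred.job.tracker</name>
    <value>localhost:9001</value>
  </property>
</configuration>
```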


hdfs-site.xml
Contains server-side configuration for the Hadoop Distributed File System.
huser@hadoop-lab$ vi /usr/local/hadoop/conf/hdfs-site.xml
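A minimal hdfs-site.xml for this setup could set the replication factor to 1 (there is only one DataNode) and point the NameNode and DataNode at the /hdfs directories created in the next step:

```xml
<?xml version="1.0"?>
<configuration>
  <!-- Single node, so keep only one copy of each block -->
  <property>
    <name>dfs.replication</name>
    <value>1</value>
  </property>
  <!-- Where the NameNode stores its metadata -->
  <property>
    <name>dfs.name.dir</name>
    <value>/hdfs/name</value>
  </property>
  <!-- Where the DataNode stores HDFS blocks -->
  <property>
    <name>dfs.data.dir</name>
    <value>/hdfs/data</value>
  </property>
</configuration>
```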


huser@hadoop-lab:~$ sudo mkdir -p /hdfs/name
huser@hadoop-lab:~$ sudo mkdir /hdfs/data
huser@hadoop-lab:~$ sudo chown -R huser:hadoop /hdfs

FORMATTING NAMENODE
Before we start adding files to HDFS we need to format it. Answer 'Y' at the prompt; note that it is case-sensitive, so a lowercase 'y' will not work.
huser@hadoop-lab:~$ hadoop namenode -format

STARTING SERVICES
After the NameNode has been formatted, it is time to launch the Hadoop daemons.
huser@hadoop-lab:~$ start-all.sh

Since Hadoop is written in Java and runs on the JVM, we can use the jps (Java Virtual Machine Process Status) tool to check which Java processes are running.
huser@hadoop-lab:~$ jps
The output should list the five Hadoop daemons plus jps itself: NameNode, DataNode, SecondaryNameNode, JobTracker and TaskTracker.
We can also have a look at the web interfaces of HDFS and MapReduce.
HDFS (NameNode) : http://localhost:50070
MapReduce (JobTracker) : http://localhost:50030
That's it. I hope you enjoyed building a single-node Hadoop cluster on Ubuntu 14.04. If you have any suggestions, questions or comments, please leave a comment.

Follow the link below to continue with:
Multi-Node Hadoop Cluster on Ubuntu 14.04
