In this tutorial I will demonstrate how to install and run a Single-Node Hadoop Cluster on Ubuntu 14.04.
JAVA INSTALLATION
Java needs to be installed as a prerequisite.
user@hadoop-lab:~$ sudo apt-get install openjdk-7-jdk
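You can optionally verify the installation by checking the reported Java version:
user@hadoop-lab:~$ java -version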
HADOOP USER & GROUP CREATION
Create a dedicated user account and group for Hadoop.
user@hadoop-lab:~$ sudo groupadd hadoop
user@hadoop-lab:~$ sudo useradd -m -d /home/huser/ -g hadoop huser
user@hadoop-lab:~$ sudo passwd huser
Log in as the “huser” user to perform the remaining configuration.
user@hadoop-lab:~$ su - huser
SSH INSTALLATION & PASSPHRASELESS SSH CONFIGURATION
The master node manages slave nodes (starting and stopping services) using SSH.
huser@hadoop-lab:~$ sudo apt-get install openssh-server
huser@hadoop-lab:~$ ssh-keygen -t rsa
huser@hadoop-lab:~$ cat /home/huser/.ssh/id_rsa.pub >> /home/huser/.ssh/authorized_keys
Testing SSH Setup
huser@hadoop-lab:~$ ssh localhost
HADOOP INSTALLATION
Download Apache Hadoop Release 1.2.1 from the Apache Download Mirrors site into the huser home directory, extract it, and assign hadoop user and group ownership.
huser@hadoop-lab:~$ tar -xzvf hadoop-1.2.1.tar.gz
huser@hadoop-lab:~$ sudo mv /home/huser/hadoop-1.2.1 /usr/local/hadoop/
huser@hadoop-lab:~$ sudo chown -R huser:hadoop /usr/local/hadoop/
HADOOP USER ENVIRONMENT CONFIGURATION
Set user environment variables for the Java and Hadoop home directories.
huser@hadoop-lab:~$ vi .bashrc
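A minimal sketch of the entries to add to .bashrc, assuming OpenJDK 7 lives at the default Ubuntu 14.04 amd64 path (adjust JAVA_HOME for your architecture); adding $HADOOP_HOME/bin to the PATH lets us run the hadoop and start-all.sh commands later without typing the full path:

# Java and Hadoop home directories
export JAVA_HOME=/usr/lib/jvm/java-7-openjdk-amd64
export HADOOP_HOME=/usr/local/hadoop
# Make the Hadoop scripts available on the command line
export PATH=$PATH:$HADOOP_HOME/bin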
HADOOP CONFIGURATION
Hadoop Environment Setup
Setup Java Home for Hadoop and disable IPv6.
huser@hadoop-lab$ vi /usr/local/hadoop/conf/hadoop-env.sh
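A sketch of the two lines to set in hadoop-env.sh, assuming the same OpenJDK 7 path as above; setting preferIPv4Stack is one common way to make Hadoop ignore IPv6:

# Java implementation to use
export JAVA_HOME=/usr/lib/jvm/java-7-openjdk-amd64
# Prefer IPv4 so Hadoop does not bind to IPv6 addresses
export HADOOP_OPTS=-Djava.net.preferIPv4Stack=true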
The main Hadoop configuration is stored in the three files listed below:
core-site.xml
mapred-site.xml
hdfs-site.xml
core-site.xml
Contains default values for core Hadoop properties.
huser@hadoop-lab$ vi /usr/local/hadoop/conf/core-site.xml
huser@hadoop-lab:~$ mkdir /usr/local/hadoop/tmp
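A minimal core-site.xml sketch: hadoop.tmp.dir points at the directory created above, and the NameNode URI in fs.default.name uses an assumed port of 54310 (any free port works):

<?xml version="1.0"?>
<configuration>
  <!-- Base directory for Hadoop's temporary files -->
  <property>
    <name>hadoop.tmp.dir</name>
    <value>/usr/local/hadoop/tmp</value>
  </property>
  <!-- URI of the NameNode (port 54310 is an assumed choice) -->
  <property>
    <name>fs.default.name</name>
    <value>hdfs://localhost:54310</value>
  </property>
</configuration>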
mapred-site.xml
Contains configuration information for MapReduce properties.
huser@hadoop-lab$ vi /usr/local/hadoop/conf/mapred-site.xml
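A minimal mapred-site.xml sketch, assuming port 54311 for the JobTracker:

<?xml version="1.0"?>
<configuration>
  <!-- Host and port the JobTracker listens on (port 54311 is an assumed choice) -->
  <property>
    <name>mapred.job.tracker</name>
    <value>localhost:54311</value>
  </property>
</configuration>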
hdfs-site.xml
Contains server-side configuration for the Hadoop Distributed File System.
huser@hadoop-lab$ vi /usr/local/hadoop/conf/hdfs-site.xml
huser@hadoop-lab:~$ sudo mkdir -p /hdfs/name
huser@hadoop-lab:~$ sudo mkdir /hdfs/data
huser@hadoop-lab:~$ sudo chown -R huser:hadoop /hdfs
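A minimal hdfs-site.xml sketch: the name and data directories match the ones created above, and a replication factor of 1 is appropriate for a single-node cluster:

<?xml version="1.0"?>
<configuration>
  <!-- Only one copy of each block on a single-node cluster -->
  <property>
    <name>dfs.replication</name>
    <value>1</value>
  </property>
  <!-- Where the NameNode stores its metadata -->
  <property>
    <name>dfs.name.dir</name>
    <value>/hdfs/name</value>
  </property>
  <!-- Where the DataNode stores block data -->
  <property>
    <name>dfs.data.dir</name>
    <value>/hdfs/data</value>
  </property>
</configuration>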
FORMATTING NAMENODE
Before we can add files to HDFS, we need to format it. Type 'Y' at the prompt; note that the Y/N prompt is case-sensitive.
huser@hadoop-lab:~$ hadoop namenode -format
STARTING SERVICES
After the NameNode has been formatted, it is time to launch Hadoop.
huser@hadoop-lab:~$ start-all.sh
Since Hadoop is written in Java, we can use the JPS (Java Process Status) tool to check which processes are running in the JVM.
huser@hadoop-lab:~$ jps
On a single-node setup, the output should list the NameNode, DataNode, SecondaryNameNode, JobTracker, and TaskTracker daemons along with Jps itself.
We can also have a look at the web interfaces of HDFS and MapReduce.
HDFS: http://localhost:50070
MapReduce: http://localhost:50030
That's it. I hope you enjoyed building a Single-Node Hadoop Cluster on Ubuntu 14.04. If you have any suggestions, questions or comments, please leave a comment.
Follow the link below to create:
Multi-Node Hadoop Cluster on Ubuntu 14.04