This tutorial demonstrates how to set up an Apache Hadoop 1.2.1 cluster using Oracle Solaris 11.1 virtualization technology (Zones). I am running this setup inside Oracle VM VirtualBox 4.3.10 on Ubuntu 12.04, with the guest machine running Oracle Solaris 11. The namenode will run inside the global zone, and we will configure three guest zones with nearly identical configurations: one for the secondary namenode and two for datanodes.
LAB SETUP

| Nodename           | Zone        | Hostname | IP-Address |
|--------------------|-------------|----------|------------|
| Namenode           | Global Zone | nn       | 10.0.2.15  |
| Secondary Namenode | Guest Zone  | nn2      | 10.0.2.16  |
| Datanode1          | Guest Zone  | dn1      | 10.0.2.17  |
| Datanode2          | Guest Zone  | dn2      | 10.0.2.18  |
NOTE: A few configuration steps need to be performed on all nodes. I will show them for a single node; the commands and steps are the same on the other nodes.
HADOOP USER & GROUP CONFIGURATION
A Solaris installation is itself the “global zone”. Say you created a user named “user” during installation. We could use that same user to set up the Hadoop cluster, but for clarity we will use a dedicated user named “huser” for all Hadoop tasks.
Creating the hadoop group
user$ sudo groupadd hadoop
Creating a hadoop user “huser” and placing it in the hadoop group
user$ sudo useradd -m -d /export/home/huser -g hadoop huser
Setting the password for “huser”
user$ sudo passwd huser
By default, Oracle Solaris 11.1 does not create users with sudo permissions, so we need to grant them to “huser”.
user@nn:~$ sudo vi /etc/sudoers
huser ALL=(ALL:ALL) ALL
After adding the user, we continue our configuration logged in as “huser”.
user@nn:~$ su - huser
NOTE: User and group creation and sudo configuration are done here on the global zone, which will be the namenode. The same steps will be repeated on the other zones after the secondary namenode and datanode zones are configured.
HADOOP INSTALLATION ON GLOBAL ZONE (NAMENODE)
huser@nn:~$ sudo mkdir -p /usr/local/
huser@nn:~$ sudo chown -R huser:hadoop /usr/local/
huser@nn:~$ sudo tar -xzvf /tmp/hadoop-1.2.1.tar.gz -C /tmp
huser@nn:~$ sudo mv /tmp/hadoop-1.2.1 /usr/local/hadoop
NOTE: Hadoop is installed only on the namenode because Oracle Solaris 11 Zones let us mount and share the same directory across zones. That is why Hadoop must be installed before the secondary namenode and datanode zones are configured.
ZONES CONFIGURATION
Creating virtual network interfaces for each zone
huser@nn:~$ sudo dladm create-vnic -l net0 nn2
huser@nn:~$ sudo dladm create-vnic -l net0 dn1
huser@nn:~$ sudo dladm create-vnic -l net0 dn2
huser@nn:~$ dladm show-link
Next we will create a ZFS dataset for all zones.
huser@nn:~$ sudo zfs create -o mountpoint=/zonefs rpool/zonefs
NOTE: The ZFS dataset for zones must not be under the rpool/ROOT dataset or directly under the global zone's root ("/") dataset.
Creating the secondary namenode zone
huser@nn:~$ sudo zonecfg -z nn2
Use 'create' to begin configuring a new zone.
zonecfg:nn2> create
create: Using system default template 'SYSdefault'
zonecfg:nn2> set autoboot=true
zonecfg:nn2> set zonepath=/zonefs/nn2
zonecfg:nn2> add fs
zonecfg:nn2:fs> set dir=/usr/local/hadoop
zonecfg:nn2:fs> set special=/usr/local/hadoop
zonecfg:nn2:fs> set type=lofs
zonecfg:nn2:fs> set options=[ro,nodevices]
zonecfg:nn2:fs> end
zonecfg:nn2> add net
zonecfg:nn2:net> set physical=nn2
zonecfg:nn2:net> end
zonecfg:nn2> verify
zonecfg:nn2> commit
zonecfg:nn2> exit
Creating the datanode zones
huser@nn:~$ sudo zonecfg -z dn1
Use 'create' to begin configuring a new zone.
zonecfg:dn1> create
create: Using system default template 'SYSdefault'
zonecfg:dn1> set autoboot=true
zonecfg:dn1> set zonepath=/zonefs/dn1
zonecfg:dn1> add fs
zonecfg:dn1:fs> set dir=/usr/local/hadoop
zonecfg:dn1:fs> set special=/usr/local/hadoop
zonecfg:dn1:fs> set type=lofs
zonecfg:dn1:fs> set options=[ro,nodevices]
zonecfg:dn1:fs> end
zonecfg:dn1> add net
zonecfg:dn1:net> set physical=dn1
zonecfg:dn1:net> end
zonecfg:dn1> verify
zonecfg:dn1> commit
zonecfg:dn1> exit
huser@nn:~$ sudo zonecfg -z dn2
Use 'create' to begin configuring a new zone.
zonecfg:dn2> create
create: Using system default template 'SYSdefault'
zonecfg:dn2> set autoboot=true
zonecfg:dn2> set zonepath=/zonefs/dn2
zonecfg:dn2> add fs
zonecfg:dn2:fs> set dir=/usr/local/hadoop
zonecfg:dn2:fs> set special=/usr/local/hadoop
zonecfg:dn2:fs> set type=lofs
zonecfg:dn2:fs> set options=[ro,nodevices]
zonecfg:dn2:fs> end
zonecfg:dn2> add net
zonecfg:dn2:net> set physical=dn2
zonecfg:dn2:net> end
zonecfg:dn2> verify
zonecfg:dn2> commit
zonecfg:dn2> exit
All the zones that have been created can be listed with the command below.
huser@nn:~$ zoneadm list -cv
Installing zones
huser@nn:~$ sudo zoneadm -z nn2 install
huser@nn:~$ sudo zoneadm -z dn1 install
huser@nn:~$ sudo zoneadm -z dn2 install
Installing the zones takes time, depending on your hardware. Once installation completes, running zoneadm list -cv again shows all zones in the "installed" state.
Booting and configuring zones
huser@nn:~$ sudo zoneadm -z nn2 boot
huser@nn:~$ sudo zlogin -C nn2
huser@nn:~$ sudo zoneadm -z dn1 boot
huser@nn:~$ sudo zlogin -C dn1
huser@nn:~$ sudo zoneadm -z dn2 boot
huser@nn:~$ sudo zlogin -C dn2
NOTE: After the boot command, we can attach to each zone's console with the zlogin command. Here we complete the system configuration as shown in the “Lab Setup” table. Additionally, we specify the “user” user on the user configuration screen, without DNS and authentication, just as we configured the global zone during installation. Once the system configuration is done, we log in as “user” on all nodes.
Create the “huser” user and “hadoop” group on all nodes as done earlier in the global zone. If the installation and configuration were done differently from the global zone, make sure the “huser” user ID and hadoop group ID on the datanodes are identical to those on the namenode.
HOSTS FILE CONFIGURATION
It is necessary to populate the /etc/hosts file on every node with the IP addresses and hostnames of all the other nodes.
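Based on the Lab Setup table, the /etc/hosts entries would look like this on each node (a sketch; adjust if your addresses differ):

```shell
# /etc/hosts — identical entries on every node, taken from the Lab Setup table
10.0.2.15   nn
10.0.2.16   nn2
10.0.2.17   dn1
10.0.2.18   dn2
```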
PASSWORDLESS SSH CONFIGURATION
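The individual commands for this step are not shown in the text. A common approach on Hadoop 1.x is to generate an RSA key pair for “huser” on the namenode and copy the public key to every node; the hostnames below come from the Lab Setup table, and the availability of ssh-copy-id on your Solaris build is an assumption (appending the key to ~/.ssh/authorized_keys by hand works too):

```shell
# On the namenode, as huser: generate a key pair with an empty passphrase
ssh-keygen -t rsa -P "" -f ~/.ssh/id_rsa
# Authorize the key locally, since the start scripts also SSH to the namenode itself
cat ~/.ssh/id_rsa.pub >> ~/.ssh/authorized_keys
chmod 600 ~/.ssh/authorized_keys
# Copy the public key to the other nodes (prompts for huser's password once per node)
ssh-copy-id huser@nn2
ssh-copy-id huser@dn1
ssh-copy-id huser@dn2
# Verify: this should log in without prompting for a password
ssh huser@nn2 exit
```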
OPENJDK 7 INSTALLATION
I will be using the developer/java/jdk-7 package, which installs easily.
huser@nn:~$ sudo pkg install --accept developer/java/jdk-7
NOTE: The JDK installation has to be done on all nodes.
USER ENVIRONMENT CONFIGURATION
Since we are already logged in as “huser”, we edit the .bashrc file in huser's home directory to set the user environment and paths.
huser@nn:~$ vi .bashrc
Reload the shell so the new environment takes effect
huser@nn:~$ exec bash
NOTE:
User environment has to be configured on all nodes to declare the essential
variables for both java and hadoop mainly.
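The .bashrc contents are not reproduced in the text; a minimal version consistent with the paths used in this setup might look like the following (the exact JAVA_HOME path is an assumption — check where the JDK landed after pkg install):

```shell
# ~/.bashrc for huser — Java and Hadoop environment (paths assumed from this setup)
export JAVA_HOME=/usr/java
export HADOOP_HOME=/usr/local/hadoop
export PATH=$PATH:$JAVA_HOME/bin:$HADOOP_HOME/bin
```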
HADOOP CONFIGURATION
Here we begin the Hadoop configuration. All the configuration files are in the /usr/local/hadoop/conf directory.
NOTE: All configuration has to be done only on the namenode, because the global zone's "/usr/local/hadoop" directory is already shared and mounted in all the guest zones.
Hadoop Environment - hadoop-env.sh
huser@nn:~$ vi /usr/local/hadoop/conf/hadoop-env.sh
export JAVA_HOME=/usr/java
export HADOOP_LOG_DIR=/var/log/hadoop-log
Create the "hadoop-log" directory specified in the HADOOP_LOG_DIR variable in hadoop-env.sh
huser@nn:~$ sudo mkdir /var/log/hadoop-log
huser@nn:~$ sudo chown -R huser:hadoop /var/log/hadoop-log
NOTE: The /var/log/hadoop-log directory has to be created on all nodes, with the required “huser” user and hadoop group ownership and permissions.
Default Filesystem - core-site.xml
The core-site.xml file points the datanodes at the namenode and tells them which port to connect to.
huser@nn:~$ vi /usr/local/hadoop/conf/core-site.xml
NOTE: We can use either the IP address or the hostname of the namenode, and any free port number. 9000 is commonly used, but I have chosen 10001. The “/tmp” directory has to be created only on the namenode global zone.
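The file contents are not reproduced in the text; given the hostname nn and port 10001 from the note above, a minimal core-site.xml for Hadoop 1.x would look roughly like this (the hadoop.tmp.dir value is an assumption based on the "/tmp" remark):

```xml
<?xml version="1.0"?>
<configuration>
  <!-- URI of the namenode; datanodes and clients connect here -->
  <property>
    <name>fs.default.name</name>
    <value>hdfs://nn:10001</value>
  </property>
  <!-- Base for Hadoop's temporary files (assumed from the note above) -->
  <property>
    <name>hadoop.tmp.dir</name>
    <value>/tmp</value>
  </property>
</configuration>
```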
MapReduce Framework – mapred-site.xml
This file points all the task-trackers at the job-tracker. The mapred.job.tracker parameter sets the hostname or IP address and port of the job-tracker.
huser@nn:~$ vi /usr/local/hadoop/conf/mapred-site.xml
NOTE:
Port number 10002 is my choice, it is not mandatory to use the same
port number.
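With the job-tracker on nn and port 10002 as chosen above, a minimal mapred-site.xml would look roughly like this:

```xml
<?xml version="1.0"?>
<configuration>
  <!-- Host and port of the job-tracker; task-trackers connect here -->
  <property>
    <name>mapred.job.tracker</name>
    <value>nn:10002</value>
  </property>
</configuration>
```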
HDFS Configuration – hdfs-site.xml
The hdfs-site.xml file contains various parameters that configure HDFS.
huser@nn:~$ vi /usr/local/hadoop/conf/hdfs-site.xml
As specified in hdfs-site.xml, we need to create the "hdfs" and "name" directories on the namenode with the appropriate “huser” user and group ownership.
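The parameter values are not shown in the text; given the directories created below and the two datanodes in this setup, a minimal hdfs-site.xml might look like this (the replication factor of 2 is an assumption, one replica per datanode):

```xml
<?xml version="1.0"?>
<configuration>
  <!-- Where the namenode stores the filesystem image and edit log -->
  <property>
    <name>dfs.name.dir</name>
    <value>/hdfs/name</value>
  </property>
  <!-- Where each datanode stores HDFS blocks -->
  <property>
    <name>dfs.data.dir</name>
    <value>/hdfs/data</value>
  </property>
  <!-- Number of block replicas (assumed: one per datanode) -->
  <property>
    <name>dfs.replication</name>
    <value>2</value>
  </property>
</configuration>
```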
Namenode Configuration
huser@nn:~$ sudo mkdir -p /hdfs/name
huser@nn:~$ sudo chown -R huser:hadoop /hdfs
All Datanodes Configuration
huser@dn1:~$ sudo mkdir -p /hdfs/data
huser@dn1:~$ sudo chown -R huser:hadoop /hdfs
NOTE:
Repeat the datanodes configuration steps in all datanode zones.
conf/masters
The masters file tells Hadoop where the secondary namenode daemon should start.
huser@nn:~$ vi /usr/local/hadoop/conf/masters
nn2
NOTE: We have configured a separate machine for the secondary namenode, hence we define it explicitly here.
conf/slaves
The slaves file contains the list of datanodes, where the datanode and task-tracker daemons will run.
huser@nn:~$ vi /usr/local/hadoop/conf/slaves
dn1
dn2
NOTE: The file should contain one entry per line. We can also add the namenode and secondary namenode hostnames here if we want the datanode and task-tracker daemons to run on those machines too.
STARTING HADOOP CLUSTER
Formatting HDFS via the namenode
Before starting the cluster, we format HDFS via the namenode. Formatting initializes the directory specified by the dfs.name.dir parameter in hdfs-site.xml; after formatting, the "current", "image" and "previous.checkpoint" directories are created under it on the namenode.
huser@nn:~$ hadoop namenode -format
NOTE: We need to format the namenode only the first time we set up the cluster; formatting a running cluster will destroy all existing data. The format command prompts for confirmation, where we must type an uppercase "Y" as the prompt is case-sensitive.
Starting the Multi-Node Hadoop Cluster
The whole cluster can be started with the single command below.
huser@nn:~$ start-all.sh
This command first starts the HDFS daemons: the namenode daemon on the namenode, the datanode daemon on all datanodes, and the secondary namenode daemon on the secondary namenode. In the second phase it starts the MapReduce daemons: the job-tracker on the namenode and the task-trackers on the datanodes.
Execute the jps command on every node to see the running Java processes:
"NameNode" and "JobTracker" on the namenode,
"SecondaryNameNode" in the secondary namenode zone, and
"DataNode" and "TaskTracker" in the datanode zones.
MONITORING HADOOP
Both HDFS and MapReduce provide management web UIs for browsing HDFS and monitoring logs and jobs.
HDFS: http://nn:50070
The DFSHealth page lets you browse HDFS, read the namenode logs, view the nodes, check the space used, and more.
MapReduce Job-Tracker: http://nn:50030
The job-tracker page shows running map and reduce tasks, running and completed jobs, and much more.
This completes the Multi-Node Hadoop Cluster on Oracle Solaris 11 Zones.
Enjoy!!!
Follow the below links to create:
Single-Node Hadoop Cluster on Ubuntu 14.04
Multi-Node Hadoop Cluster on Ubuntu 14.04