Voozh

Intro

This project allows running HDFS on Mesos. You should be familiar with HDFS and Mesos basics:

Project requires:

Mesos 0.23.0+
JDK 1.7.x
Hadoop 1.2.x or 2.7.x

Mesos in Vagrant

Project includes vagrant environment, that allows to run Mesos cluster locally. If you are going to use external Mesos cluster, you can skip this section.

1. Start vagrant nodes:

# cd hdfs-mesos/vagrant
# vagrant up

It creates mesos master and slave nodes.

2. Add vagrant node names to /etc/hosts

Now Mesos in vagrant should be running. You can proceed with starting scheduler. For more details about vagrant environment please read vagrant/README.md

Running Scheduler

1. Download hdfs-mesos\*.jar OR clone & build the project:

Download jar:

# mkdir hdfs-mesos
# cd hdfs-mesos
# wget https://github.com/elodina/hdfs-mesos/releases/download/0.0.1.0/hdfs-mesos-0.0.1.0.jar

OR clone & build:

# git clone https://github.com/elodina/hdfs-mesos.git
# cd hdfs-mesos
# ./gradlew jar

2. Download hadoop tarball:

# wget https://archive.apache.org/dist/hadoop/core/hadoop-2.7.2/hadoop-2.7.2.tar.gz

3. Start scheduler:

# ./hdfs-mesos.sh scheduler --api=http://$scheduler:7000 --master=zk://$master:2181/mesos --user=vagrant
2016-03-18 15:04:48,785 [main] INFO hdfs.Scheduler - Starting Scheduler:
api: http://$scheduler:7000
files: jar:./hdfs-mesos-0.0.1.0.jar, hadoop:./hadoop-1.2.1.tar.gz
mesos: master:master:5050, user:vagrant, principal:<none>, secret:<none>
framework: name:hdfs, role:*, timeout:30d
2016-03-18 15:04:48,916 [main] INFO hdfs.HttpServer - started on port 7000
I0318 15:04:49.008314 19123 sched.cpp:164] Version: 0.25.0
I0318 15:04:49.017160 19155 sched.cpp:262] New master detected at master@192.168.3.5:5050
I0318 15:04:49.019287 19155 sched.cpp:272] No credentials provided. Attempting to register without authentication
I0318 15:04:49.029218 19155 sched.cpp:641] Framework registered with 20160310-141004-84125888-5050-10895-0006
2016-03-18 15:04:49,044 [Thread-17] INFO hdfs.Scheduler - [registered] framework:#-0006 master:#326bb pid:master@192.168.3.5:5050 hostname:master
2016-03-18 15:04:49,078 [Thread-18] INFO hdfs.Scheduler - [resourceOffers]
slave0#-O761 cpus:1.00; mem:2500.00; disk:35164.00; ports:[5000..32000]
master#-O762 cpus:1.00; mem:2500.00; disk:35164.00; ports:[5000..32000]
...
2016-03-18 15:04:49,078 [Thread-18] INFO hdfs.Scheduler - [resourceOffers]

where:

$scheduler is scheduler address accessible from slave nodes;
$master master address accessible from scheduler node;

Scheduler should register itself and start receiving resource offers. If scheduler is not receiving offers it could be required to specify LIBPROCESS_IP:

# export LIBPROCESS_IP=$scheduler_ip

Now scheduler should be running and you can proceed with starting HDFS nodes.

Running HDFS Cluster

Project provides CLI & REST API for managing HDFS nodes. We will focus first on CLI.

1. Add namenode & datanode:

# ./hdfs-mesos.sh node add nn --type=namenode
node added:
 id: nn
 type: namenode
 state: idle
 resources: cpus:0.5, mem:512

# ./hdfs-mesos.sh node add dn0 --type=datanode
node added:
 id: dn0
 type: datanode
 state: idle
 resources: cpus:0.5, mem:512

2. Start nodes:

# ./hdfs-mesos.sh node start \*
nodes started:
 id: nn
 type: namenode
 state: running
 resources: cpus:0.5, mem:512
 reservation: cpus:0.5, mem:512, ports:http=5000,ipc=5001
 runtime:
 task: 383aaab9-982b-400e-aa35-463e66cdcb3b
 executor: 19065e07-a006-49a4-8f2b-636d8b1f2ad6
 slave: 241be3a2-39bc-417c-a967-82b4018a0762-S0 (master)

 id: dn0
 type: datanode
 state: running
 resources: cpus:0.5, mem:512
 reservation: cpus:0.5, mem:512, ports:http=5002,ipc=5003,data=5004
 runtime:
 task: 37f3bcbb-10a5-4323-96d2-aef8846aa281
 executor: 088463c9-5f2e-4d1d-8195-56427168b86f
 slave: 241be3a2-39bc-417c-a967-82b4018a0762-S0 (master)

Nodes are up & running now.

Note: starting may take some time. You can view the progress via Mesos UI.

3. Do some FS operations:

# hadoop fs -mkdir hdfs://master:5001/dir
# hadoop fs -ls hdfs://master:5001/
Found 1 items
drwxr-xr-x - vagrant supergroup 0 2016-03-17 12:46 /dir

Note: namenode host and ipc port is used in fs url.

Using CLI

Project provides CLI with following structure:

# ./hdfs-mesos.sh help
Usage: <cmd> ...

Commands:
 help [cmd [cmd]] - print general or command-specific help
 scheduler - start scheduler
 node - node management

Help is provided for each command and sub-command:

# ./hdfs-mesos.sh help node
Node management commands
Usage: node <cmd>

Commands:
 list - list nodes
 add - add node
 update - update node
 start - start node
 stop - stop node
 remove - remove node

Run `help node <cmd>` to see details of specific command

# ./hdfs-mesos.sh help node add
Add node
Usage: node add <ids> [options]

Option (* = required) Description
--------------------- -----------
--core-site-opts Hadoop core-site.xml options.
--cpus <Double> CPU amount (0.5, 1, 2).
--executor-jvm-opts Executor JVM options.
--hadoop-jvm-opts Hadoop JVM options.
--hdfs-site-opts Hadoop hdfs-site.xml options.
--mem <Long> Mem amount in Mb.
* --type node type (name_node, data_node).

Generic Options
Option Description
------ -----------
--api REST api url (same as --api option for
 scheduler).

All node-related commands support bulk operations using node-id-expressions. Examples:

# ./hdfs-mesos.sh node add dn0..1 --type=datanode
nodes added:
 id: dn0
 type: datanode
 ...

 id: dn1
 type: datanode
 ...

# ./hdfs-mesos.sh node update dn* --cpus=1
nodes updated:
 id: dn0
 ...
 resources: cpus:1.0, mem:512

 id: dn1
 ...
 resources: cpus:1.0, mem:512

# ./hdfs-mesos.sh node start dn0,dn1
nodes started:
 id: dn0
 ...

 id: dn0
 ...

Id expression examples:

nn – matches node with id nn
* – matches any node (should be slash-escaped in shell)
dn* – matches node with id starting with dn
dn0..2 – matches nodes dn0, dn1, dn2

Using REST

Scheduler uses embedded HTTP server. Server serves two functions:

distributing binaries of Hadoop, JRE and executor;
serving REST API, invoked by CLI;

Most CLI commands map to REST API call. Examples:

CLI command	REST call
`node add nn --type=namenode --cpus=2`	`/api/node/add?node=nn&type=namenode&cpus=2`
`node start dn* --timeout=3m-`	`/api/node/start?node=dn*&timeout=3m`
`node remove dn5`	`/api/node/remove?node=dn5`

REST calls accepts plain HTTP params and return JSON responses. Examples:

# curl http://$scheduler:7000/api/node/list
[
 {
 "id": "nn",
 "type": "namenode",
 ...
 },
 {
 "id": "dn0",
 "type": "datanode",
 ...
 }
]

# curl http://$scheduler:7000/api/node/start?node=nn,dn0
{
 "status": "started",
 "nodes": [
 {
 "id": "nn",
 "state": "running",
 ...
 },
 {
 "id": "dn0",
 "state": "running",
 ...
 }
 ]
}

CLI params maps one-to-one to REST params. CLI params use dashed style while REST params use camel-case. Example of mappings:

CLI param	REST param
`<id>` (node add\|update\|…)	`node`
`timeout` (node start\|stop)	`timeout`
`core-site-opts` (node add\|update)	`coreSiteOpts`
`executor-jvm-opts` (node add\|update)	`executorJvmOpts`

REST API call could return error in some cases. Errors are marked with status code other than 200. Error response is returned in JSON format.

Example:

# curl -v http://192.168.3.1:7000/api/node/start?node=unknown
...
HTTP/1.1 400 node not found
...
{"error":"node not found","code":400}

For more detail on REST API please refer to sources.

Reference:

Apache Hadoop HDFS Data Node Apache Mesos Framework from our JCG partner Joe Stein at the All Things Hadoop blog.

Do you want to know how to develop your skillset to become a Java Rockstar?

Subscribe to our newsletter to start Rocking right now!

To get you started we give you our best selling eBooks for FREE!

1. JPA Mini Book

2. JVM Troubleshooting Guide

3. JUnit Tutorial for Unit Testing

4. Java Annotations Tutorial

5. Java Interview Questions

6. Spring Interview Questions

7. Android UI Design

and many more ....

I agree to the Terms and Privacy Policy

👁 Image

Thank you!

We will contact you soon.

URL: https://www.javacodegeeks.com/2016/06/apache-hadoop-hdfs-data-node-apache-mesos-framework.html

⇱ Apache Hadoop HDFS Data Node Apache Mesos Framework - Java Code Geeks

Intro

Mesos in Vagrant

Running Scheduler

Running HDFS Cluster

Using CLI

Using REST

Thank you!

Joe Stein

Related Articles

Simple REST client in Java

Spring Boot Error – Error creating a bean with name ‘dataSource’ defined in class path resource DataSourceAutoConfiguration

How to fix Exception in thread “main” java.lang.NoClassDefFoundError: org/slf4j/LoggerFactory in Java

Mockito: Cannot instantiate @InjectMocks field: the type is an interface

100 Java Spring Interview Questions & Answers – The ULTIMATE List (PDF Download)

Spring Boot Remove Embedded Tomcat Server, Enable Jetty Server

What is SecurityContext and SecurityContextHolder in Spring Security?

How to install Apache Web Server on EC2 Instance using User data script