Getting hadoop to run on the Raspberry Pi

Hadoop was implemented on Java, so getting it to run on the Pi is just as easy as doing so on x86 servers. First of all, we need JVM for pi. You can either get OpenJDK or Oracle’s JDK 8 for ARM Early Access. I would personally recommended JDK8 as it is **just a little slightly* faster than OpenJDK, which is easier to install.

1. Install Java

Installing OpenJDK is easy, just do and wait

pi@raspberrypi ~ $ sudo apt-get install openjdk-7-jdk
pi@raspberrypi ~ $ java -version
java version "1.7.0_07"
OpenJDK Runtime Environment (IcedTea7 2.3.2) (7u7-2.3.2a-1+rpi1)
OpenJDK Zero VM (build 22.0-b10, mixed mode)

Alternatively, you can install Oracle’s JDK 8 for ARM Early Access (some said it was optimized for Pi).
First get it from here: https://jdk8.java.net/fxarmpreview/index.html

pi@raspberrypi ~ $ sudo tar zxvf jdk-8-ea-b36e-linux-arm-hflt-*.tar.gz -C /opt
pi@raspberrypi ~ $ sudo update-alternatives --install "/usr/bin/java" 
"java" "/opt/jdk1.8.0/bin/java" 1 
pi@raspberrypi ~ $ java -version
java version "1.8.0-ea"
Java(TM) SE Runtime Environment (build 1.8.0-ea-b36e)
Java HotSpot(TM) Client VM (build 25.0-b04, mixed mode)

If you have both versions installed, you can use switch between them with

sudo update-alternatives --config java

2. Create a hadoop system user

pi@raspberrypi ~ $ sudo addgroup hadoop
pi@raspberrypi ~ $ sudo adduser --ingroup hadoop hduser
pi@raspberrypi ~ $ sudo adduser hduser sudo

3. Setup SSH

pi@raspberrypi ~ $ su - hduser
hduser@raspberrypi ~ $ ssh-keygen -t rsa -P ""

This will create an RSA key pair with an empty password. It is done so to stop Hadoop prompting for the passphrase when in talks to its nodes

hduser@raspberrypi ~ $ cat ~/.ssh/id_rsa.pub >> ~/.ssh/authorized_keys

Now SSH access to your local machine is enabled with this newly created key

hduser@raspberrypi ~ $ ssh localhost

You should be good to login without password

4. Download (install?) Hadoop
Download hadoop from http://www.apache.org/dyn/closer.cgi/hadoop/core

hduser@raspberrypi ~ $ wget http://mirror.catn.com/pub/apache/hadoop/core/hadoop-1.1.2/hadoop-1.1.2.tar.gz
hduser@raspberrypi ~ $ sudo tar vxzf hadoop-1.1.2.tar.gz -C /usr/local
hduser@raspberrypi ~ $ cd /usr/local
hduser@raspberrypi /usr/local $ sudo mv hadoop-1.1.2 hadoop
hduser@raspberrypi /usr/local $ sudo chown -R hduser:hadoop hadoop

Now hadoop has been installed and ready to roll (not yet). Edit .bashrc under your home, and append the following lines

export JAVA_HOME=/usr/lib/jvm/java-6-openjdk-armhf
export HADOOP_INSTALL=/usr/local/hadoop
export PATH=$PATH:$HADOOP_INSTALL/bin

modify JAVA_HOME accordingly if you use oracle’s version.

Reboot Pi and verify the installation:

hduser@raspberrypi ~ $ hadoop version
Hadoop 1.1.2
Subversion https://svn.apache.org/repos/asf/hadoop/common/branches/
branch-1.1 -r 1440782
Compiled by hortonfo on Thu Jan 31 02:03:24 UTC 2013
From source with checksum c720ddcf4b926991de7467d253a79b8b

5. Configure Hadoop
NOTE: this how-to is just a minimal configuration for single-node mode hadoop

configuration files are at "/usr/local/hadoop/conf/", and will need to 
edit core-site.xml, hdfs-site.xml, mapred-site.xml

core-site.xml

<configuration>
  <property>
    <name>hadoop.tmp.dir</name>
    <value>/fs/hadoop/tmp</value>
  </property>
  <property>
    <name>fs.default.name</name>
    <value>hdfs://localhost:54310</value>
  </property>
</configuration>

mapred-site.xml

<configuration>
  <property>
    <name>mapred.job.tracker</name>
    <value>localhost:54311</value>
  </property>
</configuration>

hdfs-site.xml

<configuration>
  <property>
    <name>dfs.replication</name>
    <value>1</value>
  </property>
</configuration>

OK, we’re almost done, one last step.

hduser@raspberrypi ~ $ sudo mkdir -p /fs/hadoop/tmp
hduser@raspberrypi ~ $ sudo chown hduser:hadoop /fs/hadoop/tmp
hduser@raspberrypi ~ $ sudo chmod 750 /fs/hadoop/tmp
hduser@raspberrypi ~ $ hadoop namenode -format

ATTENTION:

If you use JDK 8 for hadoop, you need to force DataNode to run in JVM client mode as JDK 8 does not support server yet. Go to /usr/local/hadoop/bin and edit hadoop file (please create a backup first). Assuming you’re using nano, the procedure is as follows.  nano hadoop, ctrl-w to search for “-server” argument. What you need is to delete “-server” and then save & exit.

Now hadoop single-node system is ready. Below are some useful commands.

1. jps           // will report the local VM identifier
2. start-all.sh  // will start all hadoop processes
3. stop-all.sh   // will stop all hadoop processes

 

References:

[1] http://raspberrypi.stackexchange.com/questions/4683/how-to-install-java-jdk-on-raspberry-pi
[2] http://www.michael-noll.com/tutorials/running-hadoop-on-ubuntu-linux-single-node-cluster/