Hadoop is implemented in Java, so getting it to run on the Pi is just as easy as doing so on x86 servers. First of all, we need a JVM for the Pi. You can either get OpenJDK or Oracle's JDK 8 for ARM Early Access. I would personally recommend JDK 8, as it is slightly faster than OpenJDK, though OpenJDK is easier to install.
1. Install Java
Installing OpenJDK is easy: just run the following and wait.
pi@raspberrypi ~ $ sudo apt-get install openjdk-7-jdk
pi@raspberrypi ~ $ java -version
java version "1.7.0_07"
OpenJDK Runtime Environment (IcedTea7 2.3.2) (7u7-2.3.2a-1+rpi1)
OpenJDK Zero VM (build 22.0-b10, mixed mode)
Alternatively, you can install Oracle’s JDK 8 for ARM Early Access (some said it was optimized for Pi).
First get it from here: https://jdk8.java.net/fxarmpreview/index.html
pi@raspberrypi ~ $ sudo tar zxvf jdk-8-ea-b36e-linux-arm-hflt-*.tar.gz -C /opt
pi@raspberrypi ~ $ sudo update-alternatives --install "/usr/bin/java" "java" "/opt/jdk1.8.0/bin/java" 1
pi@raspberrypi ~ $ java -version
java version "1.8.0-ea"
Java(TM) SE Runtime Environment (build 1.8.0-ea-b36e)
Java HotSpot(TM) Client VM (build 25.0-b04, mixed mode)
If you have both versions installed, you can switch between them with
sudo update-alternatives --config java
2. Create a hadoop system user
pi@raspberrypi ~ $ sudo addgroup hadoop
pi@raspberrypi ~ $ sudo adduser --ingroup hadoop hduser
pi@raspberrypi ~ $ sudo adduser hduser sudo
3. Setup SSH
pi@raspberrypi ~ $ su - hduser
hduser@raspberrypi ~ $ ssh-keygen -t rsa -P ""
This creates an RSA key pair with an empty passphrase, so that Hadoop does not prompt for a passphrase when it talks to its nodes.
hduser@raspberrypi ~ $ cat ~/.ssh/id_rsa.pub >> ~/.ssh/authorized_keys
SSH access to your local machine is now enabled with this newly created key.
hduser@raspberrypi ~ $ ssh localhost
You should be able to log in without a password.
4. Install Hadoop
Download hadoop from http://www.apache.org/dyn/closer.cgi/hadoop/core
hduser@raspberrypi ~ $ wget http://mirror.catn.com/pub/apache/hadoop/core/hadoop-1.1.2/hadoop-1.1.2.tar.gz
hduser@raspberrypi ~ $ sudo tar vxzf hadoop-1.1.2.tar.gz -C /usr/local
hduser@raspberrypi ~ $ cd /usr/local
hduser@raspberrypi /usr/local $ sudo mv hadoop-1.1.2 hadoop
hduser@raspberrypi /usr/local $ sudo chown -R hduser:hadoop hadoop
Hadoop is now installed, but not quite ready to roll. Edit .bashrc in your home directory and append the following lines:
export JAVA_HOME=/usr/lib/jvm/java-7-openjdk-armhf
export HADOOP_INSTALL=/usr/local/hadoop
export PATH=$PATH:$HADOOP_INSTALL/bin
Modify JAVA_HOME accordingly (e.g. /opt/jdk1.8.0) if you use Oracle's version.
Reboot Pi and verify the installation:
hduser@raspberrypi ~ $ hadoop version
Hadoop 1.1.2
Subversion https://svn.apache.org/repos/asf/hadoop/common/branches/
branch-1.1 -r 1440782
Compiled by hortonfo on Thu Jan 31 02:03:24 UTC 2013
From source with checksum c720ddcf4b926991de7467d253a79b8b
5. Configure Hadoop
NOTE: this how-to covers only a minimal configuration for single-node Hadoop.
The configuration files live in /usr/local/hadoop/conf/, and you will need to edit core-site.xml, hdfs-site.xml, and mapred-site.xml.
core-site.xml
<configuration>
<property>
<name>hadoop.tmp.dir</name>
<value>/fs/hadoop/tmp</value>
</property>
<property>
<name>fs.default.name</name>
<value>hdfs://localhost:54310</value>
</property>
</configuration>
mapred-site.xml
<configuration>
<property>
<name>mapred.job.tracker</name>
<value>localhost:54311</value>
</property>
</configuration>
hdfs-site.xml
<configuration>
<property>
<name>dfs.replication</name>
<value>1</value>
</property>
</configuration>
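If you would rather not hand-edit the XML, the three files above can be generated with a short script. This is just a sketch: the property names and values are taken verbatim from the snippets above, while `write_conf` and the default output directory are hypothetical helpers of mine; point it at /usr/local/hadoop/conf on the Pi.

```python
# Sketch: generate the three minimal single-node config files shown above.
# CONF_DIR is an assumption -- use /usr/local/hadoop/conf on the Pi.
import os
import xml.etree.ElementTree as ET

CONF_DIR = "conf"

# Property names/values copied from the tutorial's XML snippets.
SETTINGS = {
    "core-site.xml": {
        "hadoop.tmp.dir": "/fs/hadoop/tmp",
        "fs.default.name": "hdfs://localhost:54310",
    },
    "mapred-site.xml": {"mapred.job.tracker": "localhost:54311"},
    "hdfs-site.xml": {"dfs.replication": "1"},
}

def write_conf(conf_dir=CONF_DIR, settings=SETTINGS):
    """Write each file as a <configuration> of <property> name/value pairs."""
    os.makedirs(conf_dir, exist_ok=True)
    for filename, props in settings.items():
        root = ET.Element("configuration")
        for name, value in props.items():
            prop = ET.SubElement(root, "property")
            ET.SubElement(prop, "name").text = name
            ET.SubElement(prop, "value").text = value
        ET.ElementTree(root).write(os.path.join(conf_dir, filename))

if __name__ == "__main__":
    write_conf()
```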
OK, we're almost done; just one last step.
hduser@raspberrypi ~ $ sudo mkdir -p /fs/hadoop/tmp
hduser@raspberrypi ~ $ sudo chown hduser:hadoop /fs/hadoop/tmp
hduser@raspberrypi ~ $ sudo chmod 750 /fs/hadoop/tmp
hduser@raspberrypi ~ $ hadoop namenode -format
ATTENTION:
If you use JDK 8 for Hadoop, you need to force the DataNode to run in JVM client mode, as the JDK 8 Early Access build for ARM does not include a server VM. Go to /usr/local/hadoop/bin and edit the hadoop script (create a backup first). Assuming you're using nano: open the file with nano hadoop, press Ctrl-W to search for the "-server" argument, delete "-server", then save and exit.
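The manual nano edit above can also be scripted. A minimal sketch, assuming the launcher lives at /usr/local/hadoop/bin/hadoop as in this tutorial; `strip_server_flag` is my own hypothetical helper (after the chown in step 4, hduser owns the file, so no sudo is needed):

```python
# Sketch: remove the "-server" JVM flag from the hadoop launcher script,
# mirroring the manual nano edit described above.
import shutil

SCRIPT = "/usr/local/hadoop/bin/hadoop"  # path assumed from this tutorial

def strip_server_flag(path=SCRIPT):
    """Back up the script, then delete every occurrence of "-server"."""
    shutil.copy(path, path + ".bak")  # keep a backup, as advised above
    with open(path) as f:
        text = f.read()
    with open(path, "w") as f:
        f.write(text.replace("-server", ""))

if __name__ == "__main__":
    strip_server_flag()
```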
The Hadoop single-node system is now ready. Below are some useful commands.
1. jps // will report the local VM identifier
2. start-all.sh // will start all hadoop processes
3. stop-all.sh // will stop all hadoop processes
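To sanity-check the new node, Hadoop 1.x ships a Streaming jar (contrib/streaming/hadoop-streaming-1.1.2.jar under the install directory; path is an assumption, check your install) that runs mapper/reducer scripts, which is the approach taken in reference [2]. Below is a word-count sketch of my own; `mapper` and `reducer` are hypothetical names, and the file is runnable locally (e.g. cat somefile | python wordcount.py) to see what a job would produce. For a real Streaming job you would split the mapper and reducer into separate scripts.

```python
#!/usr/bin/env python
# Sketch: word count in the Hadoop Streaming style (see reference [2]).
# Runnable locally as one pipeline to preview a job's output.
import sys
from itertools import groupby

def mapper(lines):
    """Emit (word, 1) for every word, as a Streaming mapper would on stdin."""
    for line in lines:
        for word in line.split():
            yield word, 1

def reducer(pairs):
    """Sum counts per word; input is sorted first (Hadoop sorts between phases)."""
    for word, group in groupby(sorted(pairs), key=lambda kv: kv[0]):
        yield word, sum(count for _, count in group)

if __name__ == "__main__":
    for word, count in reducer(mapper(sys.stdin)):
        print("%s\t%d" % (word, count))
```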
References:
[1] http://raspberrypi.stackexchange.com/questions/4683/how-to-install-java-jdk-on-raspberry-pi
[2] http://www.michael-noll.com/tutorials/running-hadoop-on-ubuntu-linux-single-node-cluster/