Getting hadoop to run on the Raspberry Pi

Hadoop is written in Java, so getting it to run on the Pi is about as easy as doing so on x86 servers. First of all, we need a JVM for the Pi. You can use either OpenJDK or Oracle’s JDK 8 for ARM Early Access. I would personally recommend JDK 8, as it is slightly faster than OpenJDK, although OpenJDK is easier to install.

1. Install Java

Installing OpenJDK is easy: run the following and wait for it to finish.

pi@raspberrypi ~ $ sudo apt-get install openjdk-7-jdk
pi@raspberrypi ~ $ java -version
java version "1.7.0_07"
OpenJDK Runtime Environment (IcedTea7 2.3.2) (7u7-2.3.2a-1+rpi1)
OpenJDK Zero VM (build 22.0-b10, mixed mode)

Alternatively, you can install Oracle’s JDK 8 for ARM Early Access (some say it is optimized for the Pi).
First download it from https://jdk8.java.net/fxarmpreview/index.html, then unpack and register it:

pi@raspberrypi ~ $ sudo tar zxvf jdk-8-ea-b36e-linux-arm-hflt-*.tar.gz -C /opt
pi@raspberrypi ~ $ sudo update-alternatives --install "/usr/bin/java" "java" "/opt/jdk1.8.0/bin/java" 1
pi@raspberrypi ~ $ java -version
java version "1.8.0-ea"
Java(TM) SE Runtime Environment (build 1.8.0-ea-b36e)
Java HotSpot(TM) Client VM (build 25.0-b04, mixed mode)

If you have both versions installed, you can switch between them with

sudo update-alternatives --config java

2. Create a hadoop system user

pi@raspberrypi ~ $ sudo addgroup hadoop
pi@raspberrypi ~ $ sudo adduser --ingroup hadoop hduser
pi@raspberrypi ~ $ sudo adduser hduser sudo

3. Set up SSH

pi@raspberrypi ~ $ su - hduser
hduser@raspberrypi ~ $ ssh-keygen -t rsa -P ""

This creates an RSA key pair with an empty passphrase, so that Hadoop is not prompted for a passphrase every time it talks to its nodes.

hduser@raspberrypi ~ $ cat ~/.ssh/id_rsa.pub >> ~/.ssh/authorized_keys

SSH access to your local machine is now enabled with this newly created key.

hduser@raspberrypi ~ $ ssh localhost

You should now be able to log in without a password.
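
If ssh localhost still asks for a password, one common cause is loose permissions on the .ssh directory, which sshd rejects when its StrictModes option is on. The following is a minimal sketch of a fix, not part of the original steps; fix_ssh_perms is a hypothetical helper name, not an OpenSSH command.

```shell
# Hypothetical helper (not part of OpenSSH): tighten permissions on an
# SSH directory so that sshd accepts the keys stored in it.
# $1 = SSH directory, defaults to ~/.ssh
fix_ssh_perms() {
  ssh_dir="${1:-$HOME/.ssh}"
  # The directory itself must be accessible by its owner only
  chmod 700 "$ssh_dir" || return 1
  # authorized_keys must not be writable by group or others
  if [ -f "$ssh_dir/authorized_keys" ]; then
    chmod 600 "$ssh_dir/authorized_keys" || return 1
  fi
}
```

Run fix_ssh_perms as hduser, then check with ssh -o BatchMode=yes localhost true; BatchMode makes ssh fail instead of prompting, so a zero exit status means key-based login works.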

4. Download and install Hadoop
Download Hadoop from http://www.apache.org/dyn/closer.cgi/hadoop/core

hduser@raspberrypi ~ $ wget http://mirror.catn.com/pub/apache/hadoop/core/hadoop-1.1.2/hadoop-1.1.2.tar.gz
hduser@raspberrypi ~ $ sudo tar vxzf hadoop-1.1.2.tar.gz -C /usr/local
hduser@raspberrypi ~ $ cd /usr/local
hduser@raspberrypi /usr/local $ sudo mv hadoop-1.1.2 hadoop
hduser@raspberrypi /usr/local $ sudo chown -R hduser:hadoop hadoop

Hadoop is now installed, but not quite ready to roll yet. Edit .bashrc in your home directory and append the following lines:

export JAVA_HOME=/usr/lib/jvm/java-7-openjdk-armhf
export HADOOP_INSTALL=/usr/local/hadoop
export PATH=$PATH:$HADOOP_INSTALL/bin

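Adjust JAVA_HOME to match the JDK you actually installed (list /usr/lib/jvm to see what is there); with Oracle’s JDK 8 unpacked as above, it is /opt/jdk1.8.0.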
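
Rather than guessing the directory, JAVA_HOME can be derived from whichever java binary is currently selected. This is only a sketch: java_home_from_bin is a hypothetical helper, and it assumes the binary lives under .../bin/java or .../jre/bin/java, which is the usual layout but not guaranteed.

```shell
# Hypothetical helper: derive a JAVA_HOME candidate from the full path
# of a java binary by stripping the trailing /bin/java or /jre/bin/java.
# $1 = resolved (symlink-free) path of the java binary
java_home_from_bin() {
  case "$1" in
    */jre/bin/java) printf '%s\n' "${1%/jre/bin/java}" ;;
    */bin/java)     printf '%s\n' "${1%/bin/java}" ;;
    *)              return 1 ;;  # unexpected layout: print nothing
  esac
}
```

A possible use in .bashrc is export JAVA_HOME="$(java_home_from_bin "$(readlink -f "$(command -v java)")")", where readlink -f follows the update-alternatives symlinks to the real binary.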

Reboot the Pi and verify the installation:

hduser@raspberrypi ~ $ hadoop version
Hadoop 1.1.2
Subversion https://svn.apache.org/repos/asf/hadoop/common/branches/branch-1.1 -r 1440782
Compiled by hortonfo on Thu Jan 31 02:03:24 UTC 2013
From source with checksum c720ddcf4b926991de7467d253a79b8b

5. Configure Hadoop
NOTE: this how-to covers only a minimal configuration for a single-node Hadoop setup.

The configuration files are in "/usr/local/hadoop/conf/"; you will need to edit core-site.xml, hdfs-site.xml and mapred-site.xml.

core-site.xml

<configuration>
  <property>
    <name>hadoop.tmp.dir</name>
    <value>/fs/hadoop/tmp</value>
  </property>
  <property>
    <name>fs.default.name</name>
    <value>hdfs://localhost:54310</value>
  </property>
</configuration>

mapred-site.xml

<configuration>
  <property>
    <name>mapred.job.tracker</name>
    <value>localhost:54311</value>
  </property>
</configuration>

hdfs-site.xml

<configuration>
  <property>
    <name>dfs.replication</name>
    <value>1</value>
  </property>
</configuration>
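
A typo in any of these three files tends to show up only later as a daemon that refuses to start, so it can help to read the values back first. The sketch below assumes each <name> and <value> sits on its own line, as in the listings above; get_prop is a hypothetical helper, not a Hadoop tool, and it is not a real XML parser.

```shell
# Hypothetical helper: print the <value> of a named property from a
# Hadoop *-site.xml file. Assumes <name> and <value> are on separate
# lines, with the <name> line preceding its <value> line.
# $1 = config file, $2 = property name
get_prop() {
  awk -v name="$2" '
    # Remember the most recent property name seen
    /<name>/ { gsub(/.*<name>|<\/name>.*/, ""); current = $0 }
    # Print the value that follows the requested name, then stop
    /<value>/ && current == name { gsub(/.*<value>|<\/value>.*/, ""); print; exit }
  ' "$1"
}
```

For example, get_prop /usr/local/hadoop/conf/core-site.xml fs.default.name should echo back the HDFS URI you entered; empty output means the property was not found.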

OK, we’re almost done. One last step: create the temporary directory configured in core-site.xml and format the HDFS NameNode.

hduser@raspberrypi ~ $ sudo mkdir -p /fs/hadoop/tmp
hduser@raspberrypi ~ $ sudo chown hduser:hadoop /fs/hadoop/tmp
hduser@raspberrypi ~ $ sudo chmod 750 /fs/hadoop/tmp
hduser@raspberrypi ~ $ hadoop namenode -format

ATTENTION:

If you use JDK 8 for Hadoop, you need to force the DataNode to run in JVM client mode, as this JDK 8 early-access build does not support the server VM yet. Go to /usr/local/hadoop/bin and edit the hadoop file (please create a backup first). Assuming you’re using nano, the procedure is as follows: run nano hadoop, press ctrl-w and search for the “-server” argument, delete “-server”, then save and exit.
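
The same edit can be scripted with sed instead of nano. This is a sketch under one assumption: that the flag appears surrounded by spaces, as in the line HADOOP_OPTS="$HADOOP_OPTS -server $HADOOP_DATANODE_OPTS" quoted in the comments below. strip_server_flag is a hypothetical helper name.

```shell
# Hypothetical helper: remove the " -server " JVM flag from Hadoop's
# launcher script, editing the file in place.
# $1 = path to the script (e.g. /usr/local/hadoop/bin/hadoop)
strip_server_flag() {
  # -i.bkp edits in place and keeps the untouched original as <file>.bkp
  sed -i.bkp 's/ -server / /g' "$1"
}
```

Calling strip_server_flag /usr/local/hadoop/bin/hadoop leaves the original next to it as hadoop.bkp, so switching back later is a single copy.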

The Hadoop single-node system is now ready. Below are some useful commands.

1. jps           // will list the running Java processes with their local VM identifiers
2. start-all.sh  // will start all hadoop processes
3. stop-all.sh   // will stop all hadoop processes
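
After start-all.sh, a quick way to see whether everything came up is to compare the jps listing against the five daemons a Hadoop 1.x single node normally runs. The sketch below is an assumption-laden convenience, not part of Hadoop: missing_daemons is a hypothetical helper that reads jps output on stdin.

```shell
# Hypothetical helper: read `jps` output on stdin and print the name of
# every expected Hadoop 1.x daemon that is not listed.
missing_daemons() {
  jps_out="$(cat)"
  for d in NameNode SecondaryNameNode DataNode JobTracker TaskTracker; do
    # jps prints "<pid> <MainClass>", so compare the whole second field;
    # awk exits non-zero when the daemon was not found
    printf '%s\n' "$jps_out" \
      | awk -v d="$d" '$2 == d { found = 1 } END { exit !found }' \
      || printf '%s\n' "$d"
  done
}
```

Run it as jps | missing_daemons; empty output means all five daemons are up. A lone DataNode in the output is the symptom of the JDK 8 “-server” problem described above.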

 

References:

[1] http://raspberrypi.stackexchange.com/questions/4683/how-to-install-java-jdk-on-raspberry-pi
[2] http://www.michael-noll.com/tutorials/running-hadoop-on-ubuntu-linux-single-node-cluster/

8 thoughts on “Getting hadoop to run on the Raspberry Pi”

  1. Hi raspberrypicloud Team,

    I have 5 Pi’s in my Hadoop cluster and I’m having a lot of difficulty fine tuning it. I thought I’d give JDK 8 for ARM Early Access a try however I’ve run into a few problems.

    Previously I’ve used java-6-openjdk-armhf and it’s worked fine, but after switching over to JDK 8 without changing anything else I get this error in my task tracker log:

    2013-05-05 11:40:32,559 ERROR org.apache.hadoop.mapred.TaskTracker: Can not start task tracker because java.lang.IllegalArgumentException
    at java.util.concurrent.LinkedBlockingQueue.<init>(LinkedBlockingQueue.java:259)
    at org.apache.hadoop.ipc.Server.<init>(Server.java:1478)
    at org.apache.hadoop.ipc.RPC$Server.<init>(RPC.java:560)
    at org.apache.hadoop.ipc.RPC.getServer(RPC.java:521)
    at org.apache.hadoop.mapred.TaskTracker.initialize(TaskTracker.java:881)
    at org.apache.hadoop.mapred.TaskTracker.<init>(TaskTracker.java:1565)
    at org.apache.hadoop.mapred.TaskTracker.main(TaskTracker.java:3906)

    and this in my datanode log:

    2013-05-05 11:01:32,912 WARN org.apache.hadoop.hdfs.server.datanode.DataNode: java.io.IOException: Call to rp-master/192.168.0.10:54310 failed on local exception: java.io.EOFException
    at org.apache.hadoop.ipc.Client.wrapException(Client.java:1144)
    at org.apache.hadoop.ipc.Client.call(Client.java:1112)
    at org.apache.hadoop.ipc.RPC$Invoker.invoke(RPC.java:229)
    at sun.proxy.$Proxy5.sendHeartbeat(Unknown Source)
    at org.apache.hadoop.hdfs.server.datanode.DataNode.offerService(DataNode.java:972)
    at org.apache.hadoop.hdfs.server.datanode.DataNode.run(DataNode.java:1527)
    at java.lang.Thread.run(Thread.java:679)
    Caused by: java.io.EOFException
    at java.io.DataInputStream.readInt(DataInputStream.java:392)
    at org.apache.hadoop.ipc.Client$Connection.receiveResponse(Client.java:841)
    at org.apache.hadoop.ipc.Client$Connection.run(Client.java:786)

    Could you share some of your config files you are using in your set up?

    Also are you using compression? Hadoop doesn’t have ARM native libs for any of the compression codecs, that I could find.

    Much appreciated,

    Lewis

    • Arh, apparently an important step is missing.

      jdk8 doesn’t support server mode. Do you see DataNode running after “jps”? Probably not. Now, go to $HADOOP_INSTALL/bin and edit “hadoop”; find and remove “-server” argument. Restart everything and try again.

      • Thank you. Works a treat. Looking forward to the full ARM release of JDK 8. Latest update here: http://www.raspberrypi.org/phpBB3/viewtopic.php?f=81&t=27805&sid=1db4e7272b372a51f9055cffdb137c6f

        Just to clarify for anyone else wanting to get this to work.

        You’ll need to do the following:
        cp /usr/local/hadoop/bin/hadoop /usr/local/hadoop/bin/hadoop.bkp
        nano /usr/local/hadoop/bin/hadoop

        once in nano: ctrl+w “-server”
        this will take you to the line:
        HADOOP_OPTS="$HADOOP_OPTS -server $HADOOP_DATANODE_OPTS"
        you’ll need to change this to
        HADOOP_OPTS="$HADOOP_OPTS $HADOOP_DATANODE_OPTS"

        unfortunately it is not enough to change hadoop-env.sh to ‘export HADOOP_OPTS="-client"’

        You’ll want to make a backup of bin/hadoop, as I believe the next version of JDK 8 will support server mode, and a backup lets you switch back easily. I’m not 100% sure why it doesn’t currently support server mode. The backup also helps if you want to change back to OpenJDK, where server mode is more efficient.

        @raspberrypicloud Team:
        I’d love to see your *-site.xml config files as I’m sure you’ve fine tuned your setup to get the most out of the RPI.

        For example, what have you set for:
        “io.sort.mb”
        “mapred.child.java.opts”
        “mapred.output.compress”
        “mapred.task.timeout”
        “export HADOOP_HEAPSIZE”
        “export HADOOP_OPTS”
        “dfs.image.compress”

        Have you got compression to work with the RPI? About 30% of any reduce job I’ve tried to run has been moving files. Compression will improve performance massively.

        Also, have you tried overclocking?

        I’ve found these settings to work well:
        /boot/config.txt:
        gpu_mem=16
        core_freq=450
        arm_freq=900
        sdram_freq=450
        over_voltage=0

        I’ve heard reports that overclocking beyond this will corrupt the SD card over prolonged use.

        Thanks for your help.

        Regards,

        Lewis

  2. Hi!

    Thank you for this! This is a very good and helpful tutorial. I am building an RPi Hadoop cluster, and I found this very helpful.

    One thing not mentioned here is that you need to set JAVA_HOME in the hadoop-env.sh file in /usr/local/hadoop/conf. If you don’t, you will get the error “localhost: Error: JAVA_HOME is not set.” like I did.

    All in all, I managed to start the Hadoop single node. I hope my multi-node setup will work. Currently I have 2 RPis at hand.

    Also, I initially planned to use Arch Linux on the Pi, but it seems rather difficult to install.
    Let me know if you were able to try it with Arch.

    Thanks and cheers,
    nifrali

  4. The link to get the Hadoop package, “http://mirror.catn.com/pub/apache/hadoop/core/hadoop-1.1.2/hadoop-1.1.2.tar.gz”, is not working.
    That’s because Hadoop 1.1.2 is no longer available there. Instead of this link I used “http://mirror.catn.com/pub/apache/hadoop/core/hadoop-1.2.1/hadoop-1.2.1.tar.gz”.
