Attrition

This week was a hard one. The cutting edge seems like a much more elegant concept when you’re far away from it, knowing that somewhere smart people are chipping away at the unknown. These wise philosophers use measured accuracy and precision, planning every move and quickly reaping the fruits of their intellectual labours. Now that I’ve joined them on this frontier, I can tell you that this is not how things are at all.

This is not to say that all researchers are like me, of course; I'm sure they are all far more capable, intellectual people than I am, with fantastic methodology. That doesn't mean they never run up against problems either, and this week I've had my fair share of them.

It started with Shipyard. Shipyard can be thought of as something of a poster child for Docker: simply download a container and you instantly have a full Docker cluster management suite, without ever having to deal with any of its nasty complexities. However, this assumes that you are running Docker on a 64-bit host. If you are on a Raspberry Pi, you have to compile everything yourself from a minimal armv7 Ubuntu image, not just for the application container but also for the separate RethinkDB container, Shipyard's data store. I managed to get Shipyard working, but its RethinkDB component keeps running into compilation errors despite its supposed ARM support. I struggled against this for a while, but rather than get bogged down hitting my head against a brick wall, I decided to move on to Kubernetes.

Kubernetes is a hotly anticipated management tool, developed by Google and drawing on their experience with their own internal Borg infrastructure. It handles the scheduling of a compute cluster using labels and pods, and is currently in pre-production beta. Unsurprisingly, the developers at Google haven't made single-board 32-bit microcomputers one of their key targets, so compiling Kubernetes brings its own host of problems.

Two pieces of advice I would give to anyone attempting something like this: make sure you have at least a 1 GB swapfile, and be more than willing to fool the installer (a rough sketch of both is below). Kubernetes requires a golang:1.4 container to build, but as the default image is 64-bit, I instead downloaded a Raspberry Pi 2 targeted one and renamed it.
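For anyone trying the same thing, here is roughly what those two steps look like; the swapfile path and the rpi-golang image name are placeholders I've made up for illustration, not specific published artefacts.

# create and enable a 1 GB swapfile
sudo dd if=/dev/zero of=/swapfile bs=1M count=1024
sudo chmod 600 /swapfile
sudo mkswap /swapfile
sudo swapon /swapfile

# fool the installer: pull an ARM-built Go 1.4 image and retag it as the
# name the Kubernetes build expects (the source image name is hypothetical)
docker pull someuser/rpi-golang:1.4
docker tag someuser/rpi-golang:1.4 golang:1.4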

This week has taught me that I have a daunting path ahead of me, one that has been trodden by few others (I found one guy on Twitter, but he didn't document ANYTHING). I am not going to give up. Far from it; I now better appreciate the scale of the challenge ahead of me. All these in-alpha orchestration programs have reminded me just how valuable the education and research this platform will enable could be. I cannot wait to dive back into the project in a week's time.

To end on a bright note, look at my new mug. See you next time.

It’s official now!

PROGRESS

We have chosen a distribution!

Raspbian was the first to fall; its lack of an armv7-targeted derivative led to concerns about performance. Next was Debian armhf, for which I'd had big hopes. Unfortunately, configuring Debian for Docker was taking too long, and we had to move on. Next down was Ubuntu Mate for the Pi 2, which is honestly an excellent distribution, but it just wasn't right for us. Ubuntu Mate installed Docker perfectly well, but it came with too many apps we didn't need and was worryingly specific compared to Arch Linux ARM. Arch suited our requirements perfectly: Docker installed without a hitch, and Arch has a large enough community to ensure that the distribution will continue to be supported for a long time.

With our operating system chosen, it was time to start building up our stack. We needed a tool to administer our small dev group of Pi's, one which could easily be extended to five times the number of machines. The tool would have to be lightweight, scalable, easy to learn and fully featured. For this reason we went with Saltstack. Saltstack has everything we need: it's FOSS and powerful, I can send commands to every machine in just one line, it has a server/agent (salt-master/minion) relationship built in, and it supports Docker. It even comes with an API, and after a little bit of fiddling I was able to get SaltPad, a web-based GUI for Salt, up and running on pi0, which also hosts the Salt master.
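To give a flavour of the one-line-to-every-machine point, here are a few illustrative Salt commands; the minion names are made-up examples rather than our actual naming scheme.

# from the salt-master on pi0: check that every minion is alive
sudo salt '*' test.ping

# run an arbitrary command on every Pi at once
sudo salt '*' cmd.run 'uname -a'

# or target a single minion by name
sudo salt 'pi3' cmd.run 'docker ps'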

Screenshot from 2015-06-12 13:13:13

SaltPad is a promising project, but it's still in early alpha and as buggy as you might expect. It's also missing some critical features for us. It's well worth a look regardless, and I might try toying with it alongside Shipyard, the free Docker management tool, to see how effective the combination is and whether our Pi master can handle it. If nothing out there ends up working for us, then we'll have to build our own solution. Just in case that does happen, let us know in the comments which features you think all cloud management software should have.

We’re back!

Yes, we’re doing it. We’re building (what seems to be) the first distributed Raspberry Pi 2 cloud!

A look back and then a look ahead.

Hi, I’m Jim. I’m going to be running this blog for the next 10 weeks, and it’s going to be an exciting time. A lot has changed since we last posted here, in 2013. A wealth of related projects have been started and finished, papers released and cited. Here’s a selection of those which we thought noteworthy.

In the intervening time, the Pi 2 has also been released. The Pi 2 has a much faster armv7 processor and 1GB of RAM. While it's still far from competing with a desktop PC, it's a massive improvement over the Pi 1. Because of this, we've decided to build and benchmark the second iteration of our favourite cloud. Development has just started, and I will be posting weekly updates. We haven't settled on a stack yet, but we're definitely going to be using Docker. We'll be testing the following distributions:

  • Arch Linux | Arm for the Pi 2
  • Ubuntu Mate for the Pi 2
  • Debian armhf
  • Raspbian

If you have any suggestions for other distributions, please let us know in the comments. Any chosen distro will need to take full advantage of the Pi 2's architecture, and supporting Docker out of the box is even better (though I don't mind compiling a kernel or two). If you check back next week I will hopefully have set up and benchmarked all of these, each on a single Pi. This is a great project, and once completed it will be a massively useful tool for research and education.

We’re also resurrecting our Twitter account, so follow us on that if you haven’t already: @glasgowpicloud.


See you next week.

Raspberry Pi Lego Rack Designs

A few people requested that we describe the design of our racks.

The truth is, each rack is slightly different and the final build is not the one we’d planned. A couple of reasons for this: we didn’t quite receive the Lego pieces we were expecting, and we had to tweak the designs to make things fit better. So this is a somewhat retrospective design document…

We have four racks containing 14 Raspberry Pi’s each; each rack is actually composed of two adjacent towers, with two USB hubs in between them. The design is such that the front provides access to the SD card slot and the micro-USB power supply, so we can easily change SD cards and reset the Pi’s. We can also slide the Pi’s out, which is incredibly useful as we tend to cannibalise Pi’s quite often, or swap them around for testing. The back of each rack has space to reach the ethernet ports, and each rack has a dedicated Netgear GS116E switch.

Each rack sits on a green 32 by 32 stud (25cm x 25cm) Lego baseplate.

Ten studs worth of space is left in front of the rack, and eight studs at the back. There’s a gap of two studs either side of the rack.

For the most part, the towers follow four simple layers of Lego, corresponding to a snug fit for a single Raspberry Pi. The only exceptions to these layers are (a) connecting struts between the two towers to hold them together and (b) the tops of the racks, which are frankly all of differing design depending on the academic or student that built them! Particularly as the available Lego ran out, the designs became more improvised. So here I’ll just show the layers that make up the towers and a few examples of the improvisations people chose.

For the first layer, we use 2×4 Lego bricks to create “feet” that protrude into the space for the Pi. The Pi actually sits on these feet, to give it space below for the SD card and to allow airflow underneath the Pi’s (the extra piece in the centre is to keep the USB hub in place):

IMG_2258

Next, we simply add two layers of Lego that do not overlap the feet, just building the perimeter wall with enough room for the Pi:

IMG_2262

Here’s a few horizontal shots to clarify:

IMG_2264 IMG_2263

The fourth layer is more or less the same as the first layer, but we add a long strut at the back instead of the two 2×4 Lego pieces we had, to strengthen the structure:

IMG_2265

Shots from the side and the back:

IMG_2266

IMG_2270

Now we simply repeat the second, third and fourth layers on top of this, until we eventually have enough room for seven Pi’s in each tower. Just a couple of exceptions:

Struts are added at intervals in the rack to strengthen the towers. For example, see the one halfway up the red tower in the image below:

The tops of the racks are a little improvised. Here’s one example:

IMG_2272 IMG_2271

Raspberry Pi Cloud status update

Our Glasgow Raspberry Pi cloud system is an academic project, which means it will be a never-ending work in progress. In the past few days we’ve had lots of publicity (thanks, merci, guys!), so we want to give a quick status update so people know what we’ve done so far:

Hardware

We have 56 Pi boards in 4 Lego mini-racks. Sadly these are 256MB model B boards, not the newer 512MB version. We have 56 because each rack has a Top Of Rack Switch, which has 16 ethernet connections. We use 14 connections for the Pi boards, and the others for inter-switch connections.

Software Stack

We run Raspbian Linux on each Pi board. We have three LXC containers on each Pi, each running a Linux instance. There is no resource isolation or accounting yet, so we don’t make any guarantees about utilization for individual containers.
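For anyone curious, the LXC workflow on each Pi looks roughly like this; the container name and template are illustrative rather than our exact configuration.

pi@raspberrypi ~ $ sudo lxc-create -n container1 -t debian   # create a container from the debian template
pi@raspberrypi ~ $ sudo lxc-start -n container1 -d           # start it in the background
pi@raspberrypi ~ $ sudo lxc-ls                               # list containers on this Pi
pi@raspberrypi ~ $ sudo lxc-console -n container1            # attach a console to it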

We have experimented with more adventurous technology, including libvirt (we’re hacking on this, but haven’t yet got full RasPi support working) and Docker (we’ve had discussions with the developers; watch this space).

Hosted Software

Within each container, we run simple workloads such as lighttpd. We also use artificial workloads like lookbusy for our experiments. We are currently working with Hadoop, although at present this runs on the native Linux instance rather than in an LXC container.
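As a rough sketch of what that looks like inside a container (the utilisation figure is illustrative, not one of our benchmark settings):

pi@raspberrypi ~ $ sudo apt-get install lighttpd   # simple static web server workload
pi@raspberrypi ~ $ lookbusy --cpu-util 50          # hold CPU utilisation at roughly 50%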

Management Layer

Our project student (Richard) built a nice AWS-like web management console for the Glasgow Raspberry Pi Cloud. Here are some screenshots.

[Screenshots: web management console, main view]

If/when we get libvirt working, we hope to be able to use standard tools like oVirt.

Edit (22 June 2013): The Glasgow Raspberry Pi Cloud is entirely distinct from PiCloud, as the PiCloud folks requested us to say…

Running Hadoop Java and C++ Word Count example on Raspberry Pi

Various blog posts say that Hadoop is incredibly slow on the Pi, and yes, please lower your expectations: the speed really is appalling. But it is very interesting to see just how slow it can be on the Pi.

This post assumes you already have Hadoop installed and configured on your Pi. Before we start, we need to increase the swap file size if your Pi is the 256MB version; otherwise it will run out of memory.

1. Increase the swap file size (I stole this from David’s post)

hduser@raspberrypi ~ $ sudo pico /etc/dphys-swapfile
# change CONF_SWAPSIZE to 500 (MB)
hduser@raspberrypi ~ $ sudo dphys-swapfile setup
hduser@raspberrypi ~ $ sudo reboot

2. Download the example file

Go to http://www.gutenberg.org/ebooks/20417 and download the plain-text e-book. Assuming you have downloaded the file to your home directory, we then copy it to HDFS.

hduser@raspberrypi ~ $ start-all.sh
hduser@raspberrypi ~ $ hadoop dfs -copyFromLocal pg20417.txt /user/hduser/wordcount/pg20417.txt 

You can then check that the file is there, much as you would with the ls command:

hduser@raspberrypi ~ $ hadoop dfs -ls /user/hduser/wordcount

3. Run the Java wordcount example

hduser@raspberrypi ~ $ hadoop jar /usr/local/hadoop/hadoop-examples-1.1.2.jar wordcount /user/hduser/wordcount /user/hduser/wordcount-output

Now, be patient! It will take approximately 8 minutes to complete.

4. Check execution result

hduser@raspberrypi ~ $ hadoop dfs -cat /user/hduser/wordcount-output/part-r-00000

5. C++ wordcount example

Getting Hadoop Pipes to run on the Pi needs a little more effort (hacking?), as we will need to build some Pi-compatible libraries. In particular we’ll want libhdfs and libhadooppipes, as well as libhadooputils.

Let’s get the build environment ready first.

hduser@raspberrypi ~ $ sudo apt-get install libssl-dev

Go to /usr/local/hadoop/src/c++/libhdfs/ and edit the configure file so that it will run without errors.

In the configure file, find and comment out the following two lines:

as_fn_error $? "Unsupported CPU architecture \"$host_cpu\"" "$LINENO" 5;;

and

define size_t unsigned int

That’s all the hacking we need to do. Next, from the libhdfs directory:

hduser@raspberrypi ~ $ ./configure --prefix=/usr/local/hadoop/c++/Linux-i386-32
hduser@raspberrypi ~ $ make
hduser@raspberrypi ~ $ make install

We’re almost done; just do the same for pipes and utils (see the sketch below). Once finished, you’ll have Pi-compatible libraries, and you can build wordcount.cpp with the Makefile given below.
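For completeness, building pipes and utils follows exactly the same pattern as libhdfs. The directories below are where they live in the Hadoop 1.1.2 source tree; if their configure scripts complain on ARM, apply the same tweaks as above.

hduser@raspberrypi ~ $ cd /usr/local/hadoop/src/c++/utils
hduser@raspberrypi ~ $ ./configure --prefix=/usr/local/hadoop/c++/Linux-i386-32
hduser@raspberrypi ~ $ make && make install
hduser@raspberrypi ~ $ cd /usr/local/hadoop/src/c++/pipes
hduser@raspberrypi ~ $ ./configure --prefix=/usr/local/hadoop/c++/Linux-i386-32
hduser@raspberrypi ~ $ make && make install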

wordcount.cpp

#include <algorithm>
#include <limits>
#include <string>
#include <vector>   // for the vector<string> returned by splitString

#include  "stdint.h"  // <--- to prevent uint64_t errors! 

#include "hadoop/Pipes.hh"
#include "hadoop/TemplateFactory.hh"
#include "hadoop/StringUtils.hh"

using namespace std;

class WordCountMapper : public HadoopPipes::Mapper {
public:
  // constructor: does nothing
  WordCountMapper( HadoopPipes::TaskContext& context ) {
  }

  // map function: receives a line, outputs (word,"1")
  // to reducer.
  void map( HadoopPipes::MapContext& context ) {
    //--- get line of text ---
    string line = context.getInputValue();

    //--- split it into words ---
    vector< string > words =
      HadoopUtils::splitString( line, " " );

    //--- emit each word tuple (word, "1" ) ---
    for ( unsigned int i=0; i < words.size(); i++ ) {
      context.emit( words[i], HadoopUtils::toString( 1 ) );
    }
  }
};

class WordCountReducer : public HadoopPipes::Reducer {
public:
  // constructor: does nothing
  WordCountReducer(HadoopPipes::TaskContext& context) {
  }

  // reduce function
  void reduce( HadoopPipes::ReduceContext& context ) {
    int count = 0;

    //--- get all tuples with the same key, and count their numbers ---
    while ( context.nextValue() ) {
      count += HadoopUtils::toInt( context.getInputValue() );
    }

    //--- emit (word, count) ---
    context.emit(context.getInputKey(), HadoopUtils::toString( count ));
  }
};

int main(int argc, char *argv[]) {
  return HadoopPipes::runTask(HadoopPipes::TemplateFactory< 
			      WordCountMapper, 
                              WordCountReducer >() );
}

Makefile

CC = g++
HADOOP_INSTALL = /usr/local/hadoop
PLATFORM = Linux-i386-32
CPPFLAGS =  -I$(HADOOP_INSTALL)/c++/$(PLATFORM)/include

# note: the recipe lines below must be indented with a tab
wordcount: wordcount.cpp
	$(CC) $(CPPFLAGS) $< -Wall -L$(HADOOP_INSTALL)/c++/$(PLATFORM)/lib -lhadooppipes \
	-lhadooputils -lpthread -lcrypto -lssl -g -O2 -o $@

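The post above stops at building the binary, so as a rough sketch, here is how you would ship it to HDFS and launch it with Hadoop Pipes (the HDFS paths are just examples):

hduser@raspberrypi ~ $ hadoop dfs -mkdir /user/hduser/bin
hduser@raspberrypi ~ $ hadoop dfs -copyFromLocal wordcount /user/hduser/bin/wordcount
hduser@raspberrypi ~ $ hadoop pipes -D hadoop.pipes.java.recordreader=true \
    -D hadoop.pipes.java.recordwriter=true \
    -input /user/hduser/wordcount -output /user/hduser/wordcount-cpp-output \
    -program /user/hduser/bin/wordcount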
Remark: on my 256MB Model B Pi, the C++ wordcount takes about 10 minutes to finish.

References:

[1] http://cs.smith.edu/dftwiki/index.php/Hadoop_Tutorial_2.2_–_Running_C%2B%2B_Programs_on_Hadoop

[2] http://www.michael-noll.com/tutorials/running-hadoop-on-ubuntu-linux-single-node-cluster/#Copy_local_example_data_to_HDFS

Getting hadoop to run on the Raspberry Pi

Hadoop is implemented in Java, so getting it to run on the Pi is just as easy as doing so on x86 servers. First of all, we need a JVM for the Pi. You can either get OpenJDK or Oracle’s JDK 8 for ARM Early Access. I would personally recommend JDK 8 as it is just slightly faster than OpenJDK, although OpenJDK is easier to install.

1. Install Java

Installing OpenJDK is easy; just run the following and wait:

pi@raspberrypi ~ $ sudo apt-get install openjdk-7-jdk
pi@raspberrypi ~ $ java -version
java version "1.7.0_07"
OpenJDK Runtime Environment (IcedTea7 2.3.2) (7u7-2.3.2a-1+rpi1)
OpenJDK Zero VM (build 22.0-b10, mixed mode)

Alternatively, you can install Oracle’s JDK 8 for ARM Early Access (some say it is optimised for the Pi).
First get it from here: https://jdk8.java.net/fxarmpreview/index.html

pi@raspberrypi ~ $ sudo tar zxvf jdk-8-ea-b36e-linux-arm-hflt-*.tar.gz -C /opt
pi@raspberrypi ~ $ sudo update-alternatives --install "/usr/bin/java" \
    "java" "/opt/jdk1.8.0/bin/java" 1
pi@raspberrypi ~ $ java -version
java version "1.8.0-ea"
Java(TM) SE Runtime Environment (build 1.8.0-ea-b36e)
Java HotSpot(TM) Client VM (build 25.0-b04, mixed mode)

If you have both versions installed, you can switch between them with

sudo update-alternatives --config java

2. Create a hadoop system user

pi@raspberrypi ~ $ sudo addgroup hadoop
pi@raspberrypi ~ $ sudo adduser --ingroup hadoop hduser
pi@raspberrypi ~ $ sudo adduser hduser sudo

3. Setup SSH

pi@raspberrypi ~ $ su - hduser
hduser@raspberrypi ~ $ ssh-keygen -t rsa -P ""

This creates an RSA key pair with an empty passphrase. We do this to stop Hadoop prompting for the passphrase when it talks to its nodes.

hduser@raspberrypi ~ $ cat ~/.ssh/id_rsa.pub >> ~/.ssh/authorized_keys

Now enable SSH access to your local machine with this newly created key:

hduser@raspberrypi ~ $ ssh localhost

You should be able to log in without a password.

4. Download and install Hadoop
Download Hadoop from http://www.apache.org/dyn/closer.cgi/hadoop/core

hduser@raspberrypi ~ $ wget http://mirror.catn.com/pub/apache/hadoop/core/hadoop-1.1.2/hadoop-1.1.2.tar.gz
hduser@raspberrypi ~ $ sudo tar vxzf hadoop-1.1.2.tar.gz -C /usr/local
hduser@raspberrypi ~ $ cd /usr/local
hduser@raspberrypi /usr/local $ sudo mv hadoop-1.1.2 hadoop
hduser@raspberrypi /usr/local $ sudo chown -R hduser:hadoop hadoop

Hadoop is now installed, but not quite ready to roll. Edit .bashrc in your home directory and append the following lines:

export JAVA_HOME=/usr/lib/jvm/java-7-openjdk-armhf
export HADOOP_INSTALL=/usr/local/hadoop
export PATH=$PATH:$HADOOP_INSTALL/bin

Modify JAVA_HOME accordingly if you use Oracle’s version (e.g. /opt/jdk1.8.0).

Reboot the Pi and verify the installation:

hduser@raspberrypi ~ $ hadoop version
Hadoop 1.1.2
Subversion https://svn.apache.org/repos/asf/hadoop/common/branches/branch-1.1 -r 1440782
Compiled by hortonfo on Thu Jan 31 02:03:24 UTC 2013
From source with checksum c720ddcf4b926991de7467d253a79b8b

5. Configure Hadoop
NOTE: this how-to covers only a minimal configuration for single-node Hadoop.

The configuration files live in /usr/local/hadoop/conf/; we need to edit core-site.xml, hdfs-site.xml and mapred-site.xml.

core-site.xml

<configuration>
  <property>
    <name>hadoop.tmp.dir</name>
    <value>/fs/hadoop/tmp</value>
  </property>
  <property>
    <name>fs.default.name</name>
    <value>hdfs://localhost:54310</value>
  </property>
</configuration>

mapred-site.xml

<configuration>
  <property>
    <name>mapred.job.tracker</name>
    <value>localhost:54311</value>
  </property>
</configuration>

hdfs-site.xml

<configuration>
  <property>
    <name>dfs.replication</name>
    <value>1</value>
  </property>
</configuration>

OK, we’re almost done. One last step: create the directory we set as hadoop.tmp.dir above, then format the namenode.

hduser@raspberrypi ~ $ sudo mkdir -p /fs/hadoop/tmp
hduser@raspberrypi ~ $ sudo chown hduser:hadoop /fs/hadoop/tmp
hduser@raspberrypi ~ $ sudo chmod 750 /fs/hadoop/tmp
hduser@raspberrypi ~ $ hadoop namenode -format

ATTENTION:

If you use JDK 8 for Hadoop, you need to force the DataNode to run in JVM client mode, as the JDK 8 ARM early-access build does not yet include a server VM. Go to /usr/local/hadoop/bin and edit the hadoop file (please create a backup first). Assuming you’re using nano, the procedure is: run nano hadoop, press ctrl-w to search for the “-server” argument, delete “-server”, then save and exit.
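If you would rather not do the edit by hand, a sed one-liner along these lines (which keeps a .bak backup) should achieve the same thing; treat it as a sketch and check the result afterwards:

hduser@raspberrypi ~ $ sed -i.bak 's/-server//g' /usr/local/hadoop/bin/hadoop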

Now the single-node Hadoop system is ready. Below are some useful commands.

1. jps           // will report the local VM identifier
2. start-all.sh  // will start all hadoop processes
3. stop-all.sh   // will stop all hadoop processes


References:

[1] http://raspberrypi.stackexchange.com/questions/4683/how-to-install-java-jdk-on-raspberry-pi
[2] http://www.michael-noll.com/tutorials/running-hadoop-on-ubuntu-linux-single-node-cluster/