How to build your first Raspberry Pi cluster?


Do you have two or more Raspberry Pis at home?
Do you want to try putting them together to make a cluster?
You’re in the right place
When I bought my second Raspberry Pi, I immediately wanted to build one

How to build a Raspberry Pi cluster?
You need two or more Raspberry Pis (not necessarily the same model)
You have to install software like MPICH and MPI4PY on all nodes
And you need a good step-by-step tutorial to make this work: you have one 🙂

As it can be a complex topic for beginners, I’ll start with a short introduction to clusters in general
Then I’ll explain what I did and how you can do the same on your side

Cluster presentation

What’s a cluster?

Basically, a cluster is a group of computers that works as a single entity
The goal is to make them work together to improve overall performance
All the computers work on the same task, reducing the time needed to finish it

Don’t confuse computer clusters with load balancing
In a load balancing architecture, each computer works on a different task to decrease the master node’s load
In a cluster, we use the combined power of all the nodes to run one task in parallel
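
To make the idea of "one task, several nodes" more concrete, here is a minimal sketch in plain Python (no cluster needed). It cuts a job of a given size into near-equal slices, one per node. The function name split_work and the numbers are only illustrative, they are not part of MPI.

```python
def split_work(total_items, node_count):
    """Return (start, end) index pairs, one per node, covering range(total_items)."""
    base, extra = divmod(total_items, node_count)
    slices = []
    start = 0
    for node in range(node_count):
        # The first `extra` nodes take one additional item each
        size = base + (1 if node < extra else 0)
        slices.append((start, start + size))
        start += size
    return slices


# 10 items shared between 4 nodes
print(split_work(10, 4))
```

Each node would then process only its own slice, which is where the time saving comes from.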

Clusters examples

Computer clusters date back to the 1960s, around the time of the first work on networking, and they are still used today

The first commercial computer cluster product in history was ARCnet
Its goal was to connect groups of Datapoint 2200 computers
That’s very old in computing history 🙂

At the time of writing, the IBM Summit at Oak Ridge National Laboratory (ORNL) is the biggest supercomputer in the world
With over 2 million cores and about 3,000 TB of RAM, and still growing, it will be tough to compete with
The picture below shows what it looks like

Summit from IBM (source: ornl.gov)

Raspberry Pi application

Let’s come back to a more realistic scale and apply that definition to our Raspberry Pi

As you know, the Raspberry Pi is not very powerful, but it’s cheap
So it’s a perfect device to build a cluster with
We can run tasks faster on 4 devices than on a single one, for a reasonable price

In this tutorial, I’ll show you how to build your first Raspberry Pi cluster
You can do it with two nodes to start and add others later if needed

Cluster implementation

My scenario

I’m doing this exercise for you with two Raspberry Pis:

  • A Raspberry Pi 3B+: the master node, which will drive everything
  • A Raspberry Pi Zero: the second node, to increase overall performance (yes, it’s probably not the best model choice, but it’s just for the test)

As the preparation phase is long (mainly compilation time), I’ll do it on the 3B+ only, to go faster
Then I’ll copy its SD card to another one to get a Raspberry Pi Zero almost ready without doing the same thing again
Finally, there are extra steps to do on both to connect them together and run the first script

If you have more than one node to add, repeat the same process for each node

Prerequisites

To follow this tutorial, you’ll need:

  • 2 or more Raspberry Pis (any model, but if you have to buy one I recommend the Raspberry Pi 3B+)
  • 2 or more SD cards (check my recommended product page if you need some)
  • A cheap 5-port gigabit switch to plug all the Pis together
  • Power cables, or a power bank with 2 or more ports
  • A network cable for each Pi (wireless is possible, but not recommended)
  • Optional: if you’re serious about it, this cluster case to stack the Raspberry Pis

As for the software, I’ll explain it in the following parts

Note: if your SD cards have different sizes, it’s not a problem
But you need to install the master on the smallest one
Otherwise, you’ll have an issue when flashing a 64 GB image to a 16 GB SD card 🙂

Master preparation

The first step in my scenario is to do the installation on one Raspberry Pi and then duplicate it to the others
Start with the most powerful one you have

Basic installation

As for any project, we start with the Raspbian installation
Download Raspbian Lite from the Raspberry Pi Foundation website: link here
Raspbian Desktop is OK, but we don’t need a GUI for this project

Install it and boot for the first time (if you don’t know how to install Raspbian on a Raspberry Pi, follow my guide here and come back later)

Then you need to follow these extra steps:

  • Change some configuration options with Raspi Config
    • Run Raspi Config
      sudo raspi-config
    • Change the pi user password in Change user password
      (We’ll enable SSH, and it’s not a good idea to keep the default password with SSH running)
    • Enable SSH in Interfacing options > SSH > Yes
    • Change the host name in Network options > Host name
      Choose something clear, like “Master”
  • Update your system
    • Update the repository sources
      sudo apt update
    • Upgrade all packages
      sudo apt upgrade
  • Reboot to apply all changes
    sudo reboot

The base installation is done, so we can move on to the software specific to this project

MPICH installation

What’s MPICH?

MPICH is the main tool we need to run a cluster
MPICH is a free implementation of the MPI standard
MPI stands for Message Passing Interface, a standard that lets processes running on different machines exchange messages, which is the basis of this kind of parallel computing

In short, this is what will allow us to run a script on several Raspberry Pis at the same time

MPICH preparation

MPICH needs different folders:

  • The first one to download and extract the source code
  • A different one to build the code
  • And a last one that will be the installation path

So you need to create three different folders, for example:

sudo mkdir /opt/mpi
sudo mkdir /opt/mpi-dl
sudo mkdir /opt/mpi-build

The first one is the final installation path that we’ll use later, and the other two are for downloading and building the code

You also need to install a Fortran compiler before installing MPICH

sudo apt install gfortran

MPICH installation

We are now ready to start the MPICH installation process:

  • Move to the download folder
    cd /opt/mpi-dl
  • Download the file from the MPICH website, for example:
    sudo wget http://www.mpich.org/static/downloads/3.3/mpich-3.3.tar.gz

    Change the download link if there is a newer version number available

  • Then extract all files from this archive (sudo is needed because the folder was created by root)
    sudo tar zxvf mpich-3.3.tar.gz
  • Move to the build folder for the next step
    cd /opt/mpi-build
  • Run the configuration process
    sudo /opt/mpi-dl/mpich-3.3/configure --prefix=/opt/mpi

    This can take a few minutes, be patient

  • Then use make to build
    sudo make

    Take a drink, or even a nap.
    On my Raspberry Pi 3B+ this took 90 minutes …

  • Finally, you can install it with:
    sudo make install

Once done, run a test to make sure that everything is working well
To do this, run this command for example:

/opt/mpi/bin/mpiexec -n 1 date

If you get the current date from the master, the MPI installation is done
Move to the next part

MPI4PY installation

What’s MPI4PY

For the moment, MPI is available for Fortran and C programs on your Raspberry Pi
But as Python is the usual language on the Raspberry Pi, we’ll add Python support to our cluster

For this, we need to install a Python library: MPI4PY

MPI4PY preparation

Installing MPI4PY is easy, as it’s available with pip
But you need some dependencies first:

sudo apt install python-pip python-dev libopenmpi-dev

That’s it, move to the installation process

MPI4PY installation

  • Install MPI4PY with pip:
    sudo pip install mpi4py
  • Ok, now we’ll create a basic Python script to test it with MPI
    Move to the home folder and create a basic script:
    cd /home/pi
    nano test.py

    Paste this line inside (or whatever you want):

    print("Hello")
  • Make sure that your script works with Python
    python test.py

    If you kept my script, this should display “Hello”

  • Then test it with MPI
    /opt/mpi/bin/mpiexec -n 4 python test.py

    This should now display “Hello” four times
    This is not useful, but it’s just a test to validate this step

If this works correctly, your master installation is ready
MPI can now run Python scripts and we can start preparing the nodes
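
If you want a test that tells you a bit more than “Hello”, here is a small sketch that prints which process is talking and on which machine. The file name hello_mpi.py is just a suggestion, and the fallback to a single process when mpi4py is missing is my own assumption so the script can also be tried on a computer without MPI.

```python
import socket

try:
    from mpi4py import MPI
    comm = MPI.COMM_WORLD
    rank, size = comm.Get_rank(), comm.Get_size()
except ImportError:
    # Assumption: without mpi4py, behave like a single process
    rank, size = 0, 1


def greeting(rank, size, host):
    """Build the message printed by each process."""
    return "Hello from process {} of {} on {}".format(rank, size, host)


print(greeting(rank, size, socket.gethostname()))
```

Run it the same way as test.py, for example with /opt/mpi/bin/mpiexec -n 4 python hello_mpi.py. If every line says “process 0 of 1”, it usually means that mpi4py was built against a different MPI installation than the mpiexec you are using.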

Duplicate the master

The next step is to duplicate the master’s SD card into other cards, one for each node
To do this, we’ll create an image from the SD card and flash it on the other cards

Create the image

On Windows, you need a tool like Win32 Disk Imager
Click on the link, download and install it on your computer

  • Start the program
  • In the “Image file” field, choose a temporary directory and a filename “cluster_master.img” for example
  • Then choose the Device letter corresponding to the SD card
  • Finally, press the “Read” button to start the image creation
    This process took about 15 minutes on my computer
  • Once done, eject the master SD card and keep it safe

On Linux, it should be something like this (replace /dev/sdb with the device name of your SD card):

sudo dd if=/dev/sdb of=cluster_master.img bs=4M

I didn’t test it so I can’t guarantee this is working, but dd should do the job

Create SD Card for nodes

Once the image is ready, you need to create the SD card for each node of your cluster

  • Insert the new SD card into your computer
  • In Win32 Disk Imager, select the image filename and the device letter
  • Click on “Write” to create the same SD card

If you prefer, you can use Etcher to do this
I usually use Etcher, but as we are already in Win32 Disk Imager, it’s just as easy to stay there


Again, on Linux, the dd command should do that too

At the end of this step, you have one SD card for each node you want to use
All the SD cards contain the same image of the master we created before

Nodes configuration

Start all Raspberry Pi

  • Insert an SD card in each Raspberry Pi you want to use
  • Start them all

If you want to use WiFi for one or more nodes, there is an extra step
For example, in my case I have a Raspberry Pi Zero, and it was easier for me to connect it over WiFi

  • Plug a screen and keyboard into the Raspberry Pi you want to use over WiFi
  • Use raspi-config to configure the WiFi
    • Use the following command
      sudo raspi-config
    • Go to Network options > Wi-fi
    • Follow the wizard to select your network (country, SSID and passphrase)

Find all IP addresses

Once all the Raspberry Pis are started and connected to the network, we need to get all their IP addresses to use them later

  • Go back to the master node (directly or with SSH)
  • Install NMAP
    sudo apt install nmap

    nmap is a free tool for network discovery (check the website here)
    We’ll use it to find all IP addresses

  • Use this command to find all devices on your network with a host name containing “master”
    (all the Raspberry Pis have the same host name for the moment, and -i ignores the case as we named it “Master”)
    nmap -sn 192.168.1.0/24 | grep -i master

    Change the network subnet if you are using another one

  • Each matching line of the output contains a host name and its IP address
  • In my case, I now know my second node’s IP: 192.168.1.18

You should now have the IP addresses of all the nodes
If you don’t know the master’s one, use this command:

sudo ifconfig

In the output, the IP address is on the second line of the interface block, after the “inet” keyword (192.168.1.200 in my case)

The last step is to note these IP addresses in a text file on your master node

  • Create a new file in your home folder
    cd /home/pi
    nano nodesips
  • In this file, add one IP address per line (and only the IP address)
  • For example:
    192.168.1.15
    192.168.1.16
    192.168.1.17
    192.168.1.18
  • That’s all for this part
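
A typo in this file is easy to miss, so here is a small optional sketch to check it. It relies on the ipaddress module from the Python 3 standard library, so run it with python3. The function name read_nodes and the sample values are only illustrative.

```python
import ipaddress


def read_nodes(lines):
    """Return the valid IP addresses found in an iterable of text lines."""
    nodes = []
    for line in lines:
        text = line.strip()
        if not text:
            continue  # skip blank lines
        try:
            ipaddress.ip_address(text)
        except ValueError:
            print("Ignoring invalid entry: {!r}".format(text))
            continue
        nodes.append(text)
    return nodes


# Sample content, with one blank line and one invalid address
sample = ["192.168.1.15\n", "192.168.1.16\n", "\n", "192.168.1.999\n"]
print(read_nodes(sample))
```

To check the real file, you could pass it an open file instead of the sample list, for example read_nodes(open("/home/pi/nodesips")).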

Change the nodes host names

We’ll now change the host name on the new nodes to have a different one for each

  • From the master node, connect to the first one with SSH
    ssh pi@192.168.1.18

    Answer “yes” to the question and enter the pi password

  • Go into raspi-config
    • Use this command to access the tool
      sudo raspi-config
    • Go to Network options > Host name
    • Set a new host name for this node, for example “Node1”
  • Exit raspi-config and exit this node with:
    exit

Repeat these steps for each node you want to add in the cluster

Do the SSH keys exchange

The last step is to allow the master to connect to each node via SSH without a password
To do this, you need to create an SSH key pair on the master and transfer the public key to all nodes

  • On the master, create the SSH key with:
    ssh-keygen -t rsa

    Accept the default values (default path and no passphrase)

  • This tool generates two keys in the /home/pi/.ssh folder:
    • id_rsa: your private key, keep it here
    • id_rsa.pub: the public key, which you need to send to the peers you want to access without a password
  • Transfer the public key to all nodes
    scp /home/pi/.ssh/id_rsa.pub pi@192.168.1.18:/home/pi/master.pub

    Do this for each node you want to use

  • Then, go to each node and add the key to the authorized_keys file
    This file contains all the public keys allowed to access the system via SSH without a password
    (mkdir -p creates the .ssh folder if it doesn’t exist yet on the node)
    ssh pi@192.168.1.18
    mkdir -p .ssh
    cat master.pub >> .ssh/authorized_keys
    exit

    Do this for each node

  • Now, you should be able to connect to each node without a password
    You can give it a try with:
    ssh pi@192.168.1.18
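
With more than a couple of nodes, testing each login by hand gets tedious. Here is a rough sketch of how the check could be automated from the master. The function names are mine, and I use the OpenSSH option BatchMode=yes so that ssh fails instead of asking for a password.

```python
import subprocess


def ssh_check_command(ip, user="pi"):
    """Build an ssh call that runs `hostname` on the node without any prompt."""
    # BatchMode=yes makes ssh fail instead of asking for a password
    return ["ssh", "-o", "BatchMode=yes", "-o", "ConnectTimeout=5",
            "{}@{}".format(user, ip), "hostname"]


def can_login_without_password(ip):
    """Return True when the key-based login to `ip` succeeds."""
    try:
        return subprocess.call(ssh_check_command(ip)) == 0
    except OSError:
        return False  # the ssh client is not installed


# Show the command that would be run for one node
print(" ".join(ssh_check_command("192.168.1.18")))
```

Calling can_login_without_password("192.168.1.18") on the master should return True once the key exchange is done for that node.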

That’s it, your cluster is ready. We’ll now test it

Cluster usage

The cluster is now available and we’ll use MPI to run commands simultaneously on each node
As we already saw, MPI allows you to run basic commands and scripts through the cluster

Basic command

The first thing we can try is to run the same command on each node
Preferably something that doesn’t return the same thing on every node 🙂

For example:

/opt/mpi/bin/mpiexec -f nodesips -n 2 hostname

nodesips is the file we created before with all IP addresses inside
And “hostname” is the command we want to run on each node
2 is the number of processes to start, so in this case change it to the number of nodes

As a result, you’ll get one line for each node in the cluster, with that node’s host name
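
If you plan to start MPI jobs from your own Python tools, the same command line can be built piece by piece. This is only a sketch: the function mpiexec_command is my own helper, and the default path assumes MPICH was installed in /opt/mpi as in this tutorial.

```python
def mpiexec_command(program, processes, hostfile=None,
                    mpiexec="/opt/mpi/bin/mpiexec"):
    """Build the argument list for an mpiexec call like the one above."""
    command = [mpiexec]
    if hostfile:
        command += ["-f", hostfile]    # file listing one node per line
    command += ["-n", str(processes)]  # number of processes to start
    command += list(program)           # command to run on the nodes
    return command


print(" ".join(mpiexec_command(["hostname"], 2, hostfile="nodesips")))
```

The returned list can be passed directly to subprocess.call() on the master to actually start the job.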

Python script

Test script

If you followed this tutorial entirely, you should already have a test.py script in the home folder

You can try running it on each node with the same command:

/opt/mpi/bin/mpiexec -f nodesips -n 2 python test.py

This will display “Hello” two times, once for each node

Congrats! Your Raspberry Pi cluster is operational 🙂

A new script

But if you didn’t create this script, or if you want to create new ones, there is an extra step
MPI starts the script on each node, but it doesn’t copy it there
So you need to have the script on each node

To do this, follow this short procedure:

  • Create the script on the master node
  • Make sure it’s working as expected
  • Then transfer this script to all nodes with scp
    scp /home/pi/myscript.py pi@192.168.1.18:/home/pi/

    It’s important to have the same script on each node, and with the same path

  • Then you can run your script with MPI as explained before
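
When you have several nodes, typing one scp command per node becomes repetitive. As a sketch, the commands can be generated from the list of node IP addresses. The helper name scp_commands is mine, and it keeps the same path on every node, as required above.

```python
def scp_commands(script_path, nodes, user="pi"):
    """Return one scp argument list per node, keeping the same path everywhere."""
    return [["scp", script_path, "{}@{}:{}".format(user, ip, script_path)]
            for ip in nodes]


# Example with two nodes (the master already has the script)
for command in scp_commands("/home/pi/myscript.py",
                            ["192.168.1.17", "192.168.1.18"]):
    print(" ".join(command))
```

Each list can then be run with subprocess.call(), which works without a password thanks to the SSH key exchange done earlier.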

Go further with Python

As you’d expect, we didn’t add MPI4PY just to run basic Python scripts 4 times instead of once
MPI4PY is a Python library you can import in your scripts to use MPI functions across your cluster

Here is a small example:

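#!/usr/bin/env python
# Broadcast a dictionary from rank 0 to every other process

from mpi4py import MPI

comm = MPI.COMM_WORLD   # communicator containing all started processes
rank = comm.Get_rank()  # index of this process, starting at 0

if rank == 0:
    # Only the first process (rank 0) defines the data
    data = {'a': 1, 'b': 2, 'c': 3}
else:
    data = None

# bcast() sends the value held by rank 0 to all other ranks
data = comm.bcast(data, root=0)
print('rank {}: {}'.format(rank, data))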

The goal of this script is to send data from one node to all the others
In this script, data is defined only on the master (rank 0)
And then we send this value to all the other nodes with the broadcast function (comm.bcast)

When you run this script, all nodes and ranks display the same data


It’s just an example to show you that you can add more functions to your Python scripts to take advantage of your cluster
I’m not an expert on this
You can find more information here
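
To go one step beyond the broadcast, here is a sketch where the processes really share the work: each rank adds up its own part of the numbers from 0 to 999, and comm.reduce combines the partial results on rank 0. The way I split the numbers and the single-process fallback when mpi4py is missing are my own choices for this illustration.

```python
try:
    from mpi4py import MPI
    comm = MPI.COMM_WORLD
    rank, size = comm.Get_rank(), comm.Get_size()
except ImportError:
    # Assumption: without mpi4py, fall back to a single process
    comm, rank, size = None, 0, 1


def partial_sum(rank, size, limit):
    """Sum the integers below `limit` that belong to this rank."""
    # Rank r takes r, r + size, r + 2 * size, ...
    return sum(range(rank, limit, size))


local = partial_sum(rank, size, 1000)

if comm is None:
    total = local
else:
    # reduce() adds every local value and hands the result to rank 0
    total = comm.reduce(local, op=MPI.SUM, root=0)

if rank == 0:
    print("Sum of 0..999: {}".format(total))
```

Like any other script, it has to be copied to every node before you start it with mpiexec.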

Related questions

Can I add more nodes to my cluster now? You can add more nodes to your existing cluster at any time (that’s what they do with supercomputers). You just need to create a new SD card, follow the node configuration steps again for the new node and add the new IP address to the nodesips file

The IP addresses are changing each day, what can I do? Yes, it’s a problem. I skipped this step for the test, but if you want to keep your cluster you need to fix it. Depending on your network, you can either set a reservation in your DHCP server (so each Pi always gets the same IP on boot) or manually set a static IP address in your network configuration (I explain how to do this at the end of this article)

For what kind of usage do I really need a cluster? In this tutorial, I was mainly interested in the technology and the installation process, not so much in the possibilities that this cluster now opens up. This is another topic and I can’t fit everything in a single article. If you want to go further, you can find more cluster projects on Hackaday

Conclusion

That’s it, you know how to build your own Raspberry Pi cluster, from two nodes to infinity 🙂

I really enjoyed writing this tutorial for you
It’s interesting to get an overview of how supercomputers work
And the technology seems to be stable, as I had no issues while creating my cluster (and that’s rare in computing ^^)

I hope you’ll enjoy it too
If you have any questions or experiences to share, leave a comment below
I would like to know what you do after this first step into the supercomputer world 🙂

7 comments

  1. Fernando Ramos

    Thanks for the tutorial! I went ahead and setup an NFS share on the master and had the nodes mount it so I could drop scripts in there. It’s a fun exercise.

  2. Hal Pattenden

    Thank you for the concise information and instructions. My Raspberry Pi 3 master and 3 Pi Zeros cluster is working like a charm.

    Is there a limit to the number of nodes that you can add to a master controller Pi?
    If there is, how do big clusters work? Can you make a cluster of clusters?

    Thanx again.

  3. suvan

    thanks man it really helped me since this is my first computer i own and you helped me built it
