
# Programmazione concorrente, parallela e su cloud - Class 2019/20

## Master's Degree Course by Prof. Vittorio Scarano and Carmine Spagnuolo, Ph.D., Università degli Studi di Salerno

Reception hours for students

We talk (asynchronously) on the community channel #pcpc or you can schedule an online meeting (synchronously) on Tuesday and Friday 14:00-16:00.

# Books and references

1. Kai Hwang, Jack Dongarra, and Geoffrey C. Fox. 2011. Distributed and Cloud Computing: From Parallel Processing to the Internet of Things (1st ed.). Morgan Kaufmann Publishers Inc., San Francisco, CA, USA.
2. Czech, Z. (2017). Introduction to Parallel Computing. Cambridge: Cambridge University Press.
3. 📖 Have fun with MPI in C

# Materials for online course

## Lessons

↑ Back to Index ↑

## Have fun with MPI in C

```c
#include <stdio.h>
#include <mpi.h>
#include <string.h>
static const char NERD[5]  = {0xF0, 0x9F, 0xA4, 0x93, '\0'};
static const char WORLD[5] = {0xF0, 0x9F, 0x8C, 0x8D, '\0'};
static const char SLEEP[5] = {0xF0, 0x9F, 0x98, 0xB4, '\0'};
#define 🤓 {MPI_Init(NULL, NULL);  int world_rank; MPI_Comm_rank(MPI_COMM_WORLD, &world_rank); printf("I am the %s with rank %d ",NERD, world_rank);}
#define 🌍 ({int s; MPI_Comm_size(MPI_COMM_WORLD, &s); printf("of MPI %s of size %d ", WORLD, s);});
#define 😴 {printf("Goodbye %s\n",SLEEP);MPI_Finalize();return 0;}
#define P(x) printf(x)
#define I P("\a");
#define am P("\a");
#define the P("\a");
#define with P("\a");
#define rank P("\a");
#define size P("\a");
#define 🤔 {P("\a");};
#define of P("\a");
#define MPI P("\a");
#define Goodbye P("\a");
int main()
{
    I am the 🤓 with rank 🤔 of MPI 🌍 of size 🤔 Goodbye 😴;
}
```

## Class question forum

• ISISLab Community - Channel #pcpc
• Moreover, we use issues on GitHub. Please use them for sharing ideas and problems about both the theoretical and practical parts of the course.

# Course homeworks

## What is my homework?

Each student must develop a solution for the assigned homework using a parallel programming approach that exploits distributed memory. The solution must be written in the C programming language, using the Open MPI library. Furthermore, the developed solution must be experimented on a homogeneous cluster running on top of the Amazon Web Services cloud computing platform.

The cluster is composed of several supported AWS Educate instances. The obtained results must be presented in terms of strong and weak scalability, varying the number of computing processors from 1 to number-of-vcpu x number-of-instances. For example, if we run a cluster of 4 t2.large (2 vCPU) nodes, we have to evaluate the scalability of our solution for P={1,2,3,4,5,6,7,8}.

👍 The total number of processors is equal to the number of virtual CPUs on the running instances. The benchmark must exploit 8 instances for the largest experiment. The student must describe the solution and the benchmark in a report file, written in Markdown, and included in the submission in both Markdown and PDF format. The report file must also describe the compilation phase and how to reproduce the obtained results, considering the Docker MPI environment of the course.

## Homeworks list

| ID | Name | Description |
|----|------|-------------|
| 1️⃣ | gameoflife | Conway's Game of Life ↓ |
| 2️⃣ | nbody | N-body problem ↓ |
| 3️⃣ | wordcount | Word Count ↓ |
• 👌 A correct solution must generate the same results independently of the number of processors used.
• 🚗 Moreover, the program should provide simple functionalities for testing the results, allowing us to quickly compare the results of an execution using one processor with one using P processors.
• 🤺 Finally, it is vital to keep your solution clear, concise, and easily understandable, by providing good code quality and fully describing it in the attached document.

### Game of Life

Open description ↓

Problem Description The Game of Life, also known simply as Life, is a cellular automaton devised by the British mathematician John Horton Conway in 1970. The "game" is a zero-player game, meaning that its evolution is determined by its initial state, requiring no further input. One interacts with the Game of Life by creating an initial configuration and observing how it evolves, or, for advanced "players," by creating patterns with particular properties.
The universe of the Game of Life is an infinite two-dimensional orthogonal grid of square cells, each of which is in one of two possible states, alive or dead, or "populated" or "unpopulated." We can suppose to use a matrix of char, which marks if the cell is alive or dead.
Every cell interacts with its eight neighbors, which are the cells that are horizontally, vertically, or diagonally adjacent. At each step in time, the following transitions occur:

• Any live cell with fewer than 2 live neighbors dies as if caused by underpopulation.
• Any live cell with 2 or 3 live neighbors lives on to the next generation.
• Any live cell with more than 3 live neighbors dies, as if by overpopulation.
• Any dead cell with exactly 3 live neighbors becomes a live cell, as if by reproduction.

The initial pattern constitutes the seed of the system. The first generation is created by applying the above rules simultaneously to every cell in the seed—births and deaths occur simultaneously, and the discrete moment at which this happens is sometimes called a tick (in other words, each generation is a pure function of the preceding one). The rules continue to be applied repeatedly to create further generations.
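The four rules above reduce to a single transition function for one cell. A minimal sketch, where `next_state` is a hypothetical helper name covering the rules only (not the neighbor counting or the grid):

```c
/* Conway's transition rules for one cell: 'alive' is the current state
   (1 = live, 0 = dead), 'neighbors' the number of live cells among the
   eight adjacent ones. Returns the state at the next tick. */
int next_state(int alive, int neighbors) {
    if (alive)
        return (neighbors == 2 || neighbors == 3);  /* survival */
    return (neighbors == 3);                        /* reproduction */
}
```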
Notes
The program must be able to simulate the process for a given number of steps I. The MASTER process initializes a char matrix at random and splits it among P-1 processors. Notice that the MASTER can contribute to the computation or not; it is your choice. Each slave simulates the game and sends the corresponding ghost cells to its neighboring processors, which need them for the next simulation step. The hard part of the problem concerns dividing the matrix equally among the processors, which can be done in different ways. Your program must work for any matrix size (N != M) and any number of processes.

```
CORNER - TOP    - CORNER
LEFT   -   i    - RIGHT
CORNER - BOTTOM - CORNER
```
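The even division mentioned above can be sketched with two hypothetical helpers (`rows_for`, `row_offset`), assuming a row-wise block decomposition where the first N % P ranks take one extra row; each rank would then exchange its first and last local rows (the ghost rows) with its neighbors at every step:

```c
/* Divide N matrix rows among P processes as evenly as possible: the first
   N % P ranks receive one extra row, so local sizes differ by at most 1. */
int rows_for(int N, int P, int rank) {
    return N / P + (rank < N % P ? 1 : 0);
}

/* Index of the first global row owned by 'rank'. */
int row_offset(int N, int P, int rank) {
    int base = N / P, rest = N % P;
    return rank < rest ? rank * (base + 1)
                       : rest * (base + 1) + (rank - rest) * base;
}
```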

### N-Body

Open description ↓

Problem Description In the N-body problem, we need to find the positions and velocities of a collection of interacting particles over some time. For example, an astrophysicist might want to know the positions and velocities of a group of stars, while a chemist might want to know the positions and velocities of a collection of molecules or atoms.
An n-body solver is a program that finds the solution to an n-body problem by simulating the behavior of the particles. The input to the problem is the mass, position, and velocity of each particle at the start of the simulation, and the output is typically the position and velocity of each particle at a sequence of user-specified times, or merely the position and velocity of each particle at the end of a user-specified time.
Problem details are also available here.
Notes Consider only the solution that is quadratic in the number of particles; the Barnes-Hut algorithm can also be considered, but it is harder to develop. For the mathematical part of the problem (bodies force computation), follow the guidelines of this solution: example solution sequential.
The program must be able to simulate the process for a given number of steps I. The MASTER process initializes an array of bodies at random and sends it to the P-1 processors. Notice that the MASTER can contribute to the computation or not; it is your choice. Each slave simulates the force only for its own bodies, and sends the results for its bodies, needed for the next simulation step. The hard part of the problem concerns how to reduce the communication overhead. For the initialization phase, you can consider creating the bodies in a file that all processors read at startup, or using a deterministic algorithm to randomize the bodies' initialization. The important part is that the execution using one or P processors must generate the same output.

### Word Count

Open description ↓

Problem Description The word count is the number of words in a document or passage of text. Word counting may be needed when a text is required to stay within specific numbers of words. This may particularly be the case in academia, legal proceedings, journalism, and advertising. Word count is commonly used by translators to determine the price of a translation job. Word counts may also be used to calculate measures of readability and to measure typing and reading speeds (usually in words per minute). When converting character counts to words, a measure of 5 or 6 characters to a word is generally used for English.
We will be doing a version of map-reduce using MPI to perform word counting over a large number of files.
There are three steps for this process:

• the MASTER node reads the file list (or a directory), which contains the names of all the files to be counted. Note that only one of your processes should read the file list. Then each process should receive its portion of the list from the MASTER process. Once a process has received its list of files, it should read each file and perform a word count, keeping track of the frequency of each word found in the files. We will call the resulting histogram the local histogram. This is similar to the map stage of map-reduce.
• the second phase combines the frequencies of words across processes. For example, the word 'cat' might be counted in multiple processes, and we need to add up all these occurrences. This is similar to the reduce stage of map-reduce.
• the last phase is to have each process send its local histogram to the MASTER process. The MASTER process just needs to gather up all this information. Note that there will be duplicate words between processes. The MASTER should create a CSV-formatted file with the words and frequencies, ordered.

Notes
The hard part of the problem concerns splitting the computation equally among the processors. For instance, if we simply split the files between processors, we do not obtain a good partitioning scheme, because some files may be much bigger than others. A good partitioning scheme must consider the number of words in each file and split according to this value.
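For the totals behind the local histogram, a minimal sketch of a word counter (`count_words` is a hypothetical helper; here a word is a maximal run of alphabetic characters, a definition a real solution may want to refine):

```c
#include <ctype.h>

/* Counts the words in a buffer, where a word is a maximal run of
   alphabetic characters. A real local histogram would also record each
   distinct word's frequency before the reduce/gather phases. */
int count_words(const char *text) {
    int count = 0, in_word = 0;
    for (const char *p = text; *p; p++) {
        if (isalpha((unsigned char)*p)) {
            if (!in_word) { count++; in_word = 1; }
        } else {
            in_word = 0;
        }
    }
    return count;
}
```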

## Give me a project now!

Compute the MD5 of a string representing your name and surname as follows:

```
String x = "$NAME$SURNAME"
xmd5 = md5(x)
```

Your homework matching is computed according to the following rules:

• xmd5[0], the first character of the MD5, determines the homework;
• xmd5[strlen(xmd5)-1], the last character of the MD5, determines the AWS EC2 instance type for your cluster environment.

Character conversion tables (case insensitive):

| Project Character xmd5[0] | Value |
|---------------------------|-------|
| a-g-m-s-y-4-b-h-n-t-z-5 | 1 |
| c-i-o-u-0-6-d-j-p-v-1-7 | 2 |
| e-k-q-w-2-8-f-l-r-x-3-9 | 3 |

↑ List of homeworks ↑

| Instance Type Character xmd5[strlen(xmd5)-1] | Value |
|----------------------------------------------|-------|
| a-g-m-s-y-4 | 1 |
| b-h-n-t-z-5 | 2 |
| c-i-o-u-0-6 | 3 |
| d-j-p-v-1-7 | 4 |
| e-k-q-w-2-8 | 5 |
| f-l-r-x-3-9 | 6 |

↓ List of AWS EC2 instances ↓

Example

My name is Alice Liddell, I am the heroine of the well-known book Alice’s Adventures in Wonderland 🐇.

Homework matching is computed as follows:

```
String x = "aliceliddell"
xmd5 = md5(x) // a5419f04a964fab9ca91288d008c3b64
println(xmd5[0]) // 'a' -> 1
println(xmd5[strlen(xmd5)-1]) // '4' -> 1
```

Alice’s homework matching is solving the game-of-life problem on t2.small instances.
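The two conversion tables can be sketched as a lookup (`instance_value` and `project_value` are hypothetical helper names; note that each project row merges two adjacent rows of the instance table, so the project value follows from the instance value):

```c
#include <string.h>
#include <ctype.h>

/* Maps the last MD5 character to its 1..6 instance-table row, or 0 for
   characters outside the tables (impossible for a hexadecimal digest). */
int instance_value(char c) {
    const char *rows[6] = { "agmsy4", "bhntz5", "ciou06",
                            "djpv17", "ekqw28", "flrx39" };
    c = (char)tolower((unsigned char)c);
    for (int i = 0; i < 6; i++)
        if (strchr(rows[i], c)) return i + 1;
    return 0;
}

/* Each project row covers two instance rows: (1,2)->1, (3,4)->2, (5,6)->3. */
int project_value(char c) {
    int v = instance_value(c);
    return v ? (v - 1) / 2 + 1 : 0;
}
```

For Alice, `project_value('a')` and `instance_value('4')` both give 1, matching the example above.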

↑ Back to Index ↑

Each solution should be a file repository (or directory) 🗄 that contains the C sources, the report file (in both Markdown and PDF), and all images (if needed).

The report file must include:

• 1️⃣ a presentation of the proposed solution;
• 2️⃣ a detailed description of your project structure, and a summary of your code, which highlights the crucial aspects of your solution;
• 3️⃣ a deep analysis of the performance of your program, described in terms of weak and strong scalability;
• 4️⃣ a conclusion section, which analyzes the reasons for the obtained performance.

⚠️ Solutions without the report file, and/or that cannot be easily compiled and executed using only mpicc and mpirun, will not be considered!

Notes about the creation of a tape archive

For archiving all files in the current directory into a gzip-compressed archive:

```
tar -czvf solution.tar.gz *
```

For extracting files from the archive:

```
tar -xzvf solution.tar.gz
```

### Submission instructions

• Each solution must be compliant with the assigned homework problem.
• You can submit a solution via mail:
• Subject: your name and surname, project title, official exam date (ESSE3)
• Body: whatever you want
• 📓 Attachment:
• a compressed directory using a tape archive,
• a private GitHub repository (shared with me, ⭐️ preferred),
• or any other easily accessible method.
• The submission deadline is the day of the official exam on the ESSE3 platform.

↑ Back to Index ↑

# Homework evaluation criteria

Homeworks are evaluated on a scale of 30 total points. The final score is expressed in the following four levels:

| Level | Range |
|-------|-------|
| A 🥇 | [30-28] |
| B 🥈 | [27-25] |
| C 🥉 | [24-22] |
| D | [21-18] |

## How to have a successful homework evaluation?

• 👍 Correctness.
• Measures the student's commitment to developing a solution that is compliant with the problem requirements (obviously!). A solution that partially solves the problem will also be considered, provided that all the crucial aspects of the problem have been addressed. The project will be evaluated using the docker-mpi container; make sure that your program works in this Docker image: `docker run -it --mount src="$(pwd)",target=/home,type=bind spagnuolocarmine/docker-mpi:latest`.
• 10 points.
• 😎 Style.
• Measures the student's commitment to developing a well-styled solution that exploits the features of MPI and the C language, paying attention to the arguments of the fundamental part of the course on parallel and concurrent computing.
• 10 points.
• 🎯 Problem evaluation and Benchmarks.
• Measures the student's commitment to understanding the problem and giving the right solution; moreover, it measures the student's commitment to presenting a deep analysis of the program's performance.
• 10 points.

## Benchmarks TIPS

Present your results in terms of strong and weak scalability:

• Strong Scalability: how the solution time varies with the number of processors for a fixed total problem size. In practice, the total problem size stays the same as the number of processors increases.
• Goal: minimize the time to solution for a given problem.
• Weak Scalability: how the solution time varies with the number of processors for a fixed problem size per processor. In practice, the problem size increases at the same rate as the number of processors, keeping the amount of work per processor constant.
• Goal: solve larger problems.
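For a strong-scaling study, the two metrics usually reported can be sketched as follows (`speedup` and `efficiency` are hypothetical helper names; the times themselves would come, for example, from MPI_Wtime around the simulation loop):

```c
/* Strong-scaling metrics from measured running times: t1 is the time on
   one processor, tp the time on p processors. Ideal speedup is p; ideal
   efficiency is 1.0, and values below 1.0 quantify parallel overhead. */
double speedup(double t1, double tp) {
    return t1 / tp;
}

double efficiency(double t1, double tp, int p) {
    return speedup(t1, tp) / p;
}
```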

## How to benchmark your MPI program on an AWS EC2 cluster?

All information about configuring a cluster of Amazon EC2 instances is available in this repository.

The AWS Educate program supports only some AWS services; all the limitations of an Educate account are described at this link.

Supported Instances

| ID | EC2 Instance name |
|----|-------------------|
| 1️⃣ | t2.small |
| 2️⃣ | t2.large |
| 3️⃣ | t2.xlarge |
| 4️⃣ | t2.2xlarge |
| 5️⃣ | m4.large |
| 6️⃣ | m4.xlarge |

↑ Back to Index ↑

# The last “dance” of your course

💃 Homework Discussion

Finally, after the project has been submitted by mail before the official exam date, you have to schedule an appointment via email to discuss it.