Programmazione Concorrente, Parallela e su Cloud

Università degli Studi di Salerno

GitHub commit activity (branch)

Course description

Part A: Parallel programming with MPI and running on Google Cloud Platform virtual cluster

Part B: Fundamentals on concurrency and parallel systems

The growing demand for more computing power is hard to fulfill. Modern computer architectures permit even more significant problems in various applications by adopting many computing systems, from personal multicore processors to high-performance systems or even co-processors such as GPU. This course provides a comprehensive introduction to parallel computing and concurrency, discussing the model of parallel algorithms and practical issues, such as designing and implementing distributed-memory programs by adopting the MPI standard. Moreover, the course will investigate cloud computing architectures as high-performance computing platforms. In particular, the Google Cloud Platform provider will be described, which will be exploited to create and manage clusters of machines for executing parallel programs written in C using OpenMPI.

Course Organization

What prerequisites are required for this course? (😄)

Prerequisite(s): Porgramming I - Programming and Data Structures - Algorithms - Distributed Programming.

Co-Requisite(s): Good knowledge of the C language and cloud computing fundamentals.

👤 Teacher contacts and appointments

Carmine Spagnuolo, cspagnuolo [@] unisa.it.
The office hours are scheduled after each lesson, or you can request an appointment via email.
Classes: Thursday 12-2 PM, Friday 12-2 PM (always in room P6).
The lessons always start promptly at 12 and finish before 2 PM (generally).

📚 Textbooks

Part 🅰️
- 📗 MPI: A Message-Passing Interface Standard 2.2
- ⚙️ HAVE FUN WITH MPI
- 📘 Introduction to Parallel Computing, Czech, Z., 2017
Part 🅱️
- 📕 The Art of Multiprocessor Programming, Revised Reprint, Maurice Herlihy
  - Errata for The Art of Multiprocessor Programming

👁️ Insights

🔥 GOOGLE GCP ACCOUNT REQUEST ⚠️

You'll receive account information during class sessions. If needed, feel free to reach out to the teacher for assistance (if you cannot attend).

📝 Exam

Group presentation on a research article (grants access to oral examination), more details in Course research Forum. Students from other years or those who have already completed the practical project in MPI will still use the project to gain access to the oral examination.
Oral exam covering all topics discussed in class (to be scheduled via email with the instructor on a date between two exam sessions).

🗓️ Exam dates

Presentation: May 2025
First oral session: June 16, 2025
Second oral session: July 11, 2025
Third oral session: July 28, 2025 (by the end of July)
Fourth oral session: September 10, 2025 (before the start of the first semester courses)

Programmazione Concorrente, Parallela e su Cloud

Course Calendar

📜 LEGEND

Part 🅰️
- 📗 MPI: A Message-Passing Interface Standard 2.2
- ⚙️ HAVE FUN WITH MPI
- 📘 Introduction to Parallel Computing, Czech, Z., 2017
Part 🅱️
- 📕 The Art of Multiprocessor Programming, Revised Reprint, Maurice Herlihy
  - Errata for The Art of Multiprocessor Programming
Symbols
- 📒 Lesson
- ⚗️ Lab

#️⃣	🗓️	Lesson	Type	Part	Materials	References	Assignments
1️⃣	27/02/2025	Course introduction and parallel computing fundamentals	📒	🅰️🅱️	vim graphical cheat sheet	📹 Mythbusters Demo GPU versus CPU About 🐧 OS 📒 An Introduction to Linux 📺 Linux Tutorial for Beginners: Introduction to Linux Operating System 📑 Introduction to Linux, Boston University	🧑🏻‍🚀 activate Google GCP. You will receive a pdf with instruction on your studenti.unisa.it email. BEFORE beginning of June, 1 2025.
2️⃣	28/02/2025	Introduction to HPC Cloud and GCP	📒	🅰️	Intro2GCP
3️⃣	06/03/2025	Model and Architecture for Parallel Computing	📒⚗️	🅰️	shell-tools ssh-essentials	📘 Cap. 2 (2.1, 2.2 with no demonstration)	lab.2 - GCP
4️⃣	07/03/2025	Metrics for parallel computing	📒	🅰️	IntroDocker	📘 Cap. 3 (3.1, 3.2 with no demonstration) 🐳 Docker resource: - Docker beginner lab - Get Docker - Documentation (in-depth material) - gcloud Docker
5️⃣	13/03/2025	Design of parallel algorithms	📒	🅰️		📘 Cap. 4 (4.1, 4.2, 4.3, 4.4 with no demonstration) OpenMPI 🐳 Docker Ubuntu OpenMPI 🆚 code Docker MPI integration 🆘 Usage Docker environment Install OpneMPI	hello_world_mpi.c
6️⃣	14/03/2025	Introduction to Message Passing Interface (MPI)	📒	🅰️		MPI: A Message-Passing Interface Standard Version 3.1 📖 Have fun with MPI in C: - 📌 Chapter 1 - Introduction
7️⃣	20/03/2025	MPI - Synchronous Communication	📒⚗️	🅰️	Find the bug 1	📖 Have fun with MPI in C: - 📌 Chapter 2.1 - MPI Memory Model - 📌 Chapter 2.2 - Blocking Communication - 📌 Chapter 2.3 - Communication Modes
8️⃣	21/03/2025	Lab Message Passing Interface (MPI)	⚗️	🅰️		🔗 MPI on Cloud VM cluster	lab.3 - MPI.1 lab.4 - MPI.2
9️⃣	27/03/2025	MPI - Asynchronous Communication	📒⚗️	🅰️	Find the bug 2	📖 Have fun with MPI in C - 📌 Chapter 2.4 - Non-Blocking Communication	lab.5 - MPI.3
1️⃣0️⃣	28/03/2025	MPI - Noncontiguous Data, Derived Datatypes - Collective Communication	📒	🅰️		📖 Have fun with MPI in C - 📌 Chapter 3.1 - Communicate Noncontiguous Data - 📌 Chapter 3.2 - Derived Datatypes -📌 Chapter 3 - Collective Communication 📓 mpi-derived-datatypes.pdf
1️⃣1️⃣	03/04/2024	MPI Labs - Noncontiguous Data, Derived Datatypes - Collective Communication	⚗️	🅰️	mpi_latency.c		lab.6 - MPI.4 lab.7 - MPI.5
1️⃣2️⃣	04/04/2024	Introduction to concurrency	📒	🅱️		📕 Cap. 1, Appendice B
1️⃣3️⃣	10/04/2025	Mutual exclusion 1	📒	🅱️		📕 Cap. 2 (2.1->2.6)
1️⃣4️⃣	11/04/2025	Scalability Research Forum	⚗️	🅰️			lab.8 - MPI.6
1️⃣5️⃣	24/04/2025	Mutual exclusion 2	📒	🅱️		📕 Cap. 2 (2.7->2.8)
1️⃣6️⃣	08/05/2025	Concurrent Objects	📒	🅱️	Fix Double-Checking Locking	📕 Cap. 3
1️⃣7️⃣	09/05/2025	Monitor	📒	🅱️		Cap. 8
1️⃣8️⃣	15/05/2025	Spin Locks	📒	🅱️		📕 Cap. 7
1️⃣9️⃣	16/05/2025	Lists	📒	🅱️		📕 Cap. 9
2️⃣0️⃣	22/05/2025	Queues	📒	🅱️		📕 Cap. 10
2️⃣1️⃣	23/05/2025	Lab MPI + Q&A Presentazioni Scalability Research Forum	⚗️	🅰️🅱️
2️⃣2️⃣	29/05/2025 (4 ore)	Scalability Research Forum 1️⃣
2️⃣3️⃣	30/05/2025 (4 ore)	Scalability Research Forum 2️⃣

⚗️ Labs

Programmazione Concorrente, Parallela e su Cloud

Lab assignment 1

1️⃣ Registrazione a GCP

Accedere al link all'interno della mail ricevuta dal professore. Inserire all'interno della pagina il proprio nome e l'email istituzionale studenti.unisa.it e premere il tasto Submit;
accedere al link di verifica all'interno della mail ricevuta da Google sul proprio account istituzionale ed effettuare la verifica. Verrà inviata un'altra mail sul proprio account contenente un codice coupon e un link per riscattarlo, tuttavia, non è possibile riscattare il coupon sull'account Google istituzionale per problemi tecnici. Salvare il codice coupon;
utilizzare il link https://cloud.google.com/ per accedere con un account Google personale usando il bottone in alto a destra;
utilizzare il link https://console.cloud.google.com/education?authuser=0 per accedere alla pagina dove riscattare il proprio codice. NOTA BENE: l'ultimo numero indicato nel link deve corrispondere all'account Google PERSONALE e non a quello istituzionale. È possibile controllare il numero associato ad ogni account Google controllando il link dopo aver effettuato l'accesso con l'account personale (passo 3); in caso in cui non si riesca ad accedere al link indicato, disconnettere tutti gli account Google, riavviare il browser, effettuare l'accesso SOLO CON L'ACCOUNT PERSONALE e riprovare;
inserire il codice ricevuto tramite mail (passo 2) nel campo Codice Coupon, accettare e proseguire. La pagina re-indirizzerà automaticamente alla dashboard Fatturazione dove saranno visualizzati i 50$ a disposizione.

2️⃣ Creazione di una Macchina Virtuale tramite Console Web

Accedere alla console GCP utilizzando l'account Google personale. Accedere al sito https://cloud.google.com/, se necessario, accedere con l'account personale e cliccare sul bottone Console in alto a destra;
utilizzare la barra di ricerca in alto per accedere al servizio VM Instances. In caso venga richiesto di selezionare un account di fatturazione, selezionare l'Account Studente;
SOLO AL PRIMO ACCESSO AL SERVIZIO cliccare il tasto Abilita e attendere l'attivazione. La pagina re-indirizzerà automaticamente alla dashboard di Compute Engine;
SOLO PER LA PRIMA CREAZIONE le istanze GCP richiedono di fornire una propria chiave ssh al momento della creazione per l'accesso che potrà essere riutilizzata ad ogni creazione di istanza successiva. Le istruzioni per creare una nuova chiave sulla propria macchina sono disponibili nella documentazione ufficiale di GCP https://cloud.google.com/compute/docs/connect/create-ssh-keys#create_an_ssh_key_pair/ ;
nella dashboard di Compute Engine utilizzare il bottone Crea istanza per creare una nuova macchina virtuale;
nel campo Regione selezionare una delle seguenti opzioni in modo da usufruire del piano gratuito:
- Oregon: us-west1
- Iowa: us-central1
- South Carolina: us-east1
nel campo Tipo di macchina selezionare e2-micro;
nel campo Disco di avvio, utilizzare il tasto Cambia per selezionare come Sistema operativo Ubuntu con Versione Ubuntu 18.04 LTS;
nel campo Firewall spuntare entrambe le opzioni;
aprire la sezione NETWORKING, DISCHI, SICUREZZA, GESTIONE, SINGLE-TENANCY";
aprire la sezione Sicurezza - Gestisci accesso e usare il bottone Aggiungi elemento per inserire la chiave ssh generata (passo 4). Inserire nel campo Chiave SSH 1 il contenuto del file .pub. N.B. L'ultima parte del file contiene l'username da utilizzare per accedere all'istanza tramite ssh;
aprire la sezione Gestione e selezionare Spot nel campo Modello di provisioning delle VM;
Creare l'istanza usando il tasto Crea e attendere per l'avvio. Una volta che l'istanza è in esecuzione utilizzare il protocollo ssh per accedere utilizzando l'IP esterno. NB Per accedere all'istanza è necessaria la chiave privata (passo 4) ovvero il file senza l'estensione .pub. Il comando per l'accesso deve avere la seguente sintassi: ssh -i ssh-key-private-file-name sshkey-username@99.999.9.99; Ulteriori informazioni riguardo la connessione ad una macchina virtuale sono disponibili nella documentazione ufficiale di GCP https://cloud.google.com/compute/docs/instances/connecting-advanced#thirdpartytools/. ATTENZIONE: Una volta terminato di lavorare sulle macchine virtuali è importante ricordarsi di eliminarle per evitare di esaurire il credito a disposizione. Per farlo tramite la dashboard di Compute Engine selezionare l'istanza da eliminare, cliccare sui tre puntini in alto e selezionare Elimina. Installazione MPI L'installazione di MPI su un'istanza creata in questo modo è disponibile sul repository Github https://github.com/spagnuolocarmine/ubuntu-openmpi-openmp/.

3️⃣ Installazione e configurazione Google gcloud

4️⃣ Creazione di una Home Page personale

Avviare una instanza e2-micro seguendo la procedura 2 Creazione di una macchina virtuale da Console Web;
collegarsi tramite ssh all'istanza ed effettuare gli aggiornamenti $ sudo apt-get update && sudo apt-get upgrade -y;
installare Apache Web Server: $ sudo apt-get install apache2 -y;
modificare il file index.html in /var/www/html con una messaggio di esempio;
connettersi da browser utilizzando l'IP pubblico dell'istanza e verificare che il contenuto della pagina corrisponda;
NOTA BENE: se non è possibile connettersi all'istanza in questo modo verificare le impostazioni nel campo Firewall usate durante la creazione della macchina;
eliminare l'instanza.

Ripetere l'esercizio utilizzando la Web console.

5️⃣ Creazione di un cluster di instanze e2-micro

sshkey

Creare 4 istanze e2.micro;
avviare una nuova sessione remota tramite ssh per ogni macchina creata ed effettuare gli aggiornamenti $ sudo apt-get update && sudo apt-get upgrade -y;
su ogni istanza creare un nuovo utente con username pcpc: $ sudo useradd -s /bin/bash -d /home/pcpc/ -m -G sudo pcpc ed impostare la password: $ sudo passwd pcpc;
su ogni istanza accedere all'account pcpc: $ sudo login pcpc
su ogni istanza modificare il file di configarezione di ssh: $ sudo vim /etc/ssh/sshd_config e cambiare l'opzione PasswordAuthentication come segue: PasswordAuthentication yes
su ogni istanza riavviare il servizio ssh: $ sudo service ssh restart
scegliere un'istanza da usare come MASTER e generare una coppia di chiavi ssh: $ ssh-keygen -t rsa;
sull'istanza MASTER aggiungere la chiave pubblica appena generata alle authorized_keys: $ cat ~/.ssh/id_rsa.pub >> ~/.ssh/authorized_keys;
sull'istanza MASTER aggiungere la coppia di chiavi ssh ad ogni istanza alle authorized_keys: $ cat ~/.ssh/id_rsa.pub | ssh pcpc@IP_INSTANCE "mkdir -p ~/.ssh && chmod 700 ~/.ssh && cat >> ~/.ssh/authorized_keys";
su ogni istanza verificare di riuscire ad avviare una nuova sessione remota tramite ssh da ogni istanza verso le altre senza utilizzare nome utente e password: $ ssh IP;

Programmazione Concorrente, Parallela e su Cloud

Lab assignment 2

🔗 MPI on Cloud VM cluster
Set up a cluster comprising 2 (or more) MPI nodes and run the MPI hello world programs:

Programmazione Concorrente, Parallela e su Cloud

Lab assignment 3 - MPI Point-to-Point Communication

Develop an MPI program in C for the following problems, using only the MPI_Send and MPI_Recv operations:

Exchange of an integer value between two MPI processes.
Sending a string (read from stdin) from the process with rank 0 to the process with rank 1.
Given P MPI processes and an array of integer values with length N, perform the following operations:
- Broadcasting, the process with rank 0 sends to all processes 1..P-1;
- Gathering, the process with rank 0 receives an integer value from all processes 1...P-1;
- Scatter, the process with rank 0 sends a portion of the array to each process in 1...P-1.
Starting from the previous point, generalize the programs to obtain a library mycollective. There are no constraints on the function signatures and/or computational requirements.
Evaluate the performance of the implemented library in sending the data types MPI_Int and MPI_Char, calculating the execution times on a single machine varying the size of N and the number of MPI processes.

How to calculate the execution time?

/*................*/
double start, end;
MPI_Init(&argc, &argv);
MPI_Comm_rank(MPI_COMM_WORLD, &rank);
MPI_Barrier(MPI_COMM_WORLD); /* tutti i processi sono inizializzati */
start = MPI_Wtime();
/*----------------*/
/*................*/
/*codice programma*/
/*................*/
/*----------------*/
MPI_Barrier(MPI_COMM_WORLD); /* tutti i processi hanno terminato */
end = MPI_Wtime();
MPI_Finalize();
if (rank == 0) { /* Master node scrive su stdout il tempo o su file */
    printf("Time in ms = %f\n", end-start);
}
/*................*/

Note that the rank values refer to the processes' indices obtained from the MPI_COMM_WORLD communicator.

Programmazione Concorrente, Parallela e su Cloud

Lab assignment 3 - MPI Point-to-Point Communication

Develop an MPI program in C for the following problems:

Ring: Given P processes, the process with rank i sends an integer value to the process with rank i+1. Note that the communication pattern is circular and toroidal, so the process with rank P-1 sends to the process with rank 0. The program execution involves 10 iterations. In each iteration, processes increment the value received from the left neighbor by a pseudo-random integer between 0-100. A particular iteration ends when the value received by a process exceeds a certain threshold S provided as input to the program. At the end of the 10 iterations, the program writes to standard output (rank 0) the average number of communication rounds needed to converge using P and S. Note that random number generators should not be reinitialized between iterations. It is recommended to initialize the generators using the rank value.
Calculate the maximum and minimum of arrays of integer values using P processes and blocking and non blocking communication operations.
Develop the Reduce functionality in the mycollective library, capable of supporting the maximum and minimum operators of an array of integers.
Modify the mycollective library to implement gather, scatter, broadcast, and reduce operations in non-blocking mode. Note that these operations involve multiple processes; therefore, the library should also provide a function to wait for an array of MPI_Request, utilizing the operations offered by MPI.

Furthermore, note that the rank values refer to the processes' indices obtained from the MPI_COMM_WORLD communicator.

Programmazione Concorrente, Parallela e su Cloud

Lab assignment 4 - MPI Non-blocking communication

Develop an MPI program in C for the following problems:

Ring 2: Given P processors, develop an MPI program where each processor sends its rank to its right neighbor in the order defined by MPI_COMM_WORLD and receives from the left neighbor. The communication of the rank should occur in a non-blocking manner. Additionally, each process forwards the messages it receives from the left neighbor and sums all the values, including its own, into a variable. The process terminates when each process has received the rank of all others and writes the calculated sum to the standard output.
Develop the same program but utilizing different communication modes.
Evaluate the performance of the programs on an MPI cluster varying the number of virtual instance from 2 to 16 nodes.
Execute and evaluate the performance of all programs developed in previous labs on the same cluster.

Furthermore, note that the rank values refer to the processes' indices obtained from the MPI_COMM_WORLD communicator.

Programmazione Concorrente, Parallela e su Cloud

Lab assignment 5 - MPI Collective communication

Develop an MPI program in C for the following problems:

Broadcasting: Develop an MPI program utilizing P processors, where the process with rank 0 in MPI_COMM_WORLD broadcasts an array of doubles to all other processes in MPI_COMM_WORLD.
Scattering: Develop an MPI program utilizing P processors, where the process with rank 0 in MPI_COMM_WORLD scatters an array of doubles among all processes in MPI_COMM_WORLD.
Gathering: Develop an MPI program utilizing P processors, where the process with rank 0 in MPI_COMM_WORLD gathers a set of double values distributed among all processes in MPI_COMM_WORLD.
Evaluate the performance of the previous programs in handling large-sized arrays of doubles relative to the number of available processors.
Implement points 1, 2, and 3 using the mycollective library and assess the performance compared to MPI collective operations.
Evaluate the performance of all previously developed programs on an MPI cluster composed of 8 nodes.
Develop an MPI program that, given an array A of integers of length N, evenly utilizes P processors to update the values in A. Each element A[i] is calculated using the following operation:
- A[i] = A[i-1] + A[i] + A[i+1], for each i, 1...N-2
- A[0] = A[N-1] + A[0] + A[1], for i=0
- A[N-1] = A[N-2] + A[N-1] + A[0], for i=N-1 The array A is initialized in the master process, and the slaves perform operations only on their portion of the array. Each slave process sends its portion of the array back to the master; upon receiving all portions, the master process writes the execution time to standard output.
Develop an MPI program that, given a matrix of size NxM and P processes, calculates the maximum for each row of the matrix using P processes fairly. Upon completion, the master process writes the maximum of each row to standard output.
Repeat the previous point by calculating the minimum for each column.
Combine points 8 and 9 into a single program.

Furthermore, note that the rank values refer to the processes' indices obtained from the MPI_COMM_WORLD communicator.

Programmazione Concorrente, Parallela e su Cloud

Lab assignment 6 - MPI putting all together

Develop an MPI program in C for the following problems:

Game of Life

The Game of Life, also known simply as Life, is a cellular automaton devised by the British mathematician John Horton Conway in 1970.

The "game" is a zero-player game, meaning that its evolution is determined by its initial state, requiring no further input. One interacts with the Game of Life by creating an initial configuration and observing how it evolves, or, for advanced "players," by creating patterns with particular properties.

The universe of the Game of Life is an infinite two-dimensional orthogonal grid of square cells, each of which is in one of two possible states, alive or dead, or "populated" or "unpopulated." We can suppose to use a matrix of char, which marks if the cell is alive or dead.

Every cell interacts with its eight neighbors, which are the cells that are horizontally, vertically, or diagonally adjacent. At each step in time, the following transitions occur:

Any live cell with fewer than 2 live neighbors dies as if caused by underpopulation.
Any live cell with 2 or 3 live neighbors lives on to the next generation.
Any live cell with more than 3 live neighbors dies, as if by overpopulation.
Any dead cell with exactly 3 live neighbors becomes a live cell, as if by reproduction.

The initial pattern constitutes the seed of the system. The first generation is created by applying the above rules simultaneously to every cell in the seed—births and deaths occur simultaneously, and the discrete moment at which this happens is sometimes called a tick (in other words, each generation is a pure function of the preceding one). The rules continue to be applied repeatedly to create further generations.

🗒️ The program must be able to simulate the process for a particular number of steps I. The MASTER process initializes a char matrix at random and split among P-1 processors. Notice that the MASTER can contribute to the computation or not; it is your choice. Each slave simulates the game and sends the corresponding ghost cells to its neighbor's processors, need in for the next simulation step. The hard part of the problem concerns equally divide the matrix among processors, which can be done in different ways. Your program must work on any matrix size (N!=M) and a number of processes.

N-body

In the N-body problem, we need to find the positions and velocities of a collection of interacting particles over some time. For example, an astrophysicist might want to know the positions and velocities of a group of stars. In contrast, a chemist might want to know the positions and velocities of a collection of molecules or atoms.

An n-body solver is a program that finds the solution to an n-body problem by simulating the behavior of the particles. The input to the problem is the mass, position, and velocity of each particle at the start of the simulation, and the output is typically the position and velocity of each particle at a sequence of user-specified times, or merely the position and velocity of each particle at the end of a user-specified time.

Problem details are also available here.

👉 NOTES. Consider only the solution that is quadratic in the number of particles; however, also the Barnes-Hut algorithm can be considered but should be harder to develop. For the mathematical part (bodies force computation) of the problem, follow the guidelines of this solution:

Show code

#include <math.h>
#include <stdio.h>
#include <stdlib.h>
#include "timer.h"

#define SOFTENING 1e-9f

typedef struct { float x, y, z, vx, vy, vz; } Body;

void randomizeBodies(float *data, int n) {
  for (int i = 0; i < n; i++) {
    data[i] = 2.0f * (rand() / (float)RAND_MAX) - 1.0f;
  }
}

void bodyForce(Body *p, float dt, int n) {
  for (int i = 0; i < n; i++) { 
    float Fx = 0.0f; float Fy = 0.0f; float Fz = 0.0f;

    for (int j = 0; j < n; j++) {
      float dx = p[j].x - p[i].x;
      float dy = p[j].y - p[i].y;
      float dz = p[j].z - p[i].z;
      float distSqr = dx*dx + dy*dy + dz*dz + SOFTENING;
      float invDist = 1.0f / sqrtf(distSqr);
      float invDist3 = invDist * invDist * invDist;

      Fx += dx * invDist3; Fy += dy * invDist3; Fz += dz * invDist3;
    }

    p[i].vx += dt*Fx; p[i].vy += dt*Fy; p[i].vz += dt*Fz;
  }
}

int main(const int argc, const char** argv) {
  
  int nBodies = 30000;
  if (argc > 1) nBodies = atoi(argv[1]);

  const float dt = 0.01f; // time step
  const int nIters = 10;  // simulation iterations

  int bytes = nBodies*sizeof(Body);
  float *buf = (float*)malloc(bytes);
  Body *p = (Body*)buf;

  randomizeBodies(buf, 6*nBodies); // Init pos / vel data

  double totalTime = 0.0;

  for (int iter = 1; iter <= nIters; iter++) {
    StartTimer();

    bodyForce(p, dt, nBodies); // compute interbody forces

    for (int i = 0 ; i < nBodies; i++) { // integrate position
      p[i].x += p[i].vx*dt;
      p[i].y += p[i].vy*dt;
      p[i].z += p[i].vz*dt;
    }

    const double tElapsed = GetTimer() / 1000.0;
    if (iter > 1) { // First iter is warm up
      totalTime += tElapsed; 
    }
#ifndef SHMOO
    printf("Iteration %d: %.3f seconds\n", iter, tElapsed);
#endif
  }
  double avgTime = totalTime / (double)(nIters-1); 

#ifdef SHMOO
  printf("%d, %0.3f\n", nBodies, 1e-9 * nBodies * nBodies / avgTime);
#else
  printf("Average rate for iterations 2 through %d: %.3f +- %.3f steps per second.\n",
         nIters, rate);
  printf("%d Bodies: average %0.3f Billion Interactions / second\n", nBodies, 1e-9 * nBodies * nBodies / avgTime);
#endif
  free(buf);
}

Word Count

The word count is the number of words in a document or passage of text. Word counting may be needed when a text is required to stay within specific numbers of words. This may particularly be the case in academia, legal proceedings, journalism, and advertising.

Word count is commonly used by translators to determine the price of a translation job. Word counts may also be used to calculate measures of readability and to measure typing and reading speeds (usually in words per minute). When converting character counts to words, a measure of 5 or 6 characters to a word is generally used for English

We will be doing a version of map-reduce using MPI to perform word counting over a large number of files

💡 There are three steps for this process:

the MASTER node reads the file list (or a directory), which will contain the names of all the files that are to be counted. Note that only 1 of your processes should read the files list. Then each of the processes should receive their portion of the files from the MASTER process. Once a process has received its list of files to process, it should then read in each of the files and perform a word counting, keeping track of the frequency each word found in the files occurs. We will call the histogram produced the local histogram. This is similar to the map stage or map-reduce.
the second phase is combining the frequencies of words across processes. For example, the word 'cat' might be counted in multiple processes, and we need to add up all these occurrences. This is similar to the reduced stage of map-reduce.
the last phase is to have each of the processes send their local histograms to the master process. The MASTER process just needs to gather up all this information. Note that there will be duplicate words between processes. The master should create a CSV formatted file with the words and frequencies ordered.

👉🏻 NOTES. The hard part of the problem concerns splitting the computation among the processors. For instance, if we split the files between processors, we cannot have a good partitioning scheme because some files must be bigger and bigger than other files. A good partitioning scheme must consider the number of words in each file, and split according to this value.

Programmazione Concorrente, Parallela e su Cloud & Serverless classes

📢 Scalability Research Forum (SRF)

Welcome to our Scalability Research Forum for the PCPC and Serverless courses!

In this engaging event, students are divided into groups and have the opportunity to present research articles on a specific topic related to parallel computing, distributed computing, or cloud computing. Each group carefully selects and analyzes a relevant article, leading to in-depth and stimulating discussions. The Research Forum provides a dynamic space to explore the latest scientific discoveries and promote knowledge sharing among participants and is a required task to access the oral examination.

Editions

2024
2025

SRF - Scalability Research Forum

1st Edition 2️⃣0️⃣2️⃣4️⃣

📚 Surveys articles

Groups participation (reserved to students which are attending classes in 2024)

Within April 29th, communicate using this form:

the name of the group;
the participants (1, 2 or 3);
the list of 3️⃣ papers in the references of the survey proposed, in order of preference;
preferred timeslot in the day (only if you are attending other classes on those days);
classes PCPC, Serverless, or both.

Presentation guidelines

The group must be homogeneous: all attendees must attend Serverless OR Concurrent Programming OR both.
Presentation for:
- One class students (PCPC or Serverless): 2️⃣0️⃣ minutes presentation ➕ 1️⃣0️⃣ minutes of questions.
- Two classes students (PCPC and Serverless): 3️⃣0️⃣ minutes presentation ➕ 1️⃣0️⃣ minutes of questions.

🗓️ Important dates and 🏆 Grading

The presentations will take place at the Scalability Research Forum (SRF), which will be held for the first time on May 30th, 31st, and June 7th. Presentations must be in English for the Serverless students and may be in English (preferably) for the Concurrent Programming course (if you wish, questions/answers may be conducted in Italian).

The participants will receive a gadget at the end of the SRF (on the 7th), where the 3 best presentations will be awarded 🎁. Anyway, the presentations, for the purpose of the exam, are graded pass/fail.

May 30th 2024, P6, 9:00 14:00 (Presentations)
May 31st 2024, P6, 9:00 14:00 (Presentations)
June 7th 2024, P6, 9:00 14:00 (Final day with presentations and awards)

The calendar is now available.

Exam for non attending students

The SRF is reserved for students currently actively attending the classes. The other students will give the presentations on the day of every scheduled exam, respectively, for Serverless and Concurrent programming (obviously with no gadgets! 😄).

🗓️ Schedule

May 30th

Califano-DeMaio, PipeDream: Generalized Pipeline Parallelism for DNN Training - https://doi.org/10.1145/3341301.3359646
Gianluigi Memoli, Dynamic Aggregation and Scheduling in CoAP/Observe-Based Wireless Sensor Networks - http://dx.doi.org/10.1109/JIOT.2016.2517120
Penna, FlexPS: Flexible Parallelism Control in Parameter Server Architecture - https://doi.org/10.1145/3187009.3177734
LIBPAKIOT, Internet of things: Architectures, protocols, and applications - https://doi.org/10.1155/2017/9324035
Gruppo Federated Learning, Federated Learning: Strategies for Improving Communication Efficiency - https://arxiv.org/abs/1610.05492
Mutual Inclusion, Neugraph: parallel deep neural network computation on large graphs - https://dl.acm.org/doi/10.5555/3358807.3358845
Pizza Team, A Survey of Communication Protocols for Internet of Things and Related Challenges of Fog and Cloud Computing Integration - https://doi.org/10.1145/3292674
Data Dream Team, A Performance Evaluation of Federated Learning Algorithms- https://doi.org/10.1145/3292674

May 31th

Melkia, Middlewarefor IoT-Cloud Integration Across Application Domains - https://doi.org/10.1109/MDAT.2014.2314602
UniSec, Lucky thirteen: Breaking the TLS and DTLS record protocols - https://doi.org/10.1109/SP.2013.42
DiPasqualeMonzillo, Complex Network Analysis using Parallel Approximate Motif Counting - https://doi.org/10.1109/IPDPS.2014.50
Me, Myself and I, SeBS: A Serverless Benchmark Suite for Function-as-a-Service Computing - https://doi.org/10.1145/3464298.3476133
Vitale-Cerciello, Multi-column deep neural network for traffic sign classification - https://doi.org/10.1016/j.neunet.2012.02.023
GarofaloAdinolfiArdovino, Parallel hypergraph partitioning for scientific computing - https://doi.org/10.1109/IPDPS.2006.1639359
The Solo Journey, Web Performance Evaluation for Internet of Things Applications - https://doi.org/10.1109/ACCESS.2016.2615181
Gruppo Leone, Debunking the 100X GPU vs. CPU myth: an evaluation of throughput computing on CPU and GPU - https://doi.org/10.1145/1815961.1816021

June 7th

Group 1.2.3 (Final), Authentication for the web of things: Secure end-to-end authentication between CoAP and HTTP - https://doi.org/10.1109/PIMRC.2017.8292352
GNU/Kefir, ChainerMN: Scalable Distributed Deep Learning Framework - https://doi.org/10.48550/arXiv.1710.11351
Bilovus, Performance evaluation of Websocket protocol for implementation of full-duplex web streams - https://doi.org/10.1109/MIPRO.2014.6859715
YM, Fog computing and its role in the internet of things - http://dx.doi.org/10.1145/2342509.2342513
Gioacchino Tortorelli, Active Access: A Mechanism for High-Performance Distributed Data-Centric Computations - https://doi.org/10.1145/2751205.2751219
Santangelo, Horovod: fast and easy distributed deep learning in TensorFlow - https://arxiv.org/abs/1802.05799
iRagazzi, Chimera: Efficiently Training Large-Scale Neural Networks with Bidirectional Pipelines - https://doi.org/10.1145/3458817.3476145
Nuvola, A Disruption-Tolerant RESTful Support for the Web of Things - https://doi.org/10.1109/FiCloud.2016.11

Lorenzo&Lorenzo, Middleware solutions in WSN: The IoT oriented approach in the ICSI project - https://doi.org/10.1109/SoftCOM.2013.6671886
The New Revolution Cloud Ranger, The importance of a standard security architecture for SOA-based iot middleware - https://doi.org/10.1109/MCOM.2015.7355580
Solo(Serverless), Performance analysis of communication protocols for internet of things platforms - http://dx.doi.org/10.1109/ColComCon.2017.8088198
Taranum, Choice of effective messaging protocols for IoT systems: MQTT, CoAP, AMQP and HTTP - http://dx.doi.org/10.1109/SysEng.2017.8088251
UNISArverless, Communication-avoiding parallel minimum cuts and connected components - https://doi.org/10.1145/3178487.3178504
MegaBeat, Communication-Efficient Jaccard similarity for High-Performance Distributed Genome Comparisons - https://doi.ieeecomputersociety.org/10.1109/IPDPS47924.2020.00118

SRF - Scalability Research Forum

2st Edition 2️⃣0️⃣2️⃣5️⃣

Groups participation (reserved to students which are attending classes in 2025)

By April 16th, please fill out this form with the following information:

The name of your group;
The participants (1, 2, or 3 members);
Add your group name next to the selected paper in the article list;
Your preferred timeslot during the day (only if you have other classes at that time).

Presentation Guidelines

Groups must be homogeneous: all members must be attending Serverless, Concurrent Programming, or both.
Presentation duration:
- For students attending one course only (either Concurrent Programming or Serverless): 20 minutes presentation ➕ questions;
- For students attending both courses: 30 minutes presentation ➕ questions.

🗓️ Important Dates and 🏆 Grading

The presentations will take place during the Scalability Research Forum (SRF), which will be held for the first time on May 29th and 30th, 2025.
Presentations must be in English for Serverless students, and preferably in English for Concurrent Programming students (questions and answers may be conducted in Italian, if preferred).
Presentation grading (for exam purposes) is pass/fail.

Time slots:

May 29th, 2025 — P6, 09:00–14:00 (Presentations)
May 29th, 2025 — ISISLab, 14:00–18:00 (Presentations)
May 30th, 2025 — P6, 09:00–14:00 (Presentations)
May 30th, 2025 — Lab Sammet, 14:00–18:00(Presentations)

The calendar is online (20/05/2025).

Please upload your presentation slides to the SRF folder in PDF using your group name as name for the file (GROUP-NAME.pdf).

Exam for non attending students

Contribute to 3Dify research

Complete the survey here to contribute to the 3Dify research project.

Keyboard shortcuts