Thesis etd-10232024-213804

Thesis type

Ordinary Course, Second Level (Corso Ordinario Secondo Livello)

Author

MANGIACAPRE, MARCO LUCIO

URN

etd-10232024-213804

Title

Software-based fault-tolerance with OpenMP

Academic unit

Class of Experimental Sciences

Degree program

ENGINEERING - ENGINEERING

Committee

Tutor Prof. CASTOLDI, PIERO
Supervisor ROYUELA ALCAZAR, SARA
Supervisor Prof. CUCINOTTA, TOMMASO
Chair Prof. BOGONI, ANTONELLA
Member Dr. CREA, SIMONA
Member Prof. ABENI, LUCA
Member Prof. ANDREUSSI, TOMMASO
Member Prof. AVIZZANO, CARLO ALBERTO
Member Prof. MICERA, SILVESTRO
Member Prof. ODDO, CALOGERO MARIA
Member Prof. RICOTTI, LEONARDO

Keywords

checkpointing
error recovery
OMP
OMP tasking
OpenMP
radiation resilience
task
Xilinx
Xilinx zcu102

Exam session start date

9 December 2024

Availability

partial

Abstract

Parallel programming has steadily grown in importance over the last decades, as the number of cores available even in consumer CPUs has increased significantly. In recent years, this growth in compute power has also reached devices that were previously characterized by extremely limited capabilities: multi-core embedded devices with multiple accelerators are available today. The combination of multi-core hardware and embedded systems has introduced new reliability concerns in software development, since parallel programming is notorious for the variety and complexity of its possible failures and bugs. In such an error-prone environment, programming models like OpenMP are an excellent way to avoid writing thread-management code by hand, and thus to avoid many common bugs. However, simplifying development is not sufficient in extreme environments where radiation-induced bit errors can occur at random at any point, compromising the correctness of mathematically correct algorithms. Issues of this kind have normally been handled with special hardware capable of detecting anomalies and redoing operations automatically, but such hardware is extremely expensive, which prevents its use in many contexts. In this research work, a new approach has been developed that tries to tackle these problems, at least partially, in software, aiming to allow the use of parallel consumer hardware even in outer space by relying on a verify-or-repeat approach.

Files

File name	Size
1 file is restricted at the author's request. Contact the author.

DTA

Archivio Digitale delle Tesi e degli elaborati finali elettronici (Digital Archive of Electronic Theses and Final Papers)
