Thesis etd-11202020-131631
Thesis type
Doctorate
Author
SEYOUM, BIRUK
Email address
forbiruk@gmail.com
URN
etd-11202020-131631
Title
Design methods for partially reconfigurable FPGA-based SoCs and their applications for accelerating deep neural networks
Scientific disciplinary sector
INF/01
Study programme
Istituto di Tecnologie della Comunicazione, dell'Informazione e della Percezione - PH.D. PROGRAMME IN EMERGING DIGITAL TECHNOLOGIES (EDT)
Committee
Supervisor Prof. BUTTAZZO, GIORGIO CARLO
Member Prof. CILARDO, ALESSANDRO
Chair Prof. CUCINOTTA, TOMMASO
Member Prof. CARLONI, LUCA
Supervisor Prof. MITRA, TULIKA
Keywords
- Dynamic Partial Reconfiguration
- FPGAs
- Reconfigurable computing
Defense date
19/07/2021
Availability
Full
Abstract
The ever-growing demand for energy- and power-efficient computation without compromising performance has caused a paradigm shift in system design towards complex heterogeneous systems, which integrate general-purpose multi-core and many-core processors with specialized, power-efficient hardware accelerators on a single chip. One significant outcome of this shift towards heterogeneous architectures is the integration of general-purpose processor cores with programmable logic inside FPGA-based SoCs. These platforms combine the software programmability of processors with the hardware programmability of FPGAs, enabling the design of systems with a high level of flexibility, performance, and scalability.
The paradigm shift in system design has also opened the gate for specialized hardware accelerators to be increasingly employed to achieve efficient performance across a wide range of applications. Accordingly, exploring efficient ways to implement hardware accelerators on FPGA-based SoCs has become a major research theme. In the meantime, the programmable logic on FPGAs has evolved from being simply re-programmable, to dynamically reconfigurable, to dynamically partially reconfigurable. Dynamic partial reconfiguration (DPR) of FPGAs refers to the capability of modifying a portion of the logic blocks on the FPGA with partial bitstreams, while the remaining logic stays operational. Under DPR, hardware modules can be reconfigured on the same logic area of the FPGA in a time-multiplexed manner, thereby reducing the size, cost, and dynamic power consumption of the target FPGA. Employing DPR in a design is also a great way to increase the modularity and flexibility of the system, as it allows logic functions to be swapped on the fly without reconfiguring the entire FPGA. Furthermore, by virtue of its flexibility, DPR makes FPGA-based SoCs an ideal platform for building truly self-adaptive systems.
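As a concrete illustration of the DPR mechanism described above, the following minimal sketch loads partial bitstreams at run time on a Linux-based Xilinx FPGA SoC through the FPGA Manager sysfs interface. The sysfs paths follow the standard Linux FPGA Manager layout, while the bitstream file names and the two-accelerator scenario are hypothetical and only illustrate time-multiplexing a single reconfigurable region; this is not the tool flow developed in the thesis.

```python
# Minimal sketch, assuming a Xilinx Zynq-style SoC running Linux with the FPGA
# Manager framework exposed at /sys/class/fpga_manager/fpga0 and the partial
# bitstreams already copied to /lib/firmware. File names are hypothetical.
import os

FPGA_MGR = "/sys/class/fpga_manager/fpga0"

def load_partial_bitstream(bitstream_name: str) -> None:
    """Reconfigure one region of the programmable logic while the rest keeps running."""
    # Flag bit 0 marks a partial bitstream: only the targeted region is rewritten.
    with open(os.path.join(FPGA_MGR, "flags"), "w") as f:
        f.write("1")
    # Writing the firmware file name (relative to /lib/firmware) triggers reconfiguration.
    with open(os.path.join(FPGA_MGR, "firmware"), "w") as f:
        f.write(bitstream_name)
    # The manager state reports "operating" once the partial bitstream has been loaded.
    with open(os.path.join(FPGA_MGR, "state")) as f:
        print("FPGA manager state:", f.read().strip())

# Time-multiplex two hypothetical accelerators on the same reconfigurable region.
load_partial_bitstream("region0_fir_filter.bin")
load_partial_bitstream("region0_fft.bin")   # swapped in without a full FPGA reconfiguration
```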
Despite its benefits, DPR-based hardware acceleration is not widely explored, mainly because of the tediousness and complexity of the existing DPR design flow. Moreover, considerable manual effort and expertise with the technology and tools are required to carry out the full design flow, since existing commercial DPR design tools automate only some of the individual design steps. While DPR can improve the performance of many applications across a wide range of domains, in practice its use has mostly been limited to applications requiring fault tolerance, adaptability, and reliability. The lack of DPR-based designs in other areas, such as AI, real-time, and security applications, has limited its popularity. To enable a wider adoption of DPR, the standard DPR design methodology also needs to be updated so that it can easily integrate high-level application requirements into the design. Therefore, to enable system designers to take full advantage of the DPR feature of FPGA-based SoCs, and to increase its applicability, the standard DPR design flow needs to be modified and simplified. The final goal of this work is to reach a full automation of the DPR design flow that also integrates high-level application requirements into the design, thereby reaping the full benefits offered by DPR and pushing for its wider adoption beyond the realm of a few expert designers.
Towards this objective, this thesis addresses two major research themes: (1) the automation of the DPR design flow in Xilinx FPGAs, and (2) the efficient use of DPR in the design and implementation of embedded systems. The first theme is covered in the first part of the thesis, which presents a suite of design tools that automate the otherwise manually performed DPR design steps in Xilinx FPGAs, and ultimately integrates these individual tools into a holistic DPR design automation tool. The tool especially targets system designs involving a hardware-software co-design approach with real-time requirements. The second part of the thesis demonstrates the feasibility and the benefits of using DPR in the design and implementation of applications, focusing on the efficient implementation and optimization of Deep Neural Network (DNN) accelerators for FPGAs under DPR. In particular, the thesis presents a novel tool that optimizes and improves the performance of FPGA-based quantized neural network (QNN) accelerators by combining an analytical latency model of the network, an empirical resource consumption model of the network, and DPR. Furthermore, this part demonstrates how DPR can become a new optimization trade-off, besides the area-latency trade-off, in the performance optimization of hardware accelerators for FPGAs. Finally, this part also aims at closing the gap between state-of-the-art DNN architectures and their FPGA implementations.
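To illustrate the kind of trade-off mentioned above, the sketch below computes a first-order analytical latency estimate for convolution layers of a QNN accelerator and adds a fixed partial-reconfiguration overhead between layers mapped to the same region. All layer shapes, parallelism figures, and the reconfiguration time are assumed values for illustration only; they are not the thesis's actual latency or resource models.

```python
# Illustrative first-order model, not the thesis's actual one: latency of a
# convolution layer is estimated as total MAC operations divided by the MACs
# the accelerator completes per second; DPR adds a fixed per-swap overhead.
from dataclasses import dataclass

@dataclass
class ConvLayer:
    out_h: int   # output feature-map height
    out_w: int   # output feature-map width
    out_ch: int  # output channels
    in_ch: int   # input channels
    k: int       # square kernel size

def layer_macs(layer: ConvLayer) -> int:
    # One multiply-accumulate per kernel element, input channel, and output pixel.
    return layer.out_h * layer.out_w * layer.out_ch * layer.in_ch * layer.k * layer.k

def layer_latency_s(layer: ConvLayer, parallel_macs: int, clock_hz: float) -> float:
    # Compute-bound estimate: total MACs over MACs completed per second.
    return layer_macs(layer) / (parallel_macs * clock_hz)

# Two layers time-multiplexed on one reconfigurable region: a larger region
# (more parallel MACs) shortens compute time, but each swap costs a fixed
# partial-reconfiguration time (assumed 4 ms here).
layers = [ConvLayer(56, 56, 64, 64, 3), ConvLayer(28, 28, 128, 64, 3)]
reconfig_s = 0.004
compute_s = sum(layer_latency_s(l, parallel_macs=512, clock_hz=200e6) for l in layers)
total_s = compute_s + reconfig_s * (len(layers) - 1)
print(f"compute {compute_s*1e3:.2f} ms + reconfig {reconfig_s*1e3:.2f} ms = {total_s*1e3:.2f} ms")
```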
File
File name | Size |
---|---|
seyoum_t...plete.pdf | 4.79 MB |