Thesis etd-09052024-111807

Thesis type

PhD thesis

Author

KHAN, LAREB ZAR

URN

etd-09052024-111807

Title

Optimized Machine Learning Techniques for Reliability of Optical Networks

Scientific disciplinary sector

ING-INF/03

Degree programme

Istituto di Tecnologie della Comunicazione, dell'Informazione e della Percezione - PHD IN EMERGING DIGITAL TECHNOLOGIES

Committee

Supervisor SAMBO, NICOLA
Member Prof. BOGONI, ANTONELLA
Chair Prof. ESPOSITO, FLAVIO
Member Dr. PEDRO, JOAO

Keywords

Failure Management
Optical Networks
Machine Learning
Neural Networks
Computational Complexity
Data Imbalance

Defense session start date

12/11/2024

Availability

Full

Abstract

The increasing reliance on data-intensive services is pushing the limits of the capacity that optical networks can deliver. Because optical fibers provide high throughput over long distances, fiber-based optical networks form the backbone of modern telecommunications infrastructure. Network operators strive to provide their customers with high-quality online experiences while adhering to service level agreements (SLAs). To that end, they aim to make their optical networks more autonomous by minimizing human intervention in network operation. In this regard, machine learning (ML) is expected to play an important role. Over the past several years, ML has been studied with the goal of achieving autonomous operation and management of optical networks. However, despite numerous studies demonstrating the potential of ML for a variety of applications, a mature commercial ML solution has yet to see wide use in optical networks. This thesis investigates some of the overlooked aspects of ML in the context of optical network failure management (ONFM), with the goal of bridging the gap between research and practical application in industry.

The thesis starts with the quality of training data, which is critical for achieving optimal performance of ML models. Within this topic, the primary focus is on reducing imbalance in the training data for classification problems (e.g., failure-type classification), so that the ML model is not biased towards the well-represented scenario(s) in the data. A variational-autoencoder-based data augmentation technique has been proposed, which generates synthetic samples to reduce the imbalance in the training data. The proposed approach has been shown to be effective in reducing the training time of ML models, specifically neural networks (NNs), for failure detection and identification. Other demonstrated benefits include reduced computational complexity and improved classification accuracy.

Next, given the significance of data imbalance and its relevance to ONFM, a comprehensive comparative analysis of various model-centric and data-centric techniques for addressing this issue has been conducted. The dataset used for this comparison was obtained from an experimental testbed containing commercial devices. The model-centric and data-centric approaches have been compared in terms of their impact on classification performance and the additional computational cost associated with each technique. Failure identification has been considered as the use case for this analysis.
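To make the distinction concrete, the sketch below contrasts one generic model-centric remedy (class weights inversely proportional to class frequency) with one generic data-centric remedy (random oversampling of the minority class). It is an illustration only: the abstract does not list which techniques the thesis compares, and the function names and toy data are hypothetical.

```python
import random
from collections import Counter


def balanced_class_weights(labels):
    """Model-centric: weight each class by n_samples / (n_classes * class_count)."""
    counts = Counter(labels)
    n_samples, n_classes = len(labels), len(counts)
    return {c: n_samples / (n_classes * cnt) for c, cnt in counts.items()}


def random_oversample(samples, labels, seed=0):
    """Data-centric: duplicate minority samples at random until all classes
    have as many samples as the largest class."""
    rng = random.Random(seed)
    counts = Counter(labels)
    target = max(counts.values())
    out_x, out_y = list(samples), list(labels)
    for c, cnt in counts.items():
        pool = [x for x, y in zip(samples, labels) if y == c]
        for _ in range(target - cnt):
            out_x.append(rng.choice(pool))
            out_y.append(c)
    return out_x, out_y


# Hypothetical toy data: 8 "normal" samples and 2 "failure" samples.
labels = ["normal"] * 8 + ["failure"] * 2
samples = [[float(i)] for i in range(10)]

weights = balanced_class_weights(labels)      # used to scale the loss per class
xs, ys = random_oversample(samples, labels)   # used to enlarge the training set
```

The first approach leaves the dataset untouched and changes how the model's loss treats each class, while the second changes the dataset and leaves the model as-is, which is why their computational costs can differ.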

Following this, another very important yet overlooked performance aspect of ML models, computational complexity, has been studied. It has been investigated how the improved performance achieved with ML models adds complexity to the overall network operation, and how this complexity can be reduced while maintaining the same improved performance. For this analysis, NNs, often a go-to choice in ML model selection, have been considered, with failure identification as the use case. Two novel NN compression techniques, Iterative Neuron Removal and Guided Knowledge Distillation, have been proposed. Their effectiveness has been demonstrated through a significant reduction in computational complexity, quantified in terms of floating-point operations (FLOPs) per inference and memory footprint, with no significant impact on NN classification performance.
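As a rough illustration of the two complexity metrics, the sketch below estimates FLOPs per inference and parameter count for a fully connected NN before and after shrinking its hidden layers. It assumes the common convention that a dense layer with n_in inputs and n_out outputs costs 2 * n_in * n_out FLOPs (n_in multiplications and n_in additions, bias included, per output) and ignores activation functions. The layer sizes are hypothetical, and this is not the thesis's Iterative Neuron Removal procedure, only the accounting that any neuron-removal scheme would affect.

```python
def dense_flops(layer_sizes):
    """FLOPs for one forward pass through a fully connected network,
    counting 2 * n_in * n_out per layer (activations ignored)."""
    return sum(2 * n_in * n_out
               for n_in, n_out in zip(layer_sizes, layer_sizes[1:]))


def dense_params(layer_sizes):
    """Number of weights plus biases, a proxy for memory footprint."""
    return sum(n_in * n_out + n_out
               for n_in, n_out in zip(layer_sizes, layer_sizes[1:]))


# Hypothetical architectures: input, two hidden layers, output.
original = [16, 64, 64, 4]
compressed = [16, 32, 32, 4]   # half of the hidden neurons removed

flops_saved = 1 - dense_flops(compressed) / dense_flops(original)
params_saved = 1 - dense_params(compressed) / dense_params(original)
print(f"FLOPs reduction:     {flops_saved:.1%}")
print(f"Parameter reduction: {params_saved:.1%}")
```

Because the cost of a hidden-to-hidden layer scales with the product of its two widths, halving the hidden widths removes more than half of the FLOPs in this toy setting.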

Alarms from the Network Management System (NMS), which contain important information about network operation, have also been used for ONFM in this thesis. Several challenges associated with raw alarms have been addressed: the high cardinality of alarm data for ML, the unavailability of corresponding alarm labels, and the vendor-specificity of NMSs and alarms. To deal with high cardinality, an alarm data pre-processing approach has been proposed that effectively reduces the dimensionality of the data for ML. The proposed approach has been demonstrated to significantly reduce training and inference times for multiple ML models while having no significant impact on classification performance. Regarding the unavailability of labelled alarm data, an unsupervised ML technique for alarm classification has been proposed that does not require any labels. The obtained results suggest that up to 100% of failure alarms can be identified, albeit at the expense of low precision, depending on the threshold selection. Last but not least, failure root-cause identification has been investigated using alarms in a vendor-agnostic manner. The proposed approach makes use of alarm information, such as the impacted network element and the probable cause of the fault, which is provided by the NMSs of all vendors. This approach has been tested on alarms extracted from a production network, and the results indicate that failure root-cause identification can be achieved in this manner. However, the effectiveness of the proposed approach has yet to be validated by operational staff.
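The threshold dependence mentioned for unsupervised alarm classification can be illustrated with a small sketch. It assumes that an unsupervised model assigns each alarm an anomaly score and that alarms scoring at or above a threshold are flagged as failure alarms. The scores and ground-truth labels below are invented for illustration (labels are used only to evaluate the outcome, not to train anything), and the scoring model itself is not the one proposed in the thesis.

```python
def precision_recall(scores, is_failure, threshold):
    """Flag alarms with score >= threshold and evaluate against ground truth."""
    flagged = [s >= threshold for s in scores]
    tp = sum(f and y for f, y in zip(flagged, is_failure))
    fp = sum(f and not y for f, y in zip(flagged, is_failure))
    fn = sum((not f) and y for f, y in zip(flagged, is_failure))
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall


# Hypothetical anomaly scores and ground-truth failure flags for 8 alarms.
scores = [0.1, 0.2, 0.35, 0.4, 0.6, 0.7, 0.8, 0.9]
is_failure = [False, False, True, False, False, True, False, True]

for threshold in (0.3, 0.65):
    p, r = precision_recall(scores, is_failure, threshold)
    print(f"threshold={threshold}: precision={p:.2f}, recall={r:.2f}")
```

In this toy data, the lower threshold catches every failure alarm but half of the flagged alarms are false positives, while the higher threshold improves precision and misses one failure alarm.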

To summarize, this thesis addresses several issues in ML-assisted fault management and proposes several solutions that make ML techniques more effective in dealing with optical network failures.

File

File name	Size
PhD_Thes...Lareb.pdf	21.45 Mb

DTA

Archivio Digitale delle Tesi e degli elaborati finali elettronici
