Optimizing communication networks with AI

Machine Learning


• Machine learning algorithms can detect various anomalies before they degrade service quality.
• Although still experimental, generative AI and agentic AI are opening a new era of network automation.
• Network automation simplifies management, reduces associated costs, and optimizes performance and energy consumption.

The long-promised convergence of IT and communications is now a reality. Network functions virtualization (NFV) has made it possible to decouple network services from the hardware infrastructure by deploying them as software, with the aim of removing dependencies on specific hardware. More recently, the “cloudification” of communications infrastructure has added layers of abstraction: network functions run on a shared platform, paving the way for a more open, horizontal model. However, new services and innovations have turned communication networks into distributed systems that are extremely complex to manage. Management and maintenance teams must analyze an ever-increasing number of events, from alarms reported by equipment to performance metrics and application logs.

While traditional monitoring tools are reaching their limits, artificial intelligence makes it possible to correlate disparate data that is often redundant and lacks context. The aim is to quickly identify the root cause of an incident, a process known as root cause analysis (RCA). Predictive maintenance, for its part, involves continuously monitoring the health of a network to identify potential failures before they affect quality of service. “Based on operational history, the algorithm can detect abnormal deviations of key performance indicators (KPIs) from expected behavior,” explains Ilhem Fajjari, a researcher at the Orange Innovation Center in Châtillon. “Subsequent corrective actions may include self-healing mechanisms such as restarting or reconfiguring certain network functions. The goal is to resolve issues as quickly as possible before they impact the user experience. Even the slightest degradation will directly impact the quality of voice calls and video streaming.”
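To make the idea of "abnormal deviations from expected behavior" concrete, here is a minimal sketch in Python. It is not the algorithm used at Orange, which the article does not describe. It assumes a simple rolling z-score, in which each KPI sample is compared with the mean and standard deviation of the samples just before it. The function name, the window size, the threshold, and the sample values are all illustrative assumptions.

```python
from statistics import mean, stdev


def detect_kpi_anomalies(values, window=5, threshold=3.0):
    """Return the indices of KPI samples that deviate from recent history.

    Each sample is compared with the mean and standard deviation of the
    `window` samples that precede it (a rolling z-score). This is an
    illustrative technique, not the method described in the article.
    """
    anomalies = []
    for i in range(window, len(values)):
        history = values[i - window:i]
        mu = mean(history)
        sigma = stdev(history)
        if sigma == 0:
            # Perfectly flat history: any change counts as a deviation.
            if values[i] != mu:
                anomalies.append(i)
            continue
        if abs(values[i] - mu) / sigma > threshold:
            anomalies.append(i)
    return anomalies


# Hypothetical session setup times in milliseconds, with a spike at index 7.
setup_times_ms = [100, 102, 98, 101, 99, 100, 103, 180, 101, 100]
anomalous = detect_kpi_anomalies(setup_times_ms)
```

A production system would likely account for daily and weekly seasonality and for correlations between KPIs, which a fixed window cannot capture.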

The concept of self-healing allows smart networks to automatically diagnose and correct certain failures, anomalies, or degraded quality of service with little or no human intervention. The performance metrics tracked are related to the infrastructure itself (CPU, RAM, disk space), network capabilities, or user experience such as session setup time and user throughput.
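As a rough illustration of "little or no human intervention", the sketch below maps a diagnosed anomaly to a corrective action and hands the case to an operator when no known remedy applies or the anomaly is too severe. The playbook entries, the KPI names, the severity scale, and the function names are hypothetical and are not taken from the article.

```python
from dataclasses import dataclass


@dataclass
class Anomaly:
    function: str    # hypothetical network function identifier
    kpi: str         # the metric that deviated
    severity: float  # assumed scale from 0.0 (minor) to 1.0 (critical)


# Hypothetical playbook: which corrective action to try for which KPI.
PLAYBOOK = {
    "cpu_usage": "restart",
    "session_setup_time": "reconfigure",
}


def choose_action(anomaly, max_auto_severity=0.7):
    """Pick an automatic remedy, or escalate when automation is not safe."""
    action = PLAYBOOK.get(anomaly.kpi)
    if action is None or anomaly.severity > max_auto_severity:
        # Unknown cause or high severity: keep a human in the loop.
        return "escalate_to_operator"
    return action
```

Called with `Anomaly("nf-1", "cpu_usage", 0.4)`, this returns `"restart"`, while the same anomaly with a severity of 0.9 is escalated instead.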

How far should we go with network autonomy? “Full autonomy is premature at this stage,” says Orange researcher Ilhem Fajjari, pointing to the risks of hallucinations and lack of transparency.

Reduce energy consumption

Key use cases beyond maintenance include network configuration (applying optimal settings) and optimization. “By predicting the load on network cells and servers, we can optimize the operation of mobile networks,” explains Zwi Altman, a research engineer at the same Orange site. “For example, turning off certain cells during periods of low traffic reduces the network's energy consumption and its environmental impact.”
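The load-based switch-off that Zwi Altman describes can be sketched as follows. The forecast here is deliberately naive (an hour-by-hour average over past days) and stands in for whatever prediction model an operator would actually use. The threshold, the cell identifiers, and the `always_on` set of coverage cells are assumptions made for illustration.

```python
from statistics import mean


def forecast_hourly_load(history):
    """Naive forecast: average the load seen at each hour across past days.

    `history` is a list of days, each a list of 24 load values in [0, 1].
    """
    return [mean(day[hour] for day in history) for hour in range(24)]


def sleep_hours(forecast, threshold=0.15):
    """Hours of the day in which the predicted load is below the threshold."""
    return [hour for hour, load in enumerate(forecast) if load < threshold]


def plan_cell_sleep(cell_histories, always_on, threshold=0.15):
    """Build a per-cell list of hours in which the cell may be switched off.

    Cells listed in `always_on` (for example, cells assumed to guarantee
    basic coverage) are never switched off.
    """
    plan = {}
    for cell_id, history in cell_histories.items():
        if cell_id in always_on:
            plan[cell_id] = []
        else:
            plan[cell_id] = sleep_hours(forecast_hourly_load(history), threshold)
    return plan
```

Keeping a set of cells that never sleep is one simple way to ensure that saving energy does not create coverage holes; a real deployment would also check that neighboring cells can absorb the remaining traffic.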

Designing specialized machine learning algorithms relies on MLOps (machine learning operations). This set of practices, based on DevOps principles, aims to industrialize the model lifecycle across various stages of training, validation, deployment, and monitoring. MLOps is also part of a continuous improvement framework. “Retraining is an important way to tune models and maintain performance over time,” explains Ilhem Fajjari.
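The monitoring stage of that lifecycle can be reduced to a simple rule, sketched below: retrain when the model's recent error drifts too far above the error measured at validation time. The function name, the relative tolerance, and the error values are assumptions for illustration; the article does not say which criteria trigger retraining at Orange.

```python
from statistics import mean


def needs_retraining(recent_errors, baseline_error, tolerance=0.2):
    """Return True when recent error exceeds the baseline by over `tolerance`.

    `baseline_error` is the error observed when the model was validated;
    `tolerance` is a relative margin (0.2 means 20% worse than baseline).
    """
    if not recent_errors:
        # Nothing observed yet, so there is no evidence of degradation.
        return False
    return mean(recent_errors) > baseline_error * (1 + tolerance)


# Hypothetical prediction errors collected in production.
stable = needs_retraining([0.10, 0.11, 0.09], baseline_error=0.10)
drifting = needs_retraining([0.15, 0.16, 0.14], baseline_error=0.10)
```

In an MLOps pipeline, a positive result would typically start a new training and validation run rather than replace the deployed model directly.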

Give instructions in natural language

The telecom industry could not ignore the wave of generative AI. Intent-based management uses large language models (LLMs) to control networks: operators express the expected outcome in natural language, and the system translates that intent into concrete actions. “Proof of concept (POC) testing is currently underway,” says Zwi Altman.

Agentic AI, the next evolution of generative AI, takes network automation a step further by deploying an army of specialized intelligent agents that coordinate to trigger sequences of actions based on objectives and events.

How far should network automation go? On the scale of five maturity levels established by the TM Forum, Orange is aiming for Level 4 (“Human in the Loop”), where AI is pervasive while human oversight is maintained. “At this stage, full autonomy is still premature,” says Ilhem Fajjari, pointing to the risks of hallucinations and opacity. “To avoid the ‘black box’ effect, ongoing research is focused on model explainability.”

Finally, there is the issue of sovereignty. The most powerful LLMs are mostly developed by US companies. The three hyperscalers, AWS, Microsoft Azure, and Google Cloud, are also trying to capture this AIOps market by offering carriers all-in-one cloud solutions that combine network function hosting with AI building blocks.

This text was translated by artificial intelligence.


