Information technology continues to play an increasingly important role in modern enterprises, with significant and ongoing investment in observability platforms and practices that help manage underlying scale and complexity.
A significant improvement over traditional monitoring tools, these platforms are primarily targeted at software engineers and provide insight into the performance and reliability of stacks from applications to compute.
Unfortunately, the creators of these platforms have largely ignored the network, leaving operators to juggle dozens of disparate traditional monitoring tools to track down subtle sources of network performance and reliability issues. But a new category of observability platform is poised to revolutionize operations with comprehensive visibility into networks, infrastructure, and applications.
What is observability?
Observability has its origins in control theory.
A system is said to be observable if the current state of the system can be determined using the system output. Historically, this meant collecting telemetry through sensors, but the general principles apply to nearly any system.
In software systems, these system outputs include telemetry such as logs, metrics, and events. In the network domain, two more data sources are often added: configuration and inventory. These put the relevant telemetry in context and help identify the cause of a problem.
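To make the role of configuration and inventory concrete, here is a minimal Python sketch of enriching a single metric with that context. The record layout, the device and interface names, and the `enrich` helper are all hypothetical, chosen only for illustration; a real platform would draw this data from its own inventory and configuration stores.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Metric:
    """One telemetry sample (hypothetical layout)."""
    device: str
    interface: str
    name: str
    value: float


# Hypothetical inventory, keyed by device name.
INVENTORY = {
    "edge-rtr-1": {"site": "site-a", "role": "edge"},
}

# Hypothetical interface configuration, keyed by (device, interface).
CONFIG = {
    ("edge-rtr-1", "xe-0/0/1"): {"description": "transit uplink", "speed_mbps": 10000},
}


def enrich(metric, inventory, config):
    """Return the metric as a dict with inventory and config context attached."""
    record = {
        "device": metric.device,
        "interface": metric.interface,
        "name": metric.name,
        "value": metric.value,
    }
    # Missing context is tolerated: an unknown device simply gets no extra fields.
    record.update(inventory.get(metric.device, {}))
    record.update(config.get((metric.device, metric.interface), {}))
    return record


sample = Metric("edge-rtr-1", "xe-0/0/1", "in_errors", 42.0)
enriched = enrich(sample, INVENTORY, CONFIG)
print(enriched)
```

With the site, role, and interface description attached, a raw error counter becomes a statement about a specific uplink at a specific location, which is what makes later correlation possible.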
The latest generation of these platforms has gone even further, automating key aspects of the troubleshooting process and pointing engineers to potential outages and problems. These platforms often come to market under the AIOps banner and can answer plain-language questions from operators such as “What’s wrong with my network?” and “How do I fix it?”
With this capability, engineers can more quickly and effectively identify and address problems, reduce downtime, and minimize the impact of network issues on customers.
Navigating an ever-changing landscape
These platforms cannot arrive soon enough. The streaming wars were already raging when COVID-19 hit, forcing many businesses to support large-scale remote and distributed work almost overnight. Many, if not most, of these trends will permanently change the way we live and work.
These shifts have put networks back in the spotlight. Over the last few decades, networks have become the cornerstone of how we live and work, yet for a time availability became so good that we started to forget about the network.
What has changed is the focus on performance and quality of service rather than on service availability alone. Network monitoring with a diverse collection of point solutions and tools may have been enough to ‘keep the lights on’, but ensuring the quality of experience that customers and users demand today requires deeper, more comprehensive network visibility.
Yet service availability remains an issue even for the largest CSPs. Consider the Rogers Communications outage in 2022, when more than 25% of Canadians (about 12 million) were without internet or wireless service for an entire day. A lack of observability appears to have been the main reason the outage lasted so long: no one could say why the failure occurred, let alone how to fix it. The extended outage is estimated to have cost Rogers $28 million to $70 million in customer rebates alone, with damage to the Canadian economy exceeding $142 million. The incident also led directly to the dismissal of the company’s CTO, Jorge Fernandez.
Raising the stakes
Most of us will never have the misfortune of dealing with an outage of this magnitude or severity. But ensuring uptime is in many ways an easier problem to solve than delivering consistently high performance across the board.
Performance is now a key metric by which CSPs are judged, even if subscribers themselves don’t know how to quantify it. If a favorite show doesn’t stream in 4K, or if a video call is delayed or choppy, everyone notices.
Corporate customers care even more about this than individuals. In a world where more businesses interact with their customers through applications built for always-on, high-performance network connections, packet loss, latency, and jitter are all more important than ever.
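As a rough illustration of how these three measures can be derived, the Python sketch below summarizes a series of probe round-trip times. The `summarize_probes` function and the sample values are hypothetical, and jitter is taken here as the mean absolute difference between consecutive received samples, which is one simple definition among several (RTP, for instance, uses a smoothed estimator).

```python
from statistics import mean


def summarize_probes(rtts_ms):
    """Summarize probe results.

    rtts_ms is a list of round-trip times in milliseconds,
    with None standing in for a probe that was lost.
    """
    received = [r for r in rtts_ms if r is not None]
    sent = len(rtts_ms)
    loss_pct = 100.0 * (sent - len(received)) / sent if sent else 0.0
    latency_ms = mean(received) if received else None
    # Differences are taken between consecutive *received* samples,
    # so a lost probe in between is skipped over rather than counted.
    deltas = [abs(b - a) for a, b in zip(received, received[1:])]
    jitter_ms = mean(deltas) if deltas else None
    return {"loss_pct": loss_pct, "latency_ms": latency_ms, "jitter_ms": jitter_ms}


# Five probes, one of them lost (made-up values).
summary = summarize_probes([20.0, 22.0, None, 21.0, 25.0])
print(summary)
```

An average like this hides spikes, so in practice percentiles over short windows are often more telling for the kind of choppy video call described above.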
Problems with traditional monitoring
Over the years, many point solutions and tools have emerged to meet this need, and you may currently have several of them installed on your network. These include traditional network monitoring tools (often based on polling SNMP data from network devices), log collection and analysis tools (mainly based on syslog messages from various devices and applications), packet capture and flow-based tools (which collect and analyze network traffic), and even synthetic monitoring tools (which generate and analyze simulated user or application traffic).
Unfortunately, each of these tools is independent and incomplete. While they may alleviate the need to move manually from router to router or switch to switch to understand the current state of the network, the burden simply shifts to moving between different tools, and correlation and problem identification are still largely left to the user.
Benefits of observability for CSPs
A modern observability platform helps users efficiently identify the root cause of network problems by creating a correlated narrative across network infrastructure and applications.
These platforms integrate data from all available sources, formatting, normalizing, and auto-labeling the information they receive. This enables data and metadata correlation across all infrastructure, from the network to the application stack, allowing the platform to rank potential alerts and display only the most important ones to operators.
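The normalize, correlate, and rank steps can be sketched in a few lines of Python. Everything here is an assumption made for illustration: the raw field names (`host`, `sev`, `msg`), the severity weights, and the scoring rule (worst severity multiplied by the number of distinct sources reporting on a device) are hypothetical and not taken from any particular platform.

```python
from collections import defaultdict

# Hypothetical severity weights.
SEVERITY = {"critical": 3, "major": 2, "minor": 1}


def normalize(raw, source):
    """Map a raw event from one tool into a common, labeled shape."""
    return {
        "source": source,
        "device": raw["host"].lower(),
        "severity": raw.get("sev", "minor").lower(),
        "message": raw["msg"],
    }


def rank_alerts(events, top_n=3):
    """Group events by device and rank devices by a simple score."""
    by_device = defaultdict(list)
    for event in events:
        by_device[event["device"]].append(event)

    scored = []
    for device, group in by_device.items():
        worst = max(SEVERITY.get(e["severity"], 0) for e in group)
        sources = {e["source"] for e in group}
        # Corroboration from independent sources raises the score.
        scored.append((worst * len(sources), device, len(group)))

    # Highest score first; device name breaks ties deterministically.
    scored.sort(key=lambda item: (-item[0], item[1]))
    return [
        {"device": device, "score": score, "events": count}
        for score, device, count in scored[:top_n]
    ]


events = [
    normalize({"host": "Core-1", "sev": "MAJOR", "msg": "ifInErrors rising"}, "snmp"),
    normalize({"host": "core-1", "sev": "critical", "msg": "BGP neighbor down"}, "syslog"),
    normalize({"host": "edge-2", "msg": "config saved"}, "syslog"),
]
ranked = rank_alerts(events)
print(ranked)
```

Because the device name is normalized first, the SNMP and syslog events for the same router land in one group, and that device rises to the top of the list while the routine message on the other device stays at the bottom.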
After years of false promises, vertically integrated platforms like these finally offer the single pane of glass we’ve all been waiting for, paving the way to revolutionize observability.
