Kyndryl adds AI to proactively prevent IT outages

Joseph Gabriel Ragoncin

News Editor

Kyndryl has introduced new agentic AI capabilities in Kyndryl Bridge designed to detect and resolve IT risks before they become outages. The feature is already in use across Kyndryl Bridge’s customer base.

The feature, which Kyndryl describes as patented, sits within the company’s open integration platform and is intended to identify patterns that tend to precede system failures. It analyzes signals from applications and infrastructure and uses AI agents to trigger actions aimed at preventing disruption rather than responding after an incident occurs.

This feature has been deployed to over 1,400 Kyndryl Bridge customers. Across its installed base, the platform generates more than 16 million AI insights each month, according to Kyndryl.

Kyndryl said customers using the system have recorded up to 50% fewer IT incidents, with annual savings totaling US$3 billion in avoided outages and reduced maintenance costs. Some early deployments have reduced mission-critical outages by as much as 90%.

How it works

The system performs AI-assisted root cause analysis across more than 200,000 customer devices. This process is designed to identify conditions that often lead to outages, such as application slowdowns, infrastructure conflicts, configuration changes, and operational events that seem minor on their own but become more severe in combination.
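Kyndryl has not published how its correlation works, so the following is only a minimal Python sketch of the general idea: individually minor signals become notable when several different kinds occur on the same system within a short time. The `Signal` type, the `flag_combined_risks` function, the ten-minute window, and the threshold of three distinct signal kinds are all hypothetical choices made for illustration, not part of Kyndryl Bridge.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Signal:
    """A single operational event (hypothetical shape, for illustration)."""
    timestamp: float  # seconds since some reference point
    system: str       # which system emitted the signal
    kind: str         # e.g. "app_slowdown", "config_change"


def flag_combined_risks(signals, window_seconds=600.0, min_distinct_kinds=3):
    """Return {system: sorted kinds} for systems where at least
    `min_distinct_kinds` different kinds of signal fall within one
    window of `window_seconds`. Only the first such window is reported."""
    by_system = defaultdict(list)
    for signal in signals:
        by_system[signal.system].append(signal)

    flagged = {}
    for system, items in by_system.items():
        items.sort(key=lambda s: s.timestamp)
        start = 0
        for end in range(len(items)):
            # Drop signals from the left until the window fits the time limit.
            while items[end].timestamp - items[start].timestamp > window_seconds:
                start += 1
            kinds = {s.kind for s in items[start:end + 1]}
            if len(kinds) >= min_distinct_kinds:
                flagged[system] = sorted(kinds)
                break
    return flagged
```

In this sketch, a system that sees a configuration change, a slowdown, and an infrastructure conflict within ten minutes is flagged, while a system with the same kinds of events spread over hours is not. A production system would presumably weigh far more context than a simple count of distinct event kinds.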

By correlating these signals across different layers of the IT estate, Kyndryl aims to address the challenges faced by many large organizations as their systems become distributed across hybrid environments and multiple suppliers. In such settings, determining the cause of a failure can take days or weeks and often relies on manual investigation by technical teams working with separate tools.

The new feature aims to speed up that process by surfacing possible causes and recommended interventions early. Kyndryl says its experts review and validate the generated insights to ensure they fit each customer’s environment before any operational decisions are made.

This review step is important because many companies remain wary of handing control of their production systems to fully automated agents. The use of AI in IT operations is increasing as companies seek to reduce downtime and lower support costs, but concerns remain about false positives, bad recommendations, and limited visibility into how systems arrive at certain decisions.

Pressure from customers

The announcement comes as large enterprises face growing pressure to keep digital systems available while managing increasingly complex IT estates. Many enterprises today run applications across on-premises infrastructure, public cloud services, and outsourced platforms, creating a web of dependencies that can make failures difficult to predict.

Traditional monitoring tools can generate a large number of alerts without a clear indication of which warning signs are most important. Kyndryl’s approach is to identify the likely combination of events that precede a disruption and act before those conditions develop into business-impacting outages.

According to Kyndryl, the capability can support early detection at scale, covering more than 10 million incidents per year. The company also said the tool speeds root cause analysis and enables organizations to complete critical incident reporting in hours instead of weeks.

For Kyndryl, this addition expands the role of Kyndryl Bridge beyond observability and support to more direct intervention into customer operations. This reflects a broader shift in the IT services market, with providers moving from advising customers after problems occur to preventing failures before they impact users or revenue.

Kyndryl specializes in infrastructure services and systems management, serving thousands of customers in more than 60 countries. It positions Kyndryl Bridge as the central layer that links operational data across customer environments and turns it into recommendations for technical teams.

The latest capabilities add a powerful element of automation to that model by using AI agents to take actions after risks are identified. Kyndryl did not provide details on which remediation steps will be automated by default and which will require human approval, but said the system is intended to support early intervention and reduce operational disruption.

Xerxes Cooper, Global Leader of Kyndryl Delivery, explained the rationale for the company’s launch. “By incorporating AI agents into Kyndryl Bridge for proactive risk detection, we are transforming IT operations from reactive outage recovery to proactive, evidence-based prevention,” he said. “By correlating millions of observability signals across applications and deep infrastructure, we enable our customers to recognize and resolve problems before they even feel them.”
