Several Google Cloud services, including Google Vertex AI Online Prediction, Dialogflow CX, Agent Assist, and Contact Center AI, went down for 3 hours and 53 minutes because of an issue that affected users in the US, Southeast Asia and Europe.
“We sincerely apologize to our customers affected by this disruption. This is not the level of quality and reliability we strive to provide to you, and we are taking immediate steps to improve the performance and availability of our platform,” Google said in a statement.
The tech giant said the root cause of the outage was that on June 8, users began sending a new type of request that caused occasional crashes (called “segmentation faults”) on several servers handling AI responses.
When a server crashes, it automatically restarts, and incoming requests are redirected to another working server.
Initially, these requests were not a problem because enough healthy servers remained to handle the load, Google said. However, as the number of crash-triggering requests increased, so many servers were crashing that there were not enough available servers left to keep up with demand. As a result, users began to experience service disruptions.
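The capacity squeeze Google describes can be illustrated with a toy simulation. This is only a sketch under simplified assumptions, not a model of Google's infrastructure: the function name `simulate_pool`, the fixed restart time, and the rule that each crash-triggering request takes down exactly one healthy server are all hypothetical.

```python
def simulate_pool(total_servers, crash_requests_per_tick, restart_ticks, ticks):
    """Return the number of healthy servers left at the end of each tick.

    Assumptions (illustrative only): every crash-triggering request takes
    down one healthy server, and a crashed server is unavailable for
    `restart_ticks` ticks before it rejoins the pool.
    """
    restarting = []        # remaining downtime, one entry per crashed server
    healthy_history = []
    for _ in range(ticks):
        # Servers whose restart has finished rejoin the pool.
        restarting = [t - 1 for t in restarting if t > 1]
        healthy = total_servers - len(restarting)
        # A request can only crash a server that is currently up.
        crashes = min(crash_requests_per_tick, healthy)
        restarting.extend([restart_ticks] * crashes)
        healthy_history.append(healthy - crashes)
    return healthy_history


# A low rate of bad requests: the pool settles with spare capacity.
print(simulate_pool(10, 1, 3, 5))
# A higher rate: crashes outpace restarts and no healthy servers remain.
print(simulate_pool(10, 4, 3, 6))
```

In this toy model the pool stays stable as long as the crash rate multiplied by the restart time is well below the pool size. Once it is not, restarts can no longer keep pace, which matches the pattern in Google's account.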
Google engineers were alerted to the issue on June 10, at which point there was no visible customer impact. The engineers began to identify the cause and deploy a fix.
However, today, many services that use Vertex Prediction began to experience user-facing issues linked to the same root cause. Google engineers identified the connection to the earlier issue, accelerated the deployment of the fix, and resolved the problem.
To prevent a recurrence, Google said it will strengthen its production monitoring to detect early signs of server crashes and will tighten its validation process for feature changes and updates before they are released to production.
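Google has not published how this monitoring will work. As a rough illustration of the general idea, a sliding-window crash counter might look like the sketch below. The class name `CrashRateMonitor` and its thresholds are hypothetical and do not come from Google's statement.

```python
from collections import deque


class CrashRateMonitor:
    """Flag when too many crashes occur within a sliding time window.

    Hypothetical sketch: the window length and threshold are arbitrary.
    """

    def __init__(self, window_seconds, max_crashes):
        self.window_seconds = window_seconds
        self.max_crashes = max_crashes
        self._crash_times = deque()

    def record_crash(self, timestamp):
        """Record a crash; return True if the alert threshold is exceeded."""
        self._crash_times.append(timestamp)
        # Discard crashes that have aged out of the window.
        cutoff = timestamp - self.window_seconds
        while self._crash_times and self._crash_times[0] <= cutoff:
            self._crash_times.popleft()
        return len(self._crash_times) > self.max_crashes


monitor = CrashRateMonitor(window_seconds=60, max_crashes=2)
for t in (0, 10, 20):
    alert = monitor.record_crash(t)
print(alert)  # the third crash within 60 seconds exceeds the threshold
```

An alert of this kind would fire while crashes are still occasional, before enough servers are down to affect users.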
Bigger trends
Vertex AI is used by hospitals, digital health startups, research institutes, and pharmaceutical companies for diagnostic support, personalized treatment recommendations, risk scoring, and operational support using patient data.
Dialogflow CX and Agent Assist are also increasingly used within healthcare as clinical support tools and to aid in administrative workflows.
Contact Center AI is actively used in healthcare for patient scheduling, triage, billing support and virtual front door services.
