Platform Unavailable

Major incident · PRODUCTION - East US · Platform Access, Analytics, Cognitive Workbench, Cortex, Data Workbench, Discovery, Skills & Integrations, Other Features
2026-02-18 21:27 UTC · 3 hours, 26 minutes

Updates

Post-mortem

Summary:

On Feb 19, 2026, automated monitoring detected platform unavailability across multiple East US Production environments. The Production Services Engineering (PSE) and Site Reliability Engineering (SRE) teams immediately initiated an investigation.
The issue was identified as certain pods entering a degraded state due to underlying storage volumes becoming disconnected. A fix was developed, validated in lower environments, and then applied in Production. Once the corrective action was implemented, services were restored and all affected tenants returned to a healthy state. Post-recovery validation checks confirmed that the environments were functioning as expected.

Customer Impact:

During the impact window, users experienced unavailability across multiple East US Production environments. Services resumed normal operation once the affected storage volumes were successfully reattached.

Root Cause:

The disruption was caused by an unintended configuration change that disconnected the underlying storage for certain volumes. As a result, pods entered a degraded state, which caused service unavailability. Because this was the first occurrence of this condition, additional time was required to diagnose, reproduce, and validate the corrective approach before applying it in Production.

Remediations:

The engineering teams identified the storage disconnection condition and developed a corrective solution. The fix was first tested and validated in lower environments before being safely implemented in Production, where the affected volumes were successfully reattached. Once the resolution approach was confirmed, recovery time was approximately 20 minutes. Sanity checks were completed to ensure full service restoration.

Future Mitigating Actions:

A formal runbook and repair script have been created to enable faster recovery should a similar issue occur in the future. In addition, a permanent update has been scheduled to disable the tooling configuration that led to the storage disconnection, preventing recurrence.

February 23, 2026 · 12:41 UTC
Resolved

We have confirmed internally and with our customers that the Aera platform is now fully restored.

We appreciate your patience during this incident and apologise for any inconvenience that this issue may have caused. Our teams are now working on documenting a comprehensive root cause analysis which we will share with you shortly.

If you have any questions or experience any further problems, please don’t hesitate to reach out to our Support team via the Aera Support Portal.

February 19, 2026 · 00:53 UTC
Investigating

Our engineers are continuing to investigate the root cause of the Platform unavailability issues. We understand the business impact this issue may have and are working to restore service as quickly as possible. Again, we thank you for your continued patience and understanding.

February 18, 2026 · 23:45 UTC
Investigating

We are continuing to work towards restoring service for the platform unavailability. Our engineers are diligently working to narrow down the root cause. We will continue to keep you informed as the investigation progresses. We appreciate your continued patience whilst we work towards resolution.

February 18, 2026 · 23:14 UTC
Investigating

We are continuing to investigate the Platform unavailability issues. Our engineers are actively working to restore service as quickly as possible. Thank you for bearing with us whilst we work through these issues.

February 18, 2026 · 22:17 UTC
Issue

This notice is to advise you that we are receiving reports that a subset of our customers are experiencing difficulties with the platform. We are actively investigating and will provide regular updates until the issues are resolved.

Our apologies for the inconvenience this may be causing and we appreciate your patience as we investigate further.

February 18, 2026 · 21:27 UTC
