UATIRL is unavailable
Updates
Summary:
On 7th February 2025, at 12:00 AM UTC, the Aera platform experienced a service disruption, affecting access for a subset of customers. This issue was caused by unresponsiveness and degraded performance within the storage layer of the IRL cluster. Despite multiple recovery attempts, a full restoration of the affected storage mounts was ultimately required to resolve the issue. The platform is now fully operational.
Customer Impact:
A subset of customers experienced difficulties accessing the Aera platform.
Root Cause:
The incident was caused by severe latency in the cluster’s storage layer, which resulted in extremely slow performance. This storage degradation in turn disrupted the platform and prevented affected customers from accessing the system.
Remediations:
The SRE team attempted multiple recovery procedures to restore the unresponsive storage mounts. When these attempts were unsuccessful, a full restoration from backups was initiated. The affected mounts were successfully restored, bringing the platform back online. The application was then verified to be fully operational.
Future Mitigating Actions:
To prevent similar incidents in the future, the following actions will be taken:
Enhanced Monitoring: Implement more granular monitoring of the storage layer, including specific metrics for mount responsiveness and performance, enabling earlier detection of potential issues.
Performance Analysis: Conduct a detailed performance analysis of the storage layer to identify bottlenecks and potential failure points.
Backup and Restore Optimisation: Review and optimise backup and restore procedures to minimise recovery time in future incidents, with regular testing to ensure effectiveness.
We have confirmed internally and with our customers that the Aera platform is now fully restored.
We appreciate your patience during this incident and apologise for any inconvenience that this issue may have caused. Our teams are now working on documenting a comprehensive root cause analysis which we will share with you shortly.
If you have any questions or experience any further problems, please don’t hesitate to reach out to our Support team via the Aera Support Portal.
The Aera platform is up and running, and integrations are running as expected. We are continuing to work towards restoring all services for the platform and will keep you informed as the investigation progresses. We appreciate your continued patience whilst we work towards resolution.
We are continuing to investigate the UATIRL unavailability issues. Our engineers are actively working to restore service as quickly as possible. Thank you for bearing with us whilst we work through these issues.
This notice is to advise you that we are receiving reports that a subset of our customers are experiencing difficulties with the platform. We are actively investigating and will provide regular updates until the issues are resolved.
Our apologies for the inconvenience this may be causing. We appreciate your patience as we investigate further.