UAT US is unavailable

Major incident · UAT US · Platform Access, Analytics, Cognitive WorkBench, Data WorkBench, Discovery, Integrations, Other Features, Skills
2023-07-24 07:56 UTC · 15 hours, 7 minutes

Updates

Post-mortem

Summary:

On 24 July 2023, internal monitoring and alerting indicated an issue with access to the Aera application in UATUS01. Operations teams quickly identified that the database had become unavailable, which disrupted access. The usual remediation for this issue did not succeed, so we engaged our third-party vendor to assist with service restoration. Under the vendor's guidance, our teams restored the database service and restarted all related services, returning the platform to full operation.

Customer Impact:

Customers were unable to access the UATUS01 Aera environment.

Root Cause:

A sudden, unexpected increase in transactions on the database service caused the database to become unavailable. A vendor bug in the database application prevented standard remediation from restoring service, which led to a prolonged recovery time.

Remediations:

The teams took guidance from our vendor to restore functionality of the database service. Subsequently, all related services were restarted, resolving the issue.

Future Mitigating Actions:

A comprehensive review of all database-related alerting and monitoring has been conducted, and additional alerts have been implemented so that teams can detect and address the initial load issue more effectively and proactively, avoiding the need for intervention.
Our vendor has provided mitigation and workaround steps to reduce the restoration time of any future occurrence by at least 85%.
Our vendor has acknowledged the bug and will fix it in a future vendor release.
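
To put the 85% figure in concrete terms, the sketch below applies it to this incident's duration of 15 hours, 7 minutes. It assumes the reduction is measured against this incident's total duration, which the report does not state. `reduced_duration` is a hypothetical helper for illustration only, not part of any Aera or vendor tooling.

```python
from datetime import timedelta


def reduced_duration(baseline: timedelta, reduction_pct: int) -> timedelta:
    """Return the time remaining after cutting `baseline` by `reduction_pct` percent."""
    if not 0 <= reduction_pct <= 100:
        raise ValueError("reduction_pct must be between 0 and 100")
    # Integer multiply first, then divide, to keep the result exact.
    return baseline * (100 - reduction_pct) / 100


# Assumption: the baseline is this incident's reported duration.
incident = timedelta(hours=15, minutes=7)
print(reduced_duration(incident, 85))  # upper bound on restoration time under that assumption
```

Under that assumption, a reduction of at least 85% would bound a comparable future restoration at a little over two and a quarter hours.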

August 11, 2023 · 07:09 UTC
Resolved

We have confirmed internally and with our customers that the Aera platform is now fully restored.

We appreciate your patience during this incident and apologise for any inconvenience it may have caused. Our teams are now documenting a comprehensive root cause analysis, which we will share with you shortly.

If you have any questions or experience any further problems, please don’t hesitate to reach out to our Support team at support@aeratechnology.com.

July 24, 2023 · 23:03 UTC
Investigating

We are continuing to work towards restoring service following the database issues. We will keep you informed as the investigation progresses. We appreciate your continued patience whilst we work towards resolution.

July 24, 2023 · 16:22 UTC
Investigating

We are continuing to work towards restoring service following the database issues. Our engineers are actively working on this and the ETA is ~3 hours. We will keep you informed as the investigation progresses. We appreciate your continued patience whilst we work towards resolution.

July 24, 2023 · 13:22 UTC
Investigating

We are continuing to work towards restoring service following the database issues. Our engineers are actively working on this and the ETA is ~4 hours. We appreciate your continued patience whilst we work towards resolution.

July 24, 2023 · 10:53 UTC
Update

We are continuing to investigate the database issues. Our engineers are actively working to restore service and the ETA is ~6 hours. The next update will be shared once there is a change in the current status. Thank you for bearing with us whilst we work through these issues.

July 24, 2023 · 08:47 UTC
Issue

We are receiving reports of customers experiencing difficulties with the platform. We are actively investigating and will provide regular updates until the issues are resolved.

We apologise for any inconvenience this may be causing and appreciate your patience as we investigate further.

July 24, 2023 · 07:56 UTC
