PRODUS - Integrations are Unavailable

Major incident Production US Integrations
2024-03-11 06:54 UTC · 7 hours, 31 minutes

Updates

Post-mortem

Summary:

On March 11, 2024, our Production Service Engineering team observed that integration jobs in the PRODUS environment were taking longer than usual, and customers reported delays. Investigation showed that multiple crawlers across the environment had become stuck because of performance degradation in the system’s handling of file transfers, which caused a backlog and delays in data processing. The Production Service and Product Engineering teams adjusted the file transfer configuration settings and manually managed the file transfer load until full service resumed. A product fix that permanently addresses the issue was subsequently identified and deployed to all environments. After the fix was deployed, all integrations were validated and confirmed to be functioning without further issues.

Customer Impact:

Integrations were impacted by performance degradation and, in some cases, crawlers became stuck.

Root Cause:

The root cause was performance degradation in the system’s handling of file transfers, which caused a backlog and delays in data processing and, in turn, stalled the crawlers.

Remediations:

  • To address the issue, the file transfer configuration settings were adjusted and the transfers were manually coordinated so that the crawlers and integrations could resume working as expected. In parallel, Engineering worked on a permanent fix for the issue, which was released to all environments within the following two days.

Future Mitigating Actions:

  • The engineering team has implemented the fix across all environments.
  • Additional test cases have been added to the full suite of regression tests.

April 2, 2024 · 15:48 UTC
Resolved

We have confirmed internally and with our customers that the Aera platform is now restored; however, some users may experience a slight delay with some integrations.

We appreciate your patience during this incident and apologise for any inconvenience that this issue may have caused. Our teams are now documenting a comprehensive root cause analysis, which we will share with you shortly.

If you have any questions or experience any further problems, please don’t hesitate to reach out to our Support team at support@aeratechnology.com

March 11, 2024 · 14:25 UTC
Monitoring

We have identified the cause of the reported issues. All integrations are now running without any issues. We will continue to monitor to ensure no additional issues arise and will send a further update to confirm the resolution.

You should now be able to resume normal activities; however, if you continue to experience any problems, please contact our Support team at support@aeratechnology.com

Thank you for your patience and understanding whilst our engineers restored service.

March 11, 2024 · 09:43 UTC
Investigating

We are continuing to work towards restoring service for the integrations. Our engineers are diligently working to narrow down the root cause. We will continue to keep you informed as the investigation progresses. We appreciate your continued patience while we work towards resolution.

March 11, 2024 · 08:31 UTC
Investigating

We are continuing to investigate the integration issues. Our engineers are actively working to restore service as quickly as possible. Thank you for bearing with us whilst we work through these issues.

March 11, 2024 · 07:31 UTC
Issue

This notice is to advise you that we are receiving reports of our customers experiencing difficulties with the platform. We are actively investigating and will provide regular updates until the issues are resolved.

We apologise for any inconvenience this may be causing and appreciate your patience as we investigate further.

March 11, 2024 · 06:54 UTC
