Minor Issue - Queue Latency - US SL1
Incident Report for Cornerstone
Postmortem

On February 26th, 2024, Cornerstone's engineering team responded to alerts from our internal monitoring tools indicating delays in queue processing for clients in the US SL1 (AWS) Prod swimlane.

We began investigating immediately and traced the root cause to one of the instances in the cluster that manages queuing. Further analysis identified a specific queue component within that cluster that had become stuck, causing the observed delays. We removed the stuck component and processed the queued backlog, which restored normal operations.

To reduce the risk of recurrence, Cornerstone's engineering team worked internally with the application team to implement a permanent fix intended to prevent similar disruptions and maintain the reliability of our systems.

Posted Feb 28, 2024 - 04:45 PST

Resolved
The CSOD Technology Team observed delays in processing background or queued tasks on this swimlane. The problem began on 2/26/24 at 8:36pm Pacific Time and was resolved on 2/27/24 at 11:05am Pacific Time. During this time, background tasks may have taken longer than normal to complete.
Posted Feb 27, 2024 - 13:18 PST
Monitoring
A fix has been implemented and we are monitoring the results. Queues are now processing. There is a backlog of queued items and we are taking measures to clear the backlog. Thank you for your continued partnership.
Posted Feb 27, 2024 - 11:15 PST
Update
We are continuing to investigate this issue.
Posted Feb 27, 2024 - 09:00 PST
Investigating
This swimlane is experiencing delays in processing background or queued tasks. During this time, background tasks may take longer than normal to complete. This is our top priority and we are working to resolve the problem as soon as possible. Please check back periodically for additional updates, which will be posted as they become available.
Posted Feb 27, 2024 - 08:05 PST
This incident affected: US SL1 (AWS) (Response Time).