Major Issue - Service Disruption - US Swimlane

Incident Report for Cornerstone

Postmortem

Issue Summary:

On Monday, starting at 5:40 AM PST, customers using basic (username/password) authentication were unable to initiate new sessions.

SSO-based logins and existing active user sessions did not see authentication errors; however, elevated errors and latencies were observed across multiple functionalities, including Edge, Reporting, Recruiting, and Performance Reviews.

By 8:00 AM PST, the issue was resolved and all impacted functionalities were confirmed to be working as expected.

Root Cause:
The backend authentication service was unable to complete the login process for new sessions. The authentication service and its dependent services failed to initialize because a required component was missing: the service depends on a third-party plugin that it attempted to download dynamically at runtime. The source was either unavailable or blocked, leading to a cascading failure in the authentication path for new username/password logins. Poor logging and missing observability further delayed diagnosis and resolution.

Corrective Action:

  • The plugin was downloaded and hosted locally to eliminate runtime dependency on external sources.
  • Service was restarted with the locally available plugin, restoring normal login functionality.
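
As an illustration of the corrective action above, the following is a minimal sketch of local-only plugin resolution. It is not the actual service code: the function name `resolve_plugin`, the `PluginMissingError` exception, and the logger name are all hypothetical. The idea is that startup never reaches out to an external source and instead fails fast with an explicit log message when the locally hosted plugin is absent.

```python
import logging
from pathlib import Path

# Hypothetical logger name, used only for this sketch.
logger = logging.getLogger("auth.plugins")


class PluginMissingError(RuntimeError):
    """Raised when a required plugin is not present in the local plugin directory."""


def resolve_plugin(name: str, plugin_dir: Path) -> Path:
    """Return the path of a locally hosted plugin; never download it at runtime."""
    candidate = plugin_dir / name
    if not candidate.is_file():
        # Log the exact plugin and location so the failure is easy to diagnose.
        logger.error("required plugin %r not found in %s", name, plugin_dir)
        raise PluginMissingError(f"plugin {name!r} missing from {plugin_dir}")
    logger.info("using locally hosted plugin %s", candidate)
    return candidate
```

Failing at startup with a named plugin and path, rather than retrying a remote download, also addresses the logging gap noted in the root cause.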

Preventive Actions:

  1. Audit and catalog all third-party runtime dependencies used across the environment.
  2. Introduce local caching or bundling mechanisms for essential third-party libraries/plugins.
  3. Improve logging and observability for critical services to ensure faster root cause identification.
  4. Update operational runbooks and team training to include this class of failures.
  5. Implement automated checks to validate the availability of external dependencies during build/deploy phases.
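
Preventive actions 2 and 5 could be combined into a build/deploy gate along the following lines. This is a sketch under stated assumptions, not the checks that will actually be implemented: the manifest format (file name mapped to an expected SHA-256 digest) and the names `check_bundle` and `sha256_of` are hypothetical. It verifies that every bundled dependency is present locally and unmodified before a release proceeds.

```python
import hashlib
from pathlib import Path


def sha256_of(path: Path) -> str:
    """Compute the SHA-256 hex digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as fh:
        for chunk in iter(lambda: fh.read(65536), b""):
            digest.update(chunk)
    return digest.hexdigest()


def check_bundle(manifest: dict[str, str], bundle_dir: Path) -> list[str]:
    """Return a list of problems; an empty list means the bundle is complete.

    `manifest` maps each bundled file name to its expected SHA-256 digest
    (an assumed format for this sketch).
    """
    problems = []
    for name, expected in manifest.items():
        path = bundle_dir / name
        if not path.is_file():
            problems.append(f"{name}: missing")
        elif sha256_of(path) != expected:
            problems.append(f"{name}: checksum mismatch")
    return problems
```

A pipeline step would call `check_bundle` and abort the deploy if the returned list is non-empty, so a missing plugin is caught before the service is restarted in production.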
Posted Jun 05, 2025 - 14:26 PDT

Resolved

After careful monitoring, the issue is resolved.

The CSOD Technology Team observed a service disruption on this swimlane. The problem began at 05:10 AM Pacific Time and service was restored at 08:29 AM Pacific Time. During this time, clients with portals on this swimlane were not able to access the application or may have experienced errors across all modules.

If you are still experiencing issues, please clear your cache. If you need extra support, please update your case and GCS will be happy to help. Thank you for your patience. A full RCA with preventative measures will be shared on the Status page in 7-10 business days.
Posted Jun 02, 2025 - 10:07 PDT

Monitoring

The service disruption is now recovering. Errors should no longer be observed for the impacted functionalities. Users who still see errors should clear their browser cache.

Some of the functional impacts due to the current Service Disruption were:
- Login with username and password
- Reporting 2.0
- Edge Import
- Performance Review
- Password Reset feature

We are actively monitoring the situation to make sure no further functional issues are observed.
Posted Jun 02, 2025 - 08:39 PDT

Identified

CSOD technical teams are actively working to recover the impacted services. 50% of service requests are expected to succeed; the remaining requests are still failing, causing a partial disruption. We will share further updates regularly as we make progress towards recovery.
Posted Jun 02, 2025 - 07:40 PDT

Update

We are continuing to investigate this issue.
Posted Jun 02, 2025 - 06:49 PDT

Update

We are continuing to investigate this issue.
Posted Jun 02, 2025 - 06:45 PDT

Update

We are continuing to investigate this issue.
Posted Jun 02, 2025 - 06:34 PDT

Update

We are continuing to investigate this issue.
Posted Jun 02, 2025 - 06:32 PDT

Investigating

The US swimlanes are experiencing a service disruption. The problem began at 5:40 AM Pacific Time. This is our top priority and we are working to resolve the problem as soon as possible. Please check back periodically for additional updates, which will be posted as they become available.
Posted Jun 02, 2025 - 06:31 PDT
This incident affected: US SL3 (AWS) (Uptime), US SL5 (AWS) (Uptime), US SL1 (AWS) (Uptime), and US SL2 (AWS) (Uptime).