Degraded

One of our Document Verification API clusters is experiencing overload

Aug 27 at 03:56pm EDT
Affected services
Document Verification API

Resolved
Aug 28 at 01:50am EDT

Our Document Verification API experienced severe performance degradation, causing some ID scan uploads to take 5-10 minutes instead of completing almost instantly as expected. Multiple users reported these delays, and an estimated 37.5% of all verification operations may have been affected during the incident window.

User Impact

  • 3-4 users confirmed 5-10 minute delays on ID uploads
  • 10-15 users in total attempted to use the service during the incident
  • Some verifications displayed a success badge, but the resulting data was not accessible
  • Records did not appear in the admin dashboard for review

Root Cause

A critical service component entered a failure state in which it consumed excessive CPU while repeatedly failing to process requests. Because this component handled approximately 50% of all verification traffic, its becoming unresponsive had widespread impact.

Contributing Factors

  • The affected service component entered an unrecoverable state requiring manual intervention
  • No automatic recovery mechanism was in place for high CPU usage scenarios
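An automatic recovery mechanism first needs a rule for telling a component that is stuck apart from one that is merely busy. A minimal sketch follows, assuming a supervisor can sample CPU usage and request outcomes for each component. The `CpuWatchdog` class, the 90% threshold, and the window size are hypothetical illustrations, not part of the actual Document Verification API. The rule flags a restart only when high CPU coincides with failing requests for several consecutive samples, which matches the failure mode described under Root Cause.

```python
from collections import deque


class CpuWatchdog:
    """Hypothetical watchdog: flags a component for restart when CPU stays
    high while requests keep failing, across a sliding window of samples."""

    def __init__(self, cpu_threshold=90.0, window=5):
        # Threshold and window size are illustrative assumptions only.
        self.cpu_threshold = cpu_threshold
        self.samples = deque(maxlen=window)

    def observe(self, cpu_percent, request_succeeded):
        """Record one sample; return True if a restart should be triggered."""
        stuck = cpu_percent >= self.cpu_threshold and not request_succeeded
        self.samples.append(stuck)
        # Trigger only once the window is full and every sample looked stuck,
        # so a brief CPU spike with successful requests never causes a restart.
        if len(self.samples) == self.samples.maxlen and all(self.samples):
            self.samples.clear()  # start a fresh window after triggering
            return True
        return False


# Usage: three consecutive "high CPU, request failed" samples trigger a restart.
wd = CpuWatchdog(cpu_threshold=90.0, window=3)
print([wd.observe(98.0, False) for _ in range(3)])  # [False, False, True]
```

What "restart" means in practice (a process restart, a container replacement) would be left to the surrounding orchestration and is not shown here.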

Timeline

  • 2025-08-26: Normal operations, no errors reported
  • 2025-08-27: Error spike began and "No scan response" errors started appearing (first reported at 03:56pm EDT)
  • 2025-08-28: Issue identified and resolved through manual intervention (resolved at 01:50am EDT)

Resolution

Immediate Actions Taken:

  • Manual restart of the failed service component
  • Redistribution of traffic, reducing the share handled by any single server from 50% to 25%
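Reducing the share any one server carries can be illustrated with a weighted load balancer. The sketch below uses smooth weighted round-robin, the scheme nginx uses for weighted upstream servers. The server names and weights are hypothetical, since this report does not say how many servers exist or how traffic is actually routed. With one server weighted twice as heavily as two others, it receives half of all requests; with four equal weights, each server receives 25%.

```python
from collections import Counter


def smooth_weighted_round_robin(weights, n):
    """Return n backend picks using smooth weighted round-robin.

    weights: mapping of backend name -> positive integer weight.
    """
    current = {name: 0 for name in weights}
    total = sum(weights.values())
    picks = []
    for _ in range(n):
        # Every backend gains its weight; the highest running value wins...
        for name, w in weights.items():
            current[name] += w
        chosen = max(current, key=current.get)
        # ...and the winner is pushed back down by the total weight.
        current[chosen] -= total
        picks.append(chosen)
    return picks


def traffic_shares(weights, n=1000):
    """Fraction of n requests each backend receives."""
    counts = Counter(smooth_weighted_round_robin(weights, n))
    return {name: counts[name] / n for name in weights}


# Hypothetical layouts: before, one server carried half of all traffic.
before = {"server-a": 2, "server-b": 1, "server-c": 1}
after = {"server-a": 1, "server-b": 1, "server-c": 1, "server-d": 1}
print(traffic_shares(before))  # {'server-a': 0.5, 'server-b': 0.25, 'server-c': 0.25}
print(traffic_shares(after))   # every server receives 0.25
```

Smooth weighted round-robin interleaves picks (server-a, server-b, server-c, server-a rather than server-a, server-a, server-b, server-c), so a heavily weighted server does not receive bursts of consecutive requests.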

Planned Improvements:

  • Enhanced monitoring and alerting for application-level health
  • Implementation of automatic recovery mechanisms for performance degradation
  • Improved health checks that can detect and isolate failing components
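Health checks that isolate failing components generally count consecutive failed probes per backend and remove a backend from rotation once a threshold is reached. A minimal sketch follows, assuming each probe exercises the real request path (an application-level check) rather than only process liveness, because in this incident the component was running and consuming CPU while failing requests. The `BackendPool` class, the server names, and the threshold of 3 are hypothetical.

```python
class BackendPool:
    """Hypothetical pool that isolates backends after repeated failed health checks."""

    def __init__(self, backends, failure_threshold=3):
        self.failure_threshold = failure_threshold  # assumed value
        self.failures = {b: 0 for b in backends}
        self.ejected = set()

    def record_health_check(self, backend, ok):
        if ok:
            # A passing check resets the counter and returns the backend to rotation.
            self.failures[backend] = 0
            self.ejected.discard(backend)
        else:
            self.failures[backend] += 1
            if self.failures[backend] >= self.failure_threshold:
                self.ejected.add(backend)

    def healthy_backends(self):
        """Backends still eligible to receive traffic, in original order."""
        return [b for b in self.failures if b not in self.ejected]


# Usage: three failed probes isolate server-b; the others keep serving.
pool = BackendPool(["server-a", "server-b", "server-c", "server-d"])
for _ in range(3):
    pool.record_health_check("server-b", ok=False)
print(pool.healthy_backends())  # ['server-a', 'server-c', 'server-d']
```

This sketch readmits a backend after a single passing check for brevity; a production setup would more likely require several consecutive passes before sending traffic back.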

Created
Aug 27 at 03:56pm EDT

We have received reports of elevated "No scan response" errors. We are investigating.