Service Disruption 4/23/2026 — WorkBright Web Application Redis Servers Down
Resolved
Apr 27, 2026 at 5:09pm UTC
WorkBright — Service Incident Report
April 23, 2026
Overview
On the afternoon of April 23, 2026, WorkBright experienced a service disruption that affected most users of our web application for approximately 30 minutes. We are sharing this report to provide transparency on what happened, what was impacted, and the steps we have taken to prevent a recurrence.
Severity
P0 (Critical). The majority of users were unable to use the WorkBright web application during the incident.
Impact
The disruption began at approximately 1:17 PM MST and was fully resolved by 1:47 PM MST. During this period:
- The WorkBright web application was largely unavailable, including login sessions, reports, dashboards, and any functionality that relies on background processing.
- Related features such as notifications, employee group changes, and bulk document assignment updates were affected.
- Webhook delivery was paused for the duration of the incident.
- Database transaction volume dipped to approximately 25% of typical load.
The WorkBright REST API was largely unaffected. More than 90% of our 55+ REST endpoints do not depend on background processing and remained accessible throughout the incident.
Timeline
All times MST.
| Time | Event |
|---|---|
| 12:07 PM | A backfill job was initiated as part of an internal data consistency effort. |
| 12:37 PM | Automated monitoring alerted our engineering team to elevated CPU utilization. |
| 1:17 PM | External monitoring confirmed the web application was unavailable; the engineering response team was paged. |
| 1:30 PM | Engineering convened to investigate, identified the backfill job as the source, and began clearing the affected job queue. Recovery required two restarts of background processing infrastructure. |
| 1:47 PM | All services restored. |
Root Cause
A backfill job intended to update a single shared data table was unintentionally executed across all customer accounts. As a result, our background job queue received orders of magnitude more work than expected, exhausting available capacity and degrading dependent services.
The underlying defect was a tenant-selection error in the script. The change passed our standard pull request review process, including approvals from multiple engineers and automated review tooling, but the specific failure mode was not caught.
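This report does not include the script itself, so the following is only an illustrative sketch of how a tenant-selection error of this general kind can arise, and of a stricter alternative. It is not WorkBright's actual code. The tenant IDs and the function names `select_tenants_buggy` and `select_tenants_strict` are hypothetical.

```python
from typing import Iterable, List, Optional

# Hypothetical tenant IDs, used only for illustration.
ALL_TENANTS = ["acme", "globex", "initech"]


def select_tenants_buggy(tenant_ids: Optional[Iterable[str]] = None) -> List[str]:
    """Defect pattern: a missing or empty filter silently means 'every tenant'."""
    selected = list(tenant_ids) if tenant_ids is not None else []
    return selected if selected else list(ALL_TENANTS)


def select_tenants_strict(tenant_ids: Optional[Iterable[str]]) -> List[str]:
    """Safer pattern: the caller must name the tenants explicitly."""
    requested = list(tenant_ids) if tenant_ids is not None else []
    if not requested:
        raise ValueError("refusing to run: no tenant scope was specified")
    unknown = [t for t in requested if t not in ALL_TENANTS]
    if unknown:
        raise ValueError(f"unknown tenants: {unknown}")
    return requested
```

In this sketch, calling `select_tenants_buggy([])` fans out to every tenant, while `select_tenants_strict([])` raises an error before any work is queued.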
Prevention and Action Items
We have already deployed the following:
- Hard limit on backfill scope (deployed April 28, 2026). Backfill jobs are now prevented from updating more than 1 million records in a single execution.
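
A hard limit of this kind can be sketched as a pre-flight check that runs before any work is queued. This is a minimal illustration under stated assumptions, not the deployed implementation: only the 1 million record threshold comes from this report, and the names `MAX_BACKFILL_RECORDS`, `BackfillScopeError`, and `check_backfill_scope` are hypothetical.

```python
MAX_BACKFILL_RECORDS = 1_000_000  # threshold stated in this report


class BackfillScopeError(RuntimeError):
    """Raised when a backfill would update more records than allowed."""


def check_backfill_scope(record_count: int, limit: int = MAX_BACKFILL_RECORDS) -> int:
    """Return record_count if it is within the limit; otherwise raise.

    Assumption: the caller counts the matching records up front, before
    enqueuing any background jobs.
    """
    if record_count < 0:
        raise ValueError("record_count must be non-negative")
    if record_count > limit:
        raise BackfillScopeError(
            f"backfill would update {record_count:,} records; limit is {limit:,}"
        )
    return record_count
```

Failing before any jobs are enqueued means an over-scoped run is rejected outright rather than partially executed.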
We are also pursuing the following improvements:
- Increased background processing capacity to provide additional headroom during high-volume operations.
- Operational guardrails restricting large batch jobs from running during peak business hours.
- Strengthened review for batch job changes, including additional automated checks targeting tenant-scope errors.
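
As one possible shape for the peak-hours guardrail listed above, a scheduler could refuse to start large batch jobs inside a configured window. The sketch below is hypothetical: this report does not define "peak business hours", so the weekday 8:00 AM to 6:00 PM window and the names `PEAK_START`, `PEAK_END`, `is_peak_hours`, and `may_start_large_batch` are assumptions chosen for illustration.

```python
from datetime import datetime, time

# Assumed window; the real definition of peak hours is not given in this report.
PEAK_START = time(8, 0)
PEAK_END = time(18, 0)


def is_peak_hours(now: datetime) -> bool:
    """True on weekdays from PEAK_START (inclusive) to PEAK_END (exclusive).

    `now` is assumed to already be in the local business time zone.
    """
    is_weekday = now.weekday() < 5  # Monday is 0, Sunday is 6
    return is_weekday and PEAK_START <= now.time() < PEAK_END


def may_start_large_batch(now: datetime) -> bool:
    """Large batch jobs are allowed only outside the peak window."""
    return not is_peak_hours(now)
```

Under these assumed settings, a job started at 12:07 PM on Thursday, April 23, 2026 would have been blocked, while the same job at 10:00 PM that day would have been allowed.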
Our Commitment
We take service availability seriously, and we apologize for the disruption this incident caused. If you have any questions about this incident or its impact on your account, please reach out to your WorkBright representative or contact support@workbright.com.