Service Disruption 4/23/2026 — WorkBright Web Application Redis Servers Down

Apr 27, 2026 at 5:09pm UTC
Affected services
WorkBright Application & API

Resolved
Apr 27, 2026 at 5:09pm UTC

WorkBright — Service Incident Report

April 23, 2026

Overview

On the afternoon of April 23, 2026, WorkBright experienced a service disruption that affected most users of our web application for approximately 30 minutes. We are sharing this report to provide transparency on what happened, what was impacted, and the steps we have taken to prevent a recurrence.

Severity

P0 (Critical). The majority of users were unable to use the WorkBright web application during the incident.

Impact

The disruption began at approximately 1:17 PM MST and was fully resolved by 1:47 PM MST. During this period:

  • The WorkBright web application was largely unavailable, including login sessions, reports, dashboards, and any functionality that relies on background processing.
  • Related features such as notifications, employee group changes, and bulk document assignment updates were affected.
  • Webhook delivery was paused for the duration of the incident.
  • Database transaction volume dipped to approximately 25% of typical load.

The WorkBright REST API was largely unaffected. More than 90% of our 55+ REST endpoints do not depend on background processing and remained accessible throughout the incident.

Timeline

All times MST.

Time        Event
12:07 PM    A backfill job was initiated as part of an internal data consistency effort.
12:37 PM    Automated monitoring alerted our engineering team to elevated CPU utilization.
1:17 PM     External monitoring confirmed the web application was unavailable; the engineering response team was paged.
1:30 PM     Engineering convened to investigate, identified the backfill job as the source, and began clearing the affected job queue. Recovery required two restarts of background processing infrastructure.
1:47 PM     All services restored.

Root Cause

A backfill job intended to update a single shared data table was unintentionally executed across all customer accounts. As a result, our background job queue received orders of magnitude more work than expected, exhausting available capacity and degrading dependent services.

The underlying defect was a tenant-selection error in the script. The change passed our standard pull request review process, including approvals from multiple engineers and automated review tooling, but the specific failure mode was not caught.

Prevention and Action Items

We have already deployed the following:

  • Hard limit on backfill scope (deployed April 28, 2026). Backfill jobs are now prevented from updating more than 1 million records in a single execution.
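
For readers who want a concrete picture of this kind of limit, the following is a minimal illustrative sketch, not WorkBright's actual implementation. The names `check_backfill_scope` and `BackfillScopeError` are hypothetical; only the 1 million record threshold comes from this report. The idea is to count the records a job would touch and refuse to start if the count exceeds the limit, before any work reaches the background queue.

```python
MAX_BACKFILL_RECORDS = 1_000_000  # threshold stated in this report


class BackfillScopeError(RuntimeError):
    """Hypothetical error raised when a backfill would exceed the record limit."""


def check_backfill_scope(record_count: int, limit: int = MAX_BACKFILL_RECORDS) -> int:
    """Return record_count if it is within the limit; otherwise raise.

    Intended to run before any jobs are enqueued, so an over-scoped
    backfill fails fast instead of flooding the queue.
    """
    if record_count < 0:
        raise ValueError("record_count must be non-negative")
    if record_count > limit:
        raise BackfillScopeError(
            f"backfill would update {record_count:,} records; limit is {limit:,}"
        )
    return record_count
```

In this sketch a job at exactly the limit is allowed and anything above it is rejected; whether the real limit is inclusive is an assumption.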

We are also pursuing the following improvements:

  • Increased background processing capacity to provide additional headroom during high-volume operations.
  • Operational guardrails restricting large batch jobs from running during peak business hours.
  • Strengthened review for batch job changes, including additional automated checks targeting tenant-scope errors.
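
As a rough illustration of how the peak-hours guardrail and a tenant-scope check might combine, here is a hedged sketch under stated assumptions. The 8:00 AM to 6:00 PM window, the function names, and the idea of comparing selected tenants against a declared scope are all assumptions for illustration; this report does not define peak hours or describe the actual checks.

```python
from __future__ import annotations

from datetime import time

# Assumed peak window; the report does not define "peak business hours".
PEAK_START = time(8, 0)
PEAK_END = time(18, 0)


def in_peak_hours(now: time, start: time = PEAK_START, end: time = PEAK_END) -> bool:
    """True when `now` falls inside the half-open window [start, end)."""
    return start <= now < end


def may_run_batch(now: time, selected_tenants: set[str], declared_tenants: set[str]) -> bool:
    """Allow a batch job only off-peak and only within its declared tenant scope.

    `selected_tenants` is what the script would actually touch;
    `declared_tenants` is the scope approved for the job (hypothetical inputs).
    """
    if in_peak_hours(now):
        return False
    # Reject any job that would reach tenants outside the declared scope.
    return selected_tenants <= declared_tenants
```

Under these assumptions, a job started at 12:07 PM would be refused by the peak-hours check alone, and a job that selects every tenant while declaring only one would be refused at any hour.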

Our Commitment

We take service availability seriously, and we apologize for the disruption this incident caused. If you have any questions about this incident or its impact on your account, please reach out to your WorkBright representative or contact support@workbright.com.