Platform Write Operations Not Functioning
Incident Report for Benevity
Postmortem

Summary
On November 28th 2022, an issue with our backend API arose during preventative maintenance and prevented end users from performing certain actions, including donations.

Impact
Spark users were unable to donate from November 28th 2022 21:31 to 22:13 MT.

Root Cause
During unscheduled, preventative maintenance, a restart of one of Benevity's backend persistent data stores caused backend API connections to be routed to a read-only endpoint, preventing the creation of new records and interrupting key workflows such as donating. A restart of Benevity's backend API was required to re-establish proper connection routing and restore functionality.
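The failure mode above, where connections stay pinned to a read-only endpoint until the API process restarts, can be illustrated with a minimal sketch. This is not a description of Benevity's backend: the report does not name the data store or how the API manages its connections, so `WritableConnectionHolder`, `FakeConnection`, `is_read_only`, and the `connect` factory are all hypothetical names chosen for illustration. The idea shown is only that a client which checks writability before each write, and reconnects when the check fails, may recover without a full restart.

```python
from typing import Callable, List


class FakeConnection:
    """Hypothetical stand-in for a data store connection (illustration only)."""

    def __init__(self, read_only: bool) -> None:
        self.read_only = read_only
        self.executed: List[str] = []

    def is_read_only(self) -> bool:
        # A real client would ask the data store; here it is just a flag.
        return self.read_only

    def execute(self, statement: str) -> None:
        if self.read_only:
            raise PermissionError("cannot write to a read-only endpoint")
        self.executed.append(statement)


class WritableConnectionHolder:
    """Keeps one connection and replaces it if it turns out to be read-only."""

    def __init__(self, connect: Callable[[], FakeConnection], max_attempts: int = 3) -> None:
        self._connect = connect          # assumed to re-resolve the endpoint on each call
        self._max_attempts = max_attempts
        self._conn = connect()
        self.reconnects = 0

    def write(self, statement: str) -> None:
        for attempt in range(self._max_attempts):
            if attempt > 0:
                # The previous connection was read-only: drop it and connect again.
                self._conn = self._connect()
                self.reconnects += 1
            if not self._conn.is_read_only():
                self._conn.execute(statement)
                return
        raise RuntimeError(f"no writable endpoint after {self._max_attempts} attempts")


# Simulate the incident: the first connection lands on a read-only endpoint,
# and the next connection attempt reaches a writable one.
connections = iter([FakeConnection(read_only=True), FakeConnection(read_only=False)])
holder = WritableConnectionHolder(lambda: next(connections))
holder.write("INSERT donation record")
print(holder.reconnects)
```

Whether a check like this is practical depends on the data store and driver in use. It complements, rather than replaces, performing maintenance inside a planned outage window.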

Future Mitigation

  • We will document the known deficiency that our backend API requires downtime whenever maintenance is performed on its data store.
  • Future maintenance will take this into consideration and be performed during planned outage windows.

Timeline of Events

  • 21:33 MT - Planned maintenance carried out
  • 21:46 MT - Internal user identified an issue with donations
  • 21:54 MT - Issue identified through application logs
  • 21:57 MT - Rolling restarts of backend APIs
  • 21:58 MT - Successful donation processed
  • 22:10 MT - All backend APIs restarted and full functionality restored
Posted Dec 02, 2022 - 15:42 MST

Resolved
Posted Nov 28, 2022 - 21:30 MST