Spark User Donation Problems
Incident Report for Benevity
Postmortem

Summary

On April 16th, 2024, at 9:20 am MT, a new version of our financial infrastructure engine was deployed to the production environment.  During this release, our team noticed a connection issue between our financial processing engine and our User Management (UMS) System, which is used by a small subset of clients.  Our response teams quickly assembled and deployed a fixed version of the financial platform to production, resolving the UMS connection problem by 10:30 am MT.

Impact

The User Management (UMS) System is used by a small subset of clients to handle user-specific information that is required within our donation, volunteering, and user profile workflows.  During the 70-minute span of the incident, users of these clients would have experienced problems accessing and updating their personal information, as well as entering Acts of Goodness such as donations.  Following incident remediation, all functionality was restored with no further impact to users.

Root Cause

This problem arose from complex interactions between the code dependencies used within our financial engine, where an external library was changed to a version unsupported by the UMS service.  The symptom of this incompatibility was that UMS was unable to connect with the financial engine.

Future Mitigation

As part of our continuing commitment to quality and the robustness of our systems, Benevity leverages automated testing throughout our build and deployment processes to verify the correctness of behaviour of the various components.  Our Engineering teams have already implemented additional automated tests to confirm the ability of the UMS system to connect with the financial engine, in addition to addressing the external library version inconsistencies within these components.
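
As an illustration only, a check for library version inconsistencies between two components could look like the sketch below. It assumes each component pins its dependencies in a simple `name==version` list; the library names, versions, and function names are hypothetical and do not describe the actual libraries or tooling involved in this incident.

```python
from typing import Dict, List, Tuple


def parse_pins(manifest: str) -> Dict[str, str]:
    """Parse 'name==version' lines into a dict, skipping blanks and comments."""
    pins: Dict[str, str] = {}
    for raw in manifest.splitlines():
        line = raw.split("#", 1)[0].strip()  # drop trailing comments
        if not line or "==" not in line:
            continue
        name, version = line.split("==", 1)
        pins[name.strip().lower()] = version.strip()
    return pins


def find_mismatches(
    a: Dict[str, str], b: Dict[str, str]
) -> List[Tuple[str, str, str]]:
    """Return (library, version_in_a, version_in_b) for shared libraries that differ."""
    shared = sorted(set(a) & set(b))
    return [(name, a[name], b[name]) for name in shared if a[name] != b[name]]


# Hypothetical manifests, for illustration only.
FINANCIAL_ENGINE_PINS = """
http-client==2.4.0   # hypothetical shared library
json-lib==1.9.2
"""
UMS_PINS = """
http-client==1.8.3
json-lib==1.9.2
"""

mismatches = find_mismatches(
    parse_pins(FINANCIAL_ENGINE_PINS), parse_pins(UMS_PINS)
)
for name, engine_version, ums_version in mismatches:
    print(f"{name}: financial engine pins {engine_version}, UMS pins {ums_version}")
```

Run as part of a build, a check of this kind would fail the pipeline whenever `mismatches` is non-empty, so that a version drift is flagged before deployment rather than in production.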

Timeline of Events

  • 09:20 MT - Benevity Team noticed problem
  • 09:20 MT - Major contributing cause identified
  • 10:05 MT - Fix deployed
  • 10:30 MT - Incident resolved; systems fully operational
Posted Apr 19, 2024 - 16:13 MDT

Resolved
This incident has been resolved.
Posted Apr 16, 2024 - 10:33 MDT
Monitoring
A fix has been implemented and we are monitoring the results.
Posted Apr 16, 2024 - 10:27 MDT
Identified
The issue has been identified and a fix is being implemented.
Posted Apr 16, 2024 - 10:06 MDT
Investigating
We are investigating an issue where some Spark users are unable to donate.
Posted Apr 16, 2024 - 10:03 MDT
This incident affected: Benevity Spark.