New cause search functionality for Spark was introduced in December 2021. This change included a non-optimized query whose performance cost was negligible under normal system load. As Giving Tuesday (Nov 29th) volume rapidly increased around 8:26am MST, this query pushed the search service past its limit, causing extensive degradation across multiple pages for 133 minutes.
All Spark clients experienced extremely slow response times on the Cause Search and Cause Profile pages. While these pages were not completely unresponsive, the slow response times significantly degraded Spark's usability for our clients for over two hours.
Spark's cause search engine capacity was exhausted by non-optimized queries, and load test coverage failed to identify the issue beforehand. Initial attempts to add capacity to the search service while it was under load were unsuccessful, which lengthened the time to remediate the issue.