Back to overview
Degraded

[RETROACTIVE] Service Degradation

Oct 29 at 11:50am CET
Affected services
REST API
Dashboard UI
Webhook Delivery via [clever.par.1]
Webhook Delivery via [scw.ams.1]

Resolved
Oct 29 at 05:15pm CET

Incident is resolved.

Created
Oct 29 at 11:50am CET

Summary

We are seeing 100% database CPU usage. Analysis shows two causes: lots of malicious subscriptions have been created by a customer's user and a misconfiguration on our side has amplified the impact on our database.

Impact

  • Significant slowdowns and timeouts in our API
  • Delay in processing webhooks

Resolution

A few improvements are made to reduce the impact of housekeeping operations to database's CPU. After disabling incriminated subscriptions, webhooks queue starts to reduce and system recovers.

Next Steps

  • patch worker so that it ignores webhooks from disabled subscriptions
  • design and implement automatic deactivation of long-failing subscriptions