On Wednesday, May 6, 2020, a traffic spike caused a brief disruption of service to the Auth0 Production EU environment.
On Wednesday, May 6, 2020, between 00:58 and 01:15 UTC, an unexpected traffic spike to the Auth0 Production EU environment caused a disruption of service.
During the spike, some customers in the Auth0 EU region may have observed connectivity issues and/or high latency for API requests using custom domains. Not all customers located in this Auth0 region were impacted. Tenants in the US and AU Auth0 regions were not impacted. In practical terms, some tenants in the EU region may have noticed slow or failed logins for approximately 15 minutes in total.
Resolution
Service started to recover on its own as soon as the spike ended. However, because requests from the spike were still being processed, high response times continued to be observed in the backend services.
As part of the mitigation, both Auth0 and the custom domain services were scaled up substantially after the recovery to handle the increased load.
Auth0 takes its customer commitments and user experience seriously. To prevent another occurrence, here are the actions we are taking.
2020-05-06 00:59 UTC: Tenant begins sending a large number of requests to Auth0 service.
2020-05-06 01:01 UTC: The spike in traffic triggers automated Auth0 alerts. Auth0 engineering starts to investigate.
2020-05-06 01:05 UTC: Spike in customer traffic ends.
2020-05-06 01:05 UTC: Auth0 service starts the recovery process; customer impact is mitigated.
2020-05-06 01:15 UTC: Auth0 engineering starts to scale up internal infrastructure as a preventive measure.
2020-05-06 01:15 UTC: All traffic levels return to normal and alarms are cleared.