Increased errors in Auth0
Incident Report for Auth0
Postmortem

We have posted the final update to our detailed root-cause analysis for our service disruption on April 20, 2021.

This update includes:

– Completion of the remaining six action items

https://cdn.auth0.com/blog/Detailed_Root_Cause_Analysis_(RCA)_4-2021.pdf

Posted Apr 22, 2021 - 23:55 UTC

Resolved
We have now restored service in the affected Auth0 US-1 region, the Support Center, and the Auth0 Dashboard. The root cause has been identified and is related to specific queries that created resource contention and impacted database performance. We will provide our customers with a detailed Root Cause Analysis (RCA) within the next 12 days or sooner. We sincerely apologize to all affected customers and will post additional follow-ups here.
Posted Apr 20, 2021 - 23:53 UTC
Update
We are continuing to closely watch performance in US-1.
Posted Apr 20, 2021 - 23:05 UTC
Monitoring
Based on the sustained performance for Auth0 US-1, Support Center, and the Auth0 Dashboard, we have now updated our status to monitoring. We are continuing to closely watch performance in US-1. We will now be moving to hourly updates.
Posted Apr 20, 2021 - 21:39 UTC
Update
We have now restored service in the affected Auth0 US-1 region, the Support Center, and the Auth0 Dashboard. Our User Search v3 service has been enabled and the data is catching up. We will update here once all data has been brought up to date.
Posted Apr 20, 2021 - 21:23 UTC
Update
We continue to see performance improvements. We are working to fully restore all services to customers in our US-1 region.
Posted Apr 20, 2021 - 20:52 UTC
Update
We continue to see additional customers that have moved to degraded performance. Our User Search v3 service is currently disabled, which can cause the `/api/v2/users` endpoints to return stale data. Once the service is enabled again, all data will be brought up to date.
Posted Apr 20, 2021 - 20:38 UTC
Update
We continue to see customers that have moved to degraded performance. Systems are recovering and access to the Auth0 Dashboard has been restored. We are continuing to dedicate our full team's efforts to restoring services for all customers impacted by this incident.
Posted Apr 20, 2021 - 20:18 UTC
Update
We are seeing customers in US-1 Production moving to degraded performance. We are continuing with all efforts to fully restore services for all customers.
Posted Apr 20, 2021 - 20:05 UTC
Update
We are continuing to work on restoring services affected by this outage. We can confirm that users who are already logged in are not impacted by this incident. We are executing all remediation steps in our incident protocol. Our entire technical and engineering teams are treating this as an all-hands-on-deck situation to find a resolution.
Posted Apr 20, 2021 - 19:44 UTC
Update
We are continuing to work on restoring services as quickly as possible. As soon as we have an ETA for the restoration of services, we will update our status.
Posted Apr 20, 2021 - 19:25 UTC
Update
We are continuing to work on restoring services as quickly as possible. We understand that this is causing a significant impact on your business.
Posted Apr 20, 2021 - 19:08 UTC
Update
We are continuing to work on restoring services as quickly as possible. We understand that this is causing a significant impact on your business.
Posted Apr 20, 2021 - 18:54 UTC
Update
We are continuing to investigate the root cause and resolution of the issue at hand.
Posted Apr 20, 2021 - 18:22 UTC
Update
We continue to look into the root cause of this issue. Our engineers are treating this as an all-hands-on-deck situation.
Posted Apr 20, 2021 - 18:06 UTC
Update
We have identified that this issue is coming from our database. However, we are still tracking down the root cause of the issue. We understand that this is an issue impacting the entirety of the environment, and the full Auth0 team is engaged in resolving this.
Posted Apr 20, 2021 - 17:50 UTC
Update
We are continuing to work on this incident and are also looking into ways to reduce its impact. At this time, the root cause of the issue is still unclear. We understand that this is causing a significant impact on your business, and we thank you for your patience. We are doing everything we can to restore service.
Posted Apr 20, 2021 - 17:31 UTC
Update
Our engineers have identified some patterns in our databases that we believe could be related to this issue. We are attempting to use them to identify the issue.
Posted Apr 20, 2021 - 17:14 UTC
Update
Some of our customers have reported being unable to access our Support Center as a result of this issue. If you need to file a ticket, we have enabled a temporary support route, which can be accessed by emailing support-backup@auth0.com.
Posted Apr 20, 2021 - 16:41 UTC
Update
We have received some reports of status.auth0.com being inaccessible to a small subset of customers. As a reminder, you can also follow the status updates at twitter.com/auth0status.
Posted Apr 20, 2021 - 16:20 UTC
Update
Our engineers and on-call teams continue to investigate these issues.
Posted Apr 20, 2021 - 15:59 UTC
Investigating
We are currently experiencing an increased error rate in Auth0 services. Our Engineering team is investigating, and we will provide you with further updates as we have them.
Posted Apr 20, 2021 - 15:43 UTC
This incident affected: Auth0 US (PREVIEW) (User Authentication, Machine to Machine Authentication, Multi-factor Authentication, Management API), Auth0 US (PROD) (User Authentication, Machine to Machine Authentication, Multi-factor Authentication, Management API), and Management Dashboard (manage.auth0.com).