Elevated response times and error rates for the Authentication API
Incident Report for Auth0
Postmortem

We have now completed our full RCA. Please see here for the full details.

Posted Dec 01, 2018 - 03:43 UTC

Resolved
Auth0 service has stabilized for all customers. We have started working on our public postmortem.
Posted Nov 29, 2018 - 00:11 UTC
Update
We are continuing to monitor the incident and have started the RCA. We will keep this status in monitoring and continue to provide updates.
Posted Nov 28, 2018 - 23:13 UTC
Update
At 14:44 UTC our incident response team started investigating an alarm showing an increase in response times for the Authentication API affecting all customers in the US (PROD) environment. At 16:14 UTC we discovered issues in our MongoDB cluster that soon began affecting all customers in the US region. The situation then worsened, not only increasing response times but also generating errors for customers across different APIs. From 16:30 UTC until 18:35 UTC (the worst period of the incident), 18.88% of all requests in the US environment failed.

With all hands on deck, we found that a couple of database queries were significantly impacting the cluster. We immediately began applying changes and redirected traffic to a different application cluster, after which service began to normalize.

We are still working diligently on improving the situation and will provide an in-depth Root Cause Analysis as soon as possible.

We are deeply sorry for the inconvenience this issue has caused you, your users and your customers.
Posted Nov 28, 2018 - 20:52 UTC
Monitoring
A fix has been implemented and we are monitoring the results.
Posted Nov 28, 2018 - 19:11 UTC
Identified
We're seeing a decrease in the number of errors across our services. We continue to monitor the situation and will post updates as we make progress.
Posted Nov 28, 2018 - 19:09 UTC
Investigating
We continue to investigate this issue. We're working to stabilize our services in prod-us.
Posted Nov 28, 2018 - 18:13 UTC
Update
We're continuing to work on a fix. Response times are slowly normalizing and errors are going down; we'll post an update as soon as we're back to normal.
Posted Nov 28, 2018 - 16:56 UTC
Identified
The issue has been identified and a fix is being implemented.
Posted Nov 28, 2018 - 16:35 UTC
Investigating
We are currently investigating this issue.
Posted Nov 28, 2018 - 15:13 UTC
This incident affected: Auth0 US (PROD) (User Authentication).