Issues with our Active Directory Connector
Incident Report for Auth0
Postmortem

05/22/2020 - AD/LDAP Partial Outage in the US Region

Summary

On May 22, 2020, starting at 12:00 UTC, we experienced AD/LDAP connection issues in the US region. This impacted logins for customers using AD/LDAP connections without caching enabled. The incident lasted until 17:00 UTC.

What Happened

The root cause of this incident was DNS-related: Auth0 reached certain limits imposed by our DNS provider. To resolve the issue, we made changes to our infrastructure and worked with our provider to raise these limits.

Mitigation Actions

To prevent incidents like this from happening in the future, we are doing the following:

  1. Improving our best practices for managing infrastructure resources.
  2. Adding monitoring and alerting to improve our response time for issues like this.

Annex 1: Events Timeline

12:00 UTC - Metrics first indicated these issues were starting to occur. Although we do have alerting in place for this capability, it did not detect this particular scenario.

15:25 UTC - The relevant teams were brought up to speed on the issue and began triaging.

16:40 UTC - We made DNS changes to resolve the issue.

17:00 UTC - Connection errors were resolved.
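
The intervals implied by this timeline can be worked out with a short calculation. The sketch below is illustrative only: the timestamps come from the entries above, while the event labels and the helper names (`at`, `minutes_between`, `elapsed`) are assumptions made for this example and not part of any Auth0 tooling.

```python
from datetime import datetime, timezone


def at(hour, minute):
    """Build a UTC timestamp on the incident date (May 22, 2020)."""
    return datetime(2020, 5, 22, hour, minute, tzinfo=timezone.utc)


# Timestamps copied from the events timeline; the labels are illustrative.
events = [
    ("metrics first indicated issues", at(12, 0)),
    ("triage began", at(15, 25)),
    ("DNS changes made", at(16, 40)),
    ("connection errors resolved", at(17, 0)),
]


def minutes_between(start, end):
    """Whole minutes elapsed between two timestamps."""
    return int((end - start).total_seconds() // 60)


# Minutes from the first metric signal to each later event.
onset = events[0][1]
elapsed = {label: minutes_between(onset, when) for label, when in events[1:]}

for label, minutes in elapsed.items():
    print(f"{label}: {minutes // 60}h {minutes % 60:02d}m after onset")
```

With these inputs, triage began 3h 25m after the first metric signal, the DNS changes followed at 4h 40m, and errors were resolved at 5h 00m, which matches the 12:00 to 17:00 UTC window given in the summary.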

Posted Jun 04, 2020 - 20:58 UTC

Resolved
This incident has been resolved.
Posted May 22, 2020 - 17:30 UTC
Monitoring
A fix has been implemented and we are monitoring the results.
Posted May 22, 2020 - 16:52 UTC
Identified
The issue has been identified and our engineers are working on a fix.
Posted May 22, 2020 - 16:17 UTC
Investigating
We are investigating issues with our Active Directory Connector in our US Preview region. This is affecting authentication for these connections unless caching is enabled on them. Note that this is not affecting our Production region. Our engineers are working on resolving this as soon as possible, and we will keep you updated.
Posted May 22, 2020 - 16:02 UTC
This incident affected: Auth0 US (PREVIEW) (Other).