On May 22, 2020, at 12:00 UTC, we experienced AD/LDAP connection issues in the US region. These issues affected logins for customers using AD/LDAP connections without caching enabled. The incident lasted until 17:00 UTC.
The root cause of this incident was DNS-related: Auth0 reached certain limits imposed by our DNS provider. To resolve the issue, we made changes to our infrastructure and worked with our provider to raise these limits.
To prevent incidents like this from happening in the future, we are doing the following:
12:00 UTC - Metrics first indicated that these issues were starting to occur. Although we have alerting in place for this capability, it did not detect this particular scenario.
15:25 UTC - The relevant teams were brought up to speed on the issue and began triaging.
16:40 UTC - We made DNS changes to resolve the issue.
17:00 UTC - Connection errors were resolved.