Azure… or no Azure

Like all public and private clouds, Azure has its ups and downs, and an outage may affect any of us at some point. Because of this, Microsoft publishes an article, Designing resilient applications for Azure, to help you protect yourself against outages. When an outage does occur, though, you will want to know about it and be kept fully in the loop. To cover both your own resources and the wider platform with the least noise, follow the two steps below:

  1. More targeted notifications – Use Azure Service Health: you can create activity log alerts on service notifications and be notified of service issues, planned maintenance and health advisories. The only caveat is that Service Health notifications do not send alerts for resource health events; for those, you need Azure Activity Logs, a log that provides insight into subscription-level events that have occurred in Azure. Azure Service Health tracks three types of health events that may impact you (service issues, planned maintenance and health advisories):

    I have defined each category below, together with Resource Health, which is tracked separately:

    1. Service issues – Problems in Azure services that affect you right now
    2. Planned maintenance – Upcoming maintenance that can affect the availability of your services in the future. Typically communicated at least seven days prior to the event
    3. Health advisories – Health-related issues that may require you to act to avoid service disruption. Examples include service retirements, misconfiguration, exceeding a usage quota, and more. Usually communicated at least 90 days prior, with notable exceptions including service retirements, which are announced at least 12 months in advance, and misconfigurations, which are immediately surfaced
    4. Resource Health – keeps you informed about the current and historical health status of your Azure resources. Azure Resource Health alerts can notify you in near real-time when these resources have a change in their health status
  2. For everything else, use the Azure status page, where you can subscribe to the RSS feed. You can hook the feed up to Logic Apps and be notified proactively whenever anything is logged. I wrote a blog post on Logic Apps conditions a couple of years ago, which you can draw on when writing your Logic Apps.
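
If you would rather prototype the RSS step in code before building the Logic App, here is a minimal Python sketch of the same idea: parse the feed and keep only the items you have not been notified about yet. This is an illustration under assumptions, not the Logic Apps approach itself. The sample feed contents are made up, the function names are hypothetical, and the feed URL is left for you to copy from the Azure status page.

```python
import urllib.request
import xml.etree.ElementTree as ET
from email.utils import parsedate_to_datetime

# Made-up sample in plain RSS 2.0 shape; the real feed's contents will differ.
SAMPLE_FEED = """<rss version="2.0">
  <channel>
    <title>Status feed (sample)</title>
    <item>
      <title>Hypothetical storage issue in one region</title>
      <link>https://example.invalid/incident/1</link>
      <pubDate>Mon, 06 Jan 2025 10:00:00 GMT</pubDate>
    </item>
    <item>
      <title>Hypothetical networking issue in several regions</title>
      <link>https://example.invalid/incident/2</link>
      <pubDate>Mon, 06 Jan 2025 12:30:00 GMT</pubDate>
    </item>
  </channel>
</rss>"""


def fetch_feed(url):
    """Download the feed text. The URL is supplied by the caller (assumption:
    you copy it from the RSS link on the Azure status page)."""
    with urllib.request.urlopen(url, timeout=10) as response:
        return response.read().decode("utf-8")


def parse_status_items(feed_xml):
    """Return one dict per <item> element found in the RSS document."""
    root = ET.fromstring(feed_xml)
    items = []
    for item in root.iter("item"):
        items.append({
            "title": (item.findtext("title") or "").strip(),
            "link": (item.findtext("link") or "").strip(),
            "published": parsedate_to_datetime(item.findtext("pubDate")),
        })
    return items


def new_items(items, seen_links):
    """Keep only items whose link has not been notified about before."""
    return [item for item in items if item["link"] not in seen_links]


if __name__ == "__main__":
    parsed = parse_status_items(SAMPLE_FEED)
    # Pretend the first incident was already reported on an earlier poll.
    for item in new_items(parsed, {"https://example.invalid/incident/1"}):
        print(item["published"].isoformat(), item["title"])
```

In a real setup you would call fetch_feed on a schedule, persist the set of seen links between runs, and send each new item to email or chat, which is roughly what the Logic Apps RSS trigger plus a condition does for you.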

We will only publish Azure outage details to the Azure status page if:

  • We cannot determine which subscriptions are affected by an issue, e.g. when the issue spans multiple regions
  • The issue would prevent customers from reaching the Azure Portal, or direct communications are not getting to customers
  • It is a large issue impacting multiple services and/or regions
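
To make the rule above concrete, here is a small Python sketch that encodes the three conditions as a single check. It only illustrates the logic as described in this post; the Incident fields and the function name are hypothetical, and this is not how Microsoft actually implements the decision.

```python
from dataclasses import dataclass


@dataclass
class Incident:
    # All field names are hypothetical, chosen to mirror the three bullets above.
    affected_subscriptions_known: bool
    portal_or_direct_comms_blocked: bool
    services_impacted: int
    regions_impacted: int


def goes_on_status_page(incident):
    """True if any one of the three publishing conditions holds."""
    # "Large" is read here as more than one service and/or more than one region.
    large = incident.services_impacted > 1 or incident.regions_impacted > 1
    return (
        not incident.affected_subscriptions_known
        or incident.portal_or_direct_comms_blocked
        or large
    )


if __name__ == "__main__":
    single_rack = Incident(True, False, 1, 1)
    multi_region = Incident(True, False, 1, 3)
    print(goes_on_status_page(single_rack))
    print(goes_on_status_page(multi_region))
```

An incident that fails all three checks is the kind that should reach you through Azure Service Health instead, because the affected subscriptions are known and can be notified directly.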

Some examples of what does and does not appear on the Azure status page:

  • “If we have an issue with a single rack or server somewhere, impacting a handful of customers, then it doesn’t warrant sending this to all customers; it would be just noise.”
    • These are single points of failure and are known as fault domains. We publish fault domains here. Unexpected downtime is when the hardware or the physical infrastructure for the virtual machine fails unexpectedly. This can include local network failures, local disk failures, or other rack-level failures

  • “If it’s multi-service or multi-region, then it will appear on the Azure status page feed.”
