This is the first of a two-part series on how to develop an alerting strategy for the cloud.
How many of us could make it through a day without having to rely on preprogrammed alerts? Alarm clocks tell us when it is time to get up, low fuel lights warn us when our gas tanks need filling and Facebook alerts remind us that three of our cousins have birthdays today.
Alerts are critically important in the cloud as well. There are hundreds, if not thousands, of alerts you can set to track progress, stop or start operations, manage utilization and security compliance, flag abnormal performance levels or add new resources. An IT staff that can skillfully manage all the alerts at its disposal will be well positioned to keep the organization's infrastructure, applications and networks running smoothly.
The problem is that alerts are more complicated than they seem. Because there are so many of them (with so many data sources and so many potential outcomes), IT leaders find it difficult to manage the alerts they have, and they may neglect the alerts they ought to have. What these leaders need is a method for building an alerting strategy in which alerts do what the organization wants and deliver the outcomes it needs.
Alerts, of course, come in a variety of forms. Some are simply informational – letting a department know when an IT task is finished, or how many servers are running. There are also “velocity alerts” based on forecasted data that tell you when you are, say, two weeks away from running out of server space. These help with planning. Other alerts are more mission critical. The server running the e-commerce site is down! An unauthorized user has accessed a departmental database!
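A velocity alert is essentially a forecast combined with a threshold. The minimal sketch below assumes daily storage-usage samples and a straight-line (least-squares) trend; the class name, the 14-day warning window and the linear model are illustrative assumptions rather than a prescribed method, and real usage may need a model that accounts for seasonality.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class VelocityAlert:
    """Hypothetical velocity alert: fires when storage is forecast to fill up soon."""
    capacity_gb: float
    warn_days: float = 14.0  # "two weeks away", matching the example in the text

    def days_until_full(self, daily_usage_gb: List[float]) -> Optional[float]:
        """Fit a least-squares line to daily usage samples and extrapolate to capacity.

        Returns None when usage is flat or shrinking, i.e. no exhaustion is forecast.
        """
        n = len(daily_usage_gb)
        if n < 2:
            raise ValueError("need at least two samples to estimate a trend")
        mean_x = (n - 1) / 2
        mean_y = sum(daily_usage_gb) / n
        cov = sum((x - mean_x) * (y - mean_y) for x, y in enumerate(daily_usage_gb))
        var = sum((x - mean_x) ** 2 for x in range(n))
        slope = cov / var  # estimated growth in GB per day
        if slope <= 0:
            return None
        remaining = max(self.capacity_gb - daily_usage_gb[-1], 0.0)
        return remaining / slope

    def should_fire(self, daily_usage_gb: List[float]) -> bool:
        """True when the forecast exhaustion date falls inside the warning window."""
        days = self.days_until_full(daily_usage_gb)
        return days is not None and days <= self.warn_days
```

For example, a 1,000 GB volume that has grown 10 GB per day to 900 GB is forecast to fill in 10 days, so the alert fires; the same growth ending at 840 GB gives 16 days, which is still outside the two-week window.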
To create an effective alerting strategy, an organization has to take a hard look at all its alerts and decide whether they are truly serving a need. Are the alerts helping the organization meet predefined service level agreements (SLAs) with outside parties (customers, vendors, partners, other stakeholders), or with the organization itself? If some aspects of the SLAs are not being addressed, are there new alerts – or new practices – that can fill the gaps?
Developing an Alerting Strategy
The overall goal of alerting is to make sure the right people are getting the right alerts at the right time with the right level of urgency. This enables organizations to correct situations immediately or to act proactively to improve operations over time. Even more importantly, the IT leaders charged with overseeing the alerting strategy can create automated responses, for instance to spin up more servers automatically when capacity is running low.
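At its simplest, an automated response is a rule that pairs a threshold with an action. The sketch below assumes a utilization metric that is sampled periodically and a caller-supplied `action` (for instance, a hypothetical function that asks your cloud provider for more servers); the class name, threshold and five-minute cooldown are illustrative assumptions, not any provider's API.

```python
import time
from dataclasses import dataclass, field
from typing import Callable, Optional


@dataclass
class AutoResponse:
    """Hypothetical rule: run an action when a metric reaches a threshold."""
    metric: str
    threshold: float
    action: Callable[[], None]
    cooldown_s: float = 300.0  # don't fire again while new capacity is still coming online
    _last_fired: Optional[float] = field(default=None, repr=False)

    def evaluate(self, value: float, now: Optional[float] = None) -> bool:
        """Run the action if value >= threshold and the cooldown has elapsed.

        Returns True when the action was executed.
        """
        now = time.monotonic() if now is None else now
        if value < self.threshold:
            return False
        if self._last_fired is not None and now - self._last_fired < self.cooldown_s:
            return False
        self._last_fired = now
        self.action()
        return True
```

The cooldown is a design choice worth keeping in some form: without it, a rule evaluated every few seconds could request new servers repeatedly before the first ones finish starting.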
What should organizations consider as they develop an alerting strategy? The first priority should be to ensure that the most critical issues are covered, before building out informational alerts or "nice to have" notifications.
At the top of the list are alerts that monitor the resources that would affect the business most directly if they went down. For example, if the organization must always keep a certain amount of storage available, you need a storage space alert, because once that storage fills up, systems can no longer write critical log files. An alert should also be tied to the database used for ordering products, because if that system goes down, customers cannot order anything.
Network connectivity is another major concern. If the network your company's websites depend on goes down, customers will not be able to access information or purchase products – in other words, your brand's health will be at stake. Your alerting strategy should ensure that a lost network connection triggers an immediate response: restoring service automatically if possible, or at least reaching the right person who can solve the problem.
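The "restore automatically if possible, otherwise reach a person" pattern can be sketched as an ordered list of remediation attempts followed by escalation. Everything here is an assumption for illustration: the remediation callables, the `is_healthy` check and the `page_on_call` hook stand in for whatever tooling your environment actually provides.

```python
from typing import Callable, Sequence


def handle_connectivity_alert(
    remediations: Sequence[Callable[[], None]],
    is_healthy: Callable[[], bool],
    page_on_call: Callable[[str], None],
) -> str:
    """Try each automated fix in order; page a person only if none restores service."""
    for fix in remediations:
        try:
            fix()
        except Exception:
            continue  # a failing fix should not block the remaining attempts
        if is_healthy():
            return "auto-restored by " + getattr(fix, "__name__", "remediation")
    page_on_call("Network connectivity lost; automated remediation did not restore service")
    return "escalated"
```

Checking health after every step means the cheapest fix that works ends the sequence, and a person is paged only when automation has genuinely run out of options.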
Security should be a primary concern of your alerting strategy. If a hacker gets into your environment, you want an alert in place that triggers responses to block further access, quarantine the affected system and notify designated people so they can protect the data and restore service.
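One way to picture this response is as a small playbook: containment steps run first, and the designated people are always notified with a summary of what succeeded, even if a containment step fails. The incident fields, the `block_ip`, `quarantine_host` and `notify` hooks and the responder list are hypothetical placeholders, not a real security product's interface.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass
class SecurityIncident:
    source_ip: str
    host: str


def run_intrusion_playbook(
    incident: SecurityIncident,
    block_ip: Callable[[str], None],
    quarantine_host: Callable[[str], None],
    notify: Callable[[List[str], str], None],
    responders: List[str],
) -> Dict[str, bool]:
    """Contain first, then notify; notification is sent even if containment fails."""
    results: Dict[str, bool] = {}
    for name, step, arg in (
        ("block", block_ip, incident.source_ip),
        ("quarantine", quarantine_host, incident.host),
    ):
        try:
            step(arg)
            results[name] = True
        except Exception:
            results[name] = False  # record the failure so responders know to act manually
    summary = f"Intrusion from {incident.source_ip} on {incident.host}: " + ", ".join(
        f"{step_name}={'ok' if ok else 'FAILED'}" for step_name, ok in results.items()
    )
    notify(responders, summary)
    results["notified"] = True
    return results
```

Reporting failed steps in the notification matters: responders should learn immediately if, say, quarantine did not happen, rather than assume the automation handled it.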
Managing Alerts Across Environments
Other considerations? One involves managing alerts across hybrid or multi-cloud environments. AWS, Microsoft Azure and Google Cloud all have cloud-native tools that can ingest alerts from on-premises systems or from their own cloud platforms. But in multi-cloud environments, you might need a third-party vendor to ingest the logs, manage the alerts and present them in an easy-to-read dashboard.
Another key part of the alerting strategy is deciding who will receive specific alerts. It is fine to have information flowing through your systems, but is it reaching the people who can act on it correctly? Organizations need to determine who will be in charge of certain alerting structures and how alerts are distributed. Does a given alert go to a separate entity or to a dedicated resource? Do the networking alerts go to the networking team, and the server alerts go to the server team? Does one team govern the distribution of alerts? Do you involve multiple teams in issues that take multiple steps to solve?
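The routing questions above can be made concrete with a simple lookup from an alert's domain to the responsible teams. In this sketch the domain labels, team names and the catch-all `ops-triage` owner are hypothetical; an alert that spans several domains goes to every team involved, and anything without a route falls back to the single team that governs distribution.

```python
from dataclasses import dataclass
from typing import Dict, FrozenSet, List


@dataclass(frozen=True)
class Alert:
    name: str
    domains: FrozenSet[str]  # e.g. {"network"} or {"network", "database"}


ROUTES: Dict[str, List[str]] = {  # hypothetical team names
    "network": ["networking-team"],
    "server": ["server-team"],
    "database": ["dba-team"],
}
DEFAULT_OWNER = "ops-triage"  # hypothetical catch-all team for unrouted alerts


def recipients(alert: Alert) -> List[str]:
    """Return every team responsible for the alert's domains, without duplicates."""
    teams: List[str] = []
    for domain in sorted(alert.domains):  # sorted for a deterministic order
        for team in ROUTES.get(domain, [DEFAULT_OWNER]):
            if team not in teams:
                teams.append(team)
    return teams or [DEFAULT_OWNER]
```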
Alerting strategies need to strike a balance between flexibility and rigidity. Staying flexible gives departments a chance to evaluate situations and respond accordingly, based on factors such as data use patterns. For example, you do not want an alert to automatically acquire more storage when utilization hits 70 percent during your slowest season. Conversely, you do not want a setup so loose that someone has to make a decision on every alert. You want the system to work for you, not generate more work.
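One way to encode that balance is a season-aware threshold: during peak months the system expands storage automatically at 70 percent, while off-peak it only notifies a person at 70 percent and acts automatically at a higher level. The peak months, the 85 percent off-peak threshold and the action names are assumptions chosen for illustration.

```python
from typing import Dict, Set

# Hypothetical peak months (e.g. a retail holiday season); derive these from your own usage data.
PEAK_MONTHS: Set[int] = {10, 11, 12}
AUTO_EXPAND_AT: Dict[str, float] = {"peak": 0.70, "off_peak": 0.85}
NOTIFY_AT = 0.70


def storage_decision(utilization: float, month: int) -> str:
    """Decide what a storage alert should do.

    utilization is a fraction (0..1); month is 1..12.
    Returns "auto_expand", "notify" or "ok".
    """
    season = "peak" if month in PEAK_MONTHS else "off_peak"
    if utilization >= AUTO_EXPAND_AT[season]:
        return "auto_expand"  # rigid: act without waiting for a person
    if utilization >= NOTIFY_AT:
        return "notify"  # flexible: let a person judge the usage pattern
    return "ok"
```

Only the ambiguous middle band reaches a person, so people are not asked to decide on every alert, and the slow season does not trigger purchases automatically.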
Here are a few key steps an organization should take to develop an alerting strategy:
- Take inventory. Find out how many alerts you have, where they are going and what they are doing.
- Identify the key players. How many stakeholders does each alert affect? Who is managing the outcomes of each alert?
- Create an action plan. Each set of alerts should have an outcome and serve a particular purpose.
- Understand correlations. Do certain alerts shed light on potential problems in multiple areas? Who is responsible when the information an alert generates crosses multiple domains?
- Map alerts to SLAs. This is the most critical point. Alerts that do not serve specific business purposes outlined in SLAs can add unnecessary cost and confusion to the process.
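The "take inventory" and "map alerts to SLAs" steps can be combined into a simple audit. Given an inventory that records which SLAs each alert supports, it lists alerts tied to no SLA (candidates to retire or downgrade) and SLAs that no alert covers (gaps to fill). The alert and SLA names below are hypothetical, and the inventory format is an assumption; in practice it might be exported from your monitoring tools.

```python
from typing import Dict, List, Set, Tuple


def audit_alerts(
    alert_to_slas: Dict[str, Set[str]], slas: Set[str]
) -> Tuple[List[str], List[str]]:
    """Compare an alert inventory against the list of SLAs.

    Returns (orphan_alerts, uncovered_slas):
    - orphan_alerts: alerts not tied to any known SLA
    - uncovered_slas: SLAs that no alert monitors
    """
    orphans = sorted(name for name, mapped in alert_to_slas.items() if not (mapped & slas))
    covered: Set[str] = set().union(*alert_to_slas.values())
    uncovered = sorted(slas - covered)
    return orphans, uncovered
```

An orphan alert is not automatically useless (some informational alerts may be worth keeping), but each one should be able to justify its cost.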
Make no mistake: alerts serve important functions within organizations. They provide information and give organizations a chance to improve, either at a moment's notice or over time. But, as with any information source, alerts can be either useful or distracting, depending on how they are managed. Companies that are growing in the cloud can make alerts work to their advantage if they develop a comprehensive alerting strategy that aligns with their SLAs. Now that your alerting strategy is aligned with your business SLAs, what will that strategy look like in the cloud?
Stay tuned for part two.