Migrating a large number of applications to the public cloud is a complicated process. There are many strategic questions organizations need to answer, steps to consider and areas where migrators commonly struggle. In an earlier Doppler article, we discussed some of these pitfalls and outlined six key enablers that teams should put in place to ensure a migration process succeeds.
The first two enablers – collecting and analyzing your data – alone can seem like overwhelming tasks. Data is stored in a variety of formats. It comes from different sources. Some of it is up to date, some is not. Data describes different sets of assets, and the relationships among various data elements and asset types can be difficult to sort out. How do you make sense of all this information so you can use it to plan the migration?
One way to approach the data collection and assessment process is the way chefs prepare a meal. Chefs start with “inputs”: raw or pre-cooked ingredients. They use “tools”: pots, pans, appliances and instruments. They move on to conduct “processes”: a series of steps that transform raw ingredients into consumable products, which we will call “outputs.” Before the food is served, a chef may need to put it into temporary “storage” to preserve freshness until its eventual consumption.
In a migration project, a migration team uses the resources it has at hand to prepare vast amounts of data that can be collected, transformed, sampled and repackaged for use and reuse. Let’s step into our “kitchen” and see how teams bring data to the table: how they aggregate and correlate it, uncover relationships between assets and recognize patterns using data analytics tools and techniques, and then generate the outputs needed to prepare for migration.
Gather Up Your Ingredients
Just like a quality beef stew recipe includes meat, vegetables and a creative array of spices, a cloud migration project starts with its own distinct set of ingredients. The list is usually long. It includes morsels of data gathered up from application and database inventories, server and device inventories, network scanners, configuration management database (CMDB) exports, reports from asset discovery tools, server process data, server communications data, spreadsheets with departmental asset information and possibly other sources.
The ingredients vary from company to company. While smaller companies may pull most of their data out of CMDBs, larger enterprises can have a multitude of data sources. Even if data is coming from a small data set – a server list, for example – there can be a large number of applications and databases hosted on those servers, resulting in a rich and complex information set.
To gather the right ingredients, scanning tools should run for an extended period of time (around two months), so they can capture infrequent business processes. Along the way, they will capture significant amounts of time series data on application and OS processes, network connections and a myriad of other events happening in an enterprise information system. We leverage HPE Right Mix Advisor (RMA), which provides such asset discovery capabilities and delivers a rich set of data about applications and assets, as well as their dependencies, for additional analysis. Furthermore, RMA will help you process this data to determine cloud suitability and migration disposition for your application portfolio, answering questions such as “Which applications should I refactor, replatform or rehost?”
Plan Your Processes
Before we get to tooling selection, it makes sense to look at how to prepare the data for consumption. There are several steps. Just like a chef does not just throw raw ingredients into a pot and turn on a burner, a team needs to proceed carefully and strategically to create a quality output. Here are a few steps to follow.
- Data collection and normalization. Collect the input data and make it consumable. Put it in a format that is acceptable to analytics tools.
- Data manipulation and aggregation. This is where you massage the data: apply statistical functions, process time series data and summarize it into totals, counts, averages and other statistical aggregates. You can later run queries to make sense of the aggregated data. This is the intermediate processing of the data.
- Apply analysis. This is where you create queries that correlate the different data sets and apply analytical techniques to gain insights. These insights drive the migration plan.
- Refresh and repeat. Environments do not stay static. Data center environments are dynamic. Servers will be provisioned or decommissioned, applications will be deployed and undeployed, and networks will change – all while migration is being planned or even being conducted. The updated data sets will need to be reingested and reprocessed to update the results.
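The first two steps above can be sketched in a few lines of Python. This is a minimal, hypothetical example: the field names and the CMDB export are invented for illustration, not taken from any particular tool.

```python
import csv
import io
from collections import defaultdict
from statistics import mean

# Hypothetical raw export: one row per observed server sample.
RAW_CMDB_EXPORT = """\
hostname,app,cpu_pct,sampled_at
web-01,billing,42,2023-01-01T00:00
WEB-01,Billing,58,2023-01-01T01:00
db-01,billing,12,2023-01-01T00:00
"""

def normalize(rows):
    """Data collection and normalization: consistent case, typed fields."""
    for row in rows:
        yield {
            "hostname": row["hostname"].strip().lower(),
            "app": row["app"].strip().lower(),
            "cpu_pct": float(row["cpu_pct"]),
        }

def aggregate(records):
    """Manipulation and aggregation: fold time series samples into per-host stats."""
    by_host = defaultdict(list)
    for rec in records:
        by_host[rec["hostname"]].append(rec["cpu_pct"])
    return {host: {"samples": len(v), "avg_cpu": mean(v)}
            for host, v in by_host.items()}

reader = csv.DictReader(io.StringIO(RAW_CMDB_EXPORT))
summary = aggregate(normalize(reader))
print(summary["web-01"])  # the two differently-cased rows fold into one host
```

Note how normalization pays off immediately: without lowercasing, “web-01” and “WEB-01” would look like two different servers in every later query.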
Put Your Tools to Use
Now let’s get to the cooking! What tools shall we use?
Since you are migrating into the cloud, why not take advantage of the rich set of data analytics services and tools from the leading cloud providers? Why reinvent the wheel?
For data prep and data processing, cloud services such as AWS Glue and EMR, GCP Cloud Dataprep and Cloud Dataproc or Azure Databricks do the job. You may not need a significant amount of data prep and processing. It will depend on how normalized the data is in the first place, and how readily the data sets relate to one another. One area that particularly needs preparation is time series data, which can be difficult to query in raw form. You may have to preprocess this data and create some aggregations out of the raw input.
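As an illustration of that last point, here is one way raw connection events might be rolled up into queryable aggregates before they reach an analytics tool. The event records are hypothetical; a real scanner export would have its own schema.

```python
from collections import Counter

# Hypothetical raw connection events: (timestamp, source, destination, port).
events = [
    ("2023-03-01T09:15", "app-01", "db-01", 1433),
    ("2023-03-01T09:47", "app-01", "db-01", 1433),
    ("2023-03-15T02:00", "batch-01", "db-01", 1433),  # infrequent monthly job
]

# Roll raw time series events up into (source, destination, port) totals --
# the kind of aggregate that is far easier to query than the raw stream.
totals = Counter((src, dst, port) for _, src, dst, port in events)

# Also record first/last sighting per flow, so rare business processes
# captured late in the scanning window are not lost in the summary.
seen = {}
for ts, src, dst, port in events:
    key = (src, dst, port)
    first, last = seen.get(key, (ts, ts))
    seen[key] = (min(first, ts), max(last, ts))

print(totals[("app-01", "db-01", 1433)])  # 2
```

Keeping the first/last sighting alongside the counts is what lets you distinguish a chatty, constant dependency from a monthly batch job that appeared only once during the scan.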
Quite a few tools perform analysis and visualization. Some data processing tools have data analysis capabilities; others are more specifically designed for data analysis. Amazon Athena can be used to query data directly from storage while applying a just-in-time schema to it. If you have minimal requirements for preprocessing, you can skip that step altogether and use Athena to query your unstructured data from S3.
You can visualize your assets and their relationships by using analytics services such as Amazon QuickSight, GCP Cloud Datalab or Microsoft Power BI. These tools can generate graphs and charts, draw relationships, illuminate patterns and generate insights into your data.
Produce Quality Outputs
Now we are getting to the good stuff – the finished product. The “meal” a migration team prepares is a set of query results, reports and diagrams dissecting application and server data.
Here are some of the more interesting outputs that will help with migration planning:
- Asset groupings. What are the clusters of assets with a large number of interdependencies?
- Communication patterns. What are the protocols that represent these dependencies?
- Shared infrastructure. Which applications share servers?
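The first of these outputs, asset groupings, amounts to finding connected components in the dependency data. A minimal sketch, using invented server names and plain Python rather than any particular analytics service:

```python
from collections import defaultdict

# Hypothetical dependency edges derived from server communications data.
edges = [
    ("web-01", "app-01"), ("app-01", "db-01"),  # one interdependent cluster
    ("report-01", "dw-01"),                     # a second, independent cluster
]

def asset_groups(edges):
    """Cluster assets into groups of interdependent servers (connected components)."""
    adj = defaultdict(set)
    for a, b in edges:
        adj[a].add(b)
        adj[b].add(a)
    seen, groups = set(), []
    for node in adj:
        if node in seen:
            continue
        stack, component = [node], set()
        while stack:
            n = stack.pop()
            if n in component:
                continue
            component.add(n)
            stack.extend(adj[n] - component)
        seen |= component
        groups.append(component)
    return groups

groups = asset_groups(edges)
print(sorted(len(g) for g in groups))  # [2, 3]
```

Each resulting group is a candidate “move group”: a set of assets that should generally migrate together because of their interdependencies.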
Put Everything in Storage
Unlike a meal, data lives on as long as it is useful. It needs to be reliably stored, so it can be reanalyzed, repackaged and reused.
All of the major public cloud service providers have rich sets of capabilities for storing structured and unstructured data. Amazon S3 has high durability and solid SLAs. Microsoft Azure and Google Cloud have their own storage services with similar capabilities.
Another, more sophisticated alternative for storage is a graph database, such as Neo4j or Amazon Neptune. A graph database is an optimal way to represent complex nonlinear relationships (such as shared infrastructure and transitive dependencies between components), but it is not absolutely required. A relational database management system (RDBMS) can store structured sets of data that can be queried for use in other applications. Although relational technology is not optimal for traversing multi-hop references, it can still be used with a little more effort.
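To make the “multi-hop references” point concrete, here is the kind of traversal a graph database optimizes, sketched in plain Python over a hypothetical “depends on” mapping. In an RDBMS this walk would require recursive joins; in a graph store it is a single traversal query.

```python
from collections import deque

# Hypothetical "depends on" edges, as they might be stored in a graph database.
depends_on = {
    "app-01": ["db-01", "cache-01"],
    "db-01": ["san-01"],
    "cache-01": [],
    "san-01": [],
}

def transitive_dependencies(start, graph):
    """Breadth-first walk over multi-hop dependency references."""
    seen, queue = set(), deque(graph.get(start, []))
    while queue:
        node = queue.popleft()
        if node in seen:
            continue
        seen.add(node)
        queue.extend(graph.get(node, []))
    return seen

print(sorted(transitive_dependencies("app-01", depends_on)))
# ['cache-01', 'db-01', 'san-01']
```

The answer matters for planning: migrating app-01 implicitly pulls in san-01, two hops away, even though no direct dependency between them appears in the raw data.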
Cloud migration planning is not easy. It is an involved process. But with the right data analytics tools and processes, you can prepare a proper migration plan that takes into account the intricacies of your environment.
If you do not go through this process, you run the risk of missing important details about your assets and winding up with a suboptimal migration plan. You may then have to rework it again and again, wasting time, money and resources. If you migrate applications without accounting for all their dependencies, they simply will not work. Without clear insight into your asset data, you cannot plan your migration well.
Follow this as a framework and adjust it to your environment, choosing the processes and data analytics tooling that will get you the desired outcome: a clear understanding of the workloads, the assets planned for cloud migration and their relationships. Should you require assistance, a trusted and experienced migration partner can help you navigate the deep waters of cloud migration. Our Application Migration Plan for Cloud and Application Migration for Cloud services will help guide you along the optimal path of migration planning and execution.