One of the main goals of DevOps is to improve the overall workflow in the software development life cycle (SDLC). The flow of work is often described in terms of work in progress (WIP). The flow of WIP can be improved by a variety of means. To effectively remove the bottlenecks that slow that flow, one must first analyze the people, process, and technology aspects of the entire SDLC. After performing this analysis at a number of Fortune 500 companies and collaborating with my peers in this space, I have compiled a list of 11 bottlenecks that have the biggest impact on the flow of work.
1. Inconsistent Environments
In almost every company I have worked for or consulted with, a huge amount of waste exists because the various environments (dev, test, stage, prod) are configured differently. I call this “environment hell”. How many times have you heard a developer say “it worked on my laptop”? As code moves from one environment to the next, software breaks because of the different configurations within each environment. I have seen teams waste days and even weeks fixing bugs that are due to environmental issues and are not due to errors within the code. Inconsistent environments are the number one killer of agility.
Create standard infrastructure blueprints and implement continuous delivery to ensure all environments are identical.
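As a rough illustration of what checking for "identical" environments can look like, here is a minimal Python sketch that compares each environment against a baseline and reports any drift. The environment names come from the list above; the settings, version strings, and the `find_drift` helper are hypothetical and stand in for whatever your blueprint actually defines.

```python
# Hypothetical environment settings; in practice these would be loaded
# from the blueprint or from each environment's actual configuration.
ENVIRONMENTS = {
    "dev":   {"os": "ubuntu-22.04", "python": "3.11", "db": "postgres-15"},
    "test":  {"os": "ubuntu-22.04", "python": "3.11", "db": "postgres-15"},
    "stage": {"os": "ubuntu-22.04", "python": "3.10", "db": "postgres-15"},
    "prod":  {"os": "ubuntu-20.04", "python": "3.10", "db": "postgres-14"},
}


def find_drift(environments, baseline="dev"):
    """Return {env: {key: (baseline_value, actual_value)}} for every mismatch."""
    expected = environments[baseline]
    drift = {}
    for name, config in environments.items():
        if name == baseline:
            continue
        # Compare the union of keys so a missing setting also counts as drift.
        keys = set(expected) | set(config)
        diffs = {
            key: (expected.get(key), config.get(key))
            for key in sorted(keys)
            if expected.get(key) != config.get(key)
        }
        if diffs:
            drift[name] = diffs
    return drift


if __name__ == "__main__":
    for env, diffs in find_drift(ENVIRONMENTS).items():
        for key, (want, got) in diffs.items():
            print(f"{env}: {key} is {got!r}, baseline has {want!r}")
```

A check like this only detects drift. The blueprint and the delivery pipeline are still what keep the environments aligned in the first place.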
2. Manual Intervention
Manual intervention leads to human error and non-repeatable processes. Two areas where manual intervention can disrupt agility the most are testing and deployments. If testing is performed manually, it is impossible to implement continuous integration and continuous delivery in an agile manner (if at all). Also, manual testing increases the chance of producing defects, creating unplanned work. When deployments are performed fully or partially by hand, the risk of deployment failure increases significantly, which lowers quality and reliability and increases unplanned work.
Automate the build and deployment processes and implement a test automation methodology like test-driven development (TDD).
3. SDLC Maturity
The maturity of a team’s software development life cycle (SDLC) has a direct impact on its ability to deliver software. There is nothing new here; SDLC maturity has plagued IT for decades. In the age of DevOps, where we strive to deliver software in shorter increments with a high degree of reliability and quality, it is even more critical for a team to have a mature process.
Some companies I visit are still practicing waterfall methodologies. These companies struggle with DevOps because they don’t have any experience with agile. But not all companies that practice agile do it well. Some are early in their agile journey, while others have implemented what I call “Wagile”: waterfall tendencies with agile terminology sprinkled in. I have seen teams who have implemented Kanban but struggle with the prioritization and control of WIP. I have seen scrum teams struggle to complete the story points that they promised. It takes time to get really good at agile.
Invest in training and hold blameless postmortems to continuously solicit feedback and improve.
4. Legacy Change Management Processes
Many companies have had their change management processes in place for years and are comfortable with them. The problem is that these processes were created back when companies were deploying and updating back office solutions or making infrastructure changes that happened infrequently. Fast forward to today’s environments, where applications are made of many small components or microservices that can be changed and deployed quickly, and all of a sudden the process gets in the way.
Many large companies with well-established ITIL processes struggle with DevOps. In these environments I have seen development teams implement highly automated CI/CD processes only to stop and wait for weekly manual review gates. Sometimes these teams have to go through multiple reviews (security, operations, code, and change control). What is worse is that there is often a long line to wait in for reviews, causing a review process to slip another week. Many of these reviews are just rubber stamp approvals that could be entirely avoided with some minor modifications to the existing processes.
Companies with legacy processes need to look at how they can modernize processes to be more agile instead of being the reason why their company can’t move fast enough.
5. Lack of Operational Maturity
Moving to a DevOps model often requires a different approach to operations. Some companies are accustomed to supporting back office applications that change infrequently. It requires a different mindset to support software delivered as a service that is always on and deployed frequently. With DevOps, operations is no longer just something Ops does. Developers now must have tools so they can support applications. Often I encounter companies that only monitor infrastructure. In the DevOps model, developers need access to logging solutions, application performance monitoring (APM) tools, web and mobile analytics, and advanced alerting and notification solutions. Processes like change management, problem management, request management, incident management, access management, and many others often need to be modernized to allow for more agility and transparency. With DevOps, operations is a team sport.
Assess your operational processes, tools, and organization and modernize to increase agility and transparency.
6. Outdated Testing Practices
Too often I see clients who have a separate QA department that is not fully integrated with the development team. The code is thrown over the wall and then testing begins. Bugs are detected and sent back to developers, who then have to quickly fix, build, and redeploy. This process is repeated until there is no time remaining and teams are left to agree on what defects they can tolerate and promote to production. This is a death spiral in action. With every release, they introduce more technical debt into the system, lowering its quality and reliability and increasing unplanned work. There is a better way.
The better way is to block bugs from moving forward in the development process. This is accomplished by building automated test harnesses and by automatically failing the build if any of the tests fail. This is what continuous integration is designed for. Testing must be part of the development process, not a handoff that is performed after development. Developers need to play a bigger part in testing and testers need to play a bigger part in development. This strikes fear in some testers and not all testers can make the transition.
Quality is everyone’s responsibility, not just the QA team.
7. Automating Waste
A very common pattern I run into is the automation of waste. This occurs when a team declares itself a DevOps team or a person declares themselves a DevOps engineer and immediately starts writing hundreds or thousands of lines of Chef or Puppet scripts to automate their existing processes. The problem is that many of the existing processes are bottlenecks and need to be changed. Automating waste is like pouring concrete around unbalanced support beams. It makes bad design permanent.
Automate processes after the bottlenecks are removed.
8. Competing or Misaligned Incentives and Lack of Shared Ownership
This bottleneck has plagued IT for years but is more pronounced when attempting to be agile. In fact, this issue is at the heart of why DevOps came to be in the first place. Developers are incentivized for speed to market, while operations is incentivized to ensure security, reliability, availability, and governance. These incentives conflict. Instead, everyone should be incentivized for customer satisfaction, with a high degree of agility, reliability, and quality (which is what DevOps is all about). If every team is not marching towards the same goals, then there will be a never-ending battle over priorities and resources. If all teams’ goals support the goals I mentioned above, and everyone is measured in a way that reinforces those incentives, then everyone wins, especially the customer.
Work with HR to help change the rewards and incentives to foster the desired behaviors.
9. Dependence on Heroic Efforts
When heroic efforts are necessary to succeed, then a team is in a dark place. This often means working insane hours, being reactive instead of proactive, and being highly reliant on luck and chance. The biggest causes of this are a lack of automation, too much tribal knowledge, immature operational processes, and even poor management. The culture of heroism often leads to burnout, high turnover, and poor customer satisfaction.
If your organization relies on heroes, find the root causes that create these dependencies and fix them fast.
10. Governance as an Afterthought
When DevOps starts as a grassroots initiative, there is typically little attention paid to the question “how does this scale?” It is much easier to show some success in a small isolated team and for an initial project. But once the DevOps initiative starts scaling to larger projects running on far more infrastructure, or once it starts spreading to other teams, it can come crashing down without proper governance in place. This is very similar to building software in the cloud. How many times have you seen a small team whip out their credit card and build an amazing solution on AWS? Easy to do, right? Then a year later the costs are spiraling out of control as they lose sight of how many servers are in use and what is running on them. The servers all have different versions of third-party products and libraries on them. Suddenly, it is not so easy anymore.
With DevOps, the same thing can happen without the appropriate controls in place. Many companies start their DevOps journey with a team of innovators and are able to score some major wins. But when they take that model to other teams it all falls down. There are numerous reasons that this happens. Is the organization ready to manage infrastructure and operations across multiple teams? Are there common shared services available like central logging and monitoring solutions or is each team rolling their own? Is there a common security architecture that everyone can adhere to? Can the teams provision their own infrastructure from a self-service portal or are they all dependent on a single queue ticketing system? I could go on but you get the point. It is easier to cut some corners when there is one team to manage but to scale we must look at the entire service catalog. DevOps will not scale without the appropriate level of governance in place.
Assign an owner and start building a plan for scaling DevOps across the organization.
11. Limited to No Executive Sponsorship
The most successful companies have top-level support for their DevOps initiative. One of my clients is making a heavy investment in DevOps training and will run a large number of employees through the program. Companies with top-level support make DevOps a priority. They break down barriers, drive organizational change, improve incentive plans, communicate “why” they are doing DevOps, and fund the initiative. When there is no top-level support, DevOps becomes much more challenging and often becomes a new silo. Don’t let this stop you from starting a grassroots initiative. Many sponsored initiatives started as grassroots initiatives. These grassroots teams measured their success and pitched their executives. Sometimes when executives see the results and the ROI, they become the champions for furthering the cause. My point is that it is hard to get dev and ops to work together with common goals, and difficult to transform a company to DevOps, when the effort is not supported at the highest levels.
If you are running a grassroots effort, gather before-and-after metrics and be prepared to sell and evangelize DevOps upward.
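One possible way to gather before-and-after numbers is sketched below in Python. The choice of metrics (mean lead time from commit to production, and number of distinct deployments) is my assumption for illustration rather than a prescribed set, and the timestamps are made-up sample data.

```python
from datetime import datetime
from statistics import mean

# Hypothetical records: (commit time, production deploy time) per change.
BEFORE = [
    ("2024-01-02 09:00", "2024-01-16 17:00"),
    ("2024-01-05 10:00", "2024-01-16 17:00"),
    ("2024-01-20 14:00", "2024-01-30 17:00"),
]
AFTER = [
    ("2024-06-03 09:00", "2024-06-03 15:00"),
    ("2024-06-04 11:00", "2024-06-05 09:00"),
    ("2024-06-06 13:00", "2024-06-06 16:00"),
    ("2024-06-07 09:30", "2024-06-07 12:30"),
]

FMT = "%Y-%m-%d %H:%M"


def summarize(changes):
    """Return the mean lead time in hours and the number of distinct deploys."""
    lead_hours = []
    deploys = set()
    for committed, deployed in changes:
        start = datetime.strptime(committed, FMT)
        end = datetime.strptime(deployed, FMT)
        lead_hours.append((end - start).total_seconds() / 3600)
        # Changes released together share a deploy timestamp.
        deploys.add(deployed)
    return {
        "mean_lead_time_hours": round(mean(lead_hours), 1),
        "deploys": len(deploys),
    }


if __name__ == "__main__":
    print("before:", summarize(BEFORE))
    print("after: ", summarize(AFTER))
```

Whatever metrics you pick, capture the "before" numbers first. They are very hard to reconstruct once the old process is gone.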
Embracing DevOps is a long, continuous journey. The secret to success is to identify the bottlenecks and put steps in place to remove them. Removing these bottlenecks requires great patience. Fixing the bottlenecks that I listed above is not hard from a technology or process standpoint, but it can be very challenging from a cultural standpoint. Too much change at one time can be too disruptive. I always say let’s fix this “one bottleneck at a time” and then move on to the next bottleneck. I also like to prioritize bottlenecks based on impact and effort. Start with a few quick wins to show progress, but also implement a solution to a high-impact bottleneck early to make a major impact on the flow.
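Prioritizing by impact and effort can be as simple as scoring each bottleneck and sorting. The Python sketch below shows one way to do it. The 1-to-5 scales, the scores, the `max_effort` cutoff, and the helper names are all hypothetical; your own assessment supplies the real numbers.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Bottleneck:
    name: str
    impact: int  # 1 (low) to 5 (high); hypothetical score
    effort: int  # 1 (low) to 5 (high); hypothetical score


BACKLOG = [
    Bottleneck("Inconsistent environments", impact=5, effort=4),
    Bottleneck("Manual deployments", impact=4, effort=2),
    Bottleneck("Weekly manual review gates", impact=3, effort=1),
    Bottleneck("Misaligned incentives", impact=5, effort=5),
]


def quick_wins(backlog, max_effort=2):
    """Low-effort items, highest impact first."""
    picks = [b for b in backlog if b.effort <= max_effort]
    return sorted(picks, key=lambda b: (-b.impact, b.effort))


def highest_impact(backlog):
    """The single highest-impact item, preferring lower effort on ties."""
    return min(backlog, key=lambda b: (-b.impact, b.effort))


if __name__ == "__main__":
    print("Quick wins:", [b.name for b in quick_wins(BACKLOG)])
    print("Tackle early:", highest_impact(BACKLOG).name)
```

With these sample scores, the quick wins are the manual deployments and the weekly review gates, while inconsistent environments is the high-impact item to start on early.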