We discuss why Vanguard went to the public cloud, the value of DevOps and best practices for IT leaders who are just getting started on their cloud initiative.
Mike Kavis: Welcome to the show. Why don’t you tell us a little about what you do at Vanguard?
Jeff Dowds: I play the CTO role at Vanguard. I’ve spent about 40 years in IT: 20 in the telecom industry and the last 20 in the mutual fund industry with Vanguard in Malvern, PA. In the CTO role, we try to drive the more strategic technology change initiatives that are taking place at Vanguard. The types of technology change initiatives we are working on are probably familiar to most senior IT people. We have three or four initiatives that are fairly significant. One is our migration to cloud computing. Another strategic initiative is what we call next gen apps, a new way of building software. And then a third initiative is lean enterprise. We’ve branded lean enterprise, next gen apps and cloud under a super change initiative we call DVASS (Deliver Value at Startup Speed).
Kavis: Tell me a little about your vision of the future state. Why are you going that route and what are the business benefits if you can make that vision come true?
Dowds: I’ll start with cloud. Today we’re very traditional in our approach to providing compute. We have our own data centers, so we accrue all the benefits of managing your own data centers, and we accrue all the disadvantages of managing your own data centers. We’ve looked at that situation as an opportunity with the emergence of cloud computing. We think the source of value, especially associated with public cloud computing, is just too attractive to pass up. We had originally started with a cloud strategy that was hybrid cloud, like most firms. We wanted to build a private cloud and then emerge into the public cloud. We had what I’ll call a delusion of moving workloads back and forth. At least that’s the way we used to think about it in the early days of thinking about cloud.
As we went down the path of building private cloud, it became apparent to us after 10-12 months that the effort to build a private cloud on-prem and the associated benefits we would eventually get just didn’t compete with what you could get in the public cloud. For us, the cost to architect, design and build private cloud capabilities that provide similar automation and self-service to our customers at Vanguard (the IT shops), all of which are typically available out of the box in public cloud, was just enormous, and it would take all of our time. Even if we were willing to make that investment and take that much time to build those services, we would still be in what I refer to as a provision model, provisioning compute capacity for our peak workloads. These peak workloads at Vanguard are generated by website activity. We provision our capacity significantly above our previous peak volumes. From a cost perspective we lay out an awful lot of capital that basically sits on the sidelines the majority of the time. Our analysis suggested that public cloud offered three primary sources of value. First, we would end up in a consumption model so we’re obviously paying for what we use. There was a clear cost advantage that we feel is in the 30-40% range over running workloads in our own on-prem facilities. Beyond that, there is the speed and agility you get from being in the cloud. Our ability to provision compute environments in seconds, rather than days, weeks or months on prem was a huge advantage. We got a huge agility play, we got a cost play, and then clearly the access to innovation. All public cloud providers are spending billions in developing new cloud-based services and for Vanguard to try to compete with that using our own on-prem engineers – it was just not something we were able to compete with.
Three primary sources of value:
- Access to new technology because of the innovation and research being done by the cloud providers
- The clear cost reduction
- The agility play: an improvement in the speed with which compute environments can be provisioned
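Editor's sketch: the provision-versus-consumption argument above can be illustrated with simple arithmetic. The hourly demand profile, the headroom factor and the unit prices below are hypothetical, chosen only to show the shape of the comparison. They are not Vanguard figures, and the result is not meant to reproduce the 30-40% estimate quoted in the interview.

```python
def provisioned_cost(hourly_demand, headroom, unit_cost_per_hour):
    """Provision model: capacity is sized above the peak and paid for every hour."""
    capacity = max(hourly_demand) * headroom
    return capacity * unit_cost_per_hour * len(hourly_demand)


def consumption_cost(hourly_demand, unit_cost_per_hour):
    """Consumption model: pay only for the units actually used in each hour."""
    return sum(hourly_demand) * unit_cost_per_hour


# Hypothetical 24-hour demand in compute units: quiet overnight, a spike
# during peak website activity, moderate load afterwards.
demand = [20] * 8 + [100] * 4 + [60] * 6 + [20] * 6

# Assumption: on-prem capacity is provisioned 50% above the observed peak,
# and the cloud unit price is higher than the on-prem unit cost.
on_prem = provisioned_cost(demand, headroom=1.5, unit_cost_per_hour=1.0)
cloud = consumption_cost(demand, unit_cost_per_hour=1.5)

# Share of provisioned capacity that is actually used over the day.
utilization = sum(demand) / (max(demand) * 1.5 * len(demand))

print(f"provision model:   {on_prem:.0f}")
print(f"consumption model: {cloud:.0f}")
print(f"utilization of provisioned capacity: {utilization:.0%}")
```

With these made-up inputs, the provisioned capacity sits mostly idle, which is the "capital on the sidelines" effect described above. A flatter demand profile or a higher cloud unit price would narrow or reverse the gap.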
Kavis: Let’s talk about the agility play for a bit. The big buzzword that helps in the world of agility is DevOps and DevOps is a really confusing word. It means a lot of different things to a lot of different people. What does DevOps mean to you and how does that contribute to the agility bit?
Dowds: DevOps is a confusing label and I personally spend a lot of time reading and researching thought leadership about DevOps. This thought leadership comes from what you see in research firms like Gartner and Forrester, but also through some of the thought leadership books that have been published on DevOps. I read all the research, read all the books and at the end of all that knowledge gathering it was very clear to me that DevOps means different things to different people. But we have settled on a way to think of DevOps at Vanguard. It’s a very broad perspective on DevOps. It’s not, as the name would imply, moving Ops people into Dev shops and allowing Dev shops to own responsibilities that you traditionally associate with the Ops guys. We look at a framework from Gartner that focuses on DevOps very broadly. It has the organizational / people changes that I referenced, such as moving Ops functions into delivery shops, changing your organizational structure in delivery shops to be full stack so that you not only have the delivery function but also the ops functions and any other center of excellence type functions that are normally outside of delivery – trying to move all of that end-to-end capability into a single organization.
DevOps is also process changes, contemporary architecture, design patterns that you use to build software these days. It’s very much a culture change. It’s the trust, communications and better collaboration. The empowerment of providing delivery teams with decision rights regarding everything they are doing in the build cycle. It’s a multidimensional perspective on DevOps. We very much embrace the Gartner framework (Figure 1) in this space and I encourage others to do likewise. It’s a journey of multiple years. At Vanguard we have produced what we call our DevOps decoder ring. Because even if you read all the books and all the articles and even find an industry framework you like and you understand, (that) doesn’t mean the other 4000 IT professionals at Vanguard will understand it, nor business people. We took all that knowledge and put it into a DevOps decoder ring, which is no more than a simple Excel spreadsheet and it looks a lot like a maturity model where all the different DevOps capabilities or characteristics are in the rows and the different maturity levels are in the columns so you can self-assess where you are in your journey. The idea is to have a long-term perspective where you can pursue embracing more of the DevOps concepts after you pass the foundational steps. DevOps is very much a significant IT change event occurring at Vanguard as we speak.
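Editor's sketch: the spreadsheet described above (capabilities in rows, maturity levels in columns) can be modeled as a small lookup for self-assessment. The level names and the choice of capabilities below are hypothetical. The capability labels are borrowed from topics mentioned in this interview, not from Vanguard's actual decoder ring.

```python
# Hypothetical maturity levels, ordered from least to most mature.
LEVELS = ["initial", "developing", "established", "optimizing"]


def summarize(assessment):
    """Return the least mature capabilities and the average level index."""
    scores = {cap: LEVELS.index(level) for cap, level in assessment.items()}
    lowest = min(scores.values())
    gaps = sorted(cap for cap, score in scores.items() if score == lowest)
    average = sum(scores.values()) / len(scores)
    return gaps, average


# Hypothetical self-assessment for one delivery team.
team = {
    "full stack team structure": "developing",
    "CI/CD pipeline": "established",
    "microservices architecture": "initial",
    "shift-left testing": "initial",
}

gaps, average = summarize(team)
print("focus next on:", gaps)
print("average maturity index:", average)
```

The lowest-scoring rows give a team a short list of foundational steps to work on before pursuing the more advanced concepts.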
Kavis: We have something similar at CTP. The columns are people, processes and technology. Often we will come in and give a list of recommendations and most of what gets worked on is the technology. The people and process stuff is hard and cross-siloed and only a small dent gets put into those. Agility can only go so far. We always say technology is easy, but people and processes are hard. How can we break down those barriers, silos, different mix of incentives to really get to the agility play?
Dowds: I think you’re right. New architecture, in our case moving from a monolithic approach to building software to microservices – cloud computing moving toward a consumption model rather than a provision model. CI/CD pipeline, putting those things in place to speed the build, package, deploy process of the software lifecycle. IT people can get their head around technical changes and make progress on them and be successful but full stack teams (all of the organizational entities and leaders) have to let go of responsibility and allow it to shift to the delivery organizations. That’s difficult. All IT shops have senior IT people who own their areas of responsibility and are not always ready to let go of responsibilities. That kind of organizational change is not always easy.
Then there are the cultural changes. Even if you put a full stack team in place and give them the decision rights after you tell them what business outcomes you want, that’s another major cultural change. The developers themselves, rather than being experts in a few things, obviously need to be full stack developers, which isn’t easy. Then there is putting the collaborative pieces in place. We’re collaborating on hangouts today, but we need collaboration platforms in place that allow the full stack team to communicate, not only amongst themselves, but with their business constituents, which is often challenging. The “mushy stuff,” what I tend to refer to as people, organizational and cultural changes, is harder than the technical stuff. My only advice is to try to focus on a couple of top line metrics that tend to drive the behavioral change you’re looking for. Personally, I’ve focused on a single top line metric, and it’s deployment frequency. If you’re operating in a DevOps way you should see your deployment frequency increase. There’s no way your deployment frequency will become greater than it used to be unless you’re building in thin slices, embracing concepts like minimum viable product, and have self-provisioned infrastructure. Again, it’s about embracing microservices architecture, putting in place robust CI/CD pipelines, shifting your testing left, empowering your team to make decisions, shifting security functions left, and co-locating teams that collaborate and make business decisions that produce the business outcomes you’re looking for. I do believe that if the team is focused on their deployment frequency and embracing every DevOps concept you can think of that would have a positive impact on that frequency, you would eventually start behaving the way you need to behave from a culture perspective.
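Editor's sketch: deployment frequency, the top line metric described above, can be tracked by counting production deployments per week. The deployment dates below are hypothetical, and grouping by ISO week is an assumption made for illustration. The interview does not say how Vanguard measures the metric.

```python
from collections import Counter
from datetime import date


def deployments_per_week(deploy_dates):
    """Count deployments per (ISO year, ISO week), in chronological order."""
    counts = Counter()
    for day in deploy_dates:
        iso = day.isocalendar()
        counts[(iso[0], iso[1])] += 1
    return dict(sorted(counts.items()))


# Hypothetical deployment log for one delivery team.
deploys = [
    date(2024, 1, 2),
    date(2024, 1, 4),
    date(2024, 1, 9),
    date(2024, 1, 10),
    date(2024, 1, 11),
]

weekly = deployments_per_week(deploys)
for (year, week), count in weekly.items():
    print(f"{year}-W{week:02d}: {count} deployments")
```

A rising count from week to week is the signal a team would watch for. A flat or falling count prompts the question of which practice (thin slices, pipeline automation, shifted-left testing) is holding deployments back.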
Kavis: One of the things that you guys did that I like to use as an example is that you talk about Full Stack team a lot. I like that because I think of a full stack engineer as a unicorn. There isn’t one person who knows everything about everything. But you’re able to bring people together from all different disciplines under different leadership and put them all in the same location in open cubes. That was your cloud construction team and I thought that was a pretty good approach. What did it take to have the organization give up those resources? How did you pull that off?
Dowds: You reference our cloud construction team, often referred to in the industry as the cloud engineering team by many cloud providers. What did it take to pull off our cloud engineering team to actually behave in a DevOps way? It took 8, 10 or 12 months of working the other way and being somewhat ineffective. To make the move to public cloud we had to pull critical resources from at least three organizations: our security organization, our traditional data center organization, as well as our CTO organization. Getting those three groups pulling the rope the same way was difficult. Those three groups reported up to their own chains of command, so we recognized the challenges of three separate organizations. We made a decision after 8 or 10 months to put a full stack team in place. We gave the leader of the organization the decision rights on all the things we needed to do to get to public cloud. It was a game changer for Vanguard. All of a sudden we had everyone pulling the rope the same way. We worked with Cloud Technology Partners to pursue concepts like the Minimum Viable Cloud approach to getting our initial workloads into the cloud, and the progress that we were able to make in building out our landing zones on the public cloud was just tremendous. It was an important decision we made early and the impetus for it was failing by staying in a siloed organization.
Kavis: I work with a lot of clients who don’t do that and what happens is people still have this other part time job, and stuff just takes forever because no one is really focused.
Dowds: I’ll mention one other thing. We can refer to our cloud engineering team as a DevOps approach to building out our cloud landing zones. At Vanguard we are trying to bring that same approach to all of our delivery teams. Our cloud engineering team is maybe 50, 60, 70 people, but we have 2,000 people in our delivery shops who we are trying to get working in a similar manner – going full stack, focusing on business outcomes and embracing these other concepts and behaviors associated with DevOps. What we try to do with the cloud engineering team is to be the gold model for how to work that way, because we’re promoting all of our delivery shops to be working in a similar fashion. Our DevOps journey started with our cloud engineering team, but we are in the process of trying to transform another 2,000 software engineers to work in a similar manner.
Kavis: Impressive stuff! So the last question is what would your advice be to someone in a leadership role who is starting their public cloud journey? Where should they start and what should they focus on?
Dowds: I think the keyword you said there is public cloud. When we started our journey we didn’t have talent with public cloud knowledge. So how do we solve that problem? You go pick the rockstars out of your organization that you know can learn new things rapidly and you start to build this talent base of internal people that you want to join you on the journey. Then we did some very selective and very vital outside hiring to complement our internal people. But even that is not enough. You still need to find some external assistance: a partner who’s made that journey with other firms and who will guide you along. Obviously in our case it was Cloud Technology Partners. An outside partner who knows what they’re doing and has an approach to get to the public cloud, combined with some really strong external hiring of public cloud knowledgeable people, and then assimilating them with your own internal rockstars – that’s what ended up as the nucleus of our cloud engineering team. The advice I have is to form that dream team of very talented people from the inside and the outside and partner with someone who knows what they are doing. Then embrace that minimum viable cloud approach, which means work quickly and get your first workload up as soon as possible.
Kavis: Jeff and I, along with Robert Christiansen, will be talking about this Vanguard journey at AWS re:Invent at the end of November. If you’re going to be there, come check it out.