McKinsey recently published an interesting article on the importance of building the right Cloud foundations at the outset, authored by Aaron Bawcom. If this is not done properly, long-term success and ROI are likely to just become pie in the sky.
Reading the article, we were happy to see that we have gotten quite a few of McKinsey's recommendations right at SEB. For example, in the second of the 10 commandments that the article outlines for faster and more profitable cloud migrations, it states that one should:
Design the Cloud architecture so it can scale. If companies do it right, they can build a Cloud architecture based on five people that can scale up to support 500 or more without significant changes.
At SEB, we started the Google Cloud journey with a Cloud Core Team of between five and ten people and now have more than 1,000 active users in our Google Cloud environment, so that recommendation definitely holds true for us. Building on the article, we would say that there are three key factors that have made this journey possible for us at SEB:
- Using the Cloud that already exists
- Working with Infrastructure as Code (IaC)
- Working with Security as Code (SaC)
Below we discuss these key factors in more detail.
Use the Cloud that already exists
The fourth commandment in the article explains:
Many companies operate in fear of being locked into a specific Cloud service provider (CSP), so they look for ways to mitigate that risk. A common pattern is an overreliance on containers, which can be expensive and time consuming and keep businesses from realizing the genuine benefits available from CSPs.
In the SEB Cloud Core Team, we always search for a Cloud native solution whenever we're lacking a feature. The answer doesn't always map to a single CSP service, or even to several (sometimes we simply have to build it ourselves), but we always try to minimize the effort we need to put into a given feature, and rely heavily on the expertise of the major Cloud providers. Reinventing the wheel makes it a lot harder to make progress.
Work with Infrastructure as Code (IaC)
How many of your (Cloud) infrastructure deployments are four-eyes reviewed, tested and automatically deployed to their respective environments? Do you have an audit trail of your infrastructure changes, and how easily can you roll back in case of misconfiguration? While many companies go all in on various kinds of KPIs, metrics and OKRs, we haven't seen enough companies put sufficient focus on making most of their infrastructure changes via Infrastructure as Code (IaC).
William (one of the authors of this post) shares a personal reflection:
Back in 2018, I worked at another company en route to their Cloud transformation. I recall how new and even frightening IaC felt to many in the infrastructure engineering teams.
Gradually we moved the needle and got decent IaC coverage for our Cloud and on-premises infrastructure, but it was far from a given when we first started discussing it.
Coming to SEB, I was expecting a similar situation, but it turned out things were quite the opposite.
Nearly all of our acceptance and production infrastructure in Cloud is in version control and highly automated, with all the benefits of that approach, such as visibility, repeatability, traceability, resilience, control and governance. By having IaC as a first-class citizen for everyone working in Cloud, and maintaining the discipline throughout, many other benefits follow as well. These include decreased overall risk, stable and consistent environments for faster product and infrastructure iterations, easier cost optimization, self-documenting infrastructure, and simpler onboarding.
We know from experience that achieving a solid IaC practice is no small feat. It has its own learning curve, and the steepness of that curve varies with the tools you choose and, of course, the experience of your developers. However, it's the only feasible solution if you're going for scale in public Cloud.
Not having IaC is not a problem if you are running a single web application on a compute instance behind a load balancer. But it's an entirely different matter when you are running a web application, a database, a Kubernetes cluster, and some serverless functions to support all the services of your application. The problem gets worse when you need to run many environments (hundreds or even thousands). Consider your development, acceptance, and production environments. You have to provision and maintain all that infrastructure. Doing that manually is a massive burden and time commitment (and of course highly prone to human error).
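To make the scaling argument concrete, here is a minimal sketch in Python of the core idea behind IaC: every environment is derived from one shared, reviewable definition instead of being configured by hand. It is not how SEB's setup works; the `Environment` fields, the team names, and the sizing rule are all assumptions made up for illustration, and a real setup would use a dedicated IaC tool.

```python
from dataclasses import dataclass, asdict

# Stages taken from the environments named in the text.
STAGES = ("development", "acceptance", "production")


@dataclass(frozen=True)
class Environment:
    """Desired state of one environment (all fields are hypothetical)."""
    name: str
    stage: str
    node_count: int
    public_access: bool


def render_environments(teams):
    """Derive every environment from one shared definition.

    Each team gets the same three stages, so onboarding a team is a
    one-line change that can be reviewed like any other code change.
    """
    environments = []
    for team in teams:
        for stage in STAGES:
            environments.append(
                Environment(
                    name=f"{team}-{stage}",
                    stage=stage,
                    # Assumed sizing rule: production gets more capacity.
                    node_count=3 if stage == "production" else 1,
                    # Assumed default: nothing is publicly reachable.
                    public_access=False,
                )
            )
    return environments


# Hypothetical team names, used only for the example.
envs = render_environments(["payments", "lending"])
for env in envs:
    print(asdict(env))
```

Because the definition is plain code, it can live in version control, which is where the review, audit trail and roll back properties discussed above come from.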
This investment in a sounder infrastructure process using IaC will pay dividends for the entire life of the product or company. Our colleague Tonni Hult recently published an article on this blog titled Do I have to use Terraform, where he discusses the benefits we've found in the Core Team working with IaC in more detail.
Security as Code (SaC)
The McKinsey article again:
Security as code (SaC) has been the most effective approach to securing Cloud workloads with speed and agility. The SaC approach defines cybersecurity policies and standards programmatically so they can be referenced automatically in the configuration scripts used to provision Cloud systems. Systems running in the Cloud can be evaluated against security policies to prevent changes that move the system out of compliance.
Most of the benefits of keeping configuration as code have already been discussed above. But it's worth looking at security and compliance configuration in isolation because of the additional benefits it provides.
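As a rough illustration of what "policies defined programmatically" can look like, the sketch below expresses two security rules as small Python functions and evaluates a planned change against them before it is applied. The rule names, the resource fields and the example resources are assumptions invented for this sketch; real deployments typically rely on a purpose-built policy engine rather than hand-written checks.

```python
def no_public_buckets(resource):
    """Hypothetical rule: storage buckets must never be public."""
    if resource.get("type") == "storage_bucket" and resource.get("public", False):
        return f"{resource['name']}: storage bucket must not be public"
    return None


def encryption_required(resource):
    """Hypothetical rule: data stores must be encrypted."""
    is_data_store = resource.get("type") in ("storage_bucket", "database")
    if is_data_store and not resource.get("encrypted", False):
        return f"{resource['name']}: encryption must be enabled"
    return None


POLICIES = [no_public_buckets, encryption_required]


def evaluate(resources, policies=POLICIES):
    """Return one message per violated policy; an empty list means compliant."""
    violations = []
    for resource in resources:
        for policy in policies:
            message = policy(resource)
            if message is not None:
                violations.append(message)
    return violations


# A made-up planned change: one non-compliant bucket, one compliant database.
planned_change = [
    {"type": "storage_bucket", "name": "reports", "public": True, "encrypted": True},
    {"type": "database", "name": "ledger", "encrypted": True},
]

violations = evaluate(planned_change)
for message in violations:
    print("BLOCKED:", message)
```

Running a check like this in the deployment pipeline is one way to stop a change that would move a system out of compliance before it reaches any environment.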
Having your security configuration versioned, reviewed and traceable allows you to quickly assess the overall security posture of your Cloud configuration, as well as quickly answer questions about specific compliance requirements, such as:
- Are we making sure that no sensitive data is publicly exposed?
- How fast can we propagate a new regulation requirement to our entire Cloud configuration?
- Which, if any, of our environments across our entire organization are exempt from our default security posture, and why?
For example, the last question can be answered by tracing code commits and code reviews.
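One way to picture this is an exemption register that lives in the same repository as the rest of the configuration, so that every exemption arrives through a reviewed commit and carries its own justification. The sketch below is purely illustrative: the `Exemption` fields, the environment names and the approver are hypothetical and do not describe SEB's actual setup.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Exemption:
    """One documented deviation from the default security posture."""
    environment: str
    policy: str
    reason: str
    approved_by: str


# Hypothetical register; in practice each entry would be added via code review.
EXEMPTIONS = [
    Exemption(
        environment="public-website-production",
        policy="no_public_buckets",
        reason="Serves static assets that are meant to be public",
        approved_by="security-team",
    ),
]


def exemptions_for(environments, exemptions=EXEMPTIONS):
    """Group the registered exemptions by environment name."""
    known = set(environments)
    report = {}
    for exemption in exemptions:
        if exemption.environment in known:
            report.setdefault(exemption.environment, []).append(exemption)
    return report


report = exemptions_for(["public-website-production", "payments-production"])
for environment, entries in report.items():
    for entry in entries:
        print(f"{environment}: exempt from {entry.policy} ({entry.reason})")
```

The version history of such a file then shows who proposed each exemption, who reviewed it, and when it was introduced.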
Since the banking industry is highly regulated, SaC is a given in our tech stack, and without it there is no way we could scale out our Cloud onboarding to thousands of users.