Starting to work in a cloud context can feel very different because suddenly you have access to many new services, basically at the click of a button. Doing courses and lab exercises is definitely good and important, but it only takes you so far: you still need to test and evaluate different services and designs, and judge cost and maintainability, among other things. What's more, traffic to and from services is not governed by the same security mechanisms as on-prem, so the challenge is to learn how the services work, how to architect them together, and how to secure and govern them. This is true both for the individual user and for SEB as a company.
Regarding hands-on testing in Google Cloud Platform (GCP), we in the cloud core team didn't want to rely on users doing pre-created labs, since those are often done in a customized GCP project that differs from the "real" GCP projects users will face later. So early on, we realized the need for a place where people could explore and test the capabilities of GCP, a place where people could dip their toes and try out different use cases. We also needed to learn how to secure and govern a setup where a lot of users and teams work in Google Cloud, following the principles that were set (see footnote 1).
Thinking about a sandbox concept
Cloud resources are easy to both create and remove, and that got us thinking about how we played and created things in the sandbox as children. So, could that be something for us too, a place where people "play around", explore and test things like kids do? Translated to our reality, it would be a contained place, created for a user, where they could test a use case and invite others to collaborate.
We thought about what we wanted to achieve and came up with:
- A separated and autonomous place where people could test and explore while not exposing the user, and SEB, to unwanted risks
- A place where things could be created, broken, removed, changed without it being a big problem for users
- A way for people to easily create their own GCP projects but with our customizations, our landing zone, added in them (see footnote 2)
- A way to add improvements to the landing zone that we implemented in the GCP project
- A way to govern the projects smoothly so that new and updated features can be pushed out while also checking the existing landing zone
- All done via things like Terraform, merge requests, CI/CD pipelines, pipelines-as-code etc.
This idea resonated well, and we knew that if done right it could also help in other ways. For example, if a sandbox was quick and easy to get, it could give a good first impression of working in GCP. If we made it self-service, it would help free up time for us in the cloud core team. It could give people exposure to the boundaries, guardrails and remediations that we have in place for GCP projects, and we would get several contained, non-productional projects against which we could add and test guardrails like organization policies, auto-remediation bots, landing zone enhancements etc. So, we knew there were several gains to be made here.
At the time we were using Terraform and CI/CD pipelines for creating GCP projects, and we were used to working with security, guardrails and governance for projects and the overall GCP setup. Our problem was that the way we created GCP projects was time consuming, so creating x amounts of sandboxes would not be possible. We also needed better ways to govern the landing zone and roll out changes. Fortunately, we found several of our answers in Terraform modules and the CI/CD capabilities in GitLab.
Let's have a look at the terraform code
We needed the setup to be simple, so that we could add a new sandbox with as little code change as possible, and Terraform modules are great for this purpose. So we created a new project in our version control of choice, GitLab, which has a very powerful CI/CD engine built in. Combined with Terraform, we started our journey to create a new way of giving sandboxes to our users through a project factory.
The actual Terraform resources were simple enough, and below are examples of how they look. The important thing was to keep them generic and avoid potential conflicts in the naming of projects, buckets etc. that must be globally unique in GCP.
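A simplified sketch of what such resources could look like; the resource layout, variable names and the random suffix are illustrative, not our actual landing-zone code:

```hcl
# Random suffix to avoid collisions on globally unique names
# (project IDs, bucket names etc.)
resource "random_id" "suffix" {
  byte_length = 2
}

resource "google_project" "sandbox" {
  name            = "sbx-${var.user_id}"
  project_id      = "sbx-${var.user_id}-${random_id.suffix.hex}"
  folder_id       = var.sandbox_folder_id
  billing_account = var.billing_account
}

# Enable one API service per entry in the input list
resource "google_project_service" "apis" {
  for_each = toset(var.enabled_apis)
  project  = google_project.sandbox.project_id
  service  = each.value
}

resource "google_storage_bucket" "sandbox_bucket" {
  name     = "sbx-${var.user_id}-${random_id.suffix.hex}-bucket"
  project  = google_project.sandbox.project_id
  location = "europe-north1"
}
```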
Terraform modules are then a great way to instantiate the resources "as a package" and abstract away the complexity of the actual Terraform code. Then you only need to fill in some known values for the different input fields.
These are the different inputs we send into our module to create the sandboxes. It may look complicated, but this is actually much easier than having to make changes in the Terraform code for every sandbox.
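As an illustration, a module call could look something like the sketch below; the module path and input names are hypothetical examples of the kind of fields we fill in per sandbox:

```hcl
module "sandbox_jane_doe" {
  source = "./modules/sandbox"

  # Per-sandbox inputs; everything else is handled inside the module
  user_id           = "jane-doe"
  owner_email       = "jane.doe@example.com"
  sandbox_folder_id = var.sandbox_folder_id
  billing_account   = var.billing_account
  enabled_apis = [
    "compute.googleapis.com",
    "storage.googleapis.com",
  ]
}
```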
Running it through the CI/CD pipeline
Another important component here is our CI/CD, which gives us the flexibility to create jobs that have different input variables in them. And since variables prefixed with TF_VAR_ are automatically picked up by Terraform, this means that we can have jobs that do Terraform plan and apply separately for each sandbox! Here is an example of a plan and deploy job. Put together, this makes the code and overall setup very dynamic, because each sandbox is handled as its own deployment:
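A sketch of what such a per-sandbox job pair could look like in `.gitlab-ci.yml`; the job names, stages and variable names are illustrative, not our actual pipeline:

```yaml
plan:jane-doe:
  stage: plan
  variables:
    # TF_VAR_-prefixed variables are picked up by Terraform automatically
    TF_VAR_USER_ID: "jane-doe"
    TF_VAR_OWNER_EMAIL: "jane.doe@example.com"
  script:
    - terraform init
    - terraform plan -out="jane-doe.tfplan"
  artifacts:
    paths:
      - jane-doe.tfplan

apply:jane-doe:
  stage: apply
  needs: ["plan:jane-doe"]
  variables:
    TF_VAR_USER_ID: "jane-doe"
    TF_VAR_OWNER_EMAIL: "jane.doe@example.com"
  script:
    # Apply exactly the plan that was reviewed
    - terraform apply "jane-doe.tfplan"
```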
Then regarding customizations, we found that a vanilla sandbox fits about 90% of the use cases. If someone, for example, wants to test a GCP service not included in the vanilla setup, we can easily enable that API with TF_VAR_ADDITIONAL_ENABLED_APIS: '["apiname.googleapis.com"]' for the specific sandbox. Our switches and fields enable us to turn different features on and off, or customize them, individually. Why not just enable all services in a sandbox, you may ask? Because we want to first review the services that Google releases to make sure they are compliant with our regulatory requirements, so we in the cloud core team keep control over which services are activated for the sandboxes.
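One way such a switch could be wired up inside the module is sketched below; the variable and local names are assumptions for illustration:

```hcl
# Per-sandbox extras on top of the vanilla API set; defaults to none,
# so a plain sandbox needs no extra configuration
variable "additional_enabled_apis" {
  type    = list(string)
  default = []
}

locals {
  # Vanilla API set plus any per-sandbox extras, deduplicated
  all_enabled_apis = distinct(concat(
    var.base_enabled_apis,
    var.additional_enabled_apis,
  ))
}
```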
Handling x amounts of sandboxes with (almost) no effort
As everything is done with code, we use CI/CD to avoid configuration drift and to roll out new features. But with x number of sandboxes (right now about 170 of them), doing plans and deploys manually is not feasible. Here GitLab CI/CD has another powerful feature we use: schedules. With the addition of a rule like the one below (only showing the plan part here), a sandbox automatically gets added to a scheduled pipeline and is then updated once a week. Sweet!
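A sketch of what such a rule could look like on the plan job; the `$CI_PIPELINE_SOURCE` check is standard GitLab CI, while the job name and schedule variable are illustrative:

```yaml
plan:jane-doe:
  stage: plan
  rules:
    # Run on merge requests as usual
    - if: '$CI_PIPELINE_SOURCE == "merge_request_event"'
    # Also run in the weekly scheduled pipeline to catch drift
    - if: '$CI_PIPELINE_SOURCE == "schedule" && $SANDBOX_SCHEDULE == "weekly"'
  variables:
    TF_VAR_USER_ID: "jane-doe"
  script:
    - terraform init
    - terraform plan
```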
This is very useful, as it gives us confidence that no sandboxes are drifting too far away from our desired state.
After we had made the setup of new sandboxes simpler and more repeatable, we were able to free up more time to do even more innovative things. But as always, we looked into making this even more hands-off for the cloud team. For that, we started looking into building a sandbox git bot, and we had the idea to create a self-service onboarding portal for new teams, a.k.a. the Cloud Portal. But that will have to wait for the next blog post, so we are leaving you here with a little cliffhanger.
Footnote 1: https://sebgroup.com/career/who-are-we/career-at-seb/tech/blog/2021/seb-cloud-platform-the-saga-begins
Footnote 2: Our landing zone is the collection of resources and settings that we add and configure in a GCP project upon creation. This landing zone is created and managed by a central team.