Many startups start their cloud journey by manually setting up their infrastructure. Yet the manual, drag-and-drop approach to cloud management often falls short, proving time-consuming and cumbersome as the team grows. Seeking a more agile and automated approach? It’s time to embrace the transformative power of Infrastructure as Code (IaC)!
This article will explain what your team should know about IaC and how to implement best practices for maximizing its benefits. We’ll also delve into some tools that can be used in advanced use cases, such as ephemeral dev environments or packaging together microservices code and cloud resources.
Automation through IaC unlocks a whole range of benefits. By replacing redundant manual processes with an efficient IaC strategy, you reduce unnecessary risk and costs while increasing consistency and reliability. The main benefits are:
Maintaining IaC leads to faster execution because it takes just a single command and a few minutes to provision a new environment for testing or development purposes. This is especially helpful for startups that are constantly short on time and fighting uphill battles with their funding plans and go-to-market strategies.
IaC’s ability to spin up pre-configured environments every time ensures consistency across the board. Every developer (regardless of role, specialty, or start date) uses the same pre-approved baseline infrastructure without sacrificing quality.
IaC makes it much harder for a single engineer to become the sole owner of crucial company knowledge. Democratic by nature, IaC is documented in a source code repository (instead of in one person’s brain), reducing risk when talent churns.
IaC security scanning allows for the earlier detection of vulnerabilities. This is achieved by implementing the “shift-left” approach to security practices. Drift detection can alert on unauthorized changes in sensitive infrastructure, like misconfigured firewalls or open ports.
IaC reduces unnecessary costs at every level of the development pipeline. The automation of otherwise manual processes allows engineers and IT leaders to return their time and attention to high-value tasks and critical business issues.
When choosing the right IaC tool, your team should consider factors such as supported platforms, community support, scalability, and cost. Every cloud comes with its own IaC offerings: there’s AWS CloudFormation, AWS CDK, Google Cloud Deployment Manager, and Azure Resource Manager, as well as popular cross-cloud IaC tools like HashiCorp Terraform and Pulumi. There are also newer offerings in the ecosystem, such as Crossplane.
We recommend that young startups without prior knowledge of a specific IaC tool go with Terraform, as it’s the most popular tool with the biggest open-source community.
While we do recommend Terraform, it's important to note that it underwent a license change in August 2023. Its creator, HashiCorp, switched from the open-source Mozilla Public License (MPL) v2.0 to the "business source" BSL v1.1. This change follows similar moves by other companies, such as Elastic (the company behind Elasticsearch) and MongoDB, that restricted commercial use of their open-source software. The change targets companies with products that compete with HashiCorp's offerings and does not impact most end-users. Still, we’d recommend speaking with your lawyers as a fail-safe to ensure you can use tools with this new license.
In response to Terraform’s license change, several organizations in the IaC ecosystem announced the creation of OpenTofu, a fully open-source, backward-compatible fork of Terraform backed by the Linux Foundation. However, OpenTofu’s first stable version was released only recently, so for most users and organizations, Terraform remains a solid choice. That said, we recommend keeping up to date on the development of OpenTofu, as it may become a more attractive open-source alternative to Terraform in the future.
Once you manage your infrastructure as code, it’s important to establish a CI/CD process for it. This will allow you to merge your infrastructure code and apply its changes in the relevant environment. But what happens to un-applied infrastructure code?
Un-applied infrastructure code can lead to more mess than order, leaving you to wonder where to find the most up-to-date version of your infrastructure. Unless your dev team is very small (fewer than 10 people), we strongly recommend implementing a CI/CD pipeline for your Terraform code using your existing or favorite CI engine. Once your infrastructure is written as code, the same CI practices you use for software apply to it. Just as you test and deploy your application code using GitHub Actions, Jenkins, etc., you can test and deploy your infrastructure code, even if the test is a very basic, static one.
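As an illustration of what a very basic, static test could look like, here is a minimal Python sketch that inspects a Terraform plan exported as JSON and flags any resource the plan would delete or replace. It assumes the CI job has already run `terraform plan -out=tfplan` followed by `terraform show -json tfplan > plan.json`. The function names and the `plan.json` file name are hypothetical choices for this example, not part of Terraform itself.

```python
import json
from typing import Any


def find_destructive_changes(plan: dict[str, Any]) -> list[str]:
    """Return the addresses of resources the plan would delete or replace.

    `plan` is the parsed output of `terraform show -json <planfile>`.
    Each entry in `resource_changes` carries a list of actions such as
    ["create"], ["update"], ["delete"], or ["delete", "create"] (replace).
    """
    destructive = []
    for resource in plan.get("resource_changes", []):
        actions = resource.get("change", {}).get("actions", [])
        if "delete" in actions:
            destructive.append(resource["address"])
    return destructive


def check_plan_file(path: str) -> int:
    """Hypothetical CI entry point: returns 1 if the plan is destructive."""
    with open(path, encoding="utf-8") as handle:
        plan = json.load(handle)
    flagged = find_destructive_changes(plan)
    for address in flagged:
        print(f"destructive change: {address}")
    return 1 if flagged else 0
```

A CI step could call `sys.exit(check_plan_file("plan.json"))` so that the pipeline stops and asks for a manual review whenever a merge would destroy a resource. Whether deletions should block a merge or only raise a warning is a team decision.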
When working with Terraform, there are a few ways to keep your code DRY (Don't Repeat Yourself). To avoid duplicating code, we recommend using Terraform modules and/or Terragrunt, a powerful command-line interface (CLI) tool that wraps Terraform and makes it easier to invoke big and complex setups. For example, provisioning a copy of an environment with dozens of interconnected Terraform projects can be done without changing the Terraform code, just by running one Terragrunt command.
As you begin to manage several environments as code end-to-end, putting guardrails in place is good practice, especially when managing production environments. We recommend restricting the manual changes that can be made through the cloud console. You can do this by granting write permissions to a few designated senior engineers while the rest of the team has read-only access. This way, all infrastructure changes go through an approved and tracked process in the infrastructure code repository.
Even if you properly manage your infrastructure code, our experience shows that drifts occasionally happen. Once a drift happens, it should be remediated by reverting unwanted changes in your cloud environments or importing drifted resources into the IaC codebase. For this purpose, Terraform has built-in drift management capabilities, including the currently experimental feature of configuration generation for the drifted resources. There are also several commercial and open-source tools for infra drift management; Firefly.ai is just one example.
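One simple way to notice drift on a schedule is to lean on the exit code of `terraform plan -detailed-exitcode`, which is 0 when the plan is empty, 1 on error, and 2 when the plan contains changes. The Python sketch below wraps that call. The function names and status strings are hypothetical, and it assumes that Terraform is installed and that the directory has already been initialized with `terraform init`.

```python
import subprocess

# Exit codes documented for `terraform plan -detailed-exitcode`.
NO_CHANGES = 0
PLAN_ERROR = 1
CHANGES_PRESENT = 2


def interpret_plan_exit_code(code: int) -> str:
    """Map a `terraform plan -detailed-exitcode` result to a status label."""
    if code == NO_CHANGES:
        return "in-sync"
    if code == CHANGES_PRESENT:
        # The plan is non-empty: either the cloud drifted from the code,
        # or the code contains changes that were never applied.
        return "drift-or-unapplied-changes"
    return "error"


def check_drift(workdir: str) -> str:
    """Run a plan in `workdir` (assumed initialized) and classify the result."""
    result = subprocess.run(
        ["terraform", "plan", "-detailed-exitcode", "-input=false", "-no-color"],
        cwd=workdir,
        capture_output=True,
        text=True,
        check=False,  # a non-zero exit code is expected when changes exist
    )
    return interpret_plan_exit_code(result.returncode)
```

If the check runs against a branch where every merged change has already been applied, a non-empty plan most likely points to a manual change in the cloud environment, which is the cue to revert it or import it into the codebase.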
Apart from Terraform and other “traditional” IaC tools that run from a CLI, a new tool called Crossplane has recently become available. Unlike Terraform, Crossplane is not built as a CLI tool, but as a Kubernetes-based control plane. If you’re unfamiliar with the concept of a control plane, Kubernetes can serve as a good example.
The Kubernetes control plane makes global decisions about the cluster and detects and responds to cluster events (for example, deciding when to start a new pod).
Similarly, in the context of IaC, Crossplane’s control plane is an engine that manages cloud resources, responds to changes in the managed environments, and enforces the desired state defined in the code—essentially preventing drift in the managed resources. This means it can effectively replace the need to implement the CD process for IaC, as previously discussed.
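To make the control plane idea more concrete, here is a toy reconcile pass in Python. It is a conceptual sketch only and not how Crossplane is implemented: state is modeled as plain in-memory dictionaries, `observed` is assumed to contain only resources that the loop itself manages, and all names are hypothetical.

```python
from typing import Dict, List, Tuple

# Resource name -> configuration, for both desired and observed state.
State = Dict[str, dict]


def plan_reconcile(desired: State, observed: State) -> List[Tuple[str, str]]:
    """Compute the actions needed to bring `observed` in line with `desired`."""
    actions: List[Tuple[str, str]] = []
    for name, config in desired.items():
        if name not in observed:
            actions.append(("create", name))
        elif observed[name] != config:
            actions.append(("update", name))  # drift detected
    for name in observed:
        if name not in desired:
            actions.append(("delete", name))
    return actions


def reconcile(desired: State, observed: State) -> State:
    """Apply one reconcile pass to a copy of the observed state."""
    result = dict(observed)
    for action, name in plan_reconcile(desired, observed):
        if action == "delete":
            del result[name]
        else:
            result[name] = dict(desired[name])
    return result
```

A real control plane runs this kind of comparison continuously against the cloud provider's APIs, which is why a manual change to a managed resource tends to be reverted shortly after it is made rather than waiting for the next pipeline run.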
There is also an AWS-originated tool similar to Crossplane, called AWS Controllers for Kubernetes (ACK). It’s built on the same concepts but isn’t as feature-rich as Crossplane and, unlike Crossplane, doesn’t support multiple clouds. If you want to experiment with the control plane approach to IaC but don’t want to go with a more complicated option, ACK is worth looking into, as long as you only need AWS.
These tools allow you to package your Kubernetes application, along with its infrastructure, using Kubernetes CRDs (Custom Resource Definitions). Therefore, if your dev team already works with Kubernetes, learning a new ecosystem is unnecessary. This can be useful in the common use case of ephemeral dev environments that spin up for each pull request, or when you want to package infra and code together in one Kubernetes package, such as a Helm chart, so each microservice is bundled with its infrastructure, like queues, storage buckets, and dedicated databases.
It may be theoretically possible to manage infrastructure manually without the help of IaC, but it’s difficult to ignore the drawbacks. When engineers follow IaC best practices—such as choosing the right tool, establishing a CI/CD process, and managing infrastructure drift—IaC makes it possible to operate with a much greater degree of efficiency and reliability.