Infrastructure as Code – assessing the options

Introduction

Infrastructure as Code (IaC) is a key element of successful cloud adoption. Using IaC, organisations create infrastructure that is both reusable and reproducible. This consistency helps ensure that applications built in development environments will function the same way in near-identical production environments. Changes can (and should!) be planned and controlled through coding best practices such as version control and code review. If disaster strikes, additional infrastructure can be created as required, and broken or misconfigured resources can be remediated or replaced.

Cloud vendors generally offer IaC as part of their platform. A number of third-party languages have also been developed, including Terraform. A natural question to ask is which to use. Unfortunately, the answer is not simple: it will be influenced by existing skills and experience, the features required and even personal or team preference. This blog post discusses Terraform and native IaC languages in the hope that it will help you decide which to use.

Is multi-cloud IaC better?

Terraform, a multi-cloud IaC language, can be used to manage resources in different cloud platforms using a single language, command line interface, set of tools and workflow. This is especially useful if an organisation already uses Terraform and can transfer that experience to projects on another platform, for example when moving from VMware to Azure. Terraform's workflow is predominantly command-line oriented.

Terraform uses “provider plugins” to authenticate with each cloud and create resources in it. Each provider is coupled to its underlying platform, and provider-specific statements are used to create resources for that provider. For example, to create blob storage with the AWS provider, the template must declare an S3 bucket; with the Azure provider, it must declare a Storage Account (and optionally an access tier). There is no abstraction that would allow a developer to request “blob storage” and let the provider work out what is needed.
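The missing abstraction is easier to see side by side. The minimal sketch below uses a hypothetical helper, blob_storage_resource. It emits Terraform's JSON configuration syntax (.tf.json) for "blob storage" on each cloud. The resource types aws_s3_bucket and azurerm_storage_account come from the AWS and AzureRM providers. The names, resource group, location and replication settings are illustrative assumptions. Any such mapping is code a team would have to write and maintain itself, which means knowing both platforms.

```python
import json


def blob_storage_resource(cloud: str, name: str, **azure_args: str) -> dict:
    """Return a Terraform JSON-syntax (.tf.json) fragment declaring blob storage.

    Hypothetical helper: Terraform has no generic "blob storage" resource,
    so a team wanting this abstraction must write and maintain it itself.
    """
    if cloud == "aws":
        # AWS provider: blob storage is declared as an S3 bucket.
        return {"resource": {"aws_s3_bucket": {name: {"bucket": name}}}}
    if cloud == "azure":
        # AzureRM provider: blob storage lives in a Storage Account, which
        # needs Azure-specific settings that have no AWS equivalent here.
        account = {
            "name": name,
            "resource_group_name": azure_args["resource_group_name"],
            "location": azure_args["location"],
            "account_tier": "Standard",          # assumed setting
            "account_replication_type": "LRS",   # assumed setting
        }
        if "access_tier" in azure_args:
            # Optional access tier, e.g. "Hot" or "Cool".
            account["access_tier"] = azure_args["access_tier"]
        return {"resource": {"azurerm_storage_account": {name: account}}}
    raise ValueError(f"no blob storage mapping for cloud {cloud!r}")


if __name__ == "__main__":
    print(json.dumps(blob_storage_resource("aws", "jiscdemo01"), indent=2))
    print(json.dumps(
        blob_storage_resource("azure", "jiscdemo01",
                              resource_group_name="rg-demo",
                              location="uksouth", access_tier="Cool"),
        indent=2,
    ))
```

Even this toy version has to branch per cloud and pass Azure-only arguments. That is the platform knowledge the next paragraph refers to.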

Separate template files can be used for each cloud provider or, if the template is organised around an application, multiple providers can be used in a single template file. Either way, organisations must have good knowledge of the underlying platforms and their architectural patterns to make good use of Terraform.

In contrast to multi-cloud tools, native IaC languages target a single cloud platform. As with Terraform, the same workflows may be used for native IaC from different vendors; for example, AWS provide a toolkit for Azure DevOps that deploys CloudFormation into AWS alongside any existing ARM deployments. However, the syntax and interfaces (both console and command line) differ between vendors, and different tools are used for workflow stages such as checking the state of deployed resources. This may be an advantage if teams focus on a single platform and are familiar with its native UIs. Jisc Cloud Solutions architect teams typically specialise in a single public cloud and, because of this, use native IaC. We also believe that new starters joining those teams are more likely to be familiar with native IaC.

In the past, Jisc Cloud Solutions also chose native IaC to ensure we could provision the latest features for our members and non-member customers. New features may take time to be supported in third-party languages like Terraform, and update priorities may differ. As cloud matures, feature lag is less of an issue for organisations that only want to use stable products, though even the most mature services change frequently. Depending on the regions used, a new feature may not be available to an organisation on release anyway, whether or not IaC supports it. Because native IaC is close to the platform and the vendor only has to support one set of features, native feature lag may be shorter, but it still exists.

Native IaC benefits from close integration with native platform services. For example, CloudFormation is integrated with IAM, AWS CloudTrail, AWS Config and other AWS services. Platform console UIs can be used to monitor state and to see events, resources and template outputs. Vendors also publish reference architectures in their own IaC, and products in both AWS Marketplace and Azure Marketplace often provide CloudFormation or ARM templates for one-click deployments. Additional effort is required to translate these templates to Terraform.

Which markup?

Terraform uses the HashiCorp Configuration Language (HCL). HCL was designed to be easy to read and use with command line tools. Bicep, Azure's nascent IaC language, has a similar syntax. Google Cloud Deployment Manager configurations are written in YAML, Azure Resource Manager (ARM) templates are JSON, and CloudFormation templates can be either YAML or JSON. The choice here is entirely a matter of preference. HashiCorp claim that YAML is hard for beginners, but each syntax has its own foibles that a user must learn to love. If markup languages do not suit, tools such as Pulumi and the AWS Cloud Development Kit (CDK) allow IaC to be written in general-purpose programming languages such as JavaScript (on Node.js) and C#. Discussion of these is outside the scope of this post.
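Underneath the markup, a template is just a data structure, which is why YAML versus JSON is largely a surface choice. The minimal sketch below builds a small CloudFormation template as a Python dictionary and serialises it as JSON. The same structure could be written by hand in YAML. It uses only the Python standard library, not the CDK or Pulumi APIs. The bucket logical IDs and the versioning setting are illustrative assumptions. AWS::S3::Bucket, VersioningConfiguration and Ref are standard CloudFormation template elements.

```python
import json


def bucket_template(bucket_ids: list[str], versioned: bool = True) -> dict:
    """Build a CloudFormation template (as a dict) with one S3 bucket per logical ID."""
    resources = {}
    outputs = {}
    for logical_id in bucket_ids:
        resource = {"Type": "AWS::S3::Bucket"}
        if versioned:
            resource["Properties"] = {"VersioningConfiguration": {"Status": "Enabled"}}
        resources[logical_id] = resource
        # For an S3 bucket, Ref resolves to the bucket name.
        outputs[f"{logical_id}Name"] = {"Value": {"Ref": logical_id}}
    return {
        "AWSTemplateFormatVersion": "2010-09-09",
        "Resources": resources,
        "Outputs": outputs,
    }


if __name__ == "__main__":
    # Hypothetical logical IDs; the output is a JSON CloudFormation template.
    print(json.dumps(bucket_template(["RawData", "Reports"]), indent=2))
```

The loop hints at why some teams prefer general-purpose languages: repetition becomes a loop rather than copy-and-paste. The resulting template is the same whichever way it is produced.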

Additional language features?

An advantage of Terraform is its module support. The idea of reusable modules is a good one, but be careful how they are used, as they can create administrative overhead and unwanted dependencies. For example, if an organisation uses a specific network module in all projects, updates made to the module for a development project could result in unexpected changes in a production bug fix. If the module is broken, no team using it can deploy until it is fixed. That said, Terraform modules are a good way to organise IaC code within a project.
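One common way to limit the shared-module risk is to pin each project to a module version. A change released for one project then reaches others only when they choose to upgrade. The sketch below checks a list of module calls for pins. ModuleCall is a hypothetical, simplified data model, and a real check would need to parse the .tf files, which this sketch does not do. It follows Terraform's conventions in a simplified form: local paths (./, ../), git sources pinned with ?ref=, and registry modules constrained with a version argument. Other source types (archives, S3, SSH shorthand) are lumped in with registry modules here.

```python
from dataclasses import dataclass
from typing import Optional
from urllib.parse import parse_qs, urlparse


@dataclass
class ModuleCall:
    """Simplified view of a Terraform module block (hypothetical data model)."""
    name: str
    source: str
    version: Optional[str] = None  # Terraform's version argument, registry modules only


def is_pinned(module: ModuleCall) -> bool:
    source = module.source
    if source.startswith(("./", "../")):
        # Local modules change together with the project that contains them.
        return True
    if source.startswith(("git::", "github.com/")):
        # Git sources are pinned with a ?ref=<tag or commit> query string.
        query = urlparse(source.removeprefix("git::")).query
        return "ref" in parse_qs(query)
    # Anything else is treated as a registry module, constrained via version.
    return module.version is not None


def unpinned(modules: list[ModuleCall]) -> list[str]:
    """Names of module calls that would silently pick up upstream changes."""
    return [m.name for m in modules if not is_pinned(m)]


if __name__ == "__main__":
    calls = [
        ModuleCall("network", "git::https://example.com/network.git?ref=v1.2.0"),
        ModuleCall("monitoring", "git::https://example.com/monitoring.git"),
    ]
    print(unpinned(calls))  # ['monitoring']
```

Pinning trades automatic updates for predictability. Someone still has to schedule the upgrades, which is part of the administrative overhead mentioned above.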

Both Terraform and native IaC support conditional logic, each in different ways. Again, be careful here. Mixing procedural flow with declarative statements makes it easier to write unreadable templates and to introduce cyclomatic complexity into IaC.
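The cost of conditionals grows quickly. The sketch below models a template with three independent boolean parameters. The parameter and resource names are hypothetical, and the template is modelled as a Python function rather than real IaC. Enumerating every combination shows that three conditions already give 2^3 = 8 different resource sets. In principle, each is a distinct deployment that could be tested.

```python
from itertools import product

# Hypothetical template parameters, each toggling part of the deployment.
CONDITIONS = ["create_nat_gateway", "enable_flow_logs", "multi_az"]


def resources_for(flags: dict) -> list[str]:
    """Resources a conditional template would create for one set of flags."""
    resources = ["vpc", "subnet_a"]
    if flags["multi_az"]:
        resources.append("subnet_b")
    if flags["create_nat_gateway"]:
        resources.append("nat_gateway")
    if flags["enable_flow_logs"]:
        resources.append("flow_log")
    return resources


def all_variants(conditions: list[str]):
    """Yield (flags, resources) for every combination of boolean parameters."""
    for values in product([False, True], repeat=len(conditions)):
        flags = dict(zip(conditions, values))
        yield flags, resources_for(flags)


if __name__ == "__main__":
    variants = list(all_variants(CONDITIONS))
    print(len(variants))  # 8
```

Each extra independent condition doubles the count. This is why a few flags can make a template hard to reason about.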

Can I use Terraform and native IaC at the same time?

Yes. There are use cases where this may be desirable; for example, an infrastructure team may use native IaC to create and maintain networking infrastructure and then provide the relevant IDs to development teams deploying virtual machines with Terraform.
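One way to hand those IDs over is sketched below, assuming the network is a CloudFormation stack whose output keys match the Terraform variable names the development team declares. The stack name and output keys are hypothetical. The function expects the response shape of boto3's CloudFormation describe_stacks call, which lists Outputs with OutputKey and OutputValue. It writes a *.auto.tfvars.json file, which Terraform loads automatically from the root module directory.

```python
import json


def outputs_to_tfvars(describe_stacks_response: dict) -> dict:
    """Turn CloudFormation stack outputs into Terraform variable values.

    Assumes output keys already match Terraform variable names.
    """
    stack = describe_stacks_response["Stacks"][0]
    return {out["OutputKey"]: out["OutputValue"] for out in stack.get("Outputs", [])}


def write_tfvars(values: dict, path: str = "network.auto.tfvars.json") -> None:
    # Terraform loads *.auto.tfvars.json files in the root module automatically.
    with open(path, "w", encoding="utf-8") as fh:
        json.dump(values, fh, indent=2)


if __name__ == "__main__":
    # In practice the response would come from something like:
    #   boto3.client("cloudformation").describe_stacks(StackName="network")
    sample = {
        "Stacks": [{
            "StackName": "network",
            "Outputs": [
                {"OutputKey": "vpc_id", "OutputValue": "vpc-0123456789abcdef0"},
                {"OutputKey": "private_subnet_id", "OutputValue": "subnet-0123456789abcdef0"},
            ],
        }]
    }
    print(json.dumps(outputs_to_tfvars(sample), indent=2))
```

A file hand-off keeps the two tools loosely coupled. Alternatively, the AWS provider's aws_cloudformation_stack data source can read a stack's outputs directly from within Terraform.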

Where organisations want to use standalone products like AWS Marketplace services or extra tools like AWS Instance Scheduler, it may be easier to deploy these with the provided (native) templates. Translating them to Terraform takes additional effort, although this could be seen as an advantage, as it forces organisations to read and understand what they are deploying – a good practice regardless of the IaC used.

Another reason to use different IaC languages at the same time is that some architectural patterns are better served by specialised tools. The Serverless Framework is a “multi-cloud” IaC aimed at rapid development of serverless applications. These applications may be deployed into network infrastructure created by the same team or a different team using Terraform or native IaC. At Jisc, for example, development teams create network infrastructure in Terraform and serverless functions with the Serverless Framework.

When considering whether to use one or several IaC languages, ask: does it simplify cloud resource management and team mobility if all teams use the same IaC?

If so, then unless the organisation also commits to a single cloud vendor, a multi-cloud language like Terraform is required. If a single-IaC policy would be difficult to apply across the organisation, or teams work separately on different projects, then flexibility is essential.

Can I move to or from Terraform?

It is very difficult to migrate resources from one IaC to another. There are tools that can automatically generate IaC templates from existing infrastructure, and some IaC tools can import existing resources. However, these are best suited to resources that were created in the console rather than to migration from another IaC. As a result, migration is a long, manual and error-prone task.
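As an example of that import support, Terraform 1.5 and later accept import blocks in configuration, each with a to address and an id. Running terraform plan -generate-config-out=<file> can then draft matching resource configuration. The sketch below generates import blocks from an inventory list. The inventory, resource addresses and IDs are hypothetical, and each resource type documents its own import ID format. The generated configuration is only a starting point to review. If the resources are currently managed by another IaC, they also have to be released from it without being deleted. For CloudFormation, that typically means a DeletionPolicy of Retain. That release step is where much of the manual effort and risk lies.

```python
import json


def import_block(address: str, resource_id: str) -> str:
    """Render a Terraform (1.5+) import block for one existing resource."""
    # json.dumps quotes and backslash-escapes the ID, which is adequate for
    # typical cloud resource IDs.
    return f"import {{\n  to = {address}\n  id = {json.dumps(resource_id)}\n}}\n"


def import_blocks(inventory: list[tuple[str, str]]) -> str:
    """Render one import block per (Terraform address, cloud ID) pair."""
    return "\n".join(import_block(addr, rid) for addr, rid in inventory)


if __name__ == "__main__":
    # Hypothetical inventory of console-created resources.
    inventory = [
        ("aws_s3_bucket.logs", "jisc-example-logs"),
        ("aws_vpc.main", "vpc-0123456789abcdef0"),
    ]
    print(import_blocks(inventory))
```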

It is better to start small and be flexible. Allow teams to use the most appropriate IaC and where migration is required, start moving new deployments into the new IaC and retire the legacy deployments as soon as possible. Try to avoid situations where teams are burdened with different IaCs for a single business function or application. This creates technical debt and slows everything down.

Commit to IaC

Regardless of choice of language, it is important to commit to using IaC. Brikman writes of Terraform (and it applies to all other IaC) that “if you want to build out your entire architecture, including all your apps, data stores, load balancers, monitoring, alerting, security and so on [for production workloads]” then this can take six months to several years to complete (Brikman, 2019). IaC is not a quick fix and it will take time to discover what works for your teams. Be flexible. Take “small steps ferociously”.

Once resources are managed with IaC, it is wise to treat the console as a read-only view of those resources. Changes made in the console will either be overwritten by IaC deployed later or prevent that IaC from being deployed at all. In the worst cases, IaC may recreate resources, causing downtime and even data loss. Bringing console changes back in line with IaC templates can be a difficult and risky task.
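Tools detect console changes ("drift") by comparing declared properties with live ones. terraform plan and CloudFormation drift detection do this against the real provider APIs. The sketch below shows only the core comparison, using hypothetical property dictionaries for a single virtual machine.

```python
def detect_drift(desired: dict, actual: dict) -> dict:
    """Return {property: (declared value, live value)} for every mismatch."""
    drift = {}
    for key in sorted(desired.keys() | actual.keys()):
        declared, live = desired.get(key), actual.get(key)
        if declared != live:
            drift[key] = (declared, live)
    return drift


if __name__ == "__main__":
    # Hypothetical VM: the template says t3.small, but someone resized it
    # to t3.large in the console.
    declared = {"instance_type": "t3.small", "tags": {"env": "prod"}}
    live = {"instance_type": "t3.large", "tags": {"env": "prod"}}
    print(detect_drift(declared, live))  # {'instance_type': ('t3.small', 't3.large')}
```

In this example, the next IaC deployment would set the instance type back to t3.small, overwriting the console change. For some properties, changing the value forces the resource to be replaced rather than updated, which is how downtime or data loss can follow.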

Summary – there is no one right answer

It is probably clear by now that there is no one right answer to the question of which IaC language to use. Do not worry. Any IaC will deliver the key advantages of consistent and reproducible architecture. Organisations must decide what they want to achieve and how their teams work, and then pick a tool (or set of tools) that can deliver those goals. A final thought if you are still struggling: start by experimenting with native IaC. Closer integration and the console GUI will make that first step easier. Use this experience to get a better idea of cloud as a whole before committing to an IaC policy, and be prepared to throw away those first experiments.

One of the great advantages of cloud computing is that you can try all the IaCs and see what works best for you and the teams you work with.