IaC - why you should(n't) use it
June 17, 2021•2,132 words
Yes, I hate IaC (Infrastructure as Code)... and I love it... sometimes.
Of course there are a lot of advantages to IaC. It makes infrastructure reproducible (partially), auditable (partially) and therefore easier to control (partially). But you should always take a closer look at whether it's really useful for your company. In fact, it's not useful for more or less static infrastructures. If you don't run a network of more than 100 servers, or one that changes constantly, IaC is certainly not for you. If you run only a few dedicated servers in a data center for your website, your email and perhaps an ownCloud instance or similar, IaC is definitely not for you! Why?
There is no tool that fits your needs
First of all, you'll never find a single tool that fits all your needs. In the end you'll use a bunch of tools: for example Terraform to provision your virtual machines, Ansible to deploy and update your software, Packer to build your VM images and, if you work with container platforms like Kubernetes, tools like KOPS and of course Dockerfiles. So you end up managing more software than you had to manage before.
More Version Conflicts
"But software management with IaC is much easier!" No, it's not! Because there is another problem: version conflicts. Never expect a new version of a tool to be backward compatible with the version you use. It's more realistic to expect broken state files when you update your tool while one of your team members is still using the old version. And it's also more realistic to expect that you'll have to rewrite some of your IaC code after updates because of incompatible parameters and the like. So in addition to your tech stack, you now have to manage your IaC tools. Congratulations! Perhaps you had a tech stack consisting of web servers, DB servers, a caching layer, load balancers, perhaps some security-related tools and a CI/CD suite (in the best case everything in containers), and now you also have Terraform, Ansible, KOPS and Packer, which will cause version conflicts of their own. What an advantage! ;)
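One cheap defence against the state-file problem is to check, before every run, that everybody uses the same pinned version and that the state wasn't last written by a newer release. Terraform state files are JSON and record the version that wrote them in a `terraform_version` field. The pinned version `1.5.7` and the helper names below are hypothetical, and this minimal sketch assumes plain numeric versions (no `-beta` suffixes).

```python
import json

# Hypothetical version the whole team has agreed on.
PINNED_VERSION = "1.5.7"


def parse_version(text: str) -> tuple[int, ...]:
    """Turn a plain dotted version such as '1.5.7' into (1, 5, 7)."""
    return tuple(int(part) for part in text.split("."))


def check_state_version(state_json: str, local_version: str,
                        pinned: str = PINNED_VERSION) -> list[str]:
    """Return the problems found; an empty list means it looks safe to run."""
    problems = []
    if local_version != pinned:
        problems.append(f"local tool is {local_version}, team pin is {pinned}")
    state = json.loads(state_json)
    # The state file records which tool version wrote it last.
    written_by = state.get("terraform_version")
    if written_by and parse_version(written_by) > parse_version(local_version):
        problems.append(f"state was written by newer version {written_by}")
    return problems


if __name__ == "__main__":
    state = '{"version": 4, "terraform_version": "1.6.0", "serial": 12}'
    for problem in check_state_version(state, "1.5.7"):
        print("refusing to run:", problem)
```

A check like this doesn't remove the version conflicts, it only makes them fail loudly before anybody touches the state.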
One of the biggest advantages of IaC is supposed to be better collaboration. That may be true if you implement very strict guidelines on how your infrastructure team uses the IaC suite. If you don't, you'll end up with a bunch of crufty code, non-reusable "modules" and islands of knowledge, where some of your team members understand only parts of your IaC infrastructure.
If you decide to use IaC, never forget that other departments or external partners of your company may collaborate with your infrastructure team. The other departments are presumably not involved in update management, and your external partners may never have worked with IaC tools before. Congratulations! You now have some additional problems in your company. Your infrastructure team must integrate the environments built by external partners into the IaC infrastructure, and your developers will have only a partial understanding of your infrastructure, because your sysadmins think that IaC code is documentation enough.
Never forget that IaC consumes a lot of time, not only while you set up the infrastructure but also whenever you change anything in it. You need a small change to your infrastructure, like a new VM instance? OK, what would the "classical" way be?
You log in to your cloud/hosting provider and start a new server instance.
You install your software on this new instance.
Perhaps you add it to your Load Balancers.
And what happens with Terraform?
You write your new infrastructure definition.
It checks whether the syntax of your IaC code is correct. -> You fix your code.
It checks your state file. -> Hopefully there are no broken state files; otherwise your sysadmins spend the next few hours fixing them.
It checks whether your infrastructure still complies with the expected state.
It tells your sysadmin what it will do in this run.
Your sysadmin has to confirm it. But in most cases he will first check why some of the changes are needed, and he must coordinate the changes across the teams in your environment.
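It makes all the changes. You cannot simply skip some of them without changing your IaC code (Terraform does offer a -target option, but its documentation recommends it only for exceptional situations, not for routine use).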
In the end you'll need an hour for a change that would typically take 3 minutes, because your sysadmins have to change IaC code or coordinate unexpected or unwanted updates.
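Much of that hour goes into reading the plan. If the plan is saved with `terraform plan -out=plan.out` and exported with `terraform show -json plan.out`, the JSON contains a `resource_changes` list in which every entry has an `address` and a `change.actions` list. A small script can at least sort those changes and flag everything the sysadmin didn't ask for. This is a minimal sketch; the `expected` set and the resource addresses in the example are hypothetical.

```python
import json
from collections import defaultdict


def summarize_plan(plan_json: str, expected: set[str]) -> dict[str, list[str]]:
    """Group the planned changes by action and list the ones nobody asked for."""
    plan = json.loads(plan_json)
    summary: dict[str, list[str]] = defaultdict(list)
    for change in plan.get("resource_changes", []):
        actions = change["change"]["actions"]
        if actions == ["no-op"]:
            continue  # this resource stays untouched
        # A replacement is reported as two actions, e.g. ["delete", "create"].
        label = "replace" if len(actions) == 2 else actions[0]
        summary[label].append(change["address"])
        if change["address"] not in expected:
            summary["unexpected"].append(change["address"])
    return dict(summary)


if __name__ == "__main__":
    # Hypothetical plan: one new VM was wanted, but a security group changes too.
    plan = json.dumps({"resource_changes": [
        {"address": "aws_instance.web_3", "change": {"actions": ["create"]}},
        {"address": "aws_security_group.web", "change": {"actions": ["update"]}},
        {"address": "aws_instance.web_1", "change": {"actions": ["no-op"]}},
    ]})
    print(summarize_plan(plan, expected={"aws_instance.web_3"}))
```

The script doesn't make the review unnecessary; it only shows the sysadmin where the coordination work starts.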
It becomes even more time-consuming if you decide to bring IaC into your current infrastructure. The typical way is to set up a completely new environment that is fully managed with IaC. I have gone through this process three times now, and every time it tied up all the resources of the infrastructure team for several weeks. Please don't expect your current infrastructure to still be maintained by your sysadmins during that time. They have enough to do with your new IaC-based infrastructure and no time for "the old stuff". So you have to expect that your old infrastructure won't be updated until you move to your new "modern" IaC-based infrastructure, unless you add more sysadmins to your team.
The first time I did an IaC integration was at a time when tools like Ansible or Terraform were not available. So we wrote our own tools... in Perl. We invested most of our free time in coding, but in the end we had a tool that we called "ASP Tool" (Application Service Providing Tool). It was perfect, because it was designed for our very specific infrastructure, consisting of classical web server environments (LAMP stacks), some in-house developed search engines and some very project-specific software. And it was perfectly integrated into our CI/CD environment. Furthermore, it made only the changes that were defined in the state files. Only an additional parameter triggered a check of whether our infrastructure was compliant with the state files. So we could make changes nearly as fast as we would have made them manually. This is not possible with newer tools like Terraform, Ansible, Puppet or Chef, because they always check the state of your complete network. Sure, you can split your infrastructure into multiple repositories, but then you'll end up with a lot of repositories, where, if you're lucky, your security department is the only one that still has an overview.
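The ASP Tool itself was written in Perl and isn't shown here, but its two modes can be sketched in a few lines. In this hypothetical Python version, the default run compares the desired state only with the recorded state file, so the live host is never inspected; only `full_check=True` compares against the live host. The `Host` class and the package data are invented for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class Host:
    """Hypothetical stand-in for a server: package name -> installed version."""
    name: str
    packages: dict[str, str] = field(default_factory=dict)


def converge(desired: dict[str, str], recorded: dict[str, str], host: Host,
             full_check: bool = False) -> list[str]:
    """Bring `host` to the desired state and return the actions taken.

    Default: diff desired against the recorded state file only (fast, the
    live host is never read). full_check=True: diff against the live host,
    which also repairs drift the state file doesn't know about.
    """
    baseline = host.packages if full_check else recorded
    actions = []
    for package, version in desired.items():
        if baseline.get(package) != version:
            host.packages[package] = version   # "install" on the host
            recorded[package] = version        # keep the state file in sync
            actions.append(f"{host.name}: set {package}={version}")
    return actions


if __name__ == "__main__":
    desired = {"nginx": "1.20", "php": "8.0"}
    # Someone downgraded php by hand; the state file doesn't know about it.
    host = Host("web1", {"nginx": "1.18", "php": "7.4"})
    state = {"nginx": "1.18", "php": "8.0"}
    print(converge(desired, dict(state), host))                   # ['web1: set nginx=1.20']
    print(converge(desired, dict(state), host, full_check=True))  # ['web1: set php=8.0']
```

The trade-off is visible in the example: the fast run touches only what changed, but it misses the manual downgrade until someone asks for the full check.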
Tools like Ansible, Terraform or Chef were never designed for your network. They must meet the requirements of a lot of different environments. This is a nearly impossible balancing act, and it's the reason why you'll end up with a bunch of tools.
Another supposed advantage of IaC is better auditability of your setup. This is absolutely correct... from the point of view of your infrastructure team. But have you ever asked your developers whether they understand the infrastructure when all they have is the IaC code? Have you ever asked your security department whether IaC helps them see if all of your security requirements are met?
From my perspective, your security department at least will answer with a clear "No!". Most of the IaC tools on the market don't track all manual changes. You load a kernel module that is not defined in your IaC? IaC will ignore it, because IaC doesn't track it. You change a configuration that is not defined in your IaC because your sysadmins kept the default settings? Your IaC tools will not see it. Why? Because these tools only track what you tell them to track. If you don't import all of your configurations and expected system states into your IaC environment, IaC cannot help you with auditability. In the end your security department will use a semi-intelligent intrusion detection system to audit the systems. Congratulations! It wasn't IaC that helped you audit your systems, it was your IDS. Oh wait... you could have installed this IDS without IaC, and the setup would have taken only half the time.
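The blind spot is easy to see in a toy model. In this sketch, both the declared configuration and the inventory of a running system are flattened into hypothetical `kind:name` keys. A declared-only comparison, which is how most IaC tools behave as described above, reports nothing, while a full inventory comparison of the kind an IDS baseline does also lists what was never declared. All keys and values are invented for illustration.

```python
from typing import Optional


def declared_drift(declared: dict[str, str],
                   actual: dict[str, str]) -> dict[str, tuple[str, Optional[str]]]:
    """What a declared-only tool sees: mismatches on the keys it manages."""
    return {key: (want, actual.get(key))
            for key, want in declared.items() if actual.get(key) != want}


def undeclared(declared: dict[str, str], actual: dict[str, str]) -> set[str]:
    """What a full inventory adds: everything on the system nobody declared."""
    return set(actual) - set(declared)


if __name__ == "__main__":
    declared = {"sshd:PermitRootLogin": "no", "pkg:nginx": "installed"}
    actual = {
        "sshd:PermitRootLogin": "no",
        "pkg:nginx": "installed",
        "kmod:usb_storage": "loaded",       # loaded by hand
        "sysctl:net.ipv4.ip_forward": "1",  # changed, but never declared
    }
    print(declared_drift(declared, actual))       # {}
    print(sorted(undeclared(declared, actual)))   # ['kmod:usb_storage', 'sysctl:net.ipv4.ip_forward']
```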
When IaC really helps
Of course there are reasons to use IaC. But you should inspect your environments and your requirements before you decide to use it. If you answer most of the following questions with "No", you shouldn't use IaC:
Do you often change your tech stack?
Is your environment highly scalable?
Do you regularly start completely new environments?
Do you use auto-scaling environments or plan to use them? (If yes, also ask whether you really need auto-scaling and how much it can save.)
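The rule of thumb is mechanical enough to write down. This sketch reads "most" as a strict majority of the four questions, which is an interpretation rather than something the rule spells out, and a `False` result only means the rule doesn't rule IaC out.

```python
QUESTIONS = (
    "Do you often change your tech stack?",
    "Is your environment highly scalable?",
    "Do you regularly start completely new environments?",
    "Do you use auto-scaling environments or plan to use them?",
)


def skip_iac(answers: dict[str, bool]) -> bool:
    """True if most questions were answered 'No' (True means 'Yes')."""
    missing = [q for q in QUESTIONS if q not in answers]
    if missing:
        raise ValueError(f"unanswered questions: {missing}")
    no_answers = sum(1 for q in QUESTIONS if not answers[q])
    # "Most" read as a strict majority: a 2:2 tie does not trigger the rule.
    return no_answers > len(QUESTIONS) / 2


if __name__ == "__main__":
    answers = dict.fromkeys(QUESTIONS, False)
    answers["Is your environment highly scalable?"] = True
    print(skip_iac(answers))  # True: three of four answers are "No"
```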
In fact, in 20 years on the job I've seen only a few environments that really needed IaC. One was at an agency that had to set up new environments for new customers every few days. Another was a server network consisting of several hundred servers.
If you had a more or less static environment in the past, where you only added a few servers every few weeks, you don't need IaC. Your sysadmins will be much faster at setting up new servers if they don't have to use IaC for it. If you work in an environment based on Docker/Kubernetes/cloud, you can scale your environment with some simple changes to manifest files, and you can add nodes to your cluster with basically a single command on the new node. If you use auto-scaling groups on AWS but your tech stack is not constantly changing, you don't need IaC.
IaC also (partially) helps if you need to prepare your network for a move to another hosting provider. If your hosting provider's data center is destroyed and you must move to another provider within a few hours, IaC is a big advantage. It mostly abstracts the providers' API layers and thereby makes it possible to set up your infrastructure from scratch very quickly... as long as your IaC code is prepared for it and you stored backups outside of your current provider. In fact, most IaC infrastructures are not prepared for such use cases, so they are mostly useless in such situations.
You need IaC if your tech stack is very flexible, for example if your developers play around with new technologies every few days or weeks. You need IaC if you add servers every day or week. You need IaC if you have a very big server network, i.e. more than 100 servers. You need IaC if your infrastructure team has more than 5-10 members (provided that you also implement guidelines on how to use IaC). In all other cases you don't need IaC. IaC is just another hype; in the end it's only needed for very flexible or very big environments, and that doesn't apply to most small and mid-size IT companies.
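The criteria above can be turned into a quick self-check. In this sketch the `Environment` fields are hypothetical, the team threshold defaults to the lower end of the 5-10 range, and "every few days or weeks" is read as at least two stack changes a month, which is an assumption you should adjust.

```python
from dataclasses import dataclass


@dataclass
class Environment:
    servers: int
    infra_team_size: int
    new_servers_per_week: float
    stack_changes_per_month: float  # new technologies brought in by developers


def reasons_for_iac(env: Environment, team_threshold: int = 5,
                    stack_threshold: float = 2.0) -> list[str]:
    """List which of the criteria apply; an empty list means no IaC needed.

    team_threshold: the text gives a range of 5-10 people.
    stack_threshold: ">= 2 changes a month" is an assumed reading of
    "every few days or weeks".
    """
    reasons = []
    if env.stack_changes_per_month >= stack_threshold:
        reasons.append("very flexible tech stack")
    if env.new_servers_per_week >= 1:
        reasons.append("servers added every week or more often")
    if env.servers > 100:
        reasons.append("very big server network (>100 servers)")
    if env.infra_team_size > team_threshold:
        reasons.append("large infrastructure team (needs IaC guidelines)")
    return reasons


if __name__ == "__main__":
    shop = Environment(servers=12, infra_team_size=2,
                       new_servers_per_week=0.25, stack_changes_per_month=0.5)
    print(reasons_for_iac(shop) or "no IaC needed")
```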
What you should consider
If you decide to use IaC, you should consider the following points:
1. Set very strict guidelines on how to use IaC. Module reusability in particular is a big pain point in most companies. If a module is not reusable, it's not a module! And test whether a module really is reusable!
2. Provide additional documentation. Even if your sysadmins think that the IaC code is enough documentation, ask for data flow diagrams, documentation of the repository content, etc. It will save your developers a lot of time!
3. Track the time your sysadmins spend on IaC. If it takes longer than setting up the servers manually, stop immediately!
4. Ask other departments whether they still understand your environment. IaC is an additional abstraction layer that may confuse other employees unless they have additional documentation (see 2.).
5. Ask your external partners whether they can work with the tools you use. Otherwise your sysadmins will spend a lot of time integrating your external partners' setups into your IaC.
6. Do a cost calculation. If the time IaC saves compared with setting up servers manually doesn't significantly exceed the time your sysadmins invest in IaC, it's not worth using.
If you're seriously considering IaC, do some simple calculations first and think about the following questions:
How much time will it need to integrate my current infrastructure into IaC?
Which tools will my infrastructure team have to maintain in addition to my tech stack?
What does the time my sysadmins need to integrate my infrastructure into IaC cost, and how much time do they really save with this step?
Is my server environment really highly flexible and/or scalable to justify the high costs for the setup?
Who will manage the current environment until the switch to the new environment can be done?
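The questions above boil down to a break-even calculation. Every number in this sketch is a placeholder to replace with your own tracked times, and the model is deliberately crude: it ignores, for example, the value of faster recovery after a provider outage.

```python
import math
from dataclasses import dataclass


@dataclass
class CostInputs:
    hourly_rate: float                # cost of one sysadmin hour
    migration_hours: float            # one-off: moving the current setup to IaC
    tooling_hours_per_month: float    # updating Terraform, Ansible, Packer, ...
    manual_minutes_per_change: float  # e.g. a new VM done by hand
    iac_minutes_per_change: float     # the same change through IaC
    changes_per_month: float


def monthly_saving(c: CostInputs) -> float:
    """Money saved per month; negative means IaC costs more than it saves."""
    saved_hours = (c.changes_per_month
                   * (c.manual_minutes_per_change - c.iac_minutes_per_change) / 60)
    return (saved_hours - c.tooling_hours_per_month) * c.hourly_rate


def break_even_months(c: CostInputs) -> float:
    """Months until the migration has paid for itself (inf if it never does)."""
    saving = monthly_saving(c)
    if saving <= 0:
        return math.inf
    return c.migration_hours * c.hourly_rate / saving


if __name__ == "__main__":
    # The "3 minutes versus an hour" case: it never pays off.
    small = CostInputs(80, 300, 10, manual_minutes_per_change=3,
                       iac_minutes_per_change=60, changes_per_month=20)
    print(break_even_months(small))  # inf
    # A setup where each new environment takes 2 hours by hand.
    agency = CostInputs(80, 300, 10, manual_minutes_per_change=120,
                        iac_minutes_per_change=20, changes_per_month=20)
    print(round(break_even_months(agency), 1))  # 12.9
```

Even in the favourable second case, the hypothetical migration needs about a year to pay for itself, and that is before anyone has to run the old environment in parallel.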
And don't forget to evaluate the tools before you use them. The biggest chaos arises when you use tools that don't fit your requirements. "This tool is cool" from the mouth of your sysadmins is not an evaluation!