14 April 2021

AWS EC2 Resilience Engineering the easy way

Most of you are probably familiar with Resilience and Chaos engineering. No, those are not synonyms, but stubborn as I am, I treat them as the same thing. My apologies if that sounds like cursing in church to you. The term ‘Chaos’ has a bit of a negative connotation, which is mostly why I prefer to talk about resilience engineering instead.

In case you wonder:

The goal of resilience engineering is to design systems to adapt in the event of failure.

Chaos engineering helps test the resiliency of the system by proactively throwing common failures at the system.

Bottom line: chaos engineering is a way to test a system for resilience. But I digress 😄

Chaos Tools

Back in the early days, we only had Netflix’s Chaos Monkey to bring down EC2 instances every now and then. Nowadays there are quite a few tools to pick from, and AWS recently released its own AWS Fault Injection Simulator for the same purpose. However, there’s also an AWS property that can offer a bit of chaos but is often overlooked… MaxInstanceLifetime.

MaxInstanceLifetime

In late 2019, AWS released the Maximum Instance Lifetime property for Auto Scaling Groups (ASG). It helps you ensure that instances are recycled before they reach the specified lifetime. No, it’s not a replacement for Chaos Monkey or AWS Fault Injection Simulator, but it is effortless to set and doesn’t cost a penny. It also offers nice protection against Configuration Drift and is very helpful for enforcing Immutable Infrastructure.

Before you ask, the property’s design is well thought through:

Note that instances are not guaranteed to be replaced only at the end of their maximum duration. In some situations, Amazon EC2 Auto Scaling might need to start replacing instances immediately after you configure the maximum instance lifetime parameter. The intention of this more aggressive behavior is to avoid replacing all instances at the same time.

From: Replacing Auto Scaling instances based on maximum instance lifetime

In CloudFormation, you set it as follows:

  SomeAutoScalingGroup:
    Type: AWS::AutoScaling::AutoScalingGroup
    Properties:
      MaxInstanceLifetime: 1209600 # 14 days, expressed in seconds
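
The value is given in seconds, which is easy to get wrong by hand. Here is a minimal Python sketch for computing it from a number of days. The function name is made up for this post, and the one-day lower bound is my assumption about what AWS currently accepts, so check the Auto Scaling documentation before relying on it.

```python
from datetime import timedelta

# Assumption: smallest lifetime AWS accepts (one day). Verify against the
# current EC2 Auto Scaling documentation; this limit has changed over time.
ASSUMED_MIN_LIFETIME_SECONDS = 86_400


def max_instance_lifetime_seconds(days: int) -> int:
    """Convert a lifetime in days to the seconds value MaxInstanceLifetime expects.

    Hypothetical helper for illustration only; it is not part of any AWS SDK.
    """
    seconds = int(timedelta(days=days).total_seconds())
    if seconds < ASSUMED_MIN_LIFETIME_SECONDS:
        raise ValueError(
            f"{days} day(s) is below the assumed minimum of "
            f"{ASSUMED_MIN_LIFETIME_SECONDS} seconds"
        )
    return seconds


if __name__ == "__main__":
    # 14 days, the same value used in the CloudFormation snippet
    print(max_instance_lifetime_seconds(14))  # 1209600
```

Paste the printed number into the template and keep the human-readable duration in a comment next to it.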

It’s a pity that this property is so little known. I add it to all my Auto Scaling Groups by default.

Enjoy and until next time!