24 June 2020

Sprint planning: how to use RTO & RPO to prevent feature stuffing

What is feature stuffing?

Filling a sprint to the brim with (new) features to please business stakeholders is what I tend to call “feature stuffing”. It leaves less time for other important work whose business value is more indirect. In the end, this is a catastrophe waiting to happen, as shown in the image below.

[Image: Feature Stuffing]

RTO, RPO, SLAs, SLOs & SLIs

I’ve been in the storm for years, fighting the emphasis on new features at the expense of everything else. I was convinced this was wrong and that time would prove me right, but by then it would be too late. In the battle against feature stuffing I was always missing one thing: ammunition. 😉

But then one day, I had an epiphany: meet Service Level Objectives (SLOs). Actually, it’s a triplet: SLA, SLO & SLI. I’m not going to rehash what Google has written about these terms, because I couldn’t explain it better than Google did in this awesome blog post: SRE vs. DevOps: competing standards or close friends? If you’re short on time, you could also skip the intro and just watch this video:

Recovery Time Objective (RTO) and Recovery Point Objective (RPO) are business continuity metrics used to define expectations of recovery. They are also service objectives that are often part of a service level agreement (SLA).
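To make the two metrics concrete, here is a minimal sketch of how an incident could be checked against them. The class names, the function `breaches` and all the numbers are hypothetical and only there for illustration: RTO is treated as the maximum acceptable downtime, RPO as the maximum acceptable window of lost data, measured here from the last restorable backup.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass(frozen=True)
class RecoveryObjectives:
    rto: timedelta  # maximum acceptable downtime
    rpo: timedelta  # maximum acceptable window of lost data


@dataclass(frozen=True)
class Incident:
    last_backup: datetime  # last point the data can be restored to
    failed_at: datetime    # moment the service went down
    restored_at: datetime  # moment the service was back up

    @property
    def downtime(self) -> timedelta:
        return self.restored_at - self.failed_at

    @property
    def data_loss_window(self) -> timedelta:
        return self.failed_at - self.last_backup


def breaches(objectives: RecoveryObjectives, incident: Incident) -> dict[str, bool]:
    """Report, per objective, whether the incident exceeded it."""
    return {
        "rto": incident.downtime > objectives.rto,
        "rpo": incident.data_loss_window > objectives.rpo,
    }


# Hypothetical numbers, for illustration only.
objectives = RecoveryObjectives(rto=timedelta(hours=1), rpo=timedelta(minutes=15))
incident = Incident(
    last_backup=datetime(2020, 6, 20, 9, 50),
    failed_at=datetime(2020, 6, 20, 10, 0),
    restored_at=datetime(2020, 6, 20, 14, 0),
)
result = breaches(objectives, incident)
print(result)
```

In this made-up incident the service was down for four hours against a one-hour RTO, while only ten minutes of data were at risk against a fifteen-minute RPO, so only the RTO is breached.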

SLOs: your defence shield against feature stuffing

Now that you’re familiar with SLOs, defining one for every service you build should become a habit. Why? Well, basically an SLO is your defence shield against feature stuffing.

Let me explain it with a simple example

Let’s say that you’ve built a service and you have an agreement (SLA) that includes an SLO guaranteeing that your service recovers within an hour. However, last Saturday there was a huge hiccup and your service was down for 4 hours. Ouch, you just broke your SLA!

Now comes the funny part… Investigating the issue, you find out that it’s not an easy fix. Fixing the issue properly will consume two entire sprints! Exactly, this means a feature freeze for a whole month.

Before I was familiar with SLOs, this was the point where I had no ammunition and the business just said: “Sorry, we can’t wait that long, just do an easy fix for now and deliver all these super important features”.

SLO to the rescue

Having an SLO for your service, you can now reply as follows: “Sure, but in that case we need to review our SLOs as well”. Your business stakeholders will still have a choice, but they can’t have both. With your SLOs in mind, the business has two options:

  1. New features and a weaker SLO. If the service goes down again, the recovery time goes up to eight hours.
  2. A feature freeze and a service that stays in line with the SLO. The next time the service goes down, it will be back up within the hour.
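The two options above can be sketched as a small comparison. This is only an illustration under my own assumptions: the `Option` class and the `describe` function are hypothetical, and the numbers are the ones from the example (a one-hour SLO, eight hours of recovery after the easy fix, two sprints for the proper fix).

```python
from dataclasses import dataclass
from datetime import timedelta


@dataclass(frozen=True)
class Option:
    name: str
    recovery_time: timedelta     # expected recovery time if the service fails again
    feature_freeze_sprints: int  # sprints spent on the fix instead of new features


def describe(option: Option, rto: timedelta) -> str:
    """Summarise what an option costs and whether it keeps the current SLO."""
    hours = option.recovery_time / timedelta(hours=1)
    status = "meets" if option.recovery_time <= rto else "breaks"
    return (
        f"{option.name}: {option.feature_freeze_sprints} sprint(s) of feature freeze, "
        f"recovery within {hours:g} h, {status} the current SLO"
    )


current_rto = timedelta(hours=1)  # the SLO from the example: recover within an hour
options = [
    Option("Easy fix, keep shipping features", timedelta(hours=8), 0),
    Option("Proper fix, feature freeze", timedelta(hours=1), 2),
]
lines = [describe(option, current_rto) for option in options]
for line in lines:
    print(line)
```

Putting both options side by side like this makes the trade-off explicit: the first one costs no sprints but breaks the agreed recovery time, the second one keeps the agreement at the price of two sprints.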

So finally, here’s your ammunition. The business can still decide whatever it wants, but now it also sees at what cost. This way, what you already knew becomes visible to the business as well.

Bottom line: make sure you have a proper SLO for every service/application and use it to stay in line with the agreement (SLA) between you and your stakeholders. It’s your best defence against feature stuffing!

Enjoy and until next time!