Failure Tolerance, Iteration Length and Self-Organization

September 13th, 2013 | Application Lifecycle Management (ALM), Process

Your iteration length should be equal to, or less than, the maximum amount of work your organization is willing to throw away.

“But Andrew,” you say, “We’re not willing to throw away any amount of work! That is waste!”

I’ve said it a thousand times, and I’ll probably say it a few thousand more. Software development is product development. It requires innovation and experimentation. Which in turn implies a chance of failure. Someday I’ll get around to writing a blog post on the relationship between failure tolerance and innovation rates… However, that isn’t the subject on offer today, so let’s just take it as axiomatic for now. Your organization values innovation rationally, and is willing to invest in it.

Of course, the next question obviously is, “How much?” There are two distinctly different routes one can take to answer this question. The traditional command-and-control answer is a formal risk assessment that takes weeks or months to finish. This approach drastically reduces an organization’s failure tolerance, because all of the expenditures associated with the assessment count against the piece of value in question. If it takes 3 weeks to assess a feature set that takes 6 weeks to implement, then we need to weigh the entire 9-week cycle against the probability of success. A third of that cycle goes to assessment, which amounts to a one-third reduction (assuming linear cost of capacity) in your organization’s failure tolerance. This has a corresponding impact on your organization’s ability to innovate (not to mention drastically reducing organizational agility).
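The arithmetic behind that one-third figure can be sketched in a few lines. This is only an illustration: the function name `assessment_overhead` is made up for this example, and it assumes, as the paragraph does, that cost scales linearly with elapsed time.

```python
from fractions import Fraction


def assessment_overhead(assessment_weeks, implementation_weeks):
    """Share of the whole cycle consumed by up-front risk assessment.

    Assumes cost is linear in elapsed weeks ("linear cost of capacity").
    """
    total_weeks = assessment_weeks + implementation_weeks
    return Fraction(assessment_weeks, total_weeks)


# 3 weeks of assessment in front of 6 weeks of implementation
overhead = assessment_overhead(3, 6)
print(overhead)  # 1/3
```

The longer the assessment runs relative to the work it is assessing, the larger the share of the budget that is spent before anything can succeed or fail.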

There is a better way

Iteration-based development (agile style) gives us a mechanism for decentralizing this decision-making process, and enables us to control the risk with minimal assessment. Instead of exhaustive risk assessments, you can use iteration lengths to set a maximum possible amount of time to spend, well, failing.

Here are some examples:

We generally recommend a development task take 3-6 hours to complete when appropriately sized. Thus what we are in effect saying is that 6 hours is the maximum amount of time a developer is allowed to invest down a wrong path. If I’m an average developer making $100k/year, this equates to about $450 of investment. That is the failure tolerance at the individual developer level.

At the team level we allow a Product Owner to prioritize a backlog for a team, and the team to agree to deliver an amount of work on a particular cadence (sure, it sounds Scrum-like but almost every agile organization I’ve seen is using some sort of cadence at the team level). A team of 6 individuals averaging $100k/year on a two week cadence? About a $27,000 investment. What you’re essentially empowering the Product Owner (or whatever you call the economically responsible individual at the team level) to do is invest that money over a two week period.

This can be translated as high up the portfolio management ladder as you wish. Have a product you release on a quarterly basis? Does your organization report its finances to shareholders annually? You get the idea.
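For anyone who wants to plug in their own numbers, here is a rough sketch of the calculation behind the examples above. The post does not spell out how the dollar figures were derived, so the inputs are assumptions: about 222 working days per year, and a 6-hour task treated as one working day of cost. Those assumptions happen to land close to the $450 and $27,000 quoted above. The function name `investment_at_risk` is hypothetical.

```python
ANNUAL_COST = 100_000        # salary figure used in the examples above
WORKING_DAYS_PER_YEAR = 222  # assumption, not stated in the post


def investment_at_risk(people, working_days,
                       annual_cost=ANNUAL_COST,
                       days_per_year=WORKING_DAYS_PER_YEAR):
    """Money committed if everything done in one iteration is thrown away."""
    daily_rate = annual_cost / days_per_year
    return people * working_days * daily_rate


# One developer, one task (6 productive hours counted as one working day: assumption)
developer_task = investment_at_risk(people=1, working_days=1)

# Team of 6 on a two-week cadence (10 working days)
team_sprint = investment_at_risk(people=6, working_days=10)

print(f"developer task: ${developer_task:,.0f}")  # developer task: $450
print(f"team sprint:    ${team_sprint:,.0f}")     # team sprint:    $27,027
```

The same function works further up the ladder: pass the headcount and working days of a quarterly release, and use a fully loaded cost instead of salary if that is how your organization accounts for people.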

Here’s why it matters

Sure, it looks pretty obvious, but here’s the important part. If your cadence, at any level, differs from the assumed failure tolerance, self-organization is impossible. Maybe you have 3-week sprints, but the failure tolerance at the team level is actually only about 1 week. Hello, weekly status reports. Perhaps developer tasks are generally sized to be a few days long, but the tolerance is only 4 hours? Hello, extremely boring and generally useless daily status meetings (which, since we’re agile, we’ll call stand-ups because changing the name makes us agile). Bleeding customers like crazy because you bombed a release and it will be six months before the next one? Hello, misunderstood customer failure tolerance.
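A quick way to look for these mismatches is to write down the cadence and the real failure tolerance at each level and compare them. The sketch below is illustrative only: the helper name `mismatches` is made up, and the day counts are assumptions (5 working days per week, "a few days" read as 3, and 4 hours as half a day).

```python
def mismatches(levels):
    """Return the levels whose cadence is longer than the failure tolerance.

    `levels` maps a name to (cadence, tolerance), both in working days.
    """
    return [name
            for name, (cadence, tolerance) in levels.items()
            if cadence > tolerance]


levels = {
    "team sprint": (15, 5),      # 3-week sprints, about 1 week of tolerance
    "developer task": (3, 0.5),  # tasks a few days long, 4 hours of tolerance
    "aligned team": (10, 10),    # two-week cadence that matches its tolerance
}

print(mismatches(levels))  # ['team sprint', 'developer task']
```

Every level that shows up in the result is a place where some extra layer of reporting is likely to appear to close the gap.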

Wrapping it up

Right up next to “Software development is product development” in my hierarchy of software one-liners lies “Make the economics transparent.” Giving everyone the same economic playbook with which to make decisions is incredibly valuable, and an important portion of that book should be dedicated to failure tolerance. If your organization has never had conversations in this direction, you’ll be amazed at just how wildly varied individual opinions will be.

  1. Steven Borg September 16, 2013 at 12:04 pm

    This is a great post. I love the tie in between culture and iteration length. It makes it clear why risk averse companies seem so backward when they have long iteration lengths. Or, more precisely, why companies that are claiming to want longer iterations because they are risk averse, are actually wanting longer iterations because they are change averse.

  2. Mickey Gousset September 24, 2013 at 6:42 am

    I would love to see a follow-up to this post, with more numbers. I’m interested in the math aspect of this, but I’m going to need to work through this post several times to grasp all the math ($$$ vs iterations vs fault tolerance). This has definitely gotten me thinking though.
