In The Beginning
In the summer of 2006 I was working as a Development Manager for Corbis, a media licensing company owned by Bill Gates and blessed with an uncanny ability to lose money. When Bill went from being the richest man in the world to being the second richest, I liked to say I played a part in it simply by working for Corbis. At the time our sustainment engineering effort was in shambles, and I was on the team trying to pull it out of the tar pit and make it functional. It was being run as a series of small, fixed-scope projects released every quarter, a little like scooping a load of socks from the dryer and crab-walking them up a flight of stairs while trying to drop as few as possible. Needless to say, the process was consistently disappointing. The team had heard some crazy talk about a thing called Agile and decided to include stand-ups in their process. These quickly degenerated into 30- to 45-minute morning marathons that succeeded in sucking the optimism out of anyone unlucky enough to drop by. It was clear things had to change, and a few of us were having discussions about how to improve the process.
At some point during the summer one of the members of our team, Rick Garber, heard a talk by a fascinating Scotsman, David Anderson, describing how he had fixed some of the very same issues on one of his teams. His methods, based on the Theory of Constraints and other work by the likes of Goldratt and Deming, eliminated explicit estimation from the process and relied on data to provide a probabilistic means of determining when software was likely to be done. Frankly, he had me at eliminating estimation, but the rest of the theories were also compelling. It blew our minds to think of software as inventory that could go stale, or that by idling some resources we could actually make the whole system more effective. Several of us read his book and engaged in a series of conversations like hardcore converts, eyes shining with revolutionary zeal, eager to remake the world. Or at the very least, our moribund sustainment process. We began building out our very own Kanban-based system.
It took us several months and extended well into the fall of 2006. In the middle of that process, David joined Corbis as the Senior Director of Software Engineering, and I began reporting to him. Also on the team at that time were Dominica Degrandis, Mark Grotte, Larry Cohen, Rick Garber, and Steven Weiss. David guided us through the rest of the process and finally, by November, we settled on a design that had his blessing. We trained our team on the process, populated our queues, and with great anticipation and fanfare, launched the first significant implementation of a Kanban-based system we were aware of.
It promptly went nowhere. For months.
Kanban Launches, and Flops
Keep in mind that our system was based on David’s first book, Agile Management for Software Engineering, not on his subsequent work. The first book offered no real practical examples of how to implement his theories, so we designed our system much like his one previous (and smaller) implementation, which he assured us had been a tremendous success. And yet our process was quickly proving to be a non-starter. At this point in its history, Kanban didn’t look anything like the process we know today. What we had was about 25 work items stored in Team Foundation Server, organized into approximately 14 queues with a tangled web of transitions between the various states, and not much else. According to the theory, once the system was set up it would be self-managing: people understood their roles and would monitor their queues, do their work, and pass it on. Every two weeks any work that had passed through the process and was sitting in “Ready for Production” would be released. Very few items were making it through to production, though, and none of us, including David, understood why. People complained that they had no idea where things were in the process, devs and QA engineers had no visibility into the upstream work headed their way, and the customers were increasingly angry that they were getting nothing but the thinnest trickle of work out of our grand experiment. We kept insisting that it should work, that we’d designed the machinery so that people could focus on their own area alone, without concern for other areas of the system. “Just focus on your queue,” we would say, “and when work comes in simply pick it up, do the work and move it on.” But still they complained, and our customers were growing impatient. I would like to say we were hurtling toward disaster, but that implies too much velocity. We were grinding to a halt.
At the beginning of February the CIO (and David’s boss), Stephen Gillett, called me into his office and said, “If you don’t fix this thing I’m going to have to fire somebody.” I didn’t think he was saying I personally needed to fix it, but I did think he was threatening to fire me if the team couldn’t get the process running (it turns out he wasn’t referring to me at all, but that’s a story for another day). I got together with Mark, Dominica, Rick and Larry (David’s leadership team), and we talked about how to get things unstuck. It’s important to note, I think, that we didn’t set out to modify or “fix” the process. We simply, naively thought it was a problem of combustion: if we could get the thing started, it would take off. To that end, we decided to start running a stand-up each day. It was Dominica’s suggestion, and we all thought it was a reasonable thing to do. Our intent was to run the stand-up for a month, until we started to see some momentum, and then go back to running the process the way we had at the start. We decided to start the stand-ups on the following Monday. I insisted on running them because I knew that if I ran the stand-up it would never last longer than 15 minutes. If I was ever dogmatic about anything, it was that stand-ups should be short. I still had psychic scars from the stand-ups our previous sustainment process had run. Interestingly, this single change, intended as a temporary nudge, drove all the other modifications that became Kanban as we know it today.
Micro Improvements and Happy Accidents
I had never run a stand-up before, so I didn’t really know what I was going to do that first Monday. Since people were complaining that they had no visibility into where things were, I thought it made sense to put the work up on a whiteboard so we could talk about it. I didn’t have a clear idea of the format, though, and decided to bounce it off some of our devs. As luck would have it, that same week they were working with Daniel Vacanti on Color Domain Modeling, a process of using color-coded post-it notes to puzzle out the design of software systems. When I told them I was thinking of putting the work up on the whiteboard, one of the devs, Kurt Quamme, suggested using different colored post-it notes to represent the work. It seemed like a good idea, so I went home that weekend and sketched out a fairly simple plan.
The most common color of post-it was yellow, so we’d use those to represent feature requests. To represent bugs we would use blue because, well, to be honest, because both “bugs” and “blue” start with the letter b. And that was pretty much it for the first board. I got to work early on Monday, drew up a very crude set of queues, wrote up some post-its and waited for people to arrive. We’d made it clear that if something was assigned to you, you were required to attend the stand-up, but that first morning most of the team showed up. It may have been because people were curious to see what was going on, or because they wanted to feel like part of the team, but most likely it was because the board was set up in a fairly public area directly behind my desk, and during the stand-up it was difficult for people nearby to do anything but join in. Whatever the reason, it was a big group that first morning, and it would continue to be a large, inclusive gathering for most of the rest of my time there.
The first, most pressing problem we were trying to solve, in fact at that point the only problem we were trying to solve, was why work wasn’t moving through the queues. In the stand-up that was our focus. It seems obvious now that focusing on blocking issues is the primary goal of a Kanban stand-up, but at the time it was a radical departure from the more orthodox “Open Mic Night” style of stand-up common to Scrum, the kind that goes around the circle and asks people what they did yesterday and what they plan on doing today. I wasn’t trying to be radical; I simply didn’t know any better. Over time, as a group, we refined the process a bit, settling on a few standard questions:
- Is there anything blocking you that’s not on the board?
- With the issues on the board, is there somebody actively working to clear the issue?
- Do you need anything from management to get the issue cleared?
We always kept the meeting under 15 minutes, often under 10, but found that people would “stay after class” and discuss specific issues in groups of 2 or 3. In a sense it was Meeting 101: don’t waste people’s time, only discuss issues in front of the whole group that the whole group needs to hear. We also got the sense that the team was more focused, more energized than they had been when the old stand-ups dragged on so long.
I don’t recall who suggested using bright pink stickies to identify blocking issues, or exactly when, but it happened very early on. I wish I knew so I could buy them a drink. It was another simple but brilliant innovation that helped us stumble onto one of the key benefits of contemporary Kanban: the inherent power of visualizing the work in progress. We didn’t invent the concept, by any means, and if we had read up on the work of Edward Tufte we might have gotten to these ideas a lot sooner, but as the team iterated on the form and content of the board we became increasingly aware of its importance. A pink sticky note on the feature or bug it was blocking quickly communicated to even the most casual observer that something was wrong. Color coding and other cues allowed us to get different levels of detail out of the same board. You could stand back and get a picture of the overall system health, where the bottlenecks were, and what the batch size of the next release would be, or you could stand closer and see that the same developer was assigned to multiple items and needed to be refocused on a single task. The board also became the focal point of discussions and planning for a variety of people on the team. Our tester, Tom Utterback, and our build engineer, Doug Buros, would use the board to plan how builds would move into QA. We started collecting all of the items we released by sticking them to the wall at the far right side of the board. It started out as a kind of joke, but quickly became an advertisement and reminder of our success as a team. All these things are obvious now, and widely used, but at the time we really were just making them up, experimenting, and discovering their value as we went along.
Welcome to Thunderdome, the Process of Prioritization
The process of selecting and prioritizing work was handled in a weekly meeting of the VPs from various parts of the company. Marketing, Finance, Sales, and Imaging would all show up on Monday for an hour to select the next items to pull in for work. The meeting was run by Diana Kolomiyets, who was also instrumental in keeping things flowing through the queues and managing the releases. Like everything else, the meeting went through several iterations, but it eventually settled on a multi-vote system in which each representative had three votes. The meeting would start by going over the new requests for the week, then move on to nominations. Nominating an item to be worked on was, essentially, a plea for support, and there was a fair amount of horse-trading as part of these sessions. Once a group of candidate items was determined, everybody would vote, and the top vote-getters were added to our work queue (we called it Engineering Ready). The mechanism was pretty simple, and it was well established early on that if you didn’t make the meeting, there was little chance your pet item would make it into the work queue.
While we were building out the initial process, before any of the modifications that actually made the system work, we were checking in regularly with our customers. One of them, Drew McLean, was the VP over the imaging department. He was a former Marine who had worked much of his career in manufacturing, most recently at Boeing. He understood the theories well, but insisted that we include a Silver Bullet: the ability to expedite a request. We resisted, thinking that everything would become a Silver Bullet, but it’s hard to stand up to a former Marine who knows what he wants. We included a provision to expedite a request but added two rules:
- There could only be a single expedited request in the whole system at any one time. If there was already an expedited request, a new one couldn’t be added until the existing one was released to production.
- The decision to mark an item as Expedited had to be a consensus of the various stakeholders from all the customer groups (Marketing, Sales, Finance, Imaging, etc.).
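Read as a policy, the two rules amount to a work-in-progress limit of one on an expedite lane plus an all-groups approval gate. The following is a minimal sketch under stated assumptions, not the team’s actual tooling (which lived in Team Foundation Server): the class name, the item IDs and the fixed set of stakeholder groups are hypothetical, and “consensus” is assumed to mean every group must approve.

```python
# Hypothetical sketch of the two expedite rules; not the original TFS setup.
# Assumption: "consensus" means every stakeholder group must approve.

# Hypothetical group list (the original says "Marketing, Sales, Finance, Imaging, etc.").
STAKEHOLDER_GROUPS = frozenset({"Marketing", "Sales", "Finance", "Imaging"})


class ExpediteLane:
    """Holds at most one expedited item in the whole system (rule 1)."""

    def __init__(self, groups=STAKEHOLDER_GROUPS):
        self.groups = frozenset(groups)
        self.current = None  # the single expedited item, if any

    def request(self, item, approvals):
        """Try to expedite `item`; `approvals` is the set of groups that agreed.

        Returns True if the item entered the expedite lane, False otherwise.
        """
        if self.current is not None:
            return False  # rule 1: an expedited item is already in flight
        if not self.groups <= set(approvals):
            return False  # rule 2: every stakeholder group must agree
        self.current = item
        return True

    def release(self, item):
        """Free the slot once the expedited item reaches production."""
        if self.current == item:
            self.current = None
```

A request backed by only one group is refused. A unanimous request takes the slot, and any further request waits until `release` is called for the item in flight. The cap of one is what produces the side effect described next: every expedited item delays the regular work behind it.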
We were surprised to find, over time, that this mechanism was rarely invoked. When items did get marked for expediting, they had a very noticeable negative effect on the rest of the system. Everything else had to wait for the Silver Bullet, which led directly to longer lead times for the regular items. Because of this, it was not easy to get everybody to agree that a particular item was so much more important than the others that it deserved to be rushed.
Another interesting effect of better flow and throughput concerned our SLA. Initially we decided on a 21-day SLA based purely on a guess. We had no data to begin with, so we had to start somewhere, and 21 days seemed like a defensible number. We agreed to revise this number over time as we got more data. We never hit that 21-day SLA, not even once. We bumped the number up to 28, but the best we ever did was right around 31 days. Oddly, though, our customers didn’t seem to mind. Because we were moving items through the system with pretty good regularity, and they had total visibility into where items were, they seemed mostly content to let the system work without holding our feet to the fire too much about the SLA. It may also be that the system had been so dysfunctional for so long that we’d succeeded in setting very low expectations.
So What’s Your Point?
Why is any of this important? I think it matters for a variety of reasons, some so small as to be almost petty, but others that I think have much broader implications. The small reasons first: I think it’s important that the people who actually did the work get some credit. I know how history works, and I know that more often than not attention is focused on an individual as being responsible for a movement, even one that wouldn’t have happened without the contributions of a much larger group. It’s easier to label somebody the “Father of Kanban” than to continually point out all the aunts and uncles. This is a small attempt to correct that, or at least to add some names to the record. The people mentioned above all played a role in the evolution of Kanban, and there are certainly others I’ve forgotten. It’s also important because it illustrates the gulf that often exists between theory and practice. I love theory, really I do, but only so far as it’s effective in practice. Clearly David brought the intellectual framework for Kanban into Corbis, and the theories as they apply to software development are really compelling. However, Kanban as we know it in the industry today would not exist if a group of people hadn’t been specifically tasked with making it work. The solutions we came up with worked for us, at that time, in that context. I don’t think anybody was trying to create a methodology; we were just trying to make it work so we wouldn’t get canned. As for the innovations that made our system work, nobody was guiding the team, approving changes to the process, or deciding which innovations to try next. Everything we did was driven by the team and evaluated solely on the basis of whether it was effective. Anybody who tells you differently is trying to sell you something.
The larger reason I think this is important is that I believe that, as an industry, we often put our focus in the wrong place. We seem to focus on learning a methodology like Scrum or Kanban, understanding the ceremonies and artifacts that define those approaches, and becoming as well-practiced at them as we can. We look outside our organizations for consultants or coaches who can come in and help us learn these techniques and apply them to our often unique and idiosyncratic challenges. Huge amounts of money are spent and made on certifications for various methodologies as a way of adding legitimacy to somebody’s opinion of what we should or shouldn’t be doing, feeding a growing sense of orthodoxy around what is “agile” and what is not. But in my opinion a Scrum Team, or a Kanban Team, is of far less value to any organization than a team of agile thinkers: people who have a good understanding of the theories behind agile and are empowered to question everything, to experiment, to fail, to learn and to move on. I’ll confess I’m not a fan of Scrum, but I like Kanban and have used it to great effect in a number of organizations now. As much as I like it, I know it won’t last forever. Other ideas must and will come up that prove more effective in delivering value to our customers. It’s evolution, and it’s relentless. As great as an idea may be, it won’t be the theorists who demonstrate its value. That will fall to the practitioners of the craft of software engineering, the people in the trenches every day figuring out how to make those ideas work in the real world.
To do that requires an agile mind.
About the Author:
Based in Seattle, Washington, Darren K. Davis is currently the Director of Software Engineering in the Strategy and Innovation group of Providence Health and Services, the third-largest not-for-profit health care company in the U.S. Prior to that, he led the web and mobile engineering teams for Starbucks, overseeing a variety of product releases, including the first rollout of their ground-breaking mobile payment app. Before joining Starbucks, Darren worked for Corbis, where he was instrumental in the creation of the Kanban development methodology. Before starting his career in software engineering 18 years ago, he was a professional actor, and he holds a Master of Fine Arts in Acting. If you can believe it.