Air Force software factory looks to unleash 'chaos' on civilian IT shops

A June 2021 briefing at Kessel Run's Boston headquarters. U.S. Air Force photo by Richard Blumenstein

The Kessel Run group is currently developing a playbook that would make it easier for organizations across the federal government to adopt engineering and security best practices.

The Air Force's Kessel Run software factory wants to share its recipes for success with the whole federal government when it comes to engineering and security best practices. 

"We're talking to other software factories and part of our initiative is to release all these templates and playbooks that not just [Defense Department] entities can use, right, from a software factory perspective or just a program office perspective, but any agency can just grab them off our site and say, hey, this is how Kessel Run does chaos engineering, this is how we do performance engineering," said Omar Marrero, Kessel Run's deputy test chief and the chaos and performance tech lead. 

Marrero told FCW that Kessel Run, which is part of the Air Force Life Cycle Management Center and focuses on software development and acquisitions, is routinely looking for partnerships, consulting with organizations that are looking to "start their own chaos engineering journey" by sharing Kessel Run's templates, playbooks, or tech stacks.

But what is chaos engineering and why does it matter?

The goal, Marrero said, is to bring industry best practices across engineering, security, and performance to an organization "like a vaccine that you're injecting into a system" to bolster preparedness.

That means working to prove or test assumptions and develop a process to address worst-case scenarios: "just think of it as pre-emptive practicing where you can practice fire drills: like we know what happens if all sudden we have a surge, does it happen, did we get an alert to put resources on it, that kind of stuff."

It might sound routine, but in practice, the testing concept has already proven beneficial through a partnership with the General Services Administration, which recently teamed up with Kessel Run to make sure the Cloud.gov service could handle a surge in users. 

Lindsay Young, the acting director of Cloud.gov, told FCW that the partnership was a "fantastic opportunity." 

"It was so much fun," Young said, "really spending the time to understand each other's setups and things like that, and then figure out bigger and bigger ways to make trouble and see if we could stand up to it."

Young said the aim was to ensure Cloud.gov users could get a seamless experience and that testing out extreme scenarios like scaling ability was "invaluable" because "you don't know you can do something until you can prove you can do something."

The next challenge, Young said, is finding and then partnering with other federal civilian agencies that have similar problems. Kessel Run will be doing the same: it is currently partnering with other software factories across the Defense Department, including the Navy's Black Pearl, and it also plans to release its playbooks to broaden its impact.

The Air Force's software factory darling has long been heralded as a Defense Department success story and a mold for addressing emerging technology needs, and it has plans to expand its influence in the Air Force and beyond.

The release of the guidebooks, which are still being drafted, could speed that along. Marrero said there isn't a hard release date and that many of the materials are also being developed with other software factories, including the Army's. The newer concept of security chaos will also be included.

"That's what we do. We just share what we learn. And if another opportunity comes like this one where we're going to collaborate again, we'll jump on that," he said. "And hopefully, we can spread this chaos engineering thing to the rest of the government and that initiative helps us deliver more resilient stuff."