Real-world data management
The pandemic has put new demands on data teams, but old obstacles are still hindering agency efforts.
For all the emphasis on artificial intelligence and advanced data science at government agencies, the fundamental aspects of data management are too often overlooked.
In July, FCW gathered a group of data specialists and IT leaders from across government to discuss how data strategies square with on-the-ground realities. The discussion was on the record but not for individual attribution (the participants are listed below), and the quotes have been edited for length and clarity. Here's what the group had to say.
Different missions, same problems
The data initiatives participants manage ranged widely in scope and scale — from relatively small data lakes to massive COVID-19 response efforts. The obstacles, however, were remarkably consistent: resource constraints, poorly documented datasets, parochial data owners, legacy systems not built for sharing, and fundamental tensions between sharing and security.
"For most of the larger systems I've seen, they never thought about sharing data and they never thought about identity and access management," one participant said. "Their requirements are driven by what reports they want to run." As a result, essential data ends up in highly customized systems that obstruct new uses.
The group agreed that good governance is essential, and authorities granted to agency CIOs (via the Federal Information Security Management Act, the Federal Information Technology Acquisition Reform Act and the Modernizing Government Technology Act) can be used to push for it. But some of the executives warned against becoming too rigid in that rule-making.
"I don't think governance is ever set," one argued. With data management, "you constantly have to be able to pivot. You have multiple layers of users. You have people who don't understand anything and just want to digest data. You have people who are scientists and want to model and be extremely creative. And then you have people who want to create their own tools or use a tool that they know and are comfortable with. And I didn't even talk about who you're sharing data with. So it's a constantly morphing thing."
That need for flexibility can't be an excuse for ad hoc approaches, however. Poor governance leads to more silos and interoperability problems, the speakers agreed. And as one warned: "If you don't have a strategy for identity and access management and authentication, you will fail."
COVID as a catalyst
For some agencies, this year's public health crisis prompted huge new data initiatives to support testing, vaccine research and financial relief. And across government, the shift to telework created new access challenges and threat-detection workloads.
Participants
Jose Arrieta
CIO (now former), Department of Health and Human Services
Brian Bataille
Chief Data Officer, Defense Intelligence Agency
Melvin Brown II
Director, Enterprise Business Management Office, Small Business Administration
Charles Campbell
IT Program Manager, Massachusetts Port Authority
Gerald Caron III
Director of Enterprise Network Management, Bureau of Information Resource Management, Department of State
Michael Conlin
Chief Business Analytics Officer, Department of Defense
Edward Dowgiallo
Senior Technical Advisor, Office of Information Technology, Federal Transit Administration, Department of Transportation
Skip Farmer
SE Director, U.S. Public Sector, Rubrik
Tom Kennedy
Vice President, U.S. Public Sector, Rubrik
Preston Werntz
Chief Data Officer, Cybersecurity and Infrastructure Security Agency
Justin Worrilow
Data and Artificial Intelligence Specialist, Microsoft
Note: FCW Editor-in-Chief Troy K. Schneider led the roundtable discussion. The July 30 gathering was underwritten by Rubrik, but the substance of the discussion and the recap on these pages are strictly editorial products. Neither Rubrik nor any of the roundtable participants had input beyond their July 30 comments.
"One of our biggest challenges that COVID highlighted was how disparate our systems work from a data perspective and just not being able to pull all the data in one place," one participant said.
The pandemic produced budget pressures as well. Multiple executives reported having to cancel or suspend contracts and bring certain data work back in-house. But in some cases, dollars could be redirected from unexpected sources. "I didn't have that many people traveling and probably didn't have some of the other expenditures across a year of execution," one participant said. "So that made some resources available. [But] we had to really think about operations being in different places."
Security and other struggles that come with sharing
Much of the discussion focused on risks that must be managed when breaking down the data silos. Identity and access management was a central concern, but the group also stressed the second-order challenges that can catch agencies by surprise.
One of those is the work imposed on data owners in the name of sharing. Many legacy systems lack good data catalogs and metadata, several participants noted, and creating such resources after the fact is a heavy lift.
"They want to share that data," one executive said of agency colleagues. "But they don't want to spend the time to explain what that data is and what the different elements are so that they can use it because that's taking time away from what their mission is."
And with public-facing data, the ever-present need to ensure accuracy is now extending to downstream use of agency data by others. One participant cited a recent case of competing visualizations of the spread of COVID-19 based on government datasets: "The first thing I thought was that if I was the one sharing the data, I'd be getting hammered with, 'Which data's right?'"
Political blowback isn't limited to the data sphere, another said, and "at some point you've got to terminate the transaction. But data seems a bit more pervasive. And so if you're the authoritative source, then you're going to get the questions that come with providing that."
Perhaps most significant, the group said, is the enormous workload that can come with opening an organization's data to outside users. "It's killing us," one executive said. "There's no way to understand that demand curve. The more insight and data we make available — which is good, right? — then there are more pivots and different needs. There are different datasets and different questions about the models themselves." That volume and variety have made it nearly impossible to give team members clearly defined responsibility areas, he added, so they just keep triaging and adapting in real time.
Training leaders to see data as a mission asset
The COVID crisis has put a new emphasis on certain data, but most participants said they still struggle to convince agency leaders to view data as a mission asset rather than a back-office resource. "We can't be solving the data management problem; you've got to be solving the mission problem," one official said. "So that's how we talk about it more."
Another said the challenge goes beyond messaging. Federal executives quickly learn to issue data calls, and "we condition [people] to not necessarily think and hunt data for themselves. So when they are in a place where they are the owner of it, they don't think about it as an asset." It's not a deliberate choice, he said. "It's just learned behavior over the course of a career."
The solution is simple, the group said, at least at the conceptual level: Identify the gaps, and show how better data management can bridge them. Then, as one participant said, your data "becomes something that is key to mission outcomes."