Streaming Data is One Thing, Interpreting It is Another
Digital strategy may lead to more corrections of mistaken analyses.
The first audience question during the inaugural General Services Administration webinar on implementing the federal digital strategy hit on one of the major thorns in the government’s new push for open data.
Why should I open up my data, the questioner asked, when someone can simply grab that data and mash it up in a way that’s misleading, out of context or just plain wrong?
The questioner was referring to a portion of the digital strategy that requires agencies to make application programming interfaces, which automatically stream data to external websites in machine-readable form, the “new default in government.”
The answer from presenter Fred Smith, technology lead for the Centers for Disease Control and Prevention’s news and electronic media division: “If it’s on your website, people are doing that already and you just don’t know about it.”
By streaming agency data out through APIs rather than forcing interested people to copy and paste the information out of PDFs on the agency website, agencies can at least track who’s taking the data and check whether it’s being used accurately, Smith said.
The question and answer hit, though, on a perhaps inevitable weakness in the federal open data strategy and in open data generally. One idea behind the digital strategy is to save time for both federal data requesters and providers by making that information available upfront. A major selling point of the strategy, presenters during Thursday’s webinar said, is that it will cut down on the number of Freedom of Information Act requests agencies process.
But just making your data available is no guarantee that people on the receiving end will be able to interpret it accurately. Take it from someone who, for a living, pores over information other people have compiled -- something that seems completely clear to the writer can be inscrutable to the reader unless the writer is willing to sit down and explain it.
Some of this confusion can be alleviated by good metadata, so the reader knows, for instance, that they’re looking at the most recent report. But that won’t account for confusing elements inside the reports themselves.
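To make the point concrete, here is a minimal sketch of the kind of machine-readable record an agency API might return. The field names and values are illustrative assumptions, not any agency’s actual schema: the idea is that a metadata block travels with the figures, so a consumer can check which report revision they are looking at before interpreting it.

```python
import json

# Hypothetical API response. The schema and all values below are
# invented for illustration -- no real agency endpoint is shown here.
response_body = """
{
  "metadata": {
    "source": "Example Agency Quarterly Report",
    "published": "2012-05-31",
    "version": 3,
    "notes": "Supersedes version 2; column definitions revised."
  },
  "data": [
    {"region": "Northeast", "requests_processed": 1240},
    {"region": "Midwest", "requests_processed": 980}
  ]
}
"""

record = json.loads(response_body)

# A consumer can inspect the metadata before trusting the figures,
# instead of guessing from a PDF whether a report is current.
print(record["metadata"]["version"])
print(record["metadata"]["published"])
```

A reader who copied the same numbers out of a PDF would have none of this context; the metadata is what tells them the column definitions changed between versions.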
Something as small as a poorly worded column heading, for instance, can lead to a major misunderstanding about just what a report says. And there’s no provision in the digital strategy to account for that.
The result, I suspect, is that there will be a lot of behind-the-scenes conversations correcting erroneous interpretations of federal data after they’ve been published. As Smith said, though, that’s certainly preferable to those misinterpretations festering without agencies even being aware of them.