6 methods for squashing info turf battles
Confused about how to achieve all that transparency and collaboration? Here are 6 options that can make your organization an open book.
It’s common these days for government agencies to share data in the name of transparency or perhaps connect the dots for anti-terrorism activities. Those are noble goals, to be sure.
But cash-strapped organizations aren’t ponying up millions of dollars each year just to do the right thing. A growing collection of regulatory requirements — ranging from the National Strategy for Information Sharing to a plethora of executive orders issued after the 2001 terrorist attacks — is putting legal teeth into the practice of playing nice with one another.
Even so, years of turf battles and ingrained reluctance to share information are hard to reverse. “Traditionally, our government is really poor at anything that tries to cross buckets of money,” said Michael Daconta, former metadata program manager at the Homeland Security Department and now chief technology officer at Accelerated Information Management.
But there are signs that attitude is changing. For one, technologies and data-formatting standards, such as the National Information Exchange Model (NIEM) developed by the Justice Department and DHS, are ushering in an era of simpler, easier-to-launch data-sharing methods that require less heavy lifting to make individual computing platforms communicate with one another.
“One of the key trends is the real-time sharing of operational information, which is a far cry from the old days when agencies had to rely on data warehouses with outdated information,” said Melissa Chapman, former CIO at the Health and Human Services Department and now a vice president at Agilex Technologies.
Others have even grander visions for the future, such as data flowing throughout an information ecosystem that gathers the collective intelligence of the entire public sector. In that brave new world, smart-phone users could, for example, see the schedule of the next bus heading uptown, find crime rates for the area near the bus stop, look for reported flu cases in that area and learn what the Environmental Protection Agency has to say about the air quality there.
“We are seeing more apps appear in emergency management, law enforcement and even…the fire service that serve to raise information sharing to a new level,” said Paul Wormeli, executive director of the Integrated Justice Information Systems Institute.
But whether data-sharing goals are visionary or pragmatic, such as complying with a legal mandate, agencies must navigate a range of technology choices, each with trade-offs in complexity, cost and flexibility. Ultimately, however, many of the choices fall into the following six categories.
1. Data Dump
This publish-it-and-they-will-come approach makes large swaths of public information easily available in the hope that someone will find something useful to do with it.
- Data In/Data Out: Federal, state and local agencies amass large volumes of data while performing their day-to-day operations. After converting that raw data into a computer-readable format, such as Extensible Markup Language, they publish entire datasets on public websites. Private companies, software developers and the public can break out chunks of data for their own websites or custom applications. Examples include mobile programs that combine city transit schedules with real-time traffic feeds so commuters know when the next bus is due. (A minimal sketch of the consuming side of this pattern follows this list.)
- Who Uses It: Federal examples include Data.gov, various datasets from the National Oceanic and Atmospheric Administration, and HHS’ Community Health Data Initiative. In addition, Massachusetts, San Francisco, the District of Columbia and a growing number of other state and local entities are fielding their own versions.
- Pros: This approach is a fast, low-cost way to distribute government data without burdening agencies with new applications and computing frameworks. And it might inspire outsiders to create new types of specialized applications.
- Cons: The one-way information flow doesn’t foster collaboration with other agencies or people.
- Special Considerations: Agencies need to pay close attention to identifying appropriate data to share without undermining security and privacy safeguards. For example, San Francisco doesn’t release data for DataSF, which went live in 2009, unless the mayor’s office, IT department and city attorney deem that the information doesn’t infringe on people’s privacy.
- Bottom Line: With proper controls in place, this approach could generate even wider data-sharing opportunities. “This may unlock our ability to work with the federal or state government to share data around areas like health, transportation or human services,” said Chris Vein, San Francisco’s CIO.
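On the consuming side, the pattern amounts to downloading a published dataset and picking out the fields an application needs. Here is a minimal Python sketch of that flow; the URL and XML element names are hypothetical stand-ins, not a real government endpoint.

```python
# Minimal consumer for a published XML dataset (the "data dump" pattern).
# The URL and element names below are hypothetical placeholders.
import urllib.request
import xml.etree.ElementTree as ET

DATASET_URL = "https://data.example.gov/transit/stops.xml"  # hypothetical

with urllib.request.urlopen(DATASET_URL) as response:
    tree = ET.parse(response)

# Extract only the fields the application cares about.
for stop in tree.getroot().iter("stop"):  # assumed element name
    print(stop.findtext("id"), stop.findtext("name"), stop.findtext("nextArrival"))
```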
2. Standard Automatic
Computers do the talking in this approach, designed primarily to automate the collection and reporting of standardized information.
- Data In/Data Out: Members of a select community of data-sharing organizations establish automated information feeds to a central data warehouse maintained by an oversight agency. That agency summarizes the data into reports for internal consumption or display on a public Web portal. (A sketch of such a feed follows this list.)
- Who Uses It: The Education Department’s EDFacts and EPA’s Central Data Exchange follow this approach.
- Pros: Automated data feeds reduce the time, expense and potential inaccuracies associated with manual processes.
- Cons: Data sharing is confined to predetermined datasets that must be translated into computer-ready formats such as XML. Launching the systems requires significant upfront coordination among community members.
- Special Considerations: Agencies that share the reports with outsiders will need additional technology layers. For example, Education uses its internal EDFacts database to collect annual consolidated state performance reports and data from the Institute of Education Sciences and the National Center for Education Statistics. But visitors who want to know, say, graduation rates for area high schools don’t query that database directly. Instead, EDFacts exports the statistics in XML files to ED Data Express, which converts the results into formats compatible with its separate Microsoft SQL Server database and then displays the information to users of the public Web portal.
- Bottom Line: This highly efficient model for data exchange works best when a primary sponsor or stakeholder can dictate standards for the computer-to-computer transactions. After a system is established, adding new participants to the community is easy.
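To make the computer-to-computer handoff concrete, here is a hedged Python sketch of what an automated feed might look like: local records are serialized into the community’s agreed-upon XML layout and pushed to the oversight agency’s collection point. The element names and endpoint are invented for illustration and do not reflect any actual EDFacts or Central Data Exchange interface.

```python
# Sketch of a standardized, automated feed: serialize local records into
# the community's agreed XML layout and submit them to the central
# warehouse. All element names and the endpoint URL are hypothetical.
import urllib.request
import xml.etree.ElementTree as ET

records = [
    {"district": "001", "grad_rate": "87.5"},
    {"district": "002", "grad_rate": "91.2"},
]

root = ET.Element("performanceReport", year="2010")
for rec in records:
    district = ET.SubElement(root, "district", id=rec["district"])
    ET.SubElement(district, "graduationRate").text = rec["grad_rate"]

request = urllib.request.Request(
    "https://collect.example.gov/feeds",  # hypothetical endpoint
    data=ET.tostring(root, encoding="utf-8"),
    headers={"Content-Type": "application/xml"},
)
with urllib.request.urlopen(request) as response:
    print(response.status)  # e.g., 200 on successful receipt
```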
3. Free-Form Aggregation
This mostly centralized approach collects and redistributes data feeds from individual organizations through a single, large data warehouse.
- Data In/Data Out: A closed community of organizations that share common missions voluntarily uploads full or partial records to a central data warehouse. Authorized community members can search the warehouse for information of interest. In addition, participants might allow other members to search portions of their locally maintained data stores.
- Who Uses It: The FBI hosts the Law Enforcement National Data Exchange (N-DEx), and participants include federal, state and local organizations. The approach is also used on a smaller scale for regional law enforcement activities across the country.
- Pros: Agencies retain control of their information and determine access privileges for it, while the shared warehouse provides a vehicle for statewide, regional and national information sharing. The approach can handle structured and unstructured data, including text-based incident reports.
- Cons: Large data warehouses are difficult to manage and update. The strategy also requires significant upfront coordination to standardize terminology and data formats. Individual agencies must translate their data into agreed-upon formats, which sometimes requires manual intervention.
- Special Considerations: About 3,000 law enforcement, corrections, and probation and parole agencies now use N-DEx. It requires organizations to convert their data to versions of NIEM and Justice’s Logical Entity Exchange Specification. In addition, because the system crosses such a range of jurisdictions, it must accommodate a variety of access controls to conform to local privacy and security rules. To do that, the original owners of the information can designate certain records as pointer-only, so other users see just a reference and must contact the owning agency, while the remaining records are displayed in full. (The sketch after this list illustrates the idea.)
- Bottom Line: Even with those controls in place, N-DEx can forge ties with far-flung departments. “Someone in Syracuse knows that someone in San Antonio is interested in a suspect by virtue of that query coming in,” said Jeff Lindsey, the FBI’s N-DEx program manager. “Now I’ve got collaboration.” Similar approaches work at the regional level. For example, when law enforcement agencies in Lehigh County, Pa., post updates to their locally managed records management system, some of the information goes to a central NIEM-compliant data warehouse maintained by the county. More than 22 agencies and the district attorney’s office can then search the storehouse for suspects, vehicles or other entities of interest. On the advice of the district attorney’s office, agencies don’t publish the full records for security reasons. “When you want more information, you contact that department to see what else they have on that subject,” said Douglas Kish, chief of the Catasauqua Police Department.
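The pointer-versus-full-record control described above can be expressed in a few lines. The Python below is a sketch of the concept only; the record layout is invented and bears no relation to actual N-DEx data structures.

```python
# Sketch of owner-designated access in a shared warehouse: some records
# come back in full, others only as pointers to the owning agency.
# The record layout here is invented for illustration.
warehouse = [
    {"id": "TX-4411", "owner": "San Antonio PD", "full_record": True,
     "data": {"subject": "J. Doe", "vehicle": "blue sedan"}},
    {"id": "NY-2087", "owner": "Syracuse PD", "full_record": False,
     "data": {"subject": "J. Doe", "vehicle": "unknown"}},
]

def search(term):
    for record in warehouse:
        if term.lower() in record["data"]["subject"].lower():
            if record["full_record"]:
                yield record  # owner allows the complete record
            else:
                # Owner allows only a pointer; the searcher must contact
                # that agency directly for the underlying details.
                yield {"id": record["id"], "contact": record["owner"]}

for hit in search("doe"):
    print(hit)
```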
4. Federated Data Stores
This method sidesteps the management challenges of fielding a large data warehouse by creating a search framework that spans a collection of dedicated databases.
- Data In/Data Out: Each group of users within a particular domain, such as law enforcement, stores select information on a dedicated server maintained locally. Other community members use proprietary search tools to query all the other dedicated servers within the community. (A sketch of this fan-out query pattern follows this list.)
- Who Uses It: The Nationwide Suspicious Activity Reporting Initiative (NSI) includes participants from fusion centers and federal, state and local law enforcement agencies. A similar approach is used for the Nationwide Health Information Network (NHIN) that HHS manages.
- Pros: The approach is a cost-effective way to integrate federal, regional and local agencies. The system allows each organization to maintain control of its data.
- Cons: Special care must be taken to comply with the privacy and security statutes that govern individual participants.
- Special Considerations: Organizations may not join NSI until they have a privacy plan in place, which sometimes takes longer to develop than the technology implementation, Wormeli said. And before data can be stored on one of the shared servers, it must be converted from its native database format to a common data architecture based on the Suspicious Activity Reporting Information Exchange Package Document, a NIEM-based standard.
- Bottom Line: Rather than launching dedicated servers, NHIN uses a series of gateways built with Connect, which is open-source software for the health care market. Each agency manages its own data stores but relies on the gateways to authenticate and pass information among organizations. Like NSI, NHIN requires an underlying set of data standards — in this case, those developed by the Healthcare Information Technology Standards Panel, a group made up of public and private interests. The standards include common formats and dictionaries of terms. “When a physician makes a diagnosis, there’s a set of standardized diagnostic terms that he may choose from,” Chapman said. “That way, each doctor knows precisely what [another] physician meant by that diagnosis.”
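Stripped to its essentials, a federated search is a fan-out query: the same request goes to every member’s dedicated server, and the caller merges whatever comes back. The Python sketch below assumes hypothetical HTTP endpoints that return JSON hit lists; real NSI and NHIN exchanges ride on standardized gateways and data formats rather than this bare pattern.

```python
# Sketch of a federated query: fan the same search out to each member's
# dedicated server and merge the results locally; there is no central
# warehouse. Endpoints and the JSON response shape are hypothetical.
import concurrent.futures
import json
import urllib.parse
import urllib.request

NODES = [  # one dedicated server per participating organization
    "https://node1.example.gov/search",
    "https://node2.example.gov/search",
]

def query_node(url, term):
    query = urllib.parse.urlencode({"q": term})
    with urllib.request.urlopen(f"{url}?{query}") as response:
        return json.load(response)  # assumes each node returns a JSON list

def federated_search(term):
    with concurrent.futures.ThreadPoolExecutor() as pool:
        per_node = pool.map(lambda url: query_node(url, term), NODES)
    return [hit for hits in per_node for hit in hits]

print(federated_search("suspicious activity"))
```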
5. Special-Purpose Wikis
This stalwart of Web 2.0 initiatives offers a low-maintenance way to build data-sharing, interactive communities.
- Data In/Data Out: Participants contribute free-form text, reports and pointers to related resources in a communal environment. After logging in, authorized users search, read, comment on and update the available information. (A sketch of programmatic wiki access follows this list.)
- Who Uses It: The intelligence community’s Intellipedia, the State Department’s Diplopedia and the General Services Administration’s Colab use this approach.
- Pros: It promotes real-time collaboration among users within or across agencies who can continuously update and correct published information when necessary. The modest Web-based wiki infrastructure keeps implementation and maintenance costs low.
- Cons: Information is not vetted before it is published, which means incomplete or inaccurate data can make it into circulation.
- Special Considerations: Information overload can occur when large numbers of users post duplicate or outdated information on the same topic. Safeguards should include clear pointers to the sources of information and other factors that affect reliability.
- Bottom Line: A wiki can be set up in minutes, providing one of the fastest paths to information sharing and collaboration. As Intellipedia demonstrates with its tiered structure by clearance level, wikis can also handle sensitive information securely, making them an option for a range of applications.
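Wikis also lend themselves to programmatic access. The sketch below assumes the wiki runs MediaWiki, the open-source engine behind Intellipedia, and exposes its standard action API; the host name is a placeholder.

```python
# Sketch of searching a wiki programmatically via a MediaWiki-style
# action API. The host name below is a hypothetical placeholder.
import json
import urllib.parse
import urllib.request

WIKI_API = "https://wiki.example.gov/w/api.php"  # hypothetical host

params = urllib.parse.urlencode({
    "action": "query",
    "list": "search",
    "srsearch": "pandemic response",
    "format": "json",
})

with urllib.request.urlopen(f"{WIKI_API}?{params}") as response:
    results = json.load(response)

for page in results["query"]["search"]:
    print(page["title"])
```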
6. Hard-Wired Terminals
Direct connections among agencies establish secure information feeds with little infrastructure overhead.
- Data In/Data Out: One agency allows a trusted guest to tap into an internal database. Guest users access authorized sections of the database via a dedicated terminal.
- Who Uses It: The National Counterterrorism Center and others in intelligence, defense and law enforcement use this approach.
- Pros: It is simple to set up and effective when crossing departmental boundaries. It simplifies some of the complexities surrounding security, data formatting and technology integration.
- Cons: A separate terminal is required for each connection, which can result in a confusing array of hardware and difficulties when trying to aggregate related information from various feeds.
- Special Considerations: Because connections are hard-wired and discrete, this alternative meets the letter, but not entirely the spirit, of information sharing.
- Bottom Line: Because the computer infrastructure providing access to another party’s data systems remains entirely segregated, this form of sharing is not a good option for automated data handling or combining data from different sources.