Data behaving badly

Why data veracity is key to ensuring that business insights are reliable.

The private sector, especially consumer-facing organizations, is betting big on data-intensive technologies like artificial intelligence and the internet of things. The trend is accelerating worldwide, with private-sector investment in AI projected to reach $12.5 billion in 2017 alone and IoT investment expected to top $800 billion. Although slower to embrace AI and IoT, government is now pursuing them aggressively.

Through AI, government has the potential to augment human capabilities and transform the digital experience of employees and citizens alike. As exciting as the potential of these data-driven technologies is for helping agencies achieve their mission goals, they also raise new risks, including inaccurate and manipulated data. Success with data requires more than expenditures; it also requires a focus on the quality and accuracy of the data used to produce insights.

Organizations must establish the veracity, or accuracy, of data coming from multiple sources, such as IoT devices, to ensure that their business insights are reliable. Even the most advanced analytics and forecasting systems are only as good as the data they are given to crunch.

In the recent Accenture Technology Vision 2018 companion survey, 86 percent of federal executive respondents said that their organizations are increasingly using data to drive critical and automated decision-making, but they also agreed that many organizations have not invested in the capabilities to verify the truth within their data.

Agencies don't need to accept the risks of poor data veracity. Instead, they can address vulnerabilities and build citizen confidence by extending their cyber operations and data assurance technologies to demonstrate data trustworthiness. As we discuss in the Accenture Technology Vision for Federal Government 2018 report -- titled Technology Advances. Federal Impacts -- this approach should focus on three key areas, illustrated in the sketch that follows the list:

  • Provenance: verifying the history of data throughout its life cycle.
  • Context: considering the circumstances around its use.
  • Integrity: securing and maintaining data.
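
To make these three areas concrete, here is a minimal sketch in Python of what a data record annotated for veracity checks might look like. The field names and the use of a SHA-256 digest are illustrative assumptions, not part of the Accenture framework:

```python
import hashlib
import json
from dataclasses import dataclass, field
from datetime import datetime, timezone

@dataclass
class VeracityRecord:
    """Illustrative record annotated for provenance, context and integrity."""
    payload: dict                                  # the data itself
    source: str                                    # provenance: where it originated
    lineage: list = field(default_factory=list)    # provenance: processing history
    context: dict = field(default_factory=dict)    # context: circumstances of collection
    recorded_at: str = ""
    digest: str = ""                               # integrity: tamper-evidence hash

    def seal(self) -> None:
        """Record the timestamp and compute a SHA-256 digest over the payload."""
        self.recorded_at = datetime.now(timezone.utc).isoformat()
        canonical = json.dumps(self.payload, sort_keys=True)
        self.digest = hashlib.sha256(canonical.encode("utf-8")).hexdigest()

    def verify(self) -> bool:
        """Integrity check: recompute the digest and compare."""
        canonical = json.dumps(self.payload, sort_keys=True)
        return self.digest == hashlib.sha256(canonical.encode("utf-8")).hexdigest()

# Example: an IoT temperature reading carrying its provenance and context
reading = VeracityRecord(
    payload={"temperature_c": 71.4},
    source="sensor-17/boiler-room",
    context={"calibrated": "2018-01-15", "network": "scada-east"},
)
reading.seal()
assert reading.verify()   # any later tampering with the payload fails this check
```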

To begin this process, it's imperative that agencies form an internal team to establish, implement and maintain standards around these areas. The team should include data science and cybersecurity experts who can implement data integrity and security standards throughout the organization.

The job of this team is to "grade" the accuracy of an organization's data. Doing so requires an understanding of the behavior around that data, whether it's a person creating a data trail by searching for information about a new Medicare card or a sensor network reporting a temperature reading for an industrial system. Every piece of data originates with an associated behavior, and organizations must build the capacity to track that behavior as data is recorded, used and maintained.

With this knowledge, agencies can provide cybersecurity and risk management systems with a baseline of normal behavior, reducing the "noise" in the data so that anomalies stand out. This knowledge also allows them to implement new technologies efficiently.
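
As a simple illustration of that idea, the sketch below builds a baseline from historical sensor readings and flags values that fall outside it. The three-standard-deviation threshold is an assumption for illustration, not a recommendation from the report:

```python
from statistics import mean, stdev

def build_baseline(history):
    """Summarize normal behavior as a mean and standard deviation."""
    return mean(history), stdev(history)

def is_anomalous(value, baseline, n_sigmas=3.0):
    """Flag values more than n_sigmas standard deviations from the baseline mean."""
    mu, sigma = baseline
    return abs(value - mu) > n_sigmas * sigma

# Historical temperature readings from a well-behaved industrial sensor
history = [70.1, 70.4, 69.8, 70.2, 70.0, 70.3, 69.9, 70.1]
baseline = build_baseline(history)

print(is_anomalous(70.2, baseline))   # False: within normal behavior
print(is_anomalous(84.0, baseline))   # True: stands out against the baseline
```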

In the private sector, Google is already using machine learning to remove applications with overreaching permissions from its Play Store. For example, a flashlight app should only activate a smartphone's LED; if the app also requests access to a person's contacts, Google's system will flag the app for further review. Establishing these kinds of baselines for accepted permissions empowers organizations to identify and address abnormal patterns of behavior.
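
A hedged sketch of that kind of permission baseline: each app category is mapped to the permissions it plausibly needs, and anything requested beyond that set is flagged for review. The categories and permission names here are invented for illustration; Google's production system uses machine learning over far richer signals:

```python
# Illustrative permission baselines per app category (assumed, not Google's actual lists)
EXPECTED_PERMISSIONS = {
    "flashlight": {"CAMERA_FLASH"},
    "messaging": {"CONTACTS", "SMS", "NOTIFICATIONS"},
}

def flag_overreach(category, requested_permissions):
    """Return the permissions an app requests beyond its category's baseline."""
    baseline = EXPECTED_PERMISSIONS.get(category, set())
    return set(requested_permissions) - baseline

# A flashlight app asking for contacts gets flagged for further review
print(flag_overreach("flashlight", {"CAMERA_FLASH", "CONTACTS"}))  # {'CONTACTS'}
```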

The same is true for government agencies experimenting with open-source information. The State Department currently has a project underway that aims to determine whether propaganda bots are spreading false information to influence the public. At the 2018 SXSW conference, Presidential Innovation Fellow Amy Wilson and Shawn Powers, executive director of the Advisory Commission on Public Diplomacy, discussed their plans to authenticate bot announcements. One such approach is to flag data that deviates from or conflicts with a known context.
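
One way to picture that approach, purely as an assumption about how such a check might work: compare each announcement's claimed origin against a registry of known, verified sources, and flag anything that conflicts with that context. The registry entries and feed contents below are hypothetical:

```python
# Hypothetical registry of verified accounts for official announcements
KNOWN_SOURCES = {"StateDept", "USEmbassyLondon"}

def flag_for_review(announcements):
    """Flag announcements whose claimed source conflicts with the known context."""
    return [a for a in announcements if a["source"] not in KNOWN_SOURCES]

feed = [
    {"source": "StateDept", "text": "Travel advisory updated."},
    {"source": "StateDept_Official_News", "text": "Borders closing tomorrow!"},
]
print(flag_for_review(feed))  # the look-alike account is flagged, not trusted
```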

The Air Force is already fusing text, video and virtually every other potential source of data or information through a program called Data to Decision. The program would give commanders and warfighters a tool for making better decisions based on data. AI's job would be to establish "a complete cycle of understanding, from predicting what we expect to encounter, prescribe what can be done to help, understand the environment, then find, fix, track, target, engage, assess, anything, anytime, anywhere in any domain," explained Mark Tapper, special advisor to the deputy chief of staff for Air Force intelligence, in a recent interview with Defense One.

In this example, developing actionable insights from multiple data sources requires reaching consensus, which can be achieved by averaging the data points together. Contradictory indicators that don't fit the narrative, however, should not be ignored: they express greater uncertainty about the accuracy of the data and signal the need for human intervention to work out the discrepancies.
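
A minimal sketch of that consensus step, with the disagreement among sources carried along as an uncertainty measure. The "escalate to a human above this spread" rule is an assumed policy for illustration:

```python
from statistics import mean, stdev

def fuse(readings, max_spread=2.0):
    """Average multiple indicators into a consensus value.

    The standard deviation expresses how much the sources disagree; above
    max_spread (an assumed threshold), the fusion is routed to a human
    analyst rather than silently averaging away the contradiction.
    """
    consensus, spread = mean(readings), stdev(readings)
    needs_review = spread > max_spread
    return consensus, spread, needs_review

print(fuse([10.1, 10.3, 9.8, 10.0]))   # sources agree: low spread, no review
print(fuse([10.1, 10.3, 9.8, 27.5]))   # a contradictory indicator: flag for humans
```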

Data is the lifeblood of the federal government, and the decisions that flow from data will ultimately shape the future of agencies, increase the efficiency of services and secure citizen trust in the government's ability to use new technologies like AI. Indeed, the organizations with the best people and data available to train an AI application how to do its job will create the most capable AI systems. Ensuring the integrity of data must therefore be as much of a priority as implementing the technology itself.