White House probes the collision of AI and personal data

Thapana Onphalai/Getty Images

Policymakers are examining how the advent of artificial intelligence could create new privacy risks around federal agencies' purchase and use of personal information from data brokers.

The Biden administration is mulling new policies for federal agencies that buy data on Americans from commercial data brokers, citing heightened risks posed by the growing use of artificial intelligence in government IT systems.

In a request for information scheduled to be published Wednesday, the administration lays out potential privacy risks stemming from the use of commercially available information, or CAI, by federal agencies.

Agency purchase of sensitive data about people via third parties like data brokers “raises privacy concerns stemming from a lack of transparency,” Richard Revesz, the head of the Office of Information and Regulatory Affairs at the Office of Management and Budget, wrote in a blog post about the RFI.

Of particular interest are those risks that are supercharged by artificial intelligence.

AI can fuel inferences about people based on the massive amounts of data brokers hold about them, including things like their political preference or sexuality, said Calli Schroeder, senior counsel and global privacy counsel at the Electronic Privacy Information Center.

Updating the guidance, Schroeder said, “not only would acknowledge that there is quite a bit of use of data broker supplied information that includes a huge amount of PII, but also ... that there do have to be some protections and precautions in how government is using this information.”

The RFI also covers information sold or licensed about individuals’ devices and locations, an issue that made headlines when the Trump administration used cellphone location data for immigration and border enforcement.

Although there are legal and policy frameworks for the government’s use of PII already, Revesz writes that “the privacy concerns associated with CAI containing PII raise questions about whether agencies need to take additional steps to apply the framework of privacy law and policy to mitigate the risks exacerbated by new technology.”

The RFI tracks with a directive in President Joe Biden’s AI-focused executive order that OMB evaluate what CAI agencies purchase and how they use that data, in order to “inform potential guidance” on privacy risks related to the use of data brokers.

Among the questions OMB wants answered are what changes to current guidance might be necessary and what policies or procedures agencies should have to follow.

There are also questions about how agencies share information on the use of this data with the public and whether agencies should have certain provisions around data quality in their agreements with third-party providers of this type of data. 

National security uses of CAI, however, are out of scope for the RFI, OMB notes.

Still, the collection of personal data was a lightning rod for controversy years before the advent of generative AI tools, both in the intelligence community and in federal civilian agencies with cybersecurity or law enforcement mandates that often fall into the national security realm.

A government report released last year said that the IC frequently buys troves of Americans’ data with few checks and balances, and that use of such information without oversight presents a privacy threat. Some of those purchases have included social media data, it said at the time.

Personal information that’s hoovered up on digital marketplaces like social media platforms is packaged by data brokers, and U.S. spy agencies are among their customers. The dynamic has put the intelligence community on thin ice with both lawmakers and privacy advocates who call it an end-run around the Fourth Amendment, which bars unreasonable searches and seizures.

The National Security Agency, for instance, purchases certain types of Americans’ web browsing data from data brokers without a warrant, privacy hawk Sen. Ron Wyden, D-Ore., revealed in January. He alleged NSA is in violation of a Federal Trade Commission order that bars data brokers from selling individuals’ geolocation data without first obtaining consent from consumers.

Privacy-centric lawmakers on both sides of the political aisle pushed to inject sweeping reform measures into a powerful foreign surveillance authority when it was reauthorized in April. Those measures included an amendment that would require a warrant to sift through collected communications data that includes discussions with U.S. persons, though the effort was ultimately unsuccessful.

The spying authority, backed by Section 702 of the Foreign Intelligence Surveillance Act, allows agencies like the FBI and NSA to warrantlessly target foreigners abroad by ordering U.S. internet and telecom providers to hand over reams of communications data, like emails or text messages, on those foreign targets for use in national security investigations. 

The law is controversial largely because it permits collection of both sides of a conversation, even when a foreign target is communicating with an American on the other end.

In May, the Office of the Director of National Intelligence released a framework that aims to guide spy agencies on best practices for ethically using the commercial data that analysts frequently leverage in their day-to-day work, declaring they must have procedures in place to safeguard collected data that can easily identify Americans.

Some civilian agencies also rely on data from outside sources.

Agencies often rely on credit bureaus, for example, to verify that someone is who they say they are online, despite the fact that the government already houses a lot of authoritative information on Americans, such as Social Security numbers.

Some anti-fraud and oversight experts have bemoaned the difficulty agencies face in sharing data among themselves under existing statutes governing government data sharing.

Still, a recent government report suggested that federal agencies could use their own information more to decrease reliance on “incomplete commercial data.”