Researchers: It Is Trivially Easy to Match Metadata to Real People
The telephony metadata the NSA collects does not include customer names, but they are easy to figure out.
In defending the NSA's telephony metadata collection efforts, government officials have repeatedly resorted to one seemingly significant detail: This is just metadata—numbers dialed, lengths of calls. "There are no names, there’s no content in that database," President Barack Obama told Charlie Rose in June.
No names; just metadata.
New research from Stanford demonstrates the silliness of that distinction. Armed with very sparse metadata, Jonathan Mayer and Patrick Mutchler found it easy—trivially so—to figure out the identity of a caller.
Mayer and Mutchler are running an experiment in which volunteers agree to use an Android app, MetaPhone, that gives the researchers access to their phone metadata. Using that data, Mayer and Mutchler say it was hardly any trouble at all to figure out whom the phone numbers belonged to, and they did it in just a few hours. They write:
We randomly sampled 5,000 numbers from our crowdsourced MetaPhone data set and queried the Yelp, Google Places, and Facebook directories. With little marginal effort and just those three sources—all free and public—we matched 1,356 (27.1%) of the numbers. Specifically, there were 378 hits (7.6%) on Yelp, 684 (13.7%) on Google Places, and 618 (12.3%) on Facebook.
What about if an organization were willing to put in some manpower? To conservatively approximate human analysis, we randomly sampled 100 numbers from our dataset, then ran Google searches on each. In under an hour, we were able to associate an individual or a business with 60 of the 100 numbers. When we added in our three initial sources, we were up to 73.
How about if money were no object? We don’t have the budget or credentials to access a premium data aggregator, so we ran our 100 numbers with Intelius, a cheap consumer-oriented service. 74 matched. Between Intelius, Google search, and our three initial sources, we associated a name with 91 of the 100 numbers.
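The arithmetic behind the first set of figures can be checked in a few lines. What follows is a minimal sketch in Python that uses only the counts quoted above; the variable and function names are illustrative and do not come from the researchers' own work.

```python
# Counts quoted from the researchers' write-up; all names here are illustrative.
SAMPLE_SIZE = 5000
hits = {"Yelp": 378, "Google Places": 684, "Facebook": 618}
matched_numbers = 1356  # distinct numbers matched by at least one source


def share(count, total):
    """Return count as a fraction of total."""
    return count / total


for source, count in hits.items():
    print(f"{source}: {share(count, SAMPLE_SIZE):.2%}")

total_hits = sum(hits.values())
# Hits beyond the number of distinct matches must be repeats:
# numbers that turned up in more than one directory.
repeat_hits = total_hits - matched_numbers

print(f"Any source: {share(matched_numbers, SAMPLE_SIZE):.2%}")
print(f"Total hits: {total_hits}, repeat hits: {repeat_hits}")
```

The per-source hits add up to 1,680, which is 324 more than the 1,356 distinct matches, so a fair number of phone numbers appeared in more than one directory. Computed this way, the Facebook share comes to 12.36 percent, a hair above the quoted 12.3 percent; the other figures round to the values given.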
Their results weren't perfect (and they note that the Intelius data was particularly spotty), but they didn't even try all that hard. "If a few academic researchers can get this far this quickly, it’s difficult to believe the NSA would have any trouble identifying the overwhelming majority of American phone numbers," they conclude.
It's also difficult to believe they wouldn't try. As federal district judge Richard Leon wrote in his decision last week, "There is also nothing stopping the Government from skipping the [National Security Letter] step altogether and using public databases or any of its other vast resources to match phone numbers with subscribers."