The Art (Not Science) of Deepfakes
Modern machine-learning technology brings deepfakes within reach of anyone.
Fake news is so 2018. The new, real news is more worrisome: videos, created by almost anyone, that manipulate the truth in very convincing ways. Sen. Mark Warner, the top Democrat on the Senate Intelligence Committee, recently acknowledged that the intelligence community is “extremely concerned” about the rise of deepfake technology.
Deepfakes are videos or images that substitute someone else’s face for the one in the original media. Their use in influencing voters is top of mind as 2020 approaches. While politicians often do a great job of getting themselves into trouble without the help of deepfake trolls, it isn’t hard to imagine deepfake videos being used to inflict additional damage on a candidate’s chances at a pivotal moment in a campaign.
But the impact of deepfakes extends well beyond elections. They could be used as a form of personal revenge or assault against women (and men, too). Jennifer Lawrence and other celebrities have already spoken out against deepfakes that placed their faces on the bodies of porn stars. Nation-states could be tricked into making military, political, and judicial decisions based on fake videos. For now, this risk is most acute for developing countries whose intelligence services might lack the technical acumen to discern deepfakes, but as the art evolves, such videos could dupe highly capable intelligence services too.
If these examples are concerning, keep in mind that they’re only the beginning. Deepfakes are easy to create. Modern machine-learning technology brings deepfakes within reach of anyone with a reasonably powerful graphics card and off-the-GitHub software. The GitHub repository for Faceswap, the code that makes deepfakes possible, describes it as “a tool that utilizes deep learning to recognize and swap faces in pictures and videos.” All that’s required to produce a video that convincingly swaps one face for another is a reasonable number of images of each face.
About Machine Learning and Deepfakes
The branch of machine learning called deep learning employs artificial neural networks. We say “deep” because of the number of layers of artificial neurons through which data is processed, transformed at each layer into increasingly abstract form: pixels into edges, edges into shapes, shapes into complex objects, for example—you guessed it—faces.
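As a toy illustration of that layering (a minimal sketch of my own in PyTorch, with arbitrary layer sizes; this is not drawn from Faceswap), a small stack of convolutional layers turns raw pixels into progressively smaller, more abstract feature maps:

```python
# Minimal sketch of "deep" layering: each layer transforms its input
# into a more abstract representation (pixels -> edges -> shapes -> parts).
import torch
import torch.nn as nn

layers = nn.Sequential(
    nn.Conv2d(3, 16, kernel_size=3, stride=2, padding=1),   # pixels -> edge-like features
    nn.ReLU(),
    nn.Conv2d(16, 32, kernel_size=3, stride=2, padding=1),  # edges -> simple shapes
    nn.ReLU(),
    nn.Conv2d(32, 64, kernel_size=3, stride=2, padding=1),  # shapes -> face-like parts
    nn.ReLU(),
)

image = torch.randn(1, 3, 64, 64)  # one synthetic 64x64 RGB image
features = layers(image)
print(features.shape)  # torch.Size([1, 64, 8, 8]): smaller, more abstract
```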
Digging deeper, Faceswap uses a novel configuration of autoencoder neural networks. A single encoder takes in imagery of both subjects. The encoder’s output is then used to train two decoder networks, each to reproduce accurate images of one subject. Once the decoders are trained, we play a trick: pass images of one person to the encoder, but connect the encoder’s output to the other person’s decoder. None of this work requires expertise in machine learning or graphics editing.
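In code, the arrangement might look like the following sketch, a simplified illustration of mine in PyTorch with tiny, hypothetical layers; Faceswap’s actual networks are far larger and more elaborate:

```python
# Sketch of the shared-encoder / two-decoder trick behind face swapping.
# Simplified, hypothetical modules -- not Faceswap's real architecture.
import torch
import torch.nn as nn

def make_encoder():
    return nn.Sequential(
        nn.Conv2d(3, 32, 3, stride=2, padding=1), nn.ReLU(),
        nn.Conv2d(32, 64, 3, stride=2, padding=1), nn.ReLU(),
    )

def make_decoder():
    return nn.Sequential(
        nn.ConvTranspose2d(64, 32, 4, stride=2, padding=1), nn.ReLU(),
        nn.ConvTranspose2d(32, 3, 4, stride=2, padding=1), nn.Sigmoid(),
    )

encoder = make_encoder()      # one encoder, shared by both subjects
decoder_a = make_decoder()    # trained to reconstruct person A
decoder_b = make_decoder()    # trained to reconstruct person B

# Training (not shown) minimizes reconstruction loss for each pairing:
#   decoder_a(encoder(images_of_a)) should match images_of_a
#   decoder_b(encoder(images_of_b)) should match images_of_b

# The trick: encode a face of person A, then decode with B's decoder.
face_of_a = torch.rand(1, 3, 64, 64)
swapped = decoder_b(encoder(face_of_a))  # B's likeness in A's pose
print(swapped.shape)  # torch.Size([1, 3, 64, 64])
```

Because the single encoder must serve both decoders, it learns a shared representation of pose, lighting, and expression; each decoder supplies one person’s identity, which is what makes the swap work.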
What’s worrying is not the current state of deepfakes but the trajectory: neural network design is an art form, not a science, because no one understands exactly how neural networks do what they do. Fakes will continue to improve as better neural network art emerges, which will in turn enable amateur artists (let’s call them what they are: machine-learning-enabled trolls) across the broad landscape of the internet to produce ever more convincing but false video and images.
Government and Industry Efforts Underway
A variety of technical approaches are under development to better detect deepfakes. One approach analyzes the blink rate of subjects in videos: because training sets rarely include images of people with their eyes closed, early deepfakes tend to blink far less often than real people do.
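As a hedged sketch of how such a detector might work, the eye-aspect-ratio technique from the facial-landmark literature computes, for each frame, a ratio that is large when the eye is open and near zero when it is closed, then counts how often that ratio dips below a threshold. The landmark detector itself is assumed here, not shown:

```python
# Hedged sketch of blink-rate analysis. Assumes a landmark detector
# (not shown) already yields six (x, y) points per eye per frame.
import numpy as np

def eye_aspect_ratio(pts):
    """pts: array of shape (6, 2), landmarks ordered around the eye.
    The ratio is large when the eye is open, near zero when closed."""
    p1, p2, p3, p4, p5, p6 = pts
    vertical = np.linalg.norm(p2 - p6) + np.linalg.norm(p3 - p5)
    horizontal = np.linalg.norm(p1 - p4)
    return vertical / (2.0 * horizontal)

def count_blinks(ear_per_frame, threshold=0.2):
    """Count downward threshold crossings: each one is a blink."""
    blinks, closed = 0, False
    for ear in ear_per_frame:
        if ear < threshold and not closed:
            blinks += 1
            closed = True
        elif ear >= threshold:
            closed = False
    return blinks

# A real subject blinks roughly 15-20 times per minute; a suspiciously
# low count over a long video is one (weak) signal of a deepfake.
ears = [0.3] * 100 + [0.05] * 3 + [0.3] * 100  # synthetic: one blink
print(count_blinks(ears))  # 1
```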
The Defense Advanced Research Projects Agency is also working on the problem, partnering with several of the country’s leading research institutions through the DARPA Media Forensics program to detect tiny but recognizable discrepancies between the audio and visual tracks of deepfakes.
Another promising approach relies on the provenance of imagery. Many cameras, and much image- and video-editing software, can insert tamper-proof metadata that describes where an image came from, when it was created, and how it was derived. A bit more work could use this capability to trace the provenance of imagery composed from original images, even across multiple generations of derivation. Browser plug-ins could then read this metadata and offer each user evidence with which to decide on believability.
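To illustrate the underlying idea, here is a sketch of the concept using Ed25519 signatures from Python’s cryptography library; this is an illustration of my own, not any deployed camera or browser standard:

```python
# Sketch of provenance metadata: sign an image together with a record of
# where it came from, so later software can verify neither was altered.
# Illustrative only; not a real camera or browser-plug-in protocol.
import json
from cryptography.hazmat.primitives.asymmetric import ed25519
from cryptography.exceptions import InvalidSignature

camera_key = ed25519.Ed25519PrivateKey.generate()  # held by the camera maker

image_bytes = b"...raw pixels..."                  # stand-in for a real image
metadata = {"device": "camera-1234", "taken": "2019-06-01T12:00:00Z",
            "derived_from": None}

# The camera signs the image and its metadata together at capture time.
payload = image_bytes + json.dumps(metadata, sort_keys=True).encode()
signature = camera_key.sign(payload)

# A browser plug-in holding the maker's public key can verify provenance.
public_key = camera_key.public_key()
try:
    public_key.verify(signature, payload)
    print("provenance verified")
except InvalidSignature:
    print("image or metadata was altered")
```

In such a scheme, editing software that derives a new image would append its own metadata record and signature, chaining the derivation history across generations.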
To raise the question of whether the real is fake, or the fake real, is to threaten the fundamental trust a large society depends on, simply because we cannot verify everything face to face. But whose job is it to combat deepfakes and elevate fact? The executive branch and Congress will certainly need to play a role, but much like internet-of-things security, ultimate oversight is a muddled picture with no clear accountability. That picture grows even more muddled when anyone can, with no training and little effort, swap the faces in it at will.
Dr. David Archer is a principal researcher in privacy and cryptography at Galois.