The AI That Has Nothing to Learn From Humans
DeepMind’s new self-taught Go-playing program is making moves that other players describe as “alien” and “from an alternate dimension.”
It was a tense summer day in 1835 Japan. The country’s reigning Go player, Honinbo Jowa, took his seat across a board from a 25-year-old prodigy by the name of Akaboshi Intetsu. Both men had spent their lives mastering the two-player strategy game that’s long been popular in East Asia. Their face-off, that day, was high-stakes: Honinbo and Akaboshi represented two Go houses fighting for power, and the rivalry between the two camps had lately exploded into accusations of foul play.
Little did they know that the match—now remembered by Go historians as the “blood-vomiting game”—would last for several grueling days. Or that it would lead to a grisly end.
Early on, the young Akaboshi took the lead. But then, according to lore, “ghosts” appeared and showed Honinbo three crucial moves. His comeback was so overwhelming that, as the story goes, his junior opponent keeled over and began coughing up blood. Weeks later, Akaboshi was found dead. Historians have speculated that he might have had an undiagnosed respiratory disease.
It makes a certain kind of sense that the game’s connoisseurs might have wondered if they’d seen glimpses of the occult in those three so-called ghost moves. Unlike tic-tac-toe, which is simple enough that optimal play is always clear-cut, Go is so complex that new, unfamiliar strategies can feel astonishing, revolutionary, or even uncanny.
Unfortunately for ghosts, now it’s computers that are revealing these goosebump-inducing moves.
As many will remember, AlphaGo—a program that used machine learning to master Go—decimated world champion Ke Jie earlier this year. Then, the program’s creators at Google’s DeepMind let the program continue to train by playing millions of games against itself. In a paper published in Nature earlier this week, DeepMind revealed that a new version of AlphaGo (which they christened AlphaGo Zero) picked up Go from scratch, without studying any human games at all. AlphaGo Zero took a mere three days to reach the point where it was pitted against an older version of itself and won 100 games to zero.
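To make “training by playing millions of games against itself” concrete, here’s a toy sketch in Python: tabular self-play learning on tic-tac-toe, the simple game mentioned earlier. To be clear, this is not DeepMind’s method; AlphaGo Zero pairs a deep neural network with Monte Carlo tree search, and every name below is invented for illustration. The sketch only shows the bare principle the Nature paper describes: start with nothing but the rules, play against yourself, and let the outcomes adjust your judgment of positions.

```python
import random
from collections import defaultdict

# Toy self-play learner for tic-tac-toe (illustrative only; AlphaGo
# Zero uses a deep network plus Monte Carlo tree search, not a table).
LINES = [(0,1,2),(3,4,5),(6,7,8),(0,3,6),(1,4,7),(2,5,8),(0,4,8),(2,4,6)]

def winner(b):
    for i, j, k in LINES:
        if b[i] != "." and b[i] == b[j] == b[k]:
            return b[i]
    return None

V = defaultdict(float)  # learned value of each position, from X's point of view

def self_play_game(epsilon=0.2, alpha=0.1):
    board, player = "." * 9, "X"
    visited = [board]
    while winner(board) is None and "." in board:
        moves = [i for i in range(9) if board[i] == "."]
        if random.random() < epsilon:      # occasionally explore a random move
            m = random.choice(moves)
        else:                              # otherwise trust the learned values
            best = max if player == "X" else min
            m = best(moves, key=lambda i: V[board[:i] + player + board[i+1:]])
        board = board[:m] + player + board[m+1:]
        visited.append(board)
        player = "O" if player == "X" else "X"
    outcome = {"X": 1.0, "O": -1.0, None: 0.0}[winner(board)]
    for b in visited:                      # nudge every visited position
        V[b] += alpha * (outcome - V[b])   # toward the game's final result

for _ in range(20000):  # the entire curriculum: the program itself
    self_play_game()
```

After enough games, the value table steers both “players” toward competent tic-tac-toe without ever consulting a human game; scaled up through a deep network and vastly more compute, that is the spirit of what AlphaGo Zero did in three days.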
Now that AlphaGo’s arguably got nothing left to learn from humans—now that its continued progress takes the form of endless training games against itself—what do its tactics look like, in the eyes of experienced human players? We might have some early glimpses into an answer.
AlphaGo Zero’s latest games haven’t been disclosed yet. But several months ago, the company publicly released 55 games that an older version of AlphaGo played against itself. (Note that this is the incarnation of AlphaGo that had already made quick work of the world’s champions.) DeepMind called its offering a “special gift to fans of Go around the world.”
Since May, experts have been painstakingly analyzing the 55 machine-versus-machine games. And their descriptions of AlphaGo’s moves keep circling back to the same few words: Amazing. Strange. Alien.
“They’re how I imagine games from far in the future,” Shi Yue, a top Go player from China, has told the press. A Go enthusiast named Jonathan Hop who’s been reviewing the games on YouTube calls the AlphaGo-versus-AlphaGo face-offs “Go from an alternate dimension.” From all accounts, one gets the sense that an alien civilization has dropped a cryptic guidebook in our midst: a manual that’s brilliant—or at least, the parts of it we can understand.
Will Lockhart, a physics grad student and avid Go player who codirected The Surrounding Game (a documentary about the pastime’s history and devotees), tried to describe the difference between watching AlphaGo’s games against top human players, on the one hand, and its self-paired games, on the other. (I interviewed Will’s Go-playing brother Ben about Asia’s intensive Go schools in 2016.) According to Will, AlphaGo’s moves against Ke Jie made it seem to be “inevitably marching toward victory,” while Ke seemed to be “punching a brick wall.” Any time the Chinese player had perhaps found a way forward, said Lockhart, “10 moves later AlphaGo had resolved it in such a simple way, and it was like, ‘Poof, well that didn’t lead anywhere!’”
By contrast, AlphaGo’s self-paired games might have seemed more frenetic. More complex. Lockhart compares them to “people sword-fighting on a tightrope.”
Expert players are also noticing AlphaGo’s idiosyncrasies. Lockhart and others mention that it fights various battles across the board almost simultaneously, adopting an approach that might seem a bit madcap to human players, who’d probably spend more energy focusing on smaller areas of the board at a time. According to Michael Redmond, the highest-ranked Go player from the Western world (he relocated to Japan at the age of 14 to study Go), humans have accumulated knowledge that tends to be most useful on the sides and corners of the board. AlphaGo “has less of that bias,” he noted, “so it can make impressive moves in the center that are harder for us to grasp.”
Also, it’s been making unorthodox opening moves. Some of those gambits, just two years ago, might have seemed ill-conceived to experts. But now pro players are copying some of these unfamiliar tactics in tournaments, even if no one fully understands how they lead to victory. For example, people have noticed that some versions of AlphaGo seem to like playing what’s called a three-three invasion under a star point, and they’re experimenting with that move in tournaments now too. These experiments haven’t yet led to clearly consistent victories, maybe because human players don’t understand how best to follow through.
Some moves AlphaGo likes to make against its clone are downright incomprehensible, even to the world’s best players. (These tend to happen early on in the games—probably because that phase is already mysterious, being farthest away from any final game outcome.) One opening move in Game One has many players stumped. Says Redmond, “I think a natural reaction (and the reaction I’m mostly seeing) is that they just sort of give up, and sort of throw their hands up in the opening. Because it’s so hard to try to attach a story about what AlphaGo is doing. You have to be ready to deny a lot of the things that we’ve believed and that have worked for us.”
Like others, Redmond notes that the games somehow feel “alien.” “There’s some inhuman element in the way AlphaGo plays,” he says, “which makes it very difficult for us to just even sort of get into the game.”
Still, Redmond thinks there are moments when AlphaGo (at least its older version) might not necessarily be enigmatically, transcendently good. Moments when it might possibly be making mistakes, even. There are patterns of play called joseki—series of locally confined attacks and responses, in which players essentially battle to a standstill until it only makes sense for them to move to another part of the board. Some of these joseki have been analyzed and memorized and honed over generations. Redmond suspects that people may still be better at responding in a few of these patterns, because people have analyzed them so intensely. (It’s hard to tell though, because in the AlphaGo-versus-AlphaGo games, both “copies” of the program seem to avoid getting into these joseki in the first place.)
It’s not far-fetched that AlphaGo may still be choosing suboptimal moves—making “mistakes,” if you will. You can see Go as a massive tree whose branches represent possible moves and countermoves. Over generations, Go players have identified certain clusters of branches that seem to work really well. And now that AlphaGo’s come along, it’s finding even better options. Still, huge swaths of the tree might yet be unexplored. As Lockhart put it, “It could be possible that a perfect God plays [AlphaGo] and crushes it. Or maybe not. Maybe it’s already there. We don’t know.”
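For a sense of just how massive, here’s a standard back-of-the-envelope estimate (the figures are commonly cited averages for Go, not taken from anyone quoted here): with roughly 250 legal moves per position and games running around 150 moves, the naive game tree has about 250^150 leaves.

```python
import math

# Rough size of Go's naive game tree, using commonly cited averages:
# ~250 legal moves per position, ~150 moves per game.
branching, depth = 250, 150
print(f"about 10^{depth * math.log10(branching):.0f} possible games")  # ~10^360
```

That works out to around 10^360 possible games, which is why no player, human or machine, can hope to explore the whole tree.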
* * *
From his home base in Chiba, Japan, Redmond says, he has been studying AlphaGo’s self-paired games more or less nonstop for the past four months. He’s been videotaping his commentaries on each game and putting out one video per week on the American Go Association’s YouTube channel. One of his biggest challenges in these videos, he says, is to “attach stories” to AlphaGo’s moves.
“Generally the way humans learn Go is that we have a story,” he points out. “That’s the way we communicate. It’s a very human thing.”
After all, people can identify and discuss shapes and patterns. Or we can argue with each other about the reasons a killer move won the game. Take a basic example: When teaching beginners, a Go instructor might point out an odd-looking formation of stones resembling a lion’s mouth or a tortoiseshell (among other patterns) and discuss how best to play in these situations. In theory, AlphaGo could have something akin to that knowledge: A portion of its neural network might hypothetically be “sounding an alarm,” so to speak, whenever that lion’s-mouth pattern appears on the board. But even if that were the case, AlphaGo isn’t equipped to turn this sort of knowledge into any kind of a shareable story. So far, that task is one that still falls to people.
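To see the gap between detecting a pattern and telling a story about it, consider a hypothetical sketch: a hand-written routine that scans a board for one fixed local motif. A network like AlphaGo’s learns detectors of this general flavor implicitly, as convolutional features, but where this code carries a readable name a teacher could point to, the network’s version comes with no label and no explanation. The motif and all names below are invented for illustration, not taken from any real Go shape or from DeepMind’s code.

```python
# Hypothetical pattern detector ("B" = black stone, "." = empty).
# The motif is made up; the real lion's-mouth shape is more involved.
MOTIF = (("B", ".", "B"),
         ("B", ".", "B"))

def find_pattern(board, pattern):
    """Yield (row, col) wherever `pattern` appears on `board`."""
    ph, pw = len(pattern), len(pattern[0])
    for r in range(len(board) - ph + 1):
        for c in range(len(board[0]) - pw + 1):
            if all(board[r + i][c + j] == pattern[i][j]
                   for i in range(ph) for j in range(pw)):
                yield (r, c)

# A 5x5 corner of a board, with the motif placed at row 1, column 1:
corner = [".....",
          ".B.B.",
          ".B.B.",
          ".....",
          "....."]
print(list(find_pattern(corner, MOTIF)))  # -> [(1, 1)]
```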