AlphaZero

From Wikipedia Quality
Jump to: navigation, search

AlphaZero is a computer program developed by the Alphabet-owned AI research company DeepMind, which uses an approach similar to AlphaGo Zero's to master not just Go, but also chess and shogi. On December 5, 2017 the DeepMind team released a preprint introducing AlphaZero, which, within 24 hours, achieved a superhuman level of play in these three games by defeating world-champion programs, Stockfish, elmo, and the 3-day version of AlphaGo Zero, in each case making use of custom tensor processing units (TPUs) that the Google programs were optimized to make use of. AlphaZero was trained solely via "self-play" using 5,000 first-generation TPUs to generate the games and 64 second-generation TPUs to train the neural networks, all in parallel, with no access to opening books or endgame tables. After just four hours of training, DeepMind estimated AlphaZero was playing at a higher Elo rating than Stockfish 8; after 9 hours of training, the algorithm decisively defeated Stockfish 8 in a time-controlled 100-game tournament (28 wins, 0 losses, and 72 draws). The trained algorithm played on a single machine with four TPUs.