It is able to do this by using a novel form of reinforcement learning, in which AlphaGo Zero becomes its own teacher. The system starts off with a neural network that knows nothing about the game of Go. It then plays games against itself, by combining this neural network with a powerful search algorithm. As it plays, the neural network is tuned and updated to predict moves, as well as the eventual winner of the games.
This updated neural network is then recombined with the search algorithm to create a new, stronger version of AlphaGo Zero, and the process begins again. In each iteration, the performance of the system improves by a small amount, and the quality of the self-play games increases, leading to more and more accurate neural networks and ever stronger versions of AlphaGo Zero.
After just three days of self-play training, AlphaGo Zero emphatically defeated the previously published version of AlphaGo – which had itself defeated 18-time world champion Lee Sedol – by 100 games to 0. After 40 days of self training, AlphaGo Zero became even stronger, outperforming the version of AlphaGo known as “Master”, which has defeated the world’s best players and world number one Ke Jie.
The new AlphaGo Zero is not impressive just because it uses no data set to become the world leader at what it does. It’s impressive also because it achieves the goal at a pace no human will ever be able to match.