We’ve evolved a population of 1050 agents of different anatomies (Ant, Bug, Spider), policies (MLP, LSTM), and adaptation strategies (PPO-tracking, RL^2, meta-updates) for 10 epochs. Initially, we had an equal number of agents of each type. Every epoch, we randomly matched 1000 pairs of agents and made them compete and adapt in multi-round games against each other. The agents that lost disappeared from the population, while the winners replicated themselves.
Summary: After a few epochs of evolution, Spiders, being the weakest, disappeared, the subpopulation of Bugs more than doubled, the Ants stayed the same. Importantly, the agents with meta-learned adaptation strategies end up dominating the population.
OpenAI has developed a “learning to learn” (or meta-learning) framework that allows an AI agent to continuously adapt to a dynamic environment, at least in certain conditions. The environment is dynamic for a number of reasons, including the fact that opponents are learning as well.
AI agents equipped with the meta-learning framework win more fights against their opponents and eventually dominate the environment. Be sure to watch the last video to see the effect.
The meta-learning framework gives the selected AI agents the capability to predict and anticipate the changes in the environment and adapt faster than the AI agents that only learn from direct experience.
We know that the neocortex is a prediction machine and that human intelligence amounts to the capability to anticipate and adapt. This research is a key step towards artificial general intelligence.