Researchers from DeepMind and the University of Toronto have announced DreamerV3, a reinforcement learning (RL) algorithm that trains AI agents across many different domains. Using a single set of hyperparameters, DreamerV3 outperforms other methods on several benchmarks and can train an agent to collect diamonds in Minecraft without human instruction.
Summary
General intelligence requires solving problems in many domains. Current reinforcement learning algorithms carry this potential but are held back by the resources and expertise needed to tune them for each new task. DreamerV3, a general and scalable algorithm based on world models, outperforms previous approaches across a wide range of domains with fixed hyperparameters. These domains include continuous and discrete actions, visual and low-dimensional inputs, 2D and 3D worlds, different data budgets, reward frequencies, and reward scales.
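Handling very different reward scales without retuning is one of the harder parts of using fixed hyperparameters. The DreamerV3 paper addresses this in part with a "symlog" transform, symlog(x) = sign(x) ln(|x| + 1), applied to prediction targets such as rewards, plus its inverse "symexp". The sketch below is a minimal plain-Python illustration of that formula, not code from the DreamerV3 repository; the function names are our own.

```python
import math


def symlog(x: float) -> float:
    """Symmetric log compression: sign(x) * ln(|x| + 1).

    Roughly linear near zero, logarithmic for large magnitudes,
    and it preserves the sign of the input.
    """
    return math.copysign(math.log1p(abs(x)), x)


def symexp(y: float) -> float:
    """Inverse of symlog: sign(y) * (exp(|y|) - 1)."""
    return math.copysign(math.expm1(abs(y)), y)


# Rewards spanning six orders of magnitude end up in a narrow range,
# so the same learning rate and loss scale can work for all of them.
for reward in (0.001, 1.0, 1000.0, -1000.0):
    print(f"{reward:>9} -> {symlog(reward):.4f}")
```

Because symlog is invertible, a network can predict values in the compressed space and the agent can recover the original scale with symexp.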
Minecraft
DreamerV3 is the first algorithm to collect diamonds in Minecraft without human demonstrations or hand-crafted curricula, a task that poses a major exploration challenge. The video shows the first diamond it collects, which happens after 30 million environment steps, or about 17 days of playtime.
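As a rough check on how those two figures relate, the snippet below converts environment steps into days of playtime. It assumes one environment step per Minecraft game tick at 20 ticks per second; that rate is our assumption and is not stated in the announcement.

```python
STEPS = 30_000_000           # environment steps until the first diamond
STEPS_PER_SECOND = 20        # assumption: one step per Minecraft tick (20 ticks/s)
SECONDS_PER_DAY = 24 * 60 * 60


def steps_to_days(steps: int, steps_per_second: float = STEPS_PER_SECOND) -> float:
    """Convert a number of environment steps into days of in-game time."""
    return steps / steps_per_second / SECONDS_PER_DAY


days = steps_to_days(STEPS)
print(f"{days:.1f} days")  # about 17 days under the 20-steps-per-second assumption
```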
Below are uncut videos of runs during which DreamerV3 collects diamonds. The algorithm succeeds under many starting conditions that require searching the world for trees, swimming across lakes, and traversing mountains.
Comparative tests
DreamerV3 masters a wide range of domains with a fixed set of hyperparameters, outperforming specialized methods. Removing the need for tuning reduces the expert knowledge and computational resources required to apply reinforcement learning.
Although the DreamerV3 source code has not yet been released, lead author Danijar Hafner says it “will be soon.” The code for the previous version, DreamerV2, is available on GitHub. Hafner notes that V3 includes “better replay buffers” and is implemented in JAX instead of TensorFlow.