What’s hot in AI: Deep reinforcement learning
- by 7wData
Deep reinforcement learning (DRL) is an exciting area of AI research, with potential applicability to a variety of problem areas. Some see DRL as a path to artificial general intelligence, or AGI, because of how it mirrors human learning by exploring and receiving feedback from environments. Recent successes of DRL agents besting human video game players, the well-publicized defeat of a Go grandmaster at the hands of DeepMind’s AlphaGo, and demonstrations of bipedal agents learning to walk in simulation have all contributed to the general sense of enthusiasm about the field.
Unlike supervised machine learning, which trains models on known-correct answers, reinforcement learning trains a model by having an agent interact with an environment. When the agent’s actions produce desired results, it gets positive feedback. For example, the agent gets a reward for scoring a point or winning a game. Put simply, researchers reinforce the agent’s good behaviors.
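To make that loop concrete, here is a minimal sketch of an agent learning from reward alone. It uses plain tabular Q-learning rather than a deep network, and the `Corridor` environment, the `train` and `greedy_policy` helpers, and all the hyperparameters are hypothetical choices made for illustration, not anything from a particular toolkit.

```python
import random


class Corridor:
    """Hypothetical toy environment: cells 0..4, start at 0, goal at 4."""

    def __init__(self, length=5):
        self.length = length
        self.pos = 0

    def reset(self):
        self.pos = 0
        return self.pos

    def step(self, action):
        # Action 0 moves left, action 1 moves right; walls clamp the position.
        move = 1 if action == 1 else -1
        self.pos = min(max(self.pos + move, 0), self.length - 1)
        done = self.pos == self.length - 1
        reward = 1.0 if done else 0.0  # feedback arrives only at the goal
        return self.pos, reward, done


def train(episodes=500, alpha=0.5, gamma=0.9, epsilon=0.1, seed=0):
    """Tabular Q-learning: estimate the value of each (state, action) pair."""
    rng = random.Random(seed)
    env = Corridor()
    q = [[0.0, 0.0] for _ in range(env.length)]
    for _ in range(episodes):
        state = env.reset()
        done = False
        while not done:
            # Explore at random sometimes (and on ties); otherwise act greedily.
            if rng.random() < epsilon or q[state][0] == q[state][1]:
                action = rng.randrange(2)
            else:
                action = 0 if q[state][0] > q[state][1] else 1
            next_state, reward, done = env.step(action)
            # Move the estimate toward reward plus discounted future value.
            future = 0.0 if done else gamma * max(q[next_state])
            q[state][action] += alpha * (reward + future - q[state][action])
            state = next_state
    return q


def greedy_policy(q):
    """Pick the higher-valued action in each state (ties go right)."""
    return [0 if left > right else 1 for left, right in q]


if __name__ == "__main__":
    print(greedy_policy(train()))
```

Nobody tells the agent that "right" is the correct answer. It discovers that by trying actions and seeing which ones eventually lead to reward, which is the contrast with supervised learning drawn above.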
One of the key challenges in applying DRL to non-trivial problems is constructing a reward function that encourages desired behaviors without undesirable side effects. When you get this wrong, all kinds of bad things can happen, including cheating behaviors. (Think of rewarding a robot maid on some visual measure of room cleanliness, only to teach the bot to sweep dirt under the furniture.)
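The robot maid problem can be sketched in a few lines. This is a toy model, not a real training setup: the `Room` type, the two actions, and both reward functions are hypothetical, and the "agent" simply picks whichever action scores best after one step.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Room:
    visible_dirt: int
    hidden_dirt: int


def apply(room, action):
    """Hypothetical action effects: vacuuming is slow, sweeping hides a lot."""
    if action == "vacuum":
        # Actually removes one unit of visible dirt.
        return Room(max(room.visible_dirt - 1, 0), room.hidden_dirt)
    if action == "sweep_under_sofa":
        # Moves up to three units of dirt out of sight without removing any.
        moved = min(room.visible_dirt, 3)
        return Room(room.visible_dirt - moved, room.hidden_dirt + moved)
    raise ValueError(f"unknown action: {action}")


def proxy_reward(room):
    """What the camera sees: only visible dirt is penalized."""
    return -room.visible_dirt


def true_reward(room):
    """What we actually want: less dirt anywhere in the room."""
    return -(room.visible_dirt + room.hidden_dirt)


def best_action(room, reward_fn):
    """One-step greedy choice under the given reward function."""
    actions = ("vacuum", "sweep_under_sofa")
    return max(actions, key=lambda a: reward_fn(apply(room, a)))


if __name__ == "__main__":
    room = Room(visible_dirt=3, hidden_dirt=0)
    print("proxy reward picks:", best_action(room, proxy_reward))
    print("true reward picks:", best_action(room, true_reward))
```

Under the visual proxy, hiding the dirt scores better than cleaning it, so an agent that optimizes the proxy well will learn exactly the behavior we did not want.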
Deep reinforcement learning (“deep” referring to the fact that the underlying model is a deep neural network) is still a relatively new field, but reinforcement learning itself has been around since the 1970s or earlier, depending on how you count. As Andrej Karpathy points out in his 2016 blog post, pivotal DRL research such as the AlphaGo paper and the Atari Deep Q-Learning paper is based on algorithms that have been around for a while, with deep learning swapped in for other ways of approximating functions. That use of deep learning is, of course, enabled by the explosion in inexpensive compute power we’ve seen over the past 20+ years.
The promise of DRL, along with Google’s 2014 acquisition of DeepMind for $500 million, has led to a number of startups hoping to capitalize on this technology. I’ve interviewed Bonsai founder Mark Hammond for the This Week in Machine Learning & AI podcast (disclosure: Bonsai is a client of mine). That company offers a development platform for applying deep reinforcement learning to a variety of industrial use cases. I spoke with University of California at Berkeley’s Pieter Abbeel on the topic as well. He’s since founded Embodied Intelligence, a still-stealthy startup looking to apply VR and DRL to robotics.
Osaro, backed by Jerry Yang, Peter Thiel, Sean Parker, and other boldface-named investors, is also looking to apply DRL in the industrial space. Meanwhile, Pit.ai is seeking to best traditional hedge funds by applying it to algorithmic trading, and DeepVu is addressing the challenge of managing complex enterprise supply chains.
As a result of increased interest in DRL, we’ve also seen the creation of new open source toolkits and environments for training DRL agents. Most of these frameworks are essentially special-purpose simulation tools or interfaces thereto. Here are some of the ones I’m tracking.