From the course: Cognitive Technologies: The Real Opportunities for Business

Reinforcement learning

- In this lecture, we're going to talk about reinforcement learning, and Eric Nyberg is going to help. Reinforcement learning is like learning by doing, or by trial and error. A bit more formally, reinforcement learning enables an agent to become proficient in an unknown environment, given only sensory input and occasional rewards or punishments: feedback indicating whether the actions it took were desirable or not. Learning to crawl has a lot in common with reinforcement learning. A baby tries various motions; successful forward motion is its own reward. Toppling over or getting stuck is the punishment. This way, the baby learns through these reinforcements, through trial and error. Google recently acquired a company called DeepMind that developed an impressive example of reinforcement learning. They created an agent that used reinforcement learning to automatically learn how to play video games at a high level of performance. The sensory input was the image of the game on the screen, and the rewards were just the points scored. By playing the game, the system was eventually able to outperform humans. Now here's a high-level description of how reinforcement learning works. At any point, there's a set of actions that the agent can possibly take. It tries one and notes whether it gained or lost points as a result. If a sequence of actions leads to gaining or losing points, then each step in the sequence is associated with a part of the gain or loss. This can add a lot of complexity to reinforcement learning algorithms. Eric, can you explain why?
- If there's a long sequence of actions that the agent has to select before it gets feedback on whether or not it's on the right track, it can take a long time to learn, whereas if it gets immediate feedback after every action it takes, it might be able to correct a wrong choice much more quickly.
- Got it.
So, over time, the agent learns which actions generated the best scores and begins to prefer those in order to earn the largest number of points. A key feature of reinforcement learning problems is that they are closed-loop problems: there are no inputs other than those caused by the actions that the agent takes. The agent is not told which actions to take, as it is in supervised learning. Instead, it has to discover which actions yield the most reward by trying them out. The applications of reinforcement learning are interesting. A common application is controlling mechanical systems, where it has the potential to eliminate hand coding of control strategies in applications like robotics, elevators, or helicopters. We've seen experimental examples where a robot arm used reinforcement learning to learn how to flip pancakes, and one where a robot was able to recover from damage by using reinforcement learning to discover new ways of walking. In some complex domains, it's the only feasible way to train a program to perform at a high level. Now Eric, where would reinforcement learning face challenges?
- Well, David, I think one thing to keep in mind is that reinforcement learning works great when the cost of the trial and error is fairly low, so it doesn't take very long to try something out and get feedback, and I'm not incurring a large cost in terms of dollars. But let's say, for example, I'm trying to learn how to trade on the stock market, and I'm doing that live with an agent that's learning. Every time I make a wrong decision, I might lose some money, and it might take a long time for the agent to learn how to trade, in which case I might have lost a lot of money to get there. So, I think reinforcement learning is a great choice when the time and the cost of trying something out and getting feedback are fairly small.
- And, in the future, where do you see reinforcement learning going?
- So, I think you pointed out the real applicability of reinforcement learning for physical control systems, things like robots. And I think there's been a lot of success in using reinforcement learning to train a single agent to pick the right strategy in certain contexts, but I think in the future, we're going to see reinforcement learning applied to groups of agents that are collaborating in order to have a kind of joint strategy for solving a problem. Maybe we have robots and humans that are working together to manage an accident site, for example, in nuclear crisis management. In a situation like that, you'd have to have agents that learn how to work together rather than working by themselves.
- Really interesting application. Thanks. So, just to sum up, reinforcement learning is just like learning by trial and error. The key applications are in the control of physical systems.
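
The high-level description in the lecture (try an action, note the points gained or lost, and give each step in a sequence part of the credit) can be illustrated with a small sketch. The lecture does not name a specific algorithm; the example below assumes tabular Q-learning, one common reinforcement learning method, on a made-up five-cell corridor where the only reward is one point for reaching the last cell. The environment, the function names, and the parameter values are all hypothetical and chosen only for illustration.

```python
import random

N_STATES = 5          # hypothetical corridor: cells 0..4, cell 4 is the goal
ACTIONS = (-1, +1)    # index 0 = step left, index 1 = step right
ALPHA, GAMMA, EPSILON = 0.5, 0.9, 0.2   # illustrative learning settings


def step(state, action_idx):
    """Move one cell (walls clamp the move); reward 1 only at the goal."""
    nxt = min(max(state + ACTIONS[action_idx], 0), N_STATES - 1)
    done = nxt == N_STATES - 1
    return nxt, (1.0 if done else 0.0), done


def choose(q_row, rng):
    """Epsilon-greedy: usually pick the best-known action, sometimes explore."""
    if rng.random() < EPSILON:
        return rng.randrange(len(ACTIONS))
    best = max(q_row)
    # Break ties at random so the untrained agent does not favor one side.
    return rng.choice([i for i, v in enumerate(q_row) if v == best])


def train(episodes=500, max_steps=100, seed=0):
    """Learn action values purely by trial and error."""
    rng = random.Random(seed)
    q = [[0.0, 0.0] for _ in range(N_STATES)]
    for _ in range(episodes):
        state = 0
        for _ in range(max_steps):
            a = choose(q[state], rng)
            nxt, reward, done = step(state, a)
            # Credit assignment: a step is worth its immediate reward plus
            # a discounted share of the best value reachable afterwards.
            target = reward + (0.0 if done else GAMMA * max(q[nxt]))
            q[state][a] += ALPHA * (target - q[state][a])
            state = nxt
            if done:
                break
    return q


def greedy_policy(q):
    """Preferred action in each non-goal cell after training."""
    return ["left" if row[0] > row[1] else "right" for row in q[:-1]]


if __name__ == "__main__":
    q = train()
    print(greedy_policy(q))
    print([round(max(row), 3) for row in q])
```

In this sketch the agent is never told that moving right is correct; it only sees the point awarded at the goal. The learned values shrink with distance from the goal because of the discount factor, which is one way of giving each step in a sequence a part of the eventual gain. It also hints at Eric's point about delayed feedback: the longer the corridor, the more episodes it takes for that single reward to influence the earliest steps.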
