What Is Reinforcement Learning?

By Motley Fool Staff – Updated Nov 21, 2024 at 10:16AM

Key Points

Reinforcement learning focuses on rewarding desired AI actions and punishing undesired ones.
Common RL algorithms include state-action-reward-state-action (SARSA), Q-learning, and deep Q-networks (DQN).
RL adapts to complex, unpredictable environments but is less useful for simple, clear-cut problems.

Key findings are powered by ChatGPT and based solely on the content from this article. Findings are reviewed by our editorial team. The author and editors take ultimate responsibility for the content.

Reinforcement learning is a branch of machine learning, itself a field of artificial intelligence, in which an agent learns by trial and error. The agent receives positive values (rewards) for achieving desired outcomes and negative values (penalties) that discourage undesired actions.

This article will discuss reinforcement learning, explain how it's different from other types of machine learning, and review common algorithms used in reinforcement learning. Finally, we'll explore the pros and cons of reinforcement learning and show how it applies to a vintage arcade video game.


Types of machine learning

Artificial intelligence (AI) agents can learn, but it takes time and effort. An important part of teaching AI agents involves machine learning; reinforcement learning is an important branch of machine learning that assigns positive or negative values to outcomes. The AI agent is programmed to maximize the total value it collects over time, and in doing so it learns the most effective way to reach a desired outcome.
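In most formal treatments, the value the agent maximizes is the cumulative reward over an episode, often discounted so that rewards arriving sooner count for more than rewards arriving later. The minimal Python sketch below computes that total; the discount factor `gamma` and the function name `discounted_return` are illustrative assumptions, not something specified in this article.

```python
def discounted_return(rewards, gamma=0.99):
    """Sum of rewards in which a reward k steps in the future is weighted by gamma**k."""
    total = 0.0
    # Walk backward so each step applies G_t = r_t + gamma * G_(t+1).
    for reward in reversed(rewards):
        total = reward + gamma * total
    return total


# A small reward now, nothing next step, and a large reward two steps later:
# 1 + 0.9 * 0 + 0.81 * 10 = 9.1 (up to floating-point rounding).
print(discounted_return([1.0, 0.0, 10.0], gamma=0.9))
```

A `gamma` close to 1 makes the agent patient enough to chase distant payoffs; a smaller `gamma` makes it favor quick wins.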

Reinforcement learning is only one branch of machine learning. It requires extra work for programmers to define clear goals, as well as values for positive and negative outcomes. Once the programming is done, however, the algorithm operates independently.

Related forms of machine learning that are often confused with reinforcement learning include:

Supervised learning: Algorithms use labeled data to achieve desired outcomes. An example is image recognition, where the algorithm is only as good as the labeled examples it's trained on. Given enough labeled examples, the agent learns to recognize the common features of predefined categories.
Semi-supervised learning: Developers use a middle-ground approach, providing the agent with a relatively small set of labeled data and a larger set of unlabeled data. The algorithm is developed to extrapolate information from the labeled data and use it to draw conclusions about the larger collection of data.
Unsupervised learning: Algorithms are given free rein on unlabeled data to make observations about the features and draw their own conclusions.

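Because reinforcement learning leaves the agent to its own devices once its goals and values have been established, it's sometimes compared to semi-supervised learning. The key differences, however, are that reinforcement learning requires more explicit up-front design of goals and values, and that the agent learns from the rewards it earns by interacting with its environment rather than from a fixed collection of labeled or unlabeled data.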

Common reinforcement learning algorithms

Like human learning, reinforcement learning can take different approaches to reach its goals. People are generally guided by teachers; reinforcement learning is steered by algorithms. A large number of algorithms have been developed for reinforcement learning, but three of the most common are:

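State-action-reward-state-action (SARSA): An on-policy algorithm. The agent estimates how valuable each action is in each state, updating those estimates from the reward it receives and the value of the next action its current policy actually chooses.
Q-learning: An off-policy algorithm. The agent is free to explore its environment, for example by sometimes taking random actions, while it learns the value of taking the best possible action in each state rather than the action it actually took.
Deep Q-networks (DQN): Algorithms combine Q-learning with a deep neural network that estimates action values, which lets them handle enormous numbers of states, such as raw screen pixels. The network is trained on random samples drawn from a memory of the agent's past experiences, a technique known as experience replay.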
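The difference between SARSA and Q-learning comes down to which next-step value each one learns from. The sketch below assumes a simple lookup table of action values (a Python dictionary keyed by hypothetical `(state, action)` pairs), a learning rate `alpha`, and a discount factor `gamma`; all names and numbers are illustrative. A deep Q-network would replace the table with a neural network trained on random batches of stored experiences, which is not shown here.

```python
from collections import defaultdict


def sarsa_update(Q, s, a, r, s_next, a_next, alpha=0.1, gamma=0.99, done=False):
    """On-policy: learn from the value of the action the agent will actually take next."""
    future = 0.0 if done else Q[(s_next, a_next)]
    Q[(s, a)] += alpha * (r + gamma * future - Q[(s, a)])


def q_learning_update(Q, s, a, r, s_next, actions, alpha=0.1, gamma=0.99, done=False):
    """Off-policy: learn from the best-valued next action, whatever the agent does next."""
    future = 0.0 if done else max(Q[(s_next, b)] for b in actions)
    Q[(s, a)] += alpha * (r + gamma * future - Q[(s, a)])


# Same experience for both: in state "s0", action "right" earned reward 0 and led to "s1".
# In "s1" the best-known action is "right" (value 5), but an exploring agent picks "left" (value 1).
Q_sarsa = defaultdict(float, {("s1", "left"): 1.0, ("s1", "right"): 5.0})
Q_qlearn = defaultdict(float, Q_sarsa)  # independent copy of the same starting table
sarsa_update(Q_sarsa, "s0", "right", 0.0, "s1", "left", alpha=0.5, gamma=1.0)
q_learning_update(Q_qlearn, "s0", "right", 0.0, "s1", ["left", "right"], alpha=0.5, gamma=1.0)
print(Q_sarsa[("s0", "right")], Q_qlearn[("s0", "right")])  # 0.5 2.5
```

Because the exploring agent actually chose the lower-valued action, SARSA's estimate reflects that choice, while Q-learning's estimate assumes the best action will be taken next.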


Pros and cons of reinforcement learning

As with any branch of artificial intelligence, there are pros and cons. Some of the advantages of reinforcement learning are obvious. Like other AI techniques, it can be used to solve extremely complex problems, including issues that involve decision-making, control, and optimization.

Reinforcement learning also can deal with environments where outcomes aren't always easily predicted, an especially useful feature for real-world applications such as healthcare. It can learn from and correct mistakes made during training, and it can be combined with other branches of machine learning to improve performance.

Of course, there are downsides. Reinforcement learning isn't terribly useful for dealing with simple problems. It requires a lot of data and can be extremely difficult to debug if and when problems occur. Finally, it depends heavily on how well the positive and negative values (the reward function) are defined. If the reward function is poorly designed, the agent may learn behavior that scores well without accomplishing the intended goal.

Pac-Man and reinforcement learning

It's hard to imagine that a branch of AI can be tied to a 1980 program that’s considered to be the basis for one of the greatest video games of all time. Then again, it's hard to deny the cultural appeal of the once-ubiquitous Pac-Man. Indeed, the structure of the game is used in many university computer science syllabi to provide students with an appreciation of the abilities of reinforcement learning.

The game can be framed as a reinforcement learning problem. Pac-Man is the agent, and the maze where he eats pellets while avoiding ghosts is his environment. A positive value is assigned to certain outcomes, such as finishing a level; a negative value is assigned to others, such as being eaten by Blinky, Pinky, Inky, or Clyde.
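Those positive and negative values amount to a reward function. Here is a minimal Python sketch of one; the `Event` names and all the numbers are made up for illustration, and the small pellet bonus is an added assumption beyond the two outcomes this article mentions.

```python
from enum import Enum


class Event(Enum):
    """Hypothetical game events a Pac-Man environment might report after each move."""
    ATE_PELLET = "ate_pellet"
    CLEARED_LEVEL = "cleared_level"
    EATEN_BY_GHOST = "eaten_by_ghost"


# Illustrative values only, chosen for this sketch.
REWARDS = {
    Event.ATE_PELLET: 10.0,        # assumption: small bonus for making progress
    Event.CLEARED_LEVEL: 500.0,    # positive value for finishing a level
    Event.EATEN_BY_GHOST: -500.0,  # negative value for being caught by a ghost
}


def reward_for(events):
    """Total reward for one time step, given the events that occurred during it."""
    return sum(REWARDS[e] for e in events)


print(reward_for([Event.ATE_PELLET, Event.CLEARED_LEVEL]))  # 510.0
```

How these numbers are balanced matters: if the pellet bonus were large relative to the ghost penalty, an agent might learn to take reckless risks for pellets.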

Given enough repetition and some help from a well-designed deep Q-network, reinforcement learning can sort through an almost infinite number of pixel combinations to achieve the desired outcome: in this case, clearing the board without being consumed by a red, pink, cyan, or orange blob.
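A deep Q-network reading raw pixels is too large to show here, so the sketch below swaps in plain tabular Q-learning on a hypothetical five-cell corridor: a ghost waits at one end, the last pellet sits at the other, and Pac-Man starts in the middle. The layout, reward values, and hyperparameters (`alpha`, `gamma`, `epsilon`, `episodes`, `max_steps`) are all assumptions chosen for illustration. It shows the "enough repetition" idea in miniature: the agent starts knowing nothing and, through trial and error, learns to head for the pellet.

```python
import random

# Hypothetical toy "maze": a corridor of cells 0..4 with a ghost at one end
# and the last pellet at the other. Pac-Man starts in the middle.
N_CELLS = 5
GHOST, PELLET, START = 0, 4, 2
ACTIONS = (-1, +1)  # move left, move right


def step(pos, action):
    """Apply a move and return (next_pos, reward, done)."""
    nxt = pos + action
    if nxt == PELLET:
        return nxt, 1.0, True   # level cleared: positive value
    if nxt == GHOST:
        return nxt, -1.0, True  # eaten by the ghost: negative value
    return nxt, 0.0, False


def train(episodes=500, alpha=0.5, gamma=0.9, epsilon=0.1, max_steps=100, seed=0):
    """Tabular Q-learning with epsilon-greedy exploration."""
    rng = random.Random(seed)
    Q = {(s, a): 0.0 for s in range(N_CELLS) for a in ACTIONS}
    for _episode in range(episodes):
        pos = START
        for _step in range(max_steps):
            # Mostly exploit the best-known move, but explore at random with
            # probability epsilon (ties go to the first action, left).
            if rng.random() < epsilon:
                action = rng.choice(ACTIONS)
            else:
                action = max(ACTIONS, key=lambda a: Q[(pos, a)])
            nxt, reward, done = step(pos, action)
            # A terminal move has no future value to learn from.
            future = 0.0 if done else max(Q[(nxt, a)] for a in ACTIONS)
            Q[(pos, action)] += alpha * (reward + gamma * future - Q[(pos, action)])
            pos = nxt
            if done:
                break
    return Q


def greedy_policy(Q):
    """Best-known move in each non-terminal cell."""
    return {s: max(ACTIONS, key=lambda a: Q[(s, a)]) for s in range(1, N_CELLS - 1)}


Q = train()
print(greedy_policy(Q))  # expected: {1: 1, 2: 1, 3: 1}, i.e. always move right toward the pellet
```

A DQN follows the same learning rule but replaces the `Q` table with a neural network, because a real Pac-Man screen has far too many possible states to list one by one.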

The Motley Fool has a disclosure policy.