Reinforcement learning is a branch of machine learning in which an AI agent learns by trial and error. Agents are rewarded with positive values for achieving desired outcomes and discouraged from undesired actions through negative values.
This article will discuss reinforcement learning, explain how it's different from other types of machine learning, and review common algorithms used in reinforcement learning. Finally, we'll explore the pros and cons of reinforcement learning and provide an example of how it's used in a vintage video arcade game.
Types of machine learning
Artificial intelligence (AI) agents can learn, but it takes time and effort. Machine learning is central to teaching AI agents, and reinforcement learning is a branch of it that assigns positive or negative values to outcomes. The AI agent is programmed to maximize the total value it collects, and in doing so it works out the most effective way to reach a desired outcome.
Reinforcement learning is only one branch of machine learning. It requires extra work for programmers to define clear goals, as well as values for positive and negative outcomes. Once the programming is done, however, the algorithm operates independently.
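To make the division of labor concrete, here is a minimal Python sketch of what the programmer defines up front (the goal and the values) and what then runs without further guidance (the loop). The corridor environment, the names Corridor and run_episode, and the specific values are illustrative assumptions, not part of any particular library.

```python
# Values chosen by the programmer. These numbers are assumptions for illustration.
GOAL_REWARD = 1.0    # positive value for the desired outcome
PIT_PENALTY = -1.0   # negative value for the undesired outcome
STEP_REWARD = 0.0    # value for an ordinary move


class Corridor:
    """Hypothetical 1-D world: cell 0 is a pit, the last cell is the goal."""

    def __init__(self, length=5, start=2):
        self.length = length
        self.start = start
        self.position = start

    def reset(self):
        self.position = self.start
        return self.position

    def step(self, action):
        """Apply action (-1 = left, +1 = right); return (state, reward, done)."""
        self.position += action
        if self.position <= 0:
            return self.position, PIT_PENALTY, True
        if self.position >= self.length - 1:
            return self.position, GOAL_REWARD, True
        return self.position, STEP_REWARD, False


def run_episode(env, choose_action, max_steps=100):
    """Let the agent act until the episode ends; return the total value earned."""
    state = env.reset()
    total = 0.0
    for _ in range(max_steps):
        state, reward, done = env.step(choose_action(state))
        total += reward
        if done:
            break
    return total


# Two fixed strategies, just to show how outcomes map to values.
print(run_episode(Corridor(), lambda state: +1))  # always right: reaches the goal, 1.0
print(run_episode(Corridor(), lambda state: -1))  # always left: falls in the pit, -1.0
```

A learning algorithm would replace the fixed strategies passed as choose_action with one that improves based on the values it receives.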
Related forms of machine learning that are often confused with reinforcement learning include:
- Supervised learning: Algorithms learn from labeled data, where each example comes paired with the correct answer. An example is image recognition; the algorithm is only as good as the quality of its labeled data. Given enough examples, the agent can distinguish common features of related, predefined forms.
- Semi-supervised learning: Developers use a middle-ground approach, providing the agent with a relatively small set of labeled data and a larger set of unlabeled data. The algorithm is developed to extrapolate information from the labeled data and use it to draw conclusions about the larger collection of data.
- Unsupervised learning: Algorithms are given free rein on unlabeled data to make observations about the features and draw their own conclusions.
Because reinforcement learning involves leaving the agent to its own devices once parameters have been established, it's sometimes compared to semi-supervised learning. The key difference, however, is that a reinforcement learning agent doesn't learn from a fixed set of data; it learns from the values its own actions earn, which is why the programmer has to spell out the goals and values explicitly up front.
Common reinforcement learning algorithms
Reinforcement learning isn't that different from human learning in that it can take different approaches to reach its goals. People generally are guided by teachers; reinforcement learning is steered by algorithms. Many algorithms have been developed for reinforcement learning, but three of the most common are:
- State-action-reward-state-action (SARSA): The agent follows a policy, a rule that gives it the odds of taking each action in a given situation. It updates its estimate of how much value an action leads to based on the action the policy actually picks next.
- Q-learning: Agents are free to explore their environment without being tied to a fixed policy. They estimate the value of each action by assuming they will take the best available action from the next situation onward.
- Deep Q-networks (DQN): Algorithms combine Q-learning with neural networks. The network estimates the value of each action and is trained on a random sample of the agent's previous experiences, good and bad.
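The difference between the first two algorithms comes down to one line in how each updates its value estimates. The sketch below shows the standard tabular update rules in Python; the function names, the dictionary-based table, and the values of ALPHA and GAMMA are assumptions made for illustration, and the handling of episode endings is left out for brevity.

```python
from collections import defaultdict

ALPHA = 0.5  # learning rate (assumed value)
GAMMA = 0.9  # discount applied to future value (assumed value)


def sarsa_update(q, state, action, reward, next_state, next_action,
                 alpha=ALPHA, gamma=GAMMA):
    """SARSA: the target uses the action the agent actually takes next."""
    target = reward + gamma * q[(next_state, next_action)]
    q[(state, action)] += alpha * (target - q[(state, action)])


def q_learning_update(q, state, action, reward, next_state, actions,
                      alpha=ALPHA, gamma=GAMMA):
    """Q-learning: the target uses the best-valued next action,
    whether or not the agent ends up taking it."""
    best_next = max(q[(next_state, a)] for a in actions)
    target = reward + gamma * best_next
    q[(state, action)] += alpha * (target - q[(state, action)])


# Usage: the same experience gives different updates under each rule.
actions = ["left", "right"]

q_sarsa = defaultdict(float)
q_sarsa[("s1", "left")] = 1.0
q_sarsa[("s1", "right")] = 2.0
sarsa_update(q_sarsa, "s0", "right", 0.0, "s1", "left")

q_learn = defaultdict(float)
q_learn[("s1", "left")] = 1.0
q_learn[("s1", "right")] = 2.0
q_learning_update(q_learn, "s0", "right", 0.0, "s1", actions)

print(q_sarsa[("s0", "right")], q_learn[("s0", "right")])
```

In this usage example, SARSA learns from the weaker action the agent actually chose next ("left"), while Q-learning learns as if the stronger action ("right") had been chosen, so its estimate ends up higher.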
Pros and cons of reinforcement learning
As with any branch of artificial intelligence, there are pros and cons. Some of the advantages of reinforcement learning are obvious. Like other AI features, it can be used to solve extremely complex problems, including issues that involve decision-making, control, and optimization.
Reinforcement learning also can deal with environments where outcomes aren't always easily predicted, an especially useful feature for real-world applications, such as healthcare. It can correct mistakes it makes during training and can be combined with other branches of machine learning to improve performance.
Of course, there are downsides. Reinforcement learning is overkill for simple problems. It requires a lot of data and can be extremely difficult to debug if and when problems occur. Finally, it depends heavily on the quality of the reward function, the description of which outcomes earn positive or negative values. If that description is designed poorly, the agent may learn the wrong behavior and fail at its task.
Pac-Man and reinforcement learning
It's hard to imagine that a branch of AI can be tied to a 1980 program that's considered one of the greatest video games of all time. Then again, it's hard to deny the cultural appeal of the once-ubiquitous Pac-Man. Indeed, the structure of the game is used in many university computer science syllabi to give students an appreciation of what reinforcement learning can do.
Playing the video game can be framed as a reinforcement learning problem. The agent controls Pac-Man, and the grid where Pac-Man eats pellets while avoiding ghosts is its environment. A positive value is assigned for certain outcomes, such as finishing a level; a negative value is given for others, such as being eaten by Blinky, Pinky, Inky, or Clyde.
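As a rough illustration of how those values might be written down, the Python sketch below maps game events to numbers. The event names, the reward_for function, and every number are hypothetical; the small pellet reward in particular is an added assumption, since only finishing a level and being eaten are described above.

```python
# Hypothetical values for game events. All numbers are assumptions.
REWARDS = {
    "finished_level": 100.0,   # desired outcome
    "eaten_by_ghost": -100.0,  # undesired outcome
    "ate_pellet": 1.0,         # assumed small reward to encourage progress
}


def reward_for(events):
    """Total value for the events that occurred in one game step.
    Events with no assigned value count as zero."""
    return sum(REWARDS.get(event, 0.0) for event in events)


print(reward_for(["ate_pellet"]))                    # 1.0
print(reward_for(["ate_pellet", "finished_level"]))  # 101.0
print(reward_for(["eaten_by_ghost"]))                # -100.0
```

The relative sizes matter more than the exact numbers: if being eaten cost less than a handful of pellets earned, the agent could learn that running into ghosts is an acceptable trade.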
Given enough repetition and some help from a well-designed deep Q-network algorithm, reinforcement learning can sort through an enormous number of pixel combinations to achieve the desired outcome: in this case, finishing the level without being consumed by a red, pink, cyan, or orange blob.
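One ingredient that makes the repetition pay off in a deep Q-network is storing past experience and training on random samples of it, commonly called experience replay. The Python sketch below shows only that storage-and-sampling piece, with no neural network; the ReplayBuffer and Transition names and the capacity are assumptions for illustration.

```python
import random
from collections import deque, namedtuple

# One remembered step of play: what was seen, done, earned, and seen next.
Transition = namedtuple(
    "Transition", ["state", "action", "reward", "next_state", "done"]
)


class ReplayBuffer:
    """Fixed-size memory of past transitions (hypothetical helper class)."""

    def __init__(self, capacity=10_000, seed=None):
        # A deque with maxlen silently drops the oldest item when full.
        self.memory = deque(maxlen=capacity)
        self.rng = random.Random(seed)

    def push(self, state, action, reward, next_state, done):
        self.memory.append(Transition(state, action, reward, next_state, done))

    def sample(self, batch_size):
        """Return batch_size distinct transitions chosen at random."""
        return self.rng.sample(list(self.memory), batch_size)

    def __len__(self):
        return len(self.memory)


# Usage: a tiny buffer that keeps only the three most recent steps.
buffer = ReplayBuffer(capacity=3, seed=0)
for step in range(5):
    buffer.push(state=step, action="right", reward=1.0,
                next_state=step + 1, done=False)

batch = buffer.sample(2)
print(len(buffer), [t.state for t in batch])
```

Sampling at random, rather than replaying steps in the order they happened, keeps consecutive and highly similar game frames from dominating each training step.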