Reinforcement learning algorithms and use cases

Important points

Reinforcement learning algorithms obtain feedback through trial and error and improve performance over time.

Types of reinforcement learning algorithms include Q-learning, SARSA, and proximal policy optimization (PPO).

Reinforcement learning algorithms can be used in real-world applications such as game testing, self-driving cars, and medical treatment planning.

Explore different types of reinforcement learning algorithms and how they work. If you want to learn more about reinforcement learning, the University of Alberta’s Reinforcement Learning Specialization will teach you how to apply AI tools to solve problems and give you hands-on experience building reinforcement learning systems.

What is reinforcement learning?

Reinforcement learning is a machine learning method in which an AI agent, such as a program or a robot, finds the best way to accomplish a goal through trial and error, without a computer scientist or other person telling it what to do. The agent evaluates its own decisions, estimates how much each one contributes to the goal, and ultimately settles on the solution with the highest value.

Different reinforcement learning algorithms are available that utilize different approaches to the learning process. They differ primarily in how they determine the value of actions within the decision-making process and learn through trial and error.

How do reinforcement learning algorithms work?

Reinforcement learning algorithms learn in a way that reminds us of how humans learn by trying things out and deciding whether their attempts were good or not. If you want to learn how to do something, such as how to play chess, one way to learn is to sit down at a chessboard and start playing. If you are a beginner, you will definitely make many mistakes. Every time you make a mistake, you learn what went wrong and become a stronger player in the next game. As you play the game of chess many times, you’ll begin to understand the best way to dominate your opponent, no matter what strategies they try.

Reinforcement learning works similarly. The algorithm attempts to achieve a goal and then evaluates its own performance. It adjusts its decision-making process based on the feedback it receives about its actions. Just as humans learn through trial and error, reinforcement learning algorithms use a system of rewards and penalties to learn the most effective ways to achieve their goals. To continue with chess as a real-world example, Google DeepMind developed AlphaZero, an artificial intelligence model that learned to play chess, shogi, and Go.

Reinforcement learning uses a Markov decision process, a mathematical framework for sequential decision-making, to evaluate the immediate and cumulative rewards of a particular action. An AI model first explores its environment by trying different actions and checking whether each resulting state moves it closer to its final goal.

By considering the immediate and long-term rewards of a particular decision, AI models can choose the most valuable solution.
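The trade-off between immediate and long-term reward is often expressed as a discounted return. The sketch below is only an illustration: the discount factor `gamma` is a standard convention that this article does not name, and the reward list is made up.

```python
def discounted_return(rewards, gamma=0.9):
    """Sum rewards, weighting a reward k steps in the future by gamma**k."""
    total = 0.0
    # Work backwards: return now = reward now + gamma * (return from the next step).
    for reward in reversed(rewards):
        total = reward + gamma * total
    return total


# Hypothetical episode: no reward for two steps, then a reward of 10 at the goal.
rewards = [0.0, 0.0, 10.0]
print(discounted_return(rewards, gamma=0.9))  # roughly 8.1, i.e. 10 * 0.9**2
```

A `gamma` close to 1 makes the agent value distant rewards almost as much as immediate ones, while a `gamma` close to 0 makes it short-sighted.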

Model-based and model-free

Reinforcement learning algorithms can be distinguished as model-based or model-free, which refers to whether the AI model builds an internal model of its environment. In a controlled, unchanging environment, an AI model builds a map or model of that environment to determine the best way to move through space. For example, a robot serving drinks at a restaurant might map the area to choose the best route to each table. This model allows the AI to predict the best action without having to physically move through the space first. This is a model-based reinforcement learning algorithm.

In more complex or dynamic environments, agents cannot build a reliable internal model, so model-free algorithms learn directly through trial and error. For example, a self-driving car cannot map in advance the space in which it will drive, because other drivers, pedestrians, road conditions, and other factors keep changing. Instead, the AI must learn by trying different actions and seeing what works. In this case, learning typically takes place in a virtual environment, so agents are free to experiment without putting anyone at risk.
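To make the model-based idea concrete, here is a minimal sketch of the restaurant robot planning a route entirely inside its internal model. The floor plan, the cost of -1 per step, the discount factor, and every function name are assumptions made for illustration. The planning method used is value iteration, one common choice among many.

```python
# Hypothetical floor plan: '.' is free floor, '#' is an obstacle, 'T' is the table.
FLOOR = [
    "..#.",
    "..#T",
    "....",
]
ACTIONS = {"up": (-1, 0), "down": (1, 0), "left": (0, -1), "right": (0, 1)}


def step_model(state, action):
    """The internal model: predicts the next cell without moving the robot."""
    r, c = state
    dr, dc = ACTIONS[action]
    nr, nc = r + dr, c + dc
    inside = 0 <= nr < len(FLOOR) and 0 <= nc < len(FLOOR[0])
    if not inside or FLOOR[nr][nc] == "#":
        return state  # bumping into a wall or obstacle leaves the robot in place
    return (nr, nc)


def plan(gamma=0.9, sweeps=50):
    """Value iteration over the model: each step costs -1 until the table is reached."""
    states = [(r, c) for r in range(len(FLOOR)) for c in range(len(FLOOR[0]))
              if FLOOR[r][c] != "#"]
    values = {s: 0.0 for s in states}
    for _ in range(sweeps):
        for s in states:
            if FLOOR[s[0]][s[1]] == "T":
                continue  # the table is terminal and keeps value 0
            values[s] = max(-1.0 + gamma * values[step_model(s, a)] for a in ACTIONS)
    return values


def best_action(state, values, gamma=0.9):
    """Pick the action whose predicted next cell has the highest value."""
    return max(ACTIONS, key=lambda a: -1.0 + gamma * values[step_model(state, a)])


def route(start, values, max_steps=20):
    """Follow the greedy action through the model until the table is reached."""
    state, path = start, []
    for _ in range(max_steps):
        if FLOOR[state[0]][state[1]] == "T":
            break
        action = best_action(state, values)
        path.append(action)
        state = step_model(state, action)
    return path


values = plan()
print(route((0, 0), values))
```

All of the planning happens through `step_model`, so the robot never has to take a physical step to compare routes. A model-free agent has no such function and must act in the environment to find out what happens.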

Types of reinforcement learning algorithms

Common reinforcement learning algorithms include Q-learning, SARSA, REINFORCE, PPO, TRPO, A2C, A3C, and DDPG. These algorithms differ in how they allow the main components of reinforcement learning (i.e., agent, environment, policy, and reward) to interact. The agent is the AI model that makes decisions. The environment is everything the agent interacts with. The policy is the strategy the agent uses to choose an action in a given state. And the reward is a score that represents the value of an action.

Each reinforcement learning algorithm has a different approach to implementing these four main components.
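The interaction between the four components can be sketched as a simple loop. Everything in this example is hypothetical: the corridor environment, the random policy, and the `reset`/`step` method names (a common convention, not something this article specifies).

```python
import random


class CorridorEnv:
    """Hypothetical environment: a corridor of 5 cells with the goal at the right end."""

    def __init__(self, length=5):
        self.length = length
        self.position = 0

    def reset(self):
        self.position = 0
        return self.position

    def step(self, action):
        """action is -1 (left) or +1 (right); returns (next_state, reward, done)."""
        self.position = min(max(self.position + action, 0), self.length - 1)
        done = self.position == self.length - 1
        reward = 1.0 if done else 0.0
        return self.position, reward, done


def random_policy(state, rng):
    """Policy: maps a state to an action. This one ignores the state and explores."""
    return rng.choice([-1, 1])


def run_episode(env, policy, rng, max_steps=100):
    state = env.reset()
    total_reward = 0.0
    for _ in range(max_steps):
        action = policy(state, rng)              # the agent consults its policy
        state, reward, done = env.step(action)   # the environment responds
        total_reward += reward                   # the reward scores the action
        if done:
            break
    return total_reward


rng = random.Random(0)
print(run_episode(CorridorEnv(), random_policy, rng))
```

The algorithms described next differ mainly in what they do with the reward inside this loop, that is, how they use it to improve the policy.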

Q-learning or deep Q-network (DQN): Q-learning is a model-free, off-policy algorithm, meaning the AI model can learn without prior knowledge of its environment and can learn about the best policy even while deviating from it. As a result, Q-learning algorithms can create their own set of rules for reaching a desired outcome by predicting the expected reward (the Q-value) of taking a particular action in a particular state. This allows Q-learning to be used in uncontrolled or unpredictable environments. Combining Q-learning with a neural network yields the deep Q-network (DQN) algorithm.

SARSA (state, action, reward, state, action): SARSA is a model-free algorithm like Q-learning, but it is on-policy: it learns from the actions the agent actually performs rather than from the best action it could have taken.

REINFORCE: The REINFORCE algorithm is a type of policy gradient algorithm, meaning it adjusts the policy directly as it learns, making actions that led to higher returns more likely. REINFORCE is considered an on-policy algorithm because it improves the same policy it uses to act in the environment.

Actor-Critic and A2C: The actor-critic algorithm uses two neural networks. One is the actor, which chooses the action, and the other is the critic, which evaluates that action. The actor follows the current policy, and the critic's evaluation is used to adjust the policy after each iteration. This architecture combines the strengths of value-based and policy-based algorithms.

Trust Region Policy Optimization (TRPO): The TRPO algorithm helps solve a common problem with policy gradient algorithms. In some cases, a policy update is too large or too small, and the program stops working as expected. TRPO prevents the policy from changing too abruptly by adding a constraint to the policy update at each iteration.

Deep Deterministic Policy Gradient (DDPG): DDPG is an algorithm that combines many of the qualities of the other algorithms mentioned above. It is an off-policy actor-critic model that uses a value-based critic to learn a deterministic policy, meaning a policy that always produces the same action for a given input.
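The difference between Q-learning and SARSA described above comes down to one line in the update rule. The following tabular sketch shows both side by side. The 6-cell corridor, the epsilon-greedy exploration, and the values of `alpha`, `gamma`, and `epsilon` are illustrative assumptions, not settings from this article.

```python
import random
from collections import defaultdict

ACTIONS = [-1, 1]  # move left or right along a hypothetical 6-cell corridor
GOAL = 5


def step(state, action):
    """Toy environment: reward 1.0 on reaching the goal cell, otherwise 0."""
    next_state = min(max(state + action, 0), GOAL)
    done = next_state == GOAL
    return next_state, (1.0 if done else 0.0), done


def epsilon_greedy(q, state, rng, epsilon=0.1):
    """Mostly pick the highest-valued action, sometimes explore at random."""
    if rng.random() < epsilon:
        return rng.choice(ACTIONS)
    best = max(q[(state, a)] for a in ACTIONS)
    return rng.choice([a for a in ACTIONS if q[(state, a)] == best])


def train(method, episodes=500, alpha=0.5, gamma=0.9, seed=0):
    rng = random.Random(seed)
    q = defaultdict(float)  # Q-table: (state, action) -> estimated return
    for _ in range(episodes):
        state, done = 0, False
        action = epsilon_greedy(q, state, rng)
        while not done:
            next_state, reward, done = step(state, action)
            next_action = epsilon_greedy(q, next_state, rng)
            if done:
                target = reward
            elif method == "q_learning":
                # Off-policy: bootstrap from the best next action, taken or not.
                target = reward + gamma * max(q[(next_state, a)] for a in ACTIONS)
            else:
                # SARSA, on-policy: bootstrap from the action actually chosen next.
                target = reward + gamma * q[(next_state, next_action)]
            q[(state, action)] += alpha * (target - q[(state, action)])
            state, action = next_state, next_action
    return q


for method in ("q_learning", "sarsa"):
    q = train(method)
    policy = [max(ACTIONS, key=lambda a: q[(s, a)]) for s in range(GOAL)]
    print(method, policy)
```

In a corridor this small, both variants should end up preferring to move right. The two tend to diverge more visibly in environments where exploratory actions are costly, because SARSA's estimates include the effect of its own exploration.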

What are the use cases for reinforcement learning algorithms in machine learning?

Reinforcement learning has applications across many industries. Examples include gaming, healthcare, and self-driving cars.

Gaming: Reinforcement learning algorithms can learn how to play a game, giving you opponents that adapt to your moves. You can also use reinforcement learning algorithms to test games.

Self-driving cars: Reinforcement learning can be used to control self-driving cars, which must learn to maneuver in complex and unpredictable environments. Reinforcement learning allows AI models to manage complex variables such as speed, multiple lanes, and other drivers.

