Epsilon Greedy Policy

In reinforcement learning, a policy that chooses a random action with probability epsilon and the greedy action (the one with the highest estimated value) otherwise. For example, if epsilon is 0.9, then the policy follows a random policy 90% of the time and a greedy policy 10% of the time.
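
The selection rule can be sketched in a few lines of Python. This is a minimal illustration, not a reference implementation: the function name epsilon_greedy_action, the q_values list of estimated action values, and the optional rng parameter are all assumptions made for the example.

```python
import random


def epsilon_greedy_action(q_values, epsilon, rng=random):
    """Return an action index chosen by an epsilon-greedy rule.

    q_values: hypothetical list of estimated values, one per action.
    epsilon:  probability of taking the random (exploratory) branch.
    rng:      any object with random() and randrange(), e.g. random.Random(seed).
    """
    if rng.random() < epsilon:
        # Explore: pick uniformly among all actions (may also hit the greedy one).
        return rng.randrange(len(q_values))
    # Exploit: pick the action with the highest estimated value.
    return max(range(len(q_values)), key=lambda a: q_values[a])


# Example: with epsilon = 0.0 the choice is always greedy.
action = epsilon_greedy_action([0.1, 0.9, 0.3], epsilon=0.0)
```

Because the random branch draws uniformly over all actions, it can still land on the greedy action, so the greedy action is selected slightly more often than 1 - epsilon of the time.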

Over successive episodes, the algorithm reduces epsilon's value to shift gradually from following a random policy to following a greedy policy. By shifting the policy, the agent first randomly explores the environment and then greedily exploits the results of that exploration.
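
One common way to reduce epsilon over episodes is a multiplicative decay with a lower bound. The sketch below assumes that schedule; the function name decayed_epsilon and the start, end, and decay values are illustrative choices, not values taken from this entry.

```python
def decayed_epsilon(episode, start=1.0, end=0.05, decay=0.99):
    """Return epsilon for a given episode index (0-based).

    Assumed schedule: multiply by `decay` once per episode,
    and never go below the floor `end`.
    """
    return max(end, start * decay ** episode)


# Epsilon shrinks as training proceeds, so the agent explores less over time.
schedule = [decayed_epsilon(ep) for ep in (0, 100, 1000)]
```

Keeping a small nonzero floor means the agent never stops exploring entirely, which is one design choice among several; a linear schedule or a decay to zero are also used.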

Real-world uses

Created for this library

1. A logistics dispatcher uses an epsilon-greedy policy in simulation so the RL agent explores alternative routes before fully exploiting its current best estimate.
2. An ad-bidding team uses an epsilon-greedy policy with a small epsilon in live experiments so the system keeps learning while still exploiting the best known action.
3. A recommendation team uses an epsilon-greedy policy to inject a small share of exploration when serving new content to learn its true performance.
