Glossary term
Agentic Systems
In reinforcement learning, a policy that either follows a random policy with epsilon probability or a greedy policy otherwise. For example, if epsilon is 0.9, then the policy follows a random policy 90% of the time and a greedy policy 10% of the time.
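The selection rule in the definition can be sketched in a few lines of Python. This is a minimal illustration, not a reference implementation. It assumes the agent keeps a list of estimated action values (`q_values`), and the function name `epsilon_greedy_action` is hypothetical. The random branch draws uniformly from all actions, so it can pick the greedy action by chance.

```python
import random

def epsilon_greedy_action(q_values, epsilon, rng=random):
    """Pick an action index using an epsilon-greedy rule (hypothetical helper).

    With probability epsilon, explore: choose any action uniformly at random.
    Otherwise, exploit: choose the action with the highest estimated value.
    """
    if rng.random() < epsilon:
        # Random policy: every action is equally likely, including the greedy one.
        return rng.randrange(len(q_values))
    # Greedy policy: index of the highest estimated value (first one on ties).
    return max(range(len(q_values)), key=lambda a: q_values[a])
```

With `epsilon=0.9` and three actions, the function takes the random branch about 90% of the time, so the greedy action is returned roughly 0.1 + 0.9/3 = 40% of the time overall.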
Over successive episodes, the algorithm reduces epsilon's value in order to shift from following a random policy to following a greedy policy. By shifting the policy, the agent first randomly explores the environment and then greedily exploits the results of random exploration.
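The definition does not prescribe how epsilon is reduced. The sketch below assumes one common choice, an exponential decay per episode with a lower bound. The function name `decayed_epsilon` and the parameters `start`, `end`, and `decay_rate` are hypothetical, and the floor `end` is an optional assumption that keeps a little exploration in place rather than becoming fully greedy.

```python
def decayed_epsilon(episode, start=1.0, end=0.05, decay_rate=0.99):
    """Epsilon for a given episode under an assumed exponential decay schedule.

    Epsilon starts at `start`, is multiplied by `decay_rate` once per episode,
    and is clamped so it never drops below `end`.
    """
    return max(end, start * decay_rate ** episode)

# Epsilon every 100 episodes: mostly random early on, mostly greedy later.
schedule = [decayed_epsilon(ep) for ep in range(0, 501, 100)]
```

Setting `end=0.0` recovers the pure shift from random to greedy described above; a linear schedule would work equally well as an illustration.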
Created for this library
A logistics dispatcher uses an epsilon-greedy policy in simulation so the RL agent explores alternative routes before fully exploiting its current best estimate.
An ad-bidding team uses an epsilon-greedy policy with a small epsilon in live experiments so the system keeps learning while still exploiting the best known action.
A recommendation team uses an epsilon-greedy policy to serve new content to a small share of requests, so the system can learn how that content actually performs.
Definition source: Google for Developers Machine Learning Glossary | Creative Commons Attribution 4.0 License