Glossary term
Agentic Systems
Random policy: in reinforcement learning, a policy that chooses an action at random.
Created for this library
An RL team uses a random policy as the baseline against which all learned policies must demonstrate improvement.
A logistics RL team uses a random policy in early experiments to estimate a chance-level performance floor before training a learned dispatcher.
A research team uses a random policy as an exploration baseline at the start of training before the agent learns useful action values.
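The baseline use described in the examples above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not part of the source definition: the corridor environment, the constants ACTIONS, GOAL and MAX_STEPS, and the always_right_policy stand-in for a learned policy are all hypothetical, and "at random" is taken here to mean uniformly at random over the available actions.

```python
import random

# Hypothetical toy setup: a 1-D corridor starting at position 0.
ACTIONS = (-1, +1)   # step left or step right
GOAL = 4             # episode ends when this position is reached
MAX_STEPS = 50       # step limit per episode


def random_policy(state, rng):
    """Ignore the state and pick an action uniformly at random."""
    return rng.choice(ACTIONS)


def always_right_policy(state, rng):
    """Stand-in for a learned policy: always move toward the goal."""
    return +1


def run_episode(policy, rng):
    """Return the episode return: -1 per step until the goal or the step limit."""
    state, total = 0, 0
    for _ in range(MAX_STEPS):
        # Positions are clamped at 0 so the agent cannot leave the corridor.
        state = max(0, state + policy(state, rng))
        total -= 1
        if state == GOAL:
            break
    return total


def average_return(policy, episodes=1000, seed=0):
    """Average the episode return over many seeded episodes."""
    rng = random.Random(seed)
    return sum(run_episode(policy, rng) for _ in range(episodes)) / episodes


if __name__ == "__main__":
    print("random baseline:", average_return(random_policy))
    print("always-right   :", average_return(always_right_policy))
```

In this sketch the random policy's average return is the reference point: any policy that is supposed to have learned something should score above it, as the always-right stand-in does here.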
Definition source: Google for Developers Machine Learning Glossary | Creative Commons Attribution 4.0 License