Return

In reinforcement learning, given a policy and a state, the return is the sum of all rewards that the agent expects to receive when following that policy from that state to the end of the episode. The agent accounts for the delayed nature of expected rewards by discounting each reward according to the number of state transitions required to obtain it.

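Therefore, if the discount factor is $\gamma$, and $r_0, \ldots, r_N$ denote the rewards until the end of the episode, then the return calculation is as follows:

$$\text{Return} = r_0 + \gamma r_1 + \gamma^2 r_2 + \ldots + \gamma^N r_N$$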
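
The discounted sum described here can be sketched in a few lines of Python. This is a minimal illustration, not code from any particular RL library: the function name `discounted_return` and the sample reward list are hypothetical, and it assumes the episode's rewards are already collected in order, with the first reward left undiscounted.

```python
from typing import Sequence


def discounted_return(rewards: Sequence[float], gamma: float) -> float:
    """Return the sum of rewards, weighting the reward at step t by gamma**t.

    Hypothetical helper for illustration; assumes rewards[0] is the first
    reward received and gamma is the discount factor in [0, 1].
    """
    if not 0.0 <= gamma <= 1.0:
        raise ValueError("gamma must be in the range [0, 1]")
    total = 0.0
    # Walk the episode backwards so that each step applies
    # G_t = r_t + gamma * G_{t+1}, which avoids computing powers of gamma.
    for reward in reversed(rewards):
        total = reward + gamma * total
    return total


# Made-up rewards for a three-step episode.
rewards = [1.0, 0.0, 2.0]
print(discounted_return(rewards, gamma=0.5))  # 1.0 + 0.5*0.0 + 0.25*2.0 = 1.5
```

With a discount factor of 1 the function reduces to a plain sum of rewards; smaller values make rewards that arrive after more state transitions count for less.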

Real-world uses

Created for this library

1. An RL team in logistics monitors cumulative return per episode to track learning progress across training checkpoints.
2. A trading research team uses cumulative return as the headline metric for its RL policy across episodes.
3. An ad-bidding team uses return on bid as the RL agent's optimization target across logged episodes.
