Bellman Equation

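In reinforcement learning, the Bellman equation is the following identity, satisfied by the optimal Q-function $Q^*$:

$$Q^*(s, a) = \mathbb{E}\left[\, r + \gamma \max_{a'} Q^*(s', a') \,\middle|\, s, a \right]$$

Here $r$ is the reward received and $s'$ the next state reached after taking action $a$ in state $s$, and $\gamma \in [0, 1)$ is the discount factor.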

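Q-learning turns this identity into an update rule. After observing a transition $(s, a, r, s')$, it moves the current estimate $Q(s, a)$ toward the sampled right-hand side of the identity:

$$Q(s, a) \leftarrow Q(s, a) + \alpha \left[\, r + \gamma \max_{a'} Q(s', a') - Q(s, a) \right]$$

where $\alpha$ is the learning rate and $\gamma$ is the discount factor.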
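As an illustration, a single tabular Q-learning update might look like the sketch below. It assumes states and actions are small integer indices into a NumPy array; the function name `q_learning_update`, its parameters, and the default values for the learning rate and discount factor are hypothetical choices for this example, not part of any particular library.

```python
import numpy as np


def q_learning_update(Q, s, a, r, s_next, alpha=0.1, gamma=0.9, terminal=False):
    """Apply one tabular Q-learning update in place and return the TD error.

    Q is a (num_states, num_actions) array of current estimates.
    All names and defaults here are illustrative assumptions.
    """
    # Best estimated value of the next state; zero if the episode ended.
    best_next = 0.0 if terminal else float(np.max(Q[s_next]))
    # Temporal-difference error: sampled Bellman target minus current estimate.
    td_error = r + gamma * best_next - Q[s, a]
    # Move the estimate a fraction alpha of the way toward the target.
    Q[s, a] += alpha * td_error
    return td_error


# Usage: two states, two actions, all estimates start at zero.
Q = np.zeros((2, 2))
td = q_learning_update(Q, s=0, a=1, r=1.0, s_next=1)
print(Q[0, 1], td)
```

With a zero-initialized table, the first update simply moves `Q[0, 1]` a fraction `alpha` of the way toward the observed reward. Repeating the update over many sampled transitions is what drives the estimates toward a fixed point of the Bellman equation.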

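Beyond reinforcement learning, the Bellman equation is a foundational tool in dynamic programming, where it expresses the value of a decision problem recursively in terms of the values of its subproblems. See the Wikipedia entry for Bellman equation.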
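When the transition probabilities and rewards are known, the dynamic programming view can be sketched as value iteration, which repeatedly applies the Bellman optimality backup until the values stop changing. The example below is a minimal sketch on a made-up two-state problem; the function name `value_iteration`, the array layouts, and the toy transition and reward tables are all assumptions for illustration.

```python
import numpy as np


def value_iteration(P, R, gamma=0.9, tol=1e-8, max_iters=10_000):
    """Solve a small known MDP by repeated Bellman backups.

    P: (num_actions, num_states, num_states) transition probabilities.
    R: (num_states, num_actions) expected immediate rewards.
    Returns the state values and a greedy policy (illustrative sketch).
    """
    V = np.zeros(R.shape[0])
    for _ in range(max_iters):
        # Q[s, a] = R[s, a] + gamma * sum over s' of P[a, s, s'] * V[s']
        Q = R + gamma * np.einsum("ast,t->sa", P, V)
        V_new = Q.max(axis=1)
        converged = np.max(np.abs(V_new - V)) < tol
        V = V_new
        if converged:
            break
    # Recompute Q from the final values to extract the greedy policy.
    Q = R + gamma * np.einsum("ast,t->sa", P, V)
    return V, Q.argmax(axis=1)


# Toy problem: action 0 stays put, action 1 switches state.
P = np.array([
    [[1.0, 0.0], [0.0, 1.0]],  # action 0: stay
    [[0.0, 1.0], [1.0, 0.0]],  # action 1: switch
])
# Reward 1 only for staying in state 1.
R = np.array([[0.0, 0.0], [1.0, 0.0]])

V, policy = value_iteration(P, R)
print(V, policy)
```

In this toy problem the values should approach 9 for state 0 and 10 for state 1, with the greedy policy switching out of state 0 and staying in state 1. Unlike Q-learning, this approach needs the full model (`P` and `R`) rather than sampled transitions.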

Created for this library

1. A logistics RL team uses the Bellman equation as the basis for value iteration when training a driver-routing policy across thousands of state-action pairs.
2. A trading research group derives its option-exercise policy by solving the Bellman equation on a discretized state space for the underlying price.
3. An ad-bidding platform's RL engineer uses Bellman updates inside a Q-learning loop to learn the expected long-run revenue of each bid amount in real-time auctions.
