Gradient Bounded Dynamic Programming with Submodular and Concave-Extensible Value Functions

D. Lebedev, P. J. Goulart and K. Margellos

in IFAC World Congress, Berlin, Germany, July 2020.

@inproceedings{LGM:2020,
  author = {D. Lebedev and P. J. Goulart and K. Margellos},
  title = {Gradient Bounded Dynamic Programming with Submodular and Concave-Extensible Value Functions},
  booktitle = {IFAC World Congress},
  address = {Berlin, Germany},
  month = jul,
  year = {2020}
}

We consider dynamic programming problems with finite, discrete-time horizons and discrete state-spaces too high-dimensional for direct computation of the value function from the Bellman equation. For the case where the value function of the dynamic program is concave extensible and submodular in its state-space, we present a new algorithm that computes deterministic upper and stochastic lower bounds of the value function, similar to dual dynamic programming. We then show that the proposed algorithm terminates after a finite number of iterations. Finally, we demonstrate the efficacy of our approach on a high-dimensional numerical example from delivery slot pricing in attended home delivery.
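To make the setting concrete, here is a minimal sketch of exact finite-horizon value iteration on a toy discrete state-space (the reward and transition functions below are illustrative assumptions, not the paper's model). Enumerating every state at every time step, as this code does, is precisely what becomes intractable as the state dimension grows — which motivates bounding the value function instead.

```python
# Minimal sketch of exact finite-horizon value iteration on a toy problem.
# The reward and transition model are illustrative assumptions only.
from itertools import product

T = 3                                         # horizon length
states = list(product(range(3), repeat=2))    # small 2-D discrete state-space
actions = [0, 1]

def reward(x, u):
    # toy stage reward (assumption): payoff proportional to remaining state
    return u * (x[0] + x[1])

def step(x, u):
    # toy deterministic transition (assumption): acting consumes capacity
    return (max(x[0] - u, 0), x[1])

# Bellman recursion backwards in time: V[t][x] = max_u r(x,u) + V[t+1][f(x,u)]
V = {T: {x: 0.0 for x in states}}
for t in reversed(range(T)):
    V[t] = {x: max(reward(x, u) + V[t + 1][step(x, u)] for u in actions)
            for x in states}

print(V[0][(2, 2)])   # optimal value from state (2, 2) at time 0
```

With a state-space of size $n^d$, this exact sweep costs $O(T \cdot n^d \cdot |U|)$, which is the "curse of dimensionality" that the bounding algorithm is designed to avoid.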

Gradient-Bounded Dynamic Programming for Submodular and Concave Extensible Value Functions with Probabilistic Performance Guarantees

D. Lebedev, P. J. Goulart and K. Margellos

June 2020.

@unpublished{LGM:2020b,
  author = {D. Lebedev and P. J. Goulart and K. Margellos},
  title = {Gradient-Bounded Dynamic Programming for Submodular and Concave Extensible Value Functions with Probabilistic Performance Guarantees},
  note = {Preprint},
  month = jun,
  year = {2020}
}

We consider stochastic dynamic programming problems with high-dimensional, discrete state-spaces and finite, discrete-time horizons that prohibit direct computation of the value function from a given Bellman equation for all states and time steps due to the "curse of dimensionality". For the case where the value function of the dynamic program is concave extensible and submodular in its state-space, we present a new algorithm that computes deterministic upper and stochastic lower bounds of the value function in the spirit of dual dynamic programming. We show that the proposed algorithm terminates after a finite number of iterations. Furthermore, we derive probabilistic guarantees on the value accumulated under the associated policy for a single realisation of the dynamic program and for the expectation of this value. Finally, we demonstrate the efficacy of our approach on a high-dimensional numerical example from delivery slot pricing in attended home delivery.
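The stochastic lower bound mentioned above can be illustrated with a generic Monte Carlo argument (this is a sketch of the general principle, not the paper's algorithm): the value accumulated under any feasible policy is, in expectation, a lower bound on the optimal value, so simulated rollouts of a candidate policy give a statistical lower-bound estimate. All names and the stage-reward model below are hypothetical.

```python
# Illustrative sketch: Monte Carlo rollouts of a fixed feasible policy yield
# a stochastic lower bound on the optimal expected value, since any feasible
# policy under-performs the optimal one. Model and policy are assumptions.
import random

random.seed(0)
T = 5

def policy(x, t):
    # hypothetical greedy policy: act whenever capacity remains
    return 1 if x > 0 else 0

def simulate():
    x, total = 3, 0.0
    for t in range(T):
        u = policy(x, t)
        total += u * (x + random.random())   # toy stochastic stage reward
        x = max(x - u, 0)                    # acting consumes one unit
    return total

samples = [simulate() for _ in range(1000)]
lower_bound_estimate = sum(samples) / len(samples)
print(round(lower_bound_estimate, 2))
```

Pairing such a lower bound with a deterministic upper bound gives a computable optimality gap, which is the standard termination criterion in dual-dynamic-programming-style schemes.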

Dynamic Programming for Optimal Delivery Time Slot Pricing

D. Lebedev, P. J. Goulart and K. Margellos

European Journal of Operational Research, 2020. To appear.

@article{LGM:2020c,
  author = {D. Lebedev and P. J. Goulart and K. Margellos},
  title = {Dynamic Programming for Optimal Delivery Time Slot Pricing},
  journal = {European Journal of Operational Research},
  year = {2020},
  note = {To appear},
  url = {https://doi.org/10.1016/j.ejor.2020.11.010},
  doi = {10.1016/j.ejor.2020.11.010}
}

We study the dynamic programming approach to revenue management in the context of attended home delivery. We draw on results from dynamic programming theory for Markov decision problems to show that the underlying Bellman operator has a unique fixed point. We then provide a closed-form expression for the resulting fixed point and show that it admits a natural interpretation. Moreover, we show that – under certain technical assumptions – the value function, which has a discrete domain and a continuous codomain, admits a continuous extension that is a finite-valued, concave function of its state variables at every time step. This result paves the way for scalable implementations of the proposed formulation in future work, as it allows making informed choices of basis functions in an approximate dynamic programming context. We illustrate our findings on a simple numerical example and provide suggestions on how our results can be exploited to obtain closer approximations of the exact value function.
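The uniqueness of the Bellman operator's fixed point can be illustrated by the classical contraction-mapping argument: iterating a contractive operator from any starting point converges to its unique fixed point. The toy operator below is a hypothetical stand-in, not the paper's pricing model.

```python
# Generic sketch of fixed-point iteration for a contractive Bellman-style
# operator (illustrative assumption; the paper's operator is not reproduced).
def bellman_op(v, gamma=0.9):
    # toy operator on a 3-state value vector: contraction with modulus gamma,
    # since |max(v) - max(w)| <= max_i |v[i] - w[i]|
    r = [1.0, 2.0, 3.0]
    return [r[i] + gamma * max(v) for i in range(3)]

v = [0.0, 0.0, 0.0]
for _ in range(500):
    v_next = bellman_op(v)
    if max(abs(a - b) for a, b in zip(v_next, v)) < 1e-10:
        v = v_next
        break
    v = v_next

# Unique fixed point solves v[i] = r[i] + 0.9 * max(v), so max(v) = 30
print([round(x, 6) for x in v])
```

By Banach's fixed-point theorem, the limit is independent of the initial guess, which is the mechanism behind the uniqueness result cited in the abstract.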