NashCoding Yet Another Artificial Intelligence Blog


Walkthrough of the UCB1 Optimality Proof

The paper "Finite-time Analysis of the Multiarmed Bandit Problem" (Auer, Cesa-Bianchi, and Fischer, 2002) is a seminal work in bandit research. It was the first paper to present an algorithm for choosing which bandit arm to play and to prove finite-time optimality guarantees for it. If you're not familiar with bandits, Wikipedia has a decent entry.

In this post, I'm going to assume you have already attempted to read the UCB paper and work through the proof for UCB1. My goal is to provide an in-depth explanation of how the authors arrive at their final finite-time optimality guarantee for the algorithm.