The Confidence Bound Target (CBT) algorithm is designed for infinite arms bandit problem. It is shown that CBT algorithm achieves the regret lower bound for general reward distributions. Reference: Hock Peng Chan and Shouri Hu (2018) <arXiv:1805.11793>.
Label | Latest Version |
---|---|
main | 1.0 |