학과 세미나가 다음과 같이 진행될 예정입니다.
1. 일시: 1월 10일(목) 오후 4시
2. 장소: 산업경영학동 2층 멀티미디어실 2501호
3. 주제: Thompson Sampling with Information Relaxation Penalties
4. 연사: 민승기((Columbia Business School, Decision Risk Operations Division)
(본 세미나는 영어로 진행될 예정입니다.)
많은 관심과 참석 부탁드립니다.
Dear ISysE Professors and Students,
ISysE dept. invites you to attend the following seminar.
1. Speaker: Seungki Min(Columbia Business School, Decision Risk Operations Division)
2. Seminar Title: Thompson Sampling with Information Relaxation Penalties
3. Date & time: 2019 Jan 10(Thu), 16:00
4. Venue: E2 Bldg., Multimedia Room (#2501)
5. Language: English
We look forward to your attendance and encourage you to forward this invitation to colleagues and friends who may be interested in the topic.
With best regards,
Title: Thompson Sampling with Information Relaxation Penalties
We consider a finite time horizon multi-armed bandit (MAB) problem in a Bayesian framework, for which we develop a general set of control policies that leverage ideas from information relaxations of stochastic dynamic optimization problems. In crude terms, an information relaxation allows the decision maker (DM) to have access to the future (unknown) rewards and incorporate them in her optimization problem to pick an action at time t, but penalizes the decision maker for using this information. In our setting, the future rewards allow the DM to better estimate the unknown mean reward parameters of the multiple arms, and optimize her sequence of actions. By picking different information penalties, the DM can construct a family of policies of increasing complexity that, for example, include Thompson Sampling (TS) and the true optimal (but intractable) policy as special cases.
We systematically develop this framework of information relaxation sampling, propose an intuitive family of control policies for our motivating finite time horizon Bayesian MAB problem, and prove associated structural results and performance bounds. Numerical experiments suggest that this new class of policies performs well, in particular in settings where the finite time horizon intorduces significant tension in the problem. Finally, inspired by the finite time horizon Gittins index, we propose an index policy that builds on our framework that particularly outperforms to the state-of-the-art algorithms in our numerical experiments.