Follow the perturbed approximate leader for solving semi-bandit combinatorial optimization

Authors: Feidiao YANG, Wei CHEN, Jialin ZHANG, Xiaoming SUN

Affiliation: 1. Institute of Computing Technology, Chinese Academy of Sciences, Beijing 100190, China; 2. University of Chinese Academy of Sciences, Beijing 100190, China; 3. Microsoft Research Asia, Beijing 100080, China

Abstract: Combinatorial optimization in the face of uncertainty is a challenge in both operational research and machine learning. In this paper, we consider a special and important class, adversarial online combinatorial optimization with semi-bandit feedback, in which a player repeatedly makes combinatorial decisions and receives the corresponding feedback. While existing algorithms focus on the regret guarantee or assume access to an efficient offline oracle, it remains a challenge to solve this problem efficiently when the offline counterpart is NP-hard. We propose a variant of the Follow-the-Perturbed-Leader (FPL) algorithm to solve this problem. Unlike the existing FPL approach, our method employs an approximation algorithm as the offline oracle and perturbs the collected data by adding nonnegative random variables. Our approach is simple and computationally efficient. Moreover, it guarantees a sublinear (1 + ε)-scaled regret of order O(·) for any small ε > 0 for an important class of combinatorial optimization problems that admit an FPTAS (fully polynomial time approximation scheme), where T is the number of rounds of the learning process. In addition to the theoretical analysis, we also conduct a series of experiments to demonstrate the performance of our algorithm.

Keywords: online learning; online combinatorial optimization; semi-bandit; follow-the-perturbed-leader
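
To make the procedure described in the abstract concrete, the following is a minimal Python sketch of a follow-the-perturbed-leader loop that adds nonnegative random perturbations to the collected losses and then calls an approximation algorithm as the offline oracle, under semi-bandit feedback. The exponential perturbation distribution, the plain loss accumulation, and the `approx_oracle` / `observe_losses` interfaces are illustrative assumptions, not the authors' exact construction or analysis.

```python
import numpy as np

def perturbed_approx_leader(T, d, approx_oracle, observe_losses, eta=1.0, rng=None):
    """Minimal sketch of a follow-the-perturbed-leader loop that uses an
    approximation algorithm as its offline oracle under semi-bandit feedback.

    Illustrative assumptions (not taken from the paper): exponentially
    distributed nonnegative perturbations, plain accumulation of observed
    losses, and the `approx_oracle` / `observe_losses` signatures below.

    approx_oracle(c)     -> 0/1 numpy array of length d that approximately
                            minimizes the linear cost <c, x> over the
                            combinatorial decision set.
    observe_losses(x, t) -> numpy array of length d holding the round-t losses
                            on the chosen coordinates and zeros elsewhere
                            (semi-bandit feedback).
    """
    rng = rng or np.random.default_rng()
    cum_loss = np.zeros(d)      # collected (estimated) per-coordinate losses
    total_loss = 0.0
    for t in range(T):
        # Perturb the collected data by adding nonnegative random variables,
        # then follow the leader returned by the approximation oracle.
        perturbation = rng.exponential(scale=eta, size=d)
        x = approx_oracle(cum_loss + perturbation)
        # Semi-bandit feedback: only losses of the chosen components are seen.
        ell = observe_losses(x, t)
        total_loss += float(ell @ x)
        cum_loss += ell          # a full implementation would use an
                                 # importance-weighted loss estimate here
    return total_loss
```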