在线核选择的对抗式多臂赌博机模型 Adversarial Multi-armed Bandit Model with Online Kernel Selection期刊界 All Journals 搜尽天下杂志传播学术成果专业期刊搜索期刊信息化学术搜索

在线核选择的对抗式多臂赌博机模型

引用本文：	李峻樊,廖士中. 在线核选择的对抗式多臂赌博机模型[J]. 计算机科学, 2019, 46(1): 57-63

作者姓名：	李峻樊廖士中

作者单位：	天津大学智能与计算学部天津300350,天津大学智能与计算学部天津300350

基金项目：	本文受国家自然科学基金项目(61673293)资助

摘要：	在线核选择是在线核方法的重要工作,可分为过滤式、包裹式和嵌入式3种类型。已有在线核选择探索了包裹式方法和嵌入式方法,也经验地采用了过滤式方法,但迄今尚没有一个统一的框架来比较、分析并研究各种在线核选择问题。文中提出一种在线核选择的多臂赌博机模型,该模型可作为一个统一框架,同时给出在线核选择的包裹式方法和嵌入式方法。给定候选核集合,候选集中的一个核对应多臂赌博机模型中的一个臂,在线核选择的每回合依据一个概率分布重复地随机选择多个核,并应用指数加权的方法来更新该概率分布。这样,在线核选择问题本质上可归约为一个非遗忘对手环境下的对抗式多臂赌博机问题,并可应用对抗式多臂赌博机模型统一地给出在线核选择的包裹式方法和嵌入式方法。文中进一步提出一个新的在线核选择后悔的概念,理论证明包裹式方法具有关于回合数亚线性的弱期望后悔界,并且嵌入式方法具有关于回合数亚线性的期望后悔界。最后,在标准数据集上通过实验验证了所提统一框架的可行性。
关键词：	在线核选择对抗式多臂赌博机非遗忘对手统一框架
收稿时间：	2018-06-18
修稿时间：	2018-09-09
Adversarial Multi-armed Bandit Model with Online Kernel Selection

LI Jun-fan and LIAO Shi-zhong. Adversarial Multi-armed Bandit Model with Online Kernel Selection[J]. Computer Science, 2019, 46(1): 57-63

Authors:	LI Jun-fan and LIAO Shi-zhong

Affiliation:	College of Intelligence and Computing,Tianjin University,Tianjin 300350,China and College of Intelligence and Computing,Tianjin University,Tianjin 300350,China

Abstract:	Online kernel selection is an important component of online kernel methods,and it can be classified into three categories,that is,the filter,the wrapper and the embedder.Existing online kernel selection explores the wrapper and the embedder categories,and empirically adopts the filter approach.But there have been no unified frameworks yet for comparing,analyzing and investigating online kernel selection problems.This paper proposed a unified framework for online kernel selection researches via multi-armed bandits,which can model the wrapper and the embedder of online kernel selection simultaneously.Giving a set of candidate kernels,this paper corresponds each kernel to an arm in an adversarial bandit model.At each round of online kernel selection,this paper randomly chose multiple kernels according to a probability distribution,and updated the probability distribution via the exponentially weighted average method.In this way,an online kernel selection problem was reduced to an adversarial bandit problem in a non-oblivious adversary setting,and a unified framework was developed for online kernel selection researches,which can model the wrapper and the embedder uniformly.This paper further defined a new regret concept of online kernel selection,and proved that the wrapper within the framework enjoys a sub-linear weak expected regret bound and the embedder within the framework enjoys a sub-linear expected regret bound.Experimental results on benchmark datasets demonstrate the effectiveness of the proposed unified framework.

Keywords:	Online kernel selection Adversarial multi-armed bandit Non-oblivious adversary Unified framework

	点击此处可从《计算机科学》下载全文

设为首页 | 免责声明 | 关于勤云 | 加入收藏