A kernel contextual multi-armed bandit recommendation algorithm
Cite this article: WANG Ding1, MEN Changqian1, WANG Wenjian1,2. A kernel contextual multi-armed bandit recommendation algorithm[J]. CAAI Transactions on Intelligent Systems, 2022, 17(3): 625-633. DOI: 10.11992/tis.202105039
Authors: WANG Ding1, MEN Changqian1, WANG Wenjian1,2
Affiliation: 1. College of Computer and Information Technology, Shanxi University, Taiyuan 030006, China; 2. Key Laboratory of Computational Intelligence and Chinese Information Processing of Ministry of Education, Shanxi University, Taiyuan 030006, China
Abstract: Personalized recommendation services are increasingly important in today's Internet era, but traditional recommendation algorithms do not adapt well to highly dynamic scenarios. Applying the linear contextual multi-armed bandit algorithm (linear upper confidence bound, LinUCB) to personalized recommendation can effectively alleviate the problems of traditional recommendation algorithms, but unfortunately its accuracy is not very high. To address the low recommendation accuracy of LinUCB, this paper proposes an improved algorithm, K-UCB (kernel upper confidence bound). The algorithm drops the unrealistic linearity assumption of LinUCB and uses the kernel method to fit the nonlinear relation between the predicted reward and the context, yielding a new way to compute the upper confidence bound of the predicted reward on nonlinear data and thereby resolving the exploration–exploitation dilemma in the recommendation process. Experiments show that the proposed K-UCB algorithm achieves a higher click-through rate (CTR) than other bandit-based recommendation algorithms and better meets the needs of personalized recommendation in changing scenarios.

Keywords: personalized recommendation; changing scenarios; multi-armed bandits; linear contextual bandits; kernel method; click-through rate; nonlinearity; exploration–exploitation dilemma

A kernel contextual bandit recommendation algorithm
WANG Ding, MEN Changqian, WANG Wenjian. A kernel contextual bandit recommendation algorithm[J]. CAAI Transactions on Intelligent Systems, 2022, 17(3): 625-633. DOI: 10.11992/tis.202105039
Authors: WANG Ding, MEN Changqian, WANG Wenjian
Affiliation:1. College of Computer and Information Technology, Shanxi University, Taiyuan 030006, China;2. Key Laboratory of Computational Intelligence and Chinese Information Processing of Ministry of Education, Shanxi University, Taiyuan 030006, China
Abstract: Personalized recommendations are becoming increasingly significant in the Internet era; however, conventional recommendation algorithms cannot adapt to highly changing scenarios. Applying the linear contextual bandit algorithm (linear upper confidence bound, LinUCB) to personalized recommendations can effectively overcome the limitations of conventional recommendation algorithms; however, its accuracy is not sufficiently high. Herein, an improved kernel upper confidence bound (K-UCB) algorithm is proposed to handle the insufficient recommendation accuracy of the LinUCB algorithm. The proposed algorithm breaks through the unreasonable linear hypothesis of the LinUCB algorithm and uses the kernel method to fit the nonlinear relation between the expected reward and the context. A new method for calculating the upper confidence bound of estimated rewards on nonlinear data is established to address the exploration–exploitation dilemma in the recommendation process. Experiments show that the proposed K-UCB algorithm exhibits higher recommendation accuracy than other recommendation algorithms based on multi-armed bandits and can better adapt to the need for personalized recommendations in changing scenarios.
Keywords:personalized recommendation   changing scenarios   multi-armed bandits   linear contextual bandits   kernel method   click-through rate   nonlinear   exploration-exploitation dilemma
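The abstract describes the core idea of K-UCB: replace LinUCB's linear reward model with a kernel regression over contexts, and select the arm with the highest upper confidence bound on the predicted reward. The paper's exact derivation is not reproduced here, so the following is only a minimal sketch of a kernel-UCB style contextual bandit; the class name, RBF kernel choice, and confidence-width formula are illustrative assumptions, not the authors' implementation:

```python
import numpy as np

def rbf_kernel(X, Y, gamma=1.0):
    # Gaussian (RBF) kernel matrix between the rows of X (n, d) and Y (m, d).
    d2 = np.sum(X**2, 1)[:, None] + np.sum(Y**2, 1)[None, :] - 2 * X @ Y.T
    return np.exp(-gamma * d2)

class KernelUCB:
    """Sketch of a kernelized UCB contextual bandit (illustrative, not the paper's code)."""

    def __init__(self, reg=1.0, beta=1.0, gamma=1.0):
        self.reg = reg          # ridge regularization for the kernel regression
        self.beta = beta        # exploration weight on the confidence width
        self.gamma = gamma      # RBF kernel bandwidth
        self.X, self.y = [], [] # observed contexts and rewards

    def select(self, contexts):
        # contexts: (n_arms, d) array, one feature vector per candidate arm.
        contexts = np.asarray(contexts, dtype=float)
        if not self.X:                       # no observations yet: pick arbitrarily
            return 0
        X = np.vstack(self.X)
        y = np.asarray(self.y)
        K_inv = np.linalg.inv(rbf_kernel(X, X, self.gamma)
                              + self.reg * np.eye(len(y)))
        k = rbf_kernel(contexts, X, self.gamma)       # (n_arms, t)
        mean = k @ K_inv @ y                          # kernel ridge reward estimate
        # Confidence width: sqrt(k(x, x) - k^T (K + reg*I)^{-1} k) per arm.
        var = rbf_kernel(contexts, contexts, self.gamma).diagonal() \
              - np.sum((k @ K_inv) * k, axis=1)
        ucb = mean + self.beta * np.sqrt(np.maximum(var, 0.0))
        return int(np.argmax(ucb))           # optimistic arm choice

    def update(self, context, reward):
        # Record the observed (context, reward) pair, e.g. a click (1) or no click (0).
        self.X.append(np.asarray(context, dtype=float))
        self.y.append(float(reward))
```

In a recommendation setting, each arm would be a candidate item, its context a user/item feature vector, and the reward a click indicator, so maximizing the optimistic estimate trades off exploring uncertain items against exploiting items with high predicted CTR.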