Policy Gradient Recommendation Algorithm Combining Sequence Pattern Rating
Original title (Chinese): 融合序列模式评分的策略梯度推荐算法
Cite as: Guan Rui, Ding Jiaman. Policy Gradient Recommendation Algorithm Combining Sequence Pattern Rating [J]. Computer Applications and Software (计算机应用与软件), 2022, 39(3): 223-228.
Authors: Guan Rui (官蕊), Ding Jiaman (丁家满)
Affiliation: Faculty of Information Engineering and Automation, Kunming University of Science and Technology, Kunming 650500, Yunnan, China; Artificial Intelligence Key Laboratory of Yunnan Province, Kunming 650500, Yunnan, China
Funding: National Natural Science Foundation of China (61562054)
Abstract: Recommendation algorithms alleviate information overload to some extent, but traditional recommendation models still fall short in mining the characteristics of the data. To address this, we propose a policy gradient recommendation algorithm that incorporates sequence pattern ratings, building on reinforcement learning. The recommendation process is modeled as a Markov decision process. The characteristic patterns of the underlying recommendation data are analyzed, and a feedback function is designed that uses the sequence pattern rating as the reward, from which the algorithm learns in each iteration. A standardization operation on the cumulative reward reduces the variance of the policy gradient. The method is validated on movie recommendation, and the results show that it achieves good recommendation accuracy.

Keywords: Reinforcement learning; Markov decision process; Policy gradient; Sequence pattern
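
The abstract states that the cumulative reward is standardized to reduce the variance of the policy gradient, but it does not give the exact formula. The following is a minimal sketch of one common choice, z-score normalization of discounted returns as often used with REINFORCE-style updates; it is an assumption, not the authors' confirmed method. The function names, the discount factor `gamma`, and the sample rewards are all hypothetical.

```python
import math


def discounted_returns(rewards, gamma=0.99):
    """Compute G_t = r_t + gamma * G_{t+1} for every step of one episode."""
    returns = [0.0] * len(rewards)
    running = 0.0
    # Walk backwards so each return reuses the one that follows it.
    for t in reversed(range(len(rewards))):
        running = rewards[t] + gamma * running
        returns[t] = running
    return returns


def standardize(values, eps=1e-8):
    """Shift values to zero mean and scale them to unit standard deviation."""
    n = len(values)
    mean = sum(values) / n
    std = math.sqrt(sum((v - mean) ** 2 for v in values) / n)
    # eps guards against division by zero when all returns are equal.
    return [(v - mean) / (std + eps) for v in values]


# Hypothetical per-step rewards, standing in for sequence pattern ratings.
rewards = [1.0, 0.0, 2.0, 1.0]
returns = discounted_returns(rewards, gamma=0.9)
weights = standardize(returns)
```

In a policy gradient update, each `weights[t]` would multiply the gradient of the log-probability of the action taken at step `t`. Centering and rescaling keeps these multipliers on a comparable scale across episodes, which is the usual motivation for this kind of variance reduction.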
This article is indexed in databases including VIP (维普) and Wanfang Data (万方数据).