
RFID Indoor Positioning Algorithm Based on Proximal Policy Optimization
Citation: LI Li, ZHENG Jia-li, LUO Wen-cong, QUAN Yi-xuan. RFID Indoor Positioning Algorithm Based on Proximal Policy Optimization[J]. Computer Science, 2021, 48(4): 274-281.
Authors: LI Li  ZHENG Jia-li  LUO Wen-cong  QUAN Yi-xuan
Affiliation: School of Computer, Electronics and Information, Guangxi University, Nanning 530004, China; Guangxi Key Laboratory of Multimedia Communications and Network Technology, Nanning 530004, China
Funding: Guangxi Natural Science Foundation; National Natural Science Foundation of China
Abstract: In dynamic Radio Frequency Identification (RFID) indoor positioning environments, the positioning error and computational complexity of traditional indoor positioning models grow as the number of positioning targets increases. To address this, this paper proposes an RFID indoor positioning algorithm based on Proximal Policy Optimization (PPO), which treats indoor positioning as a Markov decision process: action evaluation is combined with random actions, the expected return of each action is maximized, and the optimal coordinate value is selected. The algorithm also introduces a clipped probability ratio. With actions constrained to a bounded range, it alternates between the pre-sampling (old) and post-sampling (new) policies, updates the policy by stochastic gradient ascent over multiple epochs of minibatches, and evaluates actions with a critic network; training then yields the PPO positioning model. The algorithm effectively reduces positioning error and improves positioning efficiency while converging faster, and it greatly lowers computational complexity when handling a large number of positioning targets. Experimental results show that, compared with other RFID indoor positioning algorithms such as Twin Delayed Deep Deterministic Policy Gradient (TD3), Deep Deterministic Policy Gradient (DDPG) and Actor-Critic using Kronecker-Factored Trust Region (ACKTR), the proposed algorithm reduces the average positioning error by 36.361%, 30.696% and 28.167%, improves positioning stability by 46.691%, 34.926% and 16.911%, and lowers computational complexity by 84.782%, 70.213% and 63.158%, respectively.
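The clipped probability ratio the abstract refers to can be made concrete. Below is a minimal PyTorch sketch of the standard PPO clipped-surrogate objective; it illustrates the general PPO-Clip technique, not the authors' code, and the tensor names and the clip range of 0.2 are assumptions.

```python
import torch

def ppo_clip_loss(new_log_probs: torch.Tensor,
                  old_log_probs: torch.Tensor,
                  advantages: torch.Tensor,
                  clip_eps: float = 0.2) -> torch.Tensor:
    """PPO clipped surrogate objective (negated so descent maximizes it)."""
    # Probability ratio r = pi_new(a|s) / pi_old(a|s), computed from log-probs.
    ratio = torch.exp(new_log_probs - old_log_probs)
    # Clipping restrains the post-sampling (new) policy to within
    # [1 - eps, 1 + eps] of the pre-sampling (old) policy.
    unclipped = ratio * advantages
    clipped = torch.clamp(ratio, 1.0 - clip_eps, 1.0 + clip_eps) * advantages
    # PPO maximizes the pessimistic (elementwise minimum) surrogate.
    return -torch.min(unclipped, clipped).mean()
```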

Keywords: RFID  Indoor positioning  Deep reinforcement learning  Clipped probability ratios
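The abstract also describes updating the policy by stochastic gradient ascent over multiple epochs of minibatches while a critic network evaluates the sampled actions. The sketch below shows one such update round, reusing ppo_clip_loss from above; the actor/critic interfaces, optimizers, and hyperparameters are illustrative assumptions, and the mapping from RFID signal readings to states and coordinate actions is left abstract.

```python
import torch
import torch.nn.functional as F

def ppo_update(actor, critic, actor_opt, critic_opt,
               states, actions, old_log_probs, returns,
               epochs: int = 10, minibatch: int = 64) -> None:
    """Several epochs of minibatch SGD over one batch of sampled transitions.

    Assumes actor(states) returns a torch.distributions object whose
    log_prob(actions) yields one scalar per sample, and critic(states)
    returns a state-value estimate of shape (N, 1).
    """
    n = states.size(0)
    for _ in range(epochs):
        # Reshuffle the batch and split it into minibatches each epoch.
        for idx in torch.randperm(n).split(minibatch):
            values = critic(states[idx]).squeeze(-1)
            # Advantage estimate; detached so the actor loss does not
            # backpropagate into the critic network.
            advantages = (returns[idx] - values).detach()
            new_log_probs = actor(states[idx]).log_prob(actions[idx])
            actor_loss = ppo_clip_loss(new_log_probs, old_log_probs[idx], advantages)
            critic_loss = F.mse_loss(values, returns[idx])
            actor_opt.zero_grad(); actor_loss.backward(); actor_opt.step()
            critic_opt.zero_grad(); critic_loss.backward(); critic_opt.step()
```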

This article is indexed in databases including VIP and Wanfang Data.
