Research on proof of work mining dilemma based on policy gradient algorithm


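Cite as:	WANG Tiantian, YU Shuangyuan, XU Baomin. Research on proof of work mining dilemma based on policy gradient algorithm [J]. Journal of Computer Applications (计算机应用), 2019, 39(5): 1336-1342.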

Authors:	WANG Tiantian, YU Shuangyuan, XU Baomin

Affiliation:	School of Computer and Information Technology, Beijing Jiaotong University, Beijing 100044, China (all three authors)

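Funding:	National Natural Science Foundation of China (61572005); Key Project of Science and Technology Research in Higher Education of Hebei Province (ZD2017304).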

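Abstract (translated from the Chinese):	To address the mining dilemma caused by block withholding attacks under the Proof of Work (PoW) consensus mechanism in blockchains, the game between mining pools is modeled as an Iterated Prisoner's Dilemma (IPD), and the policy gradient algorithm from deep reinforcement learning is used to study strategy selection in the IPD. Each mining pool is treated as an independent agent, and the infiltration rate of its miners is quantified as the action distribution in reinforcement learning. The policy network of the policy gradient algorithm predicts and optimizes each agent's actions so as to maximize the per-miner revenue, and simulation experiments verify the effectiveness of the algorithm. The experiments show that in the early stage the pools attack each other and the average revenue is below 1, which is the Nash equilibrium problem. After self-adjustment by the policy gradient algorithm, the pools shift from mutual attack to mutual cooperation: the infiltration rate of each pool tends to 0 and the per-miner revenue tends to 1. The results indicate that the policy gradient algorithm can resolve the Nash equilibrium problem of the mining dilemma and maximize the per-miner revenue of the pools.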
Keywords:	blockchain; Proof of Work (PoW) mechanism; game theory; deep reinforcement learning; policy gradient algorithm
Received:	2018-11-01
Revised:	2018-12-30
Research on proof of work mining dilemma based on policy gradient algorithm

WANG Tiantian, YU Shuangyuan, XU Baomin. Research on proof of work mining dilemma based on policy gradient algorithm [J]. Journal of Computer Applications, 2019, 39(5): 1336-1342.

Authors:	WANG Tiantian, YU Shuangyuan, XU Baomin

Affiliation:	School of Computer and Information Technology, Beijing Jiaotong University, Beijing 100044, China

Abstract:	To address the mining dilemma caused by block withholding attacks under the Proof of Work (PoW) consensus mechanism in blockchains, the game between mining pools was modeled as an Iterated Prisoner's Dilemma (IPD), and the policy gradient algorithm of deep reinforcement learning was used to study strategy choice in the IPD. Each mining pool was treated as an independent agent, and the miners' infiltration rate was quantified as the action distribution in reinforcement learning. The policy network of the policy gradient algorithm was used to predict and optimize each agent's behavior in order to maximize the miners' average revenue, and the effectiveness of the algorithm was validated through simulation experiments. The experiments show that the mining pools attack each other at the beginning, with the miners' average revenue below 1, which is the Nash equilibrium problem. After self-adjustment by the policy gradient algorithm, the relationship between the pools changes from mutual attack to mutual cooperation, with the infiltration rate of each pool tending to 0 and the miners' average revenue tending to 1. The results show that the policy gradient algorithm can solve the Nash equilibrium problem of the mining dilemma and maximize the miners' average revenue.

Keywords:	blockchain; Proof of Work (PoW); game theory; deep reinforcement learning; policy gradient algorithm
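
The abstract reports that mutual attack yields an average revenue below 1 while zero infiltration yields a revenue of 1, but it does not state the revenue model. The following is a minimal sketch, assuming the two-pool block withholding model commonly used in analyses of the miner's dilemma: total mining power is normalized to 1, infiltrating miners earn a share of the victim pool's revenue but never submit full blocks, and per-miner revenue is normalized so that honest mining pays exactly 1. The function name `revenue_densities` and the parameters `m1`, `m2`, `x12`, `x21` are hypothetical and are not taken from the paper.

```python
def revenue_densities(m1, m2, x12, x21):
    """Per-miner revenue (r1, r2) of two pools that may infiltrate each other.

    Assumed model, not given in the abstract:
      m1, m2 : shares of total mining power held by pool 1 and pool 2
      x12    : power that pool 1 sends to infiltrate pool 2
      x21    : power that pool 2 sends to infiltrate pool 1
    Total mining power is 1, so honest mining yields a revenue of 1.
    """
    if m1 <= 0 or m2 <= 0 or m1 + m2 > 1:
        raise ValueError("pool sizes must be positive and sum to at most 1")
    if not (0 <= x12 <= m1 and 0 <= x21 <= m2):
        raise ValueError("a pool cannot infiltrate with more power than it has")
    if x12 + x21 >= 1:
        raise ValueError("some mining power must still produce blocks")

    # Infiltrators withhold blocks, so only the remaining power finds blocks.
    effective = 1.0 - x12 - x21
    direct1 = (m1 - x12) / effective  # block revenue earned by pool 1 itself
    direct2 = (m2 - x21) / effective  # block revenue earned by pool 2 itself

    # Each pool splits its direct revenue plus what its infiltrators bring
    # back among loyal miners and the infiltrators it hosts:
    #   r1 * (m1 + x21) = direct1 + x12 * r2
    #   r2 * (m2 + x12) = direct2 + x21 * r1
    # Solving this 2x2 linear system gives the closed form below.
    denom = m1 * m2 + m1 * x12 + m2 * x21
    r1 = (m2 * direct1 + x12 * (direct1 + direct2)) / denom
    r2 = (m1 * direct2 + x21 * (direct1 + direct2)) / denom
    return r1, r2


if __name__ == "__main__":
    for x12, x21 in [(0.0, 0.0), (0.05, 0.0), (0.1, 0.1)]:
        r1, r2 = revenue_densities(0.3, 0.3, x12, x21)
        print(f"x12={x12:.2f} x21={x21:.2f} -> r1={r1:.4f} r2={r2:.4f}")
```

Under these assumptions the three cases have the prisoner's dilemma structure described in the abstract: with no infiltration both pools earn 1, a unilateral attacker earns more than 1 at the victim's expense, and when both attack both earn less than 1. In a reinforcement learning setup along the lines of the abstract, these per-miner revenues would serve as the agents' rewards and the infiltration rates as their actions.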
This article is indexed in databases including VIP and Wanfang Data.