1.
To address the high mobility and anomalous-node problems in unmanned aerial vehicle (UAV) communication, this paper proposes DTGR, a trusted geographic routing protocol for UAVs based on deep reinforcement learning. A trusted third party is introduced to provide node trust values, using the deviation between theoretical and measured delay, together with the packet loss rate, as trust evaluation factors. Route selection is modeled as a Markov decision process (MDP): the state space is built from node trust, geographic position, and neighbor topology, and a deep Q-network (DQN) outputs the routing decision. Trust is folded into the reward function to adjust action values and guide each node toward the optimal next hop. Simulation results show that, in a UAV ad hoc network (UANET) containing anomalous nodes, DTGR achieves lower average end-to-end delay and a higher packet delivery ratio than existing schemes. When the number or proportion of anomalous nodes changes, DTGR perceives the environment and makes routing decisions efficiently and intelligently, preserving network performance.
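As a rough illustration of how trust can shape a DQN routing reward, the sketch below combines the delay deviation and packet loss rate into a trust score and scales the value of a next-hop choice by it; the weights and reward form are illustrative assumptions, not the paper's exact formulas.

```python
def trust_score(delay_meas, delay_theory, loss_rate, w_delay=0.5, w_loss=0.5):
    """Trust from the theoretical-vs-measured delay deviation and the packet loss rate."""
    delay_dev = abs(delay_meas - delay_theory) / max(delay_theory, 1e-6)
    return max(0.0, 1.0 - (w_delay * delay_dev + w_loss * loss_rate))

def routing_reward(progress, trust, alpha=1.0, beta=2.0):
    """Reward geographic progress toward the destination, scaled by trust:
    hops through low-trust neighbors (trust < 1/beta) earn negative value."""
    return alpha * progress * (beta * trust - 1.0)

# Example: a neighbor 120 m closer to the destination that drops 30% of packets
t = trust_score(delay_meas=0.08, delay_theory=0.05, loss_rate=0.3)
print(t, routing_reward(120.0, t))
```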
2.
To address the difficulty traditional deep reinforcement learning algorithms have in quickly solving long-horizon complex tasks, a deep reinforcement learning method that incorporates historical information and human knowledge is proposed, improving on the classic Proximal Policy Optimization (PPO) algorithm. Historical states are added to the state space to capture the temporal dynamics of the environment, and an invalid-action mask based on human knowledge is added to the policy model, barring the agent from fruitless exploration, raising exploration efficiency, and thereby improving training performance. Simulation results show that the proposed method effectively solves intelligent decision-making in long-horizon complex tasks and converges significantly better than traditional deep reinforcement learning algorithms.
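Invalid-action masking is commonly implemented by pushing masked logits to minus infinity before sampling, so masked actions get zero probability. A minimal PyTorch sketch; the network shape, history length, and mask source are assumptions, not the paper's exact model.

```python
import torch
import torch.nn as nn

class MaskedPolicy(nn.Module):
    def __init__(self, obs_dim, n_actions, hidden=64):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(obs_dim, hidden), nn.Tanh(),
                                 nn.Linear(hidden, n_actions))

    def forward(self, obs, action_mask):
        logits = self.net(obs)
        # Set logits of invalid actions to -inf so their probability is 0
        logits = logits.masked_fill(~action_mask, float("-inf"))
        return torch.distributions.Categorical(logits=logits)

# Stacked history: concatenate the last k observations into one state
k, obs_dim, n_actions = 4, 8, 5
policy = MaskedPolicy(k * obs_dim, n_actions)
state = torch.randn(1, k * obs_dim)                      # last k observations, flattened
mask = torch.tensor([[True, True, False, True, False]])  # from domain rules
dist = policy(state, mask)
action = dist.sample()                                   # never selects a masked action
print(action.item(), dist.log_prob(action).item())
```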
3.
End-to-end driving decision-making is a research hotspot in autonomous driving. This paper studies end-to-end driving decisions with continuous action outputs, based on the DDPG (Deep Deterministic Policy Gradient) deep reinforcement learning algorithm. First, an end-to-end decision-control model based on DDPG is built: it takes continuously acquired perception information (such as steering angle, vehicle speed, and distance to the road) as the input state and outputs continuous control quantities for the driving actions (acceleration, braking, steering). The model is then trained and validated in different driving environments on the TORCS (The Open Racing Car Simulator) platform, and the results show that it achieves end-to-end driving decisions. Finally, a comparative analysis against a DQN (Deep Q-learning Network) model with discrete action outputs shows that the DDPG decision model achieves superior decision and control performance.
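A continuous-action DDPG actor of this kind typically bounds each control channel with a squashing activation. A minimal sketch; the 29-dimensional state and layer sizes loosely follow common TORCS setups and are assumptions, not the paper's exact network.

```python
import torch
import torch.nn as nn

class DrivingActor(nn.Module):
    def __init__(self, state_dim=29, hidden=256):
        super().__init__()
        self.body = nn.Sequential(nn.Linear(state_dim, hidden), nn.ReLU(),
                                  nn.Linear(hidden, hidden), nn.ReLU())
        self.steer = nn.Linear(hidden, 1)   # in [-1, 1] via tanh
        self.accel = nn.Linear(hidden, 1)   # in [0, 1] via sigmoid
        self.brake = nn.Linear(hidden, 1)   # in [0, 1] via sigmoid

    def forward(self, state):
        h = self.body(state)
        return torch.cat([torch.tanh(self.steer(h)),
                          torch.sigmoid(self.accel(h)),
                          torch.sigmoid(self.brake(h))], dim=-1)

actor = DrivingActor()
action = actor(torch.randn(1, 29))  # -> [steering, throttle, brake]
print(action)
```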
4.
Portfolio strategy is an evergreen topic in finance, and applying artificial intelligence to financial markets is an important research direction of the information age. Existing research concentrates mostly on stock price prediction, with relatively little on decision problems such as portfolio construction and automated trading. Building on deep reinforcement learning, this paper uses a deep-learning BiLSTM to predict whether prices will rise or fall; the reinforcement learning agent observes these predictions to better judge the current situation and determine its trading actions. At the same time, traditional portfolio strategies are used to establish pre-weights for trades, so that during automated trading the agent can compare against them, continually refine its strategy choices, and generate the optimal portfolio strategy for the current period. Experiments on 10 US stocks under a realistic market simulation show that the model based on deep reinforcement learning reaches a cumulative return of 86.5%, the highest return and lowest risk among the compared benchmark strategies, giving it practical value.
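The price-direction predictor can be sketched as a BiLSTM over a sliding window of daily features, classifying up versus down; window length, feature count, and layer sizes are illustrative assumptions.

```python
import torch
import torch.nn as nn

class BiLSTMClassifier(nn.Module):
    def __init__(self, n_features=5, hidden=64):
        super().__init__()
        self.lstm = nn.LSTM(n_features, hidden, batch_first=True,
                            bidirectional=True)
        self.head = nn.Linear(2 * hidden, 2)  # logits for {down, up}

    def forward(self, x):                     # x: (batch, window, features)
        out, _ = self.lstm(x)
        return self.head(out[:, -1, :])       # classify from the last step

model = BiLSTMClassifier()
window = torch.randn(8, 30, 5)                # 8 stocks, 30-day OHLCV windows
probs = torch.softmax(model(window), dim=-1)  # P(down), P(up) per stock
print(probs)
```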
5.
To address the frequent performance degradation of heuristic QoS optimization schemes in software-defined networking caused by mismatches between algorithm parameters and network characteristics, a software-defined networking QoS optimization algorithm based on deep reinforcement learning was proposed. First, network resources and state information were integrated into the network model; then, flow-perception capability was improved with a long short-term memory network; finally, a dynamic flow-scheduling strategy satisfying the specified QoS objectives was generated in combination with deep reinforcement learning. The experimental results show that, compared with existing algorithms, the proposed algorithm not only guarantees end-to-end delay and packet loss rate, but also improves network load balancing by 22.7% and throughput by 8.2%.
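The LSTM-then-policy pattern described can be sketched as follows: an LSTM summarizes a window of per-path measurements and a policy head scores candidate paths for the next flow. Dimensions, metrics, and the softmax head are assumptions, not the paper's architecture.

```python
import torch
import torch.nn as nn

class FlowScheduler(nn.Module):
    def __init__(self, n_paths=4, n_metrics=3, hidden=32):
        super().__init__()
        self.lstm = nn.LSTM(n_paths * n_metrics, hidden, batch_first=True)
        self.policy = nn.Linear(hidden, n_paths)   # one score per path

    def forward(self, history):   # history: (batch, time, paths*metrics)
        _, (h, _) = self.lstm(history)
        return torch.softmax(self.policy(h[-1]), dim=-1)

sched = FlowScheduler()
# 10 past samples of (delay, loss, utilization) for each of 4 paths
hist = torch.rand(1, 10, 12)
print(sched(hist))   # probability of routing the new flow on each path
```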
6.
To address the problems that baseline layout for direction-finding (DF) arrays in current engineering projects is time- and labor-consuming, relies heavily on technicians' DF experience, and leaves complex DF arrays difficult to design, a technique for automatically generating DF array baselines with deep reinforcement learning is proposed. Based on the correlative-interferometer DF mechanism, a DF array-layout agent is built with deep reinforcement learning, focusing on key technologies such as multi-scenario multi-entity simulation modeling, layout-agent construction, and DF performance evaluation. Exploiting reinforcement learning's repeated trial-and-error mechanism, the layout is iteratively optimized into an optimal DF array that meets the specifications, greatly improving layout efficiency and DF quality, and experiments demonstrate the method's effectiveness. Array baselines designed with this technique have been validated in DF trials on real projects, with every metric meeting practical engineering requirements.
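The trial-and-error core can be illustrated with a toy loop that perturbs element positions and keeps changes that lower a stand-in DF cost; the cost model and hill-climbing rule below are far simpler than the paper's agent and simulator and are purely illustrative.

```python
import numpy as np

rng = np.random.default_rng(0)

def df_cost(positions, wavelength=1.0):
    """Stand-in DF cost: penalize baselines prone to phase ambiguity (too
    long) and arrays with poor angular resolution (too little spread)."""
    d = np.abs(positions[:, None] - positions[None, :])
    base = d[np.triu_indices(len(positions), 1)]
    ambiguity = np.sum(base > wavelength / 2)   # phase-ambiguous pairs
    resolution = 1.0 / (base.max() + 1e-9)      # longer spread = finer DF
    return ambiguity + resolution

pos = rng.uniform(0.0, 2.0, size=5)             # 5 elements on a line (in wavelengths)
best = df_cost(pos)
for step in range(2000):                        # trial-and-error iterations
    cand = pos + rng.normal(0.0, 0.05, size=pos.shape)
    c = df_cost(cand)
    if c < best:                                # keep the improvement
        pos, best = cand, c
print(best, np.sort(pos))
```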
7.
Mechatronics, 2022
Path planning is one of the key technologies for mobile robot applications. However, traditional robot path planners respond slowly, which leads to long navigation completion times. In this paper, we propose a novel robot path planner (SOA+A2C) whose global and local planners are built with the seeker optimization algorithm (SOA) and the advantage actor-critic (A2C) algorithm, respectively. In addition, to address the poor convergence of deep reinforcement learning (DRL) agents trained on complex path planning tasks, and the path redundancy that arises when metaheuristic algorithms such as SOA are used for path planning, we propose an incremental map training method and a path de-redundancy method. Simulation results show that, first, the incremental map training method improves the convergence of the DRL agent on complex path planning tasks. Second, the path de-redundancy method effectively alleviates path redundancy without sacrificing the search capability of the metaheuristic algorithm. Third, the SOA+A2C path planner outperforms the Dijkstra & dynamic window approach (Dijkstra+DWA) and Dijkstra & timed elastic band (Dijkstra+TEB) path planners provided by the robot operating system (ROS) in path length, path planning response time, and navigation completion time. The developed SOA+A2C path planner can therefore serve as an effective tool for mobile robot path planning.
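One plausible reading of path de-redundancy is line-of-sight pruning: drop a waypoint whenever the straight segment that skips it stays collision-free. A sketch under that assumption, on a boolean occupancy grid:

```python
import numpy as np

def segment_free(grid, p, q, samples=50):
    """True if the straight segment p->q crosses no occupied cell."""
    for t in np.linspace(0.0, 1.0, samples):
        x, y = (1 - t) * np.asarray(p) + t * np.asarray(q)
        if grid[int(round(x)), int(round(y))]:
            return False
    return True

def prune(grid, path):
    """Greedily skip intermediate waypoints made redundant by line of sight."""
    out, i = [path[0]], 0
    while i < len(path) - 1:
        j = len(path) - 1
        while j > i + 1 and not segment_free(grid, path[i], path[j]):
            j -= 1                      # back off until the shortcut is free
        out.append(path[j])
        i = j
    return out

grid = np.zeros((10, 10), dtype=bool)
grid[4, 2:8] = True                     # a wall
path = [(0, 0), (1, 1), (2, 1), (3, 1), (4, 1), (5, 1), (6, 2), (7, 4), (8, 6), (9, 9)]
print(prune(grid, path))                # fewer waypoints, same clearance
```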
8.
With the development of fifth-generation (5G) communication technology, new application scenarios keep emerging, and network slicing can build multiple logically independent virtual networks on top of a shared physical network to meet the diverse service requirements of mobile networks. To improve the mobile network's ability to allocate resources on demand according to each slice's traffic, this paper proposes a network slice resource management algorithm based on deep reinforcement learning. The algorithm uses two long short-term memory networks to forecast statistics that cannot arrive in real time and to extract the dynamics of traffic volume caused by user mobility, and then combines them with the advantage actor-critic algorithm to make bandwidth allocation decisions that match each slice's service demand. Experimental results show that, compared with existing methods, the algorithm improves spectral efficiency by about 7.7% while still meeting users' delay and rate requirements.
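The two-LSTM-plus-actor layout described can be sketched as below: one LSTM forecasts delayed traffic statistics, another tracks mobility-driven load dynamics, and an actor head splits bandwidth across slices via softmax. All dimensions and wiring are illustrative assumptions.

```python
import torch
import torch.nn as nn

class SliceAllocator(nn.Module):
    def __init__(self, n_slices=3, hidden=32):
        super().__init__()
        self.stats_lstm = nn.LSTM(n_slices, hidden, batch_first=True)
        self.mobility_lstm = nn.LSTM(n_slices, hidden, batch_first=True)
        self.actor = nn.Linear(2 * hidden, n_slices)

    def forward(self, stats_hist, load_hist):
        _, (hs, _) = self.stats_lstm(stats_hist)
        _, (hm, _) = self.mobility_lstm(load_hist)
        feats = torch.cat([hs[-1], hm[-1]], dim=-1)
        return torch.softmax(self.actor(feats), dim=-1)  # bandwidth fractions

alloc = SliceAllocator()
stats = torch.rand(1, 12, 3)     # 12 past reporting periods, 3 slices
load = torch.rand(1, 12, 3)
print(alloc(stats, load))        # fraction of total bandwidth per slice
```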
9.
A distributed interference coordination strategy based on multi-agent deep reinforcement learning was investigated to meet the requirements of file-downloading traffic in interference networks. With the proposed strategy, the transmission scheme can be adjusted adaptively to the interference environment and traffic requirements while only a limited amount of information is exchanged among nodes. Simulation results show that, for an arbitrary number of users and arbitrary traffic requirements, the user-satisfaction loss of the proposed strategy relative to the optimal strategy with perfect future information does not exceed 11%.
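A toy sketch of the distributed setting: independent Q-learning agents, each picking a transmit-power level from a coarse local interference observation. The environment, observation, and reward below are stand-in assumptions, much simpler than the paper's strategy.

```python
import numpy as np

rng = np.random.default_rng(1)
n_agents, n_obs, n_actions = 3, 4, 3      # 3 links, coarse interference level, 3 power levels
Q = np.zeros((n_agents, n_obs, n_actions))
eps, lr, gamma = 0.1, 0.1, 0.9

def step(actions):
    """Stand-in environment: reward throughput, penalize suffered interference."""
    power = np.array(actions, dtype=float)
    interference = power.sum() - power    # interference seen by each link
    reward = np.log1p(power / (1.0 + interference))
    obs = np.minimum(interference.astype(int), n_obs - 1)
    return obs, reward

obs = np.zeros(n_agents, dtype=int)
for t in range(5000):
    greedy = Q[np.arange(n_agents), obs].argmax(axis=1)
    explore = rng.integers(n_actions, size=n_agents)
    a = np.where(rng.random(n_agents) < eps, explore, greedy)
    nxt, r = step(a)
    idx = np.arange(n_agents)
    Q[idx, obs, a] += lr * (r + gamma * Q[idx, nxt].max(axis=1) - Q[idx, obs, a])
    obs = nxt
print(Q.argmax(axis=2))   # learned power choice per local interference level
```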
10.
To address the problem that the feature data collected for the vehicle-following task in marine scenes are not rich enough, which results in long model convergence times and difficult training, a two-stage vehicle-following system was proposed. First, a semantic segmentation model predicts the number of pixels belonging to the followed target, and that pixel count is mapped to a position feature. Second, a deep reinforcement learning algorithm lets the control equipment make decision actions that keep the two moving objects within a safe distance. The experimental results show that the two-stage vehicle-following system converges about 40% faster than the model without the position feature, and that adding the position feature significantly improves following stability.
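A minimal sketch of the first stage: turn the segmentation mask's pixel count for the followed target into a normalized position feature appended to the RL state. The inverse-square-root distance proxy and reference fraction are assumptions, not the paper's mapping.

```python
import numpy as np

def position_feature(mask, ref_frac=0.05):
    """Map the target's pixel fraction to a rough distance cue: a nearer
    target covers more pixels, so distance scales like 1/sqrt(fraction)."""
    frac = mask.mean()                      # fraction of pixels on target
    if frac == 0:
        return np.array([0.0, 0.0])         # visibility flag, distance cue
    dist = np.sqrt(ref_frac / frac)         # 1.0 at the reference distance
    return np.array([1.0, dist])

# Example: a 480x640 mask where the target covers a 60x80 box
mask = np.zeros((480, 640), dtype=np.uint8)
mask[200:260, 300:380] = 1
feat = position_feature(mask)
state = np.concatenate([np.random.rand(6), feat])  # image features + position cue
print(feat, state.shape)
```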
11.
To increase the average throughput of IoT devices in a backscatter network, a resource allocation mechanism is proposed, built on a model that jointly optimizes user pairing and time-slot allocation. Because solving this model directly with a deep reinforcement learning (DRL) algorithm leads to a high-dimensional action space and a complex neural network, the model is decomposed into two layers of subproblems to reduce the action-space dimension. First, a deep reinforcement learning algorithm infers the current channel state from historical channel information and performs optimal user pairing; then, with the pairing fixed, a convex optimization algorithm allocates time slots to maximize the total throughput of the IoT devices. Simulation results show that, compared with other resource allocation methods, the proposed method effectively increases system throughput and exhibits good channel adaptability and convergence.
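The inner convex subproblem can be sketched as follows: with user pairs fixed, split a unit frame into time slots to maximize total throughput. The concave rate model tau*log2(1 + c/tau) and the gains c are stand-in assumptions, not the paper's exact objective.

```python
import numpy as np
from scipy.optimize import minimize

c = np.array([2.0, 0.8, 1.5, 0.4])        # effective SNR gain per fixed pair

def neg_throughput(tau):
    tau = np.maximum(tau, 1e-9)           # guard the log at the boundary
    return -np.sum(tau * np.log2(1.0 + c / tau))

cons = ({"type": "eq", "fun": lambda tau: tau.sum() - 1.0},)
bounds = [(0.0, 1.0)] * len(c)
res = minimize(neg_throughput, x0=np.full(len(c), 0.25),
               bounds=bounds, constraints=cons, method="SLSQP")
print(res.x, -res.fun)                    # slot shares and total throughput
```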