Similar Literature
20 similar articles found (search time: 0 ms)
1.
In human-robot collaborative manufacturing, operator safety is the primary concern during manufacturing operations. This paper presents a deep reinforcement learning approach to real-time collision-free motion planning of an industrial robot for human-robot collaboration. First, the safe human-robot collaborative manufacturing problem is formulated as a Markov decision process, and a mathematical expression of the reward-function design problem is given. The goal is for the robot to autonomously learn a policy that reduces the accumulated risk while assuring the task completion time during human-robot collaboration. To transform this optimization objective into a reward function that guides the robot toward the expected behaviour, a reward-function optimizing approach based on the deterministic policy gradient is proposed to learn a parameterized intrinsic reward function. The reward the agent uses to learn the policy is the sum of the intrinsic and extrinsic reward functions. Then, a deep reinforcement learning algorithm, intrinsic reward-deep deterministic policy gradient (IRDDPG), which combines the DDPG algorithm with the reward-function optimizing approach, is proposed to learn the expected collision-avoidance policy. Finally, the proposed algorithm is tested in a simulation environment; the results show that the industrial robot can learn the expected policy, assuring safety in industrial human-robot collaboration without missing the original target. Moreover, the reward-function optimizing approach compensates for deficiencies in the hand-designed reward function and improves policy performance.
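The additive reward composition described above can be sketched minimally. The linear intrinsic form and all names and values below are illustrative assumptions, not the paper's actual parameterization:

```python
def shaped_reward(extrinsic, state, action, theta):
    """Combine the task (extrinsic) reward with a learned intrinsic bonus.

    `theta` stands in for the learned intrinsic-reward parameters; a simple
    linear function of the state-action features is used purely for
    illustration.
    """
    intrinsic = sum(t * f for t, f in zip(theta, list(state) + list(action)))
    return extrinsic + intrinsic

# Hypothetical example: a risky state-action pair with a negative extrinsic
# reward is reshaped by the intrinsic term.
r = shaped_reward(-1.0, state=[0.2, 0.5], action=[0.1], theta=[1.0, -2.0, 0.5])
```

In IRDDPG the parameters playing the role of `theta` are themselves optimized by a deterministic policy gradient, so the shaping adapts during training rather than being fixed by hand.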

2.
In automotive paint shops, changes of color between consecutive production orders incur costs for cleaning the painting robots. A significant task is to resequence orders and group orders of identical color into color batches so as to minimize the color-changeover costs. In this paper, a Color-batching Resequencing Problem (CRP) with mix-bank buffer systems is considered. We propose a Color-Histogram (CH) model to describe the CRP as a Markov decision process and a Deep Q-Network (DQN) algorithm to solve the CRP, integrated with the virtual-car resequencing technique. The CH model significantly reduces the number of possible actions of the DQN agent, so that the DQN algorithm can be applied to the CRP at a practical scale. A DQN agent is trained in a deep reinforcement learning environment to minimize the color-changeover costs of the CRP. Two experiments with different assumptions on the order-attribute distributions and cost metrics were conducted and evaluated. Experimental results show that the proposed approach outperformed conventional algorithms under both conditions. The proposed agent can run in real time on a regular personal computer with a GPU; hence, the approach can be readily applied in the production control of automotive paint shops to resolve order-resequencing problems.
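The CH state abstraction and the changeover-cost objective can be sketched as follows. This is a generic illustration of the idea of collapsing buffer contents into per-lane color counts, with hypothetical lane data; it is not the paper's exact state encoding:

```python
from collections import Counter

def color_histogram(lanes):
    """CH-style state: per-lane counts of waiting car colors (order-free).

    Many concrete buffer contents map to the same histogram, which is what
    shrinks the state/action space for the DQN agent.
    """
    return tuple(tuple(sorted(Counter(lane).items())) for lane in lanes)

def changeover_cost(sequence):
    """Number of color switches between consecutive cars on the paint line."""
    return sum(1 for a, b in zip(sequence, sequence[1:]) if a != b)
```

For example, a release order `A A B B A` incurs two changeovers, while resequencing it to `A A A B B` incurs only one.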

3.
A Bayesian Network Structure Learning Algorithm Based on Genetic Algorithms and Reinforcement Learning (cited: 1; self-citations: 0; citations by others: 1)
Genetic algorithms search and optimize over a problem's solution space according to adaptive principles modeled on the laws of biological inheritance in nature. Bayesian networks are a primary method for modeling and reasoning over uncertain knowledge, and learning in Bayesian networks (both parameter learning and structure learning) is NP-hard. Reinforcement learning is an online learning method that updates its results from newly arriving sequential data. This paper describes how reinforcement learning can guide a genetic algorithm to learn Bayesian network structure effectively.

4.
It is argued that the backpropagation learning algorithm is unsuited to tackling real-world problems such as sensory-motor coordination learning or the encoding of large amounts of background knowledge in neural networks. One difficulty in the real world, the unavailability of ‘teachers’ who already know the solution to problems, may be overcome by using reinforcement learning algorithms in place of backpropagation. It is suggested that the complexity of the search space in real-world neural network learning problems may be reduced if learning is divided into two components. One component is concerned with abstracting structure from the environment and hence with developing representations of stimuli. The other involves associating and refining these representations on the basis of feedback from the environment. Time-dependent learning problems are also considered in this hybrid framework. Finally, an ‘open systems’ approach, in which subsets of a network may adapt independently on the basis of spatio-temporal patterns, is briefly discussed.

5.
Flying ad-hoc networks (FANETs) are widely used to provide network communication services in military, emergency-relief, and environmental-monitoring scenarios, and a good routing protocol is essential for reliable transmission under harsh communication conditions. Using reinforcement learning to cast route selection as a Markov decision process has become a research hotspot. To further survey research on reinforcement-learning-based FANET routing protocols, this paper first reviews recent improvements to traditional FANET routing protocols; it then presents the latest survey results on reinforcement-learning-based FANET routing protocols in detail, mines the modeling patterns for states, actions, and rewards in these routing algorithms, and compares them in terms of routing optimization criteria and the reinforcement-learning optimization process; finally, it summarizes the current state of research on reinforcement-learning-based FANET routing protocols and offers an outlook.

6.
To cope with the sharp increase in data throughput in fifth-generation wireless networks, mobile edge caching has become an effective solution: by storing network content on edge devices, it relieves the backhaul links and core network and shortens service latency. To date, most edge-caching research has focused on optimizing cooperative content caching while neglecting the efficiency of content delivery. This paper studies joint cooperative content edge caching and wireless bandwidth allocation in ultra-dense networks. The total similarity between base stations is computed from cosine similarity and Gaussian similarity, and the small base stations in the network are grouped by total similarity. The caching and bandwidth-allocation problem is modeled as a long-term mixed-integer nonlinear program (LT-MINLP), and the cooperative edge caching and bandwidth allocation problem is then recast as a constrained Markov decision process. Using the deep deterministic policy gradient (DDPG) model, a deep-reinforcement-learning-based content cooperative edge caching and bandwidth allocation algorithm, CBDDPG, is proposed. The base-station grouping scheme increases opportunities for file sharing between base stations, and the CBDDPG caching scheme exploits DDPG's twin-network mechanism to better capture users' request patterns and optimize cache placement. CBDDPG is compared with three baseline algorithms (RBDDPG, LCCS, and CB-TS); experimental results show that the proposed scheme effectively improves the content cache hit rate, reduces content delivery latency, and improves user experience.
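The combined base-station similarity might look like the sketch below. The abstract does not specify how the cosine and Gaussian terms are merged, so the mixing weight `w` and the bandwidth `sigma` are assumptions for illustration:

```python
import math

def cosine_sim(u, v):
    """Cosine similarity between two base-station feature vectors."""
    num = sum(a * b for a, b in zip(u, v))
    den = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return num / den if den else 0.0

def gaussian_sim(u, v, sigma=1.0):
    """Gaussian (RBF) similarity based on squared Euclidean distance."""
    d2 = sum((a - b) ** 2 for a, b in zip(u, v))
    return math.exp(-d2 / (2 * sigma ** 2))

def total_sim(u, v, w=0.5, sigma=1.0):
    """Total similarity as a weighted blend; `w` is an assumed parameter."""
    return w * cosine_sim(u, v) + (1 - w) * gaussian_sim(u, v, sigma)
```

Stations whose pairwise `total_sim` exceeds a threshold would then be grouped, increasing the chance that cached files can be shared within a group.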

7.
To improve the control capability of time-delay systems in mobile sensor networks, a reinforcement-learning-based control model for such systems is proposed. A high-order approximate differential equation is used to construct the control objective function of the network's time-delay system, and maximum-likelihood estimation is applied to estimate the network's delay parameters. Reinforcement learning handles convergence control and adaptive scheduling, a multidimensional measurement information registration model is built for delay-system control, and adaptive control of the time-delay system is achieved under a reinforcement tracking-learning optimization regime. Simulation results show that the method provides good adaptivity, accurate delay-parameter estimation, and strong robustness in the control process.

8.
To improve vehicle positioning accuracy in the Internet of Vehicles, a cooperative positioning method is proposed that fuses vehicle-mounted radar ranging information with global navigation satellite system (GNSS) information. The method builds a mathematical model using a maximum-likelihood strategy, which is in essence a nonlinear optimization problem. The problem is simplified into a quadratic program with multiple quadratic equality constraints, and a semidefinite relaxation method is given that efficiently approximates the original problem; the approximate solution is then further refined by eigenvalue decomposition. Simulation results show that the cooperative positioning obtained by this information-fusion method is significantly more accurate than linearized weighted least squares, and matches the accuracy of a back-propagation (BP) neural network positioning method trained on a larger dataset, while requiring no prior model training, thus enabling high-precision real-time positioning.
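The final refinement step, recovering a near-rank-1 solution from the relaxed SDP solution matrix, can be illustrated generically with power iteration to extract the dominant eigenvector. This is a numerical sketch of the idea, not the paper's exact eigenvalue-decomposition procedure:

```python
def rank1_refine(M, iters=50):
    """Extract the dominant eigenvector of a small symmetric matrix `M`
    (nested lists) via power iteration, as a stand-in for refining an
    SDP-relaxation solution toward a rank-1 (i.e., feasible) answer."""
    n = len(M)
    v = [1.0] * n  # arbitrary non-degenerate starting vector
    for _ in range(iters):
        w = [sum(M[i][j] * v[j] for j in range(n)) for i in range(n)]
        norm = sum(x * x for x in w) ** 0.5
        v = [x / norm for x in w]
    return v
```

For a relaxed solution that is already close to rank 1, the dominant eigenvector (scaled by the square root of its eigenvalue) serves as the refined position estimate.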

9.
To address the difficulty that Internet-of-Vehicles (IoV) federated-learning services have in meeting users' needs for personalized model training, an innovative customized-service framework for IoV federated learning is proposed. The framework adopts a federated-learning aggregation algorithm that fuses device contribution and dataset similarity, realizing personalized federated learning. Through different weight-assignment schemes and similarity computations, the algorithm lets different users choose model-training schemes suited to their own needs and data characteristics. The framework also proposes a double-sampling validation method to address model performance and trustworthiness, and uses smart contracts to support data collaboration and safeguard data security. Experimental results show that the proposed algorithm achieves high accuracy in most experimental scenarios, and that the framework can significantly improve the personalization of IoV services while guaranteeing model accuracy and reliability.
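The contribution-and-similarity weighted aggregation could be sketched as below. The multiplicative blending of the two factors is an assumption for illustration, not the paper's exact formula:

```python
def aggregate(models, contributions, similarities):
    """Weighted FedAvg-style aggregation.

    Each client's weight blends its data contribution with its dataset
    similarity to the requesting user, so the aggregated model is
    personalized toward similar, high-contribution clients.
    """
    raw = [c * s for c, s in zip(contributions, similarities)]
    total = sum(raw)
    weights = [r / total for r in raw]
    dim = len(models[0])
    return [sum(w * m[i] for w, m in zip(weights, models)) for i in range(dim)]
```

With equal similarity, a client contributing three times as much data pulls the aggregate three times as strongly toward its parameters.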

10.
李凯文, 张涛, 王锐, 覃伟健, 贺惠晖, 黄鸿. 《自动化学报》(Acta Automatica Sinica), 2021, 47(11): 2521-2537
Combinatorial optimization problems are ubiquitous in national defense, transportation, industry, daily life, and many other domains. For decades, traditional operations-research methods have been the main means of solving them, but as problem scales in practical applications keep growing and real-time solution requirements rise, traditional algorithms face heavy computational pressure and struggle to solve combinatorial optimization problems online. In recent years, with the rapid development of deep learning, the notable achievements of deep reinforcement learning in Go, robotics, and other fields have demonstrated its powerful learning and sequential decision-making abilities. Accordingly, many new methods that apply deep reinforcement learning to combinatorial optimization have emerged; they offer fast solution speed and strong model generalization, providing a brand-new way of approaching these problems. This paper reviews recent theory and applications of deep-reinforcement-learning methods for combinatorial optimization, summarizes their basic principles, related methods, and applied research, and points out several open problems that the field urgently needs to solve.

11.
Wireless sensor networks are vulnerable to various insider attacks, and intrusion-detection systems must expend considerable energy on attack detection to keep the network secure. For the intrusion-detection problem in wireless sensor networks, an attack-defense game model between malicious nodes (MN) and cluster head nodes (CHN) is established, and a reinforcement-learning-based cluster-head intrusion-detection algorithm, the weighted policy learner with approximate policy prediction (WPL-APP), is proposed. Experiments show that when cluster heads use this algorithm for dynamic detection and defense against malicious nodes, both sides of the game quickly reach evolutionary equilibrium, avoiding heavy detection-energy consumption and fluctuations in network security performance.

12.
To benefit from the accurate simulation and high-throughput data contributed by advanced digital twin technologies in modern smart plants, the deep reinforcement learning (DRL) method is an appropriate choice to generate a self-optimizing scheduling policy. This study employs the deep Q-network (DQN), a successful DRL method, to solve the dynamic scheduling problem of flexible manufacturing systems (FMSs) involving shared resources, route flexibility, and stochastic arrivals of raw products. To model the system in consideration of both manufacturing efficiency and deadlock avoidance, we use a class of Petri nets combining timed-place Petri nets with a system of simple sequential processes with resources (S3PR), named the timed S3PR. The dynamic scheduling problem of the timed S3PR is defined as a Markov decision process (MDP) that can be solved by the DQN. For constructing deep neural networks to approximate the DQN action-value function that maps timed S3PR states to scheduling rewards, we employ a graph convolutional network (GCN) as the timed S3PR state approximator by proposing a novel graph convolution layer called a Petri-net convolution (PNC) layer. The PNC layer uses the input and output matrices of the timed S3PR to compute the propagation of features from places to transitions and from transitions to places, thereby reducing the number of parameters to be trained and ensuring robust convergence of the learning process. Experimental results verify that the proposed DQN with a PNC network provides better solutions for dynamic scheduling problems in terms of manufacturing performance, computational efficiency, and adaptability compared with heuristic methods and a DQN with basic multilayer perceptrons.
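A minimal sketch of the place-to-transition-to-place feature propagation that the PNC layer performs, using the net's input (`pre`) and output (`post`) incidence matrices. Trainable weights and nonlinearities, which a real PNC layer would add, are omitted:

```python
def matvec(M, x):
    """Multiply matrix `M` (list of rows) by vector `x`."""
    return [sum(m * xj for m, xj in zip(row, x)) for row in M]

def pnc_layer(pre, post, place_feats):
    """One PNC-style propagation pass (a sketch).

    Place features are gathered onto transitions through the input matrix
    `pre` (place x transition), then pushed back onto places through the
    output matrix `post` (place x transition, used row-wise).
    """
    # transitions gather from their input places: t = pre^T . p
    preT = list(map(list, zip(*pre)))
    trans_feats = matvec(preT, place_feats)
    # places gather from their input transitions: p' = post . t
    return matvec(post, trans_feats)
```

Because the propagation pattern is fixed by the net's structure, only the (omitted) feature-transform weights need training, which is the parameter saving the abstract refers to.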

13.
How to design a System of Systems (SoS) has drawn wide attention in recent years, especially in military applications. This problem, also known as SoS architecting, can be boiled down to two subproblems: selecting a number of systems from a set of candidates and specifying the tasks to be completed for each selected system. Essentially, such a problem reduces to a combinatorial optimization problem. Traditional exact solvers such as the branch-and-bound algorithm are not efficient enough for large-scale cases. Heuristic algorithms are more scalable, but if the input changes, they must restart the search. This re-search may take a long time and interfere with the mission of the SoS in highly dynamic scenarios, e.g., Mosaic Warfare. In this paper, we combine artificial intelligence with SoS architecting and propose a deep reinforcement learning approach, DRL-SoSDP, for SoS design. Deep neural networks and actor-critic algorithms are used to find the optimal solution under constraints. Evaluation results show that the proposed approach is superior to heuristic algorithms in both solution quality and computation time, especially in large-scale cases. DRL-SoSDP can find good solutions in near real time, showing great potential for cases that require an instant reply. DRL-SoSDP also shows good generalization ability and can find better results than heuristic algorithms even when the scale of the SoS is much larger than in the training data.

14.
Recommender-system algorithms that learn representation vectors of users and items have achieved good results on large-scale data. Compared with early classic recommendation algorithms based on matrix factorization (MF), the deep-learning-based methods popular in recent years generalize better on sparse datasets. However, many methods consider only the two-dimensional rating matrix, or simply embed the various attributes, ignoring the internal relations among attributes. Heterogeneous information networks (HIN) can store richer semantic features than homogeneous networks, and recommender systems that combine HINs with deep learning, mining key semantic information through meta-paths, have become a research hotspot.
To better mine the correlation between auxiliary information and user preferences, this paper combines tensor factorization, heterogeneous information networks, and deep learning to propose a new model, hin-dcf. First, a scenario-specific heterogeneous information network is built from the dataset, and for each meta-path, a relevance matrix is generated from the path information in the heterogeneous graph. Next, the relevance matrices of the different meta-paths are merged into a tensor with three dimensions: user, item, and meta-path. A classic tensor-factorization algorithm then maps users, items, and meta-paths into latent-semantic vector spaces of the same dimension, and the resulting latent vectors initialize the input layer of a deep neural network. Considering that different users have different preferences over meta-paths, an attention mechanism is incorporated to learn preference weights between users, items, and meta-paths. In the experiments, the model effectively improves accuracy and copes better with data sparsity. Possible future research directions are given at the end.
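The per-user attention over meta-paths can be illustrated with a plain softmax over dot-product affinities. The vectors below are hypothetical, and this is a sketch of the attention idea rather than the hin-dcf network itself:

```python
import math

def metapath_attention(user_vec, path_vecs):
    """Softmax attention over meta-path embeddings.

    Each meta-path is weighted by its dot-product affinity with the user's
    latent vector, so users with different tastes weight paths differently.
    """
    scores = [sum(u * p for u, p in zip(user_vec, pv)) for pv in path_vecs]
    m = max(scores)  # subtract max for numerical stability
    exps = [math.exp(s - m) for s in scores]
    z = sum(exps)
    return [e / z for e in exps]
```

The learned weights would then scale each meta-path's contribution before the fused representation enters the deep network.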

15.
Fog Computing (FC) based IoT applications are encountering a bottleneck in data management and resource optimization due to dynamic IoT topologies, resource-limited devices, resource diversity, mismatched service quality, and complicated service-offering environments. The existing problems and emerging demands of FC-based IoT applications are hard to meet with the traditional IP-based Internet model. Therefore, in this paper we focus on the Content-Centric Network (CCN) model to provide more efficient, flexible, and reliable data and resource management for fog-based IoT systems. We first propose a Deep Reinforcement Learning (DRL) algorithm that jointly considers the content type and the status of fog servers for content-centric data and computation offloading. Then, we introduce a novel virtual layer called FogOrch that orchestrates the management and performance requirements of fog-layer resources in an efficient manner via the proposed DRL agent. To show the feasibility of FogOrch, we develop a content-centric data offloading scheme (DRLOS) based on the DRL algorithm running on FogOrch. Through extensive simulations, we evaluate the performance of DRLOS in terms of total reward, computational workload, computation cost, and delay. The results show that the proposed DRLOS is superior to existing benchmark offloading schemes.

16.
This paper addresses the dynamic flow-shop scheduling problem (DFSP) with the objective of minimizing the makespan, proposing an adaptive deep reinforcement learning algorithm (ADRLA) to solve it. First, the dynamic arrival of new jobs in the DFSP is modeled as a Poisson process, and the solution process is described as a Markov decision process (MDP), turning the DFSP into a sequential decision problem solvable by reinforcement learning. Then, based on the characteristics of the DFSP's sequencing model, state-feature vectors with good discriminative power and generalization are designed; five specific actions (i.e., dispatching rules) are proposed to select the job to process next; and a problem-specific reward function is constructed to evaluate the effect of executing an action (i.e., the reward value), thereby fixing the three basic elements of ADRLA. A double deep Q-network (DDQN) serves as the agent in ADRLA for making scheduling decisions. After training on a dataset built from a small number of small-scale DFSP instances (i.e., the three basic elements on different problems), the agent can accurately capture the nonlinear relation between the state-feature vectors and the Q-value vectors (composed of each action's Q-value) of DFSPs of different scales, and can thus schedule DFSPs of various scales adaptively and in real time. Finally, simulation experiments on different test problems and comparisons with other algorithms verify the effectiveness and real-time performance of ADRLA on the DFSP.
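ADRLA's DDQN agent relies on the standard double-Q bootstrap target, which can be sketched as follows (the Q-values below are hypothetical; in ADRLA each action index corresponds to one of the five dispatching rules):

```python
def ddqn_target(reward, next_q_online, next_q_target, gamma=0.99, done=False):
    """Double-DQN bootstrap target.

    The online network picks the best next action; the target network
    evaluates it. Decoupling selection from evaluation curbs the Q-value
    overestimation of vanilla DQN.
    """
    if done:
        return reward
    best_action = max(range(len(next_q_online)), key=next_q_online.__getitem__)
    return reward + gamma * next_q_target[best_action]
```

Training minimizes the squared difference between this target and the online network's Q-value for the taken action.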

17.
To address the problem that the trustworthiness of edge nodes in the Internet of Vehicles cannot be guaranteed, a reputation-based trusted task-offloading model is proposed, in which edge-node reputations recorded on a blockchain are used to assess trustworthiness, helping terminal devices select reliable edge nodes for task offloading. The offloading policy is modeled as a latency-and-energy minimization problem under reputation constraints, and a multi-agent deep deterministic policy gradient algorithm is used to find a near-optimal solution to this NP-hard problem. Edge servers earn rewards according to how well offloaded tasks are completed and then update the reputations recorded on the blockchain accordingly. Simulation experiments show that, compared with benchmark schemes, the algorithm reduces latency and energy consumption by 25.58% to 27.44%.
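One simple way to realize the reputation update from task-completion outcomes is an exponential moving average. The update rule and the smoothing factor `alpha` below are illustrative assumptions, since the abstract does not give the exact formula:

```python
def update_reputation(rep, completed, alpha=0.1):
    """EWMA-style reputation update recorded after each offloaded task.

    `rep` is the node's current reputation in [0, 1]; completing a task
    pulls it toward 1.0, failing pulls it toward 0.0. The result would be
    written back to the blockchain record.
    """
    outcome = 1.0 if completed else 0.0
    return (1 - alpha) * rep + alpha * outcome
```

Terminal devices would then prefer edge nodes whose on-chain reputation exceeds the constraint threshold when choosing where to offload.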

18.
In the Internet of Things (IoT), a huge amount of valuable data is generated by various IoT applications. As IoT technologies become more complex, attack methods diversify and can cause serious damage. Thus, establishing a secure IoT network based on user trust evaluation, to defend against security threats and ensure the reliability of the sources of collected data, has become an urgent issue. In this paper, a Data Fusion and transfer learning empowered granular Trust Evaluation mechanism (DFTE) is proposed to address these challenges. Specifically, to meet the granularity demands of trust evaluation, time-space empowered fine/coarse-grained trust evaluation models are built using deep transfer learning algorithms based on data fusion. Moreover, to prevent privacy leakage and task sabotage, a dynamic reward and punishment mechanism is developed that encourages honest users by dynamically adjusting the scale of reward or punishment and accurately evaluating users' trust. Extensive experiments show that (i) the proposed DFTE achieves high trust-evaluation accuracy under different granular demands through efficient data fusion, and (ii) DFTE performs excellently in participation rate and data reliability.

19.
With the continuous development of the smart grid, the diversification of power services gives rise to differing service requirements. Network slicing in 5G can provide the smart grid with virtualized dedicated wireless networks to meet its many challenges in security, reliability, and latency. Considering the differentiated services of the smart grid, this paper applies deep reinforcement learning (DRL) to the resource-allocation problem of radio access network (RAN) slicing for the smart grid. It first reviews the smart-grid background and related research on network slicing, then analyzes a RAN slicing model for the smart grid and proposes a DRL-based slice-allocation strategy. Simulations show that the proposed algorithm can satisfy the smart grid's resource-allocation needs on the RAN side to the greatest extent while reducing cost.

20.
To address base-station coverage holes and local traffic overload in urban vehicular networks, a dynamic pre-deployment scheme based on vehicle-trajectory prediction is proposed. First, to train a unified seq2seq-GRU trajectory-prediction model, multiple UAVs carrying edge-computing servers operate under a distributed federated-learning and blockchain architecture that removes the central aggregation node: using a modified Raft algorithm, a node is elected in each training round according to the amount of data it has contributed, and this node performs parameter aggregation and model updating. Second, based on the model's predictions, an improved virtual-force-guided deployment algorithm is proposed, in which virtual forces guide the UAVs' dynamic deployment to improve vehicle access rates and communication quality. Simulation results show that the proposed training architecture accelerates model training, and that the deployment algorithm improves both the vehicle access rate and the communication quality between vehicles and UAVs.
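The contribution-based election that replaces vanilla Raft's randomized timeouts can be sketched as a deterministic argmax over per-round data contributions. Node names and the tie-break rule (smallest id wins) are assumptions for illustration:

```python
def elect_aggregator(contributions):
    """Pick this round's aggregation node as the UAV that contributed the
    most training data, instead of Raft's random election timeouts.

    `contributions` maps node id -> data volume contributed this round.
    Iterating over sorted ids makes ties resolve to the smallest id.
    """
    return max(sorted(contributions), key=lambda node: contributions[node])
```

After aggregation, the elected node would broadcast the updated seq2seq-GRU parameters and record the round on the blockchain, and the election repeats next round with fresh contribution counts.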


Copyright©北京勤云科技发展有限公司  京ICP备09084417号