20 similar documents found; search took 0 ms
1.
In the Internet of Things (IoT), a huge amount of valuable data is generated by various IoT applications. As IoT technologies become more complex, attack methods grow more diversified and can cause serious damage. Thus, establishing a secure IoT network based on user trust evaluation to defend against security threats and to ensure the reliability of the sources of collected data has become an urgent issue. In this paper, a Data Fusion and transfer learning empowered granular Trust Evaluation mechanism (DFTE) is proposed to address these challenges. Specifically, to meet the granularity demands of trust evaluation, time–space empowered fine/coarse grained trust evaluation models are built using deep transfer learning algorithms based on data fusion. Moreover, to prevent privacy leakage and task sabotage, a dynamic reward and punishment mechanism is developed that encourages honest users by dynamically adjusting the scale of reward or punishment and accurately evaluating users' trust. Extensive experiments show that: (i) the proposed DFTE achieves high trust-evaluation accuracy under different granular demands through efficient data fusion; and (ii) DFTE performs excellently in participation rate and data reliability.
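The dynamic reward and punishment idea described above can be sketched as a simple trust update: honest behaviour earns a small reward, while dishonest behaviour is punished on a scale that grows with repeated misbehaviour. The update rule and all parameter values below are illustrative assumptions, not the paper's exact mechanism.

```python
# Hypothetical sketch of a dynamic reward/punishment trust update.
# Honest actions nudge trust up; dishonest actions cut it, with the
# penalty scale doubling for each consecutive offence.

def update_trust(trust, honest, streak, base_reward=0.05, base_penalty=0.2):
    """Return (new_trust, new_dishonest_streak); trust stays within [0, 1]."""
    if honest:
        trust = min(1.0, trust + base_reward * (1.0 - trust))
        streak = 0
    else:
        # Penalty scale grows exponentially with consecutive dishonest actions.
        trust = max(0.0, trust - base_penalty * (2 ** streak) * trust)
        streak += 1
    return trust, streak

t, s = 0.8, 0
t, s = update_trust(t, honest=False, streak=s)   # first offence
t, s = update_trust(t, honest=False, streak=s)   # harsher second offence
```

Because the penalty scale compounds, a second consecutive offence costs more trust than the first, which is the behaviour the abstract attributes to its mechanism.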
2.
To address the drift of distressed targets during unmanned maritime rescue and the problem of approaching them quickly, a target-tracking algorithm based on deep reinforcement learning is proposed, enabling an unmanned search-and-rescue vessel to learn, through interaction with its environment, the optimal driving decisions for autonomously tracking a drifting distressed target. With the aid of SART, autonomous learning allows the rescue vessel to track the drifting distressed target in the shortest time. A three-dimensional simulation environment is built in the Gazebo physics simulator, and, based on the ROS system, …
3.
As an emerging machine learning method, deep reinforcement learning combines deep learning and reinforcement learning, enabling an agent to perceive information from high-dimensional spaces and to train models and make decisions based on that information. Owing to their generality and effectiveness, deep reinforcement learning algorithms have been widely studied and applied to many areas of daily life. This paper first gives an overview of deep reinforcement learning research and introduces its theoretical foundations; it then presents value-based and policy-based deep reinforcement learning algorithms and discusses their application prospects; finally, it summarizes related work and offers an outlook on future research.
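The value-based family the survey covers reduces, in its simplest tabular form, to the one-step Q-learning update below; deep RL replaces the table with a neural network. This is a generic textbook sketch, not code from the surveyed papers.

```python
# Minimal tabular Q-learning, the simplest value-based method:
# Q(s,a) += alpha * (r + gamma * max_a' Q(s',a') - Q(s,a))
import numpy as np

def q_update(Q, s, a, r, s_next, alpha=0.1, gamma=0.99):
    """Apply one temporal-difference update in place and return Q."""
    td_target = r + gamma * Q[s_next].max()
    Q[s, a] += alpha * (td_target - Q[s, a])
    return Q

Q = np.zeros((3, 2))                      # 3 states, 2 actions
Q = q_update(Q, s=0, a=1, r=1.0, s_next=2)
```

Policy-based methods, by contrast, parameterize and optimize the policy directly rather than deriving it from such action values.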
4.
To address base-station coverage holes and local traffic overload in urban vehicular networks, a dynamic pre-deployment scheme based on vehicle-trajectory prediction is proposed. First, to train a unified seq2seq-GRU trajectory prediction model, multiple UAVs carrying edge-computing servers operate under a distributed federated-learning and blockchain architecture: the central aggregation node is removed, and an improved Raft algorithm elects, in each training round and according to the amount of data contributed, a node to perform parameter aggregation and model updating. Second, based on the model's predictions, an improved virtual-force guided deployment algorithm is proposed, in which virtual forces guide the UAVs' dynamic deployment to improve vehicle access rate and communication quality. Simulation results show that the proposed training architecture accelerates model training, and that the deployment algorithm improves both the vehicle access rate and the communication quality between vehicles and UAVs.
5.
Combinatorial optimization problems arise widely in defense, transportation, industry, daily life, and other fields. For decades, traditional operations-research methods have been the main means of solving them, but as problem scales keep growing and real-time requirements tighten in practical applications, these algorithms face heavy computational pressure and can hardly solve combinatorial optimization problems online. In recent years, with the rapid development of deep learning, the remarkable achievements of deep reinforcement learning in Go, robotics, and other domains have demonstrated its powerful learning and sequential decision-making abilities. Accordingly, many new methods that apply deep reinforcement learning to combinatorial optimization have emerged; with fast solving speed and strong model generalization, they provide a brand-new approach to these problems. This paper therefore reviews recent theory, methods, and applications of deep reinforcement learning for combinatorial optimization, summarizes the basic principles, related methods, and application studies, and points out several problems in this direction that remain to be solved.
6.
7.
The capacitated vehicle routing problem is a popular combinatorial optimization problem; it is a classic NP-hard problem with high time complexity. This paper proposes a hyper-heuristic algorithm based on policy gradients, introducing the deterministic policy gradient algorithm from reinforcement learning into the low-level-heuristic selection strategy of the hyper-heuristic's high-level policy. The deterministic policy gradient algorithm adopts an Actor-Critic framework, and an experience pool is designed within it to store state-transition data so that historical experience can be reused in subsequent computation and neural-network parameter updates. For the hyper-heuristic's solution acceptance criterion, the paper experimentally compares three criteria and finally adopts the adaptive acceptance criterion in the high-level policy. Computations on standard benchmark instances of the capacitated vehicle routing problem, with results compared against other algorithms, verify the effectiveness and stability of the proposed algorithm.
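The experience pool described above can be sketched as a standard uniform replay buffer: transitions are stored in a bounded queue and sampled at random for later parameter updates. The capacity and tuple layout here are illustrative, not the paper's exact design.

```python
# A sketch of an experience pool: store state transitions, sample uniformly.
import random
from collections import deque

class ReplayBuffer:
    def __init__(self, capacity=10000):
        # deque with maxlen silently discards the oldest transition when full
        self.buffer = deque(maxlen=capacity)

    def push(self, state, action, reward, next_state):
        self.buffer.append((state, action, reward, next_state))

    def sample(self, batch_size):
        return random.sample(self.buffer, batch_size)

    def __len__(self):
        return len(self.buffer)

buf = ReplayBuffer(capacity=100)
for i in range(5):
    buf.push(i, 0, 1.0, i + 1)
batch = buf.sample(3)
```

Sampling uniformly from the pool breaks the temporal correlation of consecutive transitions, which stabilizes the Actor-Critic updates the abstract mentions.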
8.
9.
Intelligent control of traffic signals is a hot topic in intelligent transportation research. To coordinate traffic adaptively in a more timely and effective way, this paper proposes a traffic-signal control model based on distributional deep reinforcement learning. Built on a deep neural network framework, the model improves performance with a target network, double Q-networks, and value distributions. High-dimensional real-time traffic information at the intersection is discretized and combined with the waiting time, queue length, delay time, and phase information of the corresponding lanes as the state input. With appropriate definitions of the phase sequence, actions, and reward, the control policy is learned online, realizing adaptive control by the traffic-signal agent. To validate the proposed algorithm, it is compared with three typical deep reinforcement learning algorithms under identical settings in SUMO (Simulation of Urban Mobility). Experimental results show that the distributional deep reinforcement learning algorithm controls the traffic-signal agent with better efficiency and robustness, and achieves better performance in average vehicle delay, travel time, queue length, and waiting time at the intersection.
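Two of the ingredients named above, the target network and double Q-networks, combine into the double-Q target: the online network chooses the next action, and the slowly updated target network evaluates it, reducing the overestimation bias of vanilla DQN. The numbers below are toy values for a single transition, not traffic data.

```python
# Double-DQN target for one transition:
# y = r + gamma * Q_target(s', argmax_a Q_online(s', a))
import numpy as np

def double_q_target(reward, q_online_next, q_target_next, gamma=0.99, done=False):
    """Compute the double-Q learning target for a single transition."""
    if done:
        return reward                      # no bootstrap at episode end
    a_star = int(np.argmax(q_online_next)) # online net selects the action
    return reward + gamma * q_target_next[a_star]  # target net evaluates it

y = double_q_target(1.0,
                    q_online_next=np.array([0.2, 0.8]),  # online picks action 1
                    q_target_next=np.array([0.5, 0.3]))  # target evaluates it
```

Note that the selected action (index 1) is evaluated with the target network's lower estimate, which is exactly how double-Q damps overestimation.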
10.
To benefit from the accurate simulation and high-throughput data contributed by advanced digital twin technologies in modern smart plants, the deep reinforcement learning (DRL) method is an appropriate choice to generate a self-optimizing scheduling policy. This study employs the deep Q-network (DQN), which is a successful DRL method, to solve the dynamic scheduling problem of flexible manufacturing systems (FMSs) involving shared resources, route flexibility, and stochastic arrivals of raw products. To model the system in consideration of both manufacturing efficiency and deadlock avoidance, we use a class of Petri nets combining timed-place Petri nets and a system of simple sequential processes with resources (S3PR), named the timed S3PR. The dynamic scheduling problem of the timed S3PR is defined as a Markov decision process (MDP) that can be solved by the DQN. For constructing deep neural networks to approximate the DQN action-value function that maps the timed S3PR states to scheduling rewards, we innovatively employ a graph convolutional network (GCN) as the timed S3PR state approximator by proposing a novel graph convolution layer called a Petri-net convolution (PNC) layer. The PNC layer uses the input and output matrices of the timed S3PR to compute the propagation of features from places to transitions and from transitions to places, thereby reducing the number of parameters to be trained and ensuring robust convergence of the learning process. Experimental results verify that the proposed DQN with a PNC network can provide better solutions for dynamic scheduling problems in terms of manufacturing performance, computational efficiency, and adaptability compared with heuristic methods and a DQN with basic multilayer perceptrons.
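The place-to-transition-to-place propagation that defines the PNC layer can be sketched with plain matrix products over the net's input and output incidence matrices. The shapes, the ReLU nonlinearity, and the weight placement below are our assumptions for illustration, not the authors' exact layer.

```python
# Toy sketch of Petri-net convolution: place features flow to transitions
# through the input matrix, then back to places through the output matrix,
# so weights are shared over the net structure.
import numpy as np

def pnc_layer(place_feats, pre, post, w_t, w_p):
    """place_feats: (P, F); pre/post: (P, T) input/output incidence matrices;
    w_t, w_p: (F, F) trainable weights (identity here for simplicity)."""
    trans_feats = np.maximum(0.0, pre.T @ place_feats @ w_t)   # places -> transitions
    new_place = np.maximum(0.0, post @ trans_feats @ w_p)      # transitions -> places
    return new_place

rng = np.random.default_rng(0)
P, T, F = 4, 3, 2                                   # places, transitions, features
pre = (rng.random((P, T)) < 0.5).astype(float)      # random toy incidence matrices
post = (rng.random((P, T)) < 0.5).astype(float)
out = pnc_layer(np.ones((P, F)), pre, post, np.eye(F), np.eye(F))
```

Because the only trainable tensors are the small per-feature weight matrices, the parameter count is independent of the net's size, which matches the abstract's claim of fewer parameters to train.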
11.
12.
Fault diagnosis methods for rotating machinery have always been a hot research topic, and artificial intelligence-based approaches have attracted increasing attention from both researchers and engineers. Among those related studies and methods, artificial neural networks, especially deep learning-based methods, are widely used to extract fault features or classify fault features obtained by other signal processing techniques. Although such methods can solve the fault diagnosis problems of rotating machinery, two deficiencies remain. (1) Because they cannot establish a direct linear or non-linear mapping between raw data and the corresponding fault modes, the performance of such fault diagnosis methods depends highly on the quality of the extracted features. (2) The optimization of neural network architecture and parameters, especially for deep neural networks, requires considerable manual modification and expert experience, which limits the applicability and generalization of such methods. As a remarkable breakthrough in artificial intelligence, AlphaGo, a representative achievement of deep reinforcement learning, provides inspiration and direction for addressing the aforementioned shortcomings. Combining the advantages of deep learning and reinforcement learning, deep reinforcement learning is able to build an end-to-end fault diagnosis architecture that directly maps raw fault data to the corresponding fault modes. Thus, based on deep reinforcement learning, a novel intelligent diagnosis method is proposed that overcomes the shortcomings of the aforementioned diagnosis methods. Validation tests of the proposed method are carried out using datasets from two types of rotating machinery, rolling bearings and hydraulic pumps, which contain a large number of measured raw vibration signals under different health states and working conditions.
The diagnosis results show that the proposed method is able to obtain intelligent fault diagnosis agents that can mine the relationships between the raw vibration signals and fault modes autonomously and effectively. Considering that the learning process of the proposed method depends only on the replayed memories of the agent and the overall rewards, which represent much weaker feedback than that obtained by supervised learning-based methods, the proposed method is promising for establishing a general fault diagnosis architecture for rotating machinery.
13.
Fog Computing (FC) based IoT applications are encountering a bottleneck in data management and resource optimization due to dynamic IoT topologies, resource-limited devices, resource diversity, mismatched service quality, and complicated service-offering environments. The existing problems and emerging demands of FC-based IoT applications are hard to meet with the traditional IP-based Internet model. Therefore, in this paper, we focus on the Content-Centric Network (CCN) model to provide more efficient, flexible, and reliable data and resource management for fog-based IoT systems. We first propose a Deep Reinforcement Learning (DRL) algorithm that jointly considers the content type and the status of fog servers for content-centric data and computation offloading. Then, we introduce a novel virtual layer called FogOrch that orchestrates the management and performance requirements of fog-layer resources in an efficient manner via the proposed DRL agent. To show the feasibility of FogOrch, we develop a content-centric data offloading scheme (DRLOS) based on the DRL algorithm running on FogOrch. Through extensive simulations, we evaluate the performance of DRLOS in terms of total reward, computational workload, computation cost, and delay. The results show that the proposed DRLOS is superior to existing benchmark offloading schemes.
14.
In recent years, evolution strategies have been widely applied in deep reinforcement learning thanks to their gradient-free optimization and highly parallel efficiency. However, traditional evolution-strategy-based deep reinforcement learning methods suffer from slow learning, easy convergence to local optima, and weak robustness. To address this, a maximum-entropy evolutionary reinforcement learning method with adaptive noise is proposed. First, an improvement to evolution strategies is introduced that strengthens "elimination of the unfit" on top of "survival of the fittest", thereby accelerating the convergence of evolutionary reinforcement learning. Second, a maximum-entropy regularization term over the policy is added to the objective to keep the policy stochastic and thus encourage the agent to explore new policies. Finally, an adaptive noise control scheme is proposed that intelligently adjusts the search range of the evolution strategy according to the current state of evolution, reducing the dependence on prior knowledge and improving the algorithm's robustness. Experimental results show that the method clearly improves on traditional methods in learning speed, convergence to the optimum, and robustness.
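The core evolution-strategies update, plus a simple noise-adaptation rule, can be sketched as follows: the parameter vector moves along a fitness-weighted average of Gaussian perturbations, and the search noise shrinks while progress is made and widens on stagnation. The adaptation rule, clamp bounds, and all hyperparameters are illustrative assumptions, not the paper's exact scheme.

```python
# Evolution strategies with adaptive search noise on a toy quadratic objective.
import numpy as np

def es_step(theta, fitness_fn, sigma, rng, pop_size=50, lr=0.05):
    """One ES update: fitness-weighted average of Gaussian perturbations,
    which approximates the gradient of expected fitness."""
    eps = rng.standard_normal((pop_size, theta.size))
    fit = np.array([fitness_fn(theta + sigma * e) for e in eps])
    ranks = (fit - fit.mean()) / (fit.std() + 1e-8)    # normalised fitness
    return theta + lr / (pop_size * sigma) * eps.T @ ranks

def adapt_sigma(sigma, improved, shrink=0.9, grow=1.1, lo=0.05, hi=1.0):
    """Shrink the noise while improving, widen it when progress stalls."""
    return float(np.clip(sigma * (shrink if improved else grow), lo, hi))

f = lambda x: -np.sum(x ** 2)          # toy objective, maximum at the origin
rng = np.random.default_rng(0)
theta, sigma = np.ones(5), 0.5
for _ in range(80):
    before = f(theta)
    theta = es_step(theta, f, sigma, rng)
    sigma = adapt_sigma(sigma, f(theta) > before)
```

Because the update needs only fitness values, never gradients, every perturbation can be evaluated in parallel, which is the parallelism advantage the abstract cites.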
15.
The reinforcement and imitation learning paradigms have the potential to revolutionise robotics. Many successful developments have been reported in the literature; however, these approaches have not been explored widely in robotics for construction. The objective of this paper is to consolidate, structure, and summarise research knowledge at the intersection of robotics, reinforcement learning, and construction. A two-strand approach to the literature review was employed: a bottom-up approach that analysed in detail a selected number of relevant publications, and a top-down approach in which a large number of papers were analysed to identify common relevant themes and research trends. This study found that research on robotics for construction has not increased significantly since the 1980s in terms of number of publications. Also, robotics for construction lacks the development of dedicated systems, which limits their effectiveness. Moreover, unlike manufacturing, construction's unstructured and dynamic characteristics are a major challenge for reinforcement and imitation learning approaches. This paper provides a very useful starting point for understanding research on robotics for construction by (i) identifying the strengths and limitations of the reinforcement and imitation learning approaches, and (ii) contextualising the construction robotics problem; both will help kick-start research on the subject or boost existing research efforts.
16.
The vehicle routing problem is a core problem in logistics optimization, aiming to find a minimum-cost vehicle routing plan that satisfies customer demands. As the scale of logistics transportation keeps growing, the problem becomes harder to solve and real-time requirements keep rising, so existing conventional algorithms no longer meet practical needs. In recent years, reinforcement-learning-based algorithms have become an important means of solving the vehicle routing problem. After briefly reviewing conventional solution methods, this paper focuses on summarizing reinforcement-learning-based algorithms for the problem, classifying them into dynamic-programming-based, value-based, and policy-based approaches; finally, it offers an outlook on future research on this problem.
17.
How to design a System of Systems (SoS) has attracted wide attention in recent years, especially in military applications. This problem is also known as SoS architecting, which can be boiled down to two subproblems: selecting a number of systems from a set of candidates and specifying the tasks to be completed by each selected system. Essentially, such a problem can be reduced to a combinatorial optimization problem. Traditional exact solvers such as the branch-and-bound algorithm are not efficient enough to deal with large-scale cases. Heuristic algorithms are more scalable, but if the input changes, these algorithms have to restart the searching process. This re-searching process may take a long time and interfere with the mission achievement of the SoS in highly dynamic scenarios, e.g., in Mosaic Warfare. In this paper, we combine artificial intelligence with SoS architecting and propose a deep reinforcement learning approach, DRL-SoSDP, for SoS design. Deep neural networks and actor-critic algorithms are used to find the optimal solution under constraints. Evaluation results show that the proposed approach is superior to heuristic algorithms in both solution quality and computation time, especially in large-scale cases. DRL-SoSDP can find good solutions in a near real-time manner, showing great potential for cases that require an instant response. DRL-SoSDP also shows good generalization ability and can find better results than heuristic algorithms even when the scale of the SoS is much larger than that in the training data.
18.
A. M. Hafiz, M. Hassaballah, Abdullah Alqahtani, Shtwai Alsubai, Mohamed Abdel Hameed. 《计算机系统科学与工程》(Computer Systems Science and Engineering), 2023, 46(3): 2651-2666
With the advent of Reinforcement Learning (RL) and its continuous progress, state-of-the-art RL systems have emerged for many challenging real-world tasks. Given the scope of this area, various techniques are found in the literature. One notable class of techniques, Multiple Deep Q-Network (DQN) based RL systems, uses multiple DQN-based entities which learn together and communicate with each other. In such a scheme, the learning has to be distributed wisely among all entities, and the inter-entity communication protocol has to be carefully designed. As more complex DQNs come to the fore, the overall complexity of these multi-entity systems has increased many-fold, leading to issues such as difficulty in training, the need for high resources, longer training times, and difficulty in fine-tuning, all of which cause performance problems. Taking a cue from the parallel processing found in nature and its efficacy, we propose a lightweight ensemble-based approach for solving core RL tasks. It uses multiple binary-action DQNs with shared state and reward. The benefits of the proposed approach are overall simplicity, faster convergence, and better performance compared with conventional DQN-based approaches. The approach can potentially be extended to any type of DQN by forming its ensemble. Through extensive experimentation, promising results are obtained with the proposed ensemble approach on OpenAI Gym tasks and Atari 2600 games compared with recent techniques. The proposed approach achieves a state-of-the-art score of 500 on the Cartpole-v1 task, 259.2 on the LunarLander-v2 task, and state-of-the-art results on four of five Atari 2600 games.
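One reading of the binary-action ensemble idea can be sketched with small lookup tables standing in for the DQNs: each member owns one environment action and learns a binary "take my action / don't" value over the shared state and reward, and the agent executes the action whose member is most confident. Both the aggregation rule and the update rule below are our illustrative interpretation, not the authors' exact protocol.

```python
# Toy binary-action ensemble: one table per action, shared state and reward.
import numpy as np

class BinaryActionEnsemble:
    def __init__(self, n_states, n_actions):
        # member i holds Q_i(s, b) for b in {0: "don't act", 1: "act"}
        self.q = np.zeros((n_actions, n_states, 2))

    def act(self, state):
        # execute the action whose member values acting the most
        return int(np.argmax(self.q[:, state, 1]))

    def update(self, state, action, reward, alpha=0.5):
        # shared experience: the chosen member reinforces "act",
        # the remaining members reinforce "don't act"
        for i in range(self.q.shape[0]):
            b = 1 if i == action else 0
            self.q[i, state, b] += alpha * (reward - self.q[i, state, b])

ens = BinaryActionEnsemble(n_states=4, n_actions=3)
for _ in range(10):
    ens.update(state=0, action=2, reward=1.0)   # action 2 pays off in state 0
```

Each member only ever solves a two-way decision, which is the source of the simplicity and faster convergence the abstract claims for the ensemble.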
19.
To improve the performance and sample efficiency of complex continuous-control tasks with high-dimensional action spaces, a Transformer-based state-action-reward prediction representation learning framework (TSAR) is proposed. Specifically, TSAR formulates a Transformer-based sequence prediction task that fuses state, action, and reward information. The prediction task preprocesses the sequence data with random masking, and learns state and action representations jointly by maximizing the mutual information between the predicted state features of the masked sequence and the actual target state features. To further strengthen the relevance of the state and action representations to the reinforcement learning (RL) policy, TSAR introduces action-prediction learning and reward-prediction learning as additional learning constraints to guide representation learning. TSAR also incorporates the state and action representations explicitly into the optimization of the RL policy, significantly increasing the representations' contribution to policy learning. Experimental results show that TSAR surpasses existing state-of-the-art methods in performance and sample efficiency on nine challenging hard environments in DMControl.
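The random-masking preprocessing described above can be sketched as follows: positions in a feature sequence are zeroed out with a fixed probability, and the predictor is later trained to reconstruct the features at the masked positions. The mask ratio, mask value, and token layout are illustrative assumptions, not TSAR's exact settings.

```python
# Random masking of a (timesteps, features) sequence for masked prediction.
import numpy as np

def random_mask(seq, mask_ratio=0.3, mask_value=0.0, rng=None):
    """seq: (T, D) array. Returns (masked_seq, mask), where mask[t] is True
    at positions the predictor must reconstruct."""
    rng = rng or np.random.default_rng(0)
    mask = rng.random(seq.shape[0]) < mask_ratio   # choose timesteps to hide
    masked = seq.copy()
    masked[mask] = mask_value                      # replace hidden timesteps
    return masked, mask

seq = np.arange(12, dtype=float).reshape(6, 2)     # 6 timesteps, 2 features
masked, mask = random_mask(seq)
```

Training then maximizes agreement (in TSAR, mutual information) between the features predicted at the masked positions and the true target-state features.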