共查询到20条相似文献,搜索用时 0 毫秒
1.
作为一种崭新的机器学习方法,深度强化学习将深度学习和强化学习技术结合起来,使智能体能够从高维空间感知信息,并根据得到的信息训练模型、做出决策。由于深度强化学习算法具有通用性和有效性,人们对其进行了广泛的研究,并将其运用到了日常生活的各个领域。首先,对深度强化学习研究进行概述,介绍了深度强化学习的基础理论;然后,分别介绍了基于值函数和基于策略的深度强化学习算法,讨论了其应用前景;最后,对相关研究工作做了总结和展望。 相似文献
2.
In the actual working site, the equipment often works in different working conditions while the manufacturing system is rather complicated. However, traditional multi-label learning methods need to use the pre-defined label sequence or synchronously predict all labels of the input sample in the fault diagnosis domain. Deep reinforcement learning (DRL) combines the perception ability of deep learning and the decision-making ability of reinforcement learning. Moreover, the curriculum learning mechanism follows the learning approach of humans from easy to complex. Consequently, an improved proximal policy optimization (PPO) method, which is a typical algorithm in DRL, is proposed as a novel method on multi-label classification in this paper. The improved PPO method could build a relationship between several predicted labels of input sample because of designing an action history vector, which encodes all history actions selected by the agent at current time step. In two rolling bearing experiments, the diagnostic results demonstrate that the proposed method provides a higher accuracy than traditional multi-label methods on fault recognition under complicated working conditions. Besides, the proposed method could distinguish the multiple labels of input samples following the curriculum mechanism from easy to complex, compared with the same network using the pre-defined label sequence. 相似文献
3.
The quality of fault recognition part is one of the key factors affecting the efficiency of intelligent manufacturing. Many excellent achievements in deep learning (DL) have been realized recently as methods of fault recognition. However, DL models have inherent shortcomings. In particular, the phenomenon of over-fitting or degradation suggests that such an intelligent algorithm cannot fully use its feature perception ability. Researchers have mainly adapted the network architecture for fault diagnosis, but the above limitations are not taken into account. In this study, we propose a novel deep reinforcement learning method that combines the perception of DL with the decision-making ability of reinforcement learning. This method enhances the classification accuracy of the DL module to autonomously learn much more knowledge hidden in raw data. The proposed method based on the convolutional neural network (CNN) also adopts an improved actor-critic algorithm for fault recognition. The important parts in standard actor-critic algorithm, such as environment, neural network, reward, and loss functions, have been fully considered in improved actor-critic algorithm. Additionally, to fully distinguish compound faults under heavy background noise, multi-channel signals are first stacked synchronously and then input into the model in the end-to-end training mode. The diagnostic results on the compound fault of the bearing and tool in the machine tool experimental system show that compared with other methods, the proposed network structure has more accurate results. These findings demonstrate that under the guidance of the improved actor-critic algorithm and processing method for multi-channel data, the proposed method thus has stronger exploration performance. 相似文献
4.
In complex working site, bearings used as the important part of machine, could simultaneously have faults on several positions. Consequently, multi-label learning approach considering fully the correlation between different faulted positions of bearings becomes the popular learning pattern. Deep reinforcement learning (DRL) combining the perception ability of deep learning and the decision-making ability of reinforcement learning, could be adapted to the compound fault diagnosis while having a strong ability extracting the fault feature from the raw data. However, DRL is difficult to converge and easily falls into the unstable training problem. Therefore, this paper integrates the feature extraction ability of DRL and the knowledge transfer ability of transfer learning (TL), and proposes the multi-label transfer reinforcement learning (ML-TRL). In detail, the proposed method utilizes the improved trust region policy optimization (TRPO) as the basic DRL framework and pre-trains the fixed convolutional networks of ML-TRL using the multi-label convolutional neural network method. In compound fault experiment, the final results demonstrate powerfully that the proposed method could have the higher accuracy than other multi-label learning methods. Hence, the proposed method is a remarkable alternative when recognizing the compound fault of bearings. 相似文献
5.
Recognizing context for annotating a live life recording 总被引:1,自引:1,他引:1
In the near future, it will be possible to continuously record and store the entire audio–visual lifetime of a person together
with all digital information that the person perceives or creates. While the storage of this data will be possible soon, retrieval
and indexing into such large data sets are unsolved challenges. Since today’s retrieval cues seem insufficient we argue that
additional cues, obtained from body-worn sensors, make associative retrieval by humans possible. We present three approaches
to create such cues, each along with an experimental evaluation: the user’s physical activity from acceleration sensors, his
social environment from audio sensors, and his interruptibility from multiple sensors.
相似文献
Albrecht SchmidtEmail: |
6.
7.
8.
交通信号的智能控制是智能交通研究中的热点问题。为更加及时有效地自适应协调交通,文中提出了一种基于分布式深度强化学习的交通信号控制模型,采用深度神经网络框架,利用目标网络、双Q网络、价值分布提升模型表现。将交叉路口的高维实时交通信息离散化建模并与相应车道上的等待时间、队列长度、延迟时间、相位信息等整合作为状态输入,在对相位序列及动作、奖励做出恰当定义的基础上,在线学习交通信号的控制策略,实现交通信号Agent的自适应控制。为验证所提算法,在SUMO(Simulation of Urban Mobility)中相同设置下,将其与3种典型的深度强化学习算法进行对比。实验结果表明,基于分布式的深度强化学习算法在交通信号Agent的控制中具有更好的效率和鲁棒性,且在交叉路口车辆的平均延迟、行驶时间、队列长度、等待时间等方面具有更好的性能表现。 相似文献
9.
10.
Fault diagnosis methods for rotating machinery have always been a hot research topic, and artificial intelligence-based approaches have attracted increasing attention from both researchers and engineers. Among those related studies and methods, artificial neural networks, especially deep learning-based methods, are widely used to extract fault features or classify fault features obtained by other signal processing techniques. Although such methods could solve the fault diagnosis problems of rotating machinery, there are still two deficiencies. (1) Unable to establish direct linear or non-linear mapping between raw data and the corresponding fault modes, the performance of such fault diagnosis methods highly depends on the quality of the extracted features. (2) The optimization of neural network architecture and parameters, especially for deep neural networks, requires considerable manual modification and expert experience, which limits the applicability and generalization of such methods. As a remarkable breakthrough in artificial intelligence, AlphaGo, a representative achievement of deep reinforcement learning, provides inspiration and direction for the aforementioned shortcomings. Combining the advantages of deep learning and reinforcement learning, deep reinforcement learning is able to build an end-to-end fault diagnosis architecture that can directly map raw fault data to the corresponding fault modes. Thus, based on deep reinforcement learning, a novel intelligent diagnosis method is proposed that is able to overcome the shortcomings of the aforementioned diagnosis methods. Validation tests of the proposed method are carried out using datasets of two types of rotating machinery, rolling bearings and hydraulic pumps, which contain a large number of measured raw vibration signals under different health states and working conditions. The diagnosis results show that the proposed method is able to obtain intelligent fault diagnosis agents that can mine the relationships between the raw vibration signals and fault modes autonomously and effectively. Considering that the learning process of the proposed method depends only on the replayed memories of the agent and the overall rewards, which represent much weaker feedback than that obtained by the supervised learning-based method, the proposed method is promising in establishing a general fault diagnosis architecture for rotating machinery. 相似文献
11.
可重入生产系统的递阶增强型学习调度 总被引:2,自引:0,他引:2
对平均报酬型马氏决策过程,本文研究了一种递阶增强型学习算法;并将算法应用于一个两台机器组成的闭环可重入生产系统,计算机仿真结果表明,调度结果优于熟知的两种启发式调度策略. 相似文献
12.
采用一个组织良好的数据集和基于深度学习的模型,实现根据上下文获得论文的引文推荐.模型包括一个文档编码器和一个上下文编码器,使用图卷积网络层(GCN)和预训练模型BERT[1]的双向编码器表示.通过修改相关的PeerRead数据集,建立一个PeerReadPlus新数据集,它包含引用文献的上下文语句和论文元数据.结果表明... 相似文献
13.
近年来, 进化策略由于其无梯度优化和高并行化效率等优点, 在深度强化学习领域得到了广泛的应用. 然而, 传统基于进化策略的深度强化学习方法存在着学习速度慢、容易收敛到局部最优和鲁棒性较弱等问题. 为此, 提出了一种基于自适应噪声的最大熵进化强化学习方法. 首先, 引入了一种进化策略的改进办法, 在“优胜”的基础上加强了“劣汰”, 从而提高进化强化学习的收敛速度; 其次, 在目标函数中引入了策略最大熵正则项, 来保证策略的随机性进而鼓励智能体对新策略的探索; 最后, 提出了自适应噪声控制的方式, 根据当前进化情形智能化调整进化策略的搜索范围, 进而减少对先验知识的依赖并提升算法的鲁棒性. 实验结果表明, 该方法较之传统方法在学习速度、最优性收敛和鲁棒性上有比较明显的提升. 相似文献
14.
The reinforcement and imitation learning paradigms have the potential to revolutionise robotics. Many successful developments have been reported in literature; however, these approaches have not been explored widely in robotics for construction. The objective of this paper is to consolidate, structure, and summarise research knowledge at the intersection of robotics, reinforcement learning, and construction. A two-strand approach to literature review was employed. A bottom-up approach to analyse in detail a selected number of relevant publications, and a top-down approach in which a large number of papers were analysed to identify common relevant themes and research trends. This study found that research on robotics for construction has not increased significantly since the 1980s, in terms of number of publications. Also, robotics for construction lacks the development of dedicated systems, which limits their effectiveness. Moreover, unlike manufacturing, construction's unstructured and dynamic characteristics are a major challenge for reinforcement and imitation learning approaches. This paper provides a very useful starting point to understating research on robotics for construction by (i) identifying the strengths and limitations of the reinforcement and imitation learning approaches, and (ii) by contextualising the construction robotics problem; both of which will aid to kick-start research on the subject or boost existing research efforts. 相似文献
15.
针对车牌相似字符难以识别的问题,提出了基于深度学习的特征提取和识别方法。该方法首先对字符图像进行归一化处理;然后以归一化后的图像为输入,构建5层深度网络对相似字符由低层到高层的特征表达。在激励函数定义上采用对字符边缘特征敏感的卷积函数,从而能够对相似字符的局部差异进行分析。在实验部分与支持向量机(SVM)算法的分类效果进行比较,结果表明所提算法的识别率提高了5%。 相似文献
16.
深度学习研究与进展 总被引:2,自引:0,他引:2
深度学习是机器学习领域一个新兴的研究方向,它通过模仿人脑结构,实现对复杂输入数据的高效处理,智能地学习不同的知识,而且能够有效地解决多类复杂的智能问题。近年来,随着深度学习高效学习算法的出现,机器学习界掀起了研究深度学习理论及应用的热潮。实践表明,深度学习是一种高效的特征提取方法,它能够提取数据中更加抽象的特征,实现对数据更本质的刻画,同时深层模型具有更强的建模和推广能力。鉴于深度学习的优点及其广泛应用,对深度学习进行了较为系统的介绍,详细阐述了其产生背景、理论依据、典型的深度学习模型、具有代表性的快速学习算法、最新进展及实践应用,最后探讨了深度学习未来值得研究的方向。 相似文献
17.
A video playing on a television screen emits a characteristic flickering, which serves as an identification feature for the video. This paper presents a method for video recognition by sampling the ambient light sensor of a smartphone or wearable computer. The evaluation shows that given a set of known videos, a recognition rate of up to 100% is possible by sampling a sequence of 15 to 120 s length. Our method works even if the device has no direct line of sight to the television screen, since ambient light reflected from walls is sufficient. A major factor of influence on the recognition is the number of video cuts that change the light emitted by the screen. 相似文献
18.
A. M. Hafiz M. Hassaballah Abdullah Alqahtani Shtwai Alsubai Mohamed Abdel Hameed 《计算机系统科学与工程》2023,46(3):2651-2666
With the advent of Reinforcement Learning (RL) and its continuousprogress, state-of-the-art RL systems have come up for many challenging andreal-world tasks. Given the scope of this area, various techniques are found inthe literature. One such notable technique, Multiple Deep Q-Network (DQN) basedRL systems use multiple DQN-based-entities, which learn together and communicate with each other. The learning has to be distributed wisely among all entities insuch a scheme and the inter-entity communication protocol has to be carefullydesigned. As more complex DQNs come to the fore, the overall complexity of thesemulti-entity systems has increased many folds leading to issues like difficulty intraining, need for high resources, more training time, and difficulty in fine-tuningleading to performance issues. Taking a cue from the parallel processing foundin the nature and its efficacy, we propose a lightweight ensemble based approachfor solving the core RL tasks. It uses multiple binary action DQNs having sharedstate and reward. The benefits of the proposed approach are overall simplicity, fasterconvergence and better performance compared to conventional DQN basedapproaches. The approach can potentially be extended to any type of DQN by forming its ensemble. Conducting extensive experimentation, promising results areobtained using the proposed ensemble approach on OpenAI Gym tasks, and Atari2600 games as compared to recent techniques. The proposed approach gives a stateof-the-art score of 500 on the Cartpole-v1 task, 259.2 on the LunarLander-v2 task,and state-of-the-art results on four out of five Atari 2600 games. 相似文献
19.
为了提升具有高维动作空间的复杂连续控制任务的性能和样本效率, 提出一种基于Transformer的状态−动作−奖赏预测表征学习框架(Transformer-based state-action-reward prediction representation learning framework, TSAR). 具体来说, TSAR提出一种基于Transformer的融合状态−动作−奖赏信息的序列预测任务. 该预测任务采用随机掩码技术对序列数据进行预处理, 通过最大化掩码序列的预测状态特征与实际目标状态特征间的互信息, 同时学习状态与动作表征. 为进一步强化状态和动作表征与强化学习(Reinforcement learning, RL)策略的相关性, TSAR引入动作预测学习和奖赏预测学习作为附加的学习约束以指导状态和动作表征学习. TSAR同时将状态表征和动作表征显式地纳入到强化学习策略的优化中, 显著提高了表征对策略学习的促进作用. 实验结果表明, 在DMControl的9个具有挑战性的困难环境中, TSAR的性能和样本效率超越了现有最先进的方法. 相似文献