Similar literature
 Found 20 similar documents; search time 15 ms
1.
On real working sites, equipment often operates under varying working conditions, and the manufacturing system is rather complicated. However, in the fault diagnosis domain, traditional multi-label learning methods must either use a pre-defined label sequence or predict all labels of the input sample simultaneously. Deep reinforcement learning (DRL) combines the perception ability of deep learning with the decision-making ability of reinforcement learning. Moreover, the curriculum learning mechanism follows the human approach of learning from easy to complex. Consequently, this paper proposes an improved proximal policy optimization (PPO) method, PPO being a typical DRL algorithm, as a novel method for multi-label classification. The improved PPO method builds a relationship between the predicted labels of an input sample by means of an action history vector, which encodes all actions the agent has selected up to the current time step. In two rolling bearing experiments, the diagnostic results demonstrate that the proposed method achieves higher accuracy than traditional multi-label methods in fault recognition under complicated working conditions. In addition, compared with the same network using a pre-defined label sequence, the proposed method distinguishes the multiple labels of input samples following the curriculum mechanism from easy to complex.
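
The action history vector mentioned in this abstract could be sketched roughly as follows. This is only an illustration under stated assumptions, not the authors' implementation: the binary encoding, the concatenation with the sample features, and the extra "stop" action are all hypothetical choices, as are the function names.

```python
from typing import List


def action_history_vector(selected: List[int], num_labels: int) -> List[int]:
    """Binary vector with a 1 at every label index already chosen this episode.

    Assumption: the abstract only says the vector "encodes all actions
    selected by the agent"; a multi-hot encoding is one simple option.
    """
    vec = [0] * num_labels
    for label in selected:
        vec[label] = 1
    return vec


def build_state(features: List[float], selected: List[int], num_labels: int) -> List[float]:
    """Concatenate the sample features with the action history vector (assumed layout)."""
    history = action_history_vector(selected, num_labels)
    return list(features) + [float(x) for x in history]


def valid_actions(selected: List[int], num_labels: int) -> List[int]:
    """Labels not yet chosen, plus a hypothetical 'stop' action at index num_labels."""
    remaining = [a for a in range(num_labels) if a not in selected]
    return remaining + [num_labels]


# The agent has already predicted label 1 out of 3 possible fault labels.
state = build_state([0.2, 0.7], selected=[1], num_labels=3)
print(state)
print(valid_actions([1], num_labels=3))
```

Because the state changes after every selected label, a policy conditioned on it can choose later labels in light of earlier ones, which is the relationship between predicted labels that the abstract describes.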

2.
This paper reviews exploration techniques in deep reinforcement learning. Exploration techniques are of primary importance when solving sparse reward problems. In sparse reward problems, the reward is rare, which means that an agent acting randomly will seldom find it. In such a scenario, it is challenging for reinforcement learning to learn the association between rewards and actions. Thus, more sophisticated exploration methods need to be devised. This review provides a comprehensive overview of existing exploration approaches, which are categorised by their key contributions as: reward novel states, reward diverse behaviours, goal-based methods, probabilistic methods, imitation-based methods, safe exploration, and random-based methods. Then, unsolved challenges are discussed to provide valuable future research directions. Finally, the approaches of different categories are compared in terms of complexity, computational effort, and overall performance.
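
As an illustration of the "reward novel states" category named in this abstract, a count-based bonus is one well-known option. The sketch below assumes hashable (tabular) states and a bonus of beta / sqrt(N(s)); the class name and the beta parameter are hypothetical, and the review itself may cover different formulations.

```python
import math
from collections import Counter


class CountBonus:
    """Adds an intrinsic bonus that shrinks as a state is visited more often."""

    def __init__(self, beta: float = 0.1):
        self.beta = beta          # bonus scale (assumed hyperparameter)
        self.counts = Counter()   # visit count per state

    def reward(self, state, extrinsic: float) -> float:
        """Return the extrinsic reward plus beta / sqrt(visit count of state)."""
        self.counts[state] += 1
        return extrinsic + self.beta / math.sqrt(self.counts[state])


bonus = CountBonus(beta=1.0)
print(bonus.reward("s0", 0.0))  # first visit: largest bonus
print(bonus.reward("s0", 0.0))  # second visit: smaller bonus
```

Even when the environment reward is zero almost everywhere, the shaped reward still differs between rarely and frequently visited states, which gives the agent a learning signal in sparse reward problems.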

3.
On complex working sites, bearings, as an important machine component, can have faults at several positions simultaneously. Consequently, multi-label learning, which fully considers the correlation between the different faulted positions of a bearing, has become a popular learning pattern. Deep reinforcement learning (DRL), which combines the perception ability of deep learning with the decision-making ability of reinforcement learning, can be adapted to compound fault diagnosis and is well able to extract fault features from raw data. However, DRL is difficult to converge and its training is easily unstable. Therefore, this paper integrates the feature extraction ability of DRL with the knowledge transfer ability of transfer learning (TL) and proposes multi-label transfer reinforcement learning (ML-TRL). In detail, the proposed method uses an improved trust region policy optimization (TRPO) as the basic DRL framework and pre-trains the fixed convolutional networks of ML-TRL using a multi-label convolutional neural network method. In a compound fault experiment, the final results demonstrate that the proposed method achieves higher accuracy than other multi-label learning methods. Hence, the proposed method is a strong alternative for recognizing compound faults of bearings.

4.
Fault diagnosis methods for rotating machinery have always been a hot research topic, and artificial intelligence-based approaches have attracted increasing attention from both researchers and engineers. Among those related studies and methods, artificial neural networks, especially deep learning-based methods, are widely used to extract fault features or classify fault features obtained by other signal processing techniques. Although such methods could solve the fault diagnosis problems of rotating machinery, there are still two deficiencies. (1) Unable to establish direct linear or non-linear mapping between raw data and the corresponding fault modes, the performance of such fault diagnosis methods highly depends on the quality of the extracted features. (2) The optimization of neural network architecture and parameters, especially for deep neural networks, requires considerable manual modification and expert experience, which limits the applicability and generalization of such methods. As a remarkable breakthrough in artificial intelligence, AlphaGo, a representative achievement of deep reinforcement learning, provides inspiration and direction for the aforementioned shortcomings. Combining the advantages of deep learning and reinforcement learning, deep reinforcement learning is able to build an end-to-end fault diagnosis architecture that can directly map raw fault data to the corresponding fault modes. Thus, based on deep reinforcement learning, a novel intelligent diagnosis method is proposed that is able to overcome the shortcomings of the aforementioned diagnosis methods. Validation tests of the proposed method are carried out using datasets of two types of rotating machinery, rolling bearings and hydraulic pumps, which contain a large number of measured raw vibration signals under different health states and working conditions. 
The diagnosis results show that the proposed method is able to obtain intelligent fault diagnosis agents that can mine the relationships between the raw vibration signals and fault modes autonomously and effectively. Considering that the learning process of the proposed method depends only on the replayed memories of the agent and the overall rewards, which represent much weaker feedback than that obtained by the supervised learning-based method, the proposed method is promising in establishing a general fault diagnosis architecture for rotating machinery.

5.
The quality of the fault recognition component is one of the key factors affecting the efficiency of intelligent manufacturing. Many excellent results have recently been achieved with deep learning (DL) as a fault recognition method. However, DL models have inherent shortcomings. In particular, over-fitting and degradation suggest that such algorithms cannot fully use their feature perception ability. Researchers have mainly adapted the network architecture for fault diagnosis without taking these limitations into account. In this study, we propose a novel deep reinforcement learning method that combines the perception of DL with the decision-making ability of reinforcement learning. This method enhances the classification accuracy of the DL module so that it autonomously learns more of the knowledge hidden in raw data. The proposed method, based on a convolutional neural network (CNN), also adopts an improved actor-critic algorithm for fault recognition. The important parts of the standard actor-critic algorithm, such as the environment, neural network, reward, and loss functions, have been fully considered in the improved actor-critic algorithm. Additionally, to fully distinguish compound faults under heavy background noise, multi-channel signals are first stacked synchronously and then input into the model in an end-to-end training mode. The diagnostic results on compound faults of the bearing and tool in a machine tool experimental system show that the proposed network structure is more accurate than other methods. These findings demonstrate that, under the guidance of the improved actor-critic algorithm and the processing method for multi-channel data, the proposed method has stronger exploration performance.

6.
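Deep inverse reinforcement learning is a new research focus in machine learning. To address the difficulty of obtaining the reward function in deep reinforcement learning, it reconstructs the reward function from expert demonstration trajectories. This paper first introduces classic algorithms from three categories of deep reinforcement learning methods. It then describes classic inverse reinforcement learning algorithms, including methods based on apprenticeship learning, maximum margin planning, structured classification, and probabilistic model formulations. Next, it surveys several frontier directions in deep inverse reinforcement learning, including deep inverse reinforcement learning based on the maximum margin method, on deep Q-networks, and on the maximum entropy model, as well as inverse reinforcement learning methods for cases in which the demonstration trajectories are non-expert. Finally, it summarizes the open problems and future directions of deep inverse reinforcement learning in algorithms, theory, and applications.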

7.
How to design a System of Systems (SoS) has attracted wide attention in recent years, especially in military applications. This problem, also known as SoS architecting, can be boiled down to two subproblems: selecting a number of systems from a set of candidates and specifying the tasks to be completed by each selected system. Essentially, such a problem can be reduced to a combinatorial optimization problem. Traditional exact solvers such as the branch-and-bound algorithm are not efficient enough to deal with large-scale cases. Heuristic algorithms are more scalable, but if the input changes, they have to restart the search process. Re-searching may take a long time and interfere with the SoS achieving its mission in highly dynamic scenarios, e.g., in Mosaic Warfare. In this paper, we combine artificial intelligence with SoS architecting and propose a deep reinforcement learning approach, DRL-SoSDP, for SoS design. Deep neural networks and actor-critic algorithms are used to find the optimal solution under constraints. Evaluation results show that the proposed approach is superior to heuristic algorithms in both solution quality and computation time, especially in large-scale cases. DRL-SoSDP can find high-quality solutions in near real time, showing great potential for cases that require an instant reply. DRL-SoSDP also shows good generalization ability and can find better results than heuristic algorithms even when the scale of the SoS is much larger than that in the training data.

8.
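Multi-agent systems are widely used in autonomous driving, intelligent logistics, collaborative healthcare, and other fields. However, as technology advances and system requirements grow, these systems face challenges such as large scale and high complexity, and often suffer from low training efficiency and poor adaptability. To address these problems, gradient-based meta-learning is extended to multi-agent deep reinforcement learning, and a method named multi-agent first-order meta proximal policy optimization (MAMPPO) is proposed to learn the initial model parameters of a multi-agent system, offering a new perspective on improving the performance of multi-agent deep reinforcement learning. The method makes full use of the experience data gathered during multi-agent reinforcement learning. Through repeated adaptation, it finds the parameters that are most sensitive along the gradient descent direction and learns the initial parameters, so that model training starts from the best starting point. This effectively improves the decision efficiency of the joint policy and markedly speeds up both policy change and adaptation to new situations. Experimental results on StarCraft II show that MAMPPO significantly improves training speed and adaptability, providing a new approach for improving the training efficiency and adaptability of multi-agent reinforcement learning.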

9.
Distributed manufacturing plays an important role in helping large-scale companies reduce production and transportation costs for globalized orders. However, assigning dynamic orders to distributed workshops properly and in real time is a challenging problem. To provide real-time and intelligent scheduling decisions for distributed flowshops, we studied the distributed permutation flowshop scheduling problem (DPFSP) with dynamic job arrivals using deep reinforcement learning (DRL). The objective is to minimize the total tardiness cost of all jobs. We provided the training and execution procedures of intelligent scheduling based on DRL for the dynamic DPFSP. In addition, we established a DRL-based scheduling model for distributed flowshops by designing a suitable reward function, scheduling actions, and state features. A novel reward function is designed to relate directly to the objective. Various problem-specific dispatching rules are introduced to provide efficient actions for different production states. Furthermore, four efficient DRL algorithms, namely deep Q-network (DQN), double DQN (DbDQN), dueling DQN (DlDQN), and advantage actor-critic (A2C), are adapted to train the scheduling agent. The training curves show that the agent learned to generate better solutions effectively and validate that the system design is reasonable. After training, all DRL algorithms outperform traditional meta-heuristics and well-known priority dispatching rules (PDRs) by a large margin in terms of solution quality and computation efficiency. This work shows the effectiveness of DRL for the real-time scheduling of the dynamic DPFSP.
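
One way a reward can "relate directly" to a total tardiness objective is to pay the agent the negative increase in accumulated tardiness cost at each decision, so that the episode return equals minus the objective. The sketch below is an assumption made for illustration; the abstract does not give the paper's actual reward formula, and the function names and job tuple layout are hypothetical.

```python
from typing import List, Tuple


def tardiness_cost(completion: float, due: float, weight: float = 1.0) -> float:
    """Weighted tardiness of one job: zero if it finishes on or before its due date."""
    return weight * max(0.0, completion - due)


def episode_rewards(jobs: List[Tuple[float, float, float]]) -> List[float]:
    """Per-step rewards for jobs given as (completion_time, due_date, weight).

    Each reward is the negative increase in total tardiness cost caused by the
    job that just finished (assumed design, not the paper's formula).
    """
    rewards = []
    total = 0.0
    for completion, due, weight in jobs:
        new_total = total + tardiness_cost(completion, due, weight)
        rewards.append(total - new_total)
        total = new_total
    return rewards


jobs = [(5.0, 6.0, 1.0), (9.0, 7.0, 2.0), (12.0, 10.0, 1.0)]
rewards = episode_rewards(jobs)
print(rewards, sum(rewards))
```

With this design the undiscounted return is exactly the negative total tardiness cost, so maximizing return and minimizing the scheduling objective coincide.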

10.
Machine vision, especially deep learning methods, has become a hot topic for product surface inspection. In practice, capturing high-quality images is the basis for defect detection. This turns out to be challenging for complex products, as image quality suffers from occlusion, illumination, and other issues. Multiple images from different viewpoints are often required in this scenario to cover all the important areas of the products. Reducing the number of viewpoints while ensuring coverage is the key to making the inspection system more efficient in production. This paper proposes a highly efficient view planning method based on deep reinforcement learning to solve this problem. First, a visibility estimation method is developed so that the visible areas can be quickly identified for a given viewpoint. Then, a new reward function is designed, and the Asynchronous Advantage Actor-Critic method is applied to solve the view planning problem. The effectiveness and efficiency of the proposed method are verified with a set of experiments. The proposed method could also potentially be applied to other similar vision-based tasks.

11.
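Exploration schemes in reinforcement learning are studied, and the respective characteristics of indirect and direct exploration are described. Combining their advantages, a hybrid exploration scheme that integrates direct and indirect exploration is proposed. In the initial stage of learning, when little experience of the environment is available, the scheme emphasizes direct exploration; after more experience has been gained, it emphasizes indirect exploration, so that action selection gradually tends toward the optimal policy. Experiments show that the scheme learns more efficiently than a purely indirect ε-greedy scheme.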
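
A hybrid scheme of this kind might look like the sketch below. It assumes that "direct" exploration means deliberately trying the least-visited action and that the emphasis shifts linearly over a fixed number of steps; both are illustrative assumptions, since the abstract specifies only that the indirect component is ε-greedy. The names `hybrid_action`, `switch_steps`, and `epsilon` are hypothetical.

```python
import random
from typing import List


def hybrid_action(q_values: List[float], visit_counts: List[int], step: int,
                  rng: random.Random, switch_steps: int = 100,
                  epsilon: float = 0.1) -> int:
    """Pick an action index.

    Early in learning, favour direct exploration (least-visited action);
    later, favour indirect epsilon-greedy selection over the value estimates.
    """
    # Probability of a direct-exploration step, decaying linearly to zero (assumed schedule).
    p_direct = max(0.0, 1.0 - step / switch_steps)
    if rng.random() < p_direct:
        return min(range(len(visit_counts)), key=lambda a: visit_counts[a])
    # Indirect exploration: epsilon-greedy.
    if rng.random() < epsilon:
        return rng.randrange(len(q_values))
    return max(range(len(q_values)), key=lambda a: q_values[a])


rng = random.Random(0)
print(hybrid_action([1.0, 0.0], [5, 2], step=0, rng=rng))    # early: least-visited action
print(hybrid_action([1.0, 0.0], [5, 2], step=500, rng=rng, epsilon=0.0))  # late: greedy action
```

At step 0 the direct branch is always taken, and once `step` reaches `switch_steps` the function reduces to plain ε-greedy, matching the shift in emphasis the abstract describes.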

12.
郭方洪  何通  吴祥  董辉  刘冰 《控制理论与应用》2022,39(10):1881-1889
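As large numbers of new energy sources are connected to microgrids, the parameter space of the microgrid system model grows multiplicatively, and the computational difficulty of optimal energy scheduling keeps rising. At the same time, the uncertainty of renewable power output poses a major challenge to optimal microgrid scheduling. To address these problems, this paper proposes a real-time optimal scheduling strategy for microgrids based on distributed deep reinforcement learning. First, under a distributed architecture, the main grid and each distributed power source are treated as independent agents. Second, each agent has a local learning model, builds its state and action spaces from local data, and is given a multi-objective reward function, with constraints, that covers generation cost, trading price, and power-source service life. Finally, each agent seeks a locally optimal policy by interacting with the environment, while the agents learn value network parameters from one another to improve local action selection, ultimately minimizing the operating cost of the microgrid system. Simulation results show that, compared with the Deep Deterministic Policy Gradient (DDPG) algorithm, the proposed method increases training speed by 17.6% and reduces the cost function value by 67% while maintaining system stability and solution accuracy, achieving real-time optimal scheduling of the microgrid.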

13.
In the design phase of Li-ion batteries for electric vehicles, battery manufacturers need to carry out cycle life tests on a large number of formulations to find the one that best meets customer demands. However, such tests take considerable time and money because of the long cycle life of power Li-ion batteries. To reduce the cost of cycle life tests, we propose a prediction method that learns from historical degradation data and extrapolates the remaining degradation trend of the current formulation sample, taking the results of the initial stage of a partial cycle life test as input. Compared with existing methods, the proposed deep reinforcement learning based method is able to learn degradation trends across different formulations and predict long-term degradation trends. Based on the deep deterministic policy gradient algorithm, the proposed method builds a degradation trend prediction model. Meanwhile, an interactive environment is designed for the model to explore and learn in during the training phase. The proposed method is verified with real test data from battery manufacturers under three different temperature conditions in the formulation design stage. The comparisons indicate that the proposed method is superior to traditional degradation trend prediction methods in both accuracy and stability.

14.
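Research on the distributed resource-constrained multi-project scheduling problem with stochastic durations (SDRCMPSP) is scarce, and most existing work yields static schedules that cannot adjust and optimize the policy in real time as the environment changes or respond promptly to frequently occurring dynamic factors. A DRL model for dynamic scheduling of stochastic resource-constrained multi-projects is therefore built with the objective of minimizing the total tardiness cost, a corresponding agent interaction environment is designed, and the model is solved with the DDDQN reinforcement learning algorithm. The experiments first carry out a sensitivity analysis of the algorithm's hyperparameters, and then train and test the model with the best combination under two conditions: variable activity durations and uncertain arrival times. The results show that the deep reinforcement learning algorithm obtains schedules better than those of any single rule and effectively reduces the expected total tardiness cost of stochastic resource-constrained multi-projects, providing a sound basis for optimizing multi-project scheduling decisions.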

15.
Prediction of wind speed can provide a reference for the reliable utilization of wind energy. This study focuses on 1-hour, 1-step-ahead deterministic wind speed prediction with only wind speed as input. To account for the time-varying characteristics of wind speed series, a dynamic ensemble wind speed prediction model based on deep reinforcement learning is proposed. It combines ensemble learning, multi-objective optimization, and deep reinforcement learning to ensure effectiveness. In part A, a deep echo state network enhanced by real-time wavelet packet decomposition is used to construct base models with different vanishing moments. The variety of vanishing moments naturally guarantees the diversity of the base models. In part B, multi-objective optimization is adopted to determine the combination weights of the base models. The bias and variance of the ensemble model are minimized simultaneously to improve generalization ability. In part C, the non-dominated solutions of the combination weights are embedded into a deep reinforcement learning environment to achieve dynamic selection. With a reasonably designed reinforcement learning environment, the model can dynamically select a non-dominated solution for each prediction according to the time-varying characteristics of wind speed. Four actual wind speed series are used to validate the proposed dynamic ensemble model. The results show that: (a) the proposed dynamic ensemble model is competitive for wind speed prediction and significantly outperforms five classic intelligent prediction models and six ensemble methods; (b) every part of the proposed model is indispensable for improving the prediction accuracy.

16.
The multimodal perception of intelligent robots is essential for achieving collision-free and efficient navigation. Autonomous navigation is enormously challenging when perception relies on only vision or only LiDAR sensor data, because complementary information from different sensors is lacking. This paper proposes a simple yet efficient deep reinforcement learning (DRL) approach with sparse rewards and hindsight experience replay (HER) to achieve multimodal navigation. By adopting the depth images and pseudo-LiDAR data generated by an RGB-D camera as input, a multimodal fusion scheme is used to enhance the perception of the surrounding environment compared to using a single sensor. To avoid dense rewards misleading the agent during navigation, sparse rewards are used to specify its tasks. Additionally, the HER technique is introduced to address the sparse reward navigation issue and accelerate optimal policy learning. The results show that the proposed model achieves state-of-the-art performance in terms of success, crash, and timeout rates, as well as generalization capability.
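
The interaction between sparse rewards and HER in this abstract can be illustrated with a minimal relabelling sketch. It uses the "final" strategy (replaying each transition with the goal replaced by the state actually reached at the end of the episode) and a 0 / -1 sparse reward; treating the next state as the achieved goal, the tuple layout, and the function names are assumptions for illustration, not details from the paper.

```python
from typing import List, Tuple

Step = Tuple[tuple, int, tuple, tuple]                # state, action, next_state, goal
Transition = Tuple[tuple, int, tuple, tuple, float]   # ... plus reward


def sparse_reward(achieved: tuple, goal: tuple) -> float:
    """0 when the goal is reached, -1 otherwise (assumed sparse reward)."""
    return 0.0 if achieved == goal else -1.0


def her_relabel(episode: List[Step]) -> List[Transition]:
    """Return the original transitions plus copies relabelled with the final achieved state."""
    final_achieved = episode[-1][2]  # where the agent actually ended up
    buffer: List[Transition] = []
    for state, action, next_state, goal in episode:
        # Original transition: usually reward -1 under a sparse reward.
        buffer.append((state, action, next_state, goal, sparse_reward(next_state, goal)))
        # Hindsight transition: pretend the final state was the goal all along.
        buffer.append((state, action, next_state, final_achieved,
                       sparse_reward(next_state, final_achieved)))
    return buffer


episode = [((0, 0), 1, (0, 1), (5, 5)), ((0, 1), 1, (0, 2), (5, 5))]
for transition in her_relabel(episode):
    print(transition)
```

Even though the episode never reaches the intended goal (5, 5), the relabelled copy of the last step carries a reward of 0, so the replay buffer contains at least one successful example to learn from.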

17.
Context-aware ubiquitous learning (u-learning) is an innovative approach that integrates wireless, mobile, and context-awareness technologies to detect the situation of learners in the real world and provide adaptive support or guidance accordingly. In this paper, a context-aware u-learning environment is developed for guiding inexperienced researchers to practice single-crystal X-ray diffraction operations. Experimental results showed that the benefits of this innovative approach are that it is “systematic”, “authentic”, and “economical”, which implies the potential of applying it to complex science experiments, such as physics, chemistry or biotechnology experiments, for graduate and PhD students in colleges, or research workers in research institutes.

18.
The reinforcement and imitation learning paradigms have the potential to revolutionise robotics. Many successful developments have been reported in the literature; however, these approaches have not been explored widely in robotics for construction. The objective of this paper is to consolidate, structure, and summarise research knowledge at the intersection of robotics, reinforcement learning, and construction. A two-strand approach to the literature review was employed: a bottom-up approach, in which a selected number of relevant publications were analysed in detail, and a top-down approach, in which a large number of papers were analysed to identify common relevant themes and research trends. This study found that research on robotics for construction has not increased significantly since the 1980s in terms of number of publications. Also, robotics for construction lacks the development of dedicated systems, which limits its effectiveness. Moreover, unlike manufacturing, construction's unstructured and dynamic characteristics are a major challenge for reinforcement and imitation learning approaches. This paper provides a useful starting point for understanding research on robotics for construction by (i) identifying the strengths and limitations of the reinforcement and imitation learning approaches, and (ii) contextualising the construction robotics problem, both of which will help to kick-start research on the subject or boost existing research efforts.

19.
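Remote sensing images contain large numbers of tiny targets, and remote targets can be recognized and tracked only if these tiny targets are detected accurately. To provide an effective auxiliary tool for remote tracking, a tiny-target detection method for remote sensing images is designed and optimized with deep learning as its technical basis. Remote sensing images containing tiny targets are acquired in real time by hardware devices, and the initial images are preprocessed through geometric correction, grayscale conversion, noise suppression, dehazing, and image enhancement. The targets to be detected are selected by segmenting the foreground from the background. A deep convolutional neural network is built as the runtime environment for the deep learning algorithm, and features of the remote sensing images are extracted through forward and backward propagation. Finally, feature matching yields a detection result containing the number of tiny targets and their position coordinates. Performance tests show that, compared with traditional remote sensing target detection methods, the optimized method improves precision and recall by 6.3% and 10.74% respectively, markedly reduces the target position detection error, and shortens the response time by 2440 ms, which demonstrates that the optimized method has good detection performance.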

20.
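Reinforcement learning can help a negotiation agent choose the optimal action to achieve its final goal. The reinforcement-learning-based negotiation strategy is optimized by making full use of the opponent's historical information during negotiation, which speeds up convergence to a negotiated solution and improves its quality. Finally, experiments verify the effectiveness and usability of the algorithm.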
