Similar literature
Found 20 similar documents; search took 15 ms
1.
In real industrial settings, equipment often operates under varying working conditions, and the manufacturing system is rather complicated. However, in the fault diagnosis domain, traditional multi-label learning methods either rely on a pre-defined label sequence or predict all labels of an input sample simultaneously. Deep reinforcement learning (DRL) combines the perception ability of deep learning with the decision-making ability of reinforcement learning. Moreover, the curriculum learning mechanism follows the human approach of learning from easy to complex. Consequently, this paper proposes an improved proximal policy optimization (PPO) method, PPO being a typical DRL algorithm, as a novel method for multi-label classification. The improved PPO method builds a relationship between the predicted labels of an input sample through an action history vector, which encodes all actions selected by the agent up to the current time step. In two rolling bearing experiments, the diagnostic results demonstrate that the proposed method achieves higher accuracy than traditional multi-label methods in fault recognition under complicated working conditions. In addition, compared with the same network using a pre-defined label sequence, the proposed method distinguishes the multiple labels of input samples following the curriculum mechanism from easy to complex.
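
The action history vector described in the abstract above can be illustrated with a small sketch. This is not the paper's PPO implementation: `score_fn`, the extra "stop" action, and the greedy choice are assumptions made only to show how a history vector lets each new label decision depend on the labels already emitted.

```python
from typing import Callable, List, Sequence


def action_history_vector(chosen: Sequence[int], num_labels: int) -> List[int]:
    """Binary vector with a 1 for every label the agent has already selected."""
    history = [0] * num_labels
    for label in chosen:
        history[label] = 1
    return history


def predict_label_set(
    score_fn: Callable[[Sequence[float], List[int]], Sequence[float]],
    features: Sequence[float],
    num_labels: int,
) -> List[int]:
    """Emit labels one at a time until a (hypothetical) stop action wins.

    score_fn stands in for the policy network (an assumption): it receives the
    sample features plus the action history and returns num_labels + 1 scores,
    where the last index is the stop action.
    """
    stop = num_labels
    chosen: List[int] = []
    for _ in range(num_labels):
        history = action_history_vector(chosen, num_labels)
        scores = score_fn(features, history)
        # Labels already selected are masked out so each label appears once.
        candidates = [a for a in range(num_labels + 1) if a == stop or history[a] == 0]
        action = max(candidates, key=lambda a: scores[a])
        if action == stop:
            break
        chosen.append(action)
    return chosen
```

In this sketch the order in which labels come out is decided by the scores rather than by a pre-defined label sequence, which is the contrast the abstract draws.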

2.
Bearings and tools are important parts of a machine tool, and automatically monitoring bearing faults and tool wear under different working conditions is a necessary capability of an intelligent manufacturing system. In this paper, a multi-label imitation learning (MLIL) framework is proposed to monitor tool wear and bearing faults under different working conditions. Specifically, multi-label samples with multiple sublabels are transformed into imitation objects, and MLIL develops a discriminator and a deep reinforcement learning (DRL) module to imitate the features of the imitation objects. In detail, the DRL module is implemented without setting a reward function, to enhance the feature extraction ability of deep neural networks, while the discriminator is used to discriminate between the outputs generated by the DRL module and the imitation objects. As a result, the MLIL framework can not only deal with the correlation between multiple working conditions, including different speeds and loads, but also distinguish compound faults composed of simultaneous bearing fault and tool wear. Two cases jointly demonstrate the imitation ability of the MLIL framework in monitoring tool wear and bearing faults under different working conditions.

3.
The quality of the fault recognition component is one of the key factors affecting the efficiency of intelligent manufacturing. Many excellent achievements in deep learning (DL) have recently been realized as methods of fault recognition. However, DL models have inherent shortcomings. In particular, the phenomenon of over-fitting or degradation suggests that such an intelligent algorithm cannot fully use its feature perception ability. Researchers have mainly adapted the network architecture for fault diagnosis, but these limitations have not been taken into account. In this study, we propose a novel deep reinforcement learning method that combines the perception ability of DL with the decision-making ability of reinforcement learning. This method enhances the classification accuracy of the DL module so that it autonomously learns much more of the knowledge hidden in raw data. The proposed method, based on the convolutional neural network (CNN), also adopts an improved actor-critic algorithm for fault recognition. The important parts of the standard actor-critic algorithm, such as the environment, neural network, reward, and loss functions, have been fully considered in the improved actor-critic algorithm. Additionally, to fully distinguish compound faults under heavy background noise, multi-channel signals are first stacked synchronously and then input into the model in an end-to-end training mode. The diagnostic results on the compound fault of the bearing and tool in the machine tool experimental system show that the proposed network structure is more accurate than other methods. These findings demonstrate that, under the guidance of the improved actor-critic algorithm and the processing method for multi-channel data, the proposed method has stronger exploration performance.

4.
Transfer in variable-reward hierarchical reinforcement learning   Cited by: 2 (self-citations: 1, citations by others: 1)
Transfer learning seeks to leverage previously learned tasks to achieve faster learning in a new task. In this paper, we consider transfer learning in the context of related but distinct Reinforcement Learning (RL) problems. In particular, our RL problems are derived from Semi-Markov Decision Processes (SMDPs) that share the same transition dynamics but have different reward functions that are linear in a set of reward features. We formally define the transfer learning problem in the context of RL as learning an efficient algorithm to solve any SMDP drawn from a fixed distribution after experiencing a finite number of them. Furthermore, we introduce an online algorithm to solve this problem, Variable-Reward Reinforcement Learning (VRRL), that compactly stores the optimal value functions for several SMDPs, and uses them to optimally initialize the value function for a new SMDP. We generalize our method to a hierarchical RL setting where the different SMDPs share the same task hierarchy. Our experimental results in a simplified real-time strategy domain show that significant transfer learning occurs in both flat and hierarchical settings. Transfer is especially effective in the hierarchical setting where the overall value functions are decomposed into subtask value functions which are more widely amenable to transfer across different SMDPs.
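
Why a reward that is linear in features makes stored value functions reusable can be shown with a short sketch. It is a simplification under stated assumptions, not the VRRL algorithm itself: each stored value function is assumed to be kept as a per-feature vector for every state, so its scalar value under a new weight vector is a dot product, and the new task is initialized with the best stored candidate. All names (`stored_value_functions`, `initial_value`, `best_stored_index`) are hypothetical.

```python
from typing import Dict, Hashable, List, Sequence


def dot(weights: Sequence[float], values: Sequence[float]) -> float:
    """Inner product of a reward weight vector and a per-feature value vector."""
    return sum(w * v for w, v in zip(weights, values))


def initial_value(
    state: Hashable,
    new_weights: Sequence[float],
    stored_value_functions: List[Dict[Hashable, Sequence[float]]],
) -> float:
    """Initial value of `state` for a task with reward weights `new_weights`.

    Each stored value function maps a state to a vector with one entry per
    reward feature (an assumption of this sketch). Because the dynamics are
    shared and the reward is linear in the features, re-weighting that vector
    gives the value of the stored policy on the new task.
    """
    return max(dot(new_weights, vf[state]) for vf in stored_value_functions)


def best_stored_index(
    state: Hashable,
    new_weights: Sequence[float],
    stored_value_functions: List[Dict[Hashable, Sequence[float]]],
) -> int:
    """Index of the stored value function that is best for the new weights."""
    return max(
        range(len(stored_value_functions)),
        key=lambda i: dot(new_weights, stored_value_functions[i][state]),
    )
```

The maximum over stored candidates is a lower bound on the optimal value of the new task, which is what makes it a sensible starting point for further learning.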

5.
The health sensing system (HSS), which offers a variety of health services, has attracted considerable research attention in the area of smart healthcare. However, continuous sensing inevitably leads to high energy consumption on mobile sensing devices. On the other hand, reducing the sensing duration causes excessive delay in detecting a change in user state and can miss critical physiological signals. Thus, the trade-off between energy consumption and delay constitutes a primary challenge in the design of an HSS. In this paper, we propose an adaptive sensing strategy to intelligently determine the trigger time for sensing physiological parameters in an HSS. Furthermore, human context recognition (HCR) is adopted to design a context-aware sensing strategy, in which the health condition, sensing requirements, and dependence on physiological data are considered simultaneously. To devise the sensing strategy, we first generate a dynamic observation model. Next, we propose a sort retention double-DQN (SRD-DQN) based sensing strategy. In comparison to traditional double-DQN, the proposed approach can effectively enhance learning stability and sample efficiency. With SRD-DQN, we can obtain the optimized schedule of the successive sensing window according to the current state. We implement blood pressure and heart rate monitoring simulations to evaluate the performance of the proposed sensing strategy. Simulation results reveal that the sensing strategy can effectively restrain energy consumption and delay, and that SRD-DQN converges faster than traditional DQN.
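
For readers unfamiliar with the baseline, the following sketch contrasts the standard DQN target with the standard double-DQN target. The abstract does not describe the "sort retention" mechanism, so it is not modelled here; the function and parameter names are illustrative only.

```python
from typing import Sequence


def dqn_target(reward: float, next_q_target: Sequence[float],
               gamma: float = 0.99, done: bool = False) -> float:
    """Standard DQN target: the target network both selects and evaluates."""
    if done:
        return reward
    return reward + gamma * max(next_q_target)


def double_dqn_target(reward: float, next_q_online: Sequence[float],
                      next_q_target: Sequence[float],
                      gamma: float = 0.99, done: bool = False) -> float:
    """Double-DQN target: the online network selects the next action and the
    target network evaluates it, which reduces over-estimation of Q-values."""
    if done:
        return reward
    best_action = max(range(len(next_q_online)), key=lambda a: next_q_online[a])
    return reward + gamma * next_q_target[best_action]
```

In a sensing scheduler of the kind described above, each action would plausibly correspond to a choice of the next sensing window, but that mapping is an assumption rather than something the abstract specifies.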

6.
In the design phase of Li-ion batteries for electric vehicles, battery manufacturers need to carry out cycle life tests on a large number of formulations to find the one that best meets customer demands. However, such tests take considerable time and money due to the long cycle life of power Li-ion batteries. To reduce the cost of cycle life tests, we propose a prediction method that learns from historical degradation data and, taking partial cycle life test results from the initial stage as input, extrapolates the remaining degradation trend of the current formulation sample. Compared with existing methods, the proposed deep reinforcement learning based method is able to learn degradation trends across different formulations and predict long-term degradation trends. Based on the deep deterministic policy gradient algorithm, the proposed method builds a degradation trend prediction model. Meanwhile, an interactive environment is designed for the model to explore and learn in during the training phase. The proposed method is verified with real test data from battery manufacturers under three different temperature conditions in the formulation design stage. The comparisons indicate that the proposed method is superior to traditional degradation trend prediction methods in both accuracy and stability.

7.
Many neural network methods such as ML-RBF and BP-MLL have been used for multi-label classification. Recently, the extreme learning machine (ELM) has been used as the basic element for handling multi-label classification problems because of its fast training time. The extreme learning machine based auto encoder (ELM-AE) is a novel neural network method that can reproduce the input signal as well as an auto encoder does, but it cannot solve the over-fitting problem in neural networks elegantly. By introducing weight uncertainty into ELM-AE, we treat the input weights as random variables following a Gaussian distribution and propose the weight uncertainty ELM-AE (WuELM-AE). In this paper, a neural network named multi layer ELM-RBF for multi-label learning (ML-ELM-RBF) is proposed. It is derived from the radial basis function for multi-label learning (ML-RBF) and WuELM-AE. ML-ELM-RBF first stacks WuELM-AE to create a deep network, and then conducts clustering analysis on the sample features of each possible class to compose the last hidden layer. ML-ELM-RBF has achieved satisfactory results on single-label and multi-label data sets. Experimental results show that WuELM-AE and ML-ELM-RBF are effective learning algorithms.

8.
In the era of Big Data, a practical yet challenging task is to make learning techniques more universally applicable to complex learning problems such as multi-source multi-label learning. While earlier work has developed many effective solutions for multi-label classification and multi-source fusion separately, in this paper we address the two problems together and propose a novel method for the joint learning of multiple class labels and data sources. An optimization framework is constructed to formulate the learning problem, and the result of multi-label classification is induced by the weighted combination of the decisions from multiple sources. The proposed method is effective in exploiting label correlations and fusing multi-source data, especially in the fusion of long-tail data. Experiments on various multi-source multi-label data sets reveal the advantages of the proposed method.

9.
We present a novel hybrid algorithm for Bayesian network structure learning, called H2PC. It first reconstructs the skeleton of a Bayesian network and then performs a Bayesian-scoring greedy hill-climbing search to orient the edges. The algorithm is based on divide-and-conquer constraint-based subroutines to learn the local structure around a target variable. We conduct two series of experimental comparisons of H2PC against Max–Min Hill-Climbing (MMHC), which is currently the most powerful state-of-the-art algorithm for Bayesian network structure learning. First, we use eight well-known Bayesian network benchmarks with various data sizes to assess the quality of the learned structure returned by the algorithms. Our extensive experiments show that H2PC outperforms MMHC in terms of goodness of fit to new data and quality of the network structure with respect to the true dependence structure of the data. Second, we investigate H2PC’s ability to solve the multi-label learning problem. We provide theoretical results to characterize and identify graphically the so-called minimal label powersets that appear as irreducible factors in the joint distribution under the faithfulness condition. The multi-label learning problem is then decomposed into a series of multi-class classification problems, where each multi-class variable encodes a label powerset. H2PC is shown to compare favorably to MMHC in terms of global classification accuracy over ten multi-label data sets covering different application domains. Overall, our experiments support the conclusions that local structural learning with H2PC in the form of local neighborhood induction is a theoretically well-motivated and empirically effective learning framework that is well suited to multi-label learning. The source code (in R) of H2PC as well as all data sets used for the empirical tests are publicly available.
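
The decomposition into multi-class problems mentioned above can be sketched as a label powerset encoding. In the paper the label groups are derived from the learned Bayesian network structure; here the grouping is simply passed in as a hypothetical `groups` argument, and the sketch is in Python rather than the authors' R code.

```python
from typing import Dict, List, Sequence, Tuple


def encode_powersets(
    label_rows: Sequence[Sequence[int]],
    groups: Sequence[Sequence[int]],
) -> Tuple[List[List[int]], List[Dict[Tuple[int, ...], int]]]:
    """Turn binary label vectors into one multi-class target per label group.

    Every distinct joint assignment of the labels in a group becomes one class
    id, so each group yields an ordinary multi-class classification problem.
    """
    encoders: List[Dict[Tuple[int, ...], int]] = [dict() for _ in groups]
    encoded: List[List[int]] = []
    for row in label_rows:
        classes = []
        for encoder, group in zip(encoders, groups):
            key = tuple(row[i] for i in group)
            # A previously unseen assignment gets the next free class id.
            classes.append(encoder.setdefault(key, len(encoder)))
        encoded.append(classes)
    return encoded, encoders
```

Smaller groups keep the number of classes per problem manageable, which is presumably why the abstract emphasizes minimal label powersets.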

10.
Fault diagnosis methods for rotating machinery have always been a hot research topic, and artificial intelligence-based approaches have attracted increasing attention from both researchers and engineers. Among those related studies and methods, artificial neural networks, especially deep learning-based methods, are widely used to extract fault features or classify fault features obtained by other signal processing techniques. Although such methods could solve the fault diagnosis problems of rotating machinery, there are still two deficiencies. (1) Unable to establish direct linear or non-linear mapping between raw data and the corresponding fault modes, the performance of such fault diagnosis methods highly depends on the quality of the extracted features. (2) The optimization of neural network architecture and parameters, especially for deep neural networks, requires considerable manual modification and expert experience, which limits the applicability and generalization of such methods. As a remarkable breakthrough in artificial intelligence, AlphaGo, a representative achievement of deep reinforcement learning, provides inspiration and direction for the aforementioned shortcomings. Combining the advantages of deep learning and reinforcement learning, deep reinforcement learning is able to build an end-to-end fault diagnosis architecture that can directly map raw fault data to the corresponding fault modes. Thus, based on deep reinforcement learning, a novel intelligent diagnosis method is proposed that is able to overcome the shortcomings of the aforementioned diagnosis methods. Validation tests of the proposed method are carried out using datasets of two types of rotating machinery, rolling bearings and hydraulic pumps, which contain a large number of measured raw vibration signals under different health states and working conditions. The diagnosis results show that the proposed method is able to obtain intelligent fault diagnosis agents that can mine the relationships between the raw vibration signals and fault modes autonomously and effectively. Considering that the learning process of the proposed method depends only on the replayed memories of the agent and the overall rewards, which represent much weaker feedback than that obtained by the supervised learning-based method, the proposed method is promising in establishing a general fault diagnosis architecture for rotating machinery.

11.
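Deep reinforcement learning explores a large number of environment samples during training, which makes convergence slow, whereas reusing or transferring knowledge learned in previous tasks (source tasks) has the potential to speed up convergence when learning a new task (target task). To improve learning efficiency, a transfer reinforcement learning algorithm based on double Q-network learning is proposed. Built on the actor-critic framework, it transfers knowledge of the optimal value function of the source task so that the value function network in the target task evaluates the policy more accurately and guides the policy to update quickly toward the optimal policy. The algorithm is applied to OpenAI Gym and to an experiment in which a robotic arm reaches the position of a target object in three-dimensional space, where it achieves better results than conventional deep reinforcement learning algorithms. The experiments show that the proposed algorithm converges faster and explores more stably during training.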

12.
In the Internet of Things (IoT), a huge amount of valuable data is generated by various IoT applications. As IoT technologies become more complex, attack methods become more diversified and can cause serious damage. Thus, establishing a secure IoT network based on user trust evaluation, to defend against security threats and ensure the reliability of the sources of collected data, has become an urgent issue. In this paper, a Data Fusion and transfer learning empowered granular Trust Evaluation mechanism (DFTE) is proposed to address the above challenges. Specifically, to meet the granularity demands of trust evaluation, time–space empowered fine/coarse grained trust evaluation models are built utilizing deep transfer learning algorithms based on data fusion. Moreover, to prevent privacy leakage and task sabotage, a dynamic reward and punishment mechanism is developed to encourage honest users by dynamically adjusting the scale of reward or punishment and accurately evaluating users’ trust. Extensive experiments show that: (i) the proposed DFTE achieves high accuracy of trust evaluation under different granular demands through efficient data fusion; (ii) DFTE performs excellently in participation rate and data reliability.

13.
Fault diagnosis of rolling bearings is crucial for the safety of large rotating machinery. However, in practical engineering, the fault modes of rolling bearings are usually compound faults and the signals contain a large amount of noise, which increases the difficulty of fault diagnosis. Therefore, a deep feature enhanced reinforcement learning method is proposed for the fault diagnosis of rolling bearings. Firstly, to improve robustness, the neural network is modified with the ELU activation function. Secondly, an attention model is used to improve the feature enhancement ability and acquire essential global information. Finally, a deep Q network is established to accurately diagnose the fault modes. Extensive experiments are conducted on the rolling bearing dataset. Test results show that the proposed method is superior to other intelligent diagnosis methods.

14.
郭方洪, 何通, 吴祥, 董辉, 刘冰. 《控制理论与应用》 (Control Theory & Applications), 2022, 39(10): 1881-1889
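With massive new energy sources connected to the microgrid, the parameter space of the microgrid system model grows multiplicatively, and the computational difficulty of its energy optimization scheduling keeps rising. Meanwhile, the uncertainty in the output of new energy sources also poses a great challenge to the optimal scheduling of the microgrid. To address these problems, this paper proposes a real-time optimal scheduling strategy for microgrids based on distributed deep reinforcement learning. First, under a distributed architecture, the main grid and each distributed power source are treated as independent agents. Second, each agent has a local learning model, builds its state and action spaces from local data, and is given a multi-objective reward function, with constraints, covering generation cost, trading price, and power source service life. Finally, each agent seeks a locally optimal policy by interacting with the environment, while the agents learn value network parameters from one another to optimize local action selection, ultimately achieving the goal of minimizing the operating cost of the microgrid system. Simulation results show that, compared with the Deep Deterministic Policy Gradient (DDPG) algorithm, the proposed method increases training speed by 17.6% and reduces the cost function value by 67% while maintaining system stability and solution accuracy, achieving real-time optimal scheduling of the microgrid.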

15.
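Reinforcement learning mainly studies how an agent makes good decisions according to its environment, and its core is learning a policy. Action selection in traditional policy models depends mainly on state perception, historical memory, and model parameters, so the agent's behavior is hard to control. However, when humans complete a task, they usually choose their behavior according to their own will or motivation. Inspired by the human decision-making mechanism, and to make action selection in reinforcement learning controllable so that the agent chooses actions according to an intention, an intention variable is added to the policy model and an intention-controlled policy learning method for reinforcement learning is proposed. Specifically, the mutual information between the intention variable and the action is maximized so that the two become highly correlated, which allows the policy to select actions related to a given intention variable and thereby control the agent. Finally, complex robot control simulation tasks in MuJoCo verify that the proposed method can effectively control the robot's moving speed and moving angle through the intention variable.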

16.
Mental fatigue is one of the major factors leading to human errors. To avoid failures caused by mental fatigue, researchers are working on ways to detect and monitor fatigue using different types of signals. The electroencephalography (EEG) signal is one of the most popular choices for recognizing mental fatigue since it directly measures the neurophysiological activities in the brain. Current EEG-based fatigue recognition algorithms are usually subject-specific, which means a classifier needs to be trained per subject. However, as fatigue may need a relatively long period to induce, collecting training data from each new user can be time-consuming and troublesome. Calibration-free methods are desired but also challenging, since significant variability of physiological signals exists among different subjects. In this paper, we propose algorithms using inter-subject transfer learning for EEG-based mental fatigue recognition that do not need calibration. To explore the influence of the number of EEG channels on the algorithms’ accuracy, we also compare the cases of using one channel only and multiple channels. Random forest is applied to choose the channel that has the most distinguishable features. A public EEG fatigue dataset recorded during driving is used to validate the algorithms. EEG data from 11 subjects were selected from the dataset and leave-one-subject-out cross-validation was employed. The channel from the occipital lobe is selected when only one channel is desired. The proposed transfer learning-based algorithms achieved an accuracy of 73.01% with all thirty channels using Maximum Independence Domain Adaptation (MIDA), and 68.00% with the one selected channel using Transfer Component Analysis (TCA).
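
The leave-one-subject-out protocol used in the evaluation above can be sketched as follows. The helper name and the use of plain index lists are choices made for this illustration; the abstract does not state how the authors implemented the split.

```python
from typing import Hashable, Iterator, List, Sequence, Tuple


def leave_one_subject_out(
    subject_ids: Sequence[Hashable],
) -> Iterator[Tuple[Hashable, List[int], List[int]]]:
    """Yield (held_out_subject, train_indices, test_indices) once per subject.

    All samples of the held-out subject form the test fold, so no data from
    that subject is seen during training (the calibration-free setting).
    """
    for held_out in sorted(set(subject_ids)):
        train = [i for i, s in enumerate(subject_ids) if s != held_out]
        test = [i for i, s in enumerate(subject_ids) if s == held_out]
        yield held_out, train, test
```

With 11 subjects, as in the study, this generator would produce 11 folds, each training on 10 subjects and testing on the remaining one.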

17.
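Fuzzy Sarsa learning (FSL) is a fuzzy reinforcement learning algorithm based on Sarsa learning. It approximates the action value function with an on-policy approach, and in each of its fuzzy rules the next action is selected according to the Softmax formula. For complex learning tasks in continuous spaces, FSL cannot balance exploration and exploitation well. This paper therefore proposes a new fuzzy reinforcement learning algorithm based on ant colony optimization (ACO-FSL), whose main contribution is to combine the idea of ant colony optimization (ACO) with the traditional fuzzy reinforcement learning algorithm. The design principle, method, and concrete steps of the algorithm are given. Simulation experiments on the mountain car problem show that the proposed ACO-FSL algorithm outperforms FSL in learning speed and stability.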
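
The two ingredients the abstract attributes to FSL, Softmax action selection and the on-policy Sarsa update, can be sketched in their plain tabular form. The fuzzy rule base and the ant colony component are not modelled; the table layout and parameter names are assumptions for illustration.

```python
import math
import random
from typing import Dict, List, Sequence


def softmax_probabilities(q_values: Sequence[float], temperature: float = 1.0) -> List[float]:
    """Boltzmann (Softmax) distribution over actions; a higher temperature
    flattens the distribution and therefore explores more."""
    top = max(q_values)  # subtracted for numerical stability
    exps = [math.exp((q - top) / temperature) for q in q_values]
    total = sum(exps)
    return [e / total for e in exps]


def select_action(q_values: Sequence[float], temperature: float = 1.0, rng=random) -> int:
    """Sample an action index according to the Softmax probabilities."""
    probs = softmax_probabilities(q_values, temperature)
    return rng.choices(range(len(q_values)), weights=probs, k=1)[0]


def sarsa_update(q: Dict[int, List[float]], state: int, action: int, reward: float,
                 next_state: int, next_action: int,
                 alpha: float = 0.1, gamma: float = 0.9) -> float:
    """On-policy update: bootstrap from the action actually taken next."""
    td_error = reward + gamma * q[next_state][next_action] - q[state][action]
    q[state][action] += alpha * td_error
    return td_error
```

The single temperature parameter is the only exploration control in this scheme, which gives some intuition for why the authors look to another mechanism to balance exploration and exploitation.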

18.
How to design a System of Systems (SoS) has attracted wide attention in recent years, especially in military applications. This problem is also known as SoS architecting, which can be boiled down to two subproblems: selecting a number of systems from a set of candidates and specifying the tasks to be completed by each selected system. Essentially, such a problem can be reduced to a combinatorial optimization problem. Traditional exact solvers such as the branch-and-bound algorithm are not efficient enough to deal with large scale cases. Heuristic algorithms are more scalable, but if the input changes, these algorithms have to restart the search process. Searching again may take a long time and interfere with the mission achievement of the SoS in highly dynamic scenarios, e.g., in Mosaic Warfare. In this paper, we combine artificial intelligence with SoS architecting and propose a deep reinforcement learning approach, DRL-SoSDP, for SoS design. Deep neural networks and actor–critic algorithms are used to find the optimal solution under constraints. Evaluation results show that the proposed approach is superior to heuristic algorithms in both solution quality and computation time, especially in large scale cases. DRL-SoSDP can find good solutions in near real time, showing great potential for cases that require an instant reply. DRL-SoSDP also shows good generalization ability and can find better results than heuristic algorithms even when the scale of the SoS is much larger than that in the training data.

19.
Reinforcement learning is about learning agent models that make the best sequential decisions in unknown environments. In an unknown environment, the agent needs to explore the environment while exploiting the collected information, which usually forms a sophisticated problem to solve. Derivative-free optimization, meanwhile, is capable of solving sophisticated problems. It commonly uses a sampling-and-updating framework to iteratively improve the solution, where exploration and exploitation also need to be well balanced. Therefore, derivative-free optimization deals with a similar core issue as reinforcement learning, and has been introduced into reinforcement learning approaches under the names of learning classifier systems and neuroevolution/evolutionary reinforcement learning. Although such methods have been developed for decades, derivative-free reinforcement learning has recently been attracting increasing attention. However, a recent survey on this topic is still lacking. In this article, we summarize methods of derivative-free reinforcement learning to date, and organize the methods in aspects including parameter updating, model selection, exploration, and parallel/distributed methods. Moreover, we discuss some current limitations and possible future directions, hoping that this article could bring more attention to this topic and serve as a catalyst for developing novel and efficient approaches.

20.
Prediction of wind speed can provide a reference for the reliable utilization of wind energy. This study focuses on 1-hour, 1-step ahead deterministic wind speed prediction with only wind speed as input. To account for the time-varying characteristics of wind speed series, a dynamic ensemble wind speed prediction model based on deep reinforcement learning is proposed. It combines ensemble learning, multi-objective optimization, and deep reinforcement learning to ensure effectiveness. In part A, a deep echo state network enhanced by real-time wavelet packet decomposition is used to construct base models with different vanishing moments. The variety of vanishing moments naturally guarantees the diversity of the base models. In part B, multi-objective optimization is adopted to determine the combination weights of the base models. The bias and variance of the ensemble model are minimized simultaneously to improve generalization ability. In part C, the non-dominated solutions for the combination weights are embedded into a deep reinforcement learning environment to achieve dynamic selection. With a reasonably designed reinforcement learning environment, the model can dynamically select a non-dominated solution for each prediction according to the time-varying characteristics of wind speed. Four actual wind speed series are used to validate the proposed dynamic ensemble model. The results show that: (a) the proposed dynamic ensemble model is competitive for wind speed prediction and significantly outperforms five classic intelligent prediction models and six ensemble methods; (b) every part of the proposed model is indispensable for improving the prediction accuracy.
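
Parts B and C above rest on two simple operations: keeping only the non-dominated weight vectors under two minimized objectives, and combining the base-model forecasts with one selected weight vector. The sketch below shows both in isolation; in the paper the selection is made by a reinforcement learning agent, which is not modelled here, and the function names are hypothetical.

```python
from typing import List, Sequence


def non_dominated(objectives: Sequence[Sequence[float]]) -> List[int]:
    """Indices of solutions not dominated by any other (all objectives minimized).

    A solution is dominated if another one is no worse on every objective and
    strictly better on at least one, e.g. on (bias, variance) pairs.
    """
    keep = []
    for i, p in enumerate(objectives):
        dominated = any(
            all(qk <= pk for qk, pk in zip(q, p)) and any(qk < pk for qk, pk in zip(q, p))
            for j, q in enumerate(objectives)
            if j != i
        )
        if not dominated:
            keep.append(i)
    return keep


def combine(forecasts: Sequence[float], weights: Sequence[float]) -> float:
    """Ensemble forecast as the weighted sum of the base-model forecasts."""
    return sum(w * f for w, f in zip(weights, forecasts))
```

A dynamic ensemble in this spirit would, at each prediction step, pick one index from `non_dominated(...)` and pass the corresponding weight vector to `combine`.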
