Similar literature
20 similar documents found.
1.
Batch or semi-batch processing is becoming more prevalent in industrial chemical manufacturing, but it has not benefited from advanced control technologies to the same degree as continuous processing. This is due to several unique aspects that pose challenges to implementing model-based optimal control, such as its highly nonstationary operation and significant run-to-run variability. While existing advanced control methods like model predictive control (MPC) have been extended to address some of these challenges, they still suffer from limitations that have prevented their widespread industrial adoption. Reinforcement learning (RL), in which the agent learns the optimal policy by interacting with the system, offers an alternative to the existing model-based methods and has the potential to bring significant improvements to industrial batch process control practice. With this motivation, this paper examines the advantages that RL offers over traditional model-based optimal control methods and how it can be tailored to better address the characteristics of industrial batch process control problems. After a brief review of existing batch control methods, the basic concepts and algorithms of RL are introduced and issues in applying them to batch process control problems are discussed. The nascent literature on the use of RL in batch process control is briefly reviewed, covering both recipe optimization and tracking control, and our perspectives on future research directions are shared.

2.
The reinforcement and imitation learning paradigms have the potential to revolutionise robotics. Many successful developments have been reported in the literature; however, these approaches have not been explored widely in robotics for construction. The objective of this paper is to consolidate, structure, and summarise research knowledge at the intersection of robotics, reinforcement learning, and construction. A two-strand approach to the literature review was employed: a bottom-up approach, in which a selected number of relevant publications were analysed in detail, and a top-down approach, in which a large number of papers were analysed to identify common relevant themes and research trends. This study found that research on robotics for construction has not increased significantly since the 1980s in terms of number of publications. Also, robotics for construction lacks the development of dedicated systems, which limits their effectiveness. Moreover, unlike manufacturing, construction's unstructured and dynamic characteristics are a major challenge for reinforcement and imitation learning approaches. This paper provides a useful starting point for understanding research on robotics for construction by (i) identifying the strengths and limitations of the reinforcement and imitation learning approaches, and (ii) contextualising the construction robotics problem; both of these will help to kick-start research on the subject or boost existing research efforts.

3.
This paper describes and thoroughly examines a stochastic production/inventory system that produces a single type of product. During the production process, the system is affected by several deterioration failures. It is restored to its initial and previous deterioration state by repair and maintenance activities. Both maintenance and repair durations are assumed to be exponentially distributed random variables. Moreover, the quality of the manufactured products is assumed to be affected by the current deterioration level of the system. The aim of this paper is to find the optimal trade-off between conflicting performance metrics in order to optimize the total expected profit of the system. To tackle such optimization problems, researchers frequently employ Dynamic Programming. This method, though, is not appropriate for the addressed problem because of its complexity. To this end, a Reinforcement Learning-based approach is proposed in order to obtain the optimal joint production, maintenance and product quality control policies. To the authors’ knowledge, the proposed approach is novel and there are few examples of such implementations in the academic literature.

4.
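To address the motion control problem of a two-link manipulator, a control method based on deep reinforcement learning is proposed. First, a manipulator simulation environment is built, comprising the two-link manipulator, a target object, and obstacles. Then, three deep reinforcement learning models are built and trained according to the goal settings, state variables, and reward-penalty mechanism of the environment model, and motion control of the two-link manipulator is finally achieved. After a comparative analysis of the three proposed models, the deep deterministic policy gradient (DDPG) algorithm is selected for further...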

5.
The development of estimation and control theories for quantum systems is a fundamental task for practical quantum technology. This vision article presents a brief introduction to challenging problems and potential opportunities in the emerging areas of quantum estimation, control and learning. The topics cover quantum state estimation, quantum parameter identification, quantum filtering, quantum open-loop control, quantum feedback control, machine learning for estimation and control of quantum systems, and quantum machine learning.

6.
This paper first introduces common wearable sensors, smart wearable devices, and their key application areas. Because multi-sensor systems are defined by the presence of more than one modality or channel (e.g., visual, audio, environmental, and physiological signals), fusion methods for multi-modality and multi-location sensors are proposed. Although several works have reviewed the state of the art in information fusion or deep learning, each of them tackled only one aspect of sensor fusion applications, which leads to a lack of comprehensive understanding. Therefore, we propose a more holistic approach in order to provide a more suitable starting point from which to develop a full understanding of the fusion methods for wearable sensors. Specifically, this review attempts to provide a more comprehensive survey of the most important aspects of multi-sensor applications for human activity recognition, including recent additions to the field in unsupervised learning and transfer learning. Finally, open research issues that need further study and improvement are identified and discussed.

7.
In this work, a “policy iteration algorithm” (PIA) that does not require a mathematical model of the plant is applied to control arterial oxygen saturation. This technique is based on nonlinear optimal control and solves the Hamilton–Jacobi–Bellman equation. The controller is synthesized using a state feedback configuration based on an unidentified model of the complex pathophysiology of the pulmonary system in order to control gas exchange in ventilated patients, since under some circumstances (such as emergency situations) a proper, individualized model for designing and tuning controllers may not be available in time. The simulation results demonstrate optimal control of oxygenation based on the proposed PIA, which iteratively evaluates the Hamiltonian cost functions and synthesizes the control actions until the optimality criteria converge. Furthermore, as a practical example, we examined the performance of this control strategy using an interconnected three-tank system as a real nonlinear system.
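
To make the evaluate-then-improve loop of policy iteration concrete, here is a minimal sketch on a scalar discrete-time linear-quadratic problem. It is not the abstract's method: the paper describes a model-free scheme for a nonlinear pulmonary system, whereas this illustration assumes a known model x[t+1] = a*x[t] + b*u[t], and all names and numeric values are hypothetical.

```python
def policy_iteration_scalar_lqr(a, b, q, r, k0, iterations=20):
    """Policy iteration for x[t+1] = a*x[t] + b*u[t] with stage cost q*x^2 + r*u^2.

    The policy is u = -k*x and its cost-to-go is V(x) = p*x^2.
    The initial gain k0 must be stabilizing, i.e. |a - b*k0| < 1.
    (Assumed toy model; the abstract's algorithm does not need a, b.)
    """
    k = k0
    history = []
    for _ in range(iterations):
        closed_loop = a - b * k
        if abs(closed_loop) >= 1.0:
            raise ValueError("policy is not stabilizing")
        # Policy evaluation: solve p = q + r*k^2 + closed_loop^2 * p for p.
        p = (q + r * k * k) / (1.0 - closed_loop * closed_loop)
        # Policy improvement: minimize r*u^2 + p*(a*x + b*u)^2 over u.
        k = a * b * p / (r + b * b * p)
        history.append((p, k))
    return p, k, history


def riccati_residual(a, b, q, r, p):
    """Residual of the scalar discrete-time algebraic Riccati equation."""
    return q + a * a * p - (a * b * p) ** 2 / (r + b * b * p) - p


p, k, history = policy_iteration_scalar_lqr(a=1.2, b=1.0, q=1.0, r=1.0, k0=1.0)
residual = riccati_residual(1.2, 1.0, 1.0, 1.0, p)
print(p, k, residual)
```

At convergence the evaluated cost coefficient p satisfies the Riccati equation, which is the linear-quadratic special case of the Hamilton–Jacobi–Bellman equation mentioned in the abstract.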

8.

This survey paper provides a review and perspective on intermediate and advanced reinforcement learning (RL) techniques in process industries. It offers a holistic approach by covering all levels of the process control hierarchy. The survey paper presents a comprehensive overview of RL algorithms, including fundamental concepts like Markov decision processes and different approaches to RL, such as value-based, policy-based, and actor-critic methods, while also discussing the relationship between classical control and RL. It further reviews the wide-ranging applications of RL in process industries, such as soft sensors, low-level control, high-level control, distributed process control, fault detection and fault tolerant control, optimization, planning, scheduling, and supply chain. The survey paper discusses the limitations and advantages, trends and new applications, and opportunities and future prospects for RL in process industries. Moreover, it highlights the need for a holistic approach in complex systems due to the growing importance of digitalization in the process industries.


9.
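Reinforcement learning based on causal modeling is becoming increasingly popular in intelligent control. Causal techniques can uncover the structural causal knowledge in a control system and provide an interpretable framework that allows humans to intervene in the system and analyze the feedback. Quantifying the effect of interventions enables an agent to evaluate policy performance in complex situations (for example, in the presence of confounders or in nonstationary environments) and improves the generalization of the algorithm. This paper discusses recent advances in reinforcement learning control based on causal modeling (hereafter, causal reinforcement learning) and clarifies its connections to the individual modules of a control system. First, the basic concepts and classical algorithms of reinforcement learning are introduced, and the shortcomings of reinforcement learning algorithms in explaining causal relationships among variables and in policy generalization under transfer scenarios are discussed. Second, research directions in causal theory are reviewed, chiefly causal effect estimation and causal discovery, which offer feasible remedies for these shortcomings. Next, the paper explains how causal theory can be used to improve control and decision-making in reinforcement learning systems, summarizes four categories of research directions in causal reinforcement learning together with their progress, and collects practical application scenarios. Finally, the paper concludes by pointing out the drawbacks and open problems of causal reinforcement learning and by outlining future research directions.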

10.
This paper addresses an integrated relative position and attitude control strategy for a pursuer spacecraft flying to a space target in proximity operation missions. The relative translation and rotation dynamics are both presented and, because of their mutual couplings, considered jointly, which results in a six degrees-of-freedom (6-DOF) control system. In order to achieve the relative position and attitude requirements simultaneously, an adaptive backstepping control law is designed, in which a command filter is introduced to overcome the "explosion of terms" problem. Within the Lyapunov framework, the proposed controller is proved to ensure the ultimate boundedness of the relative position and attitude signals in the presence of external disturbances and unknown system parameters. Numerical simulation demonstrates the effectiveness of the designed control law.

11.
The increasing demand for mobility in our society poses various challenges to traffic engineering, computer science in general, and artificial intelligence and multiagent systems in particular. As is often the case, it is not possible to provide additional capacity, so a more efficient use of the available transportation infrastructure is necessary. This relates closely to multiagent systems, as many problems in traffic management and control are inherently distributed. Also, many actors in a transportation system fit the concept of autonomous agents very well: the driver, the pedestrian, the traffic expert; in some cases, the intersection and the traffic signal controller can also be regarded as autonomous agents. However, the “agentification” of a transportation system is associated with some challenging issues: the number of agents is high, agents are typically highly adaptive, they react to changes in the environment at the individual level but cause an unpredictable collective pattern, and they act in a highly coupled environment. Therefore, this domain poses many challenges for standard techniques from multiagent systems such as coordination and learning. This paper has two main objectives: (i) to present problems, methods, approaches and practices in traffic engineering (especially regarding traffic signal control); and (ii) to highlight open problems and challenges so that future research in multiagent systems can address them.

12.
In this paper, we formulate and explore the characteristics of iterative learning in ballistic control problems. Iterative learning control (ILC) theory provides a suitable framework for the derivation and analysis of ballistic control under a learning process. To overcome the obstacles caused by an uncertain gradient and redundant control inputs, we incorporate extra trials into iterative learning. With the help of the trial results, a proper control and updating direction can be determined, so that iterative learning can be applied to ballistic control problems. Several initial state learning algorithms are studied for initial speed control, force control, and combined speed and angle control. Finally, shooting angle learning in the basketball shot process is simulated to verify the effectiveness of iterative learning methods in ballistic control problems.
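
As a rough illustration of trial-to-trial shooting angle learning, the sketch below applies a simple proportional update to the launch angle of a drag-free projectile. It is a simplified stand-in, not the paper's algorithm: it assumes the sign of the gradient is known (low-angle branch, where range increases with angle), so the extra trials described in the abstract are not needed, and the function names, gain, and numeric values are all hypothetical.

```python
import math


def projectile_range(speed, angle, g=9.81):
    """Horizontal range of a drag-free projectile launched from ground level."""
    return speed ** 2 * math.sin(2.0 * angle) / g


def learn_shooting_angle(target, speed, angle0, gain, trials):
    """Proportional iterative learning of the launch angle.

    After each shot the landing error is measured and the angle for the next
    shot is corrected by gain * error. Assumes angles below 45 degrees, where
    a larger angle gives a longer range.
    """
    angle = angle0
    errors = []
    for _ in range(trials):
        error = target - projectile_range(speed, angle)
        errors.append(error)
        # Learning update: reuse the error from this trial in the next one.
        angle = angle + gain * error
    return angle, errors


angle, errors = learn_shooting_angle(
    target=8.0, speed=10.0, angle0=0.3, gain=0.05, trials=30
)
print(angle, errors[0], errors[-1])
```

The gain is chosen small enough that each update shrinks the landing error; too large a gain would make the angle overshoot and the trials oscillate or diverge.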

13.
Composite adaptation and learning techniques were initially proposed for improving parameter convergence in adaptive control and have generated considerable research interest in the last three decades, inspiring numerous robot control applications. The key idea is that sources of parametric information other than trajectory tracking errors are also used to drive the parameter estimates. Both composite adaptation and learning can ensure superior stability and performance. However, composite learning possesses a unique feature in that online data memory is fully exploited to extract parametric information, such that parameter convergence can be achieved without the stringent condition termed persistent excitation. In this article, we provide the first systematic and comprehensive survey of prevalent composite adaptation and learning approaches for robot control, focusing especially on exponential parameter convergence. Composite adaptation is classified into regressor-filtering composite adaptation and error-filtering composite adaptation, and composite learning is classified into discrete-data regressor extension and continuous-data regressor extension. For the sake of clear presentation and better understanding, a general class of robotic systems is used as a unifying framework to show the motivation, synthesis, and characteristics of each parameter estimation method for adaptive robot control. The strengths and deficiencies of all these methods are also discussed. We conclude by suggesting possible directions for future research in this area.

14.
The purpose of this article is to offer some reflections on the relationships between digital technologies and learning. It is argued that activities of learning, as they have been practised within institutionalized schooling, are coming under increasing pressure from the developments of digital technologies and the capacities to store, access and manipulate information that such resources offer. Thus, the technologies do not merely support learning; they transform how we learn and how we come to interpret learning. The metaphors of learning currently emerging as relevant in the new media ecology emphasize the transformational and performative nature of such activities, and of knowing in general. These developments make the hybrid nature of human knowing and learning obvious; what we know and master is, to an increasing extent, a function of the mediating tools we are familiar with. At a theoretical and practical level, this implies that the interdependences between human agency, minds, bodies and technologies have to serve as foundations when attempting to understand and improve learning. Attempts to account for what people know without integrating their mastery of increasingly sophisticated technologies into the picture will lack ecological validity.

15.
Reinforcement learning (RL) has roots in dynamic programming and is called adaptive/approximate dynamic programming (ADP) within the control community. This paper reviews recent developments in ADP along with RL and its applications to various advanced control fields. First, the background of the development of ADP is described, emphasizing the significance of regulation and tracking control problems. Some effective offline and online algorithms for ADP/adaptive critic control are presented, and the main results for discrete-time systems and continuous-time systems are surveyed separately. Then, research progress on adaptive critic control based on the event-triggered framework and under uncertain environments is discussed, covering event-based design, robust stabilization, and game design. Moreover, extensions of ADP for addressing control problems under complex environments have attracted enormous attention. The ADP architecture is revisited from the perspective of data-driven and RL frameworks, showing how they significantly advance the ADP formulation. Finally, several typical control applications of RL and ADP are summarized, particularly in the fields of wastewater treatment processes and power systems, followed by some general prospects for future research. Overall, this comprehensive survey of ADP and RL for advanced control applications demonstrates their remarkable potential in the artificial intelligence era, as well as their vital role in promoting environmental protection and industrial intelligence.

16.
Bioinspired soft robotics allows for safer clinical interactions with human patients, whereas conventional hard robots, which are often built with rigid materials and complex control systems, compromise tissue integrity, freedom of movement, conformability, and overall human bio-compatibility. Soft, compliant materials intrinsically reduce mechanical complexity, accommodate their usage environment, and provide great practical potential for medical device development. Previous review papers have generally covered the topics of materials, manufacturing processes, actuator modeling and control, and current trends. Here, we focus on recent developments in soft robotic applications for the medical field, including advances in cardiac devices, surgical robots, and soft rehabilitation and assistance devices. In medical applications, soft robotic devices not only expedite the evolution of minimally invasive surgery but also improve the bio-compatibility of rehabilitation and assistance devices. We evaluate design requirements, mechanisms, achievements and challenges in these key areas. Of particular note, this paper concludes with a discussion of advances in 3D printing and in adapting neural networks for modeling and control frameworks, which have facilitated the development of faster and less expensive soft medical devices.

17.
Based on recent papers that have demonstrated that robust iterative learning control can be based on parameter optimization using either the inverse plant or gradient concepts, this paper presents a unification of these ideas for discrete‐time systems that not only retains the convergence properties and the robustness properties derived in previous papers but also permits the inclusion of filters in the input update formula and a detailed analysis of the effect of non‐minimum‐phase dynamics on algorithm performance in terms of a ‘plateauing’ or ‘flat‐lining’ effect in the error norm evolution. Although the analysis is in the time domain, the robustness conditions are expressed as frequency domain inequalities. The special case of a version of the inverse algorithm that can be used to construct a robust stable anti‐causal inverse non‐minimum‐phase plant is presented and analysed in detail. Copyright © 2012 John Wiley & Sons, Ltd.

18.
In wireless networks, context awareness and intelligence are capabilities that enable each host to observe, learn, and respond to its complex and dynamic operating environment in an efficient manner. These capabilities contrast with traditional approaches where each host adheres to a predefined set of rules, and responds accordingly. In recent years, context awareness and intelligence have gained tremendous popularity due to the substantial network-wide performance enhancement they have to offer. In this article, we advocate the use of reinforcement learning (RL) to achieve context awareness and intelligence. The RL approach has been applied in a variety of schemes such as routing, resource management and dynamic channel selection in wireless networks. Examples of wireless networks are mobile ad hoc networks, wireless sensor networks, cellular networks and cognitive radio networks. This article presents an overview of classical RL and three extensions, including events, rules and agent interaction and coordination, to wireless networks. We discuss how several wireless network schemes have been approached using RL to provide network performance enhancement, and also open issues associated with this approach. Throughout the paper, discussions are presented in a tutorial manner, and are related to existing work in order to establish a foundation for further research in this field, specifically, for the improvement of the RL approach in the context of wireless networking, for the improvement of the RL approach through the use of the extensions in existing schemes, as well as for the design and implementation of RL in new schemes.
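
Dynamic channel selection, one of the schemes named above, can be sketched as a host that learns from transmission outcomes which channel to use. The example below is an illustrative assumption rather than anything taken from the article: it uses a stateless (single-state) Q-learning update with epsilon-greedy exploration, and the channel success probabilities, learning rate, and function name are hypothetical.

```python
import random


def learn_channel(success_prob, steps=5000, alpha=0.05, epsilon=0.1, seed=0):
    """Stateless Q-learning for channel selection.

    success_prob[c] is the (hidden) probability that a transmission on
    channel c succeeds. The host only observes a reward of 1.0 (success)
    or 0.0 (failure) after each transmission.
    """
    rng = random.Random(seed)
    q = [0.0] * len(success_prob)
    for _ in range(steps):
        if rng.random() < epsilon:
            # Explore: try a random channel.
            channel = rng.randrange(len(q))
        else:
            # Exploit: use the channel with the highest learned value.
            channel = max(range(len(q)), key=lambda c: q[c])
        reward = 1.0 if rng.random() < success_prob[channel] else 0.0
        # Move the estimate for the chosen channel toward the observed reward.
        q[channel] += alpha * (reward - q[channel])
    return q


q_values = learn_channel([0.1, 0.4, 0.9])
best_channel = max(range(len(q_values)), key=lambda c: q_values[c])
print(q_values, best_channel)
```

A constant learning rate keeps the estimates responsive if channel quality drifts over time, at the cost of noisier values than a running average would give.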

19.
Web-based technology has a dramatic impact on learning and teaching. However, a framework that delineates the relationships between learner control and learning effectiveness is absent. This study aims to fill this void. Our work focuses on the effectiveness of a technology-mediated virtual learning environment (TVLE) in the context of basic information technology skills training. Grounded in the technology-mediated learning literature, this study presents a framework that addresses the relationship between learner control and learning effectiveness, which contains four categories: learning achievement, self-efficacy, satisfaction, and learning climate. In order to compare learning effectiveness in the traditional classroom and in the TVLE, we conducted a field experiment. Data were collected from a junior high school in Taiwan. A total of 210 usable responses were analysed. We identified four results from this study: (1) students in the TVLE environment achieve better learning performance than their counterparts in the traditional environment; (2) students in the TVLE environment report higher levels of computer self-efficacy than their counterparts in the traditional environment; (3) students in the TVLE environment report higher levels of satisfaction than students in the traditional environment; and (4) students in the TVLE environment report higher levels of learning climate than their counterparts in the traditional environment. The implications of this study are discussed, and further research directions are proposed.

20.
In control systems, transient performance is as important as steady-state performance. For some special dynamic systems, transient performance even takes priority over steady-state performance. Prescribed performance control (PPC) has proved to be a powerful tool for guaranteeing that control system outputs/errors achieve the desired transient as well as steady-state performance. The purpose of this paper is to give a comprehensive review of the latest developments in PPC theory and applications. The existing performance functions are classified into five categories, and their features are comprehensively compared, providing useful guidance for further applications. Then, the latest developments in applications of PPC to various kinds of control systems are reviewed. In particular, the challenges and theoretical defects of PPC are discussed, with the aim of pointing out directions for further research on PPC.
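
For orientation, a commonly used exponential performance function can be written as follows. This is a sketch of one standard form from the PPC literature, not necessarily one of the five categories compared in the paper; the symbols $\rho_0$, $\rho_\infty$, $\kappa$, $\underline{\delta}$ and $\overline{\delta}$ are notation assumed here.

```latex
% Exponentially decaying performance function (assumed standard form)
\rho(t) = (\rho_0 - \rho_\infty)\, e^{-\kappa t} + \rho_\infty,
\qquad \rho_0 > \rho_\infty > 0, \quad \kappa > 0

% Prescribed performance: the error e(t) must stay inside the shrinking funnel
-\underline{\delta}\, \rho(t) < e(t) < \overline{\delta}\, \rho(t),
\qquad \underline{\delta},\, \overline{\delta} > 0, \quad t \ge 0
```

Here $\rho_0$ bounds the initial error, $\kappa$ sets a lower bound on the convergence rate (transient performance), and $\rho_\infty$ bounds the steady-state error.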
