Similar articles
 20 similar articles found (search time: 31 ms)
1.
Reinforcement learning is one of the fastest-growing areas in machine learning and has achieved notable results in biomedicine, the Internet of Things (IoT), logistics, robotic control, and other fields. However, engineering applications still face many challenges, such as how to speed up the learning process and how to balance the trade-off between exploration and exploitation. Quantum technology, which can solve complex problems faster than classical methods, especially in supercomputers, provides a new paradigm for overcoming these challenges in reinforcement learning. In this paper, a quantum-enhanced reinforcement learning algorithm is proposed for optimal control. In this algorithm, the states and actions of reinforcement learning are quantized by quantum technology. A probability amplification method, which uses this quantization to avoid the trade-off between exploration and exploitation, is then presented. Finally, the optimal control policy is learnt during the reinforcement learning process. The performance of this quantized algorithm is demonstrated in both the MountainCar and CartPole reinforcement learning environments, which are classic control environments in OpenAI Gym. The preliminary results indicate that, compared with Q-learning, this quantized reinforcement learning method achieves better control performance without having to consider the trade-off between exploration and exploitation. The learning performance of the new algorithm is stable across learning rates from 0.01 to 0.10, which suggests that it is promising for systems with unknown dynamics.
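
For orientation, the Q-learning baseline that the abstract compares against can be sketched as follows. This is a minimal tabular version with epsilon-greedy exploration on a hypothetical five-state chain (not MountainCar or CartPole, and not the quantum-enhanced algorithm); the function name `train_q_learning`, the chain environment, and all parameter values other than the learning rate range are assumptions made for illustration.

```python
import random


def train_q_learning(n_states=5, episodes=500, alpha=0.1, gamma=0.9,
                     epsilon=0.1, seed=0):
    """Tabular Q-learning on a toy chain: action 0 moves left, action 1 moves right.

    The episode starts in state 0 and ends with reward 1 when the last state
    is reached. The environment is a hypothetical stand-in for illustration.
    """
    rng = random.Random(seed)
    q = [[0.0, 0.0] for _ in range(n_states)]
    goal = n_states - 1
    for _ in range(episodes):
        s = 0
        for _ in range(100):  # cap on steps per episode
            # Explore with probability epsilon (or on ties); otherwise exploit.
            if rng.random() < epsilon or q[s][0] == q[s][1]:
                a = rng.randrange(2)
            else:
                a = 0 if q[s][0] > q[s][1] else 1
            s_next = max(s - 1, 0) if a == 0 else s + 1
            done = s_next == goal
            reward = 1.0 if done else 0.0
            # One-step temporal-difference target; no bootstrap at the terminal state.
            target = reward if done else reward + gamma * max(q[s_next])
            q[s][a] += alpha * (target - q[s][a])
            s = s_next
            if done:
                break
    return q


if __name__ == "__main__":
    for state, values in enumerate(train_q_learning()):
        print(state, [round(v, 3) for v in values])
```

The `epsilon` parameter is the explicit exploration/exploitation knob that the abstract says the quantized method avoids tuning; `alpha=0.1` sits at the upper end of the learning-rate range quoted above.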

2.
Given a collection of parameterized multi-robot controllers associated with individual behaviors designed for particular tasks, this paper considers the problem of how to sequence and instantiate the behaviors for the purpose of completing a more complex, overarching mission. In addition, uncertainties about the environment or even the mission specifications may require the robots to learn, in a cooperative manner, how best to sequence the behaviors. In this paper, we approach this problem by using reinforcement learning to approximate the solution to the computationally intractable sequencing problem, combined with an online gradient descent approach to selecting the individual behavior parameters, while the transitions among behaviors are triggered automatically when the behaviors have reached a desired performance level relative to a task performance cost. To illustrate the effectiveness of the proposed method, it is implemented on a team of differential-drive robots for solving two different missions, namely, convoy protection and object manipulation.

3.
This study concentrates on solving the output consensus problem for a class of heterogeneous uncertain nonstrict-feedback nonlinear multi-agent systems under switching-directed communication topologies, in which all followers are subjected to multi-type input constraints such as unknown asymmetric saturation, unknown dead-zone and their integration. A unified representation is presented to overcome the difficulties originating from multi-agent input constraints. Moreover, the uncertain system functions in a non-lower triangular form and the interaction terms among agents are dealt with by exploiting the fuzzy logic systems and their special property. Furthermore, by introducing a nonlinear filter to alleviate the problem of “explosion of complexity” during the backstepping design, a distributed common adaptive control protocol is proposed to ensure that the synchronization errors converge to a small neighborhood of the origin despite the existence of multiple input constraints and arbitrary switching communication topologies. Both stability analysis and simulations are conducted to show the effectiveness and performance of the proposed control methodology.

4.
While brain-computer interfaces (BCIs) offer the potential of allowing those suffering from loss of muscle control to once again fully engage with their environment by bypassing the affected motor system and decoding user intentions directly from brain activity, they are prone to errors. One possible avenue for BCI performance improvement is to detect when the BCI user perceives the BCI to have made an unintended action and thus take corrective actions. Error-related potentials (ErrPs) are neural correlates of error awareness and as such can provide an indication of when a BCI system is not performing according to the user’s intentions. Here, we investigate, for the presence of error-related signals, the brain signals of an implanted BCI user with locked-in syndrome (LIS) due to late-stage ALS, which prevents her from speaking or moving but not from using her BCI at home on a daily basis to communicate. We first establish the presence of an ErrP originating from the dorsolateral prefrontal cortex (dLPFC) in response to errors made during a discrete feedback task that mimics the click-based spelling software she uses to communicate. Then, we show that this ErrP can also be elicited by cursor movement errors in a continuous BCI cursor control task. This work represents a first step toward detecting ErrPs during the daily home use of a communications BCI.

5.
An extended state observer (ESO)-based loop filter (LF) is designed for the phase-locked loop (PLL) involved in a disturbed grid-connected converter (GcC). This ESO-based design enhances the performance and robustness of the PLL and therefore improves the control performance of disturbed GcCs. In addition, the ESO-based LF can be applied to PLLs with extra filters for abnormal grid conditions. The unbalanced grid is particularly taken into account in the performance analysis. A tuning approach based on a well-designed PI controller is discussed, which allows a fair comparison with conventional PI-type PLLs. The frequency-domain properties are quantitatively analysed with respect to control stability and noise rejection. The frequency-domain analysis and simulation results suggest that the performance of the resulting ESO-based controllers is comparable to that of PI control at low frequencies, while they attenuate high-frequency measurement noise better. The phase margin decreases slightly but remains acceptable. Finally, experimental tests are conducted on a hybrid power hardware-in-the-loop benchmark, in which both balanced and unbalanced cases are explored. The results obtained demonstrate the effectiveness of ESO-based PLLs when applied to the disturbed GcC.

6.
Model predictive control (MPC) is an optimal control method that predicts the future states of the system being controlled and estimates the optimal control inputs that drive the predicted states to the required reference. The MPC computations are performed at pre-determined sample instances over a finite time horizon. The number of sample instances and the horizon length determine the performance of the MPC and its computational cost. A long horizon with a large sample count allows the MPC to better estimate the inputs when the states change rapidly over time, which results in better performance but at a high computational cost. However, a long horizon is not always necessary, especially for slowly varying states. In this case, a short horizon with a smaller sample count is preferable, as the same MPC performance can be obtained at a fraction of the computational cost. In this paper, we propose an adaptive regression-based MPC that predicts the best minimum horizon length and the sample count from several features extracted from the time-varying changes of the states. The proposed technique builds a synthetic dataset using the system model and uses the dataset to train a support vector regressor that performs the prediction. The proposed technique is experimentally compared with several state-of-the-art techniques on both linear and non-linear models. It reduces the computational time by about 35–65% compared with the other techniques without introducing a noticeable loss in performance.
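
The claim that a short horizon can match a long one for slowly varying states can be illustrated with a small sketch. It assumes an unconstrained scalar linear system x[k+1] = a·x[k] + b·u[k] with quadratic stage cost q·x² + r·u², for which the first MPC input is a linear feedback u = -K·x obtained from a backward Riccati recursion. The function name `first_step_gain` and all numeric values are hypothetical; this is not the regression-based horizon predictor described in the abstract.

```python
def first_step_gain(a, b, q, r, horizon):
    """Feedback gain K applied at the first step of a finite-horizon LQ problem.

    Runs the scalar Riccati recursion backwards over `horizon` steps, starting
    from the terminal weight P_N = q, and returns K_0 such that u_0 = -K_0 * x_0.
    The work grows linearly with the horizon length.
    """
    p = q  # terminal cost weight
    gain = 0.0
    for _ in range(horizon):
        # Gain for the current stage, computed from the cost-to-go of the next stage.
        gain = a * b * p / (r + b * b * p)
        # Cost-to-go weight for the current stage.
        p = q + a * a * p - (a * b * p) ** 2 / (r + b * b * p)
    return gain


if __name__ == "__main__":
    for n in (1, 2, 5, 50):
        print(n, first_step_gain(a=0.5, b=1.0, q=1.0, r=1.0, horizon=n))
```

In this toy setting the gain for a horizon of 5 is already practically indistinguishable from the gain for a horizon of 50, so the extra 45 recursion steps buy nothing, which is the kind of saving an adaptive horizon aims to exploit.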

7.
A pneumatic actuator is a fast and economical tool that converts compressed air into mechanical motion. In this paper, an extended state observer (ESO)-based sliding mode controller (SMC) is developed to adjust the air pressure of the actuator for accurate position control. Specifically, an impedance control module is established to produce the desired air pressure based on the relationship between forces and desired positions. Then, the ESO-based SMC is implemented to adjust the air pressure to the required level despite the presence of system uncertainties and disturbances. As a result, the position of the actuator is controlled to a setpoint through the regulation of pressure. The performance of the ESO-based SMC is compared with that of a classic active disturbance rejection controller (ADRC) and an SMC. Simulation results demonstrate that the ESO-based SMC shows comparable performance to ADRC in terms of precise pressure control. In addition, among the three controllers, it requires the least control effort to excite the valves. The stability of the ESO-based SMC is theoretically justified through the Lyapunov approach.
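
As a rough illustration of what an extended state observer does, the sketch below simulates a linear ESO on a hypothetical first-order plant dy/dt = d + b0·u with an unknown constant disturbance d. The observer state z1 tracks the output and z2 tracks the total disturbance, which a simple feedback law then cancels. The plant, the bandwidth-style gain choice, the plain proportional feedback (used here in place of a sliding mode law), and every name and value in the code are assumptions for illustration, not the controller from the paper.

```python
def simulate_eso(d=2.0, r=1.0, b0=1.0, omega_o=20.0, k=5.0, dt=0.001, steps=5000):
    """Forward-Euler simulation of a linear ESO with disturbance cancellation.

    Plant:    dy/dt  = d + b0 * u          (d is unknown to the controller)
    Observer: dz1/dt = z2 + b0 * u + beta1 * (y - z1)
              dz2/dt = beta2 * (y - z1)
    Returns the final plant output y and observer states z1, z2.
    """
    # Place both observer poles at -omega_o (a common bandwidth-style tuning).
    beta1, beta2 = 2.0 * omega_o, omega_o ** 2
    y, z1, z2 = 0.0, 0.0, 0.0
    for _ in range(steps):
        # Cancel the estimated disturbance z2 and pull the estimate z1 toward r.
        u = (k * (r - z1) - z2) / b0
        err = y - z1
        y_dot = d + b0 * u
        z1_dot = z2 + b0 * u + beta1 * err
        z2_dot = beta2 * err
        y += dt * y_dot
        z1 += dt * z1_dot
        z2 += dt * z2_dot
    return y, z1, z2


if __name__ == "__main__":
    y, z1, z2 = simulate_eso()
    print(f"output={y:.4f}, output estimate={z1:.4f}, disturbance estimate={z2:.4f}")
```

With these values the disturbance estimate z2 settles at the true d and the output settles at the setpoint r, without the controller ever being told d.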

8.
This paper deals with the dynamic output feedback stabilization problem of deterministic finite automata (DFA). The static form of this problem was defined and solved in previous studies via a set of equivalent conditions. In this paper, the dynamic output feedback (DOF) stabilization of DFAs is defined, in which the controller is supposed to be another DFA. The DFA controller is designed to stabilize the equilibrium point of the main DFA through a set of proposed equivalent conditions. It is proven that the design problem of DOF stabilization is more feasible than that of static output feedback (SOF) stabilization. Three simulation examples are provided to illustrate the results of this paper in more detail. The first example considers an instance DFA and develops SOF and DOF controllers for it. The example explains the concepts of the DOF controller and how it is implemented in the closed-loop DFA. In the second example, a special DFA is provided for which DOF stabilization is feasible, whereas SOF stabilization is not. The final example compares the feasibility of SOF and DOF stabilization by applying them to one hundred randomly generated DFAs. The results reveal the superiority of DOF stabilization.

9.
In this paper, a data-driven method for disturbance estimation and rejection is presented. The proposed approach is divided into two stages: an inner stabilization loop, to set the desired reference model, together with an outer loop for disturbance estimation and compensation. Inspired by the active disturbance rejection control framework, the exogenous and endogenous disturbances are lumped into a total disturbance signal. This signal is estimated using an on-line algorithm based on a data-driven predictor scheme, whose parameters are chosen to satisfy high robustness-performance criteria. The above process is presented as a novel enhancement to the design of a disturbance observer, which constitutes the main contribution of the paper. In addition, the control strategy is presented entirely in discrete time, avoiding the use of discretization methods for its digital implementation. As a case study, the voltage control of a DC-DC synchronous buck converter affected by disturbances in the input voltage and the load is considered. Finally, experimental results that validate the proposed strategy and some comparisons with the classical disturbance observer-based control are presented.

10.
In recent years, cyber attacks have posed great challenges to the development of cyber-physical systems. It is of great significance to study secure state estimation methods to ensure the safe and stable operation of the system. This paper proposes a secure state estimation method for multi-input and multi-output continuous-time linear cyber-physical systems with sparse actuator and sensor attacks. First, for sparse sensor attacks, we propose an adaptive switching mechanism that mitigates their impact by filtering out their attack modes. Second, an unknown input sliding mode observer is designed not only to observe the system states, sensor attack signals, and measurement noise present in the system but also to counteract the effects of sparse actuator attacks through an unknown input matrix. Finally, for the design of the unknown input sliding mode state observer, the feasibility of the observing system is demonstrated by means of Lyapunov functions. Additionally, simulation experiments are conducted to show the effectiveness of this method.

11.
In this study, an adaptive neuro-observer-based optimal control (ANOPC) policy is introduced for unknown nonaffine nonlinear systems with control input constraints. The Hamilton–Jacobi–Bellman (HJB) framework is employed to minimize a non-quadratic cost function corresponding to the constrained control input. ANOPC consists of both analytical and algebraic parts. In the analytical part, an observer-based neural network (NN) first approximates the uncertain system dynamics, and then another NN structure solves the HJB equation. In the algebraic part, the optimal control input that does not exceed the saturation bounds is generated. The weights of the two NNs associated with the observer and the controller are updated simultaneously in an online manner. The uniform ultimate boundedness (UUB) of all signals of the whole closed-loop system is ensured through Lyapunov’s direct method. Finally, two numerical examples are provided to confirm the effectiveness of the proposed control strategy.

12.
We examine three simple linear systems from the viewpoint of ergodic theory. We digitize the output and record only the sign of the output at integer times. We show that even with this minimal output we can recover important information about the systems. In particular, for a two-dimensional system viewed as a flow on the circle, we can determine the rate of rotation. We then use these results to determine the slope of the trajectories for constant irrational flow on the two-dimensional torus. To achieve this, we randomize the system by partitioning the state space and only recording which partition the state is in at each integer time. We show directly that these systems have entropy zero. Finally, we examine two four-dimensional systems and reduce them to the study of linear flows on the two-dimensional torus.
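
The idea of recovering a rotation rate from sign-only observations can be sketched numerically. The example assumes a rotation on the circle by a fixed irrational fraction alpha of a turn per unit time, observed through the sign of a cosine output at integer times. For such a rotation, the long-run fraction of consecutive samples whose signs differ tends to 2·min(alpha, 1 - alpha), so the rate can be read off up to that symmetry. The function name `estimate_rotation`, the cosine output map, and the parameter values are assumptions for illustration, not the construction used in the paper.

```python
import math


def estimate_rotation(alpha, n_samples=100_000, theta0=0.1):
    """Estimate min(alpha, 1 - alpha) from one-bit (sign) observations.

    The state after n time units is theta0 + n * alpha (in turns); only the
    sign of cos(2 * pi * state) is recorded at each integer time.
    """
    signs = [
        math.cos(2.0 * math.pi * (theta0 + n * alpha)) >= 0.0
        for n in range(n_samples)
    ]
    # Count how often consecutive one-bit observations disagree.
    changes = sum(s0 != s1 for s0, s1 in zip(signs, signs[1:]))
    return changes / (n_samples - 1) / 2.0


if __name__ == "__main__":
    alpha = math.sqrt(2.0) - 1.0
    print(alpha, estimate_rotation(alpha))
```

The estimate cannot distinguish alpha from 1 - alpha, because rotating in the opposite direction produces the same statistics of sign changes.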

13.
In this paper, the asymptotic stability of Port-Hamiltonian (PH) systems with constant inputs is studied. Constant inputs are useful for stabilizing systems at their nonzero equilibria and can be realized by step signals. To achieve this goal, two methods based on integral action and the comparison principle are presented in this paper. These methods replace the convex Hamiltonian function and the restricted damping matrix of previous results with a Hamiltonian function with a local minimum and a positive semidefinite matrix, respectively. Because these conditions on the Hamiltonian function and damping matrix are more common, the proposed methods asymptotically stabilize more classes of PH systems with constant inputs than the existing methods. Finally, the validity and advantages of the presented methods are shown in an example.

14.
This paper discusses the problem of global state regulation via output feedback for a class of feedforward nonlinear time-delay systems with unknown measurement sensitivity. Unlike in previous works, the nonlinear terms are dominated by upper triangular linear unmeasured (delayed) states multiplied by an unknown growth rate. The unknown growth rate is composed of an unknown constant, a power function of the output, and an input function. Furthermore, the measurement uncertainty of the system output makes this problem more difficult to solve. It is proved that the presented output feedback controller can globally regulate all states of the nonlinear systems using the dynamic gain scaling technique and appropriately chosen Lyapunov–Krasovskii functionals.

15.
Actuator faults usually cause security problems in practice. This paper is concerned with the security control of positive semi-Markovian jump systems with actuator faults. The systems considered have mode transition-dependent sojourn-time distributions, which may also lead to actuator faults. First, the time-varying and bounded transition rate that satisfies the mode transition-dependent sojourn-time distribution is considered. Then, a stochastic co-positive Lyapunov function is constructed. Using a matrix decomposition technique, a set of state-feedback controllers for positive semi-Markovian jump systems with actuator faults is designed in terms of linear programming. Under the designed controllers, stochastic stabilization of the systems with actuator faults is achieved and the security of the systems can be guaranteed. Furthermore, the proposed results are extended to positive semi-Markovian jump systems with interval and polytopic uncertainties. By virtue of a segmentation technique for the transition rates, a less conservative security control design is also proposed. Finally, numerical examples are provided to demonstrate the validity of the presented results.

16.
In this paper, we present the development of a navigation control system for a sailboat based on spiking neural networks (SNNs). Our inspiration for this choice of network lies in their potential to achieve fast and low-energy computing on specialized hardware. To train our system, we use the modulated spike time-dependent plasticity reinforcement learning rule and a simulation environment based on the BindsNET library and the USVSim simulator. Our objective is to develop spiking neural network-based control systems that can learn policies allowing sailboats to navigate between two points by following a straight line or performing tacking and gybing strategies, depending on the sailing scenario conditions. We present the mathematical definition of the problem, the operation scheme of the simulation environment, the spiking neural network controllers, and the control strategy used. As a result, we obtained 425 SNN-based controllers that completed the proposed navigation task, indicating that the simulation environment and the implemented control strategy work effectively. Finally, we compare the behavior of our best controller with other algorithms and present some possible strategies to improve its performance.

17.
This paper presents an in-depth analytical and empirical assessment of the performance of DoubleBee, a novel hybrid aerial-ground robot. In particular, the dynamic model of the robot with ground contact is analyzed, and the unknown parameters in the model are identified. We apply an unscented Kalman filter-based approach and a least squares-based approach to estimate the parameters from the given measurements and inputs at every time step. Real data are collected and used to estimate the parameters; test data verify that the values obtained model the rotation of the robot accurately. A gain-scheduled feedback controller is proposed, which leverages the identified model to generate accurate control inputs that drive the system to the desired states. The system is proven to track a constant-velocity reference signal with bounded error. Simulations and real-world experiments show that the proposed controller outperforms the PID-based controller in tracking step commands and maintaining attitude under robot movement.

18.
This paper considers distributed state estimation of a continuous-time linear system monitored by a network of multiple sensors. Each sensor can access only a local, partial measurement output of the system and communicates with its neighbors to cooperatively achieve asymptotic estimation of the full state of the target system. For a constructive design, we incorporate the concept of system immersions and propose a class of distributed tracking observers for the problem under a reasonable condition of local joint observability. Moreover, as a direct application of the proposed observer design, we further present a leader-following consensus design for multi-agent systems.

19.
This paper addresses state estimation for a class of nonlinear time-varying stochastic systems with both uncertain dynamics and unknown measurement bias. A novel extended state based Kalman filter (ESKF) algorithm is developed to estimate the original state, the uncertain dynamics and the measurement bias. It is shown that the estimation error of the proposed algorithm is bounded in the mean square sense. Also, the estimate of the measurement bias asymptotically converges to its true value, so that the influence of the measurement bias is eliminated. Furthermore, the asymptotic optimality of the estimation result is proved for the case in which the uncertain dynamics approach a constant vector. Finally, a simulation study of a harmonic oscillator system model is provided to illustrate the effectiveness of the proposed method.

20.
Rotating stall and surge are two violent unstable phenomena of an aero-engine compressor. The early detection of rotating stall is a critical and difficult issue in the operation of a compressor. Recently, a deterministic learning based stall inception detection approach (SIDA) has been developed for modeling and detecting stall inception in aero-engine compressors. This paper considers the derivation of analytical results on the detection capabilities for the SIDA based on deterministic learning. First, by utilizing the input/output stability of the residual system, a detectability condition of the SIDA is presented, and how to choose the parameters of the diagnostic system is also analyzed. Second, based on the relationship between NN approximation capabilities and radial basis function (RBF) network structures, the influence of RBF network structures on the performance properties of the SIDA is analyzed. Finally, a simulation study is presented, in which the Mansoux-C2 compressor model is utilized to verify the effectiveness of the proposed SIDA.
