Similar Literature
20 similar documents found (search time: 390 ms)
1.
Attitude Control of a Wheel-Actuated Bias Momentum Satellite   (Cited by 1: 0 self-citations, 1 by others)
吕建婷, 马广富, 宋斌. 《控制工程》, 2007, 14(6): 569-572
This paper studies the attitude control problem of a three-axis stabilized bias momentum satellite actuated by wheels. The pitch loop uses a bias momentum wheel, and one reaction wheel is mounted on each of the roll and yaw axes to complete the attitude control. Under small attitude angles the pitch loop can be designed independently, using the measured pitch angle for its control; the roll/yaw loop uses roll information and adopts a sliding-mode controller based on a yaw observer. The torque produced by the magnetorquers' magnetic moment interacting with the geomagnetic field unloads the momentum of the wheels. Simulation of the satellite attitude control system shows that, within the wheels' output torque range, the designed control scheme achieves high attitude control accuracy.
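A minimal sketch of the decoupled pitch loop described above: under small attitude angles the pitch axis is regulated on its own by the bias momentum wheel with a PD law on the measured pitch angle. The inertia, gains, and step size are illustrative assumptions, not values from the paper.

```python
import numpy as np

I_pitch = 20.0      # pitch-axis inertia, kg*m^2 (assumed)
kp, kd = 0.8, 4.0   # PD gains (assumed)
dt = 0.1            # control step, s

theta, theta_dot = np.deg2rad(2.0), 0.0      # initial pitch error and rate
for _ in range(1000):
    torque = -(kp * theta + kd * theta_dot)  # wheel reaction torque on the body
    theta_dot += (torque / I_pitch) * dt
    theta += theta_dot * dt
print(f"pitch error after 100 s: {np.rad2deg(theta):.5f} deg")
```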

2.
Under the premise that the state space satisfies a structured condition, the complex original MDP problem is decomposed hierarchically, by partitioning the dimensions of the state space, into a set of simple MDP or SMDP subproblems, and the hierarchical structure is refined online. Embedding different reinforcement learning methods into the hierarchy yields different hierarchical learners. The proposed method keeps the advantages of hierarchical reinforcement learning, such as fast learning and easy sharing, while reducing the dependence on prior knowledge and alleviating the scarcity of reward signals in the early stage of learning.
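As a toy illustration of the dimension-wise decomposition idea (not the paper's algorithm), consider a 2-D goal-reaching MDP whose per-axis transitions and costs are independent: it splits into two 1-D subproblems, each solvable by ordinary value iteration, and the composed value is their sum. Grid size, costs, and the independence assumption are all assumed here.

```python
import numpy as np

def solve_1d(n, goal, gamma=0.9):
    """Value iteration on a 1-D move-left/right sub-MDP with step cost -1."""
    V = np.zeros(n)
    for _ in range(200):
        for s in range(n):
            if s == goal:
                continue
            V[s] = max(-1.0 + gamma * V[s2]
                       for s2 in (max(s - 1, 0), min(s + 1, n - 1)))
    return V

Vx = solve_1d(10, goal=7)       # sub-MDP over the x dimension
Vy = solve_1d(10, goal=3)       # sub-MDP over the y dimension
V = Vx[:, None] + Vy[None, :]   # composed value for the original 2-D problem
print(V[0, 0])
```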

3.
Adaptive fuzzy control is combined with input-output linearization control to form a hybrid controller, which is applied to attitude maneuver control of a flexible satellite. The basic form of the satellite attitude controller is given, and criteria for selecting the controller parameters are analyzed. The parameters of the adaptive fuzzy controller are tuned online to compensate the attitude tracking error of the uncertain satellite. Simulation results show that, through online learning, the control algorithm effectively overcomes the uncertainties of the flexible satellite, exhibits strong robustness, and thus effectively improves the attitude control accuracy of the flexible satellite.

4.
王剑非, 姜斌, 冒泽慧. 《控制工程》, 2008, 15(3): 334-336
A fault diagnosis method for satellite attitude control systems is proposed, based on a nonlinear observer built from least-squares support vector machines (LSSVM). Compared with standard support vector regression, LSSVM regression converges faster and is suitable for online training. The method exploits the ability of LSSVM regression to approximate nonlinear functions: a nonlinear state observer based on LSSVM is designed, the LSSVM regression is trained online, and it is used to estimate faults in the satellite attitude control system. Finally, simulations verify that the method estimates such faults quickly and accurately.
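A minimal LSSVM regression sketch in the spirit of the method above: training reduces to solving a single linear system, which is what makes online (re)training cheap. The RBF kernel width and regularization constant are assumed values.

```python
import numpy as np

def lssvm_fit(X, y, gamma_reg=10.0, sigma=0.5):
    """LSSVM regression training: one linear solve gives bias b and duals alpha."""
    n = len(X)
    K = np.exp(-np.sum((X[:, None] - X[None, :]) ** 2, axis=-1) / (2 * sigma**2))
    A = np.block([[np.zeros((1, 1)), np.ones((1, n))],
                  [np.ones((n, 1)), K + np.eye(n) / gamma_reg]])
    sol = np.linalg.solve(A, np.concatenate(([0.0], y)))
    return sol[0], sol[1:]

def lssvm_predict(X_train, b, alpha, x, sigma=0.5):
    k = np.exp(-np.sum((X_train - x) ** 2, axis=-1) / (2 * sigma**2))
    return b + k @ alpha

X = np.linspace(0.0, 1.0, 20).reshape(-1, 1)
y = np.sin(2 * np.pi * X.ravel())
b, alpha = lssvm_fit(X, y)
print(lssvm_predict(X, b, alpha, np.array([0.25])))  # close to sin(pi/2) = 1
```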

5.
For the attitude control system of a three-axis stabilized satellite, a closed-loop control system with gyros and star sensors as the sensors and reaction wheels as the actuators is built on the discrete-event simulator OMNeT++. A preliminary analysis of the choice of control period in the satellite attitude control system is given. Finally, using a dual-vector attitude determination algorithm and a PID control algorithm, several control periods are set and the satellite's control accuracy in Earth-pointing mode is simulated; the results clearly show the influence of the control period on attitude control accuracy.
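A small sketch in the spirit of that experiment: the same PID attitude loop run at several control periods, with the final pointing error compared. The plant inertia, gains, and periods are assumed values.

```python
import numpy as np

def run(period, kp=2.0, ki=0.1, kd=8.0, inertia=50.0, t_end=200.0):
    """Simulate a single-axis PID attitude loop discretized at `period` seconds."""
    theta, omega, integ, prev_err = np.deg2rad(1.0), 0.0, 0.0, None
    t = 0.0
    while t < t_end:
        err = -theta
        integ += err * period
        deriv = 0.0 if prev_err is None else (err - prev_err) / period
        torque = kp * err + ki * integ + kd * deriv
        prev_err = err
        omega += (torque / inertia) * period   # plant stepped at the control period
        theta += omega * period
        t += period
    return abs(np.rad2deg(theta))

for T in (0.1, 0.5, 2.0):
    print(f"control period {T:>4} s -> final error {run(T):.6f} deg")
```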

6.
A reinforcement learning (RL) control method with a CMAC neural network is studied, to solve control problems for highly nonlinear systems. The research focuses on simplifying the algorithm and on learning functions with continuous outputs. The control strategy consists of two parts: an RL controller and a fixed-gain conventional controller. The former learns the system's nonlinearities; the latter stabilizes the system. Simulation results show that the proposed control strategy is not only effective but also achieves high control accuracy.
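A minimal 1-D CMAC (tile-coding) function approximator like the one underlying the controller above: several offset tilings map a continuous input to a few active weights whose sum gives a continuous output. Tiling counts, input range, and learning rate are assumptions.

```python
import numpy as np

class CMAC:
    def __init__(self, n_tilings=8, n_tiles=16, lo=-1.0, hi=1.0):
        self.n_tilings, self.n_tiles = n_tilings, n_tiles
        self.lo, self.width = lo, (hi - lo) / n_tiles
        self.w = np.zeros((n_tilings, n_tiles + 1))   # one weight row per tiling

    def _idx(self, x):
        # each tiling is offset by a fraction of one tile width
        offs = np.arange(self.n_tilings) / self.n_tilings * self.width
        return np.clip(((x - self.lo + offs) / self.width).astype(int),
                       0, self.n_tiles)

    def predict(self, x):
        return self.w[np.arange(self.n_tilings), self._idx(x)].sum()

    def train(self, x, target, lr=0.2):
        err = target - self.predict(x)
        self.w[np.arange(self.n_tilings), self._idx(x)] += lr * err / self.n_tilings

net = CMAC()
for _ in range(2000):                     # learn a continuous nonlinear map
    x = np.random.uniform(-1.0, 1.0)
    net.train(x, np.sin(np.pi * x))
print(net.predict(0.5))                   # close to sin(pi/2) = 1
```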

7.
To improve the control accuracy and stability of a small satellite's attitude control system and to address the nonlinear disturbance caused by friction when the reaction wheel speed crosses zero, mechanical and mathematical models of the satellite attitude control are established from attitude dynamics theory. The friction is compensated, and the nonlinear disturbance removed, by injecting either a dither signal or a pulse signal. Simulation experiments comparing the two compensation methods show that both effectively suppress the attitude disturbance caused by the wheel's zero-speed crossing, and that the pulse-signal method outperforms the dither-signal method: it stabilizes the system faster and consumes less energy. High-accuracy, stable satellite attitude control is thus achieved.
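A toy illustration of the two compensation signals compared above: a continuous high-frequency dither torque versus a pulse applied only inside a zero-speed band, each added to the wheel torque command. Amplitudes, frequency, and the speed threshold are assumed values.

```python
import numpy as np

def dither_compensation(t, amp=0.002, freq=50.0):
    """Continuous high-frequency dither torque (N*m) added to the wheel command."""
    return amp * np.sin(2.0 * np.pi * freq * t)

def pulse_compensation(wheel_speed, amp=0.005, band=1.0):
    """Torque pulse (N*m) applied only inside the zero-speed friction band (rad/s),
    pushing the wheel through the sticky region instead of shaking it constantly."""
    if abs(wheel_speed) < band:
        return amp if wheel_speed >= 0.0 else -amp
    return 0.0

print(dither_compensation(0.005), pulse_compensation(0.4), pulse_compensation(-3.0))
```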

8.
VXI Automated Test System for Satellite Attitude Control   (Cited by 1: 0 self-citations, 1 by others)
The VXI automated test system for satellite attitude control was developed for testing satellite attitude control systems and has reached an internationally advanced level. This paper describes the development process of the attitude-control VXI test system, its technical specifications, and the advanced techniques and measures adopted in it.

9.
The VXI automated test system for satellite attitude control (hereafter the attitude-control VXI test system) was developed for testing satellite attitude control systems and has reached an internationally advanced level. This paper describes the development process of the test system, its technical specifications, and the advanced techniques and measures adopted in it.

10.
The attitude of the OPAL satellite is controlled with two pairs of magnetorquer coils and a three-axis magnetometer. One coil pair is mounted on the side panel next to the smallsat's launch window, the other on the bottom panel. The OPAL attitude control system has two basic requirements: first, during launch and orbit insertion, to reduce the satellite's spin rate about its body axis so as to minimize disturbances; second, once in orbit, to increase the spin rate to meet thermal control requirements. To meet these requirements, the spacecraft uses a −Ḃ (B-dot) control law. This paper describes the design and simulation verification of the OPAL attitude control system.
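A minimal sketch of the B-dot magnetic control law mentioned above: the commanded dipole opposes the measured rate of change of the body-frame field, which damps the spin rate; reversing the sign of the gain spins the spacecraft up instead. The gain, sample period, and field readings are assumed values.

```python
import numpy as np

k_bdot = 5.0e4     # control gain (assumed)
dt = 1.0           # magnetometer sampling period, s (assumed)

def bdot_dipole(B_now, B_prev):
    """Dipole command m = -k * dB/dt from two body-frame magnetometer reads."""
    return -k_bdot * (B_now - B_prev) / dt    # A*m^2, split across the coil pairs

B_prev = np.array([2.1e-5, -0.9e-5, 2.9e-5])  # T, body frame
B_now = np.array([2.0e-5, -1.0e-5, 3.0e-5])
m = bdot_dipole(B_now, B_prev)
tau = np.cross(m, B_now)                      # control torque tau = m x B, N*m
print(m, tau)
```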

11.
To improve the control performance of reinforcement learning, a reinforcement learning algorithm based on a fractional-order gradient-descent RBF neural network is proposed. A critic network and an action network make up the reinforcement learning system, which uses the networks' memory and association to learn to control an inverted pendulum, improving control accuracy until the error tends to zero and learning succeeds; the stability of the closed-loop system is proved. Physical inverted-pendulum experiments show that when the fractional order is larger, the differentiation effect is more pronounced and the control of angular velocity and velocity is better, so their mean-square and mean-absolute errors are smaller; when the fractional order is smaller, the integration effect is more pronounced and the control of tilt angle and displacement is better, so their mean-square and mean-absolute errors are smaller. Simulation results show that the proposed algorithm has good dynamic response, small overshoot, short settling time, high accuracy, and good generalization. It outperforms the RBF-network-based reinforcement learning algorithm and traditional reinforcement learning algorithms, effectively accelerating the convergence of gradient descent and improving control performance. After suitable disturbances are introduced, the algorithm quickly self-adjusts and recovers a stable state; the controller's robustness and dynamic performance meet practical requirements.

12.
This letter proposes a new reinforcement learning (RL) paradigm that explicitly takes into account input disturbance as well as modeling errors. The use of environmental models in RL is quite popular for both offline learning using simulations and for online action planning. However, the difference between the model and the real environment can lead to unpredictable, and often unwanted, results. Based on the theory of H∞ control, we consider a differential game in which a "disturbing" agent tries to make the worst possible disturbance while a "control" agent tries to make the best control input. The problem is formulated as finding a min-max solution of a value function that takes into account the amount of the reward and the norm of the disturbance. We derive online learning algorithms for estimating the value function and for calculating the worst disturbance and the best control in reference to the value function. We tested the paradigm, which we call robust reinforcement learning (RRL), on the control task of an inverted pendulum. In the linear domain, the policy and the value function learned by online algorithms coincided with those derived analytically by the linear H∞ control theory. For a fully nonlinear swing-up task, RRL achieved robust performance with changes in the pendulum weight and friction, while a standard reinforcement learning algorithm could not deal with these changes. We also applied RRL to the cart-pole swing-up task, and a robust swing-up policy was acquired.
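A compact sketch of the min-max Bellman backup at the heart of the RRL formulation described above: the controller maximizes while the disturber minimizes a value that combines the reward with a positive penalty on the disturbance norm, so the worst case stays bounded. The toy scalar dynamics, candidate grids, and weights are assumptions.

```python
import numpy as np

gamma = 0.95
eta = 2.0                                 # weight on the disturbance norm (assumed)
actions = np.linspace(-1.0, 1.0, 11)      # control candidates
disturbs = np.linspace(-0.3, 0.3, 7)      # disturbance candidates

def minmax_backup(x, V, step, reward):
    """One backup of the min-max value: the controller picks u to maximize,
    the disturber picks w to minimize; +eta*w^2 keeps the worst case bounded."""
    best = -np.inf
    for u in actions:
        worst = np.inf
        for w in disturbs:
            target = reward(x, u) + eta * w**2 + gamma * V(step(x, u, w))
            worst = min(worst, target)
        best = max(best, worst)
    return best

step = lambda x, u, w: 0.9 * x + 0.1 * (u + w)   # toy scalar dynamics
reward = lambda x, u: -(x**2 + 0.1 * u**2)        # quadratic stage reward
V = lambda x: -x**2                               # placeholder value estimate
print(minmax_backup(1.0, V, step, reward))
```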

13.
Online learning control by association and reinforcement   (Cited by 4: 0 self-citations, 4 by others)
This paper focuses on a systematic treatment for developing a generic online learning control system based on the fundamental principle of reinforcement learning, or more specifically neural dynamic programming. This online learning system improves its performance over time in two aspects: 1) it learns from its own mistakes through the reinforcement signal from the external environment and tries to reinforce its action to improve future performance; and 2) system states associated with positive reinforcement are memorized through a network learning process, so that in the future similar states will be more positively associated with a control action leading to a positive reinforcement. A successful candidate for online learning control design is introduced. Real-time learning algorithms are derived for the individual components in the learning system. Some analytical insight is provided to give guidelines on the learning process taking place in each module of the online learning control system.
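A linear-function-approximation sketch of the actor-critic pattern the paper builds on: the critic updates a value estimate from the TD error of the reinforcement signal, and the actor is nudged toward explored actions that produced a positive TD error. The feature dimension, learning rates, and Gaussian exploration model are assumed simplifications of the paper's neural networks.

```python
import numpy as np

alpha_c, alpha_a, gamma, sigma = 0.1, 0.05, 0.95, 0.2
w_critic = np.zeros(4)    # critic weights over state features
w_actor = np.zeros(4)     # actor weights over state features

def update(phi, phi_next, r, u):
    """One online step: the TD error of reinforcement r drives the critic,
    and a positive TD error reinforces the explored action u."""
    global w_critic, w_actor
    td_error = r + gamma * w_critic @ phi_next - w_critic @ phi
    w_critic += alpha_c * td_error * phi
    w_actor += alpha_a * td_error * (u - w_actor @ phi) / sigma**2 * phi
    return td_error

phi = np.array([1.0, 0.2, -0.1, 0.0])
phi_next = np.array([1.0, 0.1, -0.05, 0.0])
print(update(phi, phi_next, r=-0.2, u=0.3))
```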

14.
In this paper, we develop and assess online decision-making algorithms for call admission and routing for low Earth orbit (LEO) satellite networks. It has been shown in a recent paper that, in a LEO satellite system, a semi-Markov decision process formulation of the call admission and routing problem can achieve better performance in terms of an average revenue function than existing routing methods. However, the conventional dynamic programming (DP) numerical solution becomes prohibitive as the problem size increases. In this paper, two solution methods based on reinforcement learning (RL) are proposed in order to circumvent the computational burden of DP. The first method is based on an actor-critic method with temporal-difference (TD) learning. The second method is based on a critic-only method, called optimistic TD learning. The algorithms enhance performance in terms of requirements in storage, computational complexity and computational time, and in terms of an overall long-term average revenue function that penalizes blocked calls. Numerical studies are carried out, and the results obtained show that the RL framework can achieve up to 56% higher average revenue over existing routing methods used in LEO satellite networks with reasonable storage and computational requirements.

15.
Compared with traditional geostationary (GEO) communication satellites, the new generation of medium- and low-Earth-orbit satellite Internet constellations, represented by systems such as SpaceX Starlink and O3b, offer notable advantages such as wide-area coverage, anytime-anywhere connectivity, and multi-satellite cooperation, and have become a research focus worldwide. Traditional satellite resource scheduling methods mainly address scheduling under a single GEO satellite and can hardly meet the scheduling needs of LEO constellations characterized by multi-satellite cooperation, joint networking, and massive user populations. To this end, a user-satisfaction-based multi-satellite cooperative intelligent resource scheduling model is built, and IRSUP, a reinforcement-learning-based satellite network resource scheduling mechanism, is proposed. For the personalized needs of user service customization, IRSUP provides an intelligent user-service-preference optimization module; for the joint multi-satellite resource optimization problem, it provides a reinforcement-learning-based intelligent scheduling module. Simulation results show that IRSUP effectively improves scheduling rationality, link resource utilization, and user satisfaction, with service capacity increased by 30% to 60% and user satisfaction more than doubled.

16.
As a powerful tool for solving nonlinear complex system control problems, model-free reinforcement learning can hardly guarantee system stability in the early stage of learning, especially when highly complex learning components are applied. In this paper, a reinforcement learning framework imitating several cognitive mechanisms of the brain, such as attention, competition, and integration, is proposed to realize sample-efficient, self-stabilized online learning control. Inspired by the generation of consciousness in the human brain, multiple actors are applied that work either competitively for the best interaction results or cooperatively for more accurate modeling and predictions. A deep reinforcement learning implementation for challenging control tasks and a real-time control implementation of the proposed framework are given, respectively, to demonstrate the high sample efficiency and the capability of maintaining system stability in the online learning process without requiring an initial admissible control.

17.
For the tracking control problem of networked control systems with linear external disturbances and packet loss in the state-feedback channel, the idea of output regulation is adopted and a data-driven optimal output-regulation control method based on off-policy reinforcement learning is proposed, so that the control policy can be solved using online data only. First, for packet loss during network transmission of the system state, a Smith predictor is used to reconstruct the state. Then, within the output-regulation framework, a data-driven optimal control algorithm based on off-policy reinforcement learning is proposed that computes the feedback gain from online data alone when state packets are lost, and the connection between solving for the feedback gain and solving the output-regulation problem is identified. Next, based on parameters obtained while solving for the feedback gain that relate to the regulator equations of the output-regulation problem, a model-free solution of the feedforward gain is computed. Finally, simulation results verify the effectiveness of the proposed method.

18.
This paper will present an approximate/adaptive dynamic programming (ADP) algorithm, which uses the idea of integral reinforcement learning (IRL), to determine online the Nash equilibrium solution for the two-player zero-sum differential game with linear dynamics and infinite-horizon quadratic cost. The algorithm is built around an iterative method that has been developed in the control engineering community for solving the continuous-time game algebraic Riccati equation (CT-GARE), which underlies the game problem. We here show how the ADP techniques enhance the capabilities of the offline method, allowing an online solution without the requirement of complete knowledge of the system dynamics. The feasibility of the ADP scheme is demonstrated in simulation for a power system control application. The adaptation goal is the best control policy that will face, in an optimal manner, the highest load disturbance.
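An offline baseline for what the online ADP/IRL scheme converges to: the stabilizing solution of the CT-GARE, computed here from the stable invariant subspace of the associated Hamiltonian matrix. This classical construction needs full knowledge of A, which is exactly what the IRL algorithm avoids. The system matrices and gamma are toy assumptions.

```python
import numpy as np

def solve_game_are(A, B, D, Q, R, gamma):
    """Stabilizing solution P of the zero-sum game Riccati equation
    A'P + PA + Q - P (B R^-1 B' - gamma^-2 D D') P = 0,
    via the stable invariant subspace of the Hamiltonian matrix."""
    n = A.shape[0]
    S = B @ np.linalg.solve(R, B.T) - D @ D.T / gamma**2
    H = np.block([[A, -S], [-Q, -A.T]])
    vals, vecs = np.linalg.eig(H)
    stable = vecs[:, vals.real < 0]       # n eigenvectors with Re(lambda) < 0
    return np.real(stable[n:] @ np.linalg.inv(stable[:n]))

A = np.array([[0.0, 1.0], [-1.0, -0.5]])  # toy plant (assumed)
B = np.array([[0.0], [1.0]])              # control input
D = np.array([[0.0], [0.5]])              # disturbance input
Q, R, gamma = np.eye(2), np.eye(1), 5.0
P = solve_game_are(A, B, D, Q, R, gamma)
K = np.linalg.solve(R, B.T @ P)           # Nash control gain, u = -K x
res = A.T @ P + P @ A + Q - P @ (B @ np.linalg.solve(R, B.T) - D @ D.T / gamma**2) @ P
print(np.round(res, 8))                   # ~ zero: P solves the game ARE
```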

19.
Precise tracking and pointing control systems play a crucial role in satellite optical communication. China's completed Micius (墨子号) quantum science experiment satellite used a tracking and pointing system designed on classical servo system theory and achieved full success in practice. Future longer-range space communication applications place higher accuracy requirements on the tracking and pointing system, which traditional control methods can hardly meet. This paper therefore proposes a parametric design method for the fine pointing system: it abandons the traditional approach of designing the fine and coarse subsystems separately, designs the two-stage subsystem as a whole, and fully exploits the design degrees of freedom in the system. By jointly optimizing these degrees of freedom, the design requirements of decoupling from step disturbances and rejecting complex disturbances, insensitive pole placement, and control-gain minimization are all met, significantly improving pointing accuracy. Simulation results show that the pointing accuracy improves from the microradian level to the nanoradian level.

20.
To achieve a wide field of view and high resolution, remote sensing satellites generally carry payloads containing large-inertia imaging components that rotate back and forth. The disturbance produced by this rotation often affects satellite attitude control beyond the attitude stability and pointing accuracy required for payload imaging, so the satellite platform must take measures to suppress the disturbance torque. However, because of machining, assembly, and other factors, the payload's disturbance torque generally differs from its design value, bringing uncertainty to compensation scheme design, parameter loading, and the confidence of ground verification. This paper presents a method for calibrating the disturbance torque of a spacecraft's moving components using a single-axis air-bearing table, designs experiments to explain how the technique is realized under non-vacuum laboratory conditions and how on-orbit conditions differ from ground conditions, and analyzes the errors affecting the calibration results.
