20 similar documents found; search took 390 ms
1.
2.
3.
4.
5.
For the attitude control system of a three-axis stabilized satellite, a closed-loop control system was built on top of the discrete-event simulator OMNeT++, with gyros and star sensors as attitude sensors and reaction wheels as actuators. A preliminary analysis of the choice of control period in the satellite attitude control system is given. Finally, using a two-vector attitude determination algorithm and a PID control law, the pointing accuracy of the satellite in Earth-oriented mode was simulated under several control periods; the results clearly show the influence of the control period on attitude control accuracy.
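The effect of the control period described in this abstract can be illustrated with a minimal single-axis sketch. All numbers below (inertia, gains, disturbance torque) are illustrative assumptions, not values from the paper:

```python
def simulate_pid_pointing(dt, t_end=200.0, kp=0.4, ki=0.02, kd=1.2):
    """One-axis wheel-controlled satellite under a constant disturbance
    torque, with the PID law sampled once per control period dt."""
    inertia = 10.0       # axis inertia in kg*m^2, assumed
    torque_d = 1e-3      # constant disturbance torque in N*m, assumed
    theta, omega = 0.01, 0.0   # initial pointing error (rad) and rate (rad/s)
    integral, t = 0.0, 0.0
    while t < t_end:
        integral += theta * dt
        u = -(kp * theta + ki * integral + kd * omega)  # wheel torque command
        # zero-order hold: u stays constant over one control period while
        # the rigid-body dynamics are integrated on a finer internal step
        sub = 0.001
        for _ in range(max(1, round(dt / sub))):
            omega += (u + torque_d) / inertia * sub
            theta += omega * sub
        t += dt
    return abs(theta)    # residual pointing error at t_end

# comparing several control periods, in the spirit of the simulations above
errors = {dt: simulate_pid_pointing(dt) for dt in (0.1, 0.5, 1.0)}
```

Longer control periods add effective delay to the loop, which is how the sampling choice degrades pointing accuracy.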
6.
7.
8.
Satellite attitude control VXI automated test system  Total citations: 1 (self: 0, other: 1)
《计算机自动测量与控制》1998,(2):2-8
The satellite attitude control VXI automated test system was developed for testing satellite attitude control systems and has reached an internationally advanced level. This paper describes the system's development process, its technical specifications, and the advanced techniques and measures adopted in it.
9.
The satellite attitude control VXI automated test system (hereafter, the attitude-control VXI test system) was developed for testing satellite attitude control systems and has reached an internationally advanced level. This paper describes the system's development process, its technical specifications, and the advanced techniques and measures adopted in it.
10.
The attitude of the OPAL satellite is controlled with two pairs of magnetic torquer coils and a three-axis magnetometer. One pair of coils is mounted on the side panel next to the picosatellite launch windows, and the other pair on the bottom panel. The OPAL attitude control system has two basic requirements: first, during launch and orbit insertion, reduce the satellite's spin rate about its body axes so as to minimize disturbances; second, once in orbit, increase the spin rate to meet thermal-control requirements. To realize these requirements, the satellite uses a −B control law. This paper describes the design of the OPAL attitude control system and its verification by simulation.
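Magnetic rate control of this kind is commonly implemented as a B-dot law (I am assuming the "−B" law above corresponds to it); a minimal sketch with illustrative gain and field values:

```python
import numpy as np

def bdot_dipole(b_now, b_prev, dt, k=1e4):
    """B-dot law: command a magnetic dipole moment (A*m^2) opposing the
    measured rate of change of the body-frame magnetic field, which
    removes spin energy; flipping the sign of k instead spins the craft up."""
    b_dot = (np.asarray(b_now, float) - np.asarray(b_prev, float)) / dt
    return -k * b_dot

def coil_torque(m, b):
    """Torque produced by the magnetorquer coils: tau = m x B (N*m)."""
    return np.cross(m, b)

# one control step with magnetometer samples 0.1 s apart (values assumed);
# the field rotating in the body frame indicates the satellite is spinning
b_prev = [0.0, 0.0, 1e-5]
b_now = [1e-6, 0.0, 1e-5]
m = bdot_dipole(b_now, b_prev, dt=0.1)
tau = coil_torque(m, b_now)
```

Note that the magnetic torque is always perpendicular to the local field, which is why two coil pairs on different panels are needed for useful authority over the orbit.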
11.
To improve the control performance of reinforcement learning, a reinforcement learning algorithm based on a fractional-order gradient-descent RBF neural network is proposed. A critic network and an actor network form the reinforcement learning system; exploiting the memory and association capabilities of the neural networks, the system learns to control an inverted pendulum, improving control accuracy and driving the error toward zero until learning succeeds, and the stability of the closed-loop system is proved. Physical experiments on the inverted pendulum show that with a larger fractional order the differential action is more pronounced, the angular velocity and velocity are controlled better, and their mean-square and mean-absolute errors are smaller; with a smaller fractional order the integral action is more pronounced and the tilt angle and displacement are controlled better, so their mean-square and mean-absolute errors are smaller. Simulation results show that the proposed algorithm has good dynamic response, small overshoot, short settling time, high accuracy, and good generalization. It outperforms reinforcement learning based on a plain RBF network as well as conventional reinforcement learning, effectively accelerating the convergence of gradient descent and improving its control performance. When suitable disturbances are introduced, the algorithm quickly self-adjusts and recovers a stable state; the robustness and dynamic performance of the controller meet practical requirements.
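One common form of the fractional-order gradient-descent update used in this line of work is a Caputo-derivative-based rule; the exact variant in the paper is not specified, so the following is a generic sketch on a toy quadratic objective:

```python
import math

def fractional_gd(grad, w0, c, alpha=0.9, lr=0.1, steps=200):
    """Caputo-style fractional gradient step:
        w <- w - lr * grad(w) * |w - c|^(1 - alpha) / Gamma(2 - alpha)
    where 0 < alpha <= 1 is the fractional order and c the lower limit
    of the fractional derivative. alpha = 1 recovers ordinary gradient
    descent, since |w - c|^0 = 1 and Gamma(1) = 1."""
    w = w0
    g = math.gamma(2.0 - alpha)
    for _ in range(steps):
        w -= lr * grad(w) * abs(w - c) ** (1.0 - alpha) / g
    return w

# minimize f(w) = (w - 3)^2; c is placed away from the iterate path so
# the fractional factor never vanishes
w_star = fractional_gd(lambda w: 2.0 * (w - 3.0), w0=0.0, c=-1.0)
```

The order alpha scales the effective step through the |w − c| factor, which is one way to read the paper's observation that different orders emphasize differential-like or integral-like behavior.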
12.
This letter proposes a new reinforcement learning (RL) paradigm that explicitly takes into account input disturbance as well as modeling errors. The use of environmental models in RL is quite popular for both offline learning using simulations and for online action planning. However, the difference between the model and the real environment can lead to unpredictable, and often unwanted, results. Based on the theory of H∞ control, we consider a differential game in which a "disturbing" agent tries to make the worst possible disturbance while a "control" agent tries to make the best control input. The problem is formulated as finding a min-max solution of a value function that takes into account the amount of the reward and the norm of the disturbance. We derive online learning algorithms for estimating the value function and for calculating the worst disturbance and the best control in reference to the value function. We tested the paradigm, which we call robust reinforcement learning (RRL), on the control task of an inverted pendulum. In the linear domain, the policy and the value function learned by online algorithms coincided with those derived analytically by the linear H∞ control theory. For a fully nonlinear swing-up task, RRL achieved robust performance with changes in the pendulum weight and friction, while a standard reinforcement learning algorithm could not deal with these changes. We also applied RRL to the cart-pole swing-up task, and a robust swing-up policy was acquired.
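In the linear domain the abstract mentions, the min-max solution reduces to a game (H∞) Riccati equation. A scalar sketch, with an illustrative toy system and weights, that finds the stabilizing root and the resulting best-control / worst-disturbance gains:

```python
def game_riccati_scalar(a, b, d, q, gamma, iters=5000, step=1e-3):
    """Solve the scalar continuous-time game Riccati equation
        0 = 2 a P + q - P^2 * (b^2 - d^2 / gamma^2)
    by following the residual flow dP/dtau = residual, which converges
    to the stabilizing (positive) root whenever b^2 > d^2 / gamma^2."""
    P = 1.0
    for _ in range(iters):
        P += step * (2 * a * P + q - P * P * (b * b - d * d / gamma ** 2))
    return P

# dynamics x' = a x + b u + d w, cost q x^2 + u^2 - gamma^2 w^2 (toy numbers)
P = game_riccati_scalar(a=-1.0, b=1.0, d=1.0, q=1.0, gamma=2.0)
k_u = P * 1.0             # best control:       u* = -k_u * x
k_w = P * 1.0 / 2.0 ** 2  # worst disturbance:  w* =  k_w * x
```

For these numbers the stabilizing root is P = (a + sqrt(a^2 + q(b^2 − d^2/γ^2)))/(b^2 − d^2/γ^2) ≈ 0.4305, which the residual flow reaches from any nearby start.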
13.
Online learning control by association and reinforcement  Total citations: 4 (self: 0, other: 4)
This paper focuses on a systematic treatment for developing a generic online learning control system based on the fundamental principle of reinforcement learning or, more specifically, neural dynamic programming. This online learning system improves its performance over time in two aspects: 1) it learns from its own mistakes through the reinforcement signal from the external environment and tries to reinforce its action to improve future performance; and 2) system states associated with positive reinforcement are memorized through a network learning process so that, in the future, similar states will be more strongly associated with control actions leading to positive reinforcement. A successful candidate online learning control design is introduced. Real-time learning algorithms are derived for the individual components in the learning system. Some analytical insight is provided as guidelines on the learning processes taking place in each module of the online learning control system.
14.
Wipawee Usaha; Javier A. Barria 《IEEE transactions on systems, man, and cybernetics. Part B, Cybernetics》2007,37(3):515-527
In this paper, we develop and assess online decision-making algorithms for call admission and routing for low Earth orbit (LEO) satellite networks. It has been shown in a recent paper that, in a LEO satellite system, a semi-Markov decision process formulation of the call admission and routing problem can achieve better performance in terms of an average revenue function than existing routing methods. However, the conventional dynamic programming (DP) numerical solution becomes prohibited as the problem size increases. In this paper, two solution methods based on reinforcement learning (RL) are proposed in order to circumvent the computational burden of DP. The first method is based on an actor-critic method with temporal-difference (TD) learning. The second method is based on a critic-only method, called optimistic TD learning. The algorithms enhance performance in terms of requirements in storage, computational complexity and computational time, and in terms of an overall long-term average revenue function that penalizes blocked calls. Numerical studies are carried out, and the results obtained show that the RL framework can achieve up to 56% higher average revenue over existing routing methods used in LEO satellite networks with reasonable storage and computational requirements.
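Both methods above rest on the temporal-difference update for the critic. A minimal tabular TD(0) sketch; the states, rewards, and step size here are invented for illustration:

```python
def td0(episodes, alpha=0.1, gamma=0.95):
    """Tabular TD(0): after each transition (s, r, s'), move the value
    estimate V(s) toward the bootstrapped target r + gamma * V(s')."""
    V = {}
    for episode in episodes:
        for s, r, s_next in episode:
            target = r + gamma * V.get(s_next, 0.0)
            V[s] = V.get(s, 0.0) + alpha * (target - V.get(s, 0.0))
    return V

# toy episode: admitting a call in state "idle" earns revenue 1, then ends
episodes = [[("idle", 1.0, "done")] for _ in range(200)]
V = td0(episodes)
```

An actor-critic admission controller would use the TD error (target − V(s)) to adjust the probability of accepting a call in each state, while a critic-only method acts greedily on V directly.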
15.
Compared with traditional geostationary (GEO) communication satellites, the new generation of medium- and low-Earth-orbit satellite Internet constellations, represented by SpaceX, Starlink, and O3b, offers wide-area coverage, anytime-anywhere connectivity, and multi-satellite cooperation, and has become a research focus worldwide. Traditional satellite resource-scheduling methods mainly address scheduling under a single GEO satellite and can hardly meet the needs of LEO constellations characterized by multi-satellite cooperation, joint networking, and massive numbers of users. To this end, a user-satisfaction-based multi-satellite cooperative intelligent resource-scheduling model is constructed, and a reinforcement-learning-based satellite network resource-scheduling mechanism, IRSUP, is proposed. For the personalized demands of user-service customization, IRSUP includes an intelligent user-service-preference optimization module; for the difficult problem of jointly optimizing multi-satellite resources, it includes a reinforcement-learning-based intelligent scheduling module. Simulation results show that IRSUP effectively improves scheduling rationality, link-resource utilization, and user satisfaction, with service capacity increasing by 30%-60% and user satisfaction more than doubling.
16.
As a powerful tool for solving nonlinear complex system control problems, model-free reinforcement learning hardly guarantees system stability in the early stage of learning, especially when highly complex learning components are applied. In this paper, a reinforcement learning framework imitating several cognitive mechanisms of the brain, such as attention, competition, and integration, is proposed to realize sample-efficient, self-stabilized online learning control. Inspired by the generation of consciousness in the human brain, multiple actors are applied that work either competitively, for the best interaction results, or cooperatively, for more accurate modeling and predictions. A deep reinforcement learning implementation for challenging control tasks and a real-time control implementation of the proposed framework are given to demonstrate, respectively, the high sample efficiency and the capability of maintaining system stability in the online learning process without requiring an initial admissible control.
17.
For the tracking control problem of networked control systems subject to linear external disturbances and packet dropouts in the state-feedback channel, a data-driven optimal output-regulation control method based on off-policy reinforcement learning is proposed, so that the control policy can be solved using online data only. First, when state packets are lost during network transmission, a Smith predictor is used to reconstruct the system state. Then, within the output-regulation framework, a data-driven optimal control algorithm based on off-policy reinforcement learning is proposed that computes the feedback gain from online data alone when state packets are dropped, and the connection between solving for the feedback gain and solving the output-regulation problem is established. Next, based on parameters obtained while solving for the feedback gain, which relate to the regulator equations of the output-regulation problem, a model-free solution for the feedforward gain is computed. Finally, simulation results verify the effectiveness of the proposed method.
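The Smith-predictor reconstruction step can be sketched for a linear discrete-time model; the matrices and inputs below are illustrative assumptions:

```python
import numpy as np

def reconstruct_state(A, B, x_last, u_history):
    """When feedback packets are dropped, propagate the last received
    state through the nominal model x_{k+1} = A x_k + B u_k, replaying
    the inputs applied since that sample (the Smith-predictor idea)."""
    x = np.asarray(x_last, dtype=float)
    for u in u_history:
        x = A @ x + B @ np.asarray(u, dtype=float)
    return x

# double-integrator example with two consecutive feedback packets lost
A = np.array([[1.0, 0.1], [0.0, 1.0]])
B = np.array([[0.0], [0.1]])
x_hat = reconstruct_state(A, B, x_last=[0.0, 0.0], u_history=[[1.0], [1.0]])
```

The controller then closes the loop on x_hat until a fresh state packet arrives, at which point the predictor is re-anchored to the measurement.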
18.
This paper presents an approximate/adaptive dynamic programming (ADP) algorithm that uses the idea of integral reinforcement learning (IRL) to determine online the Nash equilibrium solution for the two-player zero-sum differential game with linear dynamics and an infinite-horizon quadratic cost. The algorithm is built around an iterative method, developed in the control engineering community, for solving the continuous-time game algebraic Riccati equation (CT-GARE), which underlies the game problem. We show how the ADP techniques enhance the capabilities of the offline method, allowing an online solution without requiring complete knowledge of the system dynamics. The feasibility of the ADP scheme is demonstrated in simulation for a power system control application. The adaptation goal is the best control policy that faces, in an optimal manner, the highest load disturbance.
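The IRL idea evaluates a policy from trajectory data via the interval Bellman identity: the value at x(t) equals the cost accumulated over [t, t+T] plus the value at x(t+T), so the drift dynamics never appear. A scalar sketch (toy system and cost, single-player case for brevity):

```python
import numpy as np

def irl_policy_evaluation(traj, T_steps, dt, q, r_u, k):
    """Fit P in V(x) = P x^2 from the interval identity
        P x(t)^2 - P x(t+T)^2 = integral of (q x^2 + r_u u^2) over [t, t+T]
    with u = -k x, by least squares over many intervals."""
    phi, y = [], []
    for i in range(0, len(traj) - T_steps, T_steps):
        cost = sum((q + r_u * k * k) * x * x * dt
                   for x in traj[i:i + T_steps])
        phi.append(traj[i] ** 2 - traj[i + T_steps] ** 2)
        y.append(cost)
    phi, y = np.array(phi), np.array(y)
    return float(phi @ y / (phi @ phi))

# data from x' = (a - b k) x with a = -1, b = 1, k = 1 (Euler, toy numbers)
dt, x, traj = 0.001, 1.0, []
for _ in range(5000):
    traj.append(x)
    x += -2.0 * x * dt
P = irl_policy_evaluation(traj, T_steps=100, dt=dt, q=1.0, r_u=1.0, k=1.0)
# analytic Lyapunov value for this policy: P = (q + r_u k^2) / (2(bk - a)) = 0.5
```

In the zero-sum setting of the paper the same identity is applied with both the control and the disturbance policies fixed at each iteration, driving the estimate toward the CT-GARE solution.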
19.
A precise tracking and pointing control system plays a vital role in satellite optical communication. China's completed Micius quantum science experimental satellite used a tracking and pointing system designed on classical servo-system theory and achieved full success in practice. Future longer-range space communication imposes higher accuracy requirements on the tracking and pointing system, which traditional control methods can hardly meet. This paper therefore proposes a parametric design method for the fine-pointing system that abandons the traditional practice of designing the coarse and fine subsystems separately: the two-stage subsystem is designed as a whole, making full use of the design degrees of freedom in the system. By jointly optimizing these degrees of freedom, the design achieves decoupling from step disturbances and rejection of complex disturbances, insensitive pole placement, and minimization of the control gain, thereby significantly improving pointing accuracy. Simulation results show that the pointing accuracy improves from the original microradian level to the nanoradian level.
20.
To achieve a wide field of view and high resolution, remote-sensing satellites generally carry payloads with large-inertia imaging components that rotate back and forth. The disturbances produced by this rotation often exceed the attitude-stability and pointing-accuracy requirements of payload imaging, so the satellite platform must take measures to suppress the disturbance torque. However, because of machining, assembly, and similar factors, the payload's actual disturbance torque generally differs from its design value, introducing uncertainty into compensation-scheme design, parameter loading, and the confidence of ground verification. This paper presents a method for calibrating the disturbance torque of a spacecraft's moving components using a single-axis air-bearing table, designs experiments to show how the technique is realized under non-vacuum laboratory conditions and how ground and on-orbit conditions differ, and analyzes the errors affecting the calibration results.