1.
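To address the complex constraints, large solution space, and variable-length input task sequences of the satellite observation task planning problem, a deep reinforcement learning (DRL) method is used to solve it. The problem is modeled with time-window constraints, the maneuver time for transitions between tasks, and satellite power and storage constraints all taken into account. A sequential decision model is built on the operating mechanism of the pointer network (PN); a mask vector encodes the various constraints of the planning problem, and the model is trained with the Actor-Critic reinforcement learning algorithm to maximize the profit rate. Drawing on the multi-head attention (MHA) mechanism, the PN is improved and a multi-head attention pointer network (MHA-PN) algorithm is proposed. Experimental results show that MHA-PN significantly improves training speed and generalization performance, and that a trained MHA-PN model can perform end-to-end inference directly on an input sequence, avoiding the iterative solving process of traditional heuristic algorithms and achieving high solving efficiency.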
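The abstract does not give the form of the mask vector. As a minimal sketch, assuming the mask is a boolean feasibility flag per candidate task and that the pointer scores are already computed, constraint masking before the softmax could look like this (the function name `masked_pointer_probs` and the sample numbers are hypothetical):

```python
import numpy as np

def masked_pointer_probs(scores, feasible):
    """Softmax over pointer scores, with infeasible tasks forced to probability 0.

    scores   : raw attention scores, one per candidate task (assumed given)
    feasible : boolean mask, True where a task satisfies all constraints
    """
    scores = np.asarray(scores, dtype=float)
    feasible = np.asarray(feasible, dtype=bool)
    if not feasible.any():
        raise ValueError("no feasible task remains")
    # Infeasible entries become -inf so that exp() maps them to exactly 0.
    masked = np.where(feasible, scores, -np.inf)
    # Subtract the largest feasible score for numerical stability.
    masked = masked - masked[feasible].max()
    weights = np.exp(masked)
    return weights / weights.sum()

# Hypothetical example: tasks 2 and 3 violate a constraint (e.g. time window).
scores = [2.0, 1.0, 0.5, 3.0]
feasible = [True, True, False, False]
probs = masked_pointer_probs(scores, feasible)
```

Masking the scores rather than the output probabilities keeps the result a valid distribution over the feasible tasks without a separate renormalization step.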
2.
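In research on optimal liquid-level control of a double-tank water system, the level control system is a typical hard-to-control plant with large time delay and nonlinearity, which leads to unstable level control. To solve this problem, a predictive control algorithm based on approximate dynamic programming is proposed. A performance index function J is established that reflects the deviation between the reference trajectory and the predicted output over the prediction horizon. The optimization of J is treated as a dynamic programming problem; to avoid the “curse of dimensionality”, approximate dynamic programming is used to obtain an approximate value of J and an optimal (or suboptimal) control policy. The algorithm consists of three parts: a critic network, a model network, and an action network. The critic network approximates the performance index function, the model network approximates the system's input-output relationship, and the action network produces the corresponding control policy. The network parameters are adjusted online continuously so that the system output approaches the reference level. Simulation results show that the improved algorithm achieves better control performance than the classical PID algorithm.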
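The abstract does not state the exact form of J. A common choice, assumed here purely for illustration, is the sum of squared deviations between the reference trajectory and the predicted output over the prediction horizon (the name `tracking_cost` and the sample levels are hypothetical):

```python
import numpy as np

def tracking_cost(reference, predicted):
    """Sum of squared tracking errors over the prediction horizon.

    This quadratic form is an assumption; the source only says that J
    reflects the deviation between reference and predicted output.
    """
    reference = np.asarray(reference, dtype=float)
    predicted = np.asarray(predicted, dtype=float)
    if reference.shape != predicted.shape:
        raise ValueError("reference and predicted must have the same shape")
    return float(np.sum((reference - predicted) ** 2))

# Hypothetical 3-step horizon: reference level 1.0, predicted levels rising.
J = tracking_cost([1.0, 1.0, 1.0], [0.5, 0.8, 1.0])
```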
3.
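Citric acid production by solid-state fermentation using apple pomace as raw material (total citations: 4; self-citations: 0; citations by others: 4)
This paper studies several technical issues in producing citric acid by solid-state fermentation with apple pomace as the raw material. On the basis of single-factor experiments, orthogonal experiments were used to determine the optimal process conditions for citric acid production; the highest citric acid yield was 78 g per kg of pomace.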
4.
We demonstrate the unity absorption of visible light with an ultra-narrow 0.1 nm linewidth. It arises from the Bloch surface wave resonance in alternating TiO2/SiO2 multilayers. The total absorption and narrow linewidth are explained from the radiative and absorptive damping, which are quantitatively determined by the temporal coupled mode theory. When a silver film with proper thickness is added to the absorber, the perfect absorption is achieved with only 3 structural bilayers, in contrast with 8 bilayers required without Ag. Furthermore, significant field enhancement and an ultrahigh 2600/RIU sensing figure-of-merit are simultaneously obtained at resonance, which might facilitate applications in nonlinear optical devices and high resolution refractive index sensing.
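The link between the two damping rates and unity absorption can be written out with the standard temporal coupled mode theory result for a single resonance coupled to one port. This is a textbook form given as an assumption; the abstract does not reproduce the authors' own expression, and the symbols $\omega_0$, $\gamma_{\mathrm{rad}}$, and $\gamma_{\mathrm{abs}}$ (amplitude decay rates) are introduced here for illustration:

```latex
A(\omega) = \frac{4\,\gamma_{\mathrm{rad}}\,\gamma_{\mathrm{abs}}}
                 {(\omega-\omega_0)^2 + (\gamma_{\mathrm{rad}}+\gamma_{\mathrm{abs}})^2},
\qquad
A(\omega_0) = \frac{4\,\gamma_{\mathrm{rad}}\,\gamma_{\mathrm{abs}}}
                   {(\gamma_{\mathrm{rad}}+\gamma_{\mathrm{abs}})^2} = 1
\iff \gamma_{\mathrm{rad}} = \gamma_{\mathrm{abs}},
\qquad
\Delta\omega_{\mathrm{FWHM}} = 2\,(\gamma_{\mathrm{rad}}+\gamma_{\mathrm{abs}}).
```

Under this model, total absorption requires the radiative and absorptive rates to match, and the linewidth is narrow only when both rates are small.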
6.
Frontiers of Architectural Research, 2014, 3(2): 213-223
The restoration of the former Pirelli Tower in Milan, which dates back to the early 1950s, is an example of various issues in approaching the “conservation of the new”. This project was completed with the broad use of industrial products that evoked different kinds of reflections, if only within the same planning methodology, common to all interventions of architectural restoration. This restoration constitutes an exemplary episode where only a careful and critical evaluation facilitated the understanding of which elements are important in conservation and which can be substituted or updated. This approach uses case-by-case evaluations. The conservation of “new” architecture is similar to other restoration problems, except for the closeness in time to the original works and, sometimes, to their creator. The main intervention concerns the recovery of the structure with over 10,000 m2 of continuous aluminum and glass façade in a skyscraper designed by Italian master Gio Ponti and the repair of the damage to the reinforced concrete (RC) structures (designed by another Italian master, Pier Luigi Nervi) caused by a plane crash. The straightening and repair of the RC using entirely innovative methods and the conservation of the structures of the whole façade also translates into financial savings. Approximately 20% of the savings is derived from the complete substitution of the curtain wall. This idea of authenticity results in a method of restoration in which all single parts may not always be replaced for every functional upgrade. This scenario is important news, especially for modern architecture that usually prefers the value of what appears to be new, showing parts that are always perfect since the time they were built. People also consider the conservation of items that were considered as merely industrial products a few years ago.
7.
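Research on methods for improving reinforcement learning speed (total citations: 4; self-citations: 0; citations by others: 4)
Zhang Rubo. Computer Engineering and Applications, 2001, 37(22): 38-40
The term reinforcement learning comes from behavioral psychology, which regards learning as a process of trial and error that maps environment states to actions. This characteristic inevitably makes learning harder for an intelligent system and lengthens the learning time. Reinforcement learning is slow because there is no explicit supervisory signal. A reinforcement learning system interacting with its environment therefore has to rely on trial and error and on an external evaluative signal to adjust its behavior, so the intelligent system inevitably goes through a long learning process. How to speed up reinforcement learning is a most important research problem. This paper discusses methods for improving reinforcement learning speed from several aspects.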
8.
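To address the poor adaptability of mobile robots in obstacle avoidance, a method for a mobile robot to learn obstacle-avoidance behavior is proposed, based on an evolutionary operant behavior learning model (EOBLM). It incorporates the evolutionary idea of the genetic algorithm (GA) and builds on adaptive heuristic critic (AHC) learning and operant conditioning (OC) theory. The method is an improved AHC learning scheme. The evaluation unit is implemented as a multilayer feedforward neural network whose weights are updated with the TD algorithm and gradient descent; this learning stage generates orientation information, which acts as intrinsic motivation and determines the direction of evolution. The action selection unit mainly optimizes operant behavior to achieve the best mapping from states to actions. Optimization proceeds in two stages: in the first, the information entropy obtained by the operant conditioning learning algorithm serves as individual fitness and the GA searches for the optimal individual; in the second, the OC learning algorithm selects the optimal operant behavior within the optimal individual and yields a new information entropy value. Obstacle-avoidance simulation experiments with a mobile robot show that the designed EOBLM enables the robot to actively learn obstacle avoidance through continual interaction with an unknown external environment, and that its self-learning and self-adaptive abilities are stronger than those of the traditional AHC method.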
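The TD-plus-gradient-descent weight update of the evaluation unit can be illustrated with a single TD(0) step. The paper uses a multilayer feedforward network; the sketch below swaps in a linear value function over feature vectors to keep the update visible, and the function name `td0_update`, the step size, and the discount factor are all hypothetical:

```python
import numpy as np

def td0_update(weights, phi_s, phi_next, reward, alpha=0.1, gamma=0.9):
    """One TD(0) step for a linear critic V(s) = weights . phi(s).

    A linear critic is an illustrative simplification, not the
    multilayer network described in the abstract.
    """
    weights = np.asarray(weights, dtype=float)
    phi_s = np.asarray(phi_s, dtype=float)
    phi_next = np.asarray(phi_next, dtype=float)
    # TD error: reward plus discounted next value, minus current value.
    td_error = reward + gamma * (weights @ phi_next) - weights @ phi_s
    # Semi-gradient step: the gradient of V(s) w.r.t. weights is phi(s).
    new_weights = weights + alpha * td_error * phi_s
    return new_weights, float(td_error)

# Hypothetical transition between two one-hot encoded states, reward 1.
w0 = np.zeros(2)
w1, delta = td0_update(w0, phi_s=[1.0, 0.0], phi_next=[0.0, 1.0], reward=1.0)
```

With a neural-network critic the same TD error would be backpropagated through the network instead of being multiplied by a fixed feature vector.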
10.