1.
To address the actor-critic (AC) algorithm's problems of high-dimensional experience samples and low robustness of the policy-gradient model, this work draws on the information-sharing advantage of multi-agent systems: attention networks are constructed to serve as agents, and a multi-layer parallel attention network model is introduced to improve the AC algorithm, yielding a soft AC algorithm based on multi-layer parallel attention. Applied to robot path planning in dynamic, unknown environments, the method strengthens the robustness of the actor's policy gradient, lowers the critic's regression error, and converges quickly to an optimal path plan. Experimental results show that the algorithm effectively escapes local optima in robot path planning, computes quickly, and converges stably.
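As a rough illustration of the attention-as-agent idea only (the head count, layer sizes, and random weights below are hypothetical, not the paper's architecture), a minimal parallel multi-head attention pass over a state feature matrix might look like:

```python
import numpy as np

def attention(Q, K, V):
    """Scaled dot-product attention: weight value rows by query-key similarity."""
    scores = Q @ K.T / np.sqrt(Q.shape[-1])
    w = np.exp(scores - scores.max(axis=-1, keepdims=True))
    w /= w.sum(axis=-1, keepdims=True)          # softmax over keys
    return w @ V

def parallel_attention(state, n_heads, d_head, rng):
    """Several independent attention 'agents' process the same state in
    parallel; their outputs are concatenated into one feature vector."""
    outs = []
    for _ in range(n_heads):
        Wq, Wk, Wv = (rng.standard_normal((state.shape[-1], d_head)) / 8
                      for _ in range(3))
        outs.append(attention(state @ Wq, state @ Wk, state @ Wv))
    return np.concatenate(outs, axis=-1)

rng = np.random.default_rng(0)
obs = rng.standard_normal((8, 32))   # 8 observation tokens, 32 features each
print(parallel_attention(obs, n_heads=4, d_head=16, rng=rng).shape)  # (8, 64)
```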
2.
For a class of neural networks with discrete and distributed time delays, under weaker constraints on the neuron activation functions, new exponential-stability criteria expressed as linear matrix inequalities are derived by defining a more general Lyapunov functional and applying a convex-combination technique. Compared with existing results, these criteria are less conservative. Simulation examples show that the obtained results are effective and of low conservatism.
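The delay-dependent criterion itself is too involved for a short sketch, but the same LMI machinery can be exercised on a simplified analogue: certifying exponential stability of a delay-free linear system dx/dt = Ax with decay rate alpha via a feasibility problem (the system matrix and rate below are made up):

```python
import numpy as np
import cvxpy as cp

# Simplified analogue of an LMI stability test: find P = P^T >> 0 with
# A^T P + P A + 2*alpha*P << 0, certifying ||x(t)|| <= c*exp(-alpha*t).
A = np.array([[-2.0, 1.0],
              [0.5, -3.0]])
alpha = 0.5
P = cp.Variable((2, 2), symmetric=True)
eps = 1e-6
constraints = [P >> eps * np.eye(2),
               A.T @ P + P @ A + 2 * alpha * P << -eps * np.eye(2)]
problem = cp.Problem(cp.Minimize(0), constraints)
problem.solve()
print("LMI feasible:", problem.status == cp.OPTIMAL)
```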
3.
This paper develops a simplified optimized tracking control using a reinforcement learning (RL) strategy for a class of nonlinear systems. Since a nonlinear control gain function is considered in the system modeling, it is challenging to extend existing RL-based optimal methods to tracking control. The main reasons are that the algorithms of these methods are very complex and that they require some strict conditions to be met. Unlike these existing RL-based optimal methods, which derive the actor and critic training laws from the square of the Bellman residual error, a complex function consisting of multiple nonlinear terms, the proposed optimized scheme derives the two RL training laws from the negative gradient of a simple positive function, so the algorithm can be significantly simplified. Moreover, the actor and critic in RL are constructed by employing neural networks (NNs) to approximate the solution of the Hamilton–Jacobi–Bellman (HJB) equation. Finally, the feasibility of the proposed method is demonstrated using both Lyapunov stability theory and a simulation example.
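To show only the structural point, that the training laws are taken as negative gradients of simple positive functions rather than of a squared Bellman residual, a toy scalar version might read as follows (the plant, features, and gains are invented for illustration and carry none of the paper's guarantees):

```python
import numpy as np

phi = lambda x: np.array([x, x ** 2, np.tanh(x)])   # shared feature vector

rng = np.random.default_rng(1)
w_actor = rng.standard_normal(3) * 0.1    # u(x) ~ w_actor . phi(x)
w_critic = rng.standard_normal(3) * 0.1   # V(x) ~ w_critic . phi(x)
gain_a, gain_c = 0.05, 0.05

x, x_ref = 1.0, 0.0
for _ in range(300):
    u = w_actor @ phi(x)
    x = 0.9 * x + 0.1 * u                  # hypothetical scalar plant
    e = x - x_ref
    # Actor law: negative gradient of J_a(w) = 0.5 * e^2; the feature
    # vector stands in for de/dw (chain-rule shortcut).
    w_actor -= gain_a * e * phi(x)
    # Critic law: negative gradient of J_c(w) = 0.5 * (V_hat - e^2)^2,
    # fitting a crude cost-to-go target -- again a simple positive function.
    V_hat = w_critic @ phi(x)
    w_critic -= gain_c * (V_hat - e ** 2) * phi(x)

print("final |tracking error|:", abs(x - x_ref))
```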
4.
Task scheduling in e-government cloud centers has long been a complex problem. Most existing task-scheduling methods rely on expert knowledge, generalize poorly, and cannot cope with dynamic cloud environments, which typically lowers the cloud center's resource utilization and quality of service and lengthens task makespan. To address this, a deep reinforcement learning scheduling method based on the actor-critic (A2C) algorithm is proposed. First, the actor network parameterizes the policy and selects scheduling actions according to the current system state, while the critic network scores that state; then the actor's policy network is updated by gradient ascent, with the critic's score used to judge the quality of each action; finally, simulation experiments are run on two real business datasets. The results show that, compared with the classical policy-gradient algorithm and five heuristic task-scheduling methods, the proposed method improves the resource utilization of the cloud data center, shortens the makespan of offline tasks, and adapts better to dynamic e-government cloud environments.
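A minimal single-step version of the update described here, with linear stand-ins for the actor and critic networks and hypothetical feature sizes (the paper's networks and scheduling environment are not reproduced), might look like:

```python
import numpy as np

rng = np.random.default_rng(0)
n_features, n_actions = 6, 3               # e.g. queue stats -> choose a server
theta = np.zeros((n_features, n_actions))  # actor parameters
w = np.zeros(n_features)                   # critic parameters
alpha_a, alpha_c, gamma = 0.01, 0.05, 0.99

def policy(s):
    logits = s @ theta
    p = np.exp(logits - logits.max())
    return p / p.sum()

def a2c_step(s, a, r, s_next, done):
    global theta, w
    v = s @ w
    v_next = 0.0 if done else s_next @ w
    advantage = r + gamma * v_next - v      # TD error as advantage estimate
    w += alpha_c * advantage * s            # critic: move value toward TD target
    # actor: policy-gradient ascent weighted by the advantage;
    # d log pi(a|s) / d theta = outer(s, onehot(a) - p)
    p = policy(s)
    grad_logpi = np.outer(s, -p)
    grad_logpi[:, a] += s
    theta += alpha_a * advantage * grad_logpi

s = rng.standard_normal(n_features)
a = rng.choice(n_actions, p=policy(s))
a2c_step(s, a, r=1.0, s_next=rng.standard_normal(n_features), done=False)
```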
5.
Parametric uncertainty and unmodeled disturbance always exist in physical electrical-optical gyro-stabilized platform systems and pose great challenges to controller design. Moreover, actuator dead-zone nonlinearity makes the situation more complicated. By constructing a smooth dead-zone inverse, a control law is proposed that combines a neural network (NN) output with robust-integral-of-the-sign-of-the-error (RISE) feedback on the tracking error, in which an adaptive law is synthesized to handle parametric uncertainty and the RISE robust term attenuates unmodeled disturbance. To reduce measurement noise, a desired-compensation method is utilized in the controller design, in which the model compensation term depends on the reference signal only. By activating an auxiliary robust control component that pulls transients escaping the neural active region back into the neural approximation domain, a multi-switching robust neuro-adaptive controller is obtained that achieves globally uniformly ultimately bounded (GUUB) tracking stability of the servo system. Asymptotic tracking in the presence of unknown dead-zone, parametric uncertainties, and various disturbances, which is vital for high-accuracy tracking, is achieved by the proposed robust adaptive backstepping controller. Extensive comparative experimental results verify the effectiveness of the proposed control strategy.
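A smooth dead-zone inverse of the kind described can be sketched as follows (the dead-zone parameters and the tanh blending are assumptions for illustration; the paper's exact construction may differ):

```python
import numpy as np

# Actuator dead-zone: DZ(v) = m*(v - b_r) for v > b_r, m*(v - b_l) for
# v < b_l, else 0. The inverse pre-distorts the desired control u_d so that
# DZ(inverse(u_d)) ~ u_d, blending the two branches smoothly via tanh
# instead of switching discontinuously at zero.

def deadzone(v, m=1.0, b_l=-0.4, b_r=0.6):
    return np.where(v > b_r, m * (v - b_r),
           np.where(v < b_l, m * (v - b_l), 0.0))

def smooth_deadzone_inverse(u_d, m=1.0, b_l=-0.4, b_r=0.6, k=50.0):
    sig = 0.5 * (1.0 + np.tanh(k * u_d))    # smooth 0/1 switch on sign of u_d
    return sig * (u_d / m + b_r) + (1.0 - sig) * (u_d / m + b_l)

u_d = np.linspace(-2, 2, 9)
err = deadzone(smooth_deadzone_inverse(u_d)) - u_d
print("max compensation error:", np.max(np.abs(err)))   # ~0 away from u_d = 0
```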
6.
A deep attention reinforcement learning method based on an autoregressive prediction model
In recent years, deep reinforcement learning has demonstrated strong capability and broad applicability in a variety of decision-making and planning problems, with success stories such as AlphaGo, OpenAI Five, and Alpha Star. However, traditional deep reinforcement learning's heavy dependence on computing resources and its inefficient use of data severely limit its application to complex real-world tasks. Traditional model-based reinforcement learning algorithms learn the latent dynamics of the environment, so they can exploit sample information fully, improve data efficiency, and speed up training; how to build an accurate environment model quickly, however, remains the central difficulty of model-based reinforcement learning. Combining the strengths of model-based and model-free reinforcement learning, this work proposes a deep attention reinforcement learning method based on a temporal autoregressive prediction model. An autoencoder compresses the latent state space into a compact representation; an autoregressive model on top of it serves as the environment prediction model; an attention mechanism combines the prediction model to estimate the value function of each decision state; and all modules are trained jointly end to end for efficiency. Experimental results on classic control tasks such as CartPole-V0 show that the model builds environment prediction models efficiently and effectively combines model-based and model-free reinforcement learning to use samples efficiently. Finally, an empirical study of the algorithm on intelligent missile-penetration planning shows that the proposed learning model outperforms traditional penetration planning in specific scenarios.
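The model-learning step can be caricatured with linear maps standing in for the autoencoder and the autoregressive predictor (everything below, including the dynamics, dimensions, and noise level, is invented just to make the sketch runnable):

```python
import numpy as np

# Sketch: an encoder compresses observations to latents z_t, and an
# autoregressive model z_{t+1} ~ A z_t + B u_t is fit on rollout data.
rng = np.random.default_rng(0)
d_obs, d_z, T = 16, 3, 400
A_true = np.diag([0.9, 0.7, 0.5])              # hidden latent dynamics
B_true = rng.standard_normal((1, d_z)) * 0.2
D = rng.standard_normal((d_z, d_obs))          # 'decoder' to observations

s = np.zeros((T, d_z))
u = rng.standard_normal((T, 1))
for t in range(T - 1):                         # synthetic rollout with noise
    s[t + 1] = s[t] @ A_true + u[t] @ B_true + 0.01 * rng.standard_normal(d_z)
obs = s @ D

E = np.linalg.pinv(D)                          # stand-in for a trained encoder
z = obs @ E                                    # recovered latent trajectory
X = np.hstack([z[:-1], u[:-1]])                # regress z_{t+1} on [z_t, u_t]
AB, *_ = np.linalg.lstsq(X, z[1:], rcond=None)
print("one-step latent MSE:", np.mean((X @ AB - z[1:]) ** 2))
```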
7.
As the shipbuilding industry is an engineering-to-order industry, different types of products are manufactured according to customer requests, and each product goes through different processes and workshops. During the shipbuilding process, if a product cannot proceed directly to the subsequent process due to physical constraints of the workshop, it waits temporarily in a stockyard. Since the waiting process involves unpredictable circumstances, plans regarding time and space cannot be established in advance, so unnecessary movement often occurs when ship blocks enter or leave the stockyard. In this study, a reinforcement learning approach was proposed to minimise rearrangement in such circumstances. For this purpose, an environment in which blocks are arranged and rearranged was defined. Rewards based on simplified rules were logically defined, and simulation was performed for quantitative evaluation using the proposed reinforcement learning algorithm. The algorithm was verified on an example model derived from actual shipyard data. The method proposed in this study can be applied not only to the arrangement problem of ship-block stockyards but also to various arrangement, allocation, and logistics problems in the manufacturing industry.
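In the spirit of the simplified reward rules mentioned above (the concrete rule below is a guess for illustration, not the paper's definition), a placement reward might penalise each block the new arrival would trap:

```python
def placement_reward(stack_departures, new_departure):
    """stack_departures: departure times of blocks already in the slot,
    bottom to top. Each block below that leaves before the new block
    will force a rearrangement move, so it incurs a penalty."""
    blocked = sum(1 for d in stack_departures if d < new_departure)
    return 1.0 - blocked      # +1 for a clean placement, -1 per conflict

# The new block (leaves at t=7) sits on a block leaving at t=5: one conflict.
print(placement_reward([5, 9], new_departure=7))   # 0.0
```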
8.
Policy gradient methods are widely studied as an effective way to solve decision problems over continuous spaces. However, because policy estimation suffers from large variance, policy-gradient methods are often limited by low sample efficiency and slow convergence. To address this problem, a true online incremental natural-gradient actor-critic algorithm (TOINAC) is proposed. TOINAC adopts the natural gradient, which is superior to the conventional gradient, and, building on the true online temporal-difference (TOTD) algorithm, introduces a new forward view that improves the natural-gradient actor-critic algorithm. In the critic, the efficiency of TOTD is exploited to estimate the value function; in the actor, the new forward view is used to estimate the natural gradient, and eligibility traces turn this into an online estimate, improving both the accuracy of the natural-gradient estimate and the efficiency of the algorithm. TOINAC is combined with kernel methods and a Gaussian policy distribution to handle continuous-space problems. Finally, simulation experiments on continuous problems such as pole balancing, Mountain Car, and Acrobot verify the effectiveness of the algorithm.
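The eligibility-trace mechanism that TOINAC builds on can be sketched with an ordinary-gradient (not natural-gradient) actor-critic step; the features, gains, and Gaussian policy below are placeholders, not the paper's algorithm:

```python
import numpy as np

n = 4
w = np.zeros(n); theta = np.zeros(n)        # critic / actor (Gaussian mean) weights
e_w = np.zeros(n); e_theta = np.zeros(n)    # eligibility traces
alpha_w, alpha_t, gamma, lam, sigma = 0.1, 0.01, 0.99, 0.9, 0.5

def trace_step(phi, a, r, phi_next, done):
    """One online actor-critic update with accumulating traces: the traces
    carry credit backward so forward-view estimates become incremental."""
    global w, theta
    delta = r + (0.0 if done else gamma * (w @ phi_next)) - w @ phi  # TD error
    e_w[:] = gamma * lam * e_w + phi                     # critic trace
    score = (a - theta @ phi) / sigma ** 2 * phi         # grad log N(a; theta.phi, sigma)
    e_theta[:] = gamma * lam * e_theta + score           # actor trace
    w += alpha_w * delta * e_w
    theta += alpha_t * delta * e_theta

rng = np.random.default_rng(0)
trace_step(rng.standard_normal(n), a=0.3, r=1.0,
           phi_next=rng.standard_normal(n), done=False)
print(w, theta)
```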
9.
In this paper, an adaptive neural network (NN) control approach is proposed for nonlinear pure-feedback systems with time-varying full-state constraints. The pure-feedback systems considered here are assumed to possess nonlinear function uncertainties. By using the mean value theorem, the pure-feedback systems can be transformed into strict-feedback forms. For the newly generated systems, NNs are employed to approximate the unknown terms. Based on the adaptive control scheme and the backstepping algorithm, an intelligent controller is designed. At the same time, time-varying barrier Lyapunov functions (BLFs) of the error variables are adopted to avoid violating the full-state constraints at every step of the backstepping design. All closed-loop signals are uniformly ultimately bounded, and the output tracking error converges to a neighborhood of zero, as verified using the Lyapunov stability theorem. Two simulation examples demonstrate the performance of the adaptive NN control approach.
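One common log-type time-varying barrier term (this specific form is an assumption; the paper may use a different one) illustrates how such a constraint is enforced: the term blows up as the error approaches the bound, so keeping it bounded keeps the state inside the envelope:

```python
import numpy as np

def barrier(e, k_t):
    """V_b = 0.5 * log(k(t)^2 / (k(t)^2 - e^2)): finite inside the
    constraint |e| < k(t), unbounded as |e| -> k(t)."""
    assert abs(e) < k_t, "error outside the constraint envelope"
    return 0.5 * np.log(k_t ** 2 / (k_t ** 2 - e ** 2))

k = lambda t: 1.0 + 0.5 * np.sin(t)     # hypothetical time-varying bound
for e_frac in (0.1, 0.9, 0.999):        # barrier grows as e nears the bound
    print(e_frac, barrier(e_frac * k(0.0), k(0.0)))
```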
10.
This paper applies deep reinforcement learning to the two-dimensional irregular-polygon nesting problem. Centroid-to-contour distances are used to map each polygon's shape features into a one-dimensional vector, achieving a compression loss within 1% on randomly generated polygons. Given a sequence of polygon parts, a multi-task deep reinforcement learning model predicts the placement order and rotation angle of the irregular parts, producing nesting results 5%-10% better than standard heuristic algorithms, and after a sufficient number of samples...
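The centroid-to-contour encoding might be approximated as below (angular interpolation of vertex radii is a cheap stand-in for true ray casting, and the vertex mean stands in for the area centroid; both are simplifications, not the paper's exact scheme):

```python
import numpy as np

def centroid_contour_signature(vertices, n_rays=64):
    """Map a polygon to a fixed-length 1-D shape descriptor: the distance
    from the centroid to the contour at evenly spaced angles."""
    c = vertices.mean(axis=0)                 # vertex mean as centroid stand-in
    v = vertices - c
    angles = np.arctan2(v[:, 1], v[:, 0])
    radii = np.linalg.norm(v, axis=1)
    order = np.argsort(angles)
    grid = np.linspace(-np.pi, np.pi, n_rays, endpoint=False)
    # periodic interpolation of radius over angle approximates ray casting
    return np.interp(grid, angles[order], radii[order], period=2 * np.pi)

poly = np.array([[0, 0], [3, 0], [3, 1], [1, 1], [1, 2], [0, 2]], float)
sig = centroid_contour_signature(poly)
print(sig.shape, round(sig.min(), 2), round(sig.max(), 2))
```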