  Fee-based full text   217
  Open access   47
  Domestic open access   60
Electrical engineering   24
General   30
Machinery and instruments   9
Architecture   1
Mining engineering   1
Energy and power   4
Water conservancy   1
Weapons industry   2
Radio and electronics   33
General industrial technology   8
Metallurgy   1
Automation   210
  2024   5
  2023   11
  2022   28
  2021   25
  2020   25
  2019   11
  2018   7
  2017   11
  2016   8
  2015   10
  2014   16
  2013   13
  2012   15
  2011   21
  2010   15
  2009   17
  2008   19
  2007   12
  2006   11
  2005   7
  2004   4
  2003   6
  2002   7
  2001   4
  2000   1
  1999   4
  1998   5
  1997   2
  1996   2
  1994   2
324 results found, search time 158 ms
1.
With the construction of integrated energy systems and the advance of electricity market reform, integrated energy service providers are expected to become new market participants. To address the problem that limited decision-reference information at the bidding stage constrains the formulation of bidding strategies, this paper proposes a Q-learning-based spot-market bidding strategy for integrated energy service providers that improves bidding quality. The main feature of the method is that it makes full use of massive historical operating data, trains a bidding-strategy agent with an artificial-intelligence algorithm, and establishes the intrinsic relationship between the limited reference information available to the service provider and the optimal bidding strategy. Taking public market information, public social information, and the provider's private information as environment variables, the agent can generate and intelligently improve bidding strategies automatically. Finally, a case study built on real data from a provincial power grid shows that the method fits the bidding strategy under cooperative-game conditions well, featuring fast convergence, high solution quality, and high computational efficiency, and that it better meets the decision-making needs of integrated energy service providers.
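As a rough illustration of the tabular Q-learning loop behind such a bidding agent, the sketch below trains on a toy market: the demand states, bid-price actions, clearing rule, and cost figure are all invented placeholders, not the paper's market model.

```python
import random
from collections import defaultdict

ACTIONS = [30, 35, 40, 45, 50]          # candidate bid prices (assumed units)
ALPHA, GAMMA, EPS = 0.1, 0.9, 0.1       # learning rate, discount, exploration

Q = defaultdict(float)                   # Q[(state, action)] -> value

def choose_bid(state):
    """Epsilon-greedy selection over bid-price actions."""
    if random.random() < EPS:
        return random.choice(ACTIONS)
    return max(ACTIONS, key=lambda a: Q[(state, a)])

def simulate_market(state, bid):
    """Toy clearing rule: the bid wins if below a noisy clearing price."""
    clearing = 38 + 4 * state + random.gauss(0, 2)
    reward = (clearing - 25) if bid <= clearing else 0.0   # 25 = assumed cost
    next_state = random.choice([0, 1, 2])                  # next demand level
    return reward, next_state

state = 1
for episode in range(5000):
    bid = choose_bid(state)
    reward, nxt = simulate_market(state, bid)
    best_next = max(Q[(nxt, a)] for a in ACTIONS)
    Q[(state, bid)] += ALPHA * (reward + GAMMA * best_next - Q[(state, bid)])
    state = nxt

print({a: round(Q[(1, a)], 2) for a in ACTIONS})
```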
2.
The stochastic dynamic optimization problem of an integrated energy system is modeled as a Markov decision process, and the Q-learning algorithm is introduced to solve this complex problem. To address the drawbacks of Q-learning, two improvements are made to the conventional algorithm: the Q-table initialization method is improved, and the upper confidence bound (UCB) algorithm is adopted for action selection. Simulation results show that Q-learning solves the problem while maintaining good convergence; the improved initialization and the UCB action selection significantly raise computational efficiency and allow the results to converge to a better solution. Compared with a conventional mixed-integer linear programming model, the Q-learning algorithm yields better optimization results.
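The UCB action-selection improvement translates most directly into code. Below is a minimal sketch on a small invented MDP; the optimistic initial Q-value (one plausible form of the improved Q-table initialization), the toy dynamics, and all constants are assumptions, while the UCB1 bonus term itself is standard.

```python
import math
import random
from collections import defaultdict

ACTIONS = range(4)
ALPHA, GAMMA, C = 0.1, 0.95, 2.0

Q = defaultdict(lambda: 10.0)   # optimistic initialization (assumed form)
N = defaultdict(int)            # visit counts per (state, action)
T = defaultdict(int)            # total visits per state

def ucb_action(state):
    """UCB1: value estimate plus an exploration bonus that shrinks with visits."""
    T[state] += 1
    def score(a):
        n = N[(state, a)]
        if n == 0:
            return float("inf")              # try every action at least once
        return Q[(state, a)] + C * math.sqrt(math.log(T[state]) / n)
    return max(ACTIONS, key=score)

def step(state, action):
    """Placeholder dynamics standing in for the energy-dispatch MDP."""
    reward = -abs(action - state % 4) + random.gauss(0, 0.1)
    return reward, (state + action) % 8

state = 0
for _ in range(10000):
    a = ucb_action(state)
    r, nxt = step(state, a)
    N[(state, a)] += 1
    target = r + GAMMA * max(Q[(nxt, b)] for b in ACTIONS)
    Q[(state, a)] += ALPHA * (target - Q[(state, a)])
    state = nxt
```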
3.
The BDI model handles agent reasoning and decision-making well in specific environments, but lacks decision-making and learning ability in dynamic, uncertain environments. Reinforcement learning solves the agent's decision problem in unknown environments, but lacks the rule descriptions and logical reasoning of the BDI model. For BDI strategy planning in unknown and dynamic environments, this paper proposes a method that uses the Q-learning algorithm to give a BDI agent learning and planning abilities, and improves the decision mechanism of ASL, a BDI implementation model. Finally, a maze simulation is built on Jason, the ASL simulation platform. The experiments show that, in the new ASL system augmented with the Q-learning mechanism, the agent can still complete its task in an uncertain environment.
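A minimal sketch of the maze Q-learning component (the layer the paper grafts onto ASL plan selection in Jason) might look like the following; the grid layout, rewards, and hyperparameters are invented for illustration.

```python
import random

GRID_W, GRID_H = 5, 5
WALLS = {(1, 1), (2, 3), (3, 1)}            # assumed maze layout
GOAL = (4, 4)
MOVES = {"up": (0, -1), "down": (0, 1), "left": (-1, 0), "right": (1, 0)}
ALPHA, GAMMA, EPS = 0.2, 0.9, 0.15

Q = {((x, y), m): 0.0 for x in range(GRID_W) for y in range(GRID_H) for m in MOVES}

def step(pos, move):
    """Move if the target cell is inside the grid and not a wall."""
    dx, dy = MOVES[move]
    nxt = (pos[0] + dx, pos[1] + dy)
    if not (0 <= nxt[0] < GRID_W and 0 <= nxt[1] < GRID_H) or nxt in WALLS:
        return pos, -1.0        # bumped a wall: stay, small penalty
    return nxt, (10.0 if nxt == GOAL else -0.1)

for episode in range(2000):
    pos = (0, 0)
    while pos != GOAL:
        move = (random.choice(list(MOVES)) if random.random() < EPS
                else max(MOVES, key=lambda m: Q[(pos, m)]))
        nxt, r = step(pos, move)
        Q[(pos, move)] += ALPHA * (r + GAMMA * max(Q[(nxt, m)] for m in MOVES)
                                   - Q[(pos, move)])
        pos = nxt
```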
4.
In this paper, a data-based scheme is proposed to solve the optimal tracking problem of autonomous nonlinear switching systems. The system state is forced to track the reference signal by minimizing a performance function. First, the problem is transformed into solving the corresponding Bellman optimality equation in terms of the Q-function (also known as the action-value function). Then, an iterative algorithm based on adaptive dynamic programming (ADP) is developed to find the optimal solution entirely from sampled data. A linear-in-parameter (LIP) neural network serves as the value-function approximator. Accounting for the approximation error at each iteration step, the generated sequence of approximate value functions is proved to remain bounded around the exact optimal solution under some verifiable assumptions. Moreover, the effect of terminating the learning process after a finite number of iterations is investigated. A sufficient condition for asymptotic stability of the tracking error is derived. Finally, the effectiveness of the algorithm is demonstrated with three simulation examples.
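As a data-based flavor of the iterative Q-function algorithm, the sketch below runs fitted Q-iteration with a linear-in-parameter basis on a toy switched scalar system, where the "action" selects the active mode; the modes, reference, and feature vector are assumptions, not the paper's system or approximator.

```python
import numpy as np

A = {0: 0.9, 1: 1.1}       # two autonomous modes; the action picks the mode
REF, GAMMA = 1.0, 0.95     # constant reference to track, discount factor

def features(x, a):
    """Linear-in-parameter basis over the tracking error and the mode."""
    e = x - REF
    return np.array([e * e, e * e * a, a, 1.0])

rng = np.random.default_rng(0)
X = rng.uniform(-2, 3, 500)                       # sampled states (offline data)
data = [(x, a, A[a] * x) for x in X for a in (0, 1)]

w = np.zeros(4)
for _ in range(50):                               # value-iteration sweeps
    Phi, y = [], []
    for x, a, xn in data:
        cost = (x - REF) ** 2                     # stage cost to minimize
        q_next = min(features(xn, b) @ w for b in (0, 1))
        Phi.append(features(x, a))
        y.append(cost + GAMMA * q_next)
    w, *_ = np.linalg.lstsq(np.array(Phi), np.array(y), rcond=None)

print("greedy mode at x=2:", min((0, 1), key=lambda b: features(2.0, b) @ w))
```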
5.
Multi-agent cooperative pursuit is a typical problem in research on multi-agent coordination and cooperation. For the pursuit of a single evader with learning ability, a multi-agent cooperative pursuit algorithm based on game theory and Q-learning is proposed. First, a cooperative pursuit team is formed and a game model of cooperative pursuit is built. Second, by learning the evader's strategy choices, the evader's motion trajectory with finite Step-T cumulative reward is constructed and incorporated into the pursuers' strategy sets. Finally, solving the cooperative pursuit game yields a Nash equilibrium, and each agent executes its equilibrium strategy to complete the pursuit. Because the solution may contain multiple equilibria, a fictitious-play action-selection algorithm is added to select the optimal equilibrium strategy. C# simulation experiments show that the algorithm effectively solves the pursuit of a single learning evader in an obstacle environment, and comparative analysis of the experimental data shows that its pursuit efficiency under the same conditions is better than that of purely game-theoretic or purely learning-based pursuit algorithms.
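For the equilibrium-selection step, a fictitious-play loop (each player best-responding to the empirical frequency of the opponent's past actions) can be sketched on a small matrix game; the payoff matrices here are invented, and the actual algorithm applies this over the pursuit game's strategy sets.

```python
import numpy as np

A = np.array([[3.0, 0.0], [1.0, 2.0]])   # pursuer-1 payoffs (assumed)
B = np.array([[3.0, 1.0], [0.0, 2.0]])   # pursuer-2 payoffs (assumed)

counts1, counts2 = np.ones(2), np.ones(2)  # action-frequency beliefs
for t in range(5000):
    p2 = counts2 / counts2.sum()           # belief about player 2
    a1 = int(np.argmax(A @ p2))            # best response of player 1
    p1 = counts1 / counts1.sum()
    a2 = int(np.argmax(B.T @ p1))          # best response of player 2
    counts1[a1] += 1
    counts2[a2] += 1

# Empirical frequencies concentrate on one of the pure equilibria.
print("empirical strategies:", counts1 / counts1.sum(), counts2 / counts2.sum())
```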
6.
A self-learning energy management strategy is proposed for a plug-in hybrid electric bus by combining the Q-learning (QL) and Pontryagin's minimum principle algorithms. Unlike existing strategies, the proposed one focuses on expert experience and generalization performance. The expert experience is designed as approximately optimal reference state-of-charge (SOC) trajectories, and generalization is enhanced by a multiple-driving-cycle training method. Specifically, an efficient SOC zone is first designed based on the approximately optimal reference SOC trajectories. Then, the QL agent is trained off-line, taking the expert experience as the reference SOC trajectories. Finally, an adaptive strategy is built on the well-trained agent. Notably, two different reward functions are defined: the reward in off-line training mainly measures the tracking error between the expert experience and the SOC, while the reward in the adaptive strategy mainly applies a punishment term. Simulation results show that the proposed strategy generalizes well and improves fuel economy by 22.49% compared with a charge-depleting-charge-sustaining (CDCS) strategy.
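The two reward functions are concrete enough to sketch directly. In the placeholder code below, the efficient SOC zone bounds, weights, and fuel term are assumptions; only the structure (tracking error off-line, zone penalty online) follows the abstract.

```python
SOC_LOW, SOC_HIGH = 0.35, 0.65          # assumed efficient zone of SOC

def reward_offline(soc, soc_ref, fuel_rate, w_track=10.0, w_fuel=1.0):
    """Off-line training: mainly track the expert reference SOC trajectory."""
    return -w_track * (soc - soc_ref) ** 2 - w_fuel * fuel_rate

def reward_adaptive(soc, fuel_rate, w_pen=20.0, w_fuel=1.0):
    """Adaptive strategy: mainly punish leaving the efficient SOC zone."""
    violation = max(0.0, SOC_LOW - soc, soc - SOC_HIGH)
    return -w_pen * violation - w_fuel * fuel_rate

print(reward_offline(0.50, 0.52, fuel_rate=0.8))   # small tracking penalty
print(reward_adaptive(0.30, fuel_rate=0.8))        # zone-violation penalty
```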
7.
Suitable rescue-path selection is very important for saving lives and reducing disaster losses, and it has been a key issue in disaster response management. In this paper, we present a path-selection algorithm based on Q-learning for disaster response applications. We model a rescue team as an agent operating in a dynamic and dangerous environment that must find a safe, short path in the least time. We first propose a path-selection model for disaster response management and show that path selection under this model is a Markov decision process. Then, we introduce Q-learning and design strategies for action selection and for avoiding cyclic paths. Finally, experimental results show that our algorithm finds a safe and short path in a dynamic, dangerous environment, providing a concrete and useful reference for practical disaster response management.
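A minimal version of such a path learner, with one possible cycle-avoidance rule (never revisit a cell within an episode), is sketched below; the grid, danger cells, and rewards are invented stand-ins for the paper's disaster environment.

```python
import random

W, H = 6, 6
DANGER = {(2, 2), (3, 4), (4, 1)}       # hazardous cells with heavy penalty
GOAL = (5, 5)
MOVES = [(0, 1), (0, -1), (1, 0), (-1, 0)]
ALPHA, GAMMA, EPS = 0.2, 0.95, 0.1
Q = {}

def legal(pos, visited):
    """Neighbors inside the grid that were not visited this episode."""
    out = []
    for dx, dy in MOVES:
        nxt = (pos[0] + dx, pos[1] + dy)
        if 0 <= nxt[0] < W and 0 <= nxt[1] < H and nxt not in visited:
            out.append(nxt)
    return out

for episode in range(3000):
    pos, visited = (0, 0), {(0, 0)}
    while pos != GOAL:
        options = legal(pos, visited)
        if not options:
            break                                    # dead end: restart episode
        nxt = (random.choice(options) if random.random() < EPS
               else max(options, key=lambda n: Q.get((pos, n), 0.0)))
        r = 100.0 if nxt == GOAL else (-50.0 if nxt in DANGER else -1.0)
        future = max((Q.get((nxt, n), 0.0) for n in legal(nxt, visited | {nxt})),
                     default=0.0)
        Q[(pos, nxt)] = Q.get((pos, nxt), 0.0) + ALPHA * (
            r + GAMMA * future - Q.get((pos, nxt), 0.0))
        visited.add(nxt)
        pos = nxt
```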
8.
To improve nodes' learning strategies and raise their application performance, a task model is built for data collection, and a sensor-node task-scheduling algorithm based on Q-learning and planning is proposed, defining basic elements such as the state space, delayed reward, and exploration-exploitation policy. According to the characteristics of wireless sensor networks (WSNs), a planning process based on a priority mechanism and an expiration mechanism is established so that nodes can use empirical knowledge effectively and improve their learning strategies. Experiments show that the algorithm can schedule tasks dynamically according to the current WSN environment; compared with other task-scheduling algorithms, it consumes energy reasonably and achieves better application performance.
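The "learning plus planning" structure resembles Dyna-Q; below is a sketch adding the two mechanisms the abstract names, a priority queue over TD errors and an expiration time on stored experience. The task set, toy dynamics, and TTL are placeholders, not the paper's WSN model.

```python
import heapq
import random
import time

TASKS = ["sense", "transmit", "sleep"]
ALPHA, GAMMA, EPS, TTL, PLAN_STEPS = 0.1, 0.9, 0.1, 60.0, 10

Q = {(s, a): 0.0 for s in range(4) for a in TASKS}
model = {}                      # (s, a) -> (reward, next_state, timestamp)
pqueue = []                     # max-priority via negated TD error

def env_step(s, a):
    """Toy node dynamics: reward trades data yield against energy cost."""
    r = {"sense": 1.0, "transmit": 2.0, "sleep": 0.2}[a] - 0.1 * s
    return r + random.gauss(0, 0.05), (s + 1) % 4

s = 0
for t in range(5000):
    a = (random.choice(TASKS) if random.random() < EPS
         else max(TASKS, key=lambda x: Q[(s, x)]))
    r, s2 = env_step(s, a)
    err = abs(r + GAMMA * max(Q[(s2, b)] for b in TASKS) - Q[(s, a)])
    Q[(s, a)] += ALPHA * (r + GAMMA * max(Q[(s2, b)] for b in TASKS) - Q[(s, a)])
    model[(s, a)] = (r, s2, time.time())
    heapq.heappush(pqueue, (-err, (s, a)))          # priority mechanism
    for _ in range(PLAN_STEPS):                     # planning from the model
        if not pqueue:
            break
        _, (ps, pa) = heapq.heappop(pqueue)
        pr, ps2, stamp = model.get((ps, pa), (None, None, None))
        if pr is None or time.time() - stamp > TTL:  # expiration mechanism
            continue
        Q[(ps, pa)] += ALPHA * (pr + GAMMA * max(Q[(ps2, b)] for b in TASKS)
                                - Q[(ps, pa)])
    s = s2
```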
9.
In this paper, a learning mechanism is proposed for designing the reactive fuzzy controller of a mobile robot navigating in unknown environments. The fuzzy logic controller is constructed from the kinematic model of a real robot. The approach to learning the fuzzy rule base with relatively simple, computationally light Q-learning is described in detail. After analyzing the credit-assignment problem caused by rule collisions, a remedy is presented. Furthermore, time-varying parameters are used to increase the learning speed. Simulation results show that the mechanism can learn fuzzy navigation rules successfully using only a scalar reinforcement signal, and the learned rule base proves correct and feasible on real robot platforms.
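One common way to wire Q-learning into a fuzzy rule base is sketched below, in the spirit of fuzzy Q-learning: each rule keeps Q-values over candidate consequents, the control output blends the chosen consequents by firing strength, and the TD error is credited to rules in proportion to firing strength (one plausible reading of the collision remedy). The memberships, consequents, and toy plant are invented.

```python
import random

CONSEQUENTS = [-1.0, 0.0, 1.0]          # candidate steering commands per rule
RULES = 3                               # e.g. obstacle left / front / right
ALPHA, GAMMA, EPS = 0.1, 0.9, 0.1

Q = [[0.0] * len(CONSEQUENTS) for _ in range(RULES)]

def firing_strengths(sensor):
    """Toy triangular memberships over a scalar sensor reading in [0, 1]."""
    return [max(0.0, 1 - abs(sensor - c) * 3) for c in (0.0, 0.5, 1.0)]

def control(sensor):
    """Pick a consequent per rule (epsilon-greedy), blend by firing strength."""
    w = firing_strengths(sensor)
    picks = [random.randrange(len(CONSEQUENTS)) if random.random() < EPS
             else max(range(len(CONSEQUENTS)), key=lambda j: Q[i][j])
             for i in range(RULES)]
    total = sum(w) or 1.0
    u = sum(w[i] * CONSEQUENTS[picks[i]] for i in range(RULES)) / total
    return u, w, picks

sensor = 0.4
for step in range(2000):
    u, w, picks = control(sensor)
    reward = -abs(u - (1 - sensor))          # toy target: steer away from obstacle
    nxt = min(1.0, max(0.0, sensor + 0.1 * u + random.gauss(0, 0.02)))
    v_next = sum(max(Q[i]) for i in range(RULES)) / RULES
    total = sum(w) or 1.0
    q_now = sum(w[i] * Q[i][picks[i]] for i in range(RULES)) / total
    delta = reward + GAMMA * v_next - q_now
    for i in range(RULES):                   # credit shared by firing strength
        Q[i][picks[i]] += ALPHA * delta * w[i] / total
    sensor = nxt
```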
10.
A multi-agent cooperative learning algorithm based on quantum computing   (cited 1 time: 0 self-citations, 1 by others)
To address the curse of dimensionality in the actions and states of multi-agent cooperative reinforcement learning, as well as the problems that action selection involves multiple equilibria, so that converging to the best equilibrium requires searching the strategy space and coordinating strategy selection, a novel multi-agent cooperative learning algorithm based on quantum theory is proposed. Drawing on quantum computing theory, the new algorithm represents the agents' action and state spaces by quantum superposition states, uses quantum entangled states to coordinate strategy selection, represents action-selection probabilities by probability amplitudes, and accelerates multi-agent learning with a quantum search algorithm. Simulation results demonstrate the effectiveness of the new algorithm.
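A purely classical simulation of the amplitude idea can be sketched as follows: the selection probability of an action equals its squared amplitude, and rewarded actions are amplified and then renormalized, loosely echoing Grover-style amplification. Everything here, the task, the update rule, and the constants, is an illustrative assumption.

```python
import math
import random

ACTIONS = 4
K = 0.05                                  # amplification step (assumed)

amps = {}                                 # state -> list of amplitudes

def amplitudes(state):
    if state not in amps:
        amps[state] = [1.0 / math.sqrt(ACTIONS)] * ACTIONS   # uniform superposition
    return amps[state]

def select(state):
    """Measure: sample an action with probability |amplitude|^2."""
    probs = [a * a for a in amplitudes(state)]
    return random.choices(range(ACTIONS), weights=probs)[0]

def amplify(state, action, reward):
    """Shift amplitude toward (or away from) the chosen action, renormalize."""
    a = amplitudes(state)
    a[action] += K * reward
    a[action] = max(a[action], 1e-3)                 # keep some exploration
    norm = math.sqrt(sum(x * x for x in a))
    amps[state] = [x / norm for x in a]

state = 0
for t in range(3000):
    act = select(state)
    reward = 1.0 if act == state % ACTIONS else -0.2   # toy cooperative task
    amplify(state, act, reward)
    state = (state + 1) % 3

print([[round(x, 2) for x in amplitudes(s)] for s in range(3)])
```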