Full-text access type
Paid full text | 168 articles |
Free full text | 33 articles |
Subject classification
Electrical engineering | 1 article |
General | 1 article |
Metalworking | 2 articles |
Machinery and instrumentation | 3 articles |
Energy and power | 2 articles |
Radio electronics | 10 articles |
General industrial technology | 4 articles |
Automation technology | 178 articles |
Publication year
2023 | 2 articles |
2022 | 10 articles |
2021 | 4 articles |
2020 | 5 articles |
2019 | 1 article |
2018 | 1 article |
2017 | 2 articles |
2016 | 2 articles |
2015 | 1 article |
2014 | 17 articles |
2013 | 10 articles |
2012 | 4 articles |
2011 | 6 articles |
2009 | 15 articles |
2008 | 20 articles |
2007 | 13 articles |
2006 | 17 articles |
2005 | 13 articles |
2004 | 16 articles |
2003 | 4 articles |
2002 | 5 articles |
2001 | 3 articles |
2000 | 1 article |
1999 | 2 articles |
1998 | 3 articles |
1997 | 5 articles |
1996 | 6 articles |
1995 | 4 articles |
1994 | 3 articles |
1992 | 4 articles |
1991 | 2 articles |
Search results: 201 articles in total
1.
Mechatronics, 2014, 24(8): 1001-1007
Passivity-based control (PBC) is commonly used for the stabilization of port-Hamiltonian (PH) systems. The PH framework is suitable for multi-domain systems, such as mechatronic devices or micro-electro-mechanical systems. Passivity-based control synthesis for PH systems involves solving partial differential equations, which can be cumbersome. Rather than explicitly solving these equations, in our approach the control law is parameterized and the unknown parameter vector is learned using an actor–critic reinforcement learning algorithm. The key advantages of combining learning with PBC are: (i) the complexity of the control design procedure is reduced, (ii) prior knowledge about the system, given in the form of a PH model, speeds up the learning process, and (iii) physical meaning can be attributed to the learned control law. In this paper we extend the learning-based PBC method to a regulation problem and present experimental results for a two-degree-of-freedom manipulator. We show that the learning algorithm is capable of achieving feedback regulation in the presence of model uncertainties.
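As a rough, self-contained illustration of the idea of learning the parameters of a stabilizing control law with an actor–critic loop (this is my own toy construction, not the paper's port-Hamiltonian controller), the sketch below tunes the stiffness and damping gains of a PD-like regulator on a double-integrator plant; the dynamics, features, and step sizes are all illustrative assumptions.

```python
# Toy actor-critic tuning of a parameterized stabilizing control law.
# Not the paper's PBC controller: plant, features and step sizes are illustrative.
import numpy as np

rng = np.random.default_rng(0)
dt, gamma = 0.01, 0.98
theta = np.zeros(2)          # actor parameters: stiffness and damping gains
w = np.zeros(2)              # critic weights over simple quadratic features
alpha_a, alpha_c = 1e-3, 1e-2

def features(x):             # x = [position error, velocity]
    return np.array([x[0]**2, x[1]**2])

def control(x, theta, noise):
    # u = -k_p * q_err - k_d * qdot  (energy shaping + damping injection, in spirit)
    return -(theta[0] + noise[0]) * x[0] - (theta[1] + noise[1]) * x[1]

x = np.array([1.0, 0.0])
for step in range(20000):
    noise = 0.1 * rng.standard_normal(2)          # parameter-space exploration
    u = control(x, theta, noise)
    x_new = x + dt * np.array([x[1], u])          # double-integrator plant: q'' = u
    r = -(x_new[0]**2 + 0.1 * x_new[1]**2)        # regulation cost
    delta = r + gamma * w @ features(x_new) - w @ features(x)   # TD error
    w += alpha_c * delta * features(x)            # critic update
    theta += alpha_a * delta * noise              # move actor toward helpful perturbations
    x = x_new if abs(x_new[0]) < 10 else np.array([1.0, 0.0])   # reset if diverged
```

The critic here is a linear TD(0) value estimate, and the actor shifts its parameters toward exploration perturbations that produced positive TD errors.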
2.
Expert Systems with Applications, 2014, 41(9): 4073-4082
In this paper, an intelligent agent (using the Fuzzy SARSA learning approach) is proposed to negotiate bilateral contracts (BC) of electrical energy in Block Forward Markets (BFM or similar market environments). In the BFM energy markets, the buyers (or loads) and the sellers (or generators) submit their bids and offers on a daily basis. The loads and generators could employ intelligent software agents to trade energy in BC markets on their behalf. Since each agent attempts to choose the best bid/offer in the market, conflicts of interest may arise. In this work, the trading of energy in BC markets is modeled and solved using Game Theory and Reinforcement Learning (RL) approaches. The Stackelberg equilibrium concept is used for the matchmaking between load and generator agents. Then, to cope with the limited negotiation time (each generator–load pair is assumed to have only a limited time to negotiate and reach an agreement), a Fuzzy SARSA Learning (FSL) method is used. The fuzzy feature of FSL helps the agent cope with the continuous characteristics of the environment and also shields it from the curse of dimensionality. The performance of FSL (compared to other well-known traditional negotiation techniques, such as time-dependent and imitative techniques) is illustrated through simulation studies. The case study simulation results show that the FSL-based agent achieves higher profits than agents using the other reviewed techniques in the BC energy market.
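A minimal sketch of the fuzzy-SARSA idea referenced above, assuming Gaussian fuzzy sets over a one-dimensional normalized market state and one q-value per (rule, action) pair; the environment, membership functions, and reward below are invented placeholders rather than the paper's BC-market model.

```python
# Fuzzy SARSA sketch: fuzzy rule activations act as features, each rule holds
# one q-value per action, and the SARSA update is spread over the active rules.
import numpy as np

n_rules, n_actions = 5, 3
q = np.zeros((n_rules, n_actions))
alpha, gamma, eps = 0.1, 0.95, 0.1
centers = np.linspace(0.0, 1.0, n_rules)
rng = np.random.default_rng(1)

def memberships(s):                       # Gaussian fuzzy sets over s in [0, 1]
    phi = np.exp(-((s - centers) / 0.15) ** 2)
    return phi / phi.sum()

def act(s):                               # epsilon-greedy over rule-weighted q-values
    phi = memberships(s)
    if rng.random() < eps:
        return int(rng.integers(n_actions)), phi
    return int(np.argmax(phi @ q)), phi

s = 0.5
a, phi = act(s)
for t in range(10000):
    # toy environment: actions nudge a normalized price; reward favors 0.7
    s2 = float(np.clip(s + 0.05 * (a - 1) + 0.01 * rng.standard_normal(), 0.0, 1.0))
    r = -abs(s2 - 0.7)
    a2, phi2 = act(s2)
    target = r + gamma * (phi2 @ q)[a2]                 # SARSA target (on-policy)
    q[:, a] += alpha * (target - (phi @ q)[a]) * phi    # update spread over rules
    s, a, phi = s2, a2, phi2
```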
3.
The ability to analyze the effectiveness of agent reward structures is critical to the successful design of multiagent learning algorithms. Though final system performance is the best indicator of the suitability of a given reward structure, it is often preferable to analyze the reward properties that lead to good system behavior (i.e., properties promoting coordination among the agents and providing agents with strong signal-to-noise ratios). This step is particularly helpful in continuous, dynamic, stochastic domains ill-suited to the simple table backup schemes commonly used in TD(λ)/Q-learning, where the effectiveness of the reward structure is difficult to distinguish from the effectiveness of the chosen learning algorithm. In this paper, we present a new reward evaluation method that provides a visualization of the tradeoff between the level of coordination among the agents and the difficulty of the learning problem each agent faces. This method is independent of the learning algorithm and is only a function of the problem domain and the agents' reward structure. We use this reward property visualization method to determine an effective reward without performing extensive simulations. We then test this method in both a static and a dynamic multi-rover learning domain where the agents have continuous state spaces and take noisy actions (e.g., the agents' movement decisions are not always carried out properly). Our results show that in the more difficult dynamic domain, the reward efficiency visualization method provides a two-orders-of-magnitude speedup in selecting good rewards compared to running a full simulation. In addition, this method facilitates the design and analysis of new rewards tailored to the observational limitations of the domain, providing rewards that combine the best properties of traditional rewards.
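The following sketch illustrates, under my own simplified definitions (not the paper's exact coordination and signal-to-noise measures), how candidate agent rewards can be screened by sampling the domain alone: random joint actions are drawn, one agent's action is perturbed, and each candidate reward is scored by how often it agrees in sign with the system reward and by how strongly it responds to that agent's own action versus the other agents' actions.

```python
# Screening candidate agent rewards without running a learning experiment.
# The system objective and candidate rewards are illustrative stand-ins.
import numpy as np

rng = np.random.default_rng(2)
n_agents, n_samples = 5, 5000

def system_reward(actions):                        # global (team) objective
    return -np.sum((actions - 0.5) ** 2) - (np.sum(actions) - 2.5) ** 2

candidates = {
    "team reward":  lambda acts, i: system_reward(acts),
    "local reward": lambda acts, i: -(acts[i] - 0.5) ** 2,
}

for name, reward in candidates.items():
    align, own, other = 0, 0.0, 0.0
    for _ in range(n_samples):
        a = rng.random(n_agents)
        b, c = a.copy(), a.copy()
        b[0] = rng.random()                        # agent 0 changes its action
        c[1:] = rng.random(n_agents - 1)           # everyone else changes instead
        dg = system_reward(b) - system_reward(a)
        dl = reward(b, 0) - reward(a, 0)
        align += int(dg * dl > 0)                  # coordination proxy
        own += abs(dl)                             # sensitivity to own action
        other += abs(reward(c, 0) - reward(a, 0))  # sensitivity to others' actions
    print(name, "alignment:", round(align / n_samples, 3),
          "signal/noise:", round(own / (other + 1e-9), 3))
```

The team reward scores high on alignment but low on signal-to-noise, while the purely local reward shows the opposite tradeoff, which is the kind of comparison the visualization method is meant to expose.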
4.
Pessimistic cost-sensitive active learning of decision trees for profit maximizing targeting campaigns
In business applications such as direct marketing, decision-makers are required to choose the action that best maximizes a utility function. Cost-sensitive learning methods can help them achieve this goal. In this paper, we introduce Pessimistic Active Learning (PAL). PAL employs a novel pessimistic measure, which relies on confidence intervals and is used to balance the exploration/exploitation trade-off. In order to acquire an initial sample of labeled data, PAL applies orthogonal arrays of fractional factorial design. PAL was tested on ten datasets using a decision tree inducer. A comparison of these results to those of other methods indicates PAL's superiority.
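As a hedged illustration of the general "pessimistic" idea (confidence intervals used to discount optimistic estimates), the snippet below scores marketing segments by a lower confidence bound on their response rate before committing budget; the interval formula, costs, and data are illustrative and are not PAL's actual measure or experimental setup.

```python
# Pessimistic (lower-confidence-bound) profit scoring of candidate segments.
# Illustrative only: not PAL's actual pessimistic measure.
import math

def pessimistic_rate(successes, trials, z=1.96):
    if trials == 0:
        return 0.0
    p = successes / trials
    # normal-approximation lower bound on the response rate
    return max(0.0, p - z * math.sqrt(p * (1.0 - p) / trials))

def pessimistic_profit(successes, trials, revenue_per_response, cost_per_contact):
    return pessimistic_rate(successes, trials) * revenue_per_response - cost_per_contact

segments = {"A": (12, 100), "B": (4, 20), "C": (1, 50)}   # (responses, contacts)
for name, (s, n) in segments.items():
    profit = pessimistic_profit(s, n, revenue_per_response=50.0, cost_per_contact=2.0)
    print(name, "target" if profit > 0 else "skip", round(profit, 2))
```

Segment B has the highest raw response rate but is skipped because its small sample makes the pessimistic estimate unprofitable, which is exactly the behavior a confidence-interval-based measure is meant to produce.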
5.
Several researchers have recently investigated the connection between reinforcement learning and classification. We are motivated by proposals of approximate policy iteration schemes without value functions, which focus on policy representation using classifiers and address policy learning as a supervised learning problem. This paper proposes variants of an improved policy iteration scheme that treats the core sampling problem of evaluating a policy through simulation as a multi-armed bandit machine. The resulting algorithm offers performance comparable to the previous algorithm, but with significantly less computational effort. An order-of-magnitude improvement is demonstrated experimentally in two standard reinforcement learning domains: inverted pendulum and mountain car.
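A small sketch of the bandit view of policy evaluation mentioned above: instead of spreading the rollout budget evenly over candidate actions at a sampled state, rollouts are allocated with UCB1. The rollout() stub and its value table are placeholders for simulating the current policy after each action; this illustrates the sampling scheme, not the authors' implementation.

```python
# Allocating simulation rollouts over candidate actions with UCB1.
import math
import random

random.seed(0)

def rollout(action):
    # placeholder noisy return; in the real scheme this simulates the current
    # policy from the state reached after applying `action`
    true_value = {0: 1.0, 1: 0.6, 2: 0.2}[action]
    return true_value + random.gauss(0.0, 0.5)

def best_action_ucb(actions, budget):
    counts = {a: 1 for a in actions}
    sums = {a: rollout(a) for a in actions}       # play each arm once
    for t in range(len(actions), budget):
        ucb = {a: sums[a] / counts[a] + math.sqrt(2.0 * math.log(t) / counts[a])
               for a in actions}
        a = max(ucb, key=ucb.get)                 # pick the most promising action
        sums[a] += rollout(a)
        counts[a] += 1
    return max(actions, key=lambda a: sums[a] / counts[a])

print(best_action_ucb([0, 1, 2], budget=200))     # label for the sampled state
```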
6.
Cheng-Jian Lin, Yong-Cheng Liu, Chi-Yung Lee. Journal of Intelligent and Robotic Systems, 2008, 52(2): 285-312
This study presents a wavelet-based neuro-fuzzy network (WNFN). The proposed WNFN model combines the traditional Takagi–Sugeno–Kang (TSK) fuzzy model and wavelet neural networks (WNN). The study adopts non-orthogonal, compactly supported functions as the wavelet neural network bases. A novel supervised evolutionary learning scheme, called WNFN-S, is proposed to tune the adjustable parameters of the WNFN model. The proposed WNFN-S learning scheme is based on dynamic symbiotic evolution (DSE), which uses the sequential-search-based dynamic evolutionary (SSDE) method. In some real-world applications, exact training data may be expensive or even impossible to obtain. To address this problem, a reinforcement evolutionary learning scheme, called WNFN-R, is proposed. Computer simulations have been conducted to illustrate the performance and applicability of the proposed WNFN-S and WNFN-R learning algorithms.
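A minimal sketch of what a single forward pass through such a network might look like, assuming Gaussian rule memberships and Mexican-hat wavelet consequents; the layer structure and parameter names are my own simplification of the WNFN idea, not the authors' architecture.

```python
# One forward pass of a simplified TSK-style fuzzy system whose rule
# consequents are translated/dilated Mexican-hat wavelets (illustrative only).
import numpy as np

def mexican_hat(z):
    return (1.0 - z**2) * np.exp(-0.5 * z**2)

def wnfn_output(x, centers, widths, w_shift, w_scale, weights):
    # rule firing strengths from Gaussian membership functions (normalized)
    phi = np.exp(-((x - centers) / widths) ** 2)
    phi = phi / phi.sum()
    # wavelet consequents: one translated/dilated wavelet per rule
    psi = mexican_hat((x - w_shift) / w_scale)
    return float(np.sum(phi * weights * psi))

y = wnfn_output(0.3,
                centers=np.array([0.0, 0.5, 1.0]),
                widths=np.array([0.3, 0.3, 0.3]),
                w_shift=np.array([0.0, 0.5, 1.0]),
                w_scale=np.array([0.5, 0.5, 0.5]),
                weights=np.array([1.0, -0.5, 0.8]))
print(y)
```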
7.
Reinforcement learning is introduced into the study of collision avoidance for robot manipulators, and a multi-agent collision-avoidance system is built for a planar three-degree-of-freedom manipulator. The system combines information about the nearest obstacle with the deviation angle to generate control commands. A reinforcement learning method based on K-means clustering is adopted as the basic control strategy, and the concrete implementation of the system's algorithm is given. Simulation experiments demonstrate the feasibility and effectiveness of the clustering-based reinforcement learning method for manipulator collision avoidance.
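The sketch below (requiring scikit-learn) illustrates the general recipe of clustering a continuous state space with K-means and running tabular Q-learning over the cluster indices. The two state variables mirror the nearest-obstacle distance and deviation angle mentioned in the abstract, but the dynamics, reward, and all parameters are invented for illustration.

```python
# K-means state aggregation + tabular Q-learning over cluster indices.
import numpy as np
from sklearn.cluster import KMeans

rng = np.random.default_rng(6)
# offline sample of (obstacle distance, deviation angle) pairs for clustering
samples = np.column_stack([rng.uniform(0, 1, 2000), rng.uniform(-np.pi, np.pi, 2000)])
km = KMeans(n_clusters=20, n_init=10, random_state=0).fit(samples)

n_actions = 3                     # e.g. turn left, go straight, turn right
q = np.zeros((20, n_actions))
alpha, gamma, eps = 0.1, 0.9, 0.1

def step(state, action):
    # toy dynamics: actions change the deviation angle; distance drifts randomly
    d, ang = state
    ang = ang + (action - 1) * 0.2
    d = float(np.clip(d + 0.05 * rng.standard_normal(), 0.0, 1.0))
    reward = -abs(ang) - (5.0 if d < 0.1 else 0.0)   # stay on course, avoid collisions
    return (d, ang), reward

state = (0.5, 0.0)
for t in range(20000):
    s_idx = int(km.predict(np.array([state]))[0])
    a = int(rng.integers(n_actions)) if rng.random() < eps else int(np.argmax(q[s_idx]))
    state2, r = step(state, a)
    s2_idx = int(km.predict(np.array([state2]))[0])
    q[s_idx, a] += alpha * (r + gamma * q[s2_idx].max() - q[s_idx, a])
    state = state2
```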
8.
This article proposes several two-timescale simulation-based actor-critic algorithms for the solution of infinite-horizon Markov Decision Processes with finite state space under the average cost criterion. Two of the algorithms are for the compact (non-discrete) action setting, while the rest are for finite action spaces. On the slower timescale, all the algorithms perform a gradient search over the corresponding policy spaces using two different Simultaneous Perturbation Stochastic Approximation (SPSA) gradient estimates. On the faster timescale, the differential cost function corresponding to a given stationary policy is updated, and an additional averaging is performed for enhanced performance. A proof of convergence to a locally optimal policy is presented. Next, we discuss a memory-efficient implementation that uses a feature-based representation of the state space and performs TD(0) learning along the faster timescale. The TD(0) algorithm does not follow on-line sampling of states but is observed to do well in our setting. Numerical experiments on a problem of rate-based flow control are presented using the proposed algorithms. We consider here the model of a single bottleneck node in the continuous-time queueing framework. We show performance comparisons of our algorithms with the two-timescale actor-critic algorithms of Konda and Borkar (1999) and Bhatnagar and Kumar (2004). Our algorithms exhibit more than an order of magnitude better performance over those of Konda and Borkar (1999).
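As a hedged illustration of the slower-timescale SPSA gradient search described above, the snippet below estimates all coordinates of the gradient of a (noisy) average-cost function from just two perturbed evaluations per iteration; the cost function is a stand-in for the simulated queueing model, and the step-size schedules are illustrative choices.

```python
# Two-sided SPSA gradient descent on a noisy average-cost surrogate.
import numpy as np

rng = np.random.default_rng(5)

def average_cost(theta):
    # placeholder for a simulation-based estimate of the average cost J(theta)
    return np.sum((theta - 0.3) ** 2) + 0.01 * rng.standard_normal()

theta = np.zeros(4)
for k in range(1, 2001):
    a_k = 0.1 / k            # slower-timescale step size
    c_k = 0.1 / k ** 0.25    # perturbation size
    delta = rng.choice([-1.0, 1.0], size=theta.shape)    # Rademacher perturbation
    j_plus = average_cost(theta + c_k * delta)
    j_minus = average_cost(theta - c_k * delta)
    grad_est = (j_plus - j_minus) / (2.0 * c_k * delta)  # all coordinates from 2 probes
    theta -= a_k * grad_est
print(theta)   # should approach the minimizer (0.3, 0.3, 0.3, 0.3)
```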
9.
Multiagent learning involves the acquisition of cooperative behavior among intelligent agents in order to satisfy joint goals. Reinforcement Learning (RL) is a promising unsupervised machine learning technique inspired by earlier studies of animal learning. In this paper, we propose a new RL technique called the Two-Level Reinforcement Learning with Communication (2LRL) method to provide cooperative action selection in a multiagent environment. In 2LRL, learning takes place at two hierarchical levels: at the first level the agents learn to select their target, and at the second level they select the action directed toward that target. The agents communicate their perception to their neighbors and use the communicated information in their decision-making. We applied the 2LRL method in a hunter-prey environment and observed satisfactory cooperative behavior.
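A toy sketch of the two-level selection scheme (communication between agents is omitted, and the environment is invented): a top-level Q-table picks a target and a lower-level Q-table picks the action directed at that target, with both tables updated from the same reward.

```python
# Two-level Q-learning: level 1 chooses a target, level 2 chooses the action
# toward that target. Environment and rewards are illustrative placeholders.
import numpy as np

rng = np.random.default_rng(3)
n_states, n_targets, n_actions = 16, 3, 4
q_target = np.zeros((n_states, n_targets))             # level 1: which target to pursue
q_action = np.zeros((n_states, n_targets, n_actions))  # level 2: how to move toward it
alpha, gamma, eps = 0.2, 0.9, 0.1

def choose(q_row):
    return int(rng.integers(len(q_row))) if rng.random() < eps else int(np.argmax(q_row))

s = 0
for step in range(20000):
    g = choose(q_target[s])                 # pick a target (level 1)
    a = choose(q_action[s, g])              # pick an action toward it (level 2)
    s2 = (s + a) % n_states                 # toy transition over a ring of states
    r = 1.0 if (g == 0 and s2 == 0) else 0.0
    q_target[s, g] += alpha * (r + gamma * q_target[s2].max() - q_target[s, g])
    q_action[s, g, a] += alpha * (r + gamma * q_action[s2, g].max() - q_action[s, g, a])
    s = s2
```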
Guray Erus received the B.S. degree in computer engineering in 1999, and the M.S. degree in cognitive sciences in 2002, from Middle East Technical University (METU), Ankara, Turkey. He is currently a teaching and research assistant at René Descartes University, Paris, France, where he is preparing a doctoral dissertation on object detection in satellite images as a member of the intelligent perception systems group (SIP-CRIP5). His research interests include multi-agent systems and image understanding.
Faruk Polat is a professor in the Department of Computer Engineering of Middle East Technical University, Ankara, Turkey. He received his B.Sc. in computer engineering from Middle East Technical University, Ankara, in 1987 and his M.S. and Ph.D. degrees in computer engineering from Bilkent University, Ankara, in 1989 and 1993, respectively. He conducted research as a visiting NATO science scholar at the Computer Science Department of the University of Minnesota, Minneapolis, in 1992–93. His research interests include artificial intelligence, multi-agent systems, and object-oriented data models.
10.
One of the difficulties encountered in the application of reinforcement learning methods to real-world problems is their limited ability to cope with large-scale or continuous spaces. To address the curse-of-dimensionality problem that results from discretizing continuous state or action spaces, a new fuzzy Actor-Critic reinforcement learning network (FACRLN) based on a fuzzy radial basis function (FRBF) neural network is proposed. The architecture of FACRLN is realized by a four-layer FRBF neural network that is used to approximate both the action value function of the Actor and the state value function of the Critic simultaneously. The Actor and Critic networks share the input, rule, and normalized layers of the FRBF network, which reduces the learning system's demand for storage space and avoids repeated computation of the rule-unit outputs. Moreover, the FRBF network is able to adjust its structure and parameters adaptively with a novel self-organizing approach, according to the complexity of the task and the progress of learning, which ensures an economical network size. Experimental studies on cart-pole balancing control illustrate the performance and applicability of the proposed FACRLN.
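The sketch below shows, in a much-simplified linear form, the key structural idea of sharing one radial-basis-function feature layer between the actor and the critic so that the rule activations are computed only once per step; the plant, features, and update rules are illustrative assumptions rather than the FACRLN architecture.

```python
# Actor and critic sharing one normalized RBF feature layer (illustrative sketch).
import numpy as np

rng = np.random.default_rng(4)
centers = np.linspace(-1.0, 1.0, 7)          # shared RBF/rule centers over the state
sigma = 0.3
w_critic = np.zeros(len(centers))            # state-value weights
w_actor = np.zeros(len(centers))             # mean-action weights
alpha_a, alpha_c, gamma, expl = 0.01, 0.1, 0.95, 0.2

def phi(s):                                  # shared, normalized RBF layer
    z = np.exp(-((s - centers) / sigma) ** 2)
    return z / z.sum()

s = 0.8
for step in range(20000):
    f = phi(s)                               # computed once, reused by both heads
    noise = expl * rng.standard_normal()
    a = float(np.clip(w_actor @ f + noise, -1.0, 1.0))
    s2 = float(np.clip(s + 0.1 * a, -1.0, 1.0))   # toy plant: drive the state toward 0
    r = -s2 ** 2
    delta = r + gamma * (w_critic @ phi(s2)) - (w_critic @ f)   # TD error
    w_critic += alpha_c * delta * f                             # critic update
    w_actor += alpha_a * delta * noise * f                      # actor update
    s = s2
```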