1.

Skillful control under uncertainty via direct reinforcement learning

Vijaykumar Gullapalli 《Robotics and Autonomous Systems》1995,15(4):237-246

Complexity and uncertainty in modern robots and other autonomous systems make it difficult to design controllers for such systems that can achieve desired levels of precision and robustness. Therefore, learning methods are being incorporated into controllers for such systems, thereby providing the adaptability necessary to meet the performance demands of the task. We argue that for learning tasks arising frequently in control applications, the most useful methods in practice probably are those we call direct associative reinforcement learning methods. We describe direct reinforcement learning methods and also illustrate with an example the utility of these methods for learning skilled robot control under uncertainty.

2.

Learning action probabilities from delayed reinforcement

S. I. AHSON R. SRINIVAS 《International journal of systems science》2013,44(12):2415-2421

A reinforcement scheme for learning automata, applicable to real situations where the reinforcement received from the environment is delayed, is presented. The scheme divides the state space into regions following the boxes approach of Michie and Chambers. Each region maintains estimates of the reward characteristics of the environment and contains a local automaton that updates action probabilities whenever the system state enters it. Estimates of reward characteristics are obtained using reinforcement received during the period of eligibility. Results obtained through computer simulation of the inverted pendulum problem are compared with the adaptive critic learning developed by Barto et al. (1983).
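The partitioning step mentioned in the abstract above can be illustrated with a minimal sketch. It only shows how a continuous state might be mapped to a region index; the threshold values, the four-variable pendulum state layout, and the names `THRESHOLDS`, `box_index`, and `NUM_BOXES` are assumptions made for illustration and are not taken from the paper.

```python
from bisect import bisect_right

# Hypothetical cut points per state variable (cart position, cart velocity,
# pole angle, pole angular velocity). The values are illustrative assumptions.
THRESHOLDS = [
    [-0.8, 0.8],
    [-0.5, 0.5],
    [-0.1, -0.02, 0.0, 0.02, 0.1],
    [-0.9, 0.9],
]

# Total number of regions: product of (number of cut points + 1) per variable.
NUM_BOXES = 1
for cuts in THRESHOLDS:
    NUM_BOXES *= len(cuts) + 1


def box_index(state):
    """Map a continuous state to the index of the region (box) containing it."""
    index = 0
    for value, cuts in zip(state, THRESHOLDS):
        # bisect_right gives the interval number of `value` among the cut points.
        index = index * (len(cuts) + 1) + bisect_right(cuts, value)
    return index
```

In a scheme of this kind, each index would select the local estimates and the local automaton associated with that region.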

3.

Learning to annotate via social interaction analytics

Tong Xu Hengshu Zhu Enhong Chen Baoxing Huai Hui Xiong Jilei Tian 《Knowledge and Information Systems》2014,41(2):251-276

Recent years have witnessed increased interest in exploiting automatic annotating techniques for managing and retrieving media contents. Previous studies on automatic annotating usually rely on metadata, which are often unavailable for use. Instead, multimedia contents usually arouse frequent preference-sensitive interactions in the online social networks of public social media platforms, which can be organized in the form of an interaction graph for intensive study. Inspired by this observation, we propose a novel media annotating method based on the analytics of streaming social interactions of media content instead of the metadata. The basic assumption of our approach is that different types of social media content may attract latent social groups with different preferences, and thus generate different preference-sensitive interactions, which could be reflected as localized dense subgraphs with clear preferences. To this end, we first iteratively select nodes from streaming records to build the preference-sensitive subgraphs, then uniformly extract several static and topological features to describe these subgraphs, and finally integrate these features into a learning-to-rank framework for automatic annotating. Extensive experiments on several real-world data sets clearly show that the proposed approach outperforms the baseline methods by a significant margin.

4.

Learning to Caricature via Semantic Shape Transform

Chu Wenqing Hung Wei-Chih Tsai Yi-Hsuan Chang Yu-Ting Li Yijun Cai Deng Yang Ming-Hsuan 《International Journal of Computer Vision》2021,129(9):2663-2679

Caricature is an artistic drawing created to abstract or exaggerate facial features of a person. Rendering visually pleasing caricatures is a difficult...

5.

Learning through reinforcement for N-person repeated constrained games

Poznyak A.S. Najim K. 《IEEE transactions on systems, man, and cybernetics. Part B, Cybernetics》2002,32(6):759-771

The design and analysis of an adaptive strategy for an N-person averaged constrained stochastic repeated game are addressed. Each player is modeled by a stochastic variable-structure learning automaton. Some constraints are imposed on some functions of the probabilities governing the selection of the player's actions. After each stage, the payoff to each player as well as the constraints are random variables. No information concerning the parameters of the game is available a priori. The "diagonal concavity" conditions are assumed to be fulfilled to guarantee the existence and uniqueness of the Nash equilibrium. The suggested adaptive strategy, which uses only the current realizations (outcomes and constraints) of the game, is based on the Bush-Mosteller reinforcement scheme in connection with a normalization procedure. The Lagrange multipliers approach with a regularization is used. The asymptotic properties of this algorithm are analyzed. Simulation results illustrate the feasibility and the performance of this adaptive strategy.

6.

Learning classifier systems from a reinforcement learning perspective

P. L. Lanzi 《Soft Computing - A Fusion of Foundations, Methodologies and Applications》2002,6(3-4):162-170

We analyze learning classifier systems in the light of tabular reinforcement learning. We note that although genetic algorithms are the most distinctive feature of learning classifier systems, it is not clear whether genetic algorithms are important to learning classifier systems. In fact, there are models which are strongly based on evolutionary computation (e.g., Wilson's XCS) and others which do not exploit evolutionary computation at all (e.g., Stolzmann's ACS). To find some clarifications, we try to develop learning classifier systems “from scratch”, i.e., starting from one of the best-known reinforcement learning techniques, Q-learning. We first consider the basics of reinforcement learning: a problem modeled as a Markov decision process and tabular Q-learning. We introduce a formal framework to define a general purpose rule-based representation which we use to implement tabular Q-learning. We formally define generalization within rules and discuss the possible approaches to extend our rule-based Q-learning with generalization capabilities. We suggest that genetic algorithms are probably the most general approach for adding generalization although they might not be the only solution.
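For readers unfamiliar with the starting point named in the abstract above, tabular Q-learning can be sketched as follows. This is a generic textbook-style sketch, not the rule-based formulation developed in the paper; the function names `q_learning` and `chain_step`, the four-state corridor environment, and all parameter values are assumptions chosen for illustration.

```python
import random
from collections import defaultdict


def q_learning(step, actions, start, episodes=200,
               alpha=0.5, gamma=0.9, epsilon=0.1, seed=0):
    """Tabular Q-learning with epsilon-greedy exploration.

    `step(state, action)` must return (next_state, reward, done).
    """
    rng = random.Random(seed)
    q = defaultdict(float)  # Q-table keyed by (state, action), initialised to 0
    for _ in range(episodes):
        state, done = start, False
        while not done:
            if rng.random() < epsilon:
                action = rng.choice(actions)
            else:
                # Greedy choice; ties go to the earliest action in `actions`.
                action = max(actions, key=lambda a: q[(state, a)])
            next_state, reward, done = step(state, action)
            # Bootstrapped target: no future value from a terminal state.
            best_next = 0.0 if done else max(q[(next_state, a)] for a in actions)
            q[(state, action)] += alpha * (reward + gamma * best_next
                                           - q[(state, action)])
            state = next_state
    return q


def chain_step(state, action):
    """Hypothetical corridor with states 0..3; reaching state 3 pays reward 1."""
    next_state = min(3, max(0, state + action))
    done = next_state == 3
    return next_state, (1.0 if done else 0.0), done
```

Calling `q_learning(chain_step, [1, -1], 0)` returns a table in which moving right has the higher value in every non-terminal state of this toy corridor.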

7.

Stochastic control via direct comparison

Xi-Ren Cao De-Xin Wang Tao Lu Yifan Xu 《Discrete Event Dynamic Systems》2011,21(1):11-38

The standard approach to stochastic control is dynamic programming. In this paper, we introduce an alternative approach based on direct comparison of the performance of any two policies. This is achieved by modeling the state process as a continuous-time and continuous-state Markov process and applying the same ideas as for the discrete-time and discrete-state case. This approach is simple and intuitively clear; it applies in the same way to different problems with finite and infinite horizons, discounted and long-run-average performance, and continuous and jump diffusions. Discounting is not needed when dealing with long-run average performance. The approach provides a unified framework for stochastic control and other optimization theory and methodologies, including Markov decision processes, perturbation analysis, and reinforcement learning.

8.

A survey of direct policy search methods in reinforcement learning (cited by: 1; self-citations: 0; citations by others: 1)

王学宁 陈伟 张锰 徐昕 贺汉根 《智能系统学报》2007,2(1):16-24

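This paper briefly introduces the various policy search algorithms in reinforcement learning and establishes a theoretical framework for policy gradient methods. Guided by this framework, some existing policy gradient algorithms are generalized, and several recently proposed methods for improving the convergence speed of policy gradient algorithms are discussed. The latest progress in policy search algorithms that do not use the policy gradient is also introduced, and directions for further research are outlined.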

9.

Learning via finitely many queries

Andrew C. Lee 《Annals of Mathematics and Artificial Intelligence》2005,44(4):401-418

This work introduces a new query inference model that can access data and communicate with the teacher by asking finitely many Boolean queries in a language L. In this model the parameters of interest are the number of queries used and the expressive power of L. We study how the learning power varies with these parameters. Results suggest that this model may help in studying query inference in a resource-bounded environment. AMS subject classification: 68Q05, 68Q32, 68T05, 03D10, 03D80

10.

User preference-aware video highlight detection via deep reinforcement learning

Wang Han Wang Kexin Wu Yuqing Wang Zhongzhi Zou Ling 《Multimedia Tools and Applications》2020,79(21-22):15015-15024

Video highlight detection is a technique to retrieve short video clips that capture a user’s primary attention or interest within an unedited video. There...

11.

Automatic abstraction controller in reinforcement learning agent via automata

《Applied Soft Computing》2014

Reinforcement learning (RL) for solving large and complex problems faces the curse of dimensionality. To overcome this problem, frameworks based on temporal abstraction have been presented, each with its own advantages and disadvantages. This paper proposes a new method, similar to the strategies introduced in hierarchical abstract machines (HAMs), to create a high-level controller layer of reinforcement learning which uses options. The proposed framework uses a non-deterministic automaton as a controller to make more effective use of temporally extended actions and state space clustering. This method can be viewed as a bridge between the option and HAM frameworks: it creates connection structures between them in order to reduce the disadvantages of both while retaining their advantages. Experimental results on different test environments show significant efficiency of the proposed method.

12.

Facilitating the migration to the microservice architecture via model-driven reverse engineering and reinforcement learning

Dehghani MohammadHadi Kolahdouz-Rahimi Shekoufeh Tisi Massimo Tamzalit Dalila 《Software and Systems Modeling》2022,21(3):1115-1133

The microservice architecture has gained remarkable attention in recent years. Microservices allow developers to implement and deploy independent services, so they are a naturally effective architecture for continuously deployed systems. Because of this, several organizations are undertaking the costly process of manually migrating their traditional software architectures to microservices. The research in this paper aims at facilitating the migration from monolithic software architectures to microservices. We propose a framework which enables software developers/architects to migrate their software systems more efficiently by helping them remodularize the source code of their systems. The framework leverages model-driven reverse engineering to obtain a model of the legacy system and reinforcement learning to propose a mapping of this model toward a set of microservices.

13.

Learning potential functions and their representations for multi-task reinforcement learning

Matthijs Snel Shimon Whiteson 《Autonomous Agents and Multi-Agent Systems》2014,28(4):637-681

In multi-task learning, there are roughly two approaches to discovering representations. The first is to discover task-relevant representations, i.e., those that compactly represent solutions to particular tasks. The second is to discover domain-relevant representations, i.e., those that compactly represent knowledge that remains invariant across many tasks. In this article, we propose a new approach to multi-task learning that captures domain-relevant knowledge by learning potential-based shaping functions, which augment a task’s reward function with artificial rewards. We address two key issues that arise when deriving potential functions. The first is what kind of target function the potential function should approximate; we propose three such targets and show empirically that which one is best depends critically on the domain and learning parameters. The second issue is the representation for the potential function. This article introduces the notion of $k$-relevance, the expected relevance of a representation on a sample sequence of $k$ tasks, and argues that this is a unifying definition of relevance of which both task and domain relevance are special cases. We prove formally that, under certain assumptions, $k$-relevance converges monotonically to a fixed point as $k$ increases, and use this property to derive Feature Selection Through Extrapolation of $k$-relevance (FS-TEK), a novel feature-selection algorithm. We demonstrate empirically the benefit of FS-TEK on artificial domains.
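The idea of augmenting a reward with a potential-based shaping term can be sketched as below. The sketch uses the standard form of potential-based shaping, in which the artificial reward is gamma * Phi(s') - Phi(s); it does not reproduce the learned potentials or target functions proposed in the paper. The function name `shaped_reward`, the distance-based example potential, and the convention of treating the potential of a terminal state as zero are assumptions for illustration.

```python
def shaped_reward(reward, potential, state, next_state,
                  gamma=0.99, terminal=False):
    """Return the task reward plus the shaping term gamma*Phi(s') - Phi(s).

    `potential` is any function from states to floats. As an assumed
    convention, the potential of a terminal next state is taken to be zero.
    """
    next_potential = 0.0 if terminal else potential(next_state)
    return reward + gamma * next_potential - potential(state)


def distance_potential(state, goal=5):
    """Hypothetical potential: negative distance to a goal on a number line."""
    return -abs(goal - state)
```

With `gamma=1.0`, a step from state 2 to state 3 toward the goal at 5 receives a bonus of 1.0, and stepping back from 3 to 2 receives -1.0, so the shaping terms cancel around a cycle.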

14.

Learning future representation with synthetic observations for sample-efficient reinforcement learning

Xin LIU;Yaran CHEN;Haoran LI;Dongbin ZHAO 《中国科学:信息科学(英文版)》2025,(5):20-37

Image-based reinforcement learning (RL) has proven its effectiveness for continuous visual control of embodied agents, where upstream representation learning largely determines the effectiveness of policy learning. Employing self-supervised auxiliary tasks allows the agent to enhance visual representation in a targeted manner, thereby improving the policy performance and the RL sample efficiency. Prior advanced self-supervised RL methods all try to design better auxiliary objectives to extract more information from agent experience, while ignoring the training data constraints caused by experience limitations in RL training. In this article, we first try to break through this auxiliary training data constraint, proposing a novel RL auxiliary task named learning future representation with synthetic observations (LFS), which improves the self-supervised RL by enriching auxiliary training data. Firstly, a novel training-free method, named frame mask, is proposed to synthesize novel observations that may contain future information. Next, the latent nearest-neighbor clip (LNC) is correspondingly proposed to alleviate the impact of unqualified noise in synthetic observations. The remaining synthetic observations and real observations then together serve as the auxiliary training data to achieve a clustering-based temporal association task for advanced representation learning. LFS allows the agent to access and learn observations that are not present in the current experience but will appear in future training, thus enabling comprehensive visual understanding and an efficient RL process. In addition, LFS does not rely on rewards or actions, which means it has a wider scope of application (e.g., learning from video) than recent advanced RL auxiliary tasks. We conduct extensive experiments on challenging continuous visual control of complex embodied agents, including robot locomotion and manipulation. The results demonstrate that our LFS exhibits state-of-the-art sample efficiency on end-to-end RL tasks (leading on 12/13 tasks), and enables advanced RL visual pre-training (outperforming the next best method by 1.51×) on action-free video demonstrations.

15.

Learning type PID control system using input dependence reinforcement scheme

Hideharu Sawada Ji-Sun Shin Fumihiro Shoji Hee-Hyol Lee 《Artificial Life and Robotics》2008,13(1):139-143

PID control has been widely used in the field of process control, and many methods have been used to design PID parameters. When the characteristic values of a controlled object change over the years or because of disturbances, skilled operators observe the features of the controlled responses and adjust the PID parameters using their knowledge and know-how, which requires considerable labor. In this research, we design a learning type PID control system using a stochastic automaton with a learning function, namely a learning automaton, which can autonomously adjust the control parameters by updating the state transition probability using the relative amount of controlled error. We show the effectiveness of the proposed learning type PID control system by simulations. This work was presented in part at the 13th International Symposium on Artificial Life and Robotics, Oita, Japan, January 31–February 2, 2008

16.

Nonlinear system identification via direct weight optimization (cited by: 2; self-citations: 0; citations by others: 2)

Jacob Roll Alexander Nazin 《Automatica》2005,41(3):475-490

A general framework for estimating nonlinear functions and systems is described and analyzed in this paper. Identification of a system is seen as estimation of a predictor function. The considered predictor function estimate at a particular point is defined to be affine in the observed outputs and the estimate is defined by the weights in this expression. For each given point, the maximal mean-square error (or an upper bound) of the function estimate over a class of possible true functions is minimized with respect to the weights, which is a convex optimization problem. This gives different types of algorithms depending on the chosen function class. It is shown how the classical linear least squares is obtained as a special case and how unknown-but-bounded disturbances can be handled. Most of the paper deals with the method applied to locally smooth predictor functions. It is shown how this leads to local estimators with a finite bandwidth, meaning that only observations in a neighborhood of the target point will be used in the estimate. The size of this neighborhood (the bandwidth) is automatically computed and reflects the noise level in the data and the smoothness priors. The approach is applied to a number of dynamical systems to illustrate its potential.

17.

3D shape feature learning based on a convolutional autoencoder

《计算机辅助设计与图形学学报》2015,(11)

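3D shape features play a key role in 3D object classification, retrieval, and semantic analysis. Traditional 3D feature design is laborious, and the features cannot be learned automatically from the large amount of existing 3D data. In deep neural network research, convolutional neural networks and autoencoders are two popular network architectures. Within the framework of the extreme learning machine, this paper combines the two and proposes an automatic 3D feature learning method based on a convolutional autoencoder. Experimental results show that the proposed method learns features about two orders of magnitude faster than other deep learning methods, and that the extracted features achieve good results in tasks such as 3D model classification and 3D object detection.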

18.

Reward-weighted regression with sample reuse for direct policy search in reinforcement learning

Hachiya H Peters J Sugiyama M 《Neural computation》2011,23(11):2798-2832

Direct policy search is a promising reinforcement learning framework, in particular for controlling continuous, high-dimensional systems. Policy search often requires a large number of samples for obtaining a stable policy update estimator, and this is prohibitive when sampling is expensive. In this letter, we extend an expectation-maximization-based policy search method so that previously collected samples can be efficiently reused. The usefulness of the proposed method, reward-weighted regression with sample reuse (R3), is demonstrated through robot learning experiments. (This letter is an extended version of our earlier conference paper: Hachiya, Peters, & Sugiyama, 2009.)
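The core idea behind reward-weighted regression, fitting policy parameters by a regression in which each sample is weighted by its reward, can be sketched for a one-dimensional case. This is a simplified illustration under stated assumptions: a scalar linear policy mean `a = theta * s`, non-negative rewards, and a single weighted least-squares step. It does not include the sample-reuse mechanism that the letter contributes, and the name `rwr_update` is hypothetical.

```python
def rwr_update(states, actions, rewards):
    """One reward-weighted least-squares fit of theta in a = theta * s.

    Minimises sum_i r_i * (a_i - theta * s_i)**2, so samples with higher
    (non-negative) reward pull the policy mean more strongly toward their
    action. Assumes scalar states and actions.
    """
    numerator = sum(r * s * a for s, a, r in zip(states, actions, rewards))
    denominator = sum(r * s * s for s, r in zip(states, rewards))
    if denominator == 0:
        raise ValueError("all reward-weighted state terms are zero")
    return numerator / denominator
```

For example, with states `[1, 2, 1, 2]`, actions `[2, 4, -1, -2]`, and rewards `[1, 1, 0, 0]`, only the rewarded samples contribute and the fitted parameter is 2.0.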

19.

Document Classification via Nonlinear Metric Learning

Xin Li Yanqin Bai Siyun Zhou Ying Li 《Neural Processing Letters》2018,48(3):1335-1345

Learning a proper distance metric is an important problem in document classification, because the similarities of samples in many problems are usually measured by a distance metric. In this paper, we address the nonlinear metric learning problem with application to document classification. First, we propose a new representation of a nonlinear metric using a linear combination of some basic kernels. Second, we give a linear metric learning method based on a triplet constraint and k-nearest neighbors, and then extend it to a nonlinear method based on multiple kernels using the above nonlinear metric. Further, the corresponding problem can be rewritten as an unconstrained optimization problem on the group of positive definite matrices. Finally, to ensure that the learned distance matrix is positive definite, we provide an improved intrinsic steepest descent algorithm with adaptive step size to solve this unconstrained optimization problem. The experimental results show that our proposed method is effective on some document classification problems.

20.

Coordinating Multiple Agents via Reinforcement Learning (cited by: 2; self-citations: 0; citations by others: 2)

Gang Chen Zhonghua Yang Hao He Kiah Mok Goh 《Autonomous Agents and Multi-Agent Systems》2005,10(3):273-328

In this paper, we attempt to use reinforcement learning techniques to solve agent coordination problems in task-oriented environments. The Fuzzy Subjective Task Structure model (FSTS) is presented to model general agent coordination. We show that an agent coordination problem modeled in FSTS is a Decision-Theoretic Planning (DTP) problem, to which reinforcement learning can be applied. Two learning algorithms, coarse-grained and fine-grained, are proposed to address agent coordination behavior at two different levels. The coarse-grained algorithm operates at one level and tackles hard system constraints, and the fine-grained algorithm operates at another level and handles soft constraints. We argue that it is important to explicitly model and explore coordination-specific information (particularly system constraints), which underpins the two algorithms and contributes to their effectiveness. The algorithms are formally proved to converge and experimentally shown to be effective.