Similar literature
 Found 20 similar documents (search time: 31 ms)
1.
Continuous-Action Q-Learning   Total citations: 1, self-citations: 0, citations by others: 1
This paper presents a Q-learning method that works in continuous domains. Other characteristics of our approach are the use of an incremental topology preserving map (ITPM) to partition the input space, and the incorporation of bias to initialize the learning process. A unit of the ITPM represents a limited region of the input space and maps it onto the Q-values of M possible discrete actions. The resulting continuous action is an average of the discrete actions of the winning unit weighted by their Q-values. Then, TD(λ) updates the Q-values of the discrete actions according to their contribution. Units are created incrementally and their associated Q-values are initialized by means of domain knowledge. Experimental results in robotics domains show the superiority of the proposed continuous-action Q-learning over the standard discrete-action version in terms of both asymptotic performance and speed of learning. The paper also reports a comparison of discounted-reward against average-reward Q-learning in an infinite horizon robotics task.
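
The action-selection rule in this abstract, where the continuous action is the Q-value-weighted average of the winning unit's discrete actions, might be sketched as follows. This is only an illustration: the function name `continuous_action` is hypothetical, and the sketch assumes non-negative Q-values with a positive sum, which the abstract does not state.

```python
def continuous_action(actions, q_values):
    """Average the discrete actions, weighted by their Q-values.

    Assumption (not from the abstract): Q-values are non-negative
    and do not all vanish, so the weights are well defined.
    """
    total = sum(q_values)
    if total <= 0:
        raise ValueError("sketch assumes non-negative Q-values with a positive sum")
    return sum(a * q for a, q in zip(actions, q_values)) / total


# Three discrete actions of one unit and their Q-values.
print(continuous_action([-1.0, 0.0, 1.0], [1.0, 1.0, 2.0]))  # 0.25
```

The action with the largest Q-value pulls the result toward itself without fully determining it, which is what makes the output continuous.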

2.
A problem related to the use of reinforcement learning (RL) algorithms in real robot applications is the difficulty of measuring the learning level reached after some experience. Among the different RL algorithms, Q-learning is the most widely used in accomplishing robotic tasks. The aim of this work is to evaluate a priori the optimal Q-values for problems where it is possible to compute the distance between the current state and the goal state of the system. Starting from the Q-learning updating formula, the equations for the maximum Q-weights, for optimal and non-optimal actions, have been computed considering delayed and immediate rewards. Deterministic and non-deterministic grid-world environments have also been considered to test the obtained equations in simulation. In addition, the convergence rates of the Q-learning algorithm have been compared using different learning rate parameters.

3.
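Zhou Lei (周雷), Kong Feng (孔凤), Tang Hao (唐昊), Zhang Jianjun (张建军). 《控制理论与应用》 (Control Theory & Applications), 2011, 28(11): 1665-1670
This paper studies the optimal control of the look-ahead distance in a single-station conveyor-serviced production station (CSPS) system, with the aim of improving the system's working efficiency. The CSPS optimal control problem is modeled as a semi-Markov decision process. Because conventional Q-learning cannot directly handle optimal control problems in which the look-ahead distance is a continuous variable, the paper combines Q-value function approximation by a cerebellar model articulation controller (CMAC) network with online learning techniques, and presents an online Q-learning algorithm and a model-free online policy iteration algorithm. Simulation results show that the proposed algorithms improve both learning speed and optimization accuracy.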

4.
Distributed groupware systems provide computer support for manipulating objects such as a text document or a filesystem, shared by two or more geographically separated users. Data replication is a technology to improve performance and availability of data in distributed groupware systems. Each user has a local copy of the shared objects, upon which he may perform updates. Locally executed updates are then transmitted to the other users. This replication potentially leads, however, to divergent (i.e. different) copies. In this respect, Operational Transformation (OT) algorithms are applied to achieve convergence of all copies, i.e. all users view the same objects. Using these algorithms, users can exchange their updates in any order, since convergence should be ensured in all cases. However, the design of such algorithms is a difficult and error-prone activity, since building the correct updates for maintaining good convergence properties of the local copies requires examining a large number of situations. In this paper, we present the modelling and deductive verification of OT algorithms with algebraic specifications. We show in particular that many OT algorithms in the literature do not satisfy convergence properties, contrary to what their authors stated.

5.
I/O parallelism is considered to be a promising approach to achieving high performance in parallel data warehousing systems where huge amounts of data and complex analytical queries have to be processed. This paper proposes a parallel secondary data cube storage structure (PHC for short) to efficiently support the processing of range sum queries and dynamic updates on data cubes using parallel computing systems. Based on PHC, two parallel algorithms for processing range sum queries and updates are also proposed. Both algorithms have the same time complexity, $O(\log^{d} n / P)$. The analytical and experimental results show that PHC and the parallel algorithms have high performance and achieve optimum speedup.
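
For readers unfamiliar with range sum queries on a data cube, the classic prefix-sum idea can be sketched in two dimensions as follows. This is not the PHC structure from the abstract, only a baseline illustration; the names `build_prefix` and `range_sum` are hypothetical.

```python
def build_prefix(cube):
    """Return P where P[i][j] is the sum of cube[0..i-1][0..j-1]."""
    rows, cols = len(cube), len(cube[0])
    prefix = [[0] * (cols + 1) for _ in range(rows + 1)]
    for i in range(rows):
        for j in range(cols):
            # Inclusion-exclusion over the three already-computed neighbours.
            prefix[i + 1][j + 1] = (cube[i][j] + prefix[i][j + 1]
                                    + prefix[i + 1][j] - prefix[i][j])
    return prefix


def range_sum(prefix, r1, c1, r2, c2):
    """Sum of cells with r1 <= row <= r2 and c1 <= col <= c2 (inclusive)."""
    return (prefix[r2 + 1][c2 + 1] - prefix[r1][c2 + 1]
            - prefix[r2 + 1][c1] + prefix[r1][c1])


cube = [[1, 2, 3],
        [4, 5, 6],
        [7, 8, 9]]
prefix = build_prefix(cube)
print(range_sum(prefix, 1, 1, 2, 2))  # 28
```

With this baseline a query costs a constant number of lookups, but changing one cell can invalidate many prefix entries. Structures that balance query and update cost, such as the one the abstract proposes, target that trade-off.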

6.
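Tang Wei (唐蔚), Zhang Dongliang (张栋梁), Fan Yuanyuan (范媛媛). 《计算机科学》 (Computer Science), 2011, 38(12): 106-109, 138
To address location updating and disconnection detection and handling for moving objects in mobile network environments, this paper proposes improvements to existing location update strategies. Using a fixed-deadline strategy for disconnection detection yields fast and accurate detection, but it directly increases the number of location updates and degrades the performance of the original location update strategy. To remedy this drawback, an adaptive strategy is proposed: a latest update deadline is estimated from the moving object's location, and an update is forced at that deadline. In addition, an algorithm for handling disconnections under the group update strategy is proposed. Simulation results show that the new algorithms both reduce the number of location updates and achieve effective and reasonable network handling.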

7.
Consistency Algorithms for Multi-Source Warehouse View Maintenance   Total citations: 1, self-citations: 0, citations by others: 1
A warehouse is a data repository containing integrated information for efficient querying and analysis. Maintaining the consistency of warehouse data is challenging, especially if the data sources are autonomous and views of the data at the warehouse span multiple sources. Transactions containing multiple updates at one or more sources, e.g., batch updates, complicate the consistency problem. In this paper we identify and discuss three fundamental transaction processing scenarios for data warehousing. We define four levels of consistency for warehouse data and present a new family of algorithms, the Strobe family, that maintain consistency as the warehouse is updated, under the various warehousing scenarios. All of the algorithms are incremental and can handle a continuous and overlapping stream of updates from the sources. Our implementation shows that the algorithms are practical and realistic choices for a wide variety of update scenarios.

8.
This paper considers the average consensus problem on a network of digital links, and proposes a set of algorithms based on pairwise “gossip” communications and updates. We study the convergence properties of such algorithms with the goal of answering two design questions, arising from the literature: whether the agents should encode their communication by a deterministic or a randomized quantizer, and whether they should use, and how, exact information regarding their own states in the update.
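
The pairwise gossip update underlying this abstract can be illustrated with a minimal sketch. It deliberately omits the quantization that the paper studies and assumes a complete communication graph with exact real-valued exchanges; the function name `gossip_average` and its parameters are hypothetical.

```python
import random


def gossip_average(values, rounds, seed=0):
    """Repeatedly pick two agents at random and replace both states
    with their pairwise mean (unquantized, complete graph assumed)."""
    rng = random.Random(seed)
    x = list(values)
    for _ in range(rounds):
        i, j = rng.sample(range(len(x)), 2)
        mean = (x[i] + x[j]) / 2.0
        x[i] = x[j] = mean
    return x


states = gossip_average([0.0, 4.0, 8.0, 12.0], rounds=200)
print(states)  # every entry ends up close to the initial average, 6.0
```

Each exchange preserves the sum of the states, so the only value all agents can agree on is the initial average. Quantized links break this exact invariance, which is the difficulty the abstract's design questions address.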

9.
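User interest updating refers to adding newly acquired user-interest knowledge to the user interest model, or deleting outdated knowledge from it, after the model has been built. Based on the Ebbinghaus forgetting law, this paper proposes an a priori user interest drift algorithm for handling interest drift and a forgetting-percentage-based update algorithm for updating the user interest model. Together, the two algorithms form the update mechanism of the user interest model.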

10.
We investigate adaptive mixture methods that linearly combine outputs of m constituent filters running in parallel to model a desired signal. We use Bregman divergences and obtain certain multiplicative updates to train the linear combination weights under an affine constraint or without any constraints. We use unnormalized relative entropy and relative entropy to define two different Bregman divergences that produce an unnormalized exponentiated gradient update and a normalized exponentiated gradient update on the mixture weights, respectively. We then carry out the mean and the mean-square transient analysis of these adaptive algorithms when they are used to combine outputs of m constituent filters. We illustrate the accuracy of our results and demonstrate the effectiveness of these updates for sparse mixture systems.
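
A normalized exponentiated gradient step on mixture weights can be sketched in its standard textbook form, shown below. The exact updates derived in the paper may differ; the function name `eg_step`, the step size `mu`, and the squared-error setting are assumptions made for illustration.

```python
import math


def eg_step(weights, outputs, desired, mu):
    """One normalized exponentiated-gradient step.

    weights: current mixture weights (positive, summing to 1)
    outputs: outputs of the m constituent filters at this time step
    desired: desired signal sample
    mu:      step size (hypothetical parameter name)
    """
    prediction = sum(w * y for w, y in zip(weights, outputs))
    error = desired - prediction
    # Multiplicative update: each weight is scaled by exp(mu * error * output).
    scaled = [w * math.exp(mu * error * y) for w, y in zip(weights, outputs)]
    norm = sum(scaled)
    # Renormalize so the weights stay on the probability simplex.
    return [s / norm for s in scaled]


new_weights = eg_step([0.5, 0.5], [1.0, 0.0], desired=1.0, mu=0.5)
print(new_weights)  # weight on the first filter grows above 0.5
```

Because the update is multiplicative, weights stay positive and small weights shrink quickly, which is one reason such updates are attractive for sparse mixtures.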

11.
System M is an experimental transaction processing testbed that runs on top of the Mach operating system. Its database is stored in primary memory. The structure and algorithms used in System M are described. The checkpointer is the component that periodically sweeps memory and propagates updates to a backup database copy on disk. Several different checkpointing (and logging) algorithms were implemented, and their performance was experimentally evaluated.

12.
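Incremental K-Medoids Clustering Algorithm   Total citations: 3, self-citations: 0, citations by others: 3
Gao Xiaomei (高小梅), Feng Zhi (冯志), Feng Xingjie (冯兴杰). 《计算机工程》 (Computer Engineering), 2005, 31(Z1): 181-183
Clustering is a very useful data mining method that can be used to discover the groupings and data distribution hidden in data. Many clustering algorithms and variants have been proposed, but little work has been done on incremental clustering algorithms. When a data set changes because of updates, the data mining results should be updated accordingly. Because the data volume is large, rerunning the clustering algorithm on the updated data set to refresh the results is clearly inefficient, so incremental clustering algorithms urgently need to be studied. By improving the K-Medoids clustering algorithm, this paper proposes an incremental K-Medoids clustering algorithm. It handles well the problems that traditional clustering algorithms face with respect to scalability and periodic data updates.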
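
Two building blocks of any K-Medoids variant, choosing a cluster's medoid and assigning a newly arrived point to its nearest medoid, can be sketched as follows. This is not the incremental algorithm proposed in the paper, whose details the abstract does not give; the function names are hypothetical and Euclidean distance is an assumption.

```python
import math


def medoid(points):
    """Return the member with the smallest total distance to all members."""
    return min(points, key=lambda p: sum(math.dist(p, q) for q in points))


def nearest_medoid(point, medoids):
    """Return the index of the medoid closest to a newly arrived point."""
    return min(range(len(medoids)), key=lambda i: math.dist(point, medoids[i]))


cluster = [(0, 0), (1, 0), (2, 0), (3, 0), (50, 0)]
print(medoid(cluster))                               # (2, 0)
print(nearest_medoid((9, 0), [(2, 0), (10, 0)]))     # 1
```

Unlike a mean, the medoid is always an actual data point, so the outlier at (50, 0) shifts it far less than it would shift a centroid.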

13.
The development of machine learning algorithms has been gaining relevance in addressing the increasing modelling complexity of manufacturing decision-making problems. Reinforcement learning is a methodology with great potential due to the reduced need for previous training data, i.e. the system learns over time during actual operation. This study focuses on the implementation of a reinforcement learning algorithm in an assembly problem of a given object, aiming to identify the effectiveness of the proposed approach in the optimisation of the assembly process time. A model-free Q-Learning algorithm is applied, considering the learning of a matrix of Q-values (Q-table) from the successive interactions with the environment to suggest an assembly sequence solution. This implementation explores three scenarios with increasing complexity so that the impact of the Q-Learning's parameters and rewards is assessed to improve the reinforcement learning agent performance. The optimisation approach achieved very promising results by learning the optimal assembly sequence 98.3% of the time.
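
The Q-table learning mentioned in this abstract rests on the standard tabular Q-learning update, which can be sketched as follows. The state and action encoding, the function name `q_update`, and the parameter values are hypothetical; the abstract does not specify the ones used in the study.

```python
def q_update(q_table, state, action, reward, next_state, alpha=0.1, gamma=0.9):
    """Apply one tabular Q-learning update in place and return the new value.

    Q(s, a) <- Q(s, a) + alpha * (r + gamma * max_a' Q(s', a') - Q(s, a))
    """
    best_next = max(q_table[next_state])
    td_target = reward + gamma * best_next
    q_table[state][action] += alpha * (td_target - q_table[state][action])
    return q_table[state][action]


# Two states, two actions; rows are states, columns are actions.
q = [[0.0, 0.0],
     [0.0, 2.0]]
print(q_update(q, state=0, action=1, reward=1.0, next_state=1,
               alpha=0.5, gamma=0.5))  # 1.0
```

The learning rate `alpha` and discount factor `gamma` are the kind of parameters whose impact the study assesses across its three scenarios.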

14.
Cees Duin 《Algorithmica》2005,41(2):131-145
We formulate and study an algorithm for all-pairs shortest paths in a network with $n$ nodes and $m$ arcs of positive length. Using the dynamic programming principle of optimality of subpaths, the algorithm avoids redundant updates of distance labels. A shortest $v$--$w$ path, say $\langle v, r_{1}, r_{2}, \ldots, r_{k} = w \rangle$ with $k$ arcs ($k \geq 1$), is combined with an arc $(w,t) \in A$ to update the distance label of pair $v$--$t$ only if $(w,t)$ is present on the shortest $r_{\ell}$--$t$ path for each node $r_{\ell}$ $(\ell = k-1, k-2, \ldots, 1)$. The algorithm extracts shortest paths in order of length from a data structure and builds two shortest path trees per node, an extra effort of $O(n^{2})$. This way it can efficiently execute only the aforementioned distance updates, by picking the arcs $(w,t)$ out of these trees. The order of the time complexity per distance update and path extraction is similar to that of other algorithms. An implementation with a data structure of heaps is possible, but a bucket-type data structure may be more appropriate. The implied number of distance updates does not exceed $nm_{0}$ ($m_{0}$ being the total number of shortest path arcs), but is frequently much lower. In extreme cases the new algorithm applies $O(n^{2})$ distance updates, whereas known algorithms require $\Omega(n^{3})$ updates. The algorithm is especially suited for undirected graphs; here the construction of one tree per node is sufficient and the computation times halve.

15.
In this paper we consider the problem of dynamic transitive closure with lookahead. We present a randomized one-sided error algorithm with updates and queries in $O(n^{\omega(1,1,\varepsilon)-\varepsilon})$ time given a lookahead of $n^{\varepsilon}$ operations, where $\omega(1,1,\varepsilon)$ is the exponent of multiplication of an $n \times n$ matrix by an $n \times n^{\varepsilon}$ matrix. For $\varepsilon \leq 0.294$ we obtain an algorithm with queries and updates in $O(n^{2-\varepsilon})$ time, whereas for $\varepsilon = 1$ the time is $O(n^{\omega-1})$. This is essentially optimal as it implies an $O(n^{\omega})$ algorithm for boolean matrix multiplication. We also consider the offline transitive closure in planar graphs. For this problem, we show an algorithm that requires $O(n^{\frac{\omega}{2}})$ time to process $n^{\frac{1}{2}}$ operations. We also show a modification of these algorithms that gives faster amortized queries. Finally, we give faster algorithms for a restricted type of update, so-called element updates. All of the presented algorithms are randomized with one-sided error. All our algorithms are based on dynamic algorithms with lookahead for matrix inverse, which are of independent interest.

16.
We study modern implementations of the discrete Kalman filter, namely array square-root algorithms. An important feature of such algorithms is the use of orthogonal and J-orthogonal transformations on each filtering step. For the first time, we develop for this class of algorithms a simple universal approach that lets us generalize any numerically stable implementation of this type to the case of updates in sensitivity equations of the filter with respect to unknown system model parameters. An advantage of the resulting adaptive schemes is their numerical stability with respect to machine rounding errors. Estimation of the noisy state vector of the system and identification of unknown system parameters occur simultaneously. The proposed approach can be used for parameter identification problems, adaptive control problems, experiment planning, and others.

18.
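Xia Ying (夏英), Liu Wanrong (刘婉蓉). 《计算机应用》 (Journal of Computer Applications), 2008, 28(12): 3224-3226
Most existing association rule algorithms that address the incremental update problem need to scan the data set multiple times and cannot process massive data effectively. To address this, a sliding-window-based incremental updating algorithm for association rules (SWIUA) is proposed, which uses a sliding window to update the data and mine the association rules that interest users. The algorithm needs to scan the original data set and the updated data only once each, which reduces I/O time. It also applies an optimization strategy to filter and delete candidate itemsets, which improves mining performance and allows large volumes of newly added data to be handled effectively.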

19.
The k Nearest Neighbor (kNN) join operation associates each data object in one data set with its k nearest neighbors from the same or a different data set. The kNN join on high-dimensional data (high-dimensional kNN join) is a very expensive operation. Existing high-dimensional kNN join algorithms were designed for static data sets and therefore cannot handle updates efficiently. In this article, we propose a novel kNN join method, named kNNJoin+, which supports efficient incremental computation of kNN join results with updates on high-dimensional data. As a by-product, our method also provides answers for the reverse kNN queries with very little overhead. We have performed an extensive experimental study. The results show the effectiveness of kNNJoin+ for processing high-dimensional kNN joins in dynamic workloads.
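
The kNN join operation defined at the start of this abstract can be written as a brute-force baseline, sketched below. This is not kNNJoin+; it recomputes everything from scratch, and the function name `knn_join` and the use of Euclidean distance are assumptions.

```python
import math


def knn_join(r_points, s_points, k):
    """For each point in r_points, list its k nearest points in s_points."""
    result = {}
    for r in r_points:
        # Rank all of S by distance to r; O(|S| log |S|) per point of R.
        ranked = sorted(s_points, key=lambda s: math.dist(r, s))
        result[r] = ranked[:k]
    return result


R = [(0, 0), (10, 10)]
S = [(1, 0), (0, 2), (9, 9), (20, 20)]
print(knn_join(R, S, k=2))
# {(0, 0): [(1, 0), (0, 2)], (10, 10): [(9, 9), (0, 2)]}
```

With this baseline, any insertion or deletion in either set forces a full recomputation. Avoiding that cost is the motivation for the incremental method the abstract describes.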

20.
In this paper, distributed and recursive blind channel identification algorithms are proposed for single-input multi-output (SIMO) systems of sensor networks (both time-invariant and time-varying networks). At any time, each agent updates its estimate using the local observation and the information derived from its neighboring agents. The algorithms are based on truncated stochastic approximation, and their convergence is proved. A simulation example is presented and the computation results are shown to be consistent with the theoretical analysis.
