共查询到20条相似文献,搜索用时 0 毫秒
1.
We present new results for the Stochastic Shortest Path problem when an unlimited number of hops may be used. Nodes and links in the network may be congested or uncongested, and their states change over time. The goal is to adaptively choose the next node to visit such that the expected cost to the destination is minimized. Since the state of a node may change, having the option to revisit a node can improve the expected cost. Therefore, the option to use an unbounded number of hops may be required to achieve the minimum expected cost. We show that when revisits are prohibited, the optimal routing problem is np-hard. We also prove properties about networks for which continual improvement may occur.We study the related routing problem which asks whether it is possible to determine the optimal next node based on the current node and state, when an unlimited number of hops is allowed. We show that as the number of hops increases, this problem may not converge to a solution. 相似文献
2.
Reinforcement learning (RL) has been applied to many fields and applications, but there are still some dilemmas between exploration and exploitation strategy for action selection policy. The well-known areas of reinforcement learning are the Q-learning and the Sarsa algorithms, but they possess different characteristics. Generally speaking, the Sarsa algorithm has faster convergence characteristics, while the Q-learning algorithm has a better final performance. However, Sarsa algorithm is easily stuck in the local minimum and Q-learning needs longer time to learn. Most literatures investigated the action selection policy. Instead of studying an action selection strategy, this paper focuses on how to combine Q-learning with the Sarsa algorithm, and presents a new method, called backward Q-learning, which can be implemented in the Sarsa algorithm and Q-learning. The backward Q-learning algorithm directly tunes the Q-values, and then the Q-values will indirectly affect the action selection policy. Therefore, the proposed RL algorithms can enhance learning speed and improve final performance. Finally, three experimental results including cliff walk, mountain car, and cart–pole balancing control system are utilized to verify the feasibility and effectiveness of the proposed scheme. All the simulations illustrate that the backward Q-learning based RL algorithm outperforms the well-known Q-learning and the Sarsa algorithm. 相似文献
3.
A resampling scheme for clustering with similarity to bootstrap aggregation (bagging) is presented. Bagging is used to improve the quality of path-based clustering, a data clustering method that can extract elongated structures from data in a noise robust way. The results of an agglomerative optimization method are influenced by small fluctuations of the input data. To increase the reliability of clustering solutions, a stochastic resampling method is developed to infer consensus clusters. A related reliability measure allows us to estimate the number of clusters, based on the stability of an optimized cluster solution under resampling. The quality of path-based clustering with resampling is evaluated on a large image data set of human segmentations. 相似文献
4.
In this paper we introduce a novel neural reinforcement learning method. Unlike existing methods, our approach does not need a model of the system and can be trained directly using the measurements of the system. We achieve this by only using one function approximator and approximate the improved policy from this. An experiment using a mobile robot shows that it can be trained using a real system within reasonable time. 相似文献
5.
Spectral clustering and path-based clustering are two recently developed clustering approaches that have delivered impressive results in a number of challenging clustering tasks. However, they are not robust enough against noise and outliers in the data. In this paper, based on M-estimation from robust statistics, we develop a robust path-based spectral clustering method by defining a robust path-based similarity measure for spectral clustering under both unsupervised and semi-supervised settings. Our proposed method is significantly more robust than spectral clustering and path-based clustering. We have performed experiments based on both synthetic and real-world data, comparing our method with some other methods. In particular, color images from the Berkeley segmentation data set and benchmark are used in the image segmentation experiments. Experimental results show that our method consistently outperforms other methods due to its higher robustness. 相似文献
6.
Due to introduction and increased availability of Electronic Health Records (EHRs), disease prediction has recently gained immense research attention and achieved impressive progress. Existing methods are based on RNN-like architectures, which treat every disease equally, and learn the representations from medical knowledge. However, strong structural information among diseases is ignored in these methods. In this paper, we introduce a novel Path-based reasoning model for self-AttentIonal Disease prediction (ProAID), which utilizes medical paths extracted from patient EHR and external medical knowledge bases to augment the latent interaction between diseases and learn highly representative patient embeddings. By explicitly incorporating medical paths, ProAID effectively generates embeddings that capture the hierarchical information of diseases and learn effective representations of a patient based on the historical patient admission sequences in her/his EHRs to allow accurate disease prediction for the next hospital admission. Extensive experiments on public medical datasets show that ProAID achieves better performance than the compared methods, which indicates the effectiveness of the proposed model. 相似文献
8.
Object-oriented languages and rule-based languages offer two distinct and useful programming abstractions. However, previous attempts to integrate data-driven rules into object-oriented languages have typically achieved an uneasy union at best. R++ is a new, closer integration of the rule-based and object-oriented paradigms that extends C++ with a single programming construct, the path-based rule, as a new kind of class member. Path-based rules-data-driven rules that are restricted to following pointers between objects-are like automatic methods that are triggered by changes to the objects they monitor. Path-based rules provide a useful level of abstraction that encourages a more declarative style of programming and are valuable in object-oriented designs as a means of modeling dynamic collections of interdependent objects. Unlike more traditional pattern-matching rules, path-based rules are not at odds with the object-oriented paradigm and offer performance advantages for many natural applications 相似文献
9.
A group of cooperative and homogeneous Q-learning agents can cooperate to learn faster and gain more knowledge. In order to do so, each learner agent must be able to evaluate the expertness and the intelligence level of the other agents, and to assess the knowledge and the information it gets from them. In addition, the learner needs a suitable method to properly combine its own knowledge and what it gains from the other agents according to their relative expertness. In this paper, some expertness measuring criteria are introduced. Also, a new cooperative learning method called weighted strategy sharing (WSS) is introduced. In WSS, based on the amount of its teammate expertness, each agent assigns a weight to their knowledge and utilizes it accordingly. WSS and the expertness criteria are tested on two simulated hunter–prey and object-pushing systems. 相似文献
10.
This paper gives a general coalgebraic account of temporal logics whose semantics involves a notion of computation path. Examples of such logics include the logic CTL* for transition systems and the logic PCTL for probabilistic transition systems. Our path-based temporal logics are interpreted over coalgebras of endofunctors obtained as the composition of a computation type (e.g. non-deterministic or stochastic) with a general transition type. The semantics of such logics relies on the existence of execution maps similar to the trace maps introduced by Jacobs and co-authors as part of the coalgebraic theory of finite traces (Hasuo et al., 2007 [1]). We consider finite execution maps derived from the theory of finite traces, and a new notion of maximal execution map that accounts for maximal, possibly infinite executions. The latter is needed to recover the logics CTL* and PCTL as specific path-based logics. 相似文献
11.
Use of links to enhance page ranking has been widely studied. The underlying assumption is that links convey recommendations.
Although this technique has been used successfully in global web search, it produces poor results for website search, because
the majority of the links in a website are used to organize information and convey no recommendations. By distinguishing these
two kinds of links, respectively for recommendation and information organization, this paper describes a path-based method
for web page ranking. We define the Hierarchical Navigation Path (HNP) as a new resource for improving web search. HNP is
composed of multi-step navigation information in visitors’ website browsing. It provides indications of the content of the
destination page. We first classify the links inside a website. Then, the links for web page organization are exploited to
construct the HNPs for each page. Finally, the PathRank algorithm is described for web page retrieval. The experiments show
that our approach results in significant improvements over existing solutions. 相似文献
12.
By using other agents' experiences and knowledge, a learning agent may learn faster, make fewer mistakes, and create some rules for unseen situations. These benefits would be gained if the learning agent can extract proper rules from the other agents' knowledge for its own requirements. One possible way to do this is to have the learner assign some expertness values (intelligence level values) to the other agents and use their knowledge accordingly. Some criteria to measure the expertness of the reinforcement learning agents are introduced. Also, a new cooperative learning method, called weighted strategy sharing (WSS) is presented. In this method, each agent measures the expertness of its teammates and assigns a weight to their knowledge and learns from them accordingly. The presented methods are tested on two Hunter-Prey systems. We consider that the agents are all learning from each other and compare them with those who cooperate only with the more expert ones. Also, the effect of communication noise, as a source of uncertainty, on the cooperative learning method is studied. Moreover, the Q-table of one of the cooperative agents is changed randomly and its effects on the presented methods are examined. 相似文献
13.
传统谱聚类算法受高斯核尺度参数的影响较大,对噪声点较为敏感,并且不能利用先验信息指导聚类过程。针对以上问题,提出了一种基于路径相似度测量的鲁棒性谱聚类算法(RPB-SC)。该算法将路径聚类与谱聚类算法相结合,通过定义高斯核的邻域加权尺度因子计算相似度,再用路径聚类思想对全局相似度进行调节,同时通过成对限制先验信息辅助聚类搜索。在人工数据集和真实数据集上的实验表明,新提出的算法能有效减弱高斯核尺度参数的影响,增强对噪声点的鲁棒性,提高聚类性能。 相似文献
14.
Machine Learning - Reinforcement learning (RL) is a machine learning technique aiming to learn how to take actions in an environment to maximize some kind of reward. Recent research has shown that... 相似文献
15.
We present a novel algorithm to solve the non-negative single-source shortest path problem on road networks and graphs with low highway dimension. After a quick preprocessing phase, we can compute all distances from a given source in the graph with essentially a linear sweep over all vertices. Because this sweep is independent of the source, we are able to reorder vertices in advance to exploit locality. Moreover, our algorithm takes advantage of features of modern CPU architectures, such as SSE and multiple cores. Compared to Dijkstra’s algorithm, our method needs fewer operations, has better locality, and is better able to exploit parallelism at multi-core and instruction levels. We gain additional speedup when implementing our algorithm on a GPU, where it is up to three orders of magnitude faster than Dijkstra’s algorithm on a high-end CPU. This makes applications based on all-pairs shortest-paths practical for continental-sized road networks. Several algorithms, such as computing the graph diameter, arc flags, or exact reaches, can be greatly accelerated by our method. 相似文献
16.
The capability of multidestination wormhole allows a message to be propagated along any valid path in a wormhole-routed network conforming to the underlying base routing scheme. The multicast on the path-based routing model is highly dependent on the spatial locality of destinations participating in multicasting. In this paper, we propose two proximity grouping schemes for efficient multicast in wormhole-routed mesh networks with multidestination capability by exploiting the spatial locality of the destination set. The first grouping scheme, graph-based proximity grouping, is proposed to group the destinations together with locality to construct several disjoint sub-meshes. This is achieved by modeling the proximity grouping problem to graph partitioning problem. The second one, pattern-based proximity grouping, is proposed by the pattern classification schemes to achieve the goal of the proximity grouping. By simulation results, we show the routing performance gains over the traditional Hamiltonian-path routing scheme. 相似文献
17.
Network congestion has a negative impact on the performance of on-chip networks due to the increased packet latency. Many congestion-aware routing algorithms have been developed to alleviate traffic congestion over the network. In this paper, we propose a congestion-aware routing algorithm based on the Q-learning approach for avoiding congested areas in the network. By using the learning method, local and global congestion information of the network is provided for each switch. This information can be dynamically updated, when a switch receives a packet. However, Q-learning approach suffers from high area overhead in NoCs due to the need for a large routing table in each switch. In order to reduce the area overhead, we also present a clustering approach that decreases the number of routing tables by the factor of 4. Results show that the proposed approach achieves a significant performance improvement over the traditional Q-learning, C-routing, DBAR and Dynamic XY algorithms. 相似文献
18.
One of the most influential points in cooperative learning is the type of exchanging information. If the content of exchanging information among agents is rich, cooperation gives rise to better results. To extract proper knowledge of agents during the cooperation process, some expertness measures that assign expertness levels to the other agents are used. In this paper, a new method named Multi-Criteria Expertness based cooperative Q-learning (MCE) is proposed that utilizes all of the expertness measures and attempts to enrich the exchanging information more efficiently. In MCE, all expertness measures are considered simultaneously and collective knowledge is equal to the combination of learned knowledge by each of expertness measures. The experimental results confirm outstanding performance of the proposed method on a sample maze world and a hunter-prey problem. 相似文献
19.
This paper studies a multi-goal Q-learning algorithm of cooperative teams. Member of the cooperative teams is simulated by an agent. In the virtual cooperative team, agents adapt its knowledge according to cooperative principles. The multi-goal Q-learning algorithm is approached to the multiple learning goals. In the virtual team, agents learn what knowledge to adopt and how much to learn (choosing learning radius). The learning radius is interpreted in Section 3.1. Five basic experiments are manipulated proving the validity of the multi-goal Q-learning algorithm. It is found that the learning algorithm causes agents to converge to optimal actions, based on agents’ continually updated cognitive maps of how actions influence learning goals. It is also proved that the learning algorithm is beneficial to the multiple goals. Furthermore, the paper analyzes how sensitive the learning performance is affected by the parameter values of the learning algorithm. 相似文献
20.
Given a loop on a surface, its homotopy class can be specified as a word consisting of letters representing the homotopy group generators. One of the interesting problems is how to compute the shortest word for a given loop. This is an NP-hard problem in general. However, for a closed surface that allows a hyperbolic metric and is equipped with a canonical set of fundamental group generators, the shortest word problem can be reduced to finding the shortest loop that is homotopic to the given loop, which can be solved efficiently. In this paper, we propose an efficient algorithm to compute the shortest words for loops given on triangulated surface meshes. The design of this algorithm is inspired and guided by the work of Dehn and Birman–Series. In support of the shortest word algorithm, we also propose efficient algorithms to compute shortest paths and shortest loops under hyperbolic metrics using a novel technique, called transient embedding, to work with the universal covering space. In addition, we employ several techniques to relieve the numerical errors. Experimental results are given to demonstrate the performance in practice. 相似文献
|