Similar literature
 Found 20 similar documents (search time: 0 ms)
1.
Multi-armed bandits with switching penalties   Total citations: 2 (self-citations: 0, citations by others: 2)
The multi-armed bandit problem with switching penalties (switching cost and switching delays) is investigated. It is shown that under an optimal policy, decisions about the processor allocation need to be made only at stopping times that achieve an appropriate index, the well-known “Gittins index” or a “switching index” that is defined for switching cost and switching delays. An algorithm for the computation of the “switching index” is presented. Furthermore, sufficient conditions for optimality of allocation strategies, based on limited look-ahead techniques, are established. These conditions together with the above-mentioned feature of optimal scheduling policies simplify the search for an optimal allocation policy. For a special class of multi-armed bandits (scheduling of parallel queues with switching penalties and no arrivals), it is shown that the aforementioned property of optimal policies is sufficient to determine an optimal allocation strategy. In general, the determination of optimal allocation policies remains a difficult and challenging task.

2.
This paper examines the problem of adaptive beam scheduling to minimise target tracking error with a phased array radar. It is shown that this can be posed in a framework that is similar to a particular type of dynamic programming problem known as the restless bandit problem. We will show that when the problem is put in this framework it has an indexable solution under certain circumstances.

3.
4.
5.
6.
We study a partial-information online-learning problem where actions are restricted to noisy comparisons between pairs of strategies (also known as bandits). In contrast to conventional approaches that require the absolute reward of the chosen strategy to be quantifiable and observable, our setting assumes only that (noisy) binary feedback about the relative reward of two chosen strategies is available. This type of relative feedback is particularly appropriate in applications where absolute rewards have no natural scale or are difficult to measure (e.g., user-perceived quality of a set of retrieval results, taste of food, product attractiveness), but where pairwise comparisons are easy to make. We propose a novel regret formulation for this setting and present an algorithm that achieves information-theoretically optimal regret bounds (up to a constant factor).
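
The abstract does not spell out its regret formulation, so the following is only a minimal sketch of one way regret could be measured when feedback is a pairwise comparison. The preference matrix `pref`, the function name `duel_regret`, and the rule of averaging the two preference gaps are all assumptions made for illustration, not the paper's definition.

```python
def duel_regret(pref, best, i, j):
    """Regret of comparing arms i and j, measured against the arm `best`.

    pref[a][b] is the (assumed known, for evaluation only) probability
    that arm a wins a single noisy comparison against arm b.
    """
    gap_i = pref[best][i] - 0.5  # how strongly `best` is preferred over i
    gap_j = pref[best][j] - 0.5  # how strongly `best` is preferred over j
    return (gap_i + gap_j) / 2   # assumed rule: average of the two gaps


# Hypothetical 3-arm preference matrix; arm 0 beats every other arm.
pref = [
    [0.5, 0.6, 0.8],
    [0.4, 0.5, 0.7],
    [0.2, 0.3, 0.5],
]

regret_best_pair = duel_regret(pref, best=0, i=0, j=0)   # best arm vs itself
regret_worse_pair = duel_regret(pref, best=0, i=1, j=2)  # two suboptimal arms
```

Under this assumed rule, comparing the best arm with itself incurs no regret, while comparing two suboptimal arms incurs a positive amount. No absolute reward is ever needed, only preference probabilities.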

7.
Investigates the multiarmed bandit problem, where each arm generates an infinite sequence of Bernoulli distributed rewards. The parameters of these Bernoulli distributions are unknown and initially assumed to be beta-distributed. Every time a bandit is selected, its beta-distribution is updated with the new information in a Bayesian way. The objective is to maximize the long-term discounted rewards. We study the relationship between the necessity of acquiring additional information and the reward. This is done by considering two extreme situations, which occur when a bandit has been played N times: the situation where the decision maker stops learning and the situation where the decision maker acquires full information about that bandit. We show that the difference in reward between these lower and upper bounds goes to zero as N grows large.
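
The Bayesian update described in this abstract can be sketched as the standard conjugate Beta-Bernoulli update. This is an illustrative sketch only: the class name `BetaArm`, the uniform Beta(1, 1) starting prior, and the sample reward sequence are assumptions and are not taken from the paper.

```python
from dataclasses import dataclass


@dataclass
class BetaArm:
    """Beta(alpha, beta) belief about one arm's unknown Bernoulli parameter."""
    alpha: float = 1.0  # assumed prior: Beta(1, 1), i.e. uniform on [0, 1]
    beta: float = 1.0

    def update(self, reward: int) -> None:
        # Conjugate update: a success (1) increments alpha, a failure (0) beta.
        if reward == 1:
            self.alpha += 1
        else:
            self.beta += 1

    def mean(self) -> float:
        # Posterior mean of the success probability.
        return self.alpha / (self.alpha + self.beta)

    def variance(self) -> float:
        # Posterior variance; it shrinks as the arm is played more often.
        n = self.alpha + self.beta
        return self.alpha * self.beta / (n * n * (n + 1))


arm = BetaArm()
prior_variance = arm.variance()
for reward in [1, 0, 1, 1]:  # hypothetical observed rewards
    arm.update(reward)
```

After the four hypothetical plays the belief is Beta(4, 2). The posterior variance is smaller than the prior variance, which matches the abstract's point that further learning matters less once an arm has been played many times.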

8.
We consider the framework of stochastic multi-armed bandit problems and study the possibilities and limitations of forecasters that perform an on-line exploration of the arms. These forecasters are assessed in terms of their simple regret, a regret notion that captures the fact that exploration is only constrained by the number of available rounds (not necessarily known in advance), in contrast to the case when the cumulative regret is considered and when exploitation needs to be performed at the same time. We believe that this performance criterion is suited to situations when the cost of pulling an arm is expressed in terms of resources rather than rewards. We discuss the links between the simple and the cumulative regret. One of the main results in the case of a finite number of arms is a general lower bound on the simple regret of a forecaster in terms of its cumulative regret: the smaller the latter, the larger the former. Keeping this result in mind, we then exhibit upper bounds on the simple regret of some forecasters. The paper ends with a study devoted to continuous-armed bandit problems; we show that the simple regret can be minimized with respect to a family of probability distributions if and only if the cumulative regret can be minimized for it. Based on this equivalence, we are able to prove that the separable metric spaces are exactly the metric spaces on which these regrets can be minimized with respect to the family of all probability distributions with continuous mean-payoff functions.
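
To make the distinction between the two regret notions concrete, here is a small sketch that computes both for a finite set of arms. It assumes the true mean payoffs are known, which holds only in a simulation or evaluation setting; the function names, the mean values, and the pull sequence are hypothetical.

```python
def cumulative_regret(means, pulls):
    """Sum over all rounds of the gap between the best mean and the pulled arm's mean."""
    best = max(means)
    return sum(best - means[arm] for arm in pulls)


def simple_regret(means, recommended):
    """Gap between the best mean and the mean of the arm recommended after exploring."""
    return max(means) - means[recommended]


means = [0.5, 0.7, 0.4]     # hypothetical true mean payoffs
pulls = [0, 1, 2, 0, 1, 2]  # uniform exploration over six rounds

explore_cost = cumulative_regret(means, pulls)
final_gap = simple_regret(means, recommended=1)
```

In this example uniform exploration accumulates regret in every round spent on a suboptimal arm, yet the final recommendation (arm 1) has zero simple regret. This illustrates the direction of the trade-off mentioned in the abstract, not its quantitative bound.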

9.
We study on-line decision problems where the set of actions that are available to the decision algorithm varies over time. With a few notable exceptions, such problems remained largely unaddressed in the literature, despite their applicability to a large number of practical problems. Departing from previous work on this “Sleeping Experts” problem, we compare algorithms against the payoff obtained by the best ordering of the actions, which is a natural benchmark for this type of problem. We study both the full-information (best expert) and partial-information (multi-armed bandit) settings and consider both stochastic and adversarial reward models. For all settings we give algorithms achieving (almost) information-theoretically optimal regret bounds (up to a constant or a sub-logarithmic factor) with respect to the best-ordering benchmark.

10.
Luedtke, Alex; Kaufmann, Emilie; Chambaz, Antoine. Machine Learning, 2019, 108(11): 1919-1949
We study a generalization of the multi-armed bandit problem with multiple plays where there is a cost associated with pulling each arm and the agent has a budget at each time...

11.
We obtain the conditions for the emergence of the swarm intelligence effect in an interactive game of restless multi-armed bandit (rMAB). A player competes with multiple agents. Each bandit has a payoff that changes with a probability p_c per round. The agents and player choose one of three options: (1) Exploit (a good bandit), (2) Innovate (asocial learning for a good bandit among n_I randomly chosen bandits), and (3) Observe (social learning for a good bandit). Each agent has two parameters (c, p_obs) to specify the decision: (i) c, the threshold value for Exploit, and (ii) p_obs, the probability for Observe in learning. The parameters (c, p_obs) are uniformly distributed. We determine the optimal strategies for the player using complete knowledge about the rMAB. We show whether social or asocial learning is preferable in the (p_c, n_I) space and define the swarm intelligence effect. We conduct a laboratory experiment (67 subjects) and observe the swarm intelligence effect only if (p_c, n_I) are chosen so that social learning is far better than asocial learning.

12.
Constructing and maintaining topic-specific Web indexes is modeled by a restless-bandits generalization and resolved by a reinforcement-learning algorithm. The authors outline the potential role of topic-specific robots in distributed search engine design, and they model the complex problem of automatically constructing and maintaining topic-specific Web indexes. Experimental results establish the viability of a topic-specific Web robot design based on the restless bandit model. The results indicate that the proposed algorithm is a good foundation on which to build a complete solution.

13.
14.
Patterns often occur as homogeneous groups or fields generated by the same source. In multisource recognition problems, such isogeny induces statistical dependencies between patterns (termed style context). We model these dependencies by second-order statistics and formulate the optimal classifier for normally distributed styles. We show that model parameters estimated only from pairs of classes suffice to train classifiers for any test field length. Although computationally expensive, the style-conscious classifier reduces the field error rate by up to 20 percent on quadruples of handwritten digits from standard NIST data sets.

15.
Twitter is one of the most popular social media platforms for online users to create and share information. Tweets are short, informal, and large-scale, which makes it difficult for online users to find reliable and useful information, giving rise to the problem of Twitter summarization. On the one hand, tweets are short and highly unstructured, which makes it difficult for traditional document summarization methods to handle Twitter data. On the other hand, Twitter provides rich social-temporal context beyond texts, bringing about new opportunities. In this paper, we investigate how to exploit social-temporal context for Twitter summarization. In particular, we provide a methodology to model temporal context globally and locally, and propose a novel unsupervised summarization framework with social-temporal context for Twitter data. To assess the proposed framework, we manually label a real-world Twitter dataset. Experimental results from the dataset demonstrate the importance of social-temporal context in Twitter summarization.

16.
Future pervasive computing applications are envisioned to adapt the applications’ behaviors by utilizing various contexts of an environment and its users. Such context information may often be ambiguous and also heterogeneous, which makes the delivery of unambiguous context information to real applications extremely challenging. Thus, a significant challenge facing the development of realistic and deployable context-aware services for pervasive computing applications is the ability to deal with these ambiguous contexts. In this paper, we propose a resource-optimized, quality-assured context mediation framework based on efficient context-aware data fusion and semantic-based context delivery. In this framework, contexts are first fused by an active fusion technique based on Dynamic Bayesian Networks and ontology, and further mediated using a composable ontological rule-based model with the involvement of users or application developers. The fused context data are then organized into an ontology-based semantic network together with the associated ontologies in order to facilitate efficient context delivery. Experimental results using SunSPOT and other sensors demonstrate the promise of this approach.

17.
The context unification problem is a generalization of standard term unification. It consists of finding a unifier for a set of term equations containing first-order variables and context variables. In this paper we analyze the special case of context unification where the use of at most one context variable is allowed and show that it is in NP. The motivation for investigating this subcase of context unification is interprocedural program analysis for programs described using arbitrary terms, generalizing the case where terms were restricted to using unary function symbols. Our results imply that the redundancy problem is in coNP, and that the finite redundancy property holds in this case. We also exhibit particular cases where one context unification is polynomial.

18.
19.
We discuss the implementation of a bounded context switching algorithm in the Spin model checker. The algorithm allows us to find counter-examples that are often simpler to understand, and that may be more likely to occur in practice. We discuss extensions of the algorithm that allow us to use this new algorithm in combination with most other search modes supported in Spin, including partial order reduction and bitstate hashing. We show that, contrary to what is often assumed, the enforcement of a bounded context switching discipline does not decrease but increases the complexity of the model checking procedure. We discuss the performance of the algorithm on a range of applications.

20.