首页 | 本学科首页   官方微博 | 高级检索  
相似文献
 共查询到20条相似文献,搜索用时 15 毫秒
1.
2.
3.
Luedtke  Alex  Kaufmann  Emilie  Chambaz  Antoine 《Machine Learning》2019,108(11):1919-1949
Machine Learning - We study a generalization of the multi-armed bandit problem with multiple plays where there is a cost associated with pulling each arm and the agent has a budget at each time...  相似文献   

4.
We consider the framework of stochastic multi-armed bandit problems and study the possibilities and limitations of forecasters that perform an on-line exploration of the arms. These forecasters are assessed in terms of their simple regret, a regret notion that captures the fact that exploration is only constrained by the number of available rounds (not necessarily known in advance), in contrast to the case when the cumulative regret is considered and when exploitation needs to be performed at the same time. We believe that this performance criterion is suited to situations when the cost of pulling an arm is expressed in terms of resources rather than rewards. We discuss the links between the simple and the cumulative regret. One of the main results in the case of a finite number of arms is a general lower bound on the simple regret of a forecaster in terms of its cumulative regret: the smaller the latter, the larger the former. Keeping this result in mind, we then exhibit upper bounds on the simple regret of some forecasters. The paper ends with a study devoted to continuous-armed bandit problems; we show that the simple regret can be minimized with respect to a family of probability distributions if and only if the cumulative regret can be minimized for it. Based on this equivalence, we are able to prove that the separable metric spaces are exactly the metric spaces on which these regrets can be minimized with respect to the family of all probability distributions with continuous mean-payoff functions.  相似文献   

5.
Investigates the multiarmed bandit problem, where each arm generates an infinite sequence of Bernoulli distributed rewards. The parameters of these Bernoulli distributions are unknown and initially assumed to be beta-distributed. Every time a bandit is selected, its beta-distribution is updated to new information in a Bayesian way. The objective is to maximize the long-term discounted rewards. We study the relationship between the necessity of acquiring additional information and the reward. This is done by considering two extreme situations, which occur when a bandit has been played N times: the situation where the decision maker stops learning and the situation where the decision maker acquires full information about that bandit. We show that the difference in reward between this lower and upper bound goes to zero as N grows large.  相似文献   

6.
Knowledge engineering is the term given to the process of developing expert systems and knowledge engineers are the people who acquire the requisite knowledge from experts and structure that knowledge into a useable computer program. As knowledge engineering becomes a more accepted technology, there is increasing concern about attendant social costs, such as job displacement or possible exploitation of experts. This paper reports on our efforts to explore this latter issue by scrutinizing how knowledge engineers think about the domain expert and the role that person plays in system development. To accomplish this aim, we asked several samples of novice engineers to write story completions to a preamble that describes a knowledge engineer encountering a reluctant expert who may be fearing job loss if the system is implemented. The resulting accounts were content-analysed for insights as to how novice system builders think about experts. The results indicate that experts are conceived more as a tool to be used rather than a person to be respected.Preparation of this chapter was supported by National Science Foundation Grant BNS 87-21882 to Marianne LaFrance.  相似文献   

7.
Abstract : The excessive time devoted to the development, testing and maintenance of expert systems (ESs) needs to be reduced radically. What is needed is general-purpose software that can interlace the extracting, structuring, testing, and encoding of knowledge gained from debriefing the expert in any field. LAPS (Logic Aids for Problem Solving) is just such a program and has been successfully used to produce an ES in the domain of submarine diving officers. This automatically-encoded ES was produced by each expert entering his responses to the LAPS series of strategic-induction, AI-jargon-free queries. Currently, the LAPS prototype has four core sessions or functions: (1) the initial or sample-solution session; (2) the dechunking or hidden-knowledge session; (3) the alternatives or completeness-testing session; and (4) the automatic rule-production session. After any session the interviewee can use the fourth function of LAPS to produce frame-laden rules (in M.I, for now; later, in CLIPS). These rules can of course be used to carry out consultation sessions, though fewer of these are needed. LAPS has itself been translated from M.I to C to provide a very fast program on an almost universally available platform — the microcomputer. In accord with the overall technical approach to this project, other successful offline rigorous interviewing functions are being added to LAPS. Other code additions pertain to (a) graphical and other user-friendly reasoning-aid enhancements; (b) a series of simpler follow-up queries; and (c) the fleshing out of many stubs, or functions having intermediate results but as yet no processing.  相似文献   

8.
We prove that any N-superconcentrator of indegree two has at least 4N - o(N) nodes. From this lower bounds of 4N - o(N) follow on the number of additions required to compute the Discrete Fourier Transform of prime order and cyclic convolution. Using small examples we illustrate how small superconcentrators can suggest fast algorithms for instances of these problems.For superconcentrators with no degree restrictions we prove a lower bound of 5N - o(N) edges. Also, we give a recursive construction with 3Nlog2N edges that improves on the best bounds previously known for values of N up to several thousand.  相似文献   

9.
The paper shows how a system of important concepts and approaches proposed by system thinkers (such as philosophers, mathematicians, engineers, and computing scientists) and described in international (ISO) standards has been used to understand and specify various kinds of business and IT systems, and to base IT work on a solid foundation that has been used for communicating with non-IT experts, thus establishing successful and meaningful interactions between business and IT experts and organizations. These common elegant concepts — such as abstraction, system, structure, relationship, composition, pattern, name in context, etc. — come from exact philosophy and mathematics. They have been stable for centuries, and have been successfully used in theory, in industrial practice (including international standards), and in teaching of business and IT modeling. The essential stable semantics of these fundamental concepts and of systems specified using them ought to be clearly separated from the accidental (often IT-industry-imposed excessively complex and rapidly changing) details. The paper includes two case studies of applying the approach – with demonstrable success – in a large financial institution and in a leading publishing company.  相似文献   

10.
Recently, it has been shown that the regret of the Follow the Regularized Leader (FTRL) algorithm for online linear optimization can be bounded by the total variation of the cost vectors rather than the number of rounds. In this paper, we extend this result to general online convex optimization. In particular, this resolves an open problem that has been posed in a number of recent papers. We first analyze the limitations of the FTRL algorithm as proposed by Hazan and Kale (in Machine Learning 80(2–3), 165–188, 2010) when applied to online convex optimization, and extend the definition of variation to a gradual variation which is shown to be a lower bound of the total variation. We then present two novel algorithms that bound the regret by the gradual variation of cost functions. Unlike previous approaches that maintain a single sequence of solutions, the proposed algorithms maintain two sequences of solutions that make it possible to achieve a variation-based regret bound for online convex optimization. To establish the main results, we discuss a lower bound for FTRL that maintains only one sequence of solutions, and a necessary condition on smoothness of the cost functions for obtaining a gradual variation bound. We extend the main results three-fold: (i) we present a general method to obtain a gradual variation bound measured by general norm; (ii) we extend algorithms to a class of online non-smooth optimization with gradual variation bound; and (iii) we develop a deterministic algorithm for online bandit optimization in multipoint bandit setting.  相似文献   

11.
12.
An important aspect associated with the solution of any mathematical programme; is a sensitivity analysis applied to the various parameters of the programme. For the case of linear programmes, the notion of sensitivity analysis is fully developed. However, for non-linear programmes there is a scarcity of useful results. It is the purpose of this paper to establish lower bounds on the solution of a convex programme when the parameters of the programme are allowed to vary. Such bounds may be useful, for example, in establishing the minimal effects of inflation on optimal cost estimations.  相似文献   

13.
14.
In this work, we study the impact of dynamically changing link capacities on the delay bounds of LIS (Longest-In-System) and SIS (Shortest-In-System) protocols on specific networks (that can be modelled as Directed Acyclic Graphs (DAGs)) and stability bounds of greedy contention–resolution protocols running on arbitrary networks under the Adversarial Queueing Theory. Especially, we consider the model of dynamic capacities  , where each link capacity may take on integer values from [1,C][1,C] with C>1C>1, under a (w,ρ)(w,ρ)-adversary. We show that the packet delay on DAGs for LIS is upper bounded by O(iwρC)O(iwρC) and lower bounded by Ω(iwρC)Ω(iwρC) where i   is the level of a node in a DAG (the length of the longest path leading to node vv when nodes are ordered by the topological order induced by the graph). In a similar way, we show that the performance of SIS on DAGs is lower bounded by Ω(iwρC)Ω(iwρC), while the existence of a polynomial upper bound for packet delay on DAGs when SIS is used for contention–resolution remains an open problem. We prove that every queueing network running a greedy contention–resolution protocol is stable for a rate not exceeding a particular stability threshold, depending on C and the length of the longest path in the network.  相似文献   

15.
16.
This paper uses theory and experiment to help explain why state assignment algorithms which use two-level-based cost measures often give good multi-level logic implementations. First, we develop theorems that give conditions under which an input encoding that results in multi-cube functions in the minimized Boolean network can be re-encoded to change the multi-cube functions into smaller functions to produce a smaller network. Second, we measure the properties of some typical finite-state machines to determine how well they fit the requirements of the theorems. The good fit between the requirements of the theorems and the properties of typical state machines helps explain why state assignment algorithms designed for two-level-logic implementations are relatively successful in designing state assignments for multi-level logic.  相似文献   

17.
A robustification procedure for LQ state feedback design is presented. Such a procedure consists of choosing the state and input weighting matrices according to the kind of uncertainties on the system. Both structured and norm-bounded additive uncertainties are addressed, and upper bounds for the uncertainties that do not destabilize the closed-loop system are presented. Connections with the quadratic stabilizability problem are established  相似文献   

18.
Ellul, Krawetz, Shallit and Wang prove an exponential lower bound on the size of any context-free grammar generating the language of all permutations over some alphabet. We generalize their method and obtain exponential lower bounds for many other languages, among them the set of all squares of given length, and the set of all words containing each symbol at most twice.  相似文献   

19.
In this paper, we analyze the complexity of a zero-test for expressions built from formal power series solutions of first order differential equations with non-degenerate initial conditions. We will prove a doubly exponential complexity bound. This bound establishes a power series analogue for “witness conjectures”.  相似文献   

20.
This paper considers nonhomogeneous M(t)/M(t)1 queues which can model systems such as communications networks. For such systems, bounds on moment-generating functions and on the tail distribution of the queue process are obtained. These bounds are useful for characterizing the quality of service a system can provide to its users. An approach utilizing the theory of differential equations is adapted. The bounds given in this paper are tighter than those previously available. In fact, the bounds can be made arbitrarily tight given sufficient computational effort  相似文献   

设为首页 | 免责声明 | 关于勤云 | 加入收藏

Copyright©北京勤云科技发展有限公司  京ICP备09084417号